
Web Services and Open Data

Academic year: 2022

(1)

Web Services and Open Data

Sébastien Tixeuil

sebastien.tixeuil@lip6.fr

Thanks to Lélia Blin, Quentin Bramas, Fabien Mathieu

(2)

Web Services

(3)

What is a Web Service?

A Web Service is a method of communication between two programs over the Web.

HTTP is the typical protocol used to communicate via Web Services.

(4)

What is a Web Service?

Request

Response

Client HTTP Server

(5)

What is a Web Service?

Request

Response

Client HTTP Server

XML request:

<id>5</id>

XML response:

<note id="5">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget the dinner</body>
</note>

(6)

What is a Web Service?

Request

Response

Client HTTP Server

SOAP request:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">

<soap:Header>

</soap:Header>

<soap:Body>

<m:GetStockPrice xmlns:m="http://www.example.org/stock/Surya">

<m:StockName>IBM</m:StockName>

</m:GetStockPrice>

</soap:Body>

</soap:Envelope>

SOAP response:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

<soap:Header>

<ResponseHeader xmlns="https://www.google.com/apis/ads/publisher/v201508">

<requestId>xxxxxxxxxxxxxxxxxxxx</requestId>

<responseTime>1063</responseTime>

</ResponseHeader>

</soap:Header>

<soap:Body>

<getAdUnitsByStatementResponse xmlns="https://www.google.com/apis/ads/publisher/v201508">

<rval>

<totalResultSetSize>1</totalResultSetSize>

<startIndex>0</startIndex>

<results>

<id>2372</id>

<name>RootAdUnit</name>

<description></description>

<targetWindow>TOP</targetWindow>

<status>ACTIVE</status>

<adUnitCode>1002372</adUnitCode>

<inheritedAdSenseSettings>

<value>

<adSenseEnabled>true</adSenseEnabled>

<borderColor>FFFFFF</borderColor>

(7)

What is a Web Service?

Request

Response

Client HTTP Server

URL-encoded request:

JSON response:

order=date&limit=2

{
  "data": [
    { "id": 1001, "name": "Jim" },
    { "id": 1002, "name": "Matt" }
  ]
}
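A URL-encoded request like the one above does not have to be assembled by hand; the requests library can build the query string from a dict. A minimal sketch (the host api.example.com is a placeholder, not a real API, and nothing is sent over the network):

```python
# Build (without sending) a GET request whose query string is URL-encoded
# from a dict of parameters. The host below is a placeholder.
from requests import Request

req = Request("GET", "http://api.example.com/orders",
              params={"order": "date", "limit": 2}).prepare()
print(req.url)  # http://api.example.com/orders?order=date&limit=2
```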

(8)

API Business Model

(9)

REST Web API

A REST Web API is a web service that uses the simpler REpresentational State Transfer (REST) style of communication.

A request is just an HTTP method applied to a URI.

The response is typically JSON or XML.

Example:

GET: http://pokeapi.co/api/v1/pokemon/25

GET is the HTTP method; the rest is a URI that represents a resource: the base URL of the API, then the API version, then the resource path.
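The URI itself can be assembled from the base URL, the API version, and the resource path; a minimal sketch using only the standard library (the endpoint is the slide's example, which may have changed since):

```python
# Assemble the resource URI from the example above: base URL + API
# version + resource path (the "pokemon" resource, id 25).
from urllib.parse import urljoin

base = "http://pokeapi.co/api/v1/"
uri = urljoin(base, "pokemon/25")
print("GET", uri)  # GET http://pokeapi.co/api/v1/pokemon/25
```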

(10)

REST Web API Call Example

HTTP Request Headers:

GET /api/v1/pokemon/25/ HTTP/1.1
Host: pokeapi.co
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch

HTTP Response Headers:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 08 Jan 2016 13:10:08 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept
X-Frame-Options: SAMEORIGIN
Cache-Control: s-maxage=360, max-age=360

HTTP Response Body:

{
  "name": "Pikachu",
  "attack": 55,
  "abilities": [
    {
      "name": "static",
      "resource_uri": "/api/v1/ability/9/"
    },
    {
      "name": "lightningrod",
      "resource_uri": "/api/v1/ability/31/"
    }
  ]
}


(12)

REST: Architectural Properties

Simplicity of a uniform interface

Modifiability of components to meet changing needs (even while the application is running)

Visibility of communication between components by service agents

Portability of components by moving program code with the data

Reliability in the resistance to failure at the system level in the presence of failures within components, connectors, or data

(13)

REST: Architectural Constraints

• Client-server architecture

• Statelessness

• Cacheability

• Layered system

• Code on demand (optional)

• Uniform interface

(14)

Resources

Command based (e.g. the Flickr API):


GET: 


https://api.flickr.com/services/rest/?method=flickr.galleries.getList&user_id=XX

POST: 


https://api.flickr.com/services/rest/?method=flickr.galleries.addPhoto&gallery_id=XX


(15)

Resources

URI/Resource based:

• e.g. the Facebook Graph API:

GET: /{photo-id} to retrieve the info of a photo
GET: /{photo-id}/likes to retrieve the people who like it
POST: /{photo-id} to update the photo
DELETE: /{photo-id} to delete the photo

• e.g. the Google Calendar API:


GET: /calendars/{calendarId} to retrieve the info of a calendar
PUT: /calendars/{calendarId} to update a calendar
DELETE: /calendars/{calendarId} to delete a calendar
POST: /calendars to create a calendar
GET: /calendars/{calendarId}/events/{eventId}
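The same calendar resource is thus read, updated, or deleted just by changing the HTTP method on the same URI. A sketch with requests (www.example.com stands in for the real API host; the requests are prepared but never sent):

```python
# Prepare (but do not send) one request per HTTP method against the same
# resource URI; only the method changes, the URI stays the same.
from requests import Request

uri = "https://www.example.com/calendars/42"
for method in ("GET", "PUT", "DELETE"):
    r = Request(method, uri).prepare()
    print(r.method, r.url)
```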


(16)

Response

HTTP Response:

200: OK

3 _ _ : Redirection

404: Not Found (4 _ _ : something went wrong with what you tried to access)

5 _ _ : Server Error

API Response:

Flickr:


{ "stat": "fail", "code": 1, "message": "User not found" }

{ "galleries": { ... }, "stat": "ok" }

Google Calendar:

{ "error": {"code": 403, "message": "User Rate Limit Exceeded" } }
 { "kind": "calendar#events","summary": ..., "description": ...
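The HTTP status code and the API-level status are two different things, so a client typically checks both. A sketch decoding the Flickr-style error body shown above:

```python
# Decode an API-level error: HTTP may say 200 OK while the JSON body
# still reports a failure, as in the Flickr example above.
from json import loads

body = '{ "stat": "fail", "code": 1, "message": "User not found" }'
payload = loads(body)
if payload.get("stat") == "fail":
    print(f"API error {payload['code']}: {payload['message']}")
```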

(17)

Response

Content-Type:

text/plain
text/html
text/xml or application/xml
application/json
image/png
...
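A client can dispatch on the Content-Type header to pick the right parser; a minimal sketch covering the two most common cases (the helper name parse_body is ours, not a library function):

```python
# Choose a parser from the Content-Type of a response. Only JSON and XML
# are handled here; anything else is returned as raw text.
from json import loads
import xml.etree.ElementTree as ET

def parse_body(content_type, body):
    if "application/json" in content_type:
        return loads(body)
    if "xml" in content_type:
        return ET.fromstring(body)
    return body

print(parse_body("application/json", '{"id": 5}'))  # {'id': 5}
print(parse_body("text/xml", "<id>5</id>").text)    # 5
```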

(18)

Client-side HTTP

(19)

HTTP Requests

from requests import *

manga = "http://lelscano.com"

r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")

(20)

HTTP Requests

from requests import *

manga = "http://lelscano.com"
r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")

Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.

(21)

HTTP Requests

from requests import *

manga = "http://lelscano.com"
r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")

Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.

Response headers: {'Date': 'Wed, 04 Nov 2020 14:40:27 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=da1986d3c036d3d4b0dfdbf3f16812e5f1604500827; expires=Fri, 04-Dec-20 14:40:27 GMT; path=/; domain=.lelscan.net; HttpOnly; SameSite=Lax, mobile_lelscan=0; expires=Thu, 05-Nov-2020 14:40:27 GMT; Max-Age=86400; path=lelscan.net', 'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '06354cc73b000032c20c30b000000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report?s=KFPpQxY2A5IilAqwG6j1BXgoJEskCp%2BkW7uCp0z63eYihMbvUnyfBx7abOP6nhy%2B5H1KHR51De457l7y84Ois4b3gD5D1Fi15RrJmklRlavxKwGsFBw3fA%3D%3D"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"report_to":"cf-nel","max_age":604800}', 'Server': 'cloudflare', 'CF-RAY': '5ecf171ec88e32c2-CDG', 'Content-Encoding': 'gzip'}

(22)

HTTP Requests

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>One Piece lecture en ligne scan</title>

<meta name="description" content="One Piece Lecture en ligne, tous les scan One Piece." />

<meta name="lelscan" content="One Piece" />

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />

<meta http-equiv="Content-Language" content="fr" />

<meta name="keywords" content="One Piece lecture en ligne, lecture en ligne One Piece, scan One Piece, One Piece scan, One Piece lel, lecture en ligne One Piece, Lecture, lecture, scan, chapitre, chapitre One Piece, lecture One Piece, lecture Chapitre One Piece, mangas, manga, One Piece, One Piece fr, One Piece france, scans, image One Piece " />

<meta name="subject" content="One Piece lecture en ligne scan" />

<meta name="identifier-url" content="https://lelscan.net" />

<meta property="og:image" content="/mangas/one-piece/thumb_cover.jpg" />

<meta property="og:title" content="Lecture en ligne One Piece scan" />

<meta property="og:url" content="/lecture-ligne-one-piece.php" />

<meta property="og:description" content="One Piece lecture en ligne - lelscan" />

<link rel="alternate" type="application/rss+xml" title="flux rss" href="/rss/rss.xml" />

<link rel="icon" type="image" href="/images/icones/favicon.ico" />

<style type="text/css" media="screen">

from requests import *

manga = "http://lelscano.com"
r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")
print(f"{r.text}")

(23)

Stream Downloading

from pathlib import *
from requests import *

def stream_download(source_url, dest_file):
    r = get(source_url, stream=True)
    dest_file = Path(dest_file)
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

(24)

Stream Downloading

from pathlib import *
from requests import *

def stream_download(source_url, dest_file):
    r = get(source_url, stream=True)
    dest_file = Path(dest_file)
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

img = "http://ftp.crifo.org/debian-cd/current/amd64/iso-dvd/debian-10.6.0-amd64-DVD-1.iso"

stream_download(source_url=img, dest_file="debian1.iso")

(25)

Elementary

String Parsing

(26)

Split

s = "Python is a great language\n but Erlang is pretty cool too"

l = s.split()
print(l)

l2 = s.split('a')
print(l2)

l3 = s.split('\n')
print(l3)

l4 = s.split('an')
print(l4)

(27)

Split

s = "Python is a great language\n but Erlang is pretty cool too"

l = s.split()
print(l)

l2 = s.split('a')
print(l2)

l3 = s.split('\n')
print(l3)

l4 = s.split('an')
print(l4)

['Python', 'is', 'a', 'great', 'language', 'but', 'Erlang', 'is', 'pretty', 'cool', 'too']
['Python is ', ' gre', 't l', 'ngu', 'ge\n but Erl', 'ng is pretty cool too']
['Python is a great language', ' but Erlang is pretty cool too']
['Python is a great l', 'guage\n but Erl', 'g is pretty cool too']

(28)

Join

s4 = 'an'.join(l4)
print(s4)

s3 = '\n'.join(l3)
print(s3)

s2 = 'a'.join(l2)
print(s2)

s1 = ' '.join(l)
print(s1)

(29)

Join

s4 = 'an'.join(l4)
print(s4)

s3 = '\n'.join(l3)
print(s3)

s2 = 'a'.join(l2)
print(s2)

s1 = ' '.join(l)
print(s1)

Python is a great language
 but Erlang is pretty cool too
Python is a great language
 but Erlang is pretty cool too
Python is a great language
 but Erlang is pretty cool too
Python is a great language but Erlang is pretty cool too

(30)

Regular Expressions

(31)

Regular Expressions

a, X, 9, < — ordinary characters just match themselves exactly. The meta-characters that have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

. (a period) — matches any single character except newline '\n'

\w — (lowercase w) matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_]. \W matches any non-word character.

\b — boundary between word and non-word

\s — (lowercase s) matches a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]. \S (uppercase S) matches any non-whitespace character.

\t, \n, \r — tab, newline, return

\d — decimal digit [0-9]

^ = start, $ = end — match the start or the end of the string

\ — inhibits the "specialness" of a character. So, for example, use \. to match a literal period or \\ to match a literal backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.

(32)

Regular Expressions

[ ] — set of possible characters

| — or

{n} — exactly n occurrences

( ) — create a group

+ — at least one occurrence

* — zero or more occurrences

? — zero or one occurrence
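These character classes and quantifiers combine; a quick sketch on a made-up string:

```python
# A few of the meta-characters above in action: \d for digits, + for
# "one or more", {n} for an exact count, ( ) for groups.
from re import findall, search

s = "Order #124 shipped on 2020-11-04"
print(findall(r"\d+", s))                  # ['124', '2020', '11', '04']
m = search(r"(\d{4})-(\d{2})-(\d{2})", s)
print(m.group(1), m.group(2), m.group(3))  # 2020 11 04
```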

(33)

sebastien.tixeuil@lip6.fr

Regular Expressions

Extract Email Information:

([^@]+)@([^@]+)

[^@] — a character that is not the at symbol @
+ — at least one such character

import re
m = re.match('([^@]+)@([^@]+)', 'sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))

(34)

sebastien.tixeuil@lip6.fr

Regular Expressions

Extract Email Information:

([^@]+)@([^@]+)

[^@] — a character that is not the at symbol @
+ — at least one such character

import re
m = re.match('([^@]+)@([^@]+)', 'sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))

sebastien.tixeuil

lip6.fr

(35)

Extracting Information with

Regular Expressions

(36)

Extracting Information with

Regular Expressions

(37)

Extracting Information with Regular Expressions

from requests import *
from re import *

r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')

print(findall('(26-00/([0-9]{3}))', r.text))

(38)

Extracting Information with Regular Expressions

from requests import *
from re import *

r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')

print(findall('(26-00/([0-9]{3}))', r.text))

[('26-00/103', '103'), ('26-00/112', '112'), ('26-00/122', '122'), ('26-00/109', '109'), ('26-00/111', '111'), ('26-00/108', '108'), ('26-00/103', '103'), ('26-00/107', '107'), ('26-00/126', '126'), ('26-00/105', '105'), ('26-00/105', '105'), ('26-00/115', '115'), ('26-00/128', '128'), ('26-00/114', '114'), ('26-00/113', '113'), ('26-00/224', '224'), ('26-00/410', '410'), ('26-00/412', '412'), ('26-00/230', '230'), ('26-00/216', '216'), ('26-00/119', '119'), ('26-00/119', '119'), ('26-00/116', '116'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/120', '120'), ('26-00/116', '116'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/104', '104'), ('26-00/102', '102'), ('26-00/102', '102'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/104', '104'), ('26-00/420', '420'), ('26-00/120', '120'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/119', '119'), ('26-00/119', '119')]

(39)

JSON Parsing

(40)

JSON

from json import *
from socket import *

print(dumps(['aéçèà',1234,[2,3,4,5,6]]))
print(loads('["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]'))

s = socket(AF_INET,SOCK_STREAM)
try:
    print(dumps(s))
except TypeError:
    print("this data does not seem serializable with JSON")

(41)

JSON

from json import *
from socket import *

print(dumps(['aéçèà',1234,[2,3,4,5,6]]))
print(loads('["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]'))

s = socket(AF_INET,SOCK_STREAM)
try:
    print(dumps(s))
except TypeError:
    print("this data does not seem serializable with JSON")

["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]

['aéçèà', 1234, [2, 3, 4, 5, 6]]
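As the first output line shows, dumps escapes non-ASCII characters by default; passing ensure_ascii=False keeps them readable:

```python
# By default json.dumps escapes non-ASCII characters as \uXXXX;
# ensure_ascii=False emits them verbatim instead.
from json import dumps

print(dumps(['aéçèà']))                      # ["a\u00e9\u00e7\u00e8\u00e0"]
print(dumps(['aéçèà'], ensure_ascii=False))  # ["aéçèà"]
```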

(42)

JSON Files

from json import *

data = {}
data['people'] = []
data['people'].append({
    'name': 'Mark',
    'website': 'facebook.com',
})
data['people'].append({
    'name': 'Larry',
    'website': 'google.com',
})
data['people'].append({
    'name': 'Tim',
    'website': 'apple.com',
})

(43)

JSON Files

with open('data.txt', 'w') as outfile:
    dump(data, outfile)

{"people": [{"name": "Mark", "website": "facebook.com"}, {"name": "Larry", "website": "google.com"}, {"name": "Tim", "website": "apple.com"}]}

data.txt

(44)

JSON Files

with open('data.txt') as infile:
    data = load(infile)

for p in data['people']:
    print('Name: ' + p['name'])
    print('Website: ' + p['website'])
    print('')

{"people": [{"name": "Mark", "website": "facebook.com"}, {"name": "Larry", "website": "google.com"}, {"name": "Tim", "website": "apple.com"}]}

data.txt

(45)

JSON Files

with open('data.txt') as infile:
    data = load(infile)

for p in data['people']:
    print('Name: ' + p['name'])
    print('Website: ' + p['website'])
    print('')

{"people": [{"name": "Mark", "website": "facebook.com"}, {"name": "Larry", "website": "google.com"}, {"name": "Tim", "website": "apple.com"}]}

data.txt

Name: Mark
Website: facebook.com

Name: Larry
Website: google.com

Name: Tim
Website: apple.com

(46)

XML Parsing

(47)

XML Example

<?xml version="1.0" encoding="UTF-8"?>

<note>

<to>Tove</to>


<from>Jani</from>

<heading>Reminder</heading>


<body>Don't forget me this weekend!</body>

</note>

(48)

XML Example 2

<?xml version="1.0"?>
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

(49)

XML Example 2

<?xml version="1.0"?>
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

countryXML.xml


(51)

XML Parsing

With xml.etree.ElementTree

xml.etree.ElementTree loads the whole file, you can then navigate in the tree structure.

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')

(52)

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()

print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

(53)

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()

print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

data
{}
2008

(54)

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()

print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

data
{}
2008
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

(55)

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()

print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

data
{}
2008
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}

(56)

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()
# Or, from a string: root = ET.fromstring(country_data_as_string)

print("---country")
for child in root:
    print(child.tag, child.attrib)

print("---Rank:")
for rank in root.iter('rank'):
    print(rank.text)

print("---neighbors")
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

print("---neighbors name")
for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))

print("---country and neighbors")
for child in root:
    print("the neighbors of", child.get('name'), ":")
    for neighbor in child.iter('neighbor'):
        print(neighbor.get('name'))
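Instead of looping, ElementTree's find() and findall() also accept a small subset of XPath, including attribute predicates; a sketch on an in-memory fragment of the same document:

```python
# find()/findall() support a limited XPath syntax, e.g. the attribute
# predicate [@name='...'] to pick one country directly.
import xml.etree.ElementTree as ET

doc = """<data>
  <country name="Singapore"><rank>4</rank></country>
  <country name="Panama"><rank>68</rank></country>
</data>"""
root = ET.fromstring(doc)
print(root.find("country[@name='Singapore']/rank").text)  # 4
```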

(57)

CSV Parsing

(58)

CSV File

(59)

CSV File

name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Australia,AUS,AUD,2.59,1.68,,2000-04-01
Brazil,BRA,BRL,2.95,1.79,,2000-04-01
Britain,GBR,GBP,1.9,0.632911392,,2000-04-01
Canada,CAN,CAD,2.85,1.47,,2000-04-01
Chile,CHL,CLP,1260,514,,2000-04-01
China,CHN,CNY,9.9,8.28,,2000-04-01
Czech Republic,CZE,CZK,54.37,39.1,,2000-04-01
Denmark,DNK,DKK,24.75,8.04,,2000-04-01
Euro area,EUZ,EUR,2.56,1.075268817,,2000-04-01
Hong Kong,HKG,HKD,10.2,7.79,,2000-04-01
Hungary,HUN,HUF,339,279,,2000-04-01
Indonesia,IDN,IDR,14500,7945,,2000-04-01
Israel,ISR,ILS,14.5,4.05,,2000-04-01
Japan,JPN,JPY,294,106,,2000-04-01
Malaysia,MYS,MYR,4.52,3.8,,2000-04-01
Mexico,MEX,MXN,20.9,9.41,,2000-04-01
New Zealand,NZL,NZD,3.4,2.01,,2000-04-01
Poland,POL,PLN,5.5,4.3,,2000-04-01
Russia,RUS,RUB,39.5,28.5,,2000-04-01

(60)

CSV Parsing

from csv import *

with open('big-mac-source-data.csv', newline='') as csvfile:
    r = reader(csvfile, delimiter=',', quotechar='|')
    for row in r:
        if row[0] == "France":
            print(str(row[0]) + ',' + str(row[3]) + ',' + str(row[6]))

(61)

CSV Parsing

from csv import *

with open('big-mac-source-data.csv', newline='') as csvfile:
    r = reader(csvfile, delimiter=',', quotechar='|')
    for row in r:
        if row[0] == "France":
            print(str(row[0]) + ',' + str(row[3]) + ',' + str(row[6]))

France,3.5,2011-07-01
France,3.6,2012-01-01
France,3.6,2012-07-01
France,3.6,2013-01-01
France,3.9,2013-07-01
France,3.8,2014-01-01
France,3.9,2014-07-01
France,3.9,2015-01-01
France,4.1,2015-07-01
France,4.1,2016-01-01
France,4.1,2016-07-01
France,4.1,2017-01-01
France,4.1,2017-07-01
France,4.2,2018-01-01
France,4.2,2018-07-01
France,4.2,2019-01-01
France,4.2,2019-07-09
France,4.2,2020-01-14
France,4.2,2020-07-01
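csv.DictReader avoids the numeric indices by keying each row on the header line; a sketch on an inline, made-up sample row in the same file format:

```python
# DictReader uses the first row as field names, so columns can be
# addressed by name instead of position. The France row is made up.
from csv import DictReader
from io import StringIO

sample = ("name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date\n"
          "France,FRA,EUR,4.2,0.9,,2020-07-01\n")
for row in DictReader(StringIO(sample)):
    print(row["name"], row["local_price"], row["date"])  # France 4.2 2020-07-01
```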

(62)

HTML Parsing

(63)

Beautiful Soup

Make a soup (a navigable version of a string)

Browse a soup:

soup.find("tag") / soup.tag (returns soup)
soup.find_all("tag") / soup("tag") (returns list)
soup.find("tag", {'attr_name': 'attr_value'})
soup.contents (list of children)

Extract text:

soup.decode_contents(): returns soup as string
soup.encode_contents(): returns soup as bytes
soup.text: returns soup as tagless string
soup['attr_name']: returns attribute value
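These accessors can be tried without any network access by making a soup from an inline HTML snippet (the snippet below is made up; html.parser is the parser bundled with Python):

```python
# Build a soup from a string and exercise find / text / attribute access.
from bs4 import BeautifulSoup as bs

html = "<ul><li class='D700'><a href='https://example.org/paper'>paper</a></li></ul>"
soup = bs(html, "html.parser")
li = soup.find("li", {"class": "D700"})
print(li.text)       # paper
print(li.a["href"])  # https://example.org/paper
```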

(64)

Make a Soup

from requests import *

from bs4 import BeautifulSoup as bs

news = "https://www.lip6.fr/production/publications-type.php?id=-1&annee=2020&type_pub=ART"

r = get(news)

soup = bs(r.text, features="lxml")

(65)

Example

(66)

Example

(67)

Browse Soup and Extract Text

print(soup.find('li', {'class': 'D700'}))

(68)

Browse Soup and Extract Text

print(soup.find('li', {'class': 'D700'}))

<li class="D700"><strong>L. Amorim Reis, A. Murillo Piedrahita, S. Rueda Rodríguez, N. Castro Fernandes, D. Scherly Varela de Medeiros, M. Dias De Amorim, D. Ferrazani Mattos</strong> : "<a href="https://hal.archives-ouvertes.fr/hal-02569404">Unsupervised and Incremental Learning Orchestration for Cyber-Physical Security</a>", Transactions on emerging telecommunications technologies, (Wiley-Blackwell) [Amorim Reis 2020]</li>

(69)

Browse Soup and Extract Text

print(soup.find('li', {'class': 'D700'}))

for p in soup.find_all('li', {'class': 'D700'}):
    print(p.find('a')['href'])

https://hal.archives-ouvertes.fr/hal-02569404 https://hal.archives-ouvertes.fr/hal-02945354 https://hal.archives-ouvertes.fr/hal-02986029 https://hal.archives-ouvertes.fr/hal-02980298 https://hal.archives-ouvertes.fr/hal-02985997 https://hal.archives-ouvertes.fr/hal-02443135 https://hal.archives-ouvertes.fr/hal-02911665 https://hal.archives-ouvertes.fr/hal-02931632 https://hal.archives-ouvertes.fr/hal-02527916 https://hal.archives-ouvertes.fr/hal-02955863 https://hal.archives-ouvertes.fr/hal-02984494 https://hal.archives-ouvertes.fr/hal-02945921 https://hal.archives-ouvertes.fr/hal-02906806 https://hal.archives-ouvertes.fr/hal-02985461 https://hal.archives-ouvertes.fr/hal-02400963 https://hal.archives-ouvertes.fr/hal-02929626 https://hal.archives-ouvertes.fr/hal-01805478 https://hal.archives-ouvertes.fr/hal-02682005 https://hal.archives-ouvertes.fr/hal-02568587

(70)

Some Websites have a Python Library!

(71)

Wikipedia

(72)

Wikipedia

from wikipedia import *

r = page("Python (programming language)")

print(r.summary)

(73)

Wikipedia

from wikipedia import *

r = page("Python (programming language)")
print(r.summary)

Python is an interpreted, high-level and general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of

significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Python was created in the late 1980s as a successor to the ABC language. Python 2.0, released in 2000, introduced features like list comprehensions and a garbage collection system with reference counting.

Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3.

The Python 2 language was officially discontinued in 2020 (first planned for 2015), and "Python 2.7.18 is the last Python 2.7 release and therefore the last Python 2 release." No more security patches or other

improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported.

Python interpreters are available for many operating systems. A global community of programmers develops and maintains CPython, a free and open-source reference implementation. A non-profit organization, the Python Software Foundation, manages and directs resources for Python and CPython development.

(74)

Google Scholar

(75)

Google Scholar

from scholarly import scholarly

s = next(scholarly.search_author("Sebastien Tixeuil"))

print(s.interests)

(76)

Google Scholar

from scholarly import scholarly

s = next(scholarly.search_author("Sebastien Tixeuil"))

print(s.interests)

['Algorithms & Theory', 'Computer Networks', 'Distributed Computing']
