Web Service and Open Data
By Lélia Blin - ProgRes 2018 lelia.blin@lip6.fr
Thanks to Quentin Bramas
What is a Web Service?
A Web Service is a method of communication between two electronic devices over the Web.
HTTP is the typical protocol used by Web Services to communicate.
[Diagram: a Device sends a Request to an HTTP Server, which sends back a Response]
Application Programming Interface (API)
An Interface used by Programs to interact with an Application.
[Diagram: an API exposes a service; Developers write a program which consumes the service]
Examples
Example: Twitter API
Example: geocoding APIs
Geocoding APIs
• OpenStreetMap API
• Google Maps API
• Adresse Data Gouv
• ...
What is the format of the response?
[Diagram: a Device sends a Request to an HTTP Server, which sends back a Response]
Web API - example
https://maps.googleapis.com/maps/api/geocode/xml?address=25%20rue%20lang%20france
https://maps.googleapis.com/maps/api/geocode/json?address=25%20rue%20lang%20france
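The two URLs above differ only in the format segment (xml or json). A minimal sketch of how such a URL can be built with the standard library; the function name geocode_url is hypothetical, and the real service may also require an API key, which is not shown here.

```python
from urllib.parse import urlencode, quote

# Base endpoint taken from the URLs above; "json" or "xml" selects the format.
BASE_URL = "https://maps.googleapis.com/maps/api/geocode/"

def geocode_url(address, fmt="json"):
    """Build the request URL for an address and a response format (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    query = urlencode({"address": address}, quote_via=quote)
    return BASE_URL + fmt + "?" + query

print(geocode_url("25 rue lang france", "xml"))
print(geocode_url("25 rue lang france", "json"))
```

Nothing is sent over the network here; the sketch only shows how the address is percent-encoded into the query string.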
REpresentational State Transfer
REST Web API
A REST Web API is a web service that uses the simpler REpresentational State Transfer (REST) style of communication.
A request is just an HTTP method applied to a URI.
The response is typically JSON or XML.
HTTP Request
• GET
• POST
• PUT
• DELETE
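To make "an HTTP method applied to a URI" concrete, here is a small sketch using only the standard library. It builds one request object per method without sending anything; the URL is just an example target.

```python
from urllib.request import Request

URL = "http://linuxfr.org/"  # example target, nothing is actually sent

# Build (but do not send) one request per HTTP method.
requests_by_method = {
    method: Request(URL, method=method)
    for method in ("GET", "POST", "PUT", "DELETE")
}

for method, req in requests_by_method.items():
    # Each request is fully described by its method and its URI.
    print(req.get_method(), req.full_url)
```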
Request Python
>>> import requests
>>> r = requests.get("http://linuxfr.org/")
>>> print(r.text)
<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8">
<title>Accueil - LinuxFr.org</title>
<style type="text/css">header#branding h1 { background-image: url('/images/logos/linuxfr2_mountain.png') }</style>
r = requests.put("http://linuxfr.org/")
r = requests.delete("http://linuxfr.org/")
r = requests.patch("http://linuxfr.org/")
r = requests.post("http://linuxfr.org/")
r = requests.head("http://linuxfr.org/")
r = requests.options("http://linuxfr.org/")
Request Python
Send data
data = {"first_name":"Richard", "second_name":"Stallman"}
r = requests.post("http://linuxfr.org", data = data)
Picture
file = {'file': open("photo.png", "rb")}
r = requests.post("http://linuxfr.org", files = file)
r.text     # Returns the content as text (unicode)
r.content  # Returns the content as bytes
r.json()   # Returns the content parsed as JSON
r.headers  # Returns the response headers (dict-like)
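The relation between the bytes, text and JSON views of a response can be sketched without any network access. The raw body and its encoding below are assumptions standing in for what a server might send; the variable names only mirror the attributes listed above.

```python
import json

# Assumed raw body and encoding, standing in for a real server response.
raw_body = b'{"first_name": "Richard", "second_name": "Stallman"}'
encoding = "utf-8"

content = raw_body               # bytes, comparable to r.content
text = content.decode(encoding)  # str, comparable to r.text
parsed = json.loads(text)        # dict, comparable to r.json()

print(type(content).__name__, type(text).__name__, type(parsed).__name__)
print(parsed["first_name"])
```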
Resources
URI/Resource based:
• ex: Facebook Graph API:
GET: /{photo-id} to retrieve the info of a photo
GET: /{photo-id}/likes to retrieve the people who like it
POST: /{photo-id} to update the photo
DELETE: /{photo-id} to delete the photo
• ex: Google Calendar API:
GET: /calendars/{calendarId} to retrieve the info of a calendar
PUT: /calendars/{calendarId} to update a calendar
DELETE: /calendars/{calendarId} to delete a calendar
POST: /calendars to create a calendar
GET: /calendars/{calendarId}/events/{eventId}
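The calendar routes above can be read as a mapping from (method, URI) pairs to operations on a resource. The following toy dispatcher illustrates that idea with an in-memory dict; the function handle, the store and the chosen status codes are hypothetical and say nothing about how Google Calendar is implemented.

```python
def handle(store, method, path, body=None):
    """Dispatch an (HTTP method, URI) pair to an operation on the store."""
    parts = path.strip("/").split("/")
    if parts[0] != "calendars" or len(parts) > 2:
        return 404, None
    if len(parts) == 1:
        # Collection URI: only creation is supported in this sketch.
        if method != "POST":
            return 405, None
        new_id = str(max((int(k) for k in store), default=0) + 1)
        store[new_id] = body
        return 201, new_id
    cal_id = parts[1]
    if cal_id not in store:
        return 404, None
    if method == "GET":
        return 200, store[cal_id]
    if method == "PUT":
        store[cal_id] = body
        return 200, body
    if method == "DELETE":
        del store[cal_id]
        return 204, None
    return 405, None

demo_store = {}
print(handle(demo_store, "POST", "/calendars", {"summary": "Work"}))
print(handle(demo_store, "GET", "/calendars/1"))
```

The same URI /calendars/{calendarId} is reused for reading, updating and deleting; only the HTTP method changes.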
Response
HTTP Response:
• 200: OK
• 3xx: Redirection
• 404: Not Found (4xx: something is wrong with the request or with what you try to access)
• 5xx: Server Error
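The status classes listed above can be checked in code from the first digit of the code. A minimal sketch; status_class is a hypothetical helper, while HTTPStatus comes from the standard library.

```python
from http import HTTPStatus

def status_class(code):
    """Return a coarse description of an HTTP status code."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"

for code in (200, 301, 404, 500):
    # HTTPStatus gives the standard reason phrase for a known code.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```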
API Response:
• Flickr:
{ "stat": "fail", "code": 1, "message": "User not found" }
{ "galleries": { ... }, "stat": "ok" }
• Google Calendar:
{ "error": { "code": 403, "message": "User Rate Limit Exceeded" } }
{ "kind": "calendar#events", "summary": ..., "description": ...
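Because each API reports failures in its own envelope, client code usually has to inspect the decoded body as well as the HTTP status. The sketch below handles only the two shapes shown above; api_error is a hypothetical helper, not part of either API.

```python
import json

def api_error(body_text):
    """Return an error message if the decoded body reports a failure, else None."""
    body = json.loads(body_text)
    # Flickr-style envelope: {"stat": "fail", "code": ..., "message": ...}
    if body.get("stat") == "fail":
        return body.get("message", "unknown error")
    # Google-style envelope: {"error": {"code": ..., "message": ...}}
    if "error" in body:
        return body["error"].get("message", "unknown error")
    return None

print(api_error('{ "stat": "fail", "code": 1, "message": "User not found" }'))
print(api_error('{ "galleries": {}, "stat": "ok" }'))
```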
Response
Content-Type:
• text/plain
• text/html
• text/xml or application/xml
• application/json
• image/png
• ...
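The Content-Type header tells the client how to decode the body. A minimal sketch of that dispatch, covering only a few of the types listed above; parse_body is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET

def parse_body(content_type, body):
    """Decode a response body (str) according to its Content-Type header."""
    # Drop optional parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("text/xml", "application/xml"):
        return ET.fromstring(body)
    if media_type.startswith("text/"):
        return body
    raise ValueError("unsupported Content-Type: " + media_type)

print(parse_body("application/json; charset=utf-8", '{"status": "OK"}'))
print(parse_body("text/xml", "<status>OK</status>").text)
```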
Python: JSON and XML Parsing
JSON Parsing
Use the json package:
>>> import json
>>> obj = json.loads('{"attr1": "v1", "attr2": 42}')
>>> obj['attr1']
'v1'
>>> obj['attr2']
42
>>> obj = {'id': 1, 'data': [1, 2, 3, 4]}
>>> json.dumps(obj)  # returns a string
'{"id": 1, "data": [1, 2, 3, 4]}'
Convert JSON to Python Object (Dict)
Use the json package:
import json
json_data = '{"name": "Marie", "city": "Paris"}'
python_obj = json.loads(json_data)
print(python_obj["name"])
print(python_obj["city"])
Result
>python3 01_Json.py
Marie
Paris
Convert JSON to Python Object (List)
import json
json_data = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"} ] }'
python_obj = json.loads(json_data)
print(json.dumps(python_obj, sort_keys=True, indent=4))
Result
>python3 02_Json.py
{
    "persons": [
        {
            "city": "Paris",
            "name": "Marie"
        },
        {
            "city": "Lyon",
            "name": "Pierre"
        }
    ]
}
Convert JSON to Python Object
import json
json_input = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"} ] }'
try:
    decoded = json.loads(json_input)
    # Access data
    for x in decoded['persons']:
        print(x['name'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")
Result
>python3 03_Json.py
Marie
Pierre
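The three exceptions caught above correspond to three different ways the input can be wrong. The sketch below separates them to show which input triggers which one; names_or_error and its error labels are hypothetical.

```python
import json

def names_or_error(json_text):
    """Return (names, None) on success, or (None, reason) on failure."""
    try:
        decoded = json.loads(json_text)
        return [person["name"] for person in decoded["persons"]], None
    except ValueError:
        # Malformed JSON text (json.JSONDecodeError is a subclass of ValueError)
        return None, "malformed JSON"
    except KeyError:
        # Valid JSON, but an expected key is missing
        return None, "missing key"
    except TypeError:
        # Valid JSON, but not the expected structure (e.g. a list instead of a dict)
        return None, "unexpected structure"

print(names_or_error('{"persons": [{"name": "Marie"}, {"name": "Pierre"}]}'))
print(names_or_error('{"persons": ['))
print(names_or_error('{"people": []}'))
print(names_or_error('[1, 2, 3]'))
```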
Use JSON file
import json
try:
    # Load the JSON document from the file
    with open('lang.json', encoding='utf-8') as f:
        data = json.load(f)
    # Access data
    for x in data['results']:
        print(x['formatted_address'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")
Result
>python3 04_Json.py
25 Rue Cité Lang, 68560 Hirsingue, France
25 Rue Raphaël Lang, 54500 Vandœuvre-lès-Nancy, France
XML Parsing
With xml.etree.ElementTree, xml.sax, or html.parser
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
xml.etree.ElementTree loads the whole file; you can then navigate the tree structure.
$ python
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('countryXML.xml')
>>> root = tree.getroot()
>>> root.tag
'data'
>>> root.attrib
{}
>>> for child in root:
...     print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
>>> root[0][1].text
'2008'
>>> for n in root.iter('neighbor'):
...     print n.attrib
...
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
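The same navigation works in Python 3 on XML held in a string. The sample below is an assumed, shortened document shaped like the session above; it is not the original countryXML.xml file.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the session above (not the original file).
SAMPLE = """
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>
"""

root = ET.fromstring(SAMPLE.strip())

# Map each country name to the names of its neighbors.
neighbors = {
    country.get("name"): [n.get("name") for n in country.findall("neighbor")]
    for country in root.findall("country")
}
print(neighbors)
print(root[0][1].text)  # second child of the first country
```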
Simple API for XML: SAX
import xml.sax

class MyHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.element_name2count = {}

    def startElement(self, name, attrs):
        # Count how many times each element name is opened
        self.element_name2count[name] = self.element_name2count.get(name, 0) + 1

filename = "lang.xml"
handler = MyHandler()
xml.sax.parse(filename, handler)
# sort elements according to their count
to_sort = [(count, name) for name, count in handler.element_name2count.items()]
to_sort.sort(reverse=True)
for count, name in to_sort:
    print("%s: %d" % (name, count))
Result
type: 24
short_name: 14
long_name: 14
address_component: 14
lng: 6
lat: 6
viewport: 2
southwest: 2
result: 2
place_id: 2
partial_match: 2
northeast: 2
location_type: 2
location: 2
geometry: 2
formatted_address: 2
status: 1
GeocodeResponse: 1
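SAX delivers events (element start, text, element end) instead of a tree, so extracting text means keeping a little state between events. A sketch under assumptions: the TextCollector class is hypothetical, and the sample document is invented in the shape suggested by the element names above.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Collect the text found inside every element with a given name."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.inside = False
        self.buffer = []
        self.values = []

    def startElement(self, name, attrs):
        if name == self.target:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.target:
            self.values.append("".join(self.buffer))
            self.inside = False

# Assumed, shortened sample document (not the original lang.xml).
SAMPLE = (b"<GeocodeResponse><status>OK</status>"
          b"<result><formatted_address>Address A</formatted_address></result>"
          b"<result><formatted_address>Address B</formatted_address></result>"
          b"</GeocodeResponse>")

collector = TextCollector("formatted_address")
xml.sax.parseString(SAMPLE, collector)
print(collector.values)
```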
BeautifulSoup (HTML parser)
import requests
from bs4 import BeautifulSoup
r = requests.get("https://fr.wikipedia.org/wiki/Beautiful_Soup")
soup = BeautifulSoup(r.content, "html.parser")
# print(soup)
print(soup.title)
> python3 06_BS.py
<title>Python — Wikipédia</title>
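For comparison, the standard library's html.parser (mentioned in the XML Parsing slide) can extract a title without BeautifulSoup, at the cost of handling the events by hand. This is a different technique from the BeautifulSoup call above; TitleParser is a hypothetical class, and the HTML fragment is a short excerpt in the shape of the linuxfr.org output shown earlier.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Accumulate the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

HTML = ('<html lang="fr"><head><meta charset="utf-8">'
        '<title>Accueil - LinuxFr.org</title></head><body></body></html>')

parser = TitleParser()
parser.feed(HTML)
print(parser.title)
```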
Regular Expressions
Regular expressions are a powerful language for matching text patterns.
The Python "re" module provides regular expression support.
>>> import re
>>> re.findall("([0-9]+)", "Bonjour 111 Aurevoir 222")
['111', '222']
Regular Expressions
• a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
• . (a period) -- matches any single character except newline '\n'
• \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
• \b -- boundary between word and non-word
• \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
• \t, \n, \r -- tab, newline, return
• \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
• ^ = start, $ = end -- match the start or end of the string
• \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure if a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
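A few of the rules above, tried on a made-up sentence. The sample text is an assumption chosen only to exercise \d, \w, \b, the period, escaping, and the ^ and $ anchors.

```python
import re

text = "Call 01 44 27 at 9.30, or mail a_b@c.fr"

digits = re.findall(r"\d+", text)            # runs of decimal digits
words_at = re.findall(r"\w+@\w+", text)      # word chars around an '@'
whole_at = re.findall(r"\bat\b", text)       # "at" as a whole word only
any_char = re.findall(r"9.30", "9.30 9x30")  # . matches any single character
escaped = re.findall(r"9\.30", "9.30 9x30")  # \. matches only a period
anchored = (bool(re.search(r"^Call", text)), bool(re.search(r"fr$", text)))

print(digits)
print(words_at)
print(whole_at)
print(any_char)
print(escaped)
print(anchored)
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the re module sees them.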
Regular Expressions
Extract Email Information: lelia.blin@lip6.fr
([^@]+)@([^@]+)
• [^@] : a character that is not the at symbol @
• + : at least one of this character
>>> m = re.match('([^@]+)@([^@]+)', 'lelia.blin@lip6.fr')
>>> m.group(1)
'lelia.blin'
>>> m.group(2)
'lip6.fr'
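re.match returns None when the pattern does not match, and it only anchors the start of the string. A slightly more defensive sketch of the same pattern, using named groups and re.fullmatch; split_email is a hypothetical helper.

```python
import re

# Same pattern as above, with named groups instead of numbered ones.
EMAIL = re.compile(r"(?P<user>[^@]+)@(?P<domain>[^@]+)")

def split_email(address):
    """Return (user, domain), or None if the address does not fit the pattern."""
    m = EMAIL.fullmatch(address)  # the whole string must match
    if m is None:
        return None
    return m.group("user"), m.group("domain")

print(split_email("lelia.blin@lip6.fr"))
print(split_email("no-at-sign"))
print(split_email("a@b@c"))
```

With re.match instead of fullmatch, the last input would be accepted as ('a', 'b') because the trailing "@c" would simply be ignored.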
Create an API with Python
Create an API
Available libraries/frameworks in Python:
• Django: Powerful web framework with a lot of modules. Great to build a complete website.
• Flask: A small framework to build simple websites.
• Bottle: Similar to Flask, but even simpler. Perfect to build an API.
Create an API
The Bottle Framework (single-file module, no dependencies)
• Routing: Requests to function-call mapping with support for clean and dynamic URLs.
• Templates: Fast and pythonic built-in template engine.
• Utilities: Convenient access to form data, file uploads, cookies, headers and other HTTP-related metadata.
• Server: Built-in HTTP development server and support for other WSGI-capable HTTP servers. (WSGI is the Web Server Gateway Interface, a specification of the interface between web servers and Python web applications.)
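To show what the WSGI interface looks like underneath a framework, here is a minimal WSGI application written with the standard library only. It is called directly, the way a server would call it, so no port is opened; the names app, captured and start_response are choices made for this sketch and have nothing to do with Bottle's internals.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """A WSGI application: a callable taking the request environ and a callback."""
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Call the application directly, the way a WSGI server would.
environ = {"PATH_INFO": "/hello"}
setup_testing_defaults(environ)  # fills in the remaining required keys
captured = {}

def start_response(status, headers):
    # Record what the application passes back to the server.
    captured["status"] = status
    captured["headers"] = headers

result = b"".join(app(environ, start_response))
print(captured["status"], result)
```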
Create an API
Hello world example:
from bottle import route, run

@route('/hello')
def hello():
    return 'Hello world'

run(host='localhost', port=8080)
>python3 07_Hello.py
Bottle v0.12.13 server starting up (using WSGIRefServer())...
Listening on http://localhost:8080/
Hit Ctrl-C to quit.
URL: http://localhost:8080/hello
Id in URL
from bottle import route, run

@route('/hello/<name>')
def hello(name):
    # <name> in the route is passed to the function as an argument
    return 'Hello ' + name

run(host='localhost', port=8080)
File: 08_HelloName.py
URL: http://localhost:8080/hello/Marie
Id in URL
from bottle import route, run

@route('/hello/<name>')
def hello(name):
    return 'Hello ' + name
# http://localhost:8080/hello/Marie

@route('/bonjour/<name>')
def bonjour(name):
    return 'Bonjour ' + name
# http://localhost:8080/bonjour/Marie

@route('/buenas/<name>')
def buenas(name):
    return 'Buenos dias ' + name
# http://localhost:8080/buenas/Marie

run(host='localhost', port=8080)
File: 09_HelloPL.py
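One way to picture dynamic routes is as patterns compiled to regular expressions with one named group per placeholder. The toy router below only illustrates that idea; it is not how Bottle is implemented, and compile_route, routes and dispatch are hypothetical names.

```python
import re

def compile_route(pattern):
    """Turn '/hello/<name>' into a regex with one named group per <placeholder>."""
    # re.split with a capturing group alternates literal text and placeholder names.
    parts = re.split(r"<(\w+)>", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:
            regex += "(?P<%s>[^/]+)" % part  # placeholder: one path segment
        else:
            regex += re.escape(part)         # literal text
    return re.compile("^" + regex + "$")

routes = [
    (compile_route("/hello/<name>"), lambda name: "Hello " + name),
    (compile_route("/bonjour/<name>"), lambda name: "Bonjour " + name),
]

def dispatch(path):
    """Call the first handler whose route matches, or return None."""
    for regex, handler in routes:
        m = regex.match(path)
        if m:
            return handler(**m.groupdict())
    return None

print(dispatch("/hello/Marie"))
print(dispatch("/bonjour/Marie"))
print(dispatch("/unknown/Marie"))
```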
Id in URL
from bottle import Bottle, run, request

app = Bottle()

@app.route('/jemesure')
def jemesure():
    # Read the "taille" parameter from the query string
    return "Je mesure " + request.params.taille + " cm"

run(app, host='localhost', port=8080)
File: 10_Taille.py
URL: http://localhost:8080/jemesure?taille=133
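Query parameters always arrive as strings, so an API usually validates them before use. A standard-library sketch of extracting and checking the taille parameter from a URL like the one above; taille_from_url is a hypothetical helper, independent of Bottle.

```python
from urllib.parse import urlsplit, parse_qs

def taille_from_url(url):
    """Extract the 'taille' query parameter as an int, or None if absent or invalid."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("taille")
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None

print(taille_from_url("http://localhost:8080/jemesure?taille=133"))
print(taille_from_url("http://localhost:8080/jemesure"))
print(taille_from_url("http://localhost:8080/jemesure?taille=abc"))
```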
Static content
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bottle import Bottle, run, static_file

app = Bottle()

@app.route('/static/<filename:path>')
def server_static(filename):
    # Serve the requested file from the current directory
    return static_file(filename, root='.')

run(app, host='localhost', port=8080, reloader=True)
File: 11_Img.py
URL: http://localhost:8080/static/cube.png
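Serving static files involves two concerns: choosing the Content-Type from the file name, and refusing paths that would leave the root directory. The sketch below illustrates both with the standard library; resolve_static and the /srv/static root are hypothetical, and this is not a description of what static_file does internally.

```python
import mimetypes
import os

def resolve_static(root, filename):
    """Return (absolute path, content type), or None if filename escapes root."""
    root = os.path.abspath(root)
    path = os.path.abspath(os.path.join(root, filename))
    # Refuse paths such as '../secret.txt' that would leave the root directory.
    if os.path.commonpath([root, path]) != root:
        return None
    content_type, _ = mimetypes.guess_type(path)
    return path, content_type

print(resolve_static("/srv/static", "cube.png"))
print(resolve_static("/srv/static", "../secret.txt"))
```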
Open Data
Publicly available APIs / datasets about:
• Education
• Public Transport
• Economy
• Sport Results