General Presentation and Class Structure

MPRI 2.26.2: Web Data Management

Academic year: 2022

Antoine Amarilli

Friday, December 7th

Web Data Management

• A class about the Web and the data that it contains

• Strong practical aspects, but connections to theory

• e.g., XPath (practice) vs. tree automata (theory)

• e.g., SPARQL (practice) vs. regular path queries (theory)

→ A way to see some practice within the confines of MPRI

→ A way to see some exotic theory motivated by practice

Teachers

Antoine Amarilli, Télécom ParisTech

Pierre Senellart, École normale supérieure

Class time and modalities

• On Friday afternoon, from 16:15 to 19:30, with break(s).

• Sorry about your weekend plans...

• Attendance is not mandatory, but we have attendance sheets

• So we can know whether you sometimes show up in class...

Moodle

https://moodle.di.ens.fr/course/view.php?id=9 (cf. wikimpri)

• Please register for the class on Moodle!

ENS students can directly use the SPI CAS to log in

Other students can create an account

Everyone can self-enrol in the class

What is Moodle good for?

• Finding the class material (slides, etc.)

• Asking questions (better than via email)

• Reading questions asked by others

• Subscribing to notifications, if you wish

• Submitting your project (more soon)

Class evaluation

• 50% of the grade will be an exam (on March 1st)

This is required by MPRI rules...

• 50% of the grade will be a project

Namely...

About that project...

• All details are on Moodle; here are the key points:

• 1 student or 2 students per project

• Free choice of topic, related to the Web

• Deadlines and deliverables:

Dec 21: submit the project description and group on Moodle

• The codebase should be open-source, in a public repository

• There should be a README with minimal documentation

Feb 22: end of project, defense with slides and a demo

→ Use the project for...

• Trying out some original idea

• Scratching a personal itch

• Contributing to an existing codebase

→ Try to have fun! ;-)

Class schedule

• December 7: Modern Web Technologies (Antoine)

• December 14: Semistructured Data on the Web (Antoine)

• December 21: Web Crawling and Web Scraping (Pierre)

• (Merry Christmas and Happy New Year!)

• January 11: Information Extraction and the Semantic Web (Antoine)

• January 18: Veracity and Explainability on the Web (Antoine)

• February 1: Web Information Retrieval (Pierre)

• February 8: Computation and Data Storage at Web Scale (Pierre)

• February 15: Web Data Integration, the Deep Web (Pierre)

An MPRI disclaimer

I have been in your shoes, not so long ago...

What I remember from those days is not great...

• Ultra-specialized classes

• No effort to teach prerequisites

• Only relevant to people who want to specialize in the field

• Only theory and no practice

An MPRI confession

Now I’m a teacher and I understand why teachers teach like that:

• They enjoy research more than teaching

• They are promoted based on research, not teaching

• They are here to find PhD students to do more research

• They are short on time, so...

• Making complicated things understandable takes time

• It’s easier to recycle existing slides about stuff you know!

Will this class be any different?

• Less incomprehensible theory and more shallow practice

• The project can be fun (hopefully)

• The class material/structure is probably not perfect, sorry...

• OK, I won’t try to hide it: we are hiring!

• ... but most of this class isn’t so related to what I do, so...
