General Presentation and Class Structure
MPRI 2.26.2: Web Data Management
Antoine Amarilli Friday, December 7th
Web Data Management
• A class about the Web and the data that it contains
• Strong practical aspects but connections to theory
• e.g., XPath (practice) vs tree automata (theory)
• e.g., SPARQL (practice) vs regular path queries (theory)
→ A way to see some practice within the confines of MPRI
→ A way to see some exotic theory motivated by practice
Teachers
Antoine Amarilli Télécom ParisTech
Pierre Senellart École normale supérieure
Class time and modalities
• On Friday afternoon from 16:15 to 19:30 with break(s).
• Sorry about your weekend plans...
• Attendance is not mandatory but we have attendance sheets
• So we can know whether you sometimes show up in class...
Moodle
https://moodle.di.ens.fr/course/view.php?id=9 (cf. wikimpri)
• Please register for the class on Moodle!
• ENS students can directly use the SPI CAS to log in
• Other students can create an account
• Everyone can self-enrol in the class
What is Moodle good for?
• Find the class material (slides, etc.)
• Ask questions (better than via email)
• Read questions asked by others
• You can subscribe to notifications if you wish
• Submit your project (more soon)
Class evaluation
• 50% of the grade will be an exam (on March 1st)
→ This is required by MPRI rules...
• 50% of the grade will be a project
→ Namely...
About that project...
• All details are on Moodle, here are the key points:
• 1 student or 2 students per project
• Free choice of topic related to the Web
• Deadlines and deliverables:
• Dec 21: submit on Moodle the project description and group
• The codebase should be open-source on a public repository
• There should be a README with minimal documentation
• Feb 22: End of project, defense with slides and a demo
→ Use the project for...
• Trying out some original idea
• Scratching a personal itch
• Contributing to an existing codebase
→ Try to have fun! ;-)
Class schedule
• December 7: Modern Web Technologies (Antoine)
• December 14: Semistructured Data on the Web (Antoine)
• December 21: Web Crawling and Web Scraping (Pierre)
• (Merry Christmas and Happy New Year!)
• January 11: Information Extraction and the Semantic Web (Antoine)
• January 18: Veracity and Explainability on the Web (Antoine)
• February 1: Web Information Retrieval (Pierre)
• February 8: Computation and Data Storage at Web Scale (Pierre)
• February 15: Web Data Integration, the Deep Web (Pierre)
An MPRI disclaimer
I have been in your shoes, not so long ago...
What I remember from those days is not great...
• Ultra-specialized classes
• No effort to teach prerequisites
• Only relevant to people who want to specialize in the field
• Only theory and no practice
An MPRI confession
Now I’m a teacher and understand why teachers teach like that:
• They enjoy research more than teaching
• They are promoted based on research not teaching
• They are here to find PhD students to do more research
• They are short on time so...
• Making complicated things understandable takes time
• It’s easier to recycle existing slides about stuff you know!
Will this class be any different?
• Less incomprehensible theory and more shallow practice
• The project can be fun (hopefully)
• The class material/structure is probably not perfect, sorry...
• OK I won’t try to hide it: we are hiring!
• ... but most of this class isn’t so related to what I do, so...