• Aucun résultat trouvé

Web Search, Télécom ParisTech

N/A
N/A
Protected

Academic year: 2022

Partager "Web Search, Télécom ParisTech"

Copied!
1
0
0

Texte intégral

(1)

Web Search, Télécom ParisTech

Putting it all together

Pierre Senellart (

pierre.senellart@telecom-paristech.fr

)

 March 

The purpose of this lab session is to put together the crawler, inverted index construction, PageRank com- putation, and information extractor that you worked on during the first three labs. It mainly consists in the items that were highlighted in the sections “To go further” of each lab assignment. This is also the occasion for finishing and polishing your implementations.

You can either start with your own implementation or with the reference implementation available on the course website, depending on how far you went. If you start with the reference implementation, take the time to study how it actually works. If you start with your own implementation, please check that the results are correct:

• The conjunctive query “president france ” should return documents such as “Nicolas Sarkozy” and

“Jacques Chirac” in good positions.

• The two pages with the highest PageRank in the Simple English Wikipedia should be “United States”

and “France”, in that order.

There is no fixed list of assignment for this lab session, but focus on connecting the systems produced in the first four labs: How to use PageRank to improve the results of the inverted index? How to build the inverted index from a crawled subset of the Web, and use PageRank on top of this? How to integrate the information retrieval system with the information extracted from the articles? You are also welcome to try new things and show initiative!

Références

Documents relatifs

The progression of sections leads the reader from the principles of quantum mechanics and several model problems which illustrate these principles and relate to chemical

Combine this with the inverted index you built yesterday for the Simple English dataset, so that queries over this dataset use a combination of tf-idf and PageRank. Combine this

There is no fixed list of assignment for this lab session, but focus on connecting the systems produced in the first three labs: How to use PageRank to improve the results of

We use the following scheme for class hierarchies: An abstract subclass of Robot redefines the start method in order to indicate how the function processURL should be called on

this very impressive "total body" gazeous dissection of retroperitoneum ( ) and mediastinum ( ) after a common endoscopic resection of a

In any case, it is important that the co-authors be aware about your plan of paper writing before even you begin the redaction phase.. It would not be a good idea to complete the

The objectives that have been proclaimed many times have never been achieved, the best known being the commitment of states to devote 15 per cent of their national

unnecessary load on the root servers as advice has been heard such as "To avoid having a corruptible cache, make your server a stealth secondary for the