Academic year: 2022

Web Search, Télécom ParisTech

Putting it all together

Pierre Senellart (pierre.senellart@telecom-paristech.fr)

 March 

The purpose of this lab session is to put together the crawler, inverted index construction, and PageRank computation that you worked on during the first three labs. It mainly consists of the items that were highlighted in the “To go further” sections of each lab assignment. This is also an opportunity to finish and polish your implementations.

You can start either with your own implementation or with the solution available on the course website, depending on how far you went. If you start with the solution, take the time to study how it actually works.

If you start with your own implementation, please check that the results are correct:

• The conjunctive query “president france” should return documents such as “Nicolas Sarkozy” and “Jacques Chirac” near the top of the results.

• The two pages with the highest PageRank in the Simple English Wikipedia should be “United States” and “France”, in that order.
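The two checks above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the course's reference implementation: the toy `INDEX`, the helper names `conjunctive_query` and `top_k`, and the layout of the index (term mapped to a set of document titles) are all hypothetical, and your own data structures will likely differ.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy inverted index: term -> set of document titles.
# A real index would be built from the Simple English Wikipedia dataset.
INDEX: Dict[str, Set[str]] = {
    "president": {"Nicolas Sarkozy", "Jacques Chirac", "United States"},
    "france": {"Nicolas Sarkozy", "Jacques Chirac", "France"},
}


def conjunctive_query(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """Return the documents that contain every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    # Intersect the shortest postings lists first to keep intermediate sets small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k best-scored documents, ties broken by title."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))[:k]


if __name__ == "__main__":
    print(sorted(conjunctive_query(INDEX, ["president", "france"])))
```

To run the second check, pass your computed PageRank vector (as a title-to-score dictionary) to `top_k` with `k=2` and compare the result with the expected pair of pages.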

There is no fixed list of assignments for this lab session, but focus on connecting the systems produced in the first three labs: How can PageRank be used to improve the results of the inverted index? How can the inverted index be built from a crawled subset of the Web, with PageRank used on top of it? You are also welcome to try new things and show initiative!
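One possible way to approach the first question is to re-rank the tf-idf results with PageRank. The sketch below is only one option among many: the function name `combine_scores`, the `weight` parameter, and the choice of a weighted sum of max-normalized scores are assumptions made for illustration, not a method prescribed by the course.

```python
from typing import Dict, List, Tuple


def combine_scores(
    tfidf: Dict[str, float],
    pagerank: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank the documents matched by a query using tf-idf and PageRank.

    tfidf    -- query-dependent scores of the matching documents
    pagerank -- query-independent scores of all documents
    weight   -- assumed tuning knob in [0, 1]: 0 is pure tf-idf, 1 is pure PageRank
    """
    if not tfidf:
        return []
    # Normalize both scores by their maximum over the matching documents,
    # so that the two components are on a comparable scale.
    max_tfidf = max(tfidf.values()) or 1.0
    max_pr = max(pagerank.get(doc, 0.0) for doc in tfidf) or 1.0
    combined = {
        doc: (1.0 - weight) * score / max_tfidf
        + weight * pagerank.get(doc, 0.0) / max_pr
        for doc, score in tfidf.items()
    }
    # Best score first; ties are broken by document title.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Other combinations, such as multiplying the tf-idf score by a power or a logarithm of the PageRank, are equally plausible. Comparing a few of them on queries such as “president france” is a reasonable way to choose.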
