• Aucun résultat trouvé

Lab ʆ: non-programmers

N/A
N/A
Protected

Academic year: 2022

Partager "Lab ʆ: non-programmers"

Copied!
2
0
0

Texte intégral

(1)

Web Search, Renmin University of China

Lab : non-programmers

Pierre Senellart (

pierre@senellart.com

)

 July 

Labs can be made individually or by groups of two. Make sure to send by email topierre@senellart.com, either at the end of the lab session or at the latest at midnight of the same day, an informal report about what you did in the lab, and the results you obtained. You do not have to finish all exercises, if you advance reasonably during the lab session but do not go to the end of the assignment, you will still get a passing grade.

 Hearst Patterns

. Consider the four following Hearst patterns:

X is aY

Ys, such asX1,X2, ...

X1,X2, ... and otherY

• manyYs, includingX,

For each of these patterns, and for each of the following four types:

• country

• island

• rock singer

• search engine

use the “*” advanced operator of Google for looking for instances of these types. Extract the correspond- ing instances, and compute the precision of the top- results for each of these  combinations.

. Come up with two Hearst patterns for Chinese, and repeat the same evaluation on the four types (trans- lated into Chinese) of the previous question.

 Instance Extraction with POS Tagging

Have a look at the POS-annotated corpus for the programmers’ assignment. Try to come up with extraction rules (either written as regular expressions, or informally defined) to extract instances and their types, using the POS annotations. Apply manually these rules on the first few articles, and compute their precision. You should strive to have a precision of at least .

 Set Expansion

Play with the set expansion algorithm of Google Sets (http://labs.google.com/sets). Find three examples where it is useful; compute the precision each time.

(2)

 Google Squared

Google Squared is a Google prototype which apply information extraction techniques to build dynamic tabular results about entities. Columns and lines of tables can be automatically added. Play with it to understand how it works. Use it to obtain the following information:

• The phone numbers of Chinese universities;

• French cheeses with a soft texture made out of cow milk;

• Doctoral advisors of Turing award winners.

How many results do you get this way? Try using a classical search engine to get the same results, does it work?

 Exploring Yago (by Fabian Suchanek)

YAGO is a large ontology obtained using information extraction technologies, that is being developed by the Max-Planck Institute for Informatics in Germany. Go to the YAGO Web site,http://mpii.de/yago, click on the “Demo” tab and start the textual browser. This browser allows navigating through the YAGO ontology.

. Choose a person of public interest. Type the name in the box and hit ENTER. Why do you not land directly at the entity itself?

. Click on the link with the entity (on the right). Follow thetypeandsubclassOflinks to the root class.

Note down all classes you encounter on one path to the root, hand them in.

. Find at least one incorrect statement in the ontology.

 OpenCalais

Pick an English-language news article of your choice and submit its content to the OpenCalais information extractor (http://viewer.opencalais.com/). For each of the annotations made by the system, suggest a technique that has probably been used to extract the information.

 Wikipedia IE

Pick a Wikipedia article, not too short. Explain which information can be extracted in an easy manner from it, describing the technique(s) that can be used for this purpose.

Références

Documents relatifs

2.4 Detecting two faults is hard for our test in the worst case The test of Section 2.1 works to detect a one-gate error, because if only one gate is affected then there is a

soit l'utilisation que vous comptez faire des fichiers, n'oubliez pas qu'il est de votre responsabilité de un ouvrage appartient au domaine public américain, n'en déduisez pas

Pour revenir à nous , mes Frères , je dois vous rappeler la position de la des Cœurs Fidèles au mo- ment de la reprise de ses travaux , il y a maintenant dix mois ; je dois vous

seule chose que je puisse admettre , c’est que , si la société des Templiers existait réellement , ce dont je doute beau- coup , du temps de Massillon , Fénélon et des autres

fein, &t puis que pour faire cette réparation vous avez voulu demeurer neuf mois caché dans le fein virginal. de Marie, & après

Inoltre il Paris attribuisce allo stesso autore l'altro poema: La Prise de Pampelune il quate non sarebbe che il séguito della Entrée de Spagne e questo autore* egli suppone, non

+ Ne pas supprimer V attribution Le filigrane Google contenu dans chaque fichier est indispensable pour informer les internautes de notre projet et leur permettre d'acceder a

+ Ne pas supprimer V attribution Le filigrane Google contenu dans chaque fichier est indispensable pour informer les internautes de notre projet et leur permettre d’accéder à