• Aucun résultat trouvé

Project Ungava

N/A
N/A
Protected

Academic year: 2021

Partager "Project Ungava"

Copied!
21
0
0

Texte intégral

(1)

Publisher’s version / Version de l'éditeur:

Vous avez des questions? Nous pouvons vous aider. Pour communiquer directement avec un auteur, consultez la

première page de la revue dans laquelle son article a été publié afin de trouver ses coordonnées. Si vous n’arrivez pas à les repérer, communiquez avec nous à PublicationsArchive-ArchivesPublications@nrc-cnrc.gc.ca.

Questions? Contact the NRC Publications Archive team at

PublicationsArchive-ArchivesPublications@nrc-cnrc.gc.ca. If you wish to email the authors directly, please see the first page of the publication for their contact information.

https://publications-cnrc.canada.ca/fra/droits

L’accès à ce site Web et l’utilisation de son contenu sont assujettis aux conditions présentées dans le site LISEZ CES CONDITIONS ATTENTIVEMENT AVANT D’UTILISER CE SITE WEB.

CNI Spring 2007 Task Force Meeting [Proceedings], 2007

READ THESE TERMS AND CONDITIONS CAREFULLY BEFORE USING THIS WEBSITE. https://nrc-publications.canada.ca/eng/copyright

NRC Publications Archive Record / Notice des Archives des publications du CNRC :

https://nrc-publications.canada.ca/eng/view/object/?id=064ae806-dfd4-4e9f-af53-d663a9ad65e6 https://publications-cnrc.canada.ca/fra/voir/objet/?id=064ae806-dfd4-4e9f-af53-d663a9ad65e6 NRC Publications Archive

Archives des publications du CNRC

This publication could be one of several versions: author’s original, accepted manuscript or the publisher’s version. / La version de cette publication peut être l’une des suivantes : la version prépublication de l’auteur, la version acceptée du manuscrit ou la version de l’éditeur.

Access and use of this website and the material on it are subject to the Terms and Conditions set forth at

Project Ungava

(2)

Project Ungava

Glen Newton

Group Leader, CISTI Research April 16 2007

(3)

Outline

• Context & Problem Space

• Long-term Strategic Objectives • Short term objectives

• The catalog

• Search & navigation

• Clustering & visualization

• Semantic Text Mining supporting Domain-specific navigation & search

(4)

Context & Problem

Space

• CISTI: Canada Institute for Scientific and Technical Information • Canadian National Science Library

• Client groups: academic researchers, industry researchers, students, scientific / medical / engineering innovators, etc. • Also scientific publisher: NRC Research Press

(5)

CISTI's Strategic Plan

• “Value proposition....An integrated "infostructure": electronic

access to scientific information, using intelligent search and

analysis tools”

• “Objectives...

– 1.4 Offer tools to facilitate discovery and exploitation of research

– 4.4 Conduct research in information science to advance knowledge and promote the adoption of new practices”

(6)

CISTI Research

• Research group for CISTI

• Digital library research

• Three main research areas: – Text mining

– Visualization – Agents

(7)

Long Term Strategic

Objectives

• 5 – 10yr : knowledge not documents (but supported by documents) • Discovery tools facilitating discovery of explicit and implicit

actionable knowledge in literature, data, other • Allow researchers to work in their language:

• "Show me all the compounds which effect the glycolysis

metabolic pathway by inhibiting kinases"

• "Show me all the organisms which are found in arctic soils which

are collembolen predators and resistant to heavy metals"

• "Show me all the compounds which are non-metallic, have a

melting point > 500C, tensile strength > 49.6 kg/mm/mm and electrical resistivity < 4600 Ω*m"

(8)

Short Term Objectives

• Improve discovery (search, navigation & visualization) at CISTI • Develop capacity, partnerships & prototypes in semantic web to

extend discovery to offer richer semantic capabilities: domain specific search & navigation

(9)

Ungava

• Improving discovery in existing collections (JOS, Source, Catalog, etc)

• Leveraging best-of-breed Open Source systems like Lucene,

Carrot2, Aduna, Simile Project etc.

• Scalable, flexible, usable, economical: – metadata & full-text search

– Simple integrated navigational search – Similar documents functionality

– Union search (single index for JOS, Source, Catalog, other) – Results clutsering & visualization, including tag cloud-like

functionality from Web 2.0

(10)
(11)
(12)

How?

• Get the content out of the (traditional) library catalog: XML export • Index with Lucene

• Perform results clustering & visualization of results: Carrot2, Aduna, Simile

(13)
(14)

Aduna Cluster Map

(Carrot2 Visualization)

(15)

Search Box Fallacy

• Assumption that the single (or multiple entry) search box solves all use cases

• Solves some, probably many use cases

• Alternate views, modalities (like visualization, etc) • Important other ones: personal collections:

– “I found these 3 documents that are just what I was looking for

(elsewhere). Show me all the documents in your collection that are like this”

– “I found these 4 web pages...

(16)

KDSD-SITMSL Sub-project

• Knowledge Discovery Supported Domain-Specific and

Interdisciplinary Text Mining of Scientific Literature

• Use taxonomies/classifications/thesaurii etc from research domain to offer enhanced search & navigation to researchers from that

domain (Domain-specific faceted search & browse) • Support interdisciplinary discovery

• Text mine out entities & create search & navigation taxonomies • Initial domains: Biological taxonomy (classification), minerological

(17)

KDSD-SITMSL Sub-project

• Knowledge Discovery Supported Domain-Specific and

Interdisciplinary Text Mining of Scientific Literature

• Use taxonomies/classifications/thesaurii etc from research domain to offer enhanced search & navigation to researchers from that

domain (Domain-specific faceted search & browse) • Support interdisciplinary discovery

• Text mine out entities & create search & navigation taxonomies • Initial domains: Biological taxonomy (classification), minerological

(18)

Next steps

• Continue with building base

• NSERC grant application with U of Ottawa & Carleton University • Examine new Semantic Web inference engines

• Look at alternate full-text engines like Terrier, etc.

(19)

Acknowledgements

• Jeff Demaine, Greg Kresko, Dr. Andre Vellino: CISTI Research • Dr. Val Behan-Pelletier, Dr. Guy Baillergeon (ITIS database):

Agriculture Canada

• Dr. Diana Inkpen, Unicersity of Ottawa

(20)

Questions?

• Glen Newton glen.newton@nrc-cnrc.gc.ca

(21)

Références

Documents relatifs

Although Berkeley’s results for these tasks are high, compared to the overall average MAP for all participants, not much can be claimed for this since there were very few

As Figure 2 shows, there are also large differences be- tween the numbers of MeSH annotations per document and the numbers of IPC annotations per patent: While most patents have

PatentExplorer uses similarity search to first retrieve patents for a list of patent IDs or given patent text and then offers the ability to refine the search results by their

We tackle task1 in the AILA 2020 shared task, where the goal is to retrieve precedent and statute documents related to a case document query in the Indian legal domain.. We use

The need to understand and elicit information needs in a particular domain The results obtained by our prediction experiments show that the queries issued to the conversational

Specifically, in order to address the new user cold-start challenge, we propose a cross-domain preference assumption, i.e., users with similar app-installation behav- iors are likely

Based on the Chinese Cardiovascular Diseases Database (CCDD, http://58.210.56.164: 88/ccdd/)we have designed, physicians diagnosis experience and relevant formal representation

In this context, the generative tools are driven by domain-specic information that is translated into dierent forms: declarations (Section 3), annotations (Section 4), and