• Aucun résultat trouvé

Mining Machine-Readable Knowledge from Structured Web Markup

N/A
N/A
Protected

Academic year: 2022

Partager "Mining Machine-Readable Knowledge from Structured Web Markup"

Copied!
1
0
0

Texte intégral

(1)

Mining Machine-Readable Knowledge from Structured Web Markup

Ran Yu

GESIS - Leibniz Institute for the Social Sciences 50676 K¨oln, Germany

[email protected]

Abstract. The World Wide Web constitutes the largest collection of knowledge and is accessed by billions of users in their daily lives through applications such as search engines and smart assistants. However, most of the knowledge available on the Web is unstructured and is difficult for machines to process which leads to the lowered performance of such smart applications. Hence improving the accessibility of knowledge on the Web for machines is a prerequisite for improving the performance of such applications.

Knowledge bases (KBs) here refers to RDF datasets contains machine- readable knowledge collections. While KBs capture large amounts of fac- tual knowledge, their coverage and completeness vary heavily across dif- ferent types of domains. In particular, there is a large percentage of less popular (long-tail) entities and properties that are under-represented.

Recent efforts in knowledge mining aim at exploiting data extracted from the Web to construct new KBs or to fill in missing statements of existing KBs. These approaches extract triples from Web documents, or exploit semi-structured data from Web tables. Although the extraction of structured data from Web documents is costly and error-prone, the recent emergence of structured Web markup has provided an unprece- dented source of explicit entity-centric data, describing factual knowledge about entities contained in Web documents. Building on standards such as RDFa, Microdata and Microformats, and driven by initiatives such as schema.org, a joint effort led by Google, Yahoo!, Bing and Yandex, markup data has become prevalent on the Web. Through its wide avail- ability, markup lends itself as a diverse source of input data for KBA.

However, the specific characteristics of facts extracted from embedded markup pose particular challenges.

This work gives a brief overview of the existing works on min- ing machine-readable knowledge from both structured and unstructured data on the Web, and introduces the KnowMore approach for augment- ing knowledge bases using structured Web markup data.

Copyrightc 2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0).

Références

Documents relatifs

These efforts include providing Google Fusion Tables, a service for easily ingesting, visualizing and integrating data, mining the Web for high-quality HTML tables, and

The most prevalent formats for embedding structured data are Microformats, which use style definitions to annotate HTML text with terms from a fixed set of vocabularies; RDFa, which

However if there are some unloaded images or missing style information in a web page VIPS may fail to build correct visual containment tree which leads to data extraction problems

4 e.g.. Mining Resource Roles. In this work, the goal is to understand the actual roles of project members that collaborate using a VCS. Roles are decided in the project planning

Af- ter the Comment Resource was published the Annotation Client notifies the Semantic Pingback Service by sending a pingback request with the Comment Resource as source and

In this paper, we describe a solution for representing, acquiring and querying assessment data that uses (1) domain ontologies and standard terminologies to give formal descriptions

In contrast, supervised methods have been proposed that are trained using existing LOD datasets and applied to extract new facts, either by using the dataset as a training set for

Motivated by the demand of having algorithms that use lim- ited memory space for discovering collections of frequently co-occurring connected edges from big, interlinked, dynamic