• Aucun résultat trouvé

Methods and Techniques for Information Extraction by Text Segmentation

N/A
N/A
Protected

Academic year: 2022

Partager "Methods and Techniques for Information Extraction by Text Segmentation"

Copied!
1
0
0

Texte intégral

(1)

Methods and Techniques for Information Extraction by Text Segmentation

Altigran Soares da Silva and Eli Cortez

Instituto de Computação Universidade Federal do Amazonas

Manaus, Brazil

{alti,eccv}@icomp.ufam.edu.br

Abstract. The growing use of text files for information exchange, such as HTML pages, XML documents, e-mail, blogs posts, tweets, RSS and SMS messages, has brought many problems related to how properly ex- ploit the information implicitly contained therein. In particular, prob- lems related to Information Extraction from such sources have motivated many studies in various scientific communities in areas such as Databases, Data Mining, Information Retrieval and Artificial Intelligence. In this tutorial, we present recent methods and techniques in the literature to address the problem of Information Extraction by Text Segmentation (IETS), which consists in extracting values of interest organized into semi-structured records (e.g., postal addresses, bibliographic citations, classified ads, etc.), implicitly present in textual sources. We will dis- cuss the most recent major approaches proposed in the literature, with particular emphasis on probabilistic methods.

18

Références

Documents relatifs

Our current system applies a four-stage strategy to the label contained within the panel itself, text detection, where a segmentation scheme is used to focus

• Want to view a page of a document but see context of surrounding text and

− Understand the contents of a document or collection of documents without reading them.?. Spring 2004 CS

In order to lead this evaluation of different tasks, document analysis problem have been divided into five subtasks respectively dedicated to segmentation, writing type

The proposed study combines the state of the art methods within distributed computing, text retrieval, clustering methods, and finally, using a classification method to

Therefore, this article proposes a method for determining information proximity in large arrays of text information, the distinctive feature of which is the use

The extractions of the systems were matched against the relationship in Test 1 golden dataset, and metrics of exact Precision (EP), exact Recall (ER), partial Precision (PP),

The peculiarities of the classical models of "information gatekeepers" are analyzed and the approach to clustering of messages in social networks is proposed