• Aucun résultat trouvé

Generation of patent abstracts: a challenge for automatic text summarization

N/A
N/A
Protected

Academic year: 2022

Partager "Generation of patent abstracts: a challenge for automatic text summarization"

Copied!
1
0
0

Texte intégral

(1)

Generation of Patent Abstracts:

A Challenge for Automatic Text Summarization

Leo Wanner, ICREA and DTIC, UPF

It is well known that patents drive the modern economies. But they do even more:

patents also serve as a valuable and unique source of up-to-date scientific and technological information. It is assumed that only 10% to 15% of the content presented in patents are described in other publications as well. The worldwide stock of patents thus comprises about 85% to 90% of scientific knowledge. Given that central parts of patents are authored in an idiosyncratic and complex language which is difficult to read and comprehend, and since author-written patent abstracts have the goal to obfuscate the precise nature and the real scope of the inventions rather than to clarify them, an efficient access to this knowledge, for instance, via concise and transparent summaries, appears crucial. However, partially due to the aforementioned language idiosyncrasy, which implies extremely long sentences with complex repetitive linguistic constructions, common extraction-oriented automatic text summarization techniques cannot be expected to show an acceptable performance when applied to patents. Other, more content-oriented (or abstractive) summarization techniques are needed. In my talk, I will present the recent and ongoing research on patent summarization carried out by the Natural Language Processing Group of the Department of Information and Communication Technologies, UPF as member European consortia. I will first describe the techniques for the summarization of patent claims developed in the scope of the PATExpert project and outline then how these techniques are about to be improved in the TOPAS project by considering information from other sections of a patent, notably the description of the invention. In the last part of my presentation, I will summarize the remaining challenges and suggest some lines of future research which are crucial if we want automatic patent summarization to be a real alternative to (semi-)manual abstracting, which still dominates the patent domain.

Références

Documents relatifs

In order to compare this PCB-based PAF-SW-SIW waveguide with the CNT-based AF- SW-SIW waveguide, the simulation results of the phase constant

SummBank is a dataset of parallel English and Chinese documents containing, in addition to source documents, multi-document summaries for sets of related stories (i.e., 40

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

This paper considers the issues of automatic summarization of text documents taking into account the syntactic relations between words and word forms in sentences that can be

With this challenge we created a basis for interesting discussions and hope to have taken the       NLP community for German text understanding a step further. .. For the challenge,

To generate synthetic data, first, a system in the reverse direction (i.e. source as sum- mary and target as text) is trained and then used to generate text for the given summary..

For the abstractive summarization task, the ground-truth labels are constructed from the annotated named entities in the order of BRAND, FUNCTION, VARIATION, SIZE, and COUNT, same as

Objective methods do not incorporate user judgment into the evaluation criteria but evaluate the performance of a given technique based on, for example, the extent to which speci fi