• Aucun résultat trouvé

Requirement Mining in Technical Documents

N/A
N/A
Protected

Academic year: 2021

Partager "Requirement Mining in Technical Documents"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-03246596

https://hal.archives-ouvertes.fr/hal-03246596

Submitted on 4 Jun 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Requirement Mining in Technical Documents

Juyeon Kang Choi, Patrick Saint-Dizier

To cite this version:

Juyeon Kang Choi, Patrick Saint-Dizier. Requirement Mining in Technical Documents. 1st ACL workshop on argument mining (2014), Association for Computational Linguistics (ACL), Jun 2014, Baltimore, Maryland, United States. pp.108-109, �10.3115/v1/W14-2118�. �hal-03246596�

(2)

Proceedings of the First Workshop on Argumentation Mining, pages 108–109, Baltimore, Maryland USA, June 26, 2014. c 2014 Association for Computational Linguistics

Requirement Mining in Technical Documents

Juyeon Kang

Prometil

42 Avenue du G´en´eral De Croutte 31100 Toulouse, France j.kang@prometil.com Patrick Saint-Dizier IRIT-CNRS 118 route de Narbonne 31062 Toulouse, France stdizier@irit.fr Abstract

In this paper, we first develop the linguis-tic characterislinguis-tics of requirements which are specific forms of arguments. The dis-course structures that refine or elaborate requirements are also analyzed. These specific discourse relations are conceptu-ally characterized, with the functions they play. An implementation is carried out in Dislog on the <TextCoop> platform. Dislog allows high level specifications in logic for a fast and easy prototyping at a high level of linguistic adequacy.

1 The Structure of Requirement Compounds

Arguments and in partticular requirements in writ-ten texts or dialogues seldom come in isolation, as independent statements. They are often em-bedded into a context that indicates e.g. circum-stances, elaborations or purposes. Relations be-tween a requirement and its context may be con-ceptually complex. They often appear in small and closely related groups or clusters that often share similar aims, where the first one is complemented, supported, reformulated, contrasted or elaborated by the subsequent ones and by additional state-ments.

The typical configuration of a requirement com-pound can be summarized as follows:

CIRCUMSTANCE(S)/CONDITION(S),PURPOSE(S)--> [REQUIREMENT CONCLUSION + SUPPORT(S)]*

<-- PURPOSE(S), , ELABORATION(S) CONCESSION(S) / CONTRAST(S)

In terms of language realization, clusters of re-quirements and their related context may be all included into a single sentence via coordination or subordination or may appear as separate sen-tences. In both cases, the relations between the different elements of a cluster are realized by means of conjunctions, connectors, various forms

of references and punctuation. We call such a clus-ter an requirement compound. The idea behind this term is that the elements in a compound form a single, possibly complex, unit, which must be considered as a whole from a conceptual and ar-gumentative point of view. Such a compound con-sists of a small number of sentences, so that its contents can be easily assimilated.

2 Linguistic Analysis

2.1 Corpus characteristics

Our corpus of requirements comes from 3 orga-nizations and 6 companies. Our corpus contains 1,138 pages of text extracted from 22 documents. The main features considered to validate our cor-pus are the following:

- specifications come form various industrial ar-eas;

- documents are produced by various actors; - requirement documents follow various authoring guidelines;

- requirements correspond to different conceptual levels.

A typical simple example is the following:

<ReqCompound> <definition> Inventory of qualifications

refers to norm YY.< /definition>

<mainReq> Periodically, an inventory of supplier’s

qualifi-cations shall be produced.< /mainReq>

<secondaryReq>In addition, the supplier’s quality

de-partment shall periodically conduct a monitoring audit program.< /secondaryReq>

<elaboration> At any time, the supplier should be able

to provide evidences that EC qualification is maintained.

</elaboration> < /ReqCompound>

2.2 The model

Let us summarize the processing model.

Requirement indetification in isolation:

Re-quirements are identified on the basis of a small number of patterns since they must follow precise

(3)

formulations, according e.g. to IEEE guidelines. On a small corpus of 64 pages of text (22 058 words), where 215 requirements have been man-ually annotated, a precision of 97% and a recall of 96% have been reached.

Identification and delimitation of require-ment compounds The principle is that all the

statements in a compound must be related either by the reference to the same theme or via phrasal connectors. These form a cohesion link in the compound. The theme is a nominal construction (object or event, e.g. inventory of qualifications)). This is realized by (1) the use of the theme in the sentences that follow or precede the main re-quirement with possible morphological variations, a different determination or simple syntactic vari-ations, This situation occurs in about 82% of the cases. (2) the use of a more generic term than the theme or a generic part of the theme, (3) the refer-ence to the parts of the theme, (3) the use of dis-course connectors to introduce a sentence, or (4) the use of sentence binders.

Relations between requirements in a com-pound Our observations show that the first

re-quirement is always the main rere-quirement of the compound. The requirements that follow develop some of its facets. Secondary requirements essen-tially develop forms of contrast, concession,

spe-cializations and constraints.

Linguistic characterization of discourse structures in a compound Sentences not identified as requirements must be bound to requirements via discourse relations and must be characterized by the function they play e.g. (Couper-Khulen et al. 2000). The structure and the markers and connectors typical of discourse relations found in technical texts are developed in (Saint-Dizier 2014) from (Marcu 2000) and (Stede 2012). These have been enhanced and adapted to the requirement context via several sequences of tests on our corpus. The main relations are the following: information and definitions which always occur before the main

requirement, elaborations which always follow a requirement, since this relation is very large, we consider it as the by-default relation in the compound, result which specifies the outcome of an action, purpose which expresses the underlying motivations of the requirements, and

circumstance which introduces a kind of local

frame under which the requirement compound is

valid or relevant.

A conceptual model is constructed in a first

stage from the discourse relations and functions presented above, and the notion of polarity and strength for requirements. Its role is to represent the relations between the various units of the com-pound in order to allow to draw inferences be-tween compounds, to make generalizations and to check coherence, e.g. (Bagheri et al. 2011).

2.3 Indicative evaluation

The system is implemented in Dislog on our TextCoop platform. The first step, requirement identification, produces very good results since their form is very regular: precision 97%, recall 96%. The second step, compound identification, produces the following results:

precision recall identification 93% 88% opening boundary 96% 91% closing boundary 92% 82% The identification of discourse structures in a compound produces the following results:

relations nb of nb of precision recall rules annotations contrast 14 24 84 88 concession 11 44 89 88 specialization 5 37 72 71 information 6 23 86 80 definition 9 69 87 78 elaboration 13 107 84 82 result 14 97 86 80 circumstance 15 102 89 83 purpose 17 93 91 83 References

Ebrahim Bagheri, Faezeh Ensan. 2011. Consolidating Multiple Requirement Specifcations through Argu-mentation, SAC’11 Proceedings of the 2011 ACM Symposium on Applied Computing.

Elena Couper-Kuhlen, Bernt Kortmann. 2000. Cause, Condition, Concession, Contrast: Cognitive and Discourse Perspectives, Topics in English Linguis-tics, No 33, de Gryuter.

Dan Marcu. 2000. The Theory and Practice of Dis-course Parsing and Summarization, MIT Press. Patrick Saint-Dizier, 2014 Challenges of Discourse

Processing: the acse of technical documents, Cam-bridge Scholars.

Manfred Stede. 2012 Discourse Processing, Morgan and Claypool Publishers.

Références

Documents relatifs

Following a shooting attack by two self-proclaimed Islamist gunmen at the offices of French satirical weekly Charlie Hebdo on 7th January 2015, there emerged the

Moreover, our result is also sharp in the sense that all viscosity solutions to the exterior Dirichlet problem is asymptotic to a quadratic polynomial by the result of Caffarelli and

We would like to comment on the paper: “Is post-mortem CT of the dentition adequate for correct forensic identifica- tion?: comparison of dental computed tomograpy and visual

Similar results are indicated in Figure 15 for fresh gale conditions, where the same qualitative comments apply as for the moderate gale: the conventional lifeboat shows

/ La version de cette publication peut être l’une des suivantes : la version prépublication de l’auteur, la version acceptée du manuscrit ou la version de l’éditeur. Access

With pure liquid methane, the bubbles were large and no foam was present, whereas for LNG, there was a distribution of bubbles sizes with many that were very small,

observable text features identified through corpus analysis, signalling the kind of relations between lexical items used in building terminologies (such as generic/specific, see

We might additionally require the results of human computation to have high validity or high utility, but the results must be reproducible in order to measure the validity or utility