• Aucun résultat trouvé

Semi-automatic analysis of Navyany¯aya compounds

N/A
N/A
Protected

Academic year: 2022

Partager "Semi-automatic analysis of Navyany¯aya compounds"

Copied!
9
0
0

Texte intégral

(1)

Semi-automatic analysis of Navyany¯ aya compounds

S. R. Arjuna & G´erard Huet

Sanskrit Studies UoH & Inria Paris-Rocquencourt

30th SALA February 8th, 2014 University of Hyderabad

(2)

Navyany¯ aya style

Long compounds

Specific technical vocabulary

Productive use of various taddhitasuffixes

Semi-formal compound structure

Possible discrepancy with P¯an.inian rules ?

(3)

Collection of Navyany¯ aya compounds

S. R. Arjuna has been studying work by Ga ˙nges.a on the concept ofvy¯apti. This logician gives a set of 5 definitions, calledvy¯aptipa˜ncakamorpa˜ncalaks.an.¯ı. He then sets to refute them, and quotes two previous definitions, by T¯arkikasimha and T¯arkikavy¯aghra, together known asSimhavy¯aghralaks.anam. This work is commented in a book “Vy¯aptipa˜ncakam

simhavy¯aghralaks.an.am ca”, which was selected by Arjuna as typical corpus ofNavyany¯ayadiscussions.

He extracted from this corpus a set of 14,000 exemples, mostly nominal compound examples. From this digital corpus he chose at random 356 examples for the purposes of the present

experiment, aiming at using the Heritage Reader for semi-automated processing ofNavyany¯ayacorpus.

(4)

Experimental version of Heritage Reader

At mid-december 2013, after learning of Arjuna’s difficulties, G´erard set to design an experimental version of the Heritage Reader, tuned for recognition ofNavyany¯ayacompounds.

Here are the salient features of this extension:

Databanks are added for taddhitasuffixes inflected forms

Suffixes involved are-taa (Fem),-tva(Neu), and-vat/-vat¯ı (all 3 genders)

Segmenter transitions added to accommodatetaddhita productivity

Word mode for single padain order to curb overgeneration

Graph interface used to display the huge solution space

Technical vocabulary ofNavyany¯ayaacquired in lexicon Experiments unveiled a few weak spots to fix, such asavagraha treatment, and required lexicon acquisition of a small number of vocabulary items.

(5)

Experiments iteration

In mid-december 2013, S. R. Arjuna experimented with a limited set of 356 examples, and reported problems.

In january 2014, G. Huet fixed a few issues on the basis of this report, and incorporated the experimental mode in a new V2.80 version of the Heritage Engine. He then classified the remaining problems as follows.

(6)

Classification of problems on a 356 random examples selection

Firstly corpus data anomalies:

2 encoding problems (vowel l)

7 examples with -itiorna-segments

3 typos in corpus

These problems were solved by data cleaning/standardization.

The 7 non-compound examples were discarded from the test set.

(7)

Classification of problems (2)

Secondly processing failures:

3 time-out failures

3 -ka suffix

7 -t¯a-kaconstructions

13 inchoative compounds (-cvi) used as component (iic. or ifc.) of an englobing compound

This gives a recall of 91%. The last category points out an incompleteness in the current compound recognizer, which deserves correction. When the corresponding extension is available, the recall ought to improve to 95%.

If the (compound) suffix-t¯aka is added as productive, the recall might reach 97%. Of course this might degrade the precision, and even may induce more silence by time-out.

(8)

Conclusion

It is possible to rapidly tailor specific versions of our software to cater to specific corpus characteristics. For the case in point, il appears that most long compounds ofNavyany¯ayatexts are amenable to semi-automatic segmentation.

The correctness of the method relies on an important property of the selected productivetaddhitasuffixes, namely that the sandhi used for their (fake) declension does not overlap with the sandhi of their own affixing.

It is to be expected that the treatment of the extended set of 14,000 exemples will reveal other difficulties. Some further taddhitasuffixes, such as-mat, may be required. This will not be too problematic hopefully. The general problem of making fulltaddhita recognition raises interesting issues.

An interesting conclusion is that so far the considered Navyany¯ayacorpus is basically conformant to P¯an.ini.

(9)

Thank you for your attention!

Références

Documents relatifs

An atmospheric pressure chemical ionization - ion-trap mass spectrometer for the on-line analysis of volatile compounds in foods: a tool for linking aroma release to

The problem is reformulated as a distributed control problem by using the Fokker-Planck equation for the probability distribution of the stochastic process; then, an extended

Correlations of Raj and Morris, Mackay and Matsugu, Green and Maloney were tested with the experimental data and poor accuracy was observed (average absolute relative error

The sum of reactions (12) and (13) results in a net transformation of the reactive •OH into the less reactive •OOH radicals thus explaining the increased NTC-amplitude and the lack

Associating for the first time a total energy calculation and a phase transition approach, we present an analysis of the relative stability of two structural phases

l-limonene (monoterpene), β -caryophyllene and germacrene D (sesquiterpenes), (11E,13Z)-labdadien-8-ol and abienol (neutral diterpe- nes), oleic and stearic acids (fatty acids)

FIGURE 4 | FVA is performed to compute the flux variability of each reaction: (1) at different percentage (0–100%) of y bm whilst optimizing target product; and (2) at

Further improvements will be made to the ECCAD database in the forthcoming months, such as the inclusion of non-gridded data and the development of tools to compare gridded and