HAL Id: halshs-01071463
https://halshs.archives-ouvertes.fr/halshs-01071463
Submitted on 6 Oct 2014
HAL is a multi-disciplinary open access archive
for the deposit and dissemination of scientific
research documents, whether they are published or not.
The documents may come from teaching and research
institutions in France or abroad, or from public or
private research centers.
Computerized model of morphological analysis applied to
French wordforms
Alexandre Bolkhovityanov, Elena Egorova, Alexei Lavrentiev, Andrey Chepovskiy
To cite this version:
Alexandre Bolkhovityanov, Elena Egorova, Alexei Lavrentiev, Andrey Chepovskiy. Компьютерная модель
морфологического анализа словоформ французского языка. Ershov Informatics Conference, Jun 2014,
Saint-Pétersbourg, Russia. pp.20-28. ⟨halshs-01071463⟩
© […], ICAR Research Lab - CNRS, Université de Lyon, ASLAN Labex, 2014
Computerized model of morphological analysis applied to
French wordforms
A. Bolkhovityanov1, E. Egorova2, A. Lavrentiev3, A. Chepovskiy2
1 Bauman Moscow State Technical University, Moscow, Russia
2 National Research University Higher School of Economics, Moscow, Russia
3 ICAR Research Lab - CNRS, Université de Lyon, ASLAN Labex
Abstract. This paper proposes a computerized model for the morphological analysis of French wordforms. First, a stemming algorithm is briefly described. Its main idea is to build structural schemes and the corresponding lists of suffixes; the second part of the paper details this procedure. The next part presents a technique for determining the grammatical characteristics of French wordforms. The final part reports some results of running the proposed algorithms.
Keywords: stemming algorithm, algorithms of morphological analysis, natural language processing.
1
[…]
[…] [1] […]
2
[…]
[…] [2], 9 parts of speech are distinguished: noun (nom), adjective (adjectif), verb (verbe), adverb (adverbe), article (article), pronoun (pronom), preposition (préposition), conjunction (conjonction), and interjection (interjection). […]
[…] [2-5] […]
3
[…]
[…] For example, the verb marcher (to walk) is split as march-er. […] march-ions ([…]), march-erai ([…]), and so on. […] -er, -ir […]. […] -ifier, -iser. […] These are split further: -ifi-er and -is-er. […] (-er) […], -ifi- and -is- […], -er […].
[…] -aill-, -ass-, -ill-, -och-, -onn-, -ot-, -ouill-, -ard-, -âtr-, -et-, -in-; […]. For example: chant-er → chant-onn-er (to hum), touss-er → touss-ot-er (to cough slightly), and so on. […]
[…]
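The segmentation examples in this section (march-er, march-ions, march-erai, chant-onn-er, touss-ot-er, -ifi-er, -is-er) can be illustrated with a minimal C++ sketch. It is not the authors' algorithm: it assumes a naive longest-match rule, two deliberately tiny suffix lists copied from the examples, and hypothetical names (`segment`, `stripLongest`, `kInflectional`, `kDerivational`) that do not appear in the paper.

```cpp
#include <string>
#include <vector>

// Assumption: tiny illustrative lists taken from the examples in the text.
static const std::vector<std::string> kInflectional = {"erai", "ions", "er", "ir"};
static const std::vector<std::string> kDerivational = {
    "ouill", "aill", "ifi", "onn", "ass", "ill", "och", "ard", "is", "ot"};

// True if `s` ends with `suf` and a non-empty stem would remain.
static bool endsWith(const std::string& s, const std::string& suf) {
    return s.size() > suf.size() &&
           s.compare(s.size() - suf.size(), suf.size(), suf) == 0;
}

// Removes the longest suffix from `list` found at the end of `word`.
// Returns the removed suffix, or an empty string if none matched.
static std::string stripLongest(std::string& word,
                                const std::vector<std::string>& list) {
    std::string best;
    for (const std::string& suf : list) {
        if (suf.size() > best.size() && endsWith(word, suf)) best = suf;
    }
    word.erase(word.size() - best.size());
    return best;
}

// Splits a wordform into stem, optional derivational suffix, and
// inflectional suffix, joined by hyphens.
std::string segment(std::string word) {
    const std::string infl = stripLongest(word, kInflectional);
    const std::string deriv =
        infl.empty() ? std::string() : stripLongest(word, kDerivational);
    std::string out = word;
    if (!deriv.empty()) out += "-" + deriv;
    if (!infl.empty()) out += "-" + infl;
    return out;
}
```

Under these assumptions, `segment("chantonner")` gives `chant-onn-er` and `segment("marchions")` gives `march-ions`. A rule driven by suffixes alone over-segments words such as pardonner (pard-onn-er), so a practical analyzer would presumably need further constraints on which stems accept which suffixes.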
4
[…]
[…] $A\,V_{pp}$, where $A$ is the auxiliary verb (avoir or être) and $V_{pp}$ is the past participle (participe passé). […]
5
[…]
[…] C++ […]. […] Unicode (UTF-16) […]. […]:
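// uint64 is assumed to be the project's typedef for a 64-bit unsigned integer.
static const uint64 FRANCE_GC_SINGULAR = (uint64)1 << 4; // bit 4: singular
static const uint64 FRANCE_GC_PLURAL = (uint64)1 << 5; // bit 5: plural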
[…] 6 […]. […] Linux […] Windows […]: 32- […] 64- […].
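The two constants shown in this section suggest that each grammatical characteristic occupies one bit of a 64-bit mask. The sketch below shows how such flags could be combined and tested. It is an assumption-laden illustration: `uint64` is defined here as `std::uint64_t` because the original typedef is not shown, and the helpers `withFlags` and `hasAll` are hypothetical names, not part of the authors' code.

```cpp
#include <cstdint>

// Assumption: the original typedef is not shown in the paper.
typedef std::uint64_t uint64;

// Constants as given in the text.
static const uint64 FRANCE_GC_SINGULAR = (uint64)1 << 4;
static const uint64 FRANCE_GC_PLURAL   = (uint64)1 << 5;

// Hypothetical helper: returns `mask` with every bit of `flags` set.
inline uint64 withFlags(uint64 mask, uint64 flags) {
    return mask | flags;
}

// Hypothetical helper: true if every bit of `flags` is set in `mask`.
inline bool hasAll(uint64 mask, uint64 flags) {
    return (mask & flags) == flags;
}
```

A wordform whose singular and plural coincide, such as bras, could carry both flags in one value: `withFlags(FRANCE_GC_SINGULAR, FRANCE_GC_PLURAL)`. Packing characteristics into one integer keeps comparison and storage cheap, at the cost of a fixed limit of 64 distinct flags.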
6
Conclusion
[…] C++ […]. […] 95 […] 100 […] 82 […] 100 […]. […] 3 […] 5 […] 22%, 29% […].
References
1. […] // […] CPT2013, 12-19 […] 2013, […], ISBN 978-5-88835-025-6, pp. 154-159.
2. Dubois J., Lagane R. Livres de bord: Grammaire. – 1995.
3. Grevisse M., Goosse A. Le bon usage. – 2007.
4. […] – […]: URSS: […], 2006.
5. Huot H. La morphologie: forme et sens des mots du français. – Armand Colin, 2006. – 249 p.