
A Method for Detecting Implicit Plagiarism in Scientific and Technical Texts on the Basis of Their Conceptual Analysis (Метод выявления имплицитно выраженных заимствований в научно-технических текстах на основе их концептуального анализа)


-

© A. A. Khoroshilov, a.a.horoshilov@mail.ru

! " #$ , "

%!

# ". & #" !#

! #' # "

$$ # %!

$ " ! #$

% ,

# "

$$ % . (

$) # # , ! * +%% ' !

# ! , *

$ – ! .

1

1.1 !

%! ) $ ' !' $' %

$*)" , # + ' $". ! + '

$' ! # '!

$ , , # #

# " $ !$'

$'" #', ! ' ! " "

!". - + ! $ ! " ! , ! ! $ !$'

$'" #'.

# , ' ' %

! %,

!* $$$ , , '!$

", #

$# , ! /#

#". *) ,

#! # ! "

#$ , ' '

% ! . &

! " ! #, , $ +

# ". 0 ! , ! " +

$ * $* $$$ , * # '' . #$ , – + # '", $ " ! #$ $ ' # ,

%! #.

1.2 "

'* # "

! " #$ . $!" '*

#$*) !#:

x # ' !' #

# # #$ .

x # ' !' #

$% # " " ( $ "

! ! , ).

x # ' !' # $ " ! "

#$ .

x 3!'

! " #$ .

x 3!'

! " #$ .

x & +' # ,

$ *) # ' " +%% '

! #

! " #$ . 1.3 # $%$

90- # XX # '

# !# '

XVII International Conference DAMDID/RCDL’2015 «Data Analytics and Management in Data Intensive Domains», Obninsk, October 13-16, 2015


, #! # . - ! , '!$

! $, * TurnItIn , SafeAssign, CopyScape, WriteCheck, iThenticate, PlagAware, PlagScan, Copyscape, CheckForPlagiarism.net, PlagiarismDetection.org. # 2000- # ' # $

!. (#" ! ! ! " Forecsys . ( '!$

67, # $$, $# . # ! #$

' , eTXT , Advego Plagiatus Text.ru. (# ! "

!' #$ " ! !",

#)" #$. 0 $ $' ! ! ". ! # '!$ $* !$

#$ , '!$

$ '

! # ' . 0 !*

! ## #

#$ #$ ". 3 *) # # # .

.

# #" # [21] !*

# #$ # n- # ,

!' #$ # " #$ . 9#$

$ ,

#*) !', # +"

'!$ # #$. & #"

# ', $)

$ $* $$$ #$, # # ,

# # $ . ; ' ' ! $ # " $ ! # $ , $* .

, . ( " #" # [22]

«$» - ! $*) $ #$.

, + $ #*, #$ * . (# ! ! # , I-Match. ;

# #$

' , #)

#" #$, + # ) ' " #$$

$$ . 7 I-Match

$ #$. ; + '

$# , ! +-

%$. &$

#" #$ ! +%% ' #$ #$ ".

.( # # #

!* # # # '"

% " # – [12, 24, 25]. 0

# ' # ' ! # # # , +

# ' # # . &

! # ' # * +-#. ; #

#$ # ' ' #* +-# .

. #

#$ , *)

# " %, *

# $) . ( ! * ' , $* $$$ , +$ '!$ #, ' ' " $ ! "

, , # , +

* ! ! #$

. $ # !#, * #$

% -

!. (# ' # ##. ( #' #

# $) $*)

[2,14,15,23,26] – $'" % $

# $$,

# $# !# [13], $*)

$.

# !# # *

# # , , -

! , # ,

! # ! "

*)" + + #

! * # $ ' ' +

! . ? - $

# [19-20] ! * '

!# $ )

! ". - , # $ ( , !$$

), $ , $ #"

" +

$. - !# # +%%

! " #

!' # # %!

" $$ , ! $ % $ " #$ " [8].


2

2.1 &

& #

# #' ! ' # " "

$$. & # %

$" # * [5]. ( !* '

! , )'*

# * ! ' , %$*

# .

3 $ # ! , '! , $' $" $$ ",

# *) # + #.

!#' , , ' $ – C## # * [5], !

$) # #

*#. “”

" ' ! * $

" #$ ! [9,10,11,17]. & $ # , ! [5]. ?!$' +

$ # , '" '" ! . & + , ' !' # . G' !' . &# " !'*

% $# ' $' ! ! " ", #H # *) # %.

Converting a text into its formalized semantic representation makes it possible to compare texts by their semantic content [1,3,4,16]. The result of such a comparison is a determination of the semantic closeness of the documents. &#

" !'* , # $ %, ˈˏ˞˘ ˑˇːˑˌ

ˋ ˕ˑˌ ˉˈ ˔ˑ˅ˑˍ˖˒ːˑ˔˕˟ˡ

", ˋˈ $, $ ' ! $ , !˕ˈˎ˟ːˑ #˞ˏˋǤ ʗˇˈː˕ˋ˚ː˞ˏˋ ˏˑˆ˖˕ ˄˞˕˟ ˔ˋ˕˖˃˙ˋˋǡ ˈ˔ˎˋ

#

" * " ( !'

# ") # # ,

$# *)" $ '"

'" . The condition for local semantic similarity is similarity in the contextual environment of identical concept names in two texts or their fragments. The condition for global semantic similarity is similarity in the order in which concept names occur in the two compared texts or their fragments. 7#' # ', '$

!'"

#

$, " #

# $' ! ! " ", ' ' $# #' #

$ ($#" # '' " " # .) # '" '*

# % $' ! ! " ",

#H #, #

$# $ * '" . 2.2

# # # # $#

'! ' $' " " - $'" !

#$ [6,7], #"

$ " " ሺʙʝʓʙʝሻ.

! ʙʝʓʙʝ ൌ ሼʜʞǡ ܭȁ݅ א ሾͳǡ ݊ʜʞ ሿሽ, # ʜʞൌ ሺݐǡ

i ሻ - i- ; ݊ʜʞ -

" ";

ݐ - # i–

;

i - # #" i – ;

ܭ - i – , # + * !

К = {НПК | i ∈ [1, n НПК]}, НПК = (t, i). To store the results of establishing the local semantic similarity of documents and to compute the coefficients of semantic closeness between text fragments, the following matrix is used:

$$
\text{МЭ} =
\begin{pmatrix}
\text{МЭ}_{11} & \text{МЭ}_{12} & \dots & \text{МЭ}_{1 n_2} \\
\text{МЭ}_{21} & \text{МЭ}_{22} & \dots & \text{МЭ}_{2 n_2} \\
\vdots & \vdots & \text{МЭ}_{ij} & \vdots \\
\text{МЭ}_{n_1 1} & \text{МЭ}_{n_1 2} & \dots & \text{МЭ}_{n_1 n_2}
\end{pmatrix}
$$

of dimension $n = n_1 \times n_2$, where $n_1$ is the number of elements in the formalized semantic description of the first of the compared documents and $n_2$ is the number of elements in the formalized semantic description of the second.

Here the matrix element $\text{МЭ}_{ij}$ is a numerical measure of how far the condition of local semantic similarity is met. If $\text{МЭ}_{ij} = 0$, the condition is not met; if $\text{МЭ}_{ij} > 0$, it is met partially or, at higher values, fully.

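If the equivalence function returns 0 for the pair $(\text{НП}_{pi}, \text{НП}_{qj})$, then the matrix element $\text{МЭ}_{ij} = 0$; otherwise $\text{МЭ}_{ij}$ is computed from two kinds of terms: the equivalence value of the pair $(\text{НП}_{pi}, \text{НП}_{qj})$ itself, and a sum of equivalence values over the pairs of context elements indexed $pil$ and $qjm$, where $l$ and $m$ run from 0 to the context sizes $n_{\text{НПК}_{pi}}$ and $n_{\text{НПК}_{qj}}$.

[Formula for $\text{МЭ}_{ij}$: the exact coefficients and normalization are not recoverable from the source.]

The function applied to $(\text{НП}_{pi}, \text{НП}_{qj})$ determines the equivalence of word combinations and takes values in $[0, 1]$; $\text{НП}_{pi}$ is the $i$-th element of the formalized semantic description of the $p$-th document, and $\text{НП}_{qj}$ is the $j$-th element of the formalized semantic description of the $q$-th document.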

Because the equivalence function applied to $(\text{НП}_{pi}, \text{НП}_{qj})$ returns nonzero values far less often than zero values, the matrix is treated as sparse. Accordingly, it can be replaced by three vectors:

$$
\text{ВЗ} = \begin{pmatrix} \text{ВЗ}_1 \\ \vdots \\ \text{ВЗ}_n \end{pmatrix}
$$

is the vector of values from the matrix МЭ for which the condition $\text{МЭ}_{ij} > 0$ holds, with $i \in [0, n_1]$ and $j \in [0, n_2]$;

$$
\begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix}
$$

is the vector of indices from the formalized semantic description of the $p$-th document that correspond to the values in the vector ВЗ;

$$
\begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix}
$$

is the vector of indices from the formalized semantic description of the $q$-th document that correspond to the values in the vector ВЗ.
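The three-vector form described here corresponds to what is usually called coordinate (COO) sparse storage. The following is a minimal Python sketch of that idea, assuming a dense matrix of local-similarity values is already available. The function name `to_three_vectors` and the sample numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def to_three_vectors(
    matrix: List[List[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Keep only the positive entries of a dense matrix as three parallel vectors.

    values  -- plays the role of the value vector (ВЗ in the text)
    p_index -- row index of each kept value (element of the p-th document)
    q_index -- column index of each kept value (element of the q-th document)
    """
    values: List[float] = []
    p_index: List[int] = []
    q_index: List[int] = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value > 0:  # zero entries are simply not stored
                values.append(value)
                p_index.append(i)
                q_index.append(j)
    return values, p_index, q_index


# Made-up 3 x 4 matrix, used only to show the shape of the result.
sample_matrix = [
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.3, 0.0, 0.0, 0.0],
]
values, p_index, q_index = to_three_vectors(sample_matrix)
```

For the sample matrix only three of the twelve entries are stored, which is the saving the sparse representation is meant to provide.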

The condition for global semantic similarity is similarity in the order in which concept names occur. This order is already taken into account when the coefficients $\text{МЭ}_{ij}$ are computed, up to the permutations that are possible in a text because of the properties of natural language. Next, a search is made for sequences of concept names whose local semantic similarity values $\text{ВЗ}_i$ exceed a given threshold; the study used values $\text{ВЗ}_i > 0.65$. Then, for these sequences, the measures of fulfilment of the global semantic similarity condition are computed. This task reduces to computing the mean of the local semantic similarity measures contained in these sequences of concept names. That quantity is the coefficient of semantic closeness of the text fragments:

$$
K = \frac{\sum_{i=0}^{n} \text{ВЗ}_i}{n}
$$

where $\text{ВЗ}_i$ is an element of the vector ВЗ that belongs to the found chain, and $n$ is the number of elements in the chain.
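The chain search and the closeness coefficient can be sketched in Python as follows. This is only an illustration under stated assumptions: the text gives the 0.65 threshold and defines the coefficient as the mean over a chain, but it does not say how chains are delimited. Here a chain is assumed to continue while both document indices advance by at most `max_gap`; `find_chains`, `closeness_coefficient`, `max_gap`, and the sample data are hypothetical.

```python
from typing import List, Tuple

Match = Tuple[int, int, float]  # (index in document p, index in document q, value)

THRESHOLD = 0.65  # threshold value reported in the text


def find_chains(
    values: List[float],
    p_index: List[int],
    q_index: List[int],
    threshold: float = THRESHOLD,
    max_gap: int = 1,
) -> List[List[Match]]:
    """Group above-threshold matches into chains.

    Assumption (not from the paper): a match extends the current chain when
    both of its indices are larger than the previous match's by 1..max_gap.
    """
    matches = sorted(
        (p, q, v) for v, p, q in zip(values, p_index, q_index) if v > threshold
    )
    chains: List[List[Match]] = []
    for match in matches:
        if chains:
            last_p, last_q, _ = chains[-1][-1]
            if 0 < match[0] - last_p <= max_gap and 0 < match[1] - last_q <= max_gap:
                chains[-1].append(match)
                continue
        chains.append([match])
    return chains


def closeness_coefficient(chain: List[Match]) -> float:
    """Arithmetic mean of the local similarity values in one chain."""
    return sum(value for _, _, value in chain) / len(chain)


# Made-up data: four nonzero matrix entries in three-vector form.
sample_values = [0.9, 0.7, 0.4, 0.8]
sample_p = [0, 1, 2, 5]
sample_q = [3, 4, 5, 9]
sample_chains = find_chains(sample_values, sample_p, sample_q)
sample_k = [closeness_coefficient(chain) for chain in sample_chains]
```

With the sample data, the entry with value 0.4 is filtered out by the threshold, the first two matches form one chain, and the last match starts a chain of its own. A stricter or looser chain rule would change the grouping but not the way the coefficient is computed.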

2.3

!$' # # "

! ! "

#$ . -# $ #

! + '! + $' ( 09), *) "

90%. ' 09 (/ 1.7 . " "), **)"

"" , !# C! [7]

$ ! ! 30 .

% #$ , #) !

# - [3] '! # + # "

" . ; !#

# "

$% #

! # # %": 1) + # " %;

2) ! # ( " !, ! - #); 3)

$ %! #


# # ! #

%! (/H 500 . ") ! # !

$% %!

# (! #" ).

!$' #$

- $'

! * 09 *

%!$* "

$! # .

&# # )'*

$! # # . 9 $ # !#

$ #

$#" ( $) $'*

" . (-, 7

" 7 " ).

()" # ! "

* #$*) +:

( 1. !$ )'*

+ $' $' ! " "

$! " + " .

( 1. 9#

)'* #$ " "

! $%

%! # "

" # $% " %.

( 2. &! # #*) " "

%! # " #$ .

( 3. ; #$

$ * ' #$ !

$ $ #*.

( 4. ; #$ -

#$ # ! #$ , "#

. 4 $ * ! $ #* %

!$ .

( 5. ; #" $ " .5

! $ %

#* ' ' " " + % .

( 6. * # '

", *) ! '"

" !# . ;

#" " # ' "

" ' '" " .

3 ) %

1. # '

! " #$

&

(!

"

#)

&

(#

)

' (!

"

#)

' (#

)

0.67 0.48 0.94 0.96

! $ " ! #$

!# : ' " ! #$ ( 12%) ' " "

#$ ( 30 + ). - + $ 17

#$ , #$

' 23 # 117. ; # # ! ! " #

!$' , $ . 6# !$' #

# # 1.

4 *%

3!" ## *

! " $- $' ! +

# ' # " ! * $* +%% ' $

! # ! ", *

$ –

! . 0 ## !$

#$*) ! #

%! ! " $$

:

x #

" " #$ .

x # $

!" #$

".

x # #

# "

$% $ %! $

# *.

x # !

!$ ! !

%.

x # $ "

! ! % !$

.

!* # ',


+%%

! " $ ## *, ! , #$ #'* !

# # %! "

$$ , $ !#

$! # #

# ! " %.

+

[1]9$! .&. !

" %. – .: -$, 1978.

– 175 .

[2]( M.. & !"

$' : ( . – .: -$. C!, 1997. – 112 .

[3]? M.M.

%, 2.

%. &# )" #"

9.. 9$ . – .: 30 . M.. & , 2008.– 342 .

[4]' .M., 9 .&. # ! " . – .:

& 3-, 2008. – 301 .

[5]* C## #. 9$ )"

. – .: &, 1977. – 370 . [6]?! .., ? M.., ..

# " !

#$ ) $-"

% # !#

#$ //

%! !'. – 2012. – . 8.

[7]7 .-., ..

#

# %!

" // $# XIV-" . $. %.

«0 :

# , +

» – RCDL’2012, . & '- 7", 3, 15 – 18 2012 . [8]7 .-., .. #

!#

! " $$ $- #$

! // $# XV-"

. $. %. «0 : # , + » –

RCDL’2013, . U ', 14 – 17 2013

#.

[9]'$ .. (

#" « ֞». – ., 1974 (2-

!#., 1999).

[10]'$ .. 3$" ! # «

֞». – – , 1995.

[11] W.;., ?$ " .., # G.G. #. G 0&-2. – .: -$, 1989.

[12]7 W.M., ..

'" ! # # #$ # WEB-#$ //

$# 9-" "" $"

% «0 : # , + » RCDL’2007: . $ $ / & '- 7", 3, 2007.

[13]? .W. , G .Y. , .. & $' % ## + //

. 9-" . $. %. «0 : #

, + » — RCDL 2007. — & '-7", 3, 2007.

— . 2. — . 104—110.

[14]. . &, . G. 9 ", -. M. &

9$' % -! "

% // Z Z . -9i , 2009,N N 3.-.67-79

[15]^$ .G. #' $$

# " % # ! -

" %// ;. … #.

. $. – -&$, 2003. – 185 . [16]? M.M., ? .. #.

" $'" ! . // -$- %.

. 2. – .: -, 2002. – _ 10.

[17]7 .. &#

!$ . – .: !#- $ , 1976.

[18] Banea C., Hassan S., Mohler M., Mihalcea R. UNT: A Supervised Synergistic Approach to Semantic Text Similarity // Proc. of the Sixth Int. Workshop on Semantic Evaluation (SemEval), 2012.

[19] Hassan S., Mihalcea R. Measuring semantic relatedness using salient encyclopedic concepts // Artificial Intelligence, Special Issue, 2011.

[20] Mohler M., Mihalcea R. Text-to-text semantic similarity for automatic short answer grading // Proc. of the European Association for Computational Linguistics (EACL 2009), Athens, Greece.

[21] Salton G., Wong A., Yang C. S. A vector space model for automatic indexing // Communications of the ACM, Vol. 18, Issue 11, Nov. 1975, pp. 613-620; Salton et al., 1994.

[22] Chowdhury A., Frieder O., Grossman D., McCabe M. C. Collection statistics for fast duplicate document detection // ACM Transactions on Information Systems (TOIS), Vol. 20, Issue 2, April 2002, pp. 171-191.

[23] Vor der Brück T., Hartrumpf S. A readability checker based on deep semantic indicators // Human Language Technology. Challenges of the Information Society, 2009, Vol. 5603 of Lecture Notes in Computer Science (LNCS), pp. 232-244. Berlin, Germany: Springer.

[24] Broder A. On the resemblance and containment of documents // Compression and Complexity of Sequences (SEQUENCES'97), pp. 21-29. IEEE Computer Society, 1998.

[25] Broder A., Glassman S., Manasse M., Zweig G. Syntactic clustering of the Web // Proc. of the 6th International World Wide Web Conference, April 1997.

[26] Hartrumpf S., vor der Brück T., Eichhorn C. Detecting duplicates with shallow and parser-based methods // Proc. of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE), 2010, pp. 142-149. Beijing, China.

A Method for Detecting Implicit Plagiarism in Scientific and Technical Texts on the Basis of Their Conceptual Analysis

Alexey A. Khoroshilov

The paper presents a process for automatically detecting plagiarism in documents by comparing their formalized representations. To solve this problem, we developed a model of the semantic structure of texts. To detect plagiarism, we developed an algorithm for finding semantically similar fragments and a method for measuring the semantic similarity between text fragments. The main advantage of the method is that it detects not only minor changes to the structure or wording of a text, but also more complicated cases of deliberate changes in plagiarized texts.
