
Algorithms for computing approximate repetitions in musical sequences



HAL Id: hal-00452240

https://hal.archives-ouvertes.fr/hal-00452240

Submitted on 20 Mar 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Emilios Cambouropoulos, Maxime Crochemore, Costas S. Iliopoulos, Laurent Mouchard, Yoan J. Pinzón Ardila

To cite this version:

Emilios Cambouropoulos, Maxime Crochemore, Costas S. Iliopoulos, Laurent Mouchard, Yoan J. Pinzón Ardila. Algorithms for computing approximate repetitions in musical sequences. Australasian Workshop On Combinatorial Algorithms, 1999, Australia. pp.129-144. ⟨hal-00452240⟩


Repetitions in Musical Sequences

Emilios Cambouropoulos¹*, Maxime Crochemore²**, Costas S. Iliopoulos³***, Laurent Mouchard⁴†, and Yoan J. Pinzon³

¹ Austrian Research Institute for Artificial Intelligence, Schottengasse 3, 1010 Wien, Austria
emilios@ai.univie.ac.at
www.ai.univie.ac.at/emilios

² Institut Gaspard Monge, Université de Marne-la-Vallée, 77454 Marne-la-Vallée CEDEX 2, France.
mac@univ-mlv.fr
www-igm.univ-mlv.fr/mac

³ Dept. Computer Science, King's College London, London WC2R 2LS, England, and School of Computing, Curtin University of Technology, GPO Box 1987 U, WA.
{csi,pinzon}@dcs.kcl.ac.uk
www.dcs.kcl.ac.uk/staff/csi, www.dcs.kcl.ac.uk/pg/pinzon

⁴ LIFAR-ABISS, Université de Rouen, 76821 Mont Saint Aignan, France, and School of Computing, Curtin University of Technology, GPO Box 1987 U, WA.
lm@dir.univ-rouen.fr
www.dir.univ-rouen.fr/lm

Abstract. Here we introduce two new notions of approximate matching with application in computer-assisted music analysis. We present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.

Keywords: String algorithms, approximate string matching, dynamic programming, computer-assisted music analysis.

1 Introduction

This paper focuses on a set of string pattern-matching problems that arise in musical analysis, and especially in musical information retrieval. A musical score can be viewed as a string: at a very rudimentary level, the alphabet could simply be the set of notes in the chromatic or diatonic notation, or the set of intervals that appear between notes (e.g. pitch may be represented as MIDI numbers and

* Supported by the START programme Y99-INF, Austrian Federal Ministry of Science and Transport
** Partially supported by the C.N.R.S. Program "Genomes"
*** Partially supported by the EPSRC grant GR/J17844.
† Partially supported by the C.N.R.S. Program "Genomes"

musical works play a crucial role in discovering similarities between different musical entities and may be used for establishing "characteristic signatures" (see [6]). Such algorithms can be particularly useful for melody identification and musical retrieval.

Both exact and approximate matching techniques have been used for a variety of musical applications (see overviews in McGettrick [23]; Crawford et al. [6]; Rolland et al. [28]; Cambouropoulos et al. [4]). The specific problem studied in this paper is pattern matching for numeric strings where a certain tolerance is allowed during the matching procedure. This type of pattern matching has been considered necessary for various musical applications and has been used by some researchers (see, for instance, Cope [5]). A number of efficient algorithms that tackle various aspects of this problem are presented in this paper.

Most computer-aided musical applications adopt an absolute numeric pitch representation (most commonly MIDI pitch and pitch intervals in semitones; duration is also encoded in a numeric form). The absolute pitch encoding, however, may be insufficient for applications in tonal music, as it disregards tonal qualities of pitches and pitch-intervals (e.g. a tonal transposition from a major to a minor key results in a different encoding of the musical passage, and thus exact matching cannot detect the similarity between the two passages). One way to account for similarity between closely related but non-identical musical strings is to use what will be referred to as δ-approximate matching (and γ-approximate matching). In δ-approximate matching, equal-length patterns consisting of integers match if each pair of corresponding integers differs by no more than δ; e.g. a C-major {60,64,65,67} and a C-minor {60,63,65,67} sequence can be matched if a tolerance δ = 1 is allowed in the matching process (γ-approximate matching is described in the next section). Two simple musical examples that illustrate the usefulness of the proposed pattern-matching techniques are presented in Appendices I and II.

Exact repetitions have been studied extensively. A repetition may be concatenated with the original substring, may overlap it, or may not overlap it at all. Algorithms for finding non-overlapping repetitions in a given string can be found in [1,8,15,21,18,26] and algorithms for computing overlapping repetitions can be found in [3,13,14,25]. A natural extension of the repetitions problem is to allow the presence of errors, that is, to identify substrings that are duplicated to within a certain tolerance k (usually measured by edit distance or Hamming distance). Moreover, the repeated substring may be subject to other constraints: it may be required to be of at least a certain length, and certain positions in it may be required to be invariant.

Furthermore, efficient algorithms for computing approximate repetitions are also directly applicable to molecular biology (see [11,17,24]), in particular to DNA sequencing by hybridization ([27]), the reconstruction of DNA sequences from known DNA fragments (see [29,30]), human organ and bone marrow transplantation, and the determination of evolutionary trees among distinct species ([29]).

that of finding evolutionary chains: given a string t (the "text") and a pattern p (the "motif"), find whether there exists a sequence $u_1 = p, u_2, \ldots, u_\ell$ occurring in the text t such that $u_{i+1}$ occurs to the right of $u_i$ in t and $u_i$ and $u_{i+1}$ are "similar" for $1 \le i < \ell$ (i.e. they differ by a certain number of symbols). In [9] and [7], algorithms for overlapping and non-overlapping evolutionary chains were presented and several variants of the problem were studied: computing the longest chain and computing the chain with the least number of errors.

The paper is organised as follows. In the next section we present some basic definitions for strings and background notions for approximate matching. In Section 3 we present an algorithm for δ-approximate pattern matching (the first notion of approximation). In Section 4 we present an algorithm for {δ,γ}-approximate pattern matching (the second notion of approximation). In Section 5 we present algorithms for computing all δ- and {δ,γ}-approximate squares in a given text. Finally, in Section 6 we present our conclusions and open problems.

2 Background and basic string definitions

A string is a sequence of zero or more symbols from an alphabet $\Sigma$; the string with zero symbols is denoted by $\varepsilon$. The set of all strings over the alphabet $\Sigma$ is denoted by $\Sigma^*$. A string x of length n is represented by $x_1 \ldots x_n$, where $x_i \in \Sigma$ for $1 \le i \le n$. A string w is a substring of x if $x = uwv$ for $u, v \in \Sigma^*$; we equivalently say that the string w occurs at position $|u|+1$ of the string x. The position $|u|+1$ is said to be the starting position of w in x and the position $|u|+|w|$ the end position of w in x. A string w is a prefix of x if $x = wu$ for $u \in \Sigma^*$. Similarly, w is a suffix of x if $x = uw$ for $u \in \Sigma^*$.

The string xy is the concatenation of the two strings x and y. The concatenation of k copies of x is denoted by $x^k$. For two strings $x = x_1 \ldots x_n$ and $y = y_1 \ldots y_m$ such that $x_{n-i+1} \ldots x_n = y_1 \ldots y_i$ for some $i \ge 1$, the string $x_1 \ldots x_n y_{i+1} \ldots y_m$ is a superposition of x and y. We say that x and y overlap.

Let x be a string of length n. A prefix $x_1 \ldots x_p$, $1 \le p < n$, of x is a period of x if $x_i = x_{i+p}$ for all $1 \le i \le n-p$. The period of a string x is the shortest period of x. A string y is a border of x if y is a prefix and a suffix of x.
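The definitions of period and border can be checked mechanically. The sketch below is not from the paper: it uses the classical border (failure) function of Knuth-Morris-Pratt, assumes strings are Python sequences, and relies on the standard fact that the shortest period of x has length |x| minus the length of its longest proper border. The function names are hypothetical, and only non-empty proper borders are listed.

```python
def border_table(x):
    """b[k] = length of the longest proper border of x[:k+1] (KMP failure function)."""
    b = [0] * len(x)
    k = 0
    for i in range(1, len(x)):
        while k > 0 and x[i] != x[k]:
            k = b[k - 1]          # fall back to the next shorter border
        if x[i] == x[k]:
            k += 1
        b[i] = k
    return b


def borders(x):
    """Lengths of all non-empty proper borders of x, longest first."""
    if not x:
        return []
    b = border_table(x)
    result = []
    k = b[-1]
    while k > 0:
        result.append(k)
        k = b[k - 1]              # a border of a border is again a border
    return result


def shortest_period(x):
    """Length p of the period of x (prefix x_1..x_p with 1 <= p < n), or None if none exists."""
    n = len(x)
    if n == 0:
        return None
    p = n - border_table(x)[-1]
    return p if p < n else None


motif = [60, 62, 64, 60, 62, 64, 60]   # MIDI pitches, chosen only for illustration
print(shortest_period(motif))  # 3
print(borders(motif))          # [4, 1]
```

A string with no non-empty proper border, such as [1, 2, 3], has no period in the sense above, so the sketch returns None for it.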

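Let $\Sigma$ be an alphabet of integers and $\delta$ an integer. Two symbols $a, b$ of $\Sigma$ are said to be δ-approximate, denoted $a \stackrel{\delta}{=} b$, if and only if

$$|a - b| \le \delta$$

We say that two strings x, y are δ-approximate, denoted $x \stackrel{\delta}{=} y$, if and only if

$$|x| = |y| \quad \text{and} \quad x_i \stackrel{\delta}{=} y_i \quad \forall i \in \{1..|x|\} \qquad (2.1)$$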

Let $\gamma$ be an integer. Two strings x, y are said to be γ-approximate, denoted $x \stackrel{\gamma}{=} y$, if and only if

$$|x| = |y| \quad \text{and} \quad \sum_{i=1}^{|x|} |x_i - y_i| < \gamma \qquad (2.2)$$

Furthermore, we say that two strings x, y are {δ,γ}-approximate, denoted $x \stackrel{\delta,\gamma}{=} y$, if and only if x and y satisfy conditions (2.1) and (2.2).
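Conditions (2.1) and (2.2) translate directly into predicates. This is a minimal sketch assuming strings are Python sequences of integers (e.g. MIDI pitches); the function names are hypothetical. Note that (2.2), as stated, is a strict inequality.

```python
def delta_approx(x, y, delta):
    """Condition (2.1): equal lengths and |x_i - y_i| <= delta at every position."""
    return len(x) == len(y) and all(abs(a - b) <= delta for a, b in zip(x, y))


def gamma_approx(x, y, gamma):
    """Condition (2.2): equal lengths and the sum of |x_i - y_i| strictly below gamma."""
    return len(x) == len(y) and sum(abs(a - b) for a, b in zip(x, y)) < gamma


def delta_gamma_approx(x, y, delta, gamma):
    """{delta,gamma}-approximate: both (2.1) and (2.2) hold."""
    return delta_approx(x, y, delta) and gamma_approx(x, y, gamma)


c_major = [60, 64, 65, 67]   # C E F G
c_minor = [60, 63, 65, 67]   # C E-flat F G
print(delta_approx(c_major, c_minor, 1))           # True: every pitch differs by at most 1
print(delta_gamma_approx(c_major, c_minor, 1, 2))  # True: total difference 1 < 2
print(delta_gamma_approx(c_major, c_minor, 1, 1))  # False: 1 is not < 1
```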

3 δ-Approximate Pattern Matching

The problem of δ-approximate pattern matching is formally defined as follows: given a string $t = t_1 \ldots t_n$ and a pattern $p = p_1 \ldots p_m$, compute all positions j of t such that

$$p \stackrel{\delta}{=} t[j..j+m-1]$$
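As a reference point, the definition can be evaluated directly by comparing p against every window of t. This brute-force sketch is not the paper's algorithm (it takes O(nm) time rather than O(n)); it is only meant for checking faster implementations. Positions are 1-based as in the text, and the function name is hypothetical.

```python
def naive_delta_match(p, t, delta):
    """All 1-based positions j with p =_delta t[j..j+m-1], by direct comparison."""
    m, n = len(p), len(t)
    return [j + 1 for j in range(n - m + 1)
            if all(abs(p[i] - t[j + i]) <= delta for i in range(m))]


print(naive_delta_match([3, 4, 6, 2], [3, 4, 6, 2, 8, 2, 4, 5, 7, 1], 1))  # [1, 7]
```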

The algorithm is based on the O(1)-time computation of the "Delta states" $DState_j$, $j \in \{1..n\}$, by using bit operations under the assumption that $m \le w$, where w is the number of bits in a machine word. The basic steps of the algorithm are as follows:

1. First we compute the "Delta table" DT: we set DT(σ) = r, where σ denotes a symbol occurring in t and $r = r_1 \ldots r_m$ is a binary word with $r_i$ equal to 1 if $|\sigma - p_i| \le \delta$, otherwise $r_i$ is equal to 0, for $i \in \{1..m\}$.
2. Let LeftShift be a bit-wise operation that shifts the bits of a binary word by one position to the left. We define

$$DState_j = (\mathrm{LeftShift}(DState_{j-1})\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[t_j] \qquad (3.1)$$

for $j = 1 \ldots n$ and $DState_0 = 0$; hence this procedure is called "Shift-And". Once we have computed the DT table, we can use it to compute $DState_j$ for $j = 1 \ldots n$, using the recursive formula (3.1).
3. We say that there is a δ-approximate match (or simply δ-match) at position $j - m + 1$ if and only if the m-th bit of $DState_j$ is 1, or equivalently if and only if $DState_j$, viewed as a decimal integer, is greater than or equal to $2^{m-1}$.
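Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes integer symbols and m ≥ 1, builds DT only for the symbols that occur in t, and stores the bit for pattern position i + 1 at bit i (counting from 0), so the m-th bit has value $2^{m-1}$. Python integers are unbounded, so the $m \le w$ assumption is not enforced here.

```python
def shift_and(p, t, delta):
    """Return the 1-based starting positions of delta-approximate matches of p in t."""
    m = len(p)
    # Step 1: Delta table; bit i is set iff |sigma - p_{i+1}| <= delta.
    dt = {}
    for sigma in set(t):
        dt[sigma] = sum(1 << i for i, pi in enumerate(p) if abs(sigma - pi) <= delta)
    high = 1 << (m - 1)          # the m-th bit, i.e. the value 2^(m-1)
    state = 0                    # DState_0 = 0
    matches = []
    for j, c in enumerate(t, start=1):
        # Step 2, formula (3.1); the AND with dt[c] also keeps the state within m bits.
        state = ((state << 1) | 1) & dt[c]
        # Step 3: the m-th bit is set iff p delta-matches t[j-m+1..j].
        if state & high:
            matches.append(j - m + 1)
    return matches


print(shift_and([3, 4, 6, 2], [3, 4, 6, 2, 8, 2, 4, 5, 7, 1], 1))  # [1, 7]
```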

Example. For Σ = {1, ..., 9} let us consider p = 3,4,6,2, t = 3,4,6,2,8,2,4,5,7,1 and δ = 1. In the preprocessing table, DT(σ) denotes the positions where $|\sigma - p_i| \le \delta$. For example, DT[3] = 1011 because $|3 - p_i| \le 1$ for i = 1, 2, 4.

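i   p_i   DT[1] DT[2] DT[3] DT[4] DT[5] DT[6] DT[7] DT[8] DT[9]
4   2     1     1     1     0     0     0     0     0     0
3   6     0     0     0     0     1     1     1     0     0
2   4     0     0     1     1     1     0     0     0     0
1   3     0     1     1     1     0     0     0     0     0

Table 1. The table DT for pattern p = 3,4,6,2 and alphabet Σ = {1,...,9}. Rows are listed from i = 4 down to i = 1, so reading a column from top to bottom gives the binary word DT[σ] (e.g. DT[3] = 1011).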
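Table 1 writes each DT[σ] with $r_m$ on the left and $r_1$ on the right. The short sketch below (an illustration with a hypothetical helper name, not code from the paper) regenerates the columns of Table 1 in that convention using Python's zero-padded binary formatting.

```python
def dt_bitstrings(p, delta, alphabet):
    """Map each symbol to DT[symbol] written as an m-bit string, r_m first and r_1 last."""
    m = len(p)
    table = {}
    for sigma in alphabet:
        word = sum(1 << i for i, pi in enumerate(p) if abs(sigma - pi) <= delta)
        table[sigma] = format(word, f"0{m}b")   # most significant bit (r_m) first
    return table


for sigma, bits in dt_bitstrings([3, 4, 6, 2], 1, range(1, 10)).items():
    print(f"DT[{sigma}] = {bits}")
```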

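The table below evaluates DState using the relation (3.1). For example,

$$DState_4 = (\mathrm{LeftShift}(DState_3)\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[t_4] = (\mathrm{LeftShift}(0100)\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[2] = (1000\ \mathrm{OR}\ 1)\ \mathrm{AND}\ 1001 = 1001\ \mathrm{AND}\ 1001 = 1001$$

which implies that there is a match starting at position 1 of t, since the 4-th bit of $DState_4$ is 1.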

j                              1     2     3     4     5     6     7     8     9     10
t_j                            3     4     6     2     8     2     4     5     7     1
LeftShift(DState_{j-1}) OR 1   0001  0011  0111  1001  0011  0001  0011  0111  1101  1001
DT[t_j]                        1011  0011  0100  1001  0000  1001  0011  0110  0100  1000
DState_j                       0001  0011  0100  1001  0000  0001  0011  0110  0100  1000
[DState_j]_10                  1     3     4     9     0     1     3     6     4     8

Table 2. Computing the DStates and finding the δ-approximate matches.

A δ-approximate match occurs at position $j - m + 1$ of t if $[DState_j]_{10} \ge 2^{m-1}$, where $[DState_j]_{10}$ denotes $DState_j$ as a decimal integer. Therefore, there is one match ending at position 4 of t ({3,4,6,2}) and another one ending at position 10 of t ({4,5,7,1}), since $[DState_4]_{10} = 9$ and $[DState_{10}]_{10} = 8$ are both at least $2^3$.
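The second match can be checked the same way as the first. The derivation below recomputes $DState_{10}$ from the values in Table 2 using (3.1) and then confirms the match directly against definition (2.1); it adds no new data, only the intermediate steps.

$$
\begin{aligned}
DState_{10} &= (\mathrm{LeftShift}(DState_9)\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[t_{10}] \\
            &= (\mathrm{LeftShift}(0100)\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[1] \\
            &= (1000\ \mathrm{OR}\ 1)\ \mathrm{AND}\ 1000 = 1001\ \mathrm{AND}\ 1000 = 1000, \\
[DState_{10}]_{10} &= 8 \ge 2^{m-1} = 2^3 \;\Rightarrow\; \text{match at } j - m + 1 = 10 - 4 + 1 = 7, \\
t[7..10] &= 4,5,7,1: \quad |4-3| = |5-4| = |7-6| = |1-2| = 1 \le \delta.
\end{aligned}
$$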

3.1 Pseudo-code

Fig. 1 gives a complete specification of the algorithm. In line 3 we have the preprocessing phase, which computes the DT table. In line 6 we use the recursive formula (3.1) to compute the DStates. Finally, in line 7 we apply the matching criterion to see whether there is a δ-approximate match or not.

1. procedure Shift-And(p, t, δ)   { n = |t|; m = |p| }
2. begin
3.    DT_i[σ] ← 1 if |σ − p_i| ≤ δ, 0 otherwise,   ∀i ∈ {1..m}, ∀σ ∈ Σ
4.    DState_0 ← 0
5.    for j ← 1 to n do
6.       DState_j ← (LeftShift(DState_{j−1}) OR 1) AND DT[t_j]
7.       if DState_j ≥ 2^{m−1} then write j − m + 1
8.    od
9. end

Fig. 1. The Shift-And Procedure.

Assuming that the pattern length is no longer than the memory word size of the machine (thus O(1) size), the time complexity of the preprocessing phase is O(n) (since we need to evaluate DT only for the symbols that occur in t) and the time complexity of the searching phase is O(n). Figure 2 shows the timing¹ for different text sizes.

[Figure 2: two panels plotting time (in secs.) against text size (k, up to 1000), one for pattern size 15, δ = 2 and one for pattern size 20, δ = 2.]

Fig. 2. Timing curves for the Shift-And Procedure.

4 {δ,γ}-Approximate Pattern Matching

The problem of {δ,γ}-approximate pattern matching is formally defined as follows: given a string $t = t_1 \ldots t_n$ and a pattern $p = p_1 \ldots p_m$, compute all positions j of t such that

$$p \stackrel{\delta,\gamma}{=} t[j..j+m-1]$$

In order to solve this problem we first make use of the Shift-And algorithm to find the δ-approximate matches of the pattern p in t. Once we find a δ-approximate match, we want to know whether it is also a γ-approximate match. To do so, we seek to compute successive "Delta States" $DState_j$ and "Gamma States" $GState_j$ in O(1) time using bit operations under the assumption that $m \le w$, where w is the number of bits in a machine word. The main steps of the algorithm are as follows:

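1. We need to compute the "Delta Table" DT as we did before and the "Gamma Table" GT; we set GT(σ) = r, where σ denotes a symbol in the alphabet and $r = r_1 \ldots r_m$ is a word with $r_i$ equal to $|\sigma - p_i|$ if $|\sigma - p_i| \le \delta$, otherwise $r_i$ is equal to 0, for $i \in \{1..m\}$. Each $r_i$, $i \in \{1..m\}$, is stored as a binary number of d bits, where $d = \lceil \log_2(\delta m + 1) \rceil$ is the number of bits needed to hold any value up to δm, the largest sum a block can accumulate.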

¹ Using a SUN Ultra Enterprise 300MHz running Solaris Unix.


2. LeftShift shifts the bits of a binary word by one position to the left and RightShift shifts the bits of a binary word d positions to the right. Once we have computed the DT and GT tables, we can use them to compute $DState_j$ and $GState_j$ for $j = 1 \ldots n$, using the recursive formulas

$$DState_j = (\mathrm{LeftShift}(DState_{j-1})\ \mathrm{OR}\ 1)\ \mathrm{AND}\ DT[t_j] \qquad (4.1)$$

$$GState_j = \mathrm{RightShift}(GState_{j-1}, d) + GT[t_j] \qquad (4.2)$$

We also need to define the seeds $DState_0 = 0$ and $GState_0 = 0$. We call this procedure "Shift-Plus" because we use the "shift" and "plus" operators to compute each new state.
3. We say that there is a match ({δ,γ}-approximate match) at position $j - m + 1$ if and only if the m-th bit of $DState_j$ is 1 and the m-th block of d bits of $GState_j$, taken as an integer, is less than γ, in accordance with (2.2).
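The Shift-Plus steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: block i (for pattern position i + 1, counting from 0) occupies bits d·i to d·i + d − 1, matching the bit ranges used in Fig. 3; d is taken as the bit length of δm so that no block can carry into its neighbour; and acceptance uses the strict inequality of (2.2). Under this layout, moving block i to block i + 1, which the paper writes as RightShift(GState, d) in its least-significant-bit-first display, is a numeric left shift by d (`<< d`).

```python
def shift_plus(p, t, delta, gamma):
    """Return 1-based starting positions j with p ={delta,gamma} t[j..j+m-1] (assumes m >= 1)."""
    m = len(p)
    d = max(1, (delta * m).bit_length())   # bits per block: holds any value up to delta*m
    dt, gt = {}, {}
    for sigma in set(t):                    # tables only for symbols that occur in t
        dword = gword = 0
        for i, pi in enumerate(p):          # i = 0..m-1 stands for pattern position i+1
            diff = abs(sigma - pi)
            if diff <= delta:
                dword |= 1 << i             # DT: bit i
                gword |= diff << (d * i)    # GT: block i holds |sigma - p_{i+1}|
        dt[sigma], gt[sigma] = dword, gword
    high = 1 << (m - 1)                     # m-th bit of DState
    block_mask = (1 << d) - 1
    word_mask = (1 << (d * m)) - 1          # keep GState within m blocks
    dstate = gstate = 0                     # seeds DState_0 = GState_0 = 0
    matches = []
    for j, c in enumerate(t, start=1):
        dstate = ((dstate << 1) | 1) & dt[c]             # formula (4.1)
        gstate = ((gstate << d) & word_mask) + gt[c]     # formula (4.2): block i-1 moves to block i
        # m-th block; equals sum |p_i - t_{j-m+i}| whenever the m-th bit of DState_j is set.
        total = (gstate >> (d * (m - 1))) & block_mask
        if (dstate & high) and total < gamma:
            matches.append(j - m + 1)
    return matches


text = [3, 4, 6, 2, 8, 2, 4, 5, 7, 1]
print(shift_plus([3, 4, 6, 2], text, 1, 3))  # [1]
print(shift_plus([3, 4, 6, 2], text, 1, 5))  # [1, 7]
```

Since each difference stored in GT is at most δ, block i never exceeds δ(i + 1) ≤ δm, so with this choice of d the additions never carry across block boundaries.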

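Example. For our example let Σ = {1,...,9}, the pattern p = 3,4,6,2, the text t = 3,4,6,2,8,2,4,5,7,1, δ = 1 and γ = 3. We will use blocks of size 3 (d = 3) to store the $|\sigma - p_i|$ values where $|\sigma - p_i| \le \delta$. For example, GT[3] = 000 100 000 100 because $|3 - p_i| \le 1$ for i = 1, 2, 4 and the differences are 0, 1, 1 respectively (see the left-hand table of Table 3). In this example, words are written with block i = 1 on the left and each block with its least significant bit first.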

Left-hand table (GT; each entry is the value of block i of GT[σ], stored in d = 3 bits and shown here in decimal):

i   p_i   σ=1  2  3  4  5  6  7  8  9
1   3     0    1  0  1  0  0  0  0  0
2   4     0    0  1  0  1  0  0  0  0
3   6     0    0  0  0  1  0  1  0  0
4   2     1    0  1  0  0  0  0  0  0

Right-hand table (each entry is the value of block i of GState_j, shown in decimal):

j            1  2  3  4  5  6  7  8  9  10
t_j          3  4  6  2  8  2  4  5  7  1
i=1 (p_1=3)  0  1  0  1  0  1  1  0  0  0
i=2 (p_2=4)  1  0  1  0  1  0  1  2  0  0
i=3 (p_3=6)  0  1  0  1  0  1  0  2  3  0
i=4 (p_4=2)  1  0  1  0  1  0  1  0  2  4

Table 3. The left-hand table is the "Gamma Table" GT and the right-hand table is the table for finding {δ,γ}-approximate matches.

The right-hand table above shows the computation of the GStates using (4.2); the DStates are the same as in Table 2, since p, t and δ are unchanged. For example,

$$GState_9 = \mathrm{RightShift}(000010010000, 3) + 000000100000 = 000000010010 + 000000100000 = 000000110010$$

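We already know that there are two δ-approximate matches ending at positions 4 and 10 of t. Now we can use the last three bits of $GState_4$ and $GState_{10}$ (the block for i = m = 4) to find out the sums $\sum_{i} |p_i - t_{j-m+i}|$, which are 0 and 4 respectively (see the right-hand table of Table 3). Hence, with γ = 3, only the occurrence starting at position 1 is a {δ,γ}-approximate match.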
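The value 4 read from $GState_{10}$ can be traced back through (4.2): each step moves the partial sum up one block and adds the next difference. Writing $G_j[i]$ for block i of $GState_j$ and $GT_i[\sigma]$ for block i of GT[σ] (notation introduced only for this derivation), and using the right-hand table of Table 3:

$$
\begin{aligned}
G_{10}[4] &= G_9[3] + GT_4[t_{10}] = 3 + |1-2| = 4,\\
G_9[3] &= G_8[2] + GT_3[t_9] = 2 + |7-6| = 3,\\
G_8[2] &= G_7[1] + GT_2[t_8] = 1 + |5-4| = 2,\\
G_7[1] &= GT_1[t_7] = |4-3| = 1,
\end{aligned}
$$

so $G_{10}[4] = \sum_{i=1}^{4} |p_i - t_{6+i}| = 4$, which is not less than γ = 3.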

Fig. 3 below gives a complete description of the algorithm. Lines 3 and 4 are the preprocessing phase, which computes the DT table and the GT table respectively. In lines 8 and 9 we compute the next DState and GState respectively. Finally, in line 10 we apply the matching criterion to see whether there is a match or not.

1. procedure Shift-Plus(p, t, δ, γ)   { n = |t|; m = |p| }
2. begin
3.    DT_i[σ] ← 1 if |σ − p_i| ≤ δ, 0 otherwise,   ∀i ∈ {1..m}, ∀σ ∈ Σ
4.    GT^{di−d..di−1}[σ] ← |σ − p_i| if DT_i[σ] = 1, 0 otherwise,   ∀i ∈ {1..m}, ∀σ ∈ Σ
5.    DState_0 ← 0
6.    GState_0 ← 0
7.    for j ← 1 to n do
8.       DState_j ← (LeftShift(DState_{j−1}) OR 1) AND DT[t_j]
9.       GState_j ← RightShift(GState_{j−1}, d) + GT[t_j]
10.      if DState_j ≥ 2^{m−1} AND GState_j^{dm−d..dm−1} < γ then write j − m + 1
11.   od
12. end

Fig. 3. The Shift-Plus Algorithm.

4.2 Running time

Assuming that $\delta m \le 2^d - 1$, the time complexity of the preprocessing phase is $O(\delta m + |\Sigma|)$ and the time complexity of the searching phase is O(n), thus independent of the alphabet size and the pattern length. Figure 4 shows the timing for different text sizes.
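As a quick check of the assumption $\delta m \le 2^d - 1$ on the example of this section (δ = 1, m = 4), the smallest admissible block width is:

$$\delta m = 1 \cdot 4 = 4, \qquad 2^2 - 1 = 3 < 4 \le 7 = 2^3 - 1 \;\Rightarrow\; d = 3 = \lceil \log_2(\delta m + 1) \rceil,$$

which is the block size used with Table 3; GState then occupies $dm = 12$ bits, the word length of the $GState_9$ computation shown there.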

[Figure 4: two panels plotting time (in secs.) against text size (k, up to 100), one for pattern size 20, δ = 2, γ = 20 and one for pattern size 15, δ = 2, γ = 20.]

Fig. 4. Timing curves for the Shift-Plus Algorithm.
