Spectral similarity metrics for sound source formation based on the common variation cue


HAL Id: hal-01132571

https://hal.archives-ouvertes.fr/hal-01132571

Submitted on 10 Jan 2019


Mathieu Lagrange, Martin Raspaud

To cite this version:

Mathieu Lagrange, Martin Raspaud. Spectral similarity metrics for sound source formation based on the common variation cue. Multimedia Tools and Applications, Springer Verlag, 2010, pp.185-205. ⟨hal-01132571⟩

Spectral Similarity Metrics For Sound Source Formation Based on the Common Variation Cue

Mathieu Lagrange · Martin Raspaud

Received: date / Accepted: date

Abstract Scene analysis is a relevant way of gathering information about the structure of an audio stream. For content extraction purposes, it also provides prior knowledge that can be taken into account in order to provide more robust results for standard classification approaches.

In order to perform such scene analysis, we believe that the notion of temporality is important. Consequently, we study in this paper a new way of modeling the evolution over time of the frequency and amplitude parameters of spectral components. We evaluate its benefits by considering its ability to automatically gather the components of the same sound source. The evaluation of the proposed metric shows that it achieves good performance and takes better account of micro-modulations.

Keywords auditory scene analysis, mid-level representation, clustering, common variation cue

M. Lagrange
Telecom ParisTech, 46 rue Barrault, 75634 Paris Cedex 13, France
Tel.: +33 (0)1 45 81 73 24, Fax: +33 (0)1 45 81 71 44
E-mail: lagrange@telecom-paristech.fr

M. Raspaud
Linköping University, Bredgatan 33, SE-601 74 Norrköping, Sweden

1 Introduction

Extracting content from polyphonic audio such as musical streams appears to be bounded to moderate performance if the stream is considered 'blindly', i.e. processed without any prior knowledge of the structure of the stream [2]. As scene analysis is a relevant way of gathering information about the structure of an audio stream, performing such an operation prior to extracting content is a way to address this issue.

On the high end, one can consider a mid-level representation of the polyphony [13,5] describing polyphonic sounds as a set of coherent spectral regions, where each set can be considered as monophonic. In this case, one can focus the content extraction process on a given element of the scene [28]. On the lower end, one can consider some time segmentation of the audio stream where sections that have similar properties are identified and/or clustered. Based on this representation, the temporal priors are considered to integrate the indexing decision made at each analysis frame to obtain more robust classification results [21].

In order to extract such a representation or segmentation, many cues can be considered [6]. Timbre is one of them. The description of the timbre of monophonic sounds has been widely studied [31] and many descriptors have been proposed [18]. These descriptors or features are mainly based on the temporal or spectral observations of the sounds, since "Timbre depends primarily upon the spectrum of the stimulus, but it also depends on the waveform, the sound pressure, the frequency location of the spectrum, and the temporal characteristics of the stimulus", as stated in the ANSI definition of timbre [19]. Unfortunately, most of these descriptors cannot be directly extracted from polyphonic recordings.

If the sounds produced by the instruments can be considered as pseudo-periodic, a monophonic or polyphonic signal may be decomposed into sinusoidal components with parameters that evolve slowly with time, the partials. This restriction is not too strong since most classical instruments fit in this category, from strings to brass instruments. In this case, several criteria or psychoacoustical 'cues' proposed in the Auditory Scene Analysis (ASA) literature [6] may then be considered for an automatic evaluation of the timbre of each sound source [14]. In particular, it is shown in the work of McAdams [32] that the correlated evolution of the parameters of the partials of a given musical or vocal tone is an important cue for the perception of timbre.

Consequently, in order to ensure the relevance of the approach proposed in this paper, the analysed signals have to be pseudo-periodic in order to be suitable for the sinusoidal model that is the front-end of our method. The signals can be inharmonic. In fact, that is the main motivation for using the common variation cue to complement the harmonicity one. The signals should preferably be monophonic, but in the case of weak polyphony, i.e. no unison, some partials do not overlap and can be assigned to only one of the two different sources active at the same time.

The common variation cue has been used for source separation [9,12,46], i.e. to determine which partials have been produced simultaneously by the same Producing Sound System (PSS) and therefore automatically extract a high-level description of polyphonic sound. This cue is also a musical parameter that describes timbre and therefore also has potential for Musical Information Retrieval (MIR) applications such as musical instrument and instrument class identification, and instrumentalist or speaker recognition.

These applications both rely on the definition of a metric to evaluate how dissimilar two partials are, according to the common variation of their parameters. We will show in this paper that considering the spectrum of these variations allows us to propose a robust dissimilarity metric. The paper is organized as follows: after a presentation of the sinusoidal model in Section 2, existing metrics proposed in the literature are reviewed in Section 3 and the requisites of a relevant metric are also detailed.

The proposed metric is next introduced in Section 4. Motivated by the properties of the evolutions of the frequencies of the partials, a first metric is proposed. We next show that this metric can also be successfully used while considering the evolutions of [...] the evaluation methodology presented in Section 5, where the database and the criteria that evaluate the ability of the tested metric to discriminate partials produced from different PSS are described. The results of this evaluation are presented in Section 6.

The timbral discrimination capabilities of the proposed metric, i.e. its ability to differentiate partials produced not only by different PSS but also by different instruments or different classes of instruments, are studied in Section 7, and some potential applications are described in Section 8.

2 High-Level Representation of Polyphonic Sounds

Most of the descriptors used in MIR applications consider temporal features such as the mean zero-crossing rate or spectral ones such as Mel-Frequency Cepstrum Coefficients (MFCC), see the work of P. Herrera et al. [18] for a deeper review. These descriptors are generally extracted on a frame basis and the frames are usually considered independently, losing most of the temporal information.

For various applications, one needs a representation of polyphonic sounds where the timbral information as well as its evolution with respect to time can be considered for each sound source. In this section, we discuss the fact that the well-known sinusoidal model can be a basis for such a representation.

2.1 Sinusoidal Model

The sinusoidal model represents pseudo-periodic sounds as sums of sinusoids, the so-called partials, controlled by parameters that evolve slowly with time [33,43]. More formally put, the audio signal $s$ can be calculated from the controlling parameters using Equations 1 and 2, where $N$ is the number of partials and the functions $f_p$, $a_p$, and $\phi_p$ are the instantaneous frequency, amplitude, and phase of the $p$-th partial, respectively. The $N$ pairs $(f_p, a_p)$ are the parameters of the additive model and represent points in the frequency-amplitude plane at time $t$.

\[ s(t) = \sum_{p=1}^{N} a_p(t) \cos(\phi_p(t)) \tag{1} \]

\[ \phi_p(t) = \phi_p(0) + 2\pi \int_0^t f_p(u)\, du \tag{2} \]
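As an illustration of Equations 1 and 2, the following is a minimal sketch of additive synthesis in discrete time. It assumes NumPy, a uniform sample rate, and a rectangle-rule approximation of the integral in Equation 2; the function name `synthesize` and its arguments are hypothetical and are not part of the original method.

```python
import numpy as np


def synthesize(freqs, amps, phases0, sample_rate):
    """Additive synthesis of N partials (sketch of Equations 1 and 2).

    freqs, amps: arrays of shape (N, T) holding the instantaneous
        frequency (Hz) and amplitude of each partial at each sample.
    phases0: array of shape (N,) holding the initial phases (radians).
    sample_rate: sampling rate in Hz.
    """
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    phases0 = np.asarray(phases0, dtype=float)
    # Equation 2: the integral of f_p from 0 to t is approximated by an
    # exclusive cumulative sum, so the phase at sample 0 equals phases0.
    integral = (np.cumsum(freqs, axis=1) - freqs) / sample_rate
    phases = phases0[:, None] + 2.0 * np.pi * integral
    # Equation 1: sum of the amplitude-weighted cosines over the partials.
    return np.sum(amps * np.cos(phases), axis=0)
```

With constant frequency and amplitude tracks this reduces to a sum of stationary sinusoids; slowly varying tracks give the pseudo-periodic signals the model targets.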

This can also be written from the set point of view:

\[ P_k(m) = \{F_k(m), A_k(m), \Phi_k(m)\} \tag{3} \]

where $F_k(m)$, $A_k(m)$, and $\Phi_k(m)$ are respectively the frequency, amplitude, and phase of the partial $P_k$ at time index $m$. These parameters are valid for all $m \in [b_k, \cdots, b_k + l_k - 1]$, where $b_k$ and $l_k$ are respectively the starting index and the length of the partial.

On a frame basis, the instantaneous frequency, amplitude, and phase of each [...] go beyond the resolution limitation of the Fourier transform, one can also consider parametric methods like the ESPRIT algorithm [29,4] or maximum likelihood ones, like the matching pursuit [8,10]. Those estimates can be complemented with the estimation of the slope of the frequency and amplitude [1,42], which could be considered at the tracking phase to obtain a more precise modeling of the long-term evolution of the frequency and amplitude parameters through time.

The partials can be extracted from the parameters estimated on a frame basis using partial tracking algorithms [33,43,44,27,40,35]. Polyphonic sounds can be considered with dedicated tracking algorithms [11,26]. However, in order to avoid problems due to strong polyphony [13], we only consider in this paper mixtures of entities extracted from monophonic signals.

2.2 Acoustical Entities

These sinusoidal components are called partials because they are only a part of a more perceptively coherent entity that may be called an acoustical entity. This can be written as:

\[ S = \bigcup_{n=1}^{N} E_n \tag{4} \]

with $S$ being the mid-level representation of the sound, $E$ being an acoustical entity and $N$ the total number of entities in the sound. Hence each entity is made of a group of partials:

\[ E_n = \bigcup_{k=1}^{M_n} P_k^n \tag{5} \]

where $M_n$ is the total number of partials $P_k^n$ in the entity.

To extract these entities from a sinusoidal representation of a sound, similarities between partials should be considered in order to gather the ones belonging to the same acoustical entity. From the perceptual point of view, some partials belong to the same entity if they are perceived by the human auditory system as a unique sound. There are several cues that lead to this perceptual fusion: the common onset, the harmonic relation of the frequencies, the correlated evolutions of the parameters and the spatial location [6].

The earliest attempts at acoustical entity identification and separation consider harmonicity as the sole cue for group formation. Some rely on a prior detection of the fundamental frequency [17,15] and others consider only the harmonic relation of the frequencies of the partials [23,46,41]. Yet, many musical instruments are not perfectly harmonic.

In contrast, the cue that considers the correlated evolutions of the parameters of the partials is generic. Also, numerous psychoacoustical studies showed that the variations or the micro-modulations are important for perception. Bregman writes: Small fluctuations in frequency occur naturally in the human voice and in musical instruments. The fluctuations are not often very large, ranging from less than 1 percent for a clarinet tone to about 1 percent for a voice trying to hold a steady pitch, with larger excursions of as much as 20 percent for the vibrato of the singer. Even the smaller [...] partials is perceived as a unique acoustical entity only if these variations are correlated. Therefore, the correlated evolution of the parameters of the partials is a generic cue, since it can be observed with any vibrating instrument. As an example, see Figure 1.

[Figure 1: time-frequency plot (axes: Time, Frequency) with the partials labeled A, B, C, and D.]

Fig. 1 Representation of two fictive sounds in the time-frequency domain. Partials A, B, and C (clearly correlated in modulation and in starting and ending times, that is, common variation) represent the sinusoidal components of the first sound, while D and E represent the sinusoidal components of the second sound.

In order to define a dissimilarity metric that considers the common variation cue, we will study in the next section the physical properties of the evolutions of the frequency and amplitude parameters of the partials.

3 The Common Variation Cue

In order to define a dissimilarity metric that considers the common variation cue, we have to study the physical properties of the evolutions of the frequency and amplitude parameters of the partials.

Let us consider a harmonic tone modulated by a vibrato of given depth and rate. All the harmonics are modulated at the same rate and phase but their respective depth is scaled by a factor equal to their harmonic rank (see Figure 2(a)). It is then important to consider a metric which is scale-invariant.

Cooke uses a distance [9] equivalent to the cosine dissimilarity $d_c$, also known as intercorrelation:

\[ d_c(X_1, X_2) = 1 - \frac{c(X_1, X_2)}{\sqrt{c(X_1, X_1)}\,\sqrt{c(X_2, X_2)}} \tag{6} \]

\[ c(X_1, X_2) = \sum_{i=1}^{N} X_1(i)\, X_2(i) \tag{7} \]

where $X_1$ and $X_2$ are real vectors of size $N$. This dissimilarity is scale-invariant.

T. Virtanen et al. proposed (in [46]) to use the mean-squared error between the vectors first normalized by their average values:

\[ d_v(X_1, X_2) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_1(i)}{\bar{X}_1} - \frac{X_2(i)}{\bar{X}_2} \right)^2 \tag{8} \]

where $X_1$ and $X_2$ are vectors of size $N$ and $\bar{X}$ [...] normalization [...] the mean frequency of a given harmonic and the one of the fundamental is equal to its harmonic rank.
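The two dissimilarities reviewed above can be sketched in a few lines to check their scale invariance on a vibrato whose depth grows with the harmonic rank. This is only an illustrative sketch assuming NumPy; the function names, the 440 Hz fundamental, the 5 Hz vibrato rate, and the frame rate are hypothetical values chosen for the example. Vectors that are entirely zero, or that have a zero mean in the second function, are not handled.

```python
import numpy as np


def cosine_dissimilarity(x1, x2):
    """Cosine dissimilarity d_c (sketch of Equations 6 and 7)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    c12 = np.dot(x1, x2)
    return 1.0 - c12 / (np.sqrt(np.dot(x1, x1)) * np.sqrt(np.dot(x2, x2)))


def mean_normalized_mse(x1, x2):
    """Mean-squared error between mean-normalized vectors (sketch of Equation 8)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return float(np.mean((x1 / x1.mean() - x2 / x2.mean()) ** 2))


# Hypothetical harmonic tone: the vibrato depth is scaled by the harmonic rank.
t = np.arange(400) / 86.0                      # frame times in seconds (assumed frame rate)
modulation = 5.0 * np.sin(2.0 * np.pi * 5.0 * t)
h1 = 1 * (440.0 + modulation)                  # fundamental
h3 = 3 * (440.0 + modulation)                  # third harmonic

# d_c is applied to the mean-centered tracks, d_v to the raw tracks.
print(cosine_dissimilarity(h1 - h1.mean(), h3 - h3.mean()))
print(mean_normalized_mse(h1, h3))
```

Both values are numerically zero here because the two tracks differ only by a scale factor, which is the invariance the text asks for.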

It is proposed in [24] to consider the Auto-Regressive (AR) model as a scale-invariant metric that considers only the predictable part of the evolutions of the parameters:

\[ X_l(n) \approx \sum_{i=1}^{n} k_l(i)\, X_l(n-i) \tag{9} \]

where the $k_l(i)$ are the AR coefficients. Since the direct comparison of the AR coefficients computed from the two vectors $X_1$ and $X_2$ is not relevant, the spectrum of these coefficients is compared as proposed by Itakura [20]:

\[ d_{AR}(X_1, X_2) = \log \int_{-\pi}^{\pi} \frac{|K_1(\omega)|}{|K_2(\omega)|}\, \frac{d\omega}{2\pi} \tag{10} \]

where

\[ K_l(\omega) = 1 + \sum_{i=1}^{n} k_l(i)\, e^{-ji\omega} \tag{11} \]
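Given two sets of AR coefficients, the spectral comparison above can be approximated numerically. The sketch below assumes NumPy and takes the coefficients as already estimated (the estimation step itself is not shown); the integral over one period is replaced by an average over a uniform frequency grid. The names `ar_spectrum`, `ar_distance`, and `n_freq` are hypothetical, and the sketch does not guard against a spectrum that vanishes on the grid.

```python
import numpy as np


def ar_spectrum(coeffs, n_freq=512):
    """Sample K(w) = 1 + sum_i k(i) exp(-j i w) on n_freq uniform frequencies."""
    poly = np.concatenate(([1.0], np.asarray(coeffs, dtype=float)))
    # The zero-padded DFT of [1, k(1), ..., k(n)] evaluates K at w = 2*pi*m/n_freq.
    return np.fft.fft(poly, n=n_freq)


def ar_distance(k1, k2, n_freq=512):
    """Log of the average spectral magnitude ratio (sketch of Equation 10)."""
    ratio = np.abs(ar_spectrum(k1, n_freq)) / np.abs(ar_spectrum(k2, n_freq))
    # The integrand is 2*pi-periodic, so the mean over a uniform grid on
    # [0, 2*pi) approximates the integral over [-pi, pi] divided by 2*pi.
    return float(np.log(np.mean(ratio)))
```

Note that this quantity is not symmetric in its two arguments; it is zero when both coefficient sets are identical.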

When considering the amplitudes of the partials, a scale-invariant metric is also important. In this context, the normalization proposed by T. Virtanen is no longer motivated since the relative amplitudes of the harmonics depend on the envelope of the sound. For example, on Figure 2(b), the topmost curve (with small modulations) represents the amplitudes of the fundamental partial, while the second curve from the top, with broad oscillations, represents the first harmonic.

Moreover, the envelope is globally decreasing as the frequency grows, but the envelope may also be locally ascending due to its specific shape around formants. Therefore, when the frequency of a partial is modulated, the amplitude may be modulated with a phase shift, see the bottom curve of Figure 2(b). Consequently, a metric that is phase-invariant should be considered.

The amplitude evolution of a partial is composed of a temporal envelope and some periodic modulations. Since the envelope of the amplitude can be very different from one partial to another within the same entity, it may be useful to consider only the periodic modulations while computing their similarities. The metric introduced in the next section will cope with these issues.

4 Proposed Metric

We propose to go beyond the temporal domain by taking the parameters to the spectral domain. There was already an attempt at this, using AR models (see Equation 10). Since the Fourier transform is based on the assumption that the input signal is periodic, using a spectrum of the evolution of the partials might show common periodicities of the partials. This will be handy for the modulations of the partials created by vibrato and tremolo, since we can approximate these modulations by sinusoidal ones over a short period of time (see [30]). It can also be interesting for micro-modulations such as the ones produced by vibrating strings such as the strings of a piano (see Figure 3). Hence, [...]

[Figure 2: (a) Frequencies: centered frequency (Hz) versus time (frames); (b) Amplitudes: amplitude versus time (frames).]

Fig. 2 Mean-centered frequencies and amplitudes of some partials of a saxophone tone with vibrato.

4.1 Using the Frequencies of the Partials

The first step in the calculation of our new metric is to correlate the evolutions of the frequencies of the partials. As we said before, a good description of these evolutions is given by their spectra.

The way to compute the spectrum of the frequency evolution of a partial is to take off the mean value of this frequency and then compute the Fourier transform of the resulting signal. Indeed, in order to have a clean spectrum relevant to the evolutions, it is necessary to have the evolutions centered around zero.

[Figure 3: top panel, harmonic index versus time (frames); bottom panel, harmonic index versus frequency (Hz).]

Fig. 3 Centered frequencies (top) of a piano note and their corresponding spectra (bottom). Each curve is shifted for clarity's sake.

[Figure 4: amplitude versus time (frames); curves: Partial, Polynomial.]

Fig. 4 Amplitudes of a partial of a Bb clarinet and its polynomial envelope estimation.

Then, we apply the previously exposed process to the frequencies of all the partials for which we want to measure evolution correlation. Once we have these frequencies expressed in terms of spectra, the way to compute the distance between two partial signals is to intercorrelate their spectra (see Equation 6). This gives

\[ d_s(f_1, f_2) = d_c(|F_1|, |F_2|) \tag{12} \]

where $f_1$ and $f_2$ are the frequency vectors of two partials $P_1$ and $P_2$ and $F_k$ is the Fourier spectrum of $f_k$. Thanks to the absolute value applied to the spectra, this distance is phase-invariant.
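The spectral distance between two frequency tracks described in this section (mean removal, Fourier magnitude, cosine dissimilarity) can be sketched as follows. The sketch assumes NumPy and tracks of equal length; the function names and the synthetic vibrato tracks are hypothetical and only serve to illustrate the scale and phase invariance.

```python
import numpy as np


def cosine_dissimilarity(x1, x2):
    """Cosine dissimilarity between two real vectors (Equation 6)."""
    return 1.0 - np.dot(x1, x2) / (np.sqrt(np.dot(x1, x1)) * np.sqrt(np.dot(x2, x2)))


def spectral_distance(f1, f2):
    """Cosine dissimilarity between the magnitude spectra of centered tracks."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    # Center each track around zero, then keep only the spectral magnitudes,
    # which discards the phase of the modulation.
    mag1 = np.abs(np.fft.rfft(f1 - f1.mean()))
    mag2 = np.abs(np.fft.rfft(f2 - f2.mean()))
    return cosine_dissimilarity(mag1, mag2)


# Hypothetical tracks: same vibrato rate with different depth and phase,
# then a track with a different vibrato rate.
t = np.arange(256) / 256.0
f1 = 440.0 + 5.0 * np.sin(2.0 * np.pi * 8.0 * t)
f2 = 1320.0 + 15.0 * np.sin(2.0 * np.pi * 8.0 * t + 1.0)
f3 = 440.0 + 5.0 * np.sin(2.0 * np.pi * 20.0 * t)

print(spectral_distance(f1, f2))  # same modulation rate: close to 0
print(spectral_distance(f1, f3))  # different modulation rate: close to 1
```

The first pair differs in mean frequency, depth, and phase yet yields a distance near zero, whereas a plain time-domain correlation would be affected by the phase shift.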

4.2 Using the Amplitudes of the Partials

In the case of the amplitudes of the partials, the problem is slightly more complicated. Indeed, in order to center the oscillating part of the signal around zero, subtracting the [...] behind this polynomial subtraction is that the envelope of a sound (seen as attack, decay, sustain and release) can be roughly approximated by a 9th degree polynomial. An example of such a subtraction is shown on Figure 5.

[Figure 5: (a) Modulations: amplitude versus time (frames); (b) Corresponding Spectra: amplitude versus frequency.]

Fig. 5 Amplitudes of three partials of a Bb clarinet when the polynomial envelope is removed (a), and their corresponding spectra (b). The curves have been shifted for clarity's sake.

This gives us the distance $d_{sp}$:

\[ d_{sp}(a_1, a_2) = d_c(|\tilde{A}_1|, |\tilde{A}_2|) \tag{13} \]

where $\tilde{A}_k$ is the Fourier spectrum of $\tilde{a}_k$, with $\tilde{a}_k = a_k - \Pi(a_k)$, where $a_1$ and $a_2$ are the amplitudes of two partials and $\Pi(x)$ is the envelope polynomial computed from signal $x$ using a simple least-squares method [34].
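A possible sketch of the amplitude distance is given below: a 9th degree polynomial envelope is fitted by least squares and removed before taking the spectral magnitudes. It assumes NumPy and uses `numpy.polynomial.Polynomial.fit` as the least-squares routine, which is our choice and not necessarily the one used by the authors; the function names and the synthetic envelopes are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def remove_envelope(a, degree=9):
    """Subtract a least-squares polynomial envelope from an amplitude track."""
    a = np.asarray(a, dtype=float)
    n = np.arange(a.size)
    # Polynomial.fit rescales the abscissa internally, which keeps a
    # 9th degree fit numerically well conditioned.
    envelope = Polynomial.fit(n, a, degree)
    return a - envelope(n)


def spectral_distance_poly(a1, a2, degree=9):
    """Cosine dissimilarity between magnitude spectra of de-enveloped tracks."""
    mag1 = np.abs(np.fft.rfft(remove_envelope(a1, degree)))
    mag2 = np.abs(np.fft.rfft(remove_envelope(a2, degree)))
    return 1.0 - np.dot(mag1, mag2) / (np.linalg.norm(mag1) * np.linalg.norm(mag2))


# Hypothetical amplitude tracks: different envelopes, same tremolo rate (a1, a2),
# and the same envelope as a1 with a different tremolo rate (a3).
t = np.arange(256) / 256.0
a1 = np.exp(-2.0 * t) + 0.05 * np.sin(2.0 * np.pi * 30.0 * t)
a2 = 0.3 * (1.0 - t) ** 2 + 0.02 * np.sin(2.0 * np.pi * 30.0 * t + 0.7)
a3 = np.exp(-2.0 * t) + 0.05 * np.sin(2.0 * np.pi * 45.0 * t)

print(spectral_distance_poly(a1, a2))
print(spectral_distance_poly(a1, a3))
```

The polynomial absorbs the slowly varying envelope, so the distance is driven by the periodic modulation only: it is small for `a1` and `a2` despite their different envelopes and large for `a1` and `a3`.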


4.3 Metric Combination

In order to exploit both the frequency and amplitude parameters, we need a way to combine the measures of amplitude and frequency distances.

T. Virtanen et al. proposed to combine the frequency and amplitude parameter distances by adding the two distance measures while considering a harmonicity factor. In their work [46], each distance is weighted before performing the addition. For comparison purposes, we consider the following distance:

\[ d_{v+v}(P_1, P_2) = \frac{d_v(f_1, f_2) + d_v(a_1, a_2)}{2} \tag{14} \]

where $f_k$ and $a_k$ are respectively the frequencies and amplitudes of partial $P_k$. Since the weights are not supplied and no harmonicity information is available, it is only an approximation of the combination scheme proposed by T. Virtanen.

Since our proposed distances $d_s$ and $d_{sp}$ are normalized, if we want to give the same weight to the two distances, we can combine the frequency and amplitude distances by performing a simple mean. This would then yield:

\[ d_{+}(P_1, P_2) = \frac{d_s(f_1, f_2) + d_{sp}(a_1, a_2)}{2} \tag{15} \]

In order to take into account the best result of either measure, another method is to take the minimum of the two distances:

\[ d_m(P_1, P_2) = \min(d_s(f_1, f_2), d_{sp}(a_1, a_2)) \tag{16} \]

As will be presented in Section 6, better results are achieved when we multiply the amplitude and frequency parameter distances. This combination, although less robust to errors, seems to take better account of the performance of each distance measure independently. In order to keep the metrics on the same scale, a square root is applied to the combination:

\[ d_{\times}(P_1, P_2) = \sqrt{d_s(f_1, f_2)\, d_{sp}(a_1, a_2)} \tag{17} \]
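The three combinations of the proposed distances are simple enough to be written directly. The sketch below assumes that both input distances lie in [0, 1], as is the case for a cosine dissimilarity between magnitude spectra; the function names and the example values 0.04 and 0.36 are hypothetical.

```python
import math


def combine_mean(d_freq, d_amp):
    """Arithmetic mean of the two distances (Equation 15)."""
    return 0.5 * (d_freq + d_amp)


def combine_min(d_freq, d_amp):
    """Minimum of the two distances (Equation 16)."""
    return min(d_freq, d_amp)


def combine_product(d_freq, d_amp):
    """Square root of the product, i.e. the geometric mean (Equation 17)."""
    return math.sqrt(d_freq * d_amp)


# Hypothetical pair of partials: close in frequency evolution, farther in amplitude.
d_freq, d_amp = 0.04, 0.36
print(combine_mean(d_freq, d_amp), combine_min(d_freq, d_amp), combine_product(d_freq, d_amp))
```

The product form is a geometric mean, so it is pulled toward the smaller of the two distances more strongly than the arithmetic mean, and it vanishes as soon as one of the distances is zero.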

5 Evaluation

In this section, we present the methodology used for evaluating the performance of the different metrics reviewed in Section 3 and proposed in Section 4. The evaluation database is first described. Next, several criteria are presented, each one evaluating a specific property of the evaluated metric.

The objective of the evaluation presented in the remainder of the paper is to study whether the proposed similarity metrics are good candidates for implementing a clustering of the partials of the same acoustical entity. In Section 7, we extend this study by considering the statistical properties of one of the proposed metrics while considering [...]

In this study, we focus on a subset of musical instruments that produce pseudo-periodic sounds and model them as a sum of partials (see Section 2). The instruments of the IOWA database [16], whose instrument hierarchy is plotted in Figure 8, globally fit this condition even though some samples have to be removed. The pizzicato tones, i.e. plucked-string tones with a strong attack and a weak resonating phase, as well as the pianissimo tones, i.e. tones with very low amplitude, are discarded.

In order to extract the partials for each tone, each file of the IOWA database is split into a series of audio files, each containing only one tone. The spectral parameters at each frame are estimated using the phase derivative method studied in [25] with the following parameters: the window size is 2048 samples and the hop size is 512 samples, at a sampling rate of 44100 Hz. An implementation of the algorithm proposed by McAulay and Quatieri in [33] is used with a frequency tolerance of 50 Hz. Since we consider only the prominent partials of a given tone, only the extracted partials lasting for at least 2 seconds are retained. For each entity, only the 20 partials with the highest amplitude are retained.
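As a quick check of the time and frequency resolution implied by these analysis parameters (simple arithmetic on the values stated above, not figures reported by the authors):

\[ \frac{44100}{512} \approx 86.1 \text{ frames per second}, \qquad \frac{512}{44100} \approx 11.6 \text{ ms per hop}, \]

\[ \frac{2048}{44100} \approx 46.4 \text{ ms per window}, \qquad \frac{44100}{2048} \approx 21.5 \text{ Hz per DFT bin}, \]

so a partial lasting at least 2 seconds spans roughly $2 \times 86.1 \approx 172$ frames.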

5.2 Methodology

To compare the metrics proposed in Section 4 and those reviewed in Section 3, we use the following methodology to compute the three evaluation criteria. For the two entities of the considered couple, the median values of the starting and ending time indexes of the partials, $t_s$ and $t_e$, are computed. Only the partials existing before $t_s + \epsilon_s$ and after $t_e - \epsilon_e$ are kept (see Figure 6). The values $\epsilon_s$ and $\epsilon_e$ are arbitrarily small constants.

Then, the partials of the two entities are gathered. Only the common part, defined as the time interval where all the partials are active, is considered to evaluate the tested metric. For example, the common part of the partials represented in Figure 6 is between $c_s$ and $c_e$.
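The selection of the partials and of their common part can be sketched as below. This is one possible reading of the procedure, under assumptions that the text does not settle: the medians are computed separately for each entity, a partial is described by its starting index and length in frames, and the tolerances are expressed in frames. All names and the example values are hypothetical.

```python
from statistics import median


def select_partials(partials, eps_start=2, eps_end=2):
    """Keep the partials of one entity that span its median start and end.

    partials: list of (start, length) tuples expressed in frames.
    """
    ends = [b + l - 1 for b, l in partials]
    t_s = median(b for b, _ in partials)
    t_e = median(ends)
    return [(b, l) for (b, l), e in zip(partials, ends)
            if b <= t_s + eps_start and e >= t_e - eps_end]


def common_part(partials):
    """Return (c_s, c_e), the frame interval where all partials are active."""
    c_s = max(b for b, _ in partials)
    c_e = min(b + l - 1 for b, l in partials)
    return (c_s, c_e) if c_s <= c_e else None


# Hypothetical couple of entities; the third partial of the first one starts late.
entity_a = [(0, 100), (1, 100), (30, 60)]
entity_b = [(5, 90), (6, 92)]
kept = select_partials(entity_a) + select_partials(entity_b)
print(kept, common_part(kept))
```

The late partial is discarded, and the remaining four partials are compared only on the frames they all share.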

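[Figure 6: partials plotted as frequency versus time, with the time indexes $t_s$, $c_s$, $c_e$, and $t_e$ marked on the time axis.]

Fig. 6 Selection of the common parts of the partials of the two acoustical entities. A partial start is represented with a black filled dot and its end with a white filled dot. Only the partials existing before and after $t_s$ and $t_e$ are kept, represented with solid lines. The indexes $c_s$ and [...]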

5.3 Performance Criteria

Once the evaluation database and the evaluation methodology are defined, some criteria have to be defined that reflect whether, according to the evaluated metric, two partials are close if they actually belong to the same acoustical entity and far otherwise.

5.3.1 Fisher criterion

A relevant dissimilarity metric between two partials is a metric which is low for partials of the same entity (the class, from the statistical point of view) and high for partials that do not belong to the same entity. The intra-class dissimilarity should then be minimal and the inter-class dissimilarity as high as possible. Let $U$ be the set of elements of cardinal $\#U$ and $C_i$ the entity of index $i$ among $N_c$ different entities. An estimation of the relevance of a given dissimilarity $d(x, y)$ for a given acoustical entity is:

\[ \mathrm{intra}(C_i) = \sum_{j=1}^{n_i} \sum_{k=1}^{n_i} d(C_i(j), C_i(k)) \tag{18} \]

\[ \mathrm{inter}(C_i) = \sum_{j=1}^{n_i} \sum_{l=1}^{\#U - n_i} d(C_i(j), \overline{C_i}(l)) \tag{19} \]

\[ F(C_i) = \frac{\mathrm{inter}(C_i)}{\mathrm{intra}(C_i)} \tag{20} \]

where $n_i$ is the number of partials in $C_i$ and $\overline{C_i} = U \setminus C_i$. The overall quality $F(U)$ is then defined as:

\[ F(U) = \frac{\sum_{i=1}^{N_c} \mathrm{inter}(C_i)}{\sum_{i=1}^{N_c} \mathrm{intra}(C_i)} \tag{21} \]

This last criterion $F(U)$ is loosely based on the Fisher discriminant commonly used in statistical analysis. It provides a first evaluation of the discrimination quality of a given metric. It can however be noticed that this criterion depends on the scale of the studied dissimilarity metric.
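Given a precomputed dissimilarity matrix and the entity index of each partial, the overall criterion can be sketched as follows. The sketch assumes NumPy; the function name and the small example matrix are hypothetical, and the sums are taken over ordered pairs exactly as in Equations 18 and 19, without any normalization by the number of pairs.

```python
import numpy as np


def fisher_criterion(dist, labels):
    """Ratio of summed inter-entity to summed intra-entity dissimilarities.

    dist: square matrix of pairwise dissimilarities between partials.
    labels: entity index of each partial.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    intra_total = 0.0
    inter_total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        # Equation 18: all ordered pairs within the entity.
        intra_total += dist[np.ix_(inside, inside)].sum()
        # Equation 19: pairs between the entity and the rest of the universe.
        inter_total += dist[np.ix_(inside, ~inside)].sum()
    return inter_total / intra_total


# Hypothetical universe of four partials forming two well separated entities.
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.1],
        [0.9, 0.9, 0.1, 0.0]]
print(fisher_criterion(dist, [0, 0, 1, 1]))
```

Because the sums are not normalized, the value depends on the scale of the dissimilarity and on the sizes of the entities, which is consistent with the remark above.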

5.3.2 Density criterion

Dissimilarity-vector based classification involves calculating a dissimilarity metric between pair-wise combinations of elements and grouping together those for which the dissimilarity metric is small according to a given classification algorithm.

The density criterion $D$ intends to evaluate a property of the tested metric that should be fulfilled in order to be relevantly used in combination with common classification algorithms such as hierarchical clustering or K-means. Indeed, many classification algorithms iteratively cluster the partials whose relative distance is the smallest. The density criterion verifies that these two partials actually belong to the same acoustical entity.

More formally, given a set of elements $X$, $\zeta(X)$ is defined as the ratio of couples [...]

\[ \mathrm{cl} : X \to \mathbb{N}, \quad a \mapsto i \]

where $i$ is the index of the class of $a$. We get:

\[ D(X) = \frac{\#\{(a, b) \mid d(a, b) = \min_{c \in X} d(a, c) \wedge \mathrm{cl}(a) = \mathrm{cl}(b)\}}{\#X} \tag{22} \]

where $X$ can be either an acoustical entity $C_i$ or the universe $U$ and $\#x$ denotes the cardinal of $x$.
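The density criterion can be read as the fraction of elements whose nearest neighbor belongs to the same entity. The sketch below follows that reading and assumes NumPy; it excludes each element from its own neighborhood and keeps a single nearest neighbor in case of ties, two choices that the definition above leaves open. The function name and the example matrix are hypothetical.

```python
import numpy as np


def density_criterion(dist, labels):
    """Fraction of elements whose nearest neighbor shares their entity index."""
    dist = np.array(dist, dtype=float)  # copy, since the diagonal is overwritten
    labels = np.asarray(labels)
    # An element is never its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(labels[nearest] == labels))


# Hypothetical universe of four partials forming two entities.
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.1],
        [0.9, 0.9, 0.1, 0.0]]
print(density_criterion(dist, [0, 0, 1, 1]))
```

Unlike the Fisher criterion, this score only depends on the ordering of the dissimilarities, not on their scale.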

5.3.3 Classification criterion

For this criterion, the quality of the tested metric is evaluated by considering the quality of a classification done using the tested metric and a classification algorithm.

We consider an agglomerative hierarchical clustering (AHC) procedure [22]. This algorithm produces a series of partitions of the partials: $(P_n, P_{n-1}, \ldots, P_1)$. The first partition $P_n$ consists of $n$ singletons and the last partition $P_1$ consists of a single class containing all the partials. At each stage, the method joins together the two clusters of partials which are most similar according to the chosen dissimilarity metric. At the first stage, of course, this amounts to joining together the two partials that are closest together, since at the initial stage each cluster has only one partial. At each stage, the dissimilarity between the new cluster and the other ones is computed using the method proposed by Ward [47].

Hierarchical clustering may be represented by a two-dimensional diagram known as a dendrogram, which illustrates the fusions made at each successive stage of clustering, see Figure 7, where the length of the vertical bar that links two classes is calculated according to the distance between the two joined clusters.

The acoustical entities can then be found by cutting the dendrogram at relevant levels. Here, for the classification criterion, the acoustical entities are identified by simply cutting the dendrogram at the highest levels to achieve the desired number of entities. If the desired number of entities is 2, only the highest level is cut (see Figure 7).

The classification criterion $H$ is then defined as the number of partials correctly classified versus the number of partials classified:

\[ H(X) = \frac{\#\{a \mid a \in \hat{C}_i \wedge \mathrm{cl}(a) = i\}}{\#X} \tag{23} \]

where $\hat{C}_i$ is an acoustical entity extracted from the hierarchy.
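Once the dendrogram has been cut, the classification criterion only needs the extracted cluster of each partial and its reference entity. The sketch below covers this scoring step only (the clustering itself is not shown) and relies on an assumption the text does not state: extracted clusters are matched to reference entities by trying every one-to-one assignment and keeping the best one, which is practical only for a small number of entities. The function name and labels are hypothetical, and the number of clusters must not exceed the number of reference entities.

```python
from itertools import permutations


def classification_criterion(true_labels, cluster_labels):
    """Fraction of partials whose cluster maps to their reference entity."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    # Try every one-to-one assignment of clusters to reference entities.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, correct)
    return best / len(true_labels)


# Clusters of the six partials p1..p6 as in the dendrogram example:
# {p1, p6, p2} and {p3, p4, p5}.
clusters = ["A", "A", "B", "B", "B", "A"]
print(classification_criterion([0, 0, 1, 1, 1, 0], clusters))
print(classification_criterion([0, 0, 0, 1, 1, 1], clusters))
```

The first reference labeling agrees with the cut, while in the second, hypothetical, labeling two partials out of six end up in the wrong cluster.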

6 Results

The metrics reviewed in Section 3 and proposed in Section 4 are now compared using the evaluation methodology described in the previous section. The correlation metric $d_c$ of Equation 6 and the metric $d_v$ proposed by T. Virtanen (see Equation 8) require no parameterization. The metric $d_{ar}$ considers AR vectors of 4 coefficients computed with the Burg [...] computation of the metric $d_{sp}$ (see Equation 13) is similar except that a 9th order polynomial is first estimated and removed before the FFT computation. The results are presented as mean values for each criterion, and the bracketed values are the standard deviations (not shown for $F$ since the value is already normalized).

[Figure 7: dendrogram over the partials p1 to p6.]

Fig. 7 Dendrogram representing the hierarchy obtained using the AHC algorithm with 6 partials. The cut at the highest level of the hierarchy, represented by a dot, identifies two acoustical entities $C_1 = \{p_1, p_6, p_2\}$ and $C_2 = \{p_3, p_4, p_5\}$.

        F       D               H
d_c     2.909   0.938 (0.216)   0.929 (0.137)
d_v     1.763   0.929 (0.230)   0.881 (0.172)
d_ar    1.863   0.712 (0.326)   0.757 (0.166)
d_s     3.488   0.944 (0.210)   0.940 (0.130)
d_sp    2.909   0.936 (0.219)   0.931 (0.133)

Table 1 Results of the three criteria (Fisher, density, hierarchical classification) for the five metrics presented in this paper, applied to the frequencies of the partials. The density and hierarchical criteria (last two columns) are presented as scores between 0 and 1. For every criterion, a higher value means better performance.

6.1 Frequency Parameter

The results of the metrics between partials based on the frequency parameter are shown in Table 1. The $d_s$ metric we proposed gives the best results for the three criteria. It should be noted that the correlation metric ($d_c$) also gives good results for the last two criteria. We can also see that removing the polynomial from the frequencies of the partials does not contribute to the quality of the metric, since the frequencies of the partials of the sounds in the IOWA database are quasi-stationary. The performance is even worse because of the modulations that the polynomial might take away from the frequency evolutions.

6.2 Amplitude Parameter

As presented in Table 2, the performance of the metrics for the amplitude parameter is globally worse than that obtained for the frequency parameter, lowering from [...]

        F       D               H
d_c     [...]   [...]           [...]
d_v     1.298   0.784 (0.316)   0.773 (0.159)
d_ar    1.938   0.664 (0.331)   0.733 (0.156)
d_s     1.452   0.778 (0.301)   0.781 (0.163)
d_sp    1.366   0.796 (0.297)   0.803 (0.171)

Table 2 Results of the three criteria (Fisher, density, hierarchical classification) for the five metrics presented in this paper, applied to the amplitudes of the partials. The density and hierarchical criteria (last two columns) are presented as scores between 0 and 1. For every criterion, a higher value means better performance.

        F       D               H
d_v+v   1.298   0.784 (0.316)   0.773 (0.159)
d_+     2.040   0.923 (0.230)   0.928 (0.137)
d_m     3.303   0.934 (0.216)   0.943 (0.122)
d_×     2.702   0.937 (0.217)   0.951 (0.116)

Table 3 Results of the three criteria (Fisher, density, hierarchical classification) for the four combined metrics we defined. The density and hierarchical criteria (last two columns) are presented as scores between 0 and 1. For every criterion, a higher value means better performance.

The metric $d_c$ performs best for the density criterion since it is generally very low for very similar partials. The metric $d_{ar}$ gives a good result for the Fisher criterion while it performs badly for the two other criteria. This metric was tested in another work [24], but only on a very limited database. On a larger database such as the IOWA one, we can see that this metric does not seem very stable across the three criteria. In this matter, the spectral metrics $d_s$ and $d_{sp}$ perform best.

6.3 Combination

In order to jointly take into account the common variation cue of the frequency and amplitude parameters, we considered all possible combinations of the preceding metrics ($d_c$, $d_v$, $d_{ar}$, $d_s$, $d_{sp}$) for each spectral parameter with the three operators we proposed ($+$, $\times$, $\min$). Only the most relevant ones are presented in Table 3 for clarity's sake.

The metric $d_m$ performs best for the Fisher criterion, while the metric $d_\times$ shows the best results for both the density and hierarchical classification criteria (the classification performance is enhanced by 1% over the results obtained with the frequency cue only). Hence the metric $d_\times$ will be kept for the timbral discrimination presented in the next section.

7 Instruments Class Discrimination

In the previous section, we used the evaluation database globally in order to compare the different metrics. We present in this section a detailed evaluation of the behavior of the proposed metric by considering several levels in the instruments hierarchy of the [...]

Instruments   intra(a)                intra(b)                inter(a,b)
a     b       mean    σ       max     mean    σ       max     mean    σ       min
Ob    Ob      0.018   0.020   0.099   0.018   0.020   0.099   0.101   0.087   0.004
Ob    Sx      0.018   0.021   0.092   0.062   0.072   0.652   0.314   0.225   0.007
Tu    To      0.021   0.033   0.334   0.012   0.015   0.131   0.277   0.152   0.011
BW    WW      0.015   0.022   0.295   0.083   0.102   0.667   0.315   0.184   0.016
BS    SS      0.127   0.119   0.905   0.479   0.3     1.157   0.5     0.265   0.012
S     W       0.237   0.216   0.946   0.059   0.11    0.928   0.373   0.204   0.024

Table 4 Evaluation of the discrimination capabilities of the proposed metric for different instruments such as Oboe (Ob), Saxophone (Sx), Trumpet (Tu) and Trombone (To), as well as sets of instruments of the IOWA database such as Brass Winds (BW), Wood Winds (WW), Bowed Strings (BS), and Struck Strings (SS). The values in the table are respectively the mean, the standard deviation, and the maximal (intra) or minimal (inter) value of the $d_\times$ metric.

IOWA
  Strings
    Bowed: Bass, Cello, Violin
    Struck: Piano
  Wind
    Brass: Trombone, Trumpet
    Wood: Flute, Saxophone, Clarinet, Flute, Bassoon, Oboe

Fig. 8 The IOWA database hierarchy.

[...] this section. Each group corresponds to a node at a given level of the hierarchy shown in Figure 8.

The methodology used for these experiments is the one described in Section 5. For each experiment, we randomly select 100 entities of each considered group, and the intra and inter are computed for each couple of entities, each entity belonging to one group. Only couples with different entities are considered. In order to improve the clarity of the results, the intra and inter values are not averaged over all couples. Instead, the mean and the standard deviation are computed, as well as the extreme value (the maximum for the intra and the minimum for the inter, as reported in Table 4).
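The procedure can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Section 5: the intra is taken as the set of dissimilarities between partials of one entity and the inter as the set of dissimilarities between partials of two different entities. Each partial is reduced to a single scalar and `toy_dist` is a placeholder for the proposed metric; all function names are hypothetical.

```python
import itertools
import statistics

def intra_values(entity, dist):
    """Dissimilarities between all pairs of distinct partials of one entity."""
    return [dist(p, q) for p, q in itertools.combinations(entity, 2)]

def inter_values(entity_a, entity_b, dist):
    """Dissimilarities between partials taken from two different entities."""
    return [dist(p, q) for p in entity_a for q in entity_b]

def summarize(values, extreme):
    """Return (mean, standard deviation, extreme value) of a list of values."""
    return statistics.mean(values), statistics.pstdev(values), extreme(values)

def group_statistics(group_a, group_b, dist):
    """Pool the intra and inter values over all couples of different entities."""
    intra_a, intra_b, inter = [], [], []
    for entity_a in group_a:
        for entity_b in group_b:
            if entity_a is entity_b:  # only couples of different entities
                continue
            intra_a += intra_values(entity_a, dist)
            intra_b += intra_values(entity_b, dist)
            inter += inter_values(entity_a, entity_b, dist)
    return summarize(intra_a, max), summarize(intra_b, max), summarize(inter, min)

def toy_dist(p, q):
    """Placeholder dissimilarity between two scalar 'partials'."""
    return abs(p - q)

# Toy groups: each entity is a list of partials (scalars for illustration).
group_a = [[0.0, 1.0], [10.0, 12.0]]
group_b = [[100.0, 103.0]]
stats_a, stats_b, stats_inter = group_statistics(group_a, group_b, toy_dist)
print("intra(a):", stats_a)
print("intra(b):", stats_b)
print("inter(a,b):", stats_inter)
```

Passing the same list as both groups reproduces the single-instrument setting, where the identity test discards the couples made of one entity paired with itself.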

In the first experiment, whose results are reported in the first line of Table 4, we consider acoustical entities produced by the Oboe only. Since the same group is considered on both sides, the intra values are equal. However, the inter is not equal to the intra, since the computation of the intra involves only the partials of one entity, while the computation of the inter always involves partials of different entities.

In order to perfectly separate two entities of the Oboe, we would need the minimum value of the inter to be greater than the maximum value of the intra. It is clearly


the Saxophone and two instruments of the Brass Wind family, the Trumpet and the Trombone. Since the set of entities is different from the previous experiment with Oboe only, the intra is slightly different. By considering two different instruments, the inter is increased to a value that remains almost stable in the higher levels of the hierarchy. It shows that the difference between instruments is the most salient level of the hierarchy, as far as the proposed metric is concerned.
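The separation criterion stated above for the Oboe (minimum inter greater than maximum intra) can be checked against the extreme values reported in Table 4. The sketch below is only illustrative: extending the criterion to two groups by taking the larger of the two intra maxima is an assumption, and the function name is hypothetical.

```python
# Rows of Table 4: (a, b, max intra(a), max intra(b), min inter(a, b)).
TABLE_4 = [
    ("Ob", "Ob", 0.099, 0.099, 0.004),
    ("Ob", "Sx", 0.092, 0.652, 0.007),
    ("Tu", "To", 0.334, 0.131, 0.011),
    ("BW", "WW", 0.295, 0.667, 0.016),
    ("BS", "SS", 0.905, 1.157, 0.012),
    ("S", "W", 0.946, 0.928, 0.024),
]

def perfectly_separable(max_intra_a, max_intra_b, min_inter):
    """True when the smallest inter exceeds the largest intra of both groups."""
    return min_inter > max(max_intra_a, max_intra_b)

for a, b, max_a, max_b, min_inter in TABLE_4:
    print(a, b, perfectly_separable(max_a, max_b, min_inter))
```

Because the criterion relies on extreme values only, a single outlying couple of partials is enough to violate it, which is why the mean and standard deviation are reported alongside.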

Next, the Brass Wind and the Wood Wind families achieve very low intra, meaning that partials of the same entity of these two families are dense according to the proposed metric. The fifth line of Table 4 presents the results obtained when considering the Bowed Strings and Struck Strings families, which appear to be very dissimilar. The high inter value may be explained by the different types of excitation, which lead to very different timbres.

The partials of the acoustical entities produced by the Piano (the only instrument of the struck string family in the database) are spread over the feature space. Even though the new metric considers spectral information, which does improve the performance over the temporal information in the case of micro-modulations (see Figure 3), it appears that micro-modulations are not as salient as larger modulations such as vibrato or tremolo.

8 Applications

In this section, we describe some applications where such a description of the spectro-temporal content of audio streams can be helpful.

8.1 Binaural Scene Analysis

The current paper deals with the common variation of partials. However, two more cues are important for the perceptual gathering of partials: the common direction of arrival, and the harmonicity among partials [6].

The common direction of arrival can be determined in the case of multichannel audio. In the case of binaural sounds (stereo sounds recorded at the entrance of the auditory channels), it is possible to obtain an overall good estimation of the direction of arrival of sound sources. As shown in [37], the direction of arrival of partials, although not a perfect criterion, can be used as a partial clustering cue. The harmonicity cue has been used for the gathering of partials too, such as in [46]. By determining the harmonic relationship between partials, it is possible to gather the partials by sources on the one hand, and to point out the overlapping partials on the other hand.

These three cues work very differently from each other. Hence, we think that combining them may enhance the robustness and precision of the partial gathering process, since the diversity brought by the different cues opens interesting perspectives.

8.2 Acoustical Entities Similarity


We are interested in this type of application since there is an increased interest in recommendation systems that are not based on an ontology such as genre [45] or instrument type [21]. Alternatively, one can consider a recommendation system that answers requests such as "show me tunes that are similar to the ones I like". In this case, one needs to define the similarity between musical audio signals, and the timbre is an interesting dimension to consider.

We are currently investigating a generalized version of the descriptors described in this paper for such a purpose. Preliminary evaluations show that, on continuous musical solos, the use of those descriptors combined with standard segmental descriptors like the MFCCs significantly improves the performance.

8.3 Singing Voice Detection

As the proposed descriptors capture the modulations over time of the spectral parameters, they efficiently model the modulations of the singing voice, such as vibrato or tremolo. Assuming that the singing voice is almost always modulated [39], the proposed descriptors can be used to estimate whether a singing voice is active or not. Preliminary experiments show competitive performance compared to state-of-the-art statistical approaches using standard descriptors like the MFCCs [36]. As the proposed descriptors and the MFCCs model different aspects of the audio stream, it is expected that a combination of both approaches will provide a significant improvement.

9 Conclusion

In this article, we have proposed a new metric that discriminates partials of different acoustical entities by considering the evolutions of their frequency and amplitude parameters.

Considering the correlation of the spectra of these evolutions leads to more stable results than the ones obtained with the AR modeling approach proposed in previous work [24]. According to the experiments, the modulations of the frequency appear to be the most relevant cue; however, a slight improvement can be gained concerning the amplitude if the envelope is removed. We also demonstrated that combining the metrics of the frequencies and of the amplitudes enhances the classification results as far as the density and hierarchical criteria are concerned.

This new metric may be used for the classification of partials into acoustical entities. It has to be noted that the hierarchical classification used as a quality criterion in our study, even though very naive, yields very good results, with about 95 percent of correct classifications. Even better performance could certainly be obtained using more sophisticated classification methods, which could be of interest for many MIR applications.

Acknowledgements This work was initiated when the authors were at the LaBRI (UMR-CNRS 5800, University of Bordeaux 1) and has been partly funded by the OSEO project

References

1. M. Abe and J. O. Smith III. AM/FM rate estimation for time-varying sinusoidal modeling. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05), volume 3, pages iii/201-iii/204, 18-23 March 2005.
2. J.-J. Aucouturier and F. Pachet. The influence of polyphony on the dynamical modelling of musical timbre. Pattern Recognition Letters, 28(5):654-661, 2007.
3. F. Auger and P. Flandrin. Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method. IEEE Transactions on Signal Processing, 43:1068-1089, May 1995.
4. R. Badeau, G. Richard, and B. David. Performance of ESPRIT for estimating mixtures of complex exponentials modulated by polynomials. IEEE Transactions on Signal Processing, 56:492-504, 2008.
5. J. P. Bello and J. Pickens. A Robust Mid-level Representation for Harmonic Content in Music Signals. In ISMIR, October 2005.
6. A. S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, 1990.
7. J. P. Burg. Maximum Entropy Spectral Analysis. PhD thesis, Stanford University, 1975.
8. M. G. Christensen and S. H. Jensen. On perceptual distortion minimization and nonlinear least-squares frequency estimation. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):99-109, Jan. 2006.
9. M. Cooke. Modelling Auditory Processing and Organization. Cambridge University Press, New York, 1993.

10. L. Daudet. Sparse and structured decompositions of signals with the molecular matching pursuit. IEEE Transactions on Audio, Speech, and Language Processing, 14(5):1808-1816, Sept. 2006.
11. P. Depalle, G. Garcia, and X. Rodet. Tracking of Partials for Additive Sound Synthesis Using Hidden Markov Models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 1, pages 225-228, April 1993.
12. D. Ellis. Prediction-driven computational auditory scene analysis. PhD thesis, Department of Electrical Engineering & Computer Science, M.I.T, 1996.
13. D. Ellis and D. Rosenthal. Mid-level representations for Computational Auditory Scene Analysis. In International Joint Conference on Artificial Intelligence (IJCAI) - Workshop on Computational Auditory Scene Analysis, August 1995.
14. D. Ellis and B. Vercoe. A perceptual representation of sound for auditory signal separation. In 123rd meeting of the Acoustical Society of America, May 1992.
15. P. Fernandez and J. Casajus-Quiros. Multi-Pitch Estimation for Polyphonic Musical Signals. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3565-3568, April 1998.
16. L. Fritts. The IOWA Music Instrument Samples. Online. URL: http://theremin.music.uiowa.edu, 1997.
17. S. Grossberg. Pitch Based Streaming in Auditory Perception. Cambridge MA, MIT Press, 1996.
18. P. Herrera, G. Peeters, and S. Dubnov. Automatic Classification of Musical Sounds. Journal of New Music Research, 32(1):3-21, 2003.

19. American National Standards Institute. USA Standard Acoustical Terminology, 1960.
20. F. Itakura. Minimum Prediction Residual Principle Applied to Speech Recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 23(1):67-72, 1975.
21. C. Joder, S. Essid, and G. Richard. Temporal Integration for Audio Classification with Application to Musical Instrument Classification. IEEE Transactions on Audio, Speech and Language Processing, 17(1):174-186, 2009.
22. S. C. Johnson. Hierarchical Clustering Schemes. Psychometrika, 2(2):241-254, 1967.
23. A. Klapuri. Separation of Harmonic Sounds Using Linear Models for the Overtone Series. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2002.
24. M. Lagrange. A New Dissimilarity Metric For The Clustering Of Partials Using The Common Variation Cue. In Proceedings of the International Computer Music Conference


25. M. Lagrange and S. Marchand. Estimating the instantaneous frequency of sinusoidal components using phase-based methods. Journal of the Audio Engineering Society, 2007.
26. M. Lagrange, S. Marchand, and J. Rault. Enhancing the tracking of partials for the sinusoidal modeling of polyphonic sounds. IEEE Transactions on Audio, Speech and Language Processing, 28:357-366, Aug. 2007.
27. M. Lagrange, S. Marchand, and J.-B. Rault. Using Linear Prediction to Enhance the Tracking of Partials. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, pages 241-244, May 2004.
28. M. Lagrange, L. G. Martins, J. Murdoch, and G. Tzanetakis. Normalized Cuts for Predominant Melodic Source Separation. IEEE Transactions on Audio, Speech and Language Processing, 16(2):278-290, 2008.
29. J. Laroche. The use of the matrix pencil method for the spectrum analysis of musical signals. The Journal of the Acoustical Society of America, 94(4):1958-1965, 1993.
30. S. Marchand and M. Raspaud. Enhanced Time-Stretching Using Order-2 Sinusoidal Modeling. In Proc. DAFx, pages 76-82. Federico II University of Naples, Italy, October 2004.
31. K. D. Martin and Y. E. Kim. Musical Instrument Recognition: a pattern-recognition approach. In 136th meeting of the Acoustical Society of America, October 1998.
32. S. McAdams. Segregation of Concurrent Sounds: Effects of Frequency Modulation Coherence. Journal of the Audio Engineering Society, 86(6):2148-2159, 1989.
33. R. J. McAulay and T. F. Quatieri. Speech Analysis/Synthesis Based on a Sinusoidal Representation. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):744-754, 1986.
34. A. Nealen. An as-short-as-possible introduction to the least squares, weighted least squares and moving least squares methods for scattered data approximation and interpolation. URL: http://www.nealen.com/projects/, May 2004.

35. L. Nunes, R. Merched, and L. Biscainho. Recursive least-squares estimation of the evolution of partials in sinusoidal analysis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007.
36. M. Ramona and G. Richard. Vocal detection in music with support vector machines. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.
37. M. Raspaud and G. Evangelista. Binaural partial tracking. In Proc. DAFx, pages 123-128, Espoo, Finland, September 2008.
38. M. Raspaud, S. Marchand, and L. Girin. A Generalized Polynomial and Sinusoidal Model for Partial Tracking and Time Stretching. In Proc. DAFx, pages 24-29. Universidad Politécnica de Madrid, September 2005. ISBN: 84-7402-318-1.
39. L. Régnier and G. Peeters. Singing voice detection in music tracks using direct voice vibrato detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.
40. A. Röbel. Adaptive additive modeling with continuous parameter trajectories. IEEE Transactions on Acoustics, Speech and Signal Processing, 14(4):1440-1453, 2006.
41. J. Rosier and Y. Grenier. Unsupervised Classification Techniques for Multipitch Estimation. In 116th Convention of the Audio Engineering Society. Audio Engineering Society (AES), May 2004.
42. A. Röbel. Frequency-slope estimation and its application to parameter estimation for non-stationary sinusoids. Computer Music Journal, 32:68-79, 2008.
43. X. Serra. Musical Signal Processing with Sinusoids plus Noise, chapter 3, pages 91-122. Studies on New Music Research. Swets & Zeitlinger, Lisse, the Netherlands, 1997.
44. A. Sterian and G. H. Wakefield. A Model-Based Approach to Partial Tracking for Musical Transcription. SPIE annual meeting, San Diego, California, 1998.
45. G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Audio, Speech and Language Processing, 10(5):293-302, 2002.
46. T. Virtanen and A. Klapuri. Separation of Harmonic Sound Sources Using Sinusoidal Modeling. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 2, pages 765-768, April 2000.
47. J. H. Ward. Hierarchical Grouping to Optimize an Objective Function. Journal of the