Phase reconstruction of spectrograms with linear unwrapping : application to audio signal restoration

HAL Id: hal-02287339

https://hal.telecom-paris.fr/hal-02287339

Submitted on 4 Feb 2021

Paul Magron, Roland Badeau, Bertrand David

To cite this version:

Paul Magron, Roland Badeau, Bertrand David. Phase reconstruction of spectrograms with linear unwrapping: application to audio signal restoration. [Research Report] 2015D002, Télécom ParisTech. 2015. ⟨hal-02287339⟩

April 2015

Département Traitement du Signal et des Images

Groupe AAO : Audio, Acoustique et Ondes

Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, Paris, France

<firstname>.<lastname>@telecom-paristech.fr

∗

Abstract

This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the Time-Frequency (TF) domain. To obtain similar relationships over frequencies, in particular within onset frames, we study an impulse model. Instantaneous frequencies and attack times are estimated locally to encompass the class of non-stationary signals such as vibratos. These techniques ensure both the vertical coherence of partials (over frequencies) and the horizontal coherence (over time). The method is tested on a variety of data and demonstrates better performance than traditional consistency-based approaches. We also introduce an audio restoration framework and observe that our technique outperforms traditional methods.

Key words

Phase reconstruction, sinusoidal modeling, linear unwrapping, phase consistency, audio restoration.

∗ This work is partly supported by the French National Research Agency (ANR) as a part of the EDISON 3D project

1 Introduction

A variety of music signal processing techniques act in the TF domain, exploiting the particular structure of music signals. For instance, the family of techniques based on Nonnegative Matrix Factorization (NMF) is often applied to spectrogram-like representations, and has proved to provide a successful and promising framework for source separation [1]. Magnitude-recovery techniques are also useful for restoring missing data in corrupted signals [2].

However, when it comes to resynthesizing time signals, the phase recovery of the corresponding Short-Time Fourier Transform (STFT) is necessary. In the source separation framework, a common practice consists in applying Wiener-like filtering (soft masking of the complex-valued STFT of the original mixture). When there is no prior on the phase of a component (e.g. in the context of audio restoration), a consistency-based approach is often used for phase recovery [3]. That is, a complex-valued matrix is iteratively computed to be close to the STFT of a time signal. A recent benchmark has been conducted to assess the potential of source separation methods with phase recovery in NMF [4]. It points out that consistency-based approaches provide poor results in terms of audio quality. Besides, Wiener filtering fails to provide good results when sources overlap in the TF domain. Thus, phase recovery of modified audio spectrograms is still an open issue. The High Resolution NMF (HRNMF) model [5] has been shown to be a promising approach, since it models a TF mixture as a sum of autoregressive (AR) components in the TF domain, thus dealing explicitly with a phase model.

Another approach to reconstruct the phase of a spectrogram is to use a phase model based on the observation of fundamental signals that are mixtures of sinusoids. Contrary to consistency-based approaches using the redundancy of the STFT, this model exploits the natural relationship between adjacent TF bins due to the model. This approach is used in the phase vocoder algorithm [6], although it is mainly dedicated to time stretching and pitch modification of signals, and it requires the phase of the original STFT. More recently, [7] proposed a complex NMF framework with phase constraints based on sinusoidal modeling. Although promising, this approach is limited to harmonic and stationary signals, and requires prior knowledge on fundamental frequencies and numbers of partials.

In this paper, we propose a generalization of this approach that consists in estimating the phase field of mixtures of sinusoids from its explicit calculation. We then obtain an algorithm which unwraps the phases horizontally (over time frames) to ensure the temporal coherence of the signal, and vertically (over frequency channels) to enforce spectral coherence between partials, which are naturally observed in musical acoustics. Our technique is suitable for a variety of pitched music signals, such as piano or guitar sounds. A dynamic estimation (at each time frame) of instantaneous frequencies extends the validity of this technique to non-stationary signals such as cellos and speech. This technique is tested on a variety of signals and integrated in an audio restoration framework.

The paper is organized as follows. Section 2 presents the horizontal phase unwrapping model. Section 3 is dedicated to phase reconstruction on onset frames. Section 4 presents a performance evaluation of this technique through various experiments. Section 5 introduces an audio restoration framework using this phase recovery method. Finally, section 6 draws some concluding remarks.

2 Horizontal phase reconstruction

2.1 Sinusoidal modeling

Let us consider a sinusoid of normalized frequency $f_0 \in [-\frac{1}{2}; \frac{1}{2}]$, origin phase $\phi_0 \in [-\pi; \pi]$ and amplitude $A > 0$:

$$\forall n \in \mathbb{Z}, \quad x(n) = A e^{2i\pi f_0 n + i\phi_0}. \tag{1}$$

The expression of the STFT is, for each frequency channel $k \in [\![ -\frac{F-1}{2}; \frac{F-1}{2} ]\!]$ (with $F$ the odd-valued Fourier transform length) and time frame $t \in \mathbb{Z}$:

$$X(k, t) = \sum_{n=0}^{N-1} x(n + tS)\, w(n)\, e^{-2i\pi \frac{k}{F} n} \tag{2}$$

where $w$ is an $N$ sample-long analysis window and $S$ is the time shift (in samples) between successive frames. Let $W(f) = \sum_{n=0}^{N-1} w(n) e^{-2i\pi f n}$ be the discrete time Fourier transform of the analysis window for each normalized frequency $f \in [-\frac{1}{2}; \frac{1}{2}]$. Then the STFT of the sinusoid (1) is:

$$X(k, t) = A e^{2i\pi f_0 S t + i\phi_0}\, W\!\left(\frac{k}{F} - f_0\right). \tag{3}$$

The unwrapped phase of the STFT is then:

$$\phi(k, t) = \phi_0 + 2\pi S f_0 t + \angle W\!\left(\frac{k}{F} - f_0\right) \tag{4}$$

where $\angle z$ denotes the argument of the complex number $z$. This leads to a relationship between two successive time frames:

$$\phi(k, t) = \phi(k, t-1) + 2\pi S f_0. \tag{5}$$

More generally, we can compute the phase of the STFT of a frequency-modulated sinusoid. If the frequency variation is low between two successive time frames, we can generalize the previous equation:

$$\phi(k, t) = \phi(k, t-1) + 2\pi S f_0(t). \tag{6}$$

Instantaneous frequency must then be estimated at each time frame to encompass variable frequency signals such as vibratos, which commonly occur in music signals (singing voice or cello signals for instance).
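As an illustration, the recursion in equations (5) and (6) can be checked numerically on a synthetic complex sinusoid. The following Python sketch uses only the standard library. The parameter values (N = 64, F = 65, S = 16, f_0 = 0.1234), the periodic Hann window and the helper names `stft_bin`, `wrap` and `unwrap_horizontally` are illustrative assumptions, not part of the original report.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N (one possible analysis window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft_bin(x, w, k, t, S, F):
    """STFT coefficient X(k, t) of equation (2); x is a function of n."""
    return sum(x(n + t * S) * w[n] * cmath.exp(-2j * math.pi * k * n / F)
               for n in range(len(w)))

def wrap(phi):
    """Wrap an angle to the principal interval [-pi, pi]."""
    return cmath.phase(cmath.exp(1j * phi))

def unwrap_horizontally(phi_start, inst_freqs, S):
    """Propagate a phase over time frames with equation (6).

    inst_freqs[i] is the normalized instantaneous frequency used to go
    from frame i to frame i + 1.
    """
    phases = [phi_start]
    for f in inst_freqs:
        phases.append(phases[-1] + 2 * math.pi * S * f)
    return phases

# Illustrative (assumed) parameters.
N, F, S = 64, 65, 16
A, f0, phi0 = 1.5, 0.1234, 0.7
k = round(f0 * F)                      # channel closest to the sinusoid
w = hann(N)

def x(n):
    """Complex sinusoid of equation (1)."""
    return A * cmath.exp(1j * (2 * math.pi * f0 * n + phi0))

# Phases measured on the STFT versus phases propagated from frame 0.
true_phases = [cmath.phase(stft_bin(x, w, k, t, S, F)) for t in range(5)]
estimated = unwrap_horizontally(true_phases[0], [f0] * 4, S)
max_error = max(abs(wrap(e - p)) for e, p in zip(estimated, true_phases))
print(max_error)
```

For a stationary sinusoid the propagated phases should agree with the measured ones up to a multiple of 2π and rounding error; for a slowly modulated sinusoid, `inst_freqs` would instead hold one estimate per frame.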

2.2 Instantaneous frequency estimation

Quadratic interpolation FFT (QIFFT) is a powerful tool for estimating the instantaneous frequency near a magnitude peak in the spectrum [8]. It consists in approximating the shape of a spectrum near a magnitude peak by a parabola. This parabolic approximation is justified theoretically for Gaussian analysis windows, and used in practical applications for any window type. The computation of the maximum of the parabola leads to the instantaneous frequency estimate. Note that this technique is suitable for signals where only one sinusoid is active per frequency channel.

The frequency bias of this method can be reduced by increasing the zero-padding factor [9]. For a Hann window without zero-padding, the frequency estimation error is less than 1 %, which is hardly perceptible in most music applications according to the authors.
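The parabolic fit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the parabola is fitted to the log-magnitudes of the peak bin and its two neighbours (a common QIFFT variant, which the text above does not specify), and the function name `qifft_frequency` and the test signal (one complex sinusoid, N = 64, F = 65, f_0 = 0.13) are hypothetical.

```python
import cmath
import math

def qifft_frequency(mags, F):
    """Estimate the normalized frequency of the strongest peak in `mags`.

    A parabola is fitted to the log-magnitudes of the peak bin and its two
    neighbours (an assumed, common QIFFT variant); the abscissa of its
    maximum gives the frequency estimate.
    """
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    a, b, c = (math.log(mags[k + d]) for d in (-1, 0, 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)   # in bins, within [-0.5, 0.5]
    return (k + offset) / F

# Illustrative (assumed) setup: one complex sinusoid, periodic Hann window.
N, F, f0 = 64, 65, 0.13
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
frame = [w[n] * cmath.exp(2j * math.pi * f0 * n) for n in range(N)]

# Magnitude spectrum of the windowed frame on F frequency channels.
mags = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / F)
                for n in range(N)))
        for k in range(F)]

f_est = qifft_frequency(mags, F)
relative_error = abs(f_est - f0) / f0
print(f_est, relative_error)
```

In a mixture, the same fit would be applied around each magnitude peak separately, which is why the method assumes a single active sinusoid per frequency channel.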

2.3 Regions of influence

When the mixture is composed of one sinusoid, the phase must be unwrapped in all frequency channels according to (5) using the instantaneous frequency $f_0$. When there is more than one sinusoid, frequency estimation is performed near each magnitude peak. Then, the whole frequency range must be decomposed into several regions (regions of influence [6]) to ensure that the phase in a given frequency channel is unwrapped with the appropriate instantaneous frequency.

At time frame $t$, we consider a magnitude peak $A_p$ in channel $k_p$. The magnitudes (resp. the frequency channels) of neighboring peaks are denoted $A_{p-1}$ and $A_{p+1}$ (resp. $k_{p-1}$ and $k_{p+1}$). We define the region of influence $I_p$ of the $p$-th peak as follows:

$$I_p = \left[ \frac{A_p k_{p-1} + A_{p-1} k_p}{A_p + A_{p-1}} \, ; \, \frac{A_p k_{p+1} + A_{p+1} k_p}{A_p + A_{p+1}} \right]. \tag{7}$$
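Equation (7) translates directly into code. In the following Python sketch the function name `region_of_influence` and the numerical values are illustrative assumptions.

```python
def region_of_influence(k_prev, A_prev, k_p, A_p, k_next, A_next):
    """Bounds of the region of influence I_p from equation (7)."""
    lower = (A_p * k_prev + A_prev * k_p) / (A_p + A_prev)
    upper = (A_p * k_next + A_next * k_p) / (A_p + A_next)
    return lower, upper

# Peaks in channels 10, 20 and 30 (illustrative values).
equal = region_of_influence(10, 1.0, 20, 1.0, 30, 1.0)
dominant = region_of_influence(10, 1.0, 20, 3.0, 30, 1.0)
print(equal)     # (15.0, 25.0)
print(dominant)  # (12.5, 27.5)
```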

The greater $A_p$ is relative to $A_{p-1}$ and $A_{p+1}$, the wider $I_p$ is. Note that other definitions of regions of

3 Onset phase reconstruction

3.1 Impulse model

Impulse signals are useful to obtain a relationship between phases over frequencies (vertical unwrapping) [10]. Although they do not accurately model attack sounds, they provide simple equations that can be further improved for more complex signals. The model is:

$$\forall n \in \mathbb{Z}, \quad x(n) = A \delta_{n - n_0} \tag{8}$$

where $\delta_{n - n_0}$ is equal to one if $n = n_0$ (the so-called attack time) and zero elsewhere and $A > 0$ is the amplitude. Its STFT is equal to zero except within attack frames:

$$X(k, t) = A\, w(n_0 - St)\, e^{-2i\pi \frac{k}{F}(n_0 - St)}. \tag{9}$$

We can then obtain a relationship between the phases of two successive frequency channels within an onset frame, assuming that $w \geq 0$:

$$\phi(k, t) = \phi(k-1, t) - \frac{2\pi}{F}(n_0 - St). \tag{10}$$

The similarity between (10) and (5) was expected because the impulse is the dual of the sinusoid in the TF domain. This comparison naturally leads to estimating parameter $n_0$ (the "instantaneous" attack time) in each frequency channel as we previously estimated $f_0$ (the instantaneous frequency) in each time frame (cf. equation (6)). This leads to the following vertical unwrapping equation:

$$\phi(k, t) = \phi(k-1, t) - \frac{2\pi}{F}(n_0(k) - St). \tag{11}$$
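The vertical recursion can be checked against the closed form (9) in the same way as the horizontal one. The Python sketch below is illustrative only: the parameter values (N = 64, F = 65, S = 16, n_0 = 40, frame t = 1), the periodic Hann window and the helper names `impulse_stft_bin` and `unwrap_vertically` are assumptions.

```python
import cmath
import math

def impulse_stft_bin(A, n0, w, k, t, S, F):
    """STFT coefficient of the impulse (8), using the closed form (9)."""
    n = n0 - t * S                     # position of the impulse in frame t
    if not 0 <= n < len(w):
        return 0j                      # the frame does not contain the impulse
    return A * w[n] * cmath.exp(-2j * math.pi * k * n / F)

def unwrap_vertically(phi_start, attack_times, t, S, F):
    """Propagate a phase over frequency channels with equation (11).

    attack_times[i] is the attack time used to go from one channel to
    the next one.
    """
    phases = [phi_start]
    for n0 in attack_times:
        phases.append(phases[-1] - 2 * math.pi / F * (n0 - S * t))
    return phases

def wrap(phi):
    """Wrap an angle to the principal interval [-pi, pi]."""
    return cmath.phase(cmath.exp(1j * phi))

# Illustrative (assumed) parameters.
N, F, S = 64, 65, 16
A, n0, t = 2.0, 40, 1
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
channels = range(-3, 4)

# Phases from the closed form versus phases propagated from channel -3.
true_phases = [cmath.phase(impulse_stft_bin(A, n0, w, k, t, S, F))
               for k in channels]
estimated = unwrap_vertically(true_phases[0], [n0] * 6, t, S, F)
max_error = max(abs(wrap(e - p)) for e, p in zip(estimated, true_phases))
print(max_error)
```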

3.2 Attack time estimation

In order to estimate $n_0(k)$, we look at the magnitude of the STFT of the impulse in a frequency channel $k$:

$$|X(k, t)| = A\, w(n_0(k) - St). \tag{12}$$

We then choose $n_0$ such that the STFT magnitude of the impulse over onset frames has a shape similar to that of the analysis window. For instance, a least-squares estimation method can be used. Synthetic mixtures of impulses are perfectly reconstructed with this technique. Alternatively, we can also estimate $n_0(k)$ with a temporal QIFFT and update the phase with (11).
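One possible reading of the least-squares step is a grid search over candidate attack times, fitting the model (12) to the observed magnitudes of one frequency channel. The Python sketch below is only that: the integer grid, the closed-form amplitude fit, the periodic Hann window and the function name `estimate_attack_time` are assumptions, since the text does not detail the estimator.

```python
import math

def window(n, N):
    """Periodic Hann window, zero outside 0 <= n < N."""
    if 0 <= n < N:
        return 0.5 - 0.5 * math.cos(2 * math.pi * n / N)
    return 0.0

def estimate_attack_time(mags, frames, candidates, N, S):
    """Least-squares fit of A * w(n0 - S t) to the magnitudes (12).

    For each candidate n0 the optimal amplitude has a closed form; the
    candidate with the smallest residual is returned with its amplitude.
    """
    best = None
    for n0 in candidates:
        shape = [window(n0 - S * t, N) for t in frames]
        energy = sum(s * s for s in shape)
        if energy == 0.0:
            continue                   # window never overlaps the impulse
        amp = sum(m * s for m, s in zip(mags, shape)) / energy
        residual = sum((m - amp * s) ** 2 for m, s in zip(mags, shape))
        if best is None or residual < best[0]:
            best = (residual, n0, amp)
    return best[1], best[2]

# Synthetic magnitudes of an impulse in one channel (illustrative values).
N, S = 64, 16
frames = list(range(-3, 4))
true_n0, true_A = 40, 2.0
mags = [true_A * window(true_n0 - S * t, N) for t in frames]

n0_hat, A_hat = estimate_attack_time(mags, frames, range(0, 128), N, S)
print(n0_hat, A_hat)
```

On real onsets the magnitudes only approximately follow the window shape, so the residual would not vanish and a finer grid or an interpolation step might be needed.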

4 Experimental evaluation

4.1 Protocol and datasets

The MATLAB Tempogram Toolbox [11] provides a fast and reliable onset frames detection from spectrograms. We use several datasets in our experiments:

A: 30 mixtures of piano notes from the Midi Aligned Piano Sounds (MAPS) database [12],

B: 30 piano pieces from the MAPS database,

C: 12 string quartets from the SCore Informed Source Separation DataBase (SCISSDB) [13],

D: 40 speech excerpts from the Computational Hearing in Multisource Environments (CHiME) database [14].

The data is sampled at $F_s = 11025$ Hz and the STFT is computed with a 512 sample-long Hann window and 75 % overlap. The Signal to Distortion Ratio (SDR) is used for performance measurement. It is computed with the BSS Eval toolbox [15] and expressed in dB. The popular consistency-based Griffin and Lim (GL) algorithm [3] is also used as a reference. We run 200 iterations of this algorithm (performance is not further improved beyond). It is initialized with known phase values if any or random values if not, and results are

[Figure 4.1. Left panel: spectrogram, Time (s) versus Frequency (Hz). Right panel: instantaneous frequency, Time (s) versus Frequency (Hz); curves: phase vocoder, QIFFT.]

Figure 4.1: Spectrogram of a mixture with vibrato (left) and instantaneous frequencies in the 2800 Hz channel (right)

Dataset   Error (%)   GL (SDR, dB)   PU (SDR, dB)
A         0.38        −6.9           2.5
B         0.36        −12.6          1.7
C         0.41        −9.7           5.3
D         0.52        −0.4           0.5

Table 1: Frequency estimation error (%) and reconstruction performance (SDR in dB) for various audio datasets

4.2 Horizontal phase reconstruction

A first experiment consists in estimating instantaneous frequencies on synthetic mixtures of damped sinusoids, whose parameters (in particular the frequencies) are user-defined. Frequency estimation error with QIFFT is below the threshold of 0.2 %, commonly referred to as the maximal human auditory resolution.

Figure 4.1 illustrates the instantaneous frequencies estimated with the phase vocoder technique [6], used as a reference, and with our algorithm on a vibrato. Identical results are obtained. Our method is thus suitable for estimating variable instantaneous frequency signals as well as stationary components. We computed the average frequency error between phase vocoder and QIFFT estimates for the datasets presented in section 4.1. The results presented in the first column of Table 1 confirm that QIFFT provides an accurate frequency estimation.

Table 1 also presents reconstruction performance (assuming onset phases are known) for both Griffin and Lim (GL) and our Phase Unwrapping (PU) algorithms. Our approach significantly outperforms the traditional GL method: both stationary and variable frequency signals are reconstructed accurately. In addition, our algorithm is faster than the GL technique: on a 3 min 48 s piano piece, the reconstruction is performed in 18 s with our approach and in 623 s with the GL algorithm.

4.3 Onset phase reconstruction

Onset phases can be reconstructed with $n_0$-estimation using the impulse magnitude (PU-Impulse) or with QIFFT (PU-QIFFT). We also test random phase values (PU-Rand, no vertical coherence), zero phases (PU-0, partials in phase) and alternating partial phases between $0$ and $\pi$ (PU-Alt, phase-opposed partials). These choices are justified by the observation of the relationship between partials in musical acoustics [16]. The phase of the partials is then fully recovered with horizontal unwrapping. We test these methods on dataset A. Results presented in Table 2 show that all our approaches provide better results than the GL algorithm on this class of signals. Onset phase unwrapping with $n_0$-estimation based on QIFFT provides the best result, ensuring some form of vertical coherence. In particular, we perceptually observe that this approach provides a neat percussive

Method       SDR (dB)
GL           −7.9
PU-Impulse   −4.0
PU-QIFFT     −2.6
PU-Rand      −4.3
PU-0         −4.7
PU-Alt       −3.5

Table 2: Signal reconstruction performance of different methods on dataset A

[Figure 4.2. SDR (dB) versus Percentage of corrupted bins; curves: Phase unwrapping, Griffin-Lim, Corrupted.]

Figure 4.2: Reconstruction performance of different methods and percentages of corruption on dataset A

4.4 Complete phase reconstruction

We consider unaltered magnitude spectrograms from dataset A. A variable percentage of the STFT phases is randomly corrupted. We evaluate the performance of our algorithm to restore the phase both on onset and non-onset frames.

Figure 4.2 confirms the potential of this technique. Our method produced an average increase in SDR of 6 dB over the corrupted data. It also performs better than the GL algorithm when a high percentage of the STFT phases must be recovered.

However, note that this experiment consists in phase reconstruction of consistent spectrograms (i.e. positive matrices that are the magnitude of the STFT of a time signal): the GL algorithm is then naturally advantaged in this case. Realistic applications (cf. next section) involve the restoration of both phase and magnitude, which leads to inconsistent spectrograms.

The goal of audio inpainting is to restore corrupted or missing values of a signal. Since corruption can be done in the temporal domain or in the TF domain, we will study those two approaches.

First, we will partially recover phase from a consistent spectrogram (the STFT magnitude is not modified). Secondly, we will reconstruct whole parts of the STFT (both magnitude and phase). Finally, a time signal is corrupted with clicks and we compare restoration with a temporal method, our approach, and the HRNMF algorithm [5].

5 Application of phase reconstruction to audio restoration

A common alteration of music signals is the presence of noise on short time periods (a few samples) called clicks. We corrupt time signals with clicks that represent less than 1 % of the total duration. Clicks are obtained by differentiating a 10 sample-long Hann window.
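The corruption step can be sketched as follows in Python. Several details are assumptions made for illustration: the Hann window is taken symmetric, differentiation is approximated by a first difference, clicks are added to (rather than substituted for) the signal at uniformly random positions, and the names `click` and `add_clicks`, the gain and the test signal are hypothetical.

```python
import math
import random

def click(length=10):
    """Click shape: first difference of a symmetric Hann window."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
         for n in range(length)]
    return [w[n + 1] - w[n] for n in range(length - 1)]

def add_clicks(signal, num_clicks, rng, gain=1.0):
    """Return a copy of `signal` with clicks added at random positions."""
    shape = click()
    corrupted = list(signal)
    for _ in range(num_clicks):
        start = rng.randrange(len(signal) - len(shape))
        for i, c in enumerate(shape):
            corrupted[start + i] += gain * c
    return corrupted

rng = random.Random(0)                 # fixed seed for reproducibility
clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(10000)]
noisy = add_clicks(clean, num_clicks=5, rng=rng)

# Fraction of samples altered by the clicks (at most 5 * 9 / 10000 here).
corrupted_fraction = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(corrupted_fraction)
```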

Magnitude restoration of missing bins is performed by linear interpolation of the log-magnitudes in each

[Figure 5.1. Three spectrogram panels, Time (s) versus Frequency (Hz).]

Figure 5.1: Restoration of spectrogram by linear interpolation of the log-magnitudes on piano notes
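The log-magnitude interpolation illustrated in Figure 5.1 can be sketched as follows in Python. This is a minimal illustration that assumes the interpolation runs along time frames within one frequency channel, with strictly positive known magnitudes and at least one known value; the function name `interpolate_log_magnitudes` and the edge handling are hypothetical.

```python
import math

def interpolate_log_magnitudes(mags, missing):
    """Fill missing entries by linear interpolation of log-magnitudes.

    `mags` holds the magnitudes of one frequency channel over time frames
    and `missing` the indices to restore. Gaps touching an edge copy the
    nearest known value (an assumption of this sketch).
    """
    known = [i for i in range(len(mags)) if i not in missing]
    restored = list(mags)
    for i in sorted(missing):
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            restored[i] = mags[right]
        elif right is None:
            restored[i] = mags[left]
        else:
            weight = (i - left) / (right - left)
            log_value = ((1 - weight) * math.log(mags[left])
                         + weight * math.log(mags[right]))
            restored[i] = math.exp(log_value)
    return restored

# Frames 1 and 2 are missing; their stored values are ignored.
channel = [1.0, 0.0, 0.0, 8.0, 4.0]
restored = interpolate_log_magnitudes(channel, missing={1, 2})
print(restored)
```

Interpolating in the log domain yields a geometric progression between the known values (here approximately 2 and 4 between 1 and 8), which matches an exponential decay better than a linear interpolation of the magnitudes would.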

Dataset   AR     HRNMF   GL    PU
A         11.4   16.9    8.6   11.7
B         4.3    10.9    5.9   7.1
C         8.2    10.6    6.6   7.1
D         8.3    10.9    8.9   9.4

Table 3: Signal restoration performance (SDR in dB) for various methods and datasets

(PU) or alternatively with the GL algorithm. We compare those results to the traditional restoration method based on autoregressive (AR) modeling of the time signal [18], and with HRNMF [5].

Table 3 presents results of restoration. HRNMF provides the best results in terms of SDR. However, our approach outperforms the traditional method and the GL algorithm. Besides, we underline that the HRNMF model uses the phase of the non-corrupted bins, while our algorithm is blind. Lastly, our technique remains faster than HRNMF: for a 3 min 55 s piano piece, restoration is performed in 99 s with our algorithm and in 222 s with HRNMF.

6 Conclusion

The new phase reconstruction technique introduced in this work appears to be an efficient and promising method. The analysis of mixtures of sinusoids leads to relationships between the phases of successive TF bins. Physical parameters such as instantaneous frequencies and attack times are estimated dynamically, encompassing a variety of signals such as piano and cello sounds. The phase is then unwrapped in all frequency channels for onset frames and over time for partials. Experiments have demonstrated the accuracy of the unwrapping method, and we integrated it in an audio restoration framework. Better results than with traditional methods have been reached.

The reconstruction of onset frames still needs to be improved as suggested by the variety of data. Further work will focus on exploiting known phase data for reconstruction: missing bins can be inferred from observed phase values. Alternatively, time-invariant parameters such as phase offsets between partials [19] can be used. Such developments will be introduced in an audio source separation framework, where the phase of the mixture

References

[1] Paris Smaragdis and Judith C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, October 2003.

[2] Derry Fitzgerald and Dan Barry, "On inpainting the adress algorithm," in Proc. IET Irish Signals and Systems Conference (ISSC), Maynooth, Ireland, June 2012, pp. 1–6.

[3] Daniel Griffin and Jae Lim, "Signal estimation from modified short-time Fourier transform," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 32, no. 2, pp. 236–243, April 1984.

[4] Paul Magron, Roland Badeau, and Bertrand David, "Phase reconstruction in NMF for audio source separation: An insightful benchmark," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015.

[5] Roland Badeau and Mark D. Plumbley, "Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain," IEEE Transactions on Audio Speech and Language Processing, vol. 22, no. 11, pp. 1670–1680, November 2014.

[6] Jean Laroche and Mark Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 323–332, May 1999.

[7] James Bronson and Philippe Depalle, "Phase constrained complex NMF: Separating overlapping partials in mixtures of harmonic musical sources," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014.

[8] Mototsugu Abe and Julius O. Smith, "Design criteria for simple sinusoidal parameter estimation based on quadratic interpolation of FFT magnitude peaks," in Audio Engineering Society Convention 117, Berlin, Germany, May 2004, Audio Engineering Society.

[9] Mototsugu Abe and Julius O. Smith, "Design criteria for the quadratically interpolated FFT method (i): Bias due to interpolation," Tech. Rep. STAN-M-117, Stanford University, Department of Music, 2004.

[10] Akihiko Sugiyama and Ryoji Miyahara, "Tapping-noise suppression with magnitude-weighted phase-based detection," in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, October 2013, pp. 1–4.

[11] Peter Grosche and Meinard Müller, "Tempogram Toolbox: MATLAB tempo and pulse analysis of music recordings," in Proc. International Society for Music Information Retrieval (ISMIR) Conference, Miami, USA, October 2011.

[12] Valentin Emiya, Nancy Bertin, Bertrand David, and Roland Badeau, "MAPS - A piano database for multipitch estimation and automatic transcription of music," Tech. Rep. 2010D017, Télécom ParisTech, Paris, France, July 2010.

[13] Romain Hennequin, Roland Badeau, and Bertrand David, "Score informed audio source separation using a parametric model of non-negative spectrogram," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 45–48.

[14] Jon Barker, Emmanuel Vincent, Ning Ma, Heidi Christensen, and Phil Green, "The PASCAL CHiME Speech Separation and Recognition Challenge," Computer Speech and Language, vol. 27, no. 3, pp. 621–633, Feb. 2013.

[15] Emmanuel Vincent, Rémi Gribonval, and Cédric Févotte, "Performance measurement in blind audio source separation," IEEE Transactions on Speech and Audio Processing, vol. 14, no. 4, pp. 1462–1469, July 2006.

factorization, Image and Vision Computing, 2007.

[18] Simon J. Godsill and Peter J. W. Rayner, Digital Audio Restoration - A Statistical Model-Based Approach, Springer-Verlag, 1998.

[19] Holger Kirchhoff, Roland Badeau, and Simon Dixon, "Towards complex matrix decomposition of spectrogram based on the relative phase offsets of harmonic sounds," in Proc. IEEE International Conference on

Legal deposit: 2015, 2nd quarter

Printed at Télécom ParisTech, Paris

ISSN 0751-1345 ENST D (Paris) (France 1983-9999)