HAL Id: inria-00577079
https://hal.inria.fr/inria-00577079
Submitted on 16 Mar 2011
Amir Adler, Valentin Emiya, Maria Jafari, Michael Elad, Rémi Gribonval, Mark D. Plumbley
To cite this version:
Amir Adler, Valentin Emiya, Maria Jafari, Michael Elad, Rémi Gribonval, et al.. Audio Inpainting.
IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2012, 20 (3), pp. 922-932. ⟨10.1109/TASL.2011.2168211⟩. ⟨inria-00577079⟩
Research report
ISSN 0249-6399, ISRN INRIA/RR--7571--FR+ENG
Domaine Audio, Speech, and Language Processing
Audio Inpainting
Amir Adler — Valentin Emiya — Maria G. Jafari — Michael Elad — Rémi Gribonval — Mark D. Plumbley
N° 7571
March 16, 2011
Centre de recherche INRIA Rennes – Bretagne Atlantique
Domain:
Project-Team Metiss
Research report n° 7571, March 16, 2011, 24 pages
Abstract: We propose the Audio Inpainting framework that recovers audio intervals distorted due to impairments such as impulsive noise, clipping, and packet loss. In this framework, the distorted samples are treated as missing, and the signal is decomposed into overlapping time-domain frames. The restoration problem is then formulated as an inverse problem per audio frame. Sparse representation modeling is employed per frame, and each inverse problem is solved using the Orthogonal Matching Pursuit algorithm together with a discrete cosine or a Gabor dictionary. The performance of this algorithm is shown to be comparable to or better than state-of-the-art methods when blocks of samples of variable durations are missing. We also demonstrate that the size of the block of missing samples, rather than the overall number of missing samples, is a crucial parameter for high quality signal restoration. We further introduce a constrained Matching Pursuit approach for the special case of audio declipping that exploits the sign pattern of clipped audio samples and their maximal absolute value, as well as allowing the user to specify the maximum amplitude of the signal. This approach is shown to outperform state-of-the-art and commercially available methods for audio declipping.
Key-words: Inpainting, clipping, sparse representation, matching pursuit.
A. Adler and M. Elad are with the Computer Science Department, The Technion, Haifa 32000, Israel. V. Emiya and R. Gribonval are with INRIA, Centre Inria Rennes - Bretagne Atlantique, 35042 Rennes Cedex, France. M. G. Jafari and M. D. Plumbley are with Queen Mary University of London, Centre for Digital Music, Department of Electronic Engineering, London E1 4NS, U.K. (e-mail: maria.jafari@elec.qmul.ac.uk).
This work has been submitted to IEEE Transactions on Audio, Speech and Language Processing. Part of this work has been presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in 2011 [1].
This work was supported by the EU Framework 7 FET-Open project FP7-ICT-225913-SMALL: Sparse Models, Algorithms and Learning for Large-Scale data.
Résumé: We introduce the concept of Audio Inpainting for the restoration of portions of audio data distorted by degradations such as clicks, clipping, or packet loss. In this context, the distorted data are considered missing and the signal is decomposed into frames in the time domain. The restoration problem is formulated as an inverse problem in each frame. Each frame is modeled by a sparse representation, and the inverse problem is solved via the Orthogonal Matching Pursuit algorithm using a discrete cosine or Gabor dictionary. The performance obtained is comparable to the state of the art for blocks of missing samples of variable duration. We also show that the restoration quality depends more on the size of the blocks of missing samples than on the total number of missing samples. Finally, we introduce a constrained Matching Pursuit-type algorithm for the particular case of audio declipping, which exploits the amplitude properties of the saturated samples: sign, minimum and maximum amplitude. The performance obtained is superior to that of the state of the art and of commercial software for declipping.
Mots-clés: Inpainting, clipping, sparse representations, matching pursuit.
Figure 1: Examples of restoration problems related to inpainting. (a) Speech signal corrupted by clicks (circles); amplitude versus time (s). (b) Clipped version (black) of a speech signal (gray); amplitude versus time (s). (c) The image inpainting problem: recovery of locally-hidden pixels.
1 Introduction
Speech and music signals are often subject to localized distortions, where the intervals of distorted samples are surrounded by undistorted samples. Examples include impulsive noise or clicks (see Fig. 1a), clipping (see Fig. 1b), CD scratches, packet loss in cordless phones or Voice over IP (VoIP), and more. In such situations, the distorted samples can be treated as missing. A restoration algorithm is employed to reconstruct the missing samples, in a similar way as for image inpainting (see Fig. 1c). However, in the audio field, such problems have been treated separately and, depending on the context, they have been referred to as audio interpolation [2–6], extrapolation [3,7,8], imputation [9,10], induction [11], (bandwidth) extension [12–15] or concealment [16,17].
Substantial effort has been focused on the restoration of audio signals corrupted by clicks due to old recordings or scratched CDs (see Fig. 1a). In this problem, intervals of corrupted samples (from 20 µs to 4 ms [4]) occur at random locations. Typical approaches employ autoregressive (AR) modeling [2,3], or Bayesian estimation to recover the corrupted samples [4]. Other methods utilize neural networks [18] or sinusoidal modeling [5,8]. A related problem is automatic speech recognition in the presence of isolated noisy samples. This problem is treated in [10] with a compressive sensing approach in the spectrogram image domain, and by solving an $\ell_1$ regularized least squares problem.
Another important, though less often addressed, problem is audio clipping [6,7,19], which refers to the truncation of the waveform beyond a threshold when the maximum range in an acquisition system is exceeded (see Fig. 1b). The clipped samples are arranged in groups and their locations are determined by the amplitude of the signal (rather than being randomly spread). The declipping problem is particularly challenging for this reason and because the information carried by the highest-amplitude samples is completely absent.
Long intervals of samples may be lost during transmission over cordless phones or in VoIP systems, where the problem is addressed using packet loss concealment algorithms [16,17]. Missing interval lengths are in the range of 5 ms to 60 ms, which is close to the typical duration for the pseudo-stationarity of audio signals. The low latency requirement in the VoIP case results in relatively simple algorithms; however, estimating missing packets in peer-to-peer repositories is a new application where higher quality reconstruction can be expected (as the latency requirement is less stringent).
Finally, the unreliable or missing audio data can be time-frequency regions [5,9,11,14,20], in classification applications like automatic speech recognition [9,20] or source separation with time-frequency localized interference (the phrase "audio inpainting" has been used once in this specific case [11]). Bandwidth extension [12–15] is another important time-frequency-domain application, where high frequency content is estimated from the low frequency content in order to provide high quality audio.
In this paper, we present a unified framework for the restoration of distorted audio data, leveraging the concept of Image Inpainting [21–23]. In the proposed framework, termed Audio Inpainting, the distorted data is assumed missing and its location is assumed to be known a-priori. We further employ Sparse Representations (SR), which have been demonstrated to faithfully model audio signals [24,25] and to address the image inpainting framework [22,26,27]. The proposed approach is directly based upon those prior works.
The contributions of this paper are four-fold:
a) Audio inpainting is defined as an inverse problem, based upon the concept of image inpainting.
b) A framework for audio inpainting in the time domain is proposed, based on sparse representations. It exploits two possible dictionaries (discrete cosine and Gabor) known to provide accurate sparse models for audio signals.
c) The Orthogonal Matching Pursuit (OMP) algorithm for audio inpainting is adapted, in particular to deal with the properties of the Gabor dictionary.
d) A constrained matching pursuit approach is applied to significantly enhance the performance for audio declipping problems.
This paper is organized as follows. In Section 2, audio inpainting is formalized as an inverse problem. The proposed framework is introduced in Section 3, including the sparse models used for time-domain audio inpainting. The adaptation of the OMP algorithm for audio inpainting in the time domain and for audio declipping is presented in Section 4. Several experiments are proposed in Section 5, while we discuss our findings and draw conclusions in Section 6.
2 Audio Inpainting Problem Statement
We define audio inpainting as a general problem encountered in many applications: one observes a partial set of reliable audio data while the remaining unreliable data is either totally missing or highly degraded; the unreliable data is considered missing and it is estimated from the reliable data portion.
The general formulation of audio inpainting is given in Section 2.1 while several particular time-domain cases are detailed in Sections 2.2 and 2.3.
2.1 Formulation of audio inpainting
We consider a vector $s \in \mathbb{R}^L$ of audio data and an a-priori known partition $\{I^m, I^r\}$ of the support $I \triangleq \{1, 2, \dots, L\}$ of $s$: $I^m \subset I$ and $I^r \triangleq I \setminus I^m$. We assume that the coefficients $s(I^m)$ are either missing or masked by a severe distortion. Thus, the observed data $y \in \mathbb{R}^L$ coincides with $s$ on $I^r$ only. The audio inpainting problem is defined as the recovery of the coefficients $s(I^m)$ based on the knowledge of:
1. the reliable data $y^r \triangleq y(I^r) = s(I^r)$,
2. the partition $\{I^m, I^r\}$,
3. additional information about the observed signal,
4. and, optionally, information about the missing data (see e.g. the case of clipping below).
In matrix form, the reliable data $y^r$ result from the linear model
$$y^r = M^r s, \tag{1}$$
where $M^r$ is the so-called measurement matrix obtained from the $L \times L$ identity matrix $I_L$ by selecting the rows $I^r$ associated with the reliable coefficients in $s$. In a similar way, the missing data to be recovered are $s(I^m) = M^m s$, where $M^m$ consists of the rows $I^m$ in $I_L$.
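The row-selection construction of eq. (1) can be illustrated with a minimal NumPy sketch. The helper name `measurement_matrices` and the toy signal are assumptions made for illustration only, and indices are 0-based here whereas the text uses $1, \dots, L$.

```python
import numpy as np

def measurement_matrices(L, missing_idx):
    """Return (M_r, M_m): rows of the L x L identity matrix associated with
    the reliable and the missing samples, respectively (0-based indices)."""
    missing = np.zeros(L, dtype=bool)
    missing[np.asarray(missing_idx, dtype=int)] = True
    identity = np.eye(L)
    return identity[~missing], identity[missing]

# Toy signal (hypothetical values) with samples 2 and 3 declared missing.
s = np.array([0.1, -0.4, 0.9, 0.3, -0.2])
M_r, M_m = measurement_matrices(len(s), missing_idx=[2, 3])

y_r = M_r @ s   # reliable data, eq. (1)
s_m = M_m @ s   # samples an inpainting algorithm would have to recover
```

In practice one would index `s` with a boolean mask rather than build dense matrices; the explicit matrices are kept here only to mirror the notation.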
In the general audio inpainting framework, audio data can be either samples in waveforms or coefficients in transforms like time-frequency representations. The problem formulation above can be used for multi-dimensional signals like multichannel waveforms or time-frequency coefficients, by simply reshaping the signal matrix as a vector $s$.
In the rest of this paper, we only consider the inpainting of missing samples in a single-channel waveform. The multi-dimensional case is discussed in the conclusion (see Sec. 6).
2.2 Inpainting samples distorted by impulsive noise
In the particular case of a signal corrupted by impulsive noise such as clicks (see Fig. 1a), $I^m$ is a set of integers between 1 and $L$ and must be estimated in a preliminary stage. One often considers that the distorted samples are corrupted by a Gaussian noise $n$ with high variance. Hence, the complete observed signal includes both the reliable samples $y^r$ and distorted ones $y^m$:
$$\begin{cases} y^r = M^r s \\ y^m = M^m s + n, \end{cases} \tag{2}$$
where the samples $M^m s$ in $y^m$ are masked by $n$ so that they are considered as unknown.
2.3 Inpainting intervals of missing samples
In the case where intervals of samples are missing, due to packet loss during transmission or to masking by audible interferences, $I^m$ is composed of groups of consecutive integers: the samples $s(I^m)$ are totally missing and one only observes $y^r = M^r s$.
In the case of clipped signals, the samples to be estimated are also arranged in intervals of consecutive samples, as depicted in Fig. 1b. Their locations depend on the amplitude of the signal, such that
$$I^m \triangleq \{ n \mid 1 \le n \le L,\ |s(n)| \ge \theta^{\mathrm{clip}} \}, \tag{3}$$
where $\theta^{\mathrm{clip}}$ is the clipping level. One observes both the unclipped, reliable samples $y^r$ and the clipped, masked samples $y^m$:
$$\begin{cases} y^r = M^r y = M^r s \\ y^m = M^m y = M^m \operatorname{sign}(s)\, \theta^{\mathrm{clip}}, \end{cases} \tag{4}$$
where $\operatorname{sign}(\cdot)$ is the element-wise sign function. As presented in the next sections, the information provided by $y^m$, even though very crude (a sign per sample and the clipping level), still substantially enhances the estimation performance.
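The clipping model of eqs. (3) and (4) can be simulated in a few lines. The following is a minimal sketch: the function name `clip_signal`, the sinusoidal test signal and the threshold value are illustrative assumptions. Note that the mask is computed here from the clean signal $s$, as in eq. (3); on real data it would have to be detected from the observation, for instance from the samples sitting at $\pm\theta^{\mathrm{clip}}$.

```python
import numpy as np

def clip_signal(s, theta_clip):
    """Hard-clip s at +/- theta_clip.

    Returns the observation y and a boolean mask that is True on the
    clipped (masked) samples, i.e. on the set I^m of eq. (3)."""
    s = np.asarray(s, dtype=float)
    missing = np.abs(s) >= theta_clip                   # eq. (3)
    y = np.where(missing, np.sign(s) * theta_clip, s)   # eq. (4)
    return y, missing

# Hypothetical test signal: three periods of a unit-amplitude sinusoid.
t = np.arange(64)
s = np.sin(2 * np.pi * 3 * t / 64)
y, missing = clip_signal(s, theta_clip=0.6)

y_r = y[~missing]   # reliable samples, equal to s on I^r
y_m = y[missing]    # clipped samples: only sign and clipping level survive
```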
3 Time-domain framework and models
The proposed framework focuses on time-domain audio inpainting. It relies on a frame-based processing, as detailed in Section 3.1, and on the sparse representations modeling of audio signals, as presented in Section 3.2. Two dictionaries used in this modeling are introduced in Section 3.3.
3.1 Frame-based processing and reconstruction
As in many audio processing tasks, the signal is locally processed:
• by segmenting it into frames,
• by independently inpainting each frame,
• and by synthesizing the full restored signal using the overlap-add (OLA) method [28].
We decompose the signal into overlapping frames indexed by $i$, starting at time $t_i$ and weighted by an analysis window $w_a$ with length $N$. By straightforwardly adapting to the local frames the problem statement defined for the full signal in Section 2, the reliable samples in frame $i$ can be written as
$$y^r_i = M^r_i s_i, \tag{5}$$
where $M^r_i$ is the measurement matrix of the $i$-th frame obtained from $M^r$ and $s_i(t) \triangleq s(t + t_i)\, w_a(t)$ is the windowed frame defined for $0 \le t \le N-1$. We also define the supports $I^r_i$ and $I^m_i$ of the reliable samples and of the missing or masked samples, respectively. Once the estimation $\hat{s}_i$ of $s_i$ by some inpainting algorithm is achieved, the reconstruction of the full signal is obtained as
$$\hat{s}(t) \triangleq \sum_i w_s(t - t_i)\, \hat{s}_i(t - t_i), \tag{6}$$
where $w_s$ is the synthesis window such that $\sum_i w_s(t - t_i)\, w_a(t - t_i) = 1, \forall t$.
In the proposed approaches, we utilized 64 ms frames with 75% overlap, a rectangular window for $w_a$ and a sine window for $w_s$.
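The frame decomposition and the overlap-add synthesis of eq. (6) can be sketched as follows. This is a minimal illustration under several assumptions: a 16 kHz sampling rate, frames that tile the signal exactly, and an explicit division by the accumulated product $\sum_i w_s(t - t_i)\, w_a(t - t_i)$ so that the normalization condition of eq. (6) holds by construction. The function names are hypothetical and the per-frame inpainting step is left out (each frame is passed through unchanged).

```python
import numpy as np

def split_frames(s, N, hop, w_a):
    """Return the frame start times t_i and the windowed frames
    s_i(t) = s(t + t_i) * w_a(t), one frame per row."""
    starts = np.arange(0, len(s) - N + 1, hop)
    frames = np.stack([s[t0:t0 + N] * w_a for t0 in starts])
    return starts, frames

def overlap_add(frames, starts, w_a, w_s, length):
    """Overlap-add synthesis in the spirit of eq. (6), normalised by the
    accumulated window product sum_i w_s(t - t_i) * w_a(t - t_i)."""
    out = np.zeros(length)
    norm = np.zeros(length)
    for t0, frame in zip(starts, frames):
        out[t0:t0 + len(frame)] += w_s * frame
        norm[t0:t0 + len(frame)] += w_s * w_a
    covered = norm > 1e-12          # avoid dividing where no frame contributes
    out[covered] /= norm[covered]
    return out

fs = 16000                          # assumed sampling rate (Hz)
N = 64 * fs // 1000                 # 64 ms frames
hop = N // 4                        # 75% overlap
w_a = np.ones(N)                    # rectangular analysis window
w_s = np.sin(np.pi * (np.arange(N) + 0.5) / N)   # sine synthesis window

rng = np.random.default_rng(0)
s = rng.standard_normal(8 * N)
starts, frames = split_frames(s, N, hop, w_a)
s_hat = overlap_add(frames, starts, w_a, w_s, len(s))   # no inpainting applied
```

With unmodified frames the synthesis returns the input signal, which is a convenient sanity check before plugging an inpainting routine between the two steps.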
3.2 Sparse Representations modeling of audio frames
In the Sparse Representations (SR) modeling framework [23], it is assumed that each frame is well approximated by a sparse linear combination of the columns of a (possibly overcomplete) dictionary:
$$s_i \approx D x_i, \tag{7}$$
where $D \in \mathbb{R}^{N \times K_D}$ is the dictionary, $N \le K_D$, and $x_i \in \mathbb{R}^{K_D \times 1}$ is the representation vector of the $i$-th frame. $x_i$ is assumed to be sparse, i.e. to have few non-zero coefficients compared to $N$. As a consequence, we can also utilize the SR model for the observed reliable samples in each frame:
$$y^r_i \triangleq M^r_i s_i \approx M^r_i D x_i. \tag{8}$$
We propose to recover the unknown samples $s_i(I^m_i)$ by estimating as $\hat{x}_i$ the (sparse) representation vector of each frame, given only the clean observed samples (8) and limited side information (for the clipping case):
$$\hat{s}_i(I^m_i) = M^m_i D \hat{x}_i. \tag{9}$$
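To make eqs. (8) and (9) concrete, the sketch below estimates a sparse vector from the reliable samples of one frame with a plain, textbook Orthogonal Matching Pursuit, then synthesizes the whole frame. It is an illustration under stated assumptions, not the adapted algorithm described in Section 4: the function name `omp_inpaint`, the fixed atom budget `n_atoms`, the column normalization on the reliable support, and the square cosine dictionary ($K_D = N$, chosen for simplicity) are all assumptions.

```python
import numpy as np

def omp_inpaint(y_r, D, reliable, n_atoms):
    """Plain OMP on the reliable rows of D, followed by full-frame synthesis.

    y_r      : observed reliable samples of the frame
    D        : (N, K) dictionary
    reliable : boolean mask of length N, True on reliable samples
    n_atoms  : number of atoms to select (fixed budget, an assumption)
    """
    y_r = np.asarray(y_r, dtype=float)
    D_r = D[reliable]                        # M^r_i D of eq. (8)
    norms = np.linalg.norm(D_r, axis=0)
    norms[norms == 0] = 1.0
    D_rn = D_r / norms                       # unit-norm columns on the reliable support
    residual = y_r.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_atoms):
        corr = np.abs(D_rn.T @ residual)
        if support:
            corr[support] = 0.0              # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Orthogonal step: least-squares refit on all selected atoms.
        coef, *_ = np.linalg.lstsq(D_rn[:, support], y_r, rcond=None)
        residual = y_r - D_rn[:, support] @ coef
    x_hat = np.zeros(D.shape[1])
    x_hat[support] = coef / norms[support]   # undo the column normalisation
    return D @ x_hat, x_hat                  # full frame estimate and sparse vector

# Hypothetical test: a frame made of two cosine atoms, with 8 samples missing.
N = K = 64
t = np.arange(N)[:, None]
j = np.arange(K)[None, :]
D = np.cos(np.pi / K * (t + 0.5) * (j + 0.5))
s = 2.0 * D[:, 5] - 1.5 * D[:, 20]
reliable = np.ones(N, dtype=bool)
reliable[20:28] = False

s_hat, x_hat = omp_inpaint(s[reliable], D, reliable, n_atoms=2)
s_missing_hat = s_hat[~reliable]             # eq. (9)
```

Real audio frames are only approximately sparse, so a practical stopping rule would typically be based on the residual energy rather than on a fixed number of atoms.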
This formulation, including the notion of sparsity, was first introduced for image inpainting [22] with a global treatment using global transforms. Then, efforts were dedicated to working on local patches (similar to audio frames) and to introducing a learned dictionary to improve the inpainting results [26]; these results have been improved further [27] by modeling the problem better and by learning the dictionary directly from the corrupted image.
3.3 Dictionaries
We propose two options to choose a dictionary $D$ in which audio signals are sparse: the Discrete Cosine Transform dictionary, and a Gabor dictionary. Both are widely used for sparse models of audio signals [24,25,29]. Other fixed dictionaries such as multiscale DCT [30], or learned dictionaries [26] specific to particular inpainting tasks, may also be interesting options.
3.3.1 Discrete Cosine Transform (DCT) dictionary
The first option consists in a windowed Discrete Cosine Transform (DCT) overcomplete dictionary $D^c = \left[ d^c_0, \dots, d^c_{K_c - 1} \right]$, atom $j$ being defined for $0 \le j \le K_c - 1$ and $0 \le t \le N - 1$ as
$$d^c_j(t) \triangleq w_d(t) \cos\left( \frac{\pi}{K_c} \left( t + \frac{1}{2} \right) \left( j + \frac{1}{2} \right) \right), \tag{10}$$
where $K_c$ is the size of the DCT dictionary (i.e. the number of discrete frequencies) and $w_d$ is a weighting window set by the user. This choice is motivated by the wide use of windowed DCT atoms for sparse representation of audio signals [25]. However, the zero phase of $D^c$ atoms is not adapted to audio signals that are made up of sinusoidal components with initial phase distributed between $0$ and $2\pi$. As a consequence, the DCT model acts as a basis rather than as a synthesis model and the signals are not really sparse in $D^c$.
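Eq. (10) translates directly into a few lines of NumPy. In the sketch below, the function name `dct_dictionary`, the default rectangular weighting window and the sizes $N = 64$, $K_c = 128$ are illustrative assumptions.

```python
import numpy as np

def dct_dictionary(N, K_c, w_d=None):
    """Return the (N, K_c) matrix whose column j is the atom d^c_j(t) of
    eq. (10), for t = 0..N-1 and j = 0..K_c-1."""
    if w_d is None:
        w_d = np.ones(N)            # rectangular weighting window (assumption)
    t = np.arange(N)[:, None]
    j = np.arange(K_c)[None, :]
    return w_d[:, None] * np.cos(np.pi / K_c * (t + 0.5) * (j + 0.5))

D_c = dct_dictionary(N=64, K_c=128)   # twice overcomplete
B = dct_dictionary(N=8, K_c=8)        # square case with a rectangular window
```

When $K_c = N$ and $w_d$ is rectangular, the atoms form an orthogonal basis with squared norm $N/2$ (the DCT-IV basis up to scaling), which is a quick way to check an implementation.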
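3.3.2 Gabor dictionary
The second option aims at sparsely modeling arbitrary-phase sinusoidal components by using a Gabor dictionary $D^g = \left\{ d^g_{j,\varphi} \right\}_{(j,\varphi) \in \Gamma}$ in which the atoms are indexed by a continuous set $\Gamma \triangleq [\![ 0, K_g - 1 ]\!] \times [0, 2\pi[$ and are defined as
$$d^g_{j,\varphi}(t) \triangleq w_d(t) \cos\left( \frac{\pi}{K_g} \left( t + \frac{1}{2} \right) \left( j + \frac{1}{2} \right) + \varphi \right), \tag{11}$$
where $K_g$ is the size of the Gabor dictionary.
Note that in the current case of a continuously-indexed dictionary, eqs. (7), (8) and (9) are still valid if we define
$$D^g x_i = \sum_{\substack{(j,\varphi) \in \Gamma \\ x_i(j,\varphi) \neq 0}} d^g_{j,\varphi}\, x_i(j,\varphi), \tag{12}$$
where $x_i = \{ x_i(j,\varphi) \}_{(j,\varphi) \in \Gamma}$. Indeed, eq. (12) is a finite sum since only a few coefficients in the sparse representation vector $x_i$ are non-zero. The algorithmic aspects of this decomposition will be addressed in Sections 4.2 and 4.3.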
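Although the phase $\varphi$ is continuous, the identity $\cos(a + \varphi) = \cos a \cos\varphi - \sin a \sin\varphi$ shows that every atom of eq. (11) with frequency index $j$ lies in the two-dimensional span of a cosine atom and a sine atom at that frequency. The sketch below only verifies this identity numerically; it is not necessarily the procedure of Sections 4.2 and 4.3, and the function name `gabor_atom` and the parameter values are assumptions.

```python
import numpy as np

def gabor_atom(N, K_g, j, phi, w_d=None):
    """Return the atom d^g_{j,phi}(t) of eq. (11) for t = 0..N-1."""
    if w_d is None:
        w_d = np.ones(N)            # rectangular weighting window (assumption)
    t = np.arange(N)
    return w_d * np.cos(np.pi / K_g * (t + 0.5) * (j + 0.5) + phi)

N, K_g, j, phi = 64, 128, 10, 0.7   # hypothetical values
atom = gabor_atom(N, K_g, j, phi)

cos_atom = gabor_atom(N, K_g, j, 0.0)
sin_atom = gabor_atom(N, K_g, j, -np.pi / 2)   # cos(a - pi/2) = sin(a)

# Arbitrary-phase atom rebuilt from the zero-phase pair.
combo = np.cos(phi) * cos_atom - np.sin(phi) * sin_atom
```

This is one reason a search over the continuous phase can be reduced to a small least-squares problem per frequency index.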
4 Audio inpainting algorithms based on Orthogonal Matching Pursuit
For a given dictionary $D$, we use the Orthogonal Matching Pursuit algorithm to perform the inpainting of an audio frame, as presented in Section 4.1. Some dictionary-dependent algorithmic stages are then detailed in Sections 4.2 and 4.3. An extension of the algorithm specific to declipping is finally detailed in Section 4.4.