"Active Learning Strategies: a case study for detection of emotions in speech",

(1)

detetion of emotions in speeh

AlexisBondu,VinentLemaire,andBarbaraPoulain

R&DFraneTeleom,

TECH/EASY/TSI

2avenuePierreMarzin22300Lannion

Abstrat. Mahine learning indiates methods and algorithms whih

allowa model tolearn abehavior thanksto examples.Ative learning

gathersmethodswhihseletexamplesusedtobuildatrainingsetforthe

preditivemodel.Allthestrategiesaimtousethelessexamplesaspossi-

bleandtoseletthemostinformativeexamples.Afterhavingformalized

theativelearningproblemandafterhavingloateditintheliterature,

this artile synthesizes inthe rst part the mainapproahes of ative

learning.Takingintoaount emotionsinHuman-mahineinterations

anbehelpfulfor intelligentsystemsdesigning.Themaindiulty,for

theoneptionofallsenter'sautomatishuntingsystem,istheostof

datalabeling.Thelastsetionofthispaperproposetoreduethisost

thankstotwoativelearningstrategies.Thestudyisbasedonrealdata

resultingfromtheuseofavoalstokexhangeserver.

1 Introdution

Ativelearningmethodsomefromaparallelbetweenativeeduationalmeth-

ods and learning theory. The learner is from now astatistial model and not

a student. The interations between the student and the teaher orrespond

to theopportunity(possibility)to themodelto interat withahumanexpert.

The examples are situations used by the model to generate knowledge on the

problem.

Ative learningmethods allow the model to interat with its environment

byseletingthemoreinformative situations. Thepurposeisto trainamodel

whih usesas litleas possibleexamples. Theelaboration of thetraining set is

doneininterationwithahumanexperttomaximizeprogressofthemodel.The

modelmustbeabletodetetthemoreinformativeexamplesforitslearningand

toask totheexpert:what shouldbedonein thesesituations.

Thepurposeofthispaperistopresenttwomainativelearningapproahes

found in thestateof theart. These approahesare presentedin ageneri way

withoutonsideredakindofmodel(the onewhih learnsusingexamplesdeliv-

eredbytheexpertaftereveryofitsrequests).Othersapproahesexistbutthey

arenotpresentedinthispaperalthoughreferenesaregivenforthereaderwho

(2)

ageneriwayandestablishmathematialnotationsused.Theaimofthissetion

istoplaeativelearningamongothersstatistiallearningmethods(supervised,

unsupervised...).Thefourthsetionpresentsindetailstwomainativelearning

approahes. These two strategies are then used in the fth setion on a real

problem.Finallythelastsetionisadisussiononquestionopeninthispaper.

2 Ative Learning

2.1 General remarks

Theobjetiveofstatistiallearning(unsupervised,semi-supervised,supervised 1

)

is to inulate a behavior to a model using observations (examples) and a

learning algorithm.The observations are pointsof viewon the problem to be

resolvedand onstitutethelearningdata. Atthe endof thetrainingstage the

model hastogeneralizeitslearningto unseensituationsinareasonableway.

Forexamplelet'simagineamodelwhihtrytodetethappyandunhappy

people from passport photo. If themodel realizes good preditions for unseen

peopleduringitstrainingstagethenthemodelorretlygeneralize.

Charateristisof used data hange depending on the learning mode. Un-

supervised learning is a method of mahine learning where a model is t to

observations.Itisdistinguishedfrom supervisedlearningbythefatthat there

are no apriori outputson data. The learnerhas to disoveritself orrelations

betweenexampleswhihareshowntoit.Inaseoftheexampleabove(happy

/unhappy people),themodelistrainedusingpassportphotosdepriveoflabel

and hasno indiation on what we try to make it learn. Among unsupervised

learningmethodsonendslusteringmethods[2℄andassoiationrulesmethods

[3℄.

Semi-supervisedlearning[4℄isalass oftehniques that makesuseof both

labeledandunlabeleddatafortraining;typiallyasmallamountoflabeleddata

andalargeamountofunlabeleddata.Amongpossibleutilizationofthislearning

mode, weould distinguish (i)semi-supervisedlustering whih tries to group

similar instanes but using information given by the small amount of labeled

data[5℄and(ii)semi-supervisedlassiation[6℄whihisbasedrstonlabeled

datato elaboratearstmodelandthenunlabeleddatatoimprovethemodel.

Supervisedlearningis amahine learningtehniqueforreatingafuntion

fromtrainingdata.Thetrainingdataonsistofpairsofinputobjets(typially

vetors), anddesired outputs.The outputof the funtion anbe aontinuous

value(alledregression),oranpreditalasslabeloftheinputobjet(alled

lassiation).Thetask ofthe supervisedlearner isto preditthevalueofthe

funtionforanyvalidinputobjetafterhavingseenanumberoftrainingexam-

ples (i.e. pairsof input and target output). Inase of the illustrativeexample

above,exampleswouldbepassport photosassoiatedtolabelshappy orun-

happy.

1

(3)

lesspassivethantheothersdesribedabove.Thisstrategyallowsthemodel to

onstrut its own training set in interation with ahuman expert. Thelearn-

ing startswithfewdesiredoutputs (lasslabelsfor lassiationorontinuous

value for regression).Then, the model selets examples (without desired out-

puts)thatitonsidersthemoreinformativeandaskstothehumanexperttheir

desiredoutputs. Inaseoftheillustrativeexample,themodelasks lasslabels

of passport photos presented to the human expert. In this paper, we restrit

ativelearningtolassiationbutitis obviousthatourpresentationof ative

learningstrategiesanbetransposeforregression.

Ativelearningisdierentfromallotherslearningmethodsbeauseitinter-

atswithitsenvironment;theexamplesarenotrandomlyhosen.Ativelearning

strategiesallowthemodeltolearnfaster(thelearnerrihthebestperformanes

usinglessdata)onsideringrstthemoreinformativeexamples.Thisapproah

ismorespeiallyattrativewhendataareexpensivetoobtainortolabel.

2.2 Twopossiblesenarios

Thedistintionbetweenrawdataanddatadesriptors(whihareassoiated)is

important.Intheillustrativeexampleabove,therawdata arepassport photos

andthedatadesriptorsareattributesdesribingthephotos(pixel,luminosity,

ontrasts, et.). The model makes the predition of the lass happy or un-

happy to everyvetorof desriptors. Theelaborationof desriptorsfrom raw

dataisnotalwaysbijetive;sometimesitisimpossibletoomposerawdataus-

inglistofdesriptors.Adaptivesampling andseletivesampling,whiharethe

twomain senariostosetativelearning[7℄,userespetivelydatadesriptors

andrawdata.

Inthe ase of adaptive sampling [8℄ the model requires of expert labels

orrespondingtovetorsofdesriptors.Themodelisnotrestritedandanex-

ploreallthespaeofvariationsofthedesriptors,searhingareato besampled

morenely.Adaptivesamplinganposeprobleminitsimplementationwhenit

is diultto knowifthe vetorsof desriptors(generated bythe model)have

ameaning with respet to theinitial problem. Letus suppose that themodel

requiresthelabelassoiatedwiththevetor

[10, 4, 5..., 12]

^.^Does^this^vetor^or-

respondto aset ofdesriptorswhihrepresentapassportphoto,ahumanfae

photo,aowerorsomethingelse?

Intheaseofseletivesampling[9℄,themodelobservesonlyonerestrited

part of theuniverse materialized by training examplesstripped of label. Con-

sequently, theinput vetorsseleted bythe model alwaysorrespondto araw

data. The image of a bag of instanes for whih the model an ask labels

assoiated(totheexamplesin thebag)isusuallyused.Themodelrequiresthe

label assoiated withthevetor

[10, 4, 5..., 12]

^whih ^orresponds ^to^a^passport

photo.

(4)

ofunlabeledexamplesandforwhihlabelingisexpensive.Therefore,fromnow

thepointofviewofseletivesamplingisonlyonsidered.Inpratie,thehoie

of seletive oradaptivesampling depends primarily on theappliability where

themodelisauthorized,ornot,to generate newexamples.

2.3 Notations

M ∈ M

îs ^the ^preditive ^model ^whih îs ^trained ^thanks ^to ân âlgorithm

L

^.

X ⊆ R ⁿ

^representsâll^the^possibleînputêxamplesôf^the^modelând

x ∈ X

^is^a

partiularexamples.

Y

îs^the^setôf^the^possibleôutputs^(answers)ôf^the^model;

y ∈ Y

^a^lass ^label²^related(assoiated)to

x ∈ X

^.

Duringits training,themodel observes(see Figure1) onlyonepart

Φ ⊆ X

oftheuniverse.Thesetofexamplesislimitedandthelabelsassoiatedtothese

examplesare not neessarilyknown. The set of exampleswhere the labels are

known(atastepofthetrainingalgorithm)isalled

L x

ând^the^setôfêxamples

wherethelabelsareunknownisalled

U x

^with

Φ ≡ U x ∪ L x

^and

U x ∩ L x ≡ ∅

^.

Theoneptwhihislearnedanbeseenasafuntion,

f : X → Y

^,^with

f (x 1 )

isthedesiredanswerofthemodelfortheexample

x 1

^and

f b : X → Y

^the^obtained

answerofthemodel;anestimationoftheonept.Theelementsof

L x

^and^the

assoiatedlabelsonstituteatrainingset

T

^.^The^trainingêxamplesâre^pairsôf

inputvetorsanddesiredlabelssuhas

(x, f (x))

^:

∀x ∈ L x , ∃!(x, f (x)) ∈ T

^.

Fig.1.Notations

2

The word label is used here for a disrete value in lassiation problems or a

(5)

3.1 Introdution

TheproblemofseletivesamplingwasposedformallybyMuslea[10℄(seeAlgo-

rithm1).Itusesanutilityfuntion,

U tility(u, M)

^,^whihêstimates^theûtilityôf

anexample

u

^for^the^training^of^the^model

M

^.^Thanks^to^this^funtion,^the^model

presentstotheexpertexamplesforwhihithopesthegreatestimprovementof

itsperformanes.

TheAlgorithm1is generiinsofarasonlythefuntion

U tility(u, M)

^must

bemodiedtoexpressapartiularativelearningstrategy.Howtomeasurethe

interestofanexamplewillbedisussnow.

Considering:

• M

â^preditive^model^provided^withâ^trainingâlgorithm

L

• U x

^et

L x

^the^sets^of^examplesrespetivelynotlabeledandlabeled

• n

^the^desired^number^of^training^examples

• T

^the^training^set^with

kT k < n

• U : X × M → ℜ

^the^funtion^whihêstimates^theûtilityôfânêxample^for

thetrainingofthemodel

Repeat

(A)Trainthemodel

M

^thanks^to

L

^and

T

^(and^possibly

U x

^).

(B)Looktheexamplesuhas

q = argmax u∈U _x U(u, M)

(C)Withdraw

q

^of

U x

^and^ask^the^label

f(q)

^to^the^expert.

(D)Add

q

^to

L x

^and^add

(q, f (q))

^to

T

until

kT k < n

Algorithm1:Seletivesampling, Muslea2002

3.2 Unertainty sampling

Unertainty sampling is an ative learning strategy [11,12℄ whih is based on

ondene that themodel hasinits preditions.Themodel usedmust beable

to estimate the reliability of its answers, to provide the probabilities,

y j

^, ^to

observeeahlass(

j

⁾^for^an^examples

u

^.^Thus^the^model^an^make^a^predition

hoosingthemostprobablelassfor

u

^.^The^hoie^of^new^examples^to^be^labeled

proeedsin twosteps:

•

^the^modelâvailableât^theîteration

t

îsûsed^to^predit^the^labelsôf^the

unlabeledexamples;

•

êxamples^with^the^moreûnertain^preditionâre^seleted.

The unertainty of a predition an also be dened using a threshold of

(6)

0and those whih will be lassied1.Theloser ananswerof themodel is to

thethresholdofdeision,themoreunertainisthedeision.

Thisrstapproahhastheadvantagetobeintuitive,easytoimplementand

fast.Theunertaintysamplingshowsitslimitshoweverwhentheproblemtobe

solved is not separable by the model. Indeed, this strategy will tend to selet

the examplesto be labeled in mixture zones,where there is nothing anymore

tolearn.

Fig.2.Abinarylassiationproblem:theboundaryplottedrepresentsthethreshold

ofdeision.Levelsetsareplottedtooandtheirdistanesfromtheboundarylinesindi-

atetheunertainty.Unlabeleddatalosetotheboundarylinearethemoreunertain

andwillbeseletedtobelabeledbytheexpert.

3.3 Riskredution

Thepurposeofthisapproahistoreduethegeneralizationerror,

E(M)

^,^of^the

model [13℄.It hooses examplesto belabeled so as to minimize this error. In

pratiethiserrorannotbealulatedbeausethedistributionoftheexamples,

X

^,îs ûnknown.^Howeverîtân^be^write, ât ânîteration

^t

^, ^using^a^loss^funtion

(

Loss(M ^t , x)

⁾^whihêvaluates^theêrrorôf^the^model^forâ^givenêxample

x ∈ X

suhas:

E(M ^t ) = Z

X

Loss(M ^t , x)P (x)dx

The same model at the next iteration

t + 1

^is ^dened ^as:

M ^t+1 _(x ⋄ ,y ^⋄ )

^. ^This

modeltakesintoaountanewtrainingexample:

(x ^⋄ , y ^⋄ )

^.^F^or^real^problems,^the

outputofthemodel

y ^⋄

^is^unknown^sine

x ^⋄

îsâ^not^labeled^data.^Tôêstimate^the

generalizationerrorat

t+1

^,^all^thepossibilitiesofthe

Y

^set^have^to^be^onsidered

(7)

E(M ^t+1 _x ⋄ ) = Z

X

Z

Y

P (y|x ^⋄ )Loss(M ^t+1 _(x ⋄ ,y) , x)P(x)dxdy

This strategies selets the example

q

^whih ^minimizes

E(M ^t+1 _x ⋄ )

^. ^One ^la-

beled,thisexampleisinorporatedto thetrainingset.Stepbystepthis proe-

duretriesto elaborateanoptimaltrainingset.

Niholas Roy [9℄ shows how to bring this strategy into play sine all the

elementsof

X

âre^not^known.^Heûsesânûniform^prior^for

P (x)

^whih^gives^:

E(M b ^t ) = 1

kL x k

X

i=1

Loss(M ^t , x i )

This strategy, where dierent lossfuntions anbeused, is summarized in

the algorithm 2. The model is, for all examples

i

^, ^trained ^several ^times ⁽

||Y||

times), to estimate

E(M b ^t+1 _(x

i ,y j ) )

^. ^The ^example

i

^whih ^minimizes ^the ^expeted

lossfuntion(

E(M b ^t+1 _(x

i ) )

⁾^will^beinorporatedinthetrainingset.

Considering:

• M

â^preditive^model^provided^withâ^trainingâlgorithm

L

• U x

^and

L x

^the^sets^of^examplesrespetivelynotlabeledandlabeled

• n

^the^desired^number^of^training^examples

• T

^the^training^set^with

kT k < n

• Y

^the^label^set^whihân^be^given^to^theêxamplesôf

U x

• Loss : M → ℜ

^thegeneralization error

• Err : U x × M → ℜ

^the ^expetedgeneralization error for the model

M

trainedwithanadditional example,

T ∪ (x i , f(x i ))

Repeat

(A)Trainthemodel

M

^thanks^to

L

^and

T

For allexamples

x i ∈ U x

^do

For all label

y j ∈ Y

^do

i)Trainthemodel

M i,j

^thanks^to

L

^and

(T ∪ (x i , y j ))

ii)Computethegeneralization error

E(M b ^t+1 _(x

i ,y _j ) )

endFor

Computethegeneralizationerror

E(M b ^t+1 _x _i ) = P

y j ∈ Y E(M b ^t+1 _(x

i ,y _j ∗) ).P(y j |x i )

endFor

(B)Lookfortheexample

q = argmin u∈U _x E(M b ^t+1 x _i )

(C)Withdraw

q

^of

U x

^ans^ask^the^label

f(q)

^to^the^expert.

(D)Add

q

^to

L x

^and^add

(q, f (q))

^to

T

until

kT k < n

(8)

thegeneralizationerror(

E(M)

⁾^using^the^empirial^risk:

E(M) = b R(M) = X N

n=1

X

y j ∈ Y

1

{f(l n )6=y j } P (y j |l n )P(l n )

^with

l n ∈ L x

where

f

^is^the^model^whih^estimates^theprobabilitythatanexamplebelong toalass(aParzenwindow[15℄in[14℄),

P(y i |l n )

^the^realprobabilitytoobserve the lass

y i

^for ^the ^example

l n ∈ L x

^, ¹ ^the ^indiating ^funtion ^equal ^to

1

^if

f (l n >) 6= y i

^and^equal^to

0

^if^not.^Therefor

R(M)

^is^the^sum^of^theprobabilities

that themodelmakesabaddeisiononthetrainingset(

L x

^).

Usinganuniformpriortoestimate

P (l n )

^:

R(M) = ˆ 1

N X N

n=1

X

y _j ∈Y

1

{f(l n )6=y j } P ˆ (y j |l n )

Theexpetedost foranysingleexample

u

⁽

u ∈ U x

⁾^added^to ^the^training

set (forbinarylassiationproblem)isthen:

R(M ˆ ^+u ) = X

y _j ∈Y

P(y ˆ j |u) ˆ R(M ^+(u,y ^j ⁾ )

^with

u ∈ U x

3.4 Disussion

Both strategiesdesribedabove arenot theonly ones whih exist. Thereader

an see athird main strategy whih is based on Queryby Committee [16,17℄

andafourthonewhereauthorsfousonamodelapproahtoativelearningin

aversion-spaeofonepts[1820℄.

4 Appliation of ative learning to detetion of emotion

in speeh

4.1 Introdution

Thanks to reent tehniquesof speeh proessing, many automatiphone all

enters appear. Thesevoalserversareused byustomersto arryoutvarious

tasks onversing with a mahine. Companies aim to improve their ustomer's

satisfationbyrediretingthemtowardsahumanoperator,intheeventofdi-

ulty.Theshuntingofunsatisedusersamountsdetetingthenegativeemotions

intheirdialogueswiththemahine,undertheassumptionthataproblemofdi-

aloguegeneratesapartiularemotionalstatein thesubjet.

Thedetetion of expressed emotions in speeh is generally onsidered asa

supervisedlearning problem. The detetion of emotions is limited to a binary

lassiationsinetakingintoaountmorelassesrisesproblemoftheobjetiv-

ityoflabelingtask [21℄.The aquisitionandthelabelingofdataare expensive

in this framework. Ative learning an redue this ost by labeling only the

(9)

Thisstudyisbasedonapreviouswork[22℄whihharaterizesvoalexhanges,

inoptimalway,forthelassiationofexpressedemotionsinspeeh.Theobje-

tiveisto ontrolthedialoguebetweenusersandavoal server.Morepreisely,

thisstudy dealstherelevaneofvariablesdesribingdata,aordingtothede-

tetionofemotions.

Theuseddataresultfromanexperimentinvolving32userswhotestastok

exhangeservieimplementedonavoalserver.Aordingtotheuserspointof

view,thetestonsistsinmanagingavirtualwalletofstokoptions,thegoalis

to realizethe strongestprot. Theobtainedvoal traesonstitutetheorpus

of this study: 5496 speeh turns exhanged with the mahine. Speeh turns

are haraterizedby 200 aoustivariables, desribing variations of the sound

intensity, variationsofvoieheight,frequenyof eloution...et. Dataare also

haraterized by 8 dialogialvariables desribing the rankof a speeh turn in

agivendialogue (a dialogueontains several speeh turn), thedurationof the

dialogue...Eahspeehturn ismanuallylabeledasarryingpositiveornegative

emotions.

Thesubsetofthemostinformativevariableswithrespettothedetetionof

expressed emotions in speeh isgiventhanksto anaiveBayesianseletor [23℄.

Atthebeginningofthisproess(theseletionofthemostinformativevariables),

thesetofattributesisempty.Theattributewhihmostimprovesthepreditive

qualityofthemodelisthenaddedateahiteration.Thealgorithm stopswhen

theadditionof attributesdoesnotimproveanymorethequalityofthemodel.

Finally,20variableswereseletedtoharaterizevoalexhanges.Inthisartile,

used dataresultfrom thesameorpusand from thispreviousstudy. So,every

speehturnis haraterizedby20variables.

4.3 The hoie ofthe model

Thelargerangeofmodelsabletosolvelassiationproblems(andsometimes

the great number of parametersuseful to use them) may representdiulties

to measure the ontribution of a learning strategy. A Parzen window, with a

Gaussian kernel[15℄, is used in experimentsbelow sinethis preditivemodel

usesasingleparameterandisableto workwithfewexamples.Theoutput of

thismodelisanestimateoftheprobabilitytoobservethelabel

y j

onditionally totheinstane

u

^:

P ˆ (y j |u) = P N

n=1

¹

{f(l n )=y j } K(u, l n ) P N

n=1 K(u, l n ) avec l n , ∈ L x et u ∈ U x ∪ L x

⁽¹⁾

where

K(u, l n ) = e ^||u−ln||

2 2 σ 2

Theoptimal value (

σ ²

^=0.24) ^of ^the ^kernel ^parameter^was^found ^thanks ^to

(10)

rameter.Theresultsobtainedbythismodel(usingthewhole oftrainingdata)

aresimilarwiththepreviousresultsobtainedbyanaiveBayesianlassier[22℄.

Consequently,Parzenwindowsareonsideredsatisfyingandvalidforthefollow-

ing ative learning proedures.Kernel methods and loser neighbors methods

areusuallyusedin lassiationofexpressedemotionsin speeh[25℄.

The model must be able to assign a label

f ˆ (u)

^to ^an ^input ^data

u

^, ^so ^a

deisionthresholdnoted

T h(L x )

îsâlulatedât êah îteration.^This^threshold

minimizes the error of the model 3

on the available training set. The label at-

tributed is

f ˆ (u n ) = 1

^if

{ P(y ˆ 1 |u n ) > T h(L x )}

^, ^else

f ˆ (u n ) = 0

^. ^Sine^the^single

parameteroftheParzenwindowisxed, thetrainingstageisreduedtoount

instanes(within themeaningoftheGaussiankernel).Thestrategiesofexam-

ples seletion, withoutbeinginuened bythe trainingof the model, are thus

omparable.

4.4 Used Ative Learning strategies

TwoAtivelearningstrategiesare onsideredin thispaper;theativelearning

strategy whih tries to redue the generalization error of the model and the

strategyonsistsinseletingtheinstaneforwhih thepreditionofthemodel

ismostunertainhavebeentested.

Forthe rststrategy theParzen window estimates

P (y i |l n )

^. ^The^empirial

riskis approximatedadopting auniform apriorion the

P (l n )

^. ^The^purpose ^is

to selet the unlabeled instane

u i ∈ U x

^whih ^will ^minimize ^the ^risk ^of ^the

next iteration.

R(M ^+u ⁿ )

^the ^expeted ^risk ^resulting^from ^the^labeling^of ^the

instane

u n

^(iteration

t + 1

⁾îsêstimated.Âvailable^labeled^dataâreûsed ^to^do

thisestimationwhentheassumption

f (u n ) = y 1

^[resp

f (u n ) = y 0

^℄ ^to^estimate

R(M ˆ ^+(u ⁿ ^,y ¹ ⁾ )

^[resp

R(M ˆ ^+(u ⁿ ^,y ⁰ ⁾ )

^℄ ^is ^done.

For the seond strategy the unertainty of a predition is maximum when

theoutputprobabilityofthemodelapproahesthedeisionthreshold.

Apartfrom these two ative strategies, astohasti approah whih uni-

formlyseletstheexamplesaordingtotheirprobabilitydistributionisonsid-

ered.Thislastapproahplayaroleofrefereneusedtomeasuretheontribution

oftheativestrategies.

4.5 Results

Thepresentedresultsomefromseveralexperimentsonpreviouslearningstrate-

gies.Eahexperimenthasbeendonevetimes 4

.Atthebeginningoftheexper-

iments, thetraining set is onlyonstituted by two examples(one positiveand

onenegative)seletedrandomly.Ateahiteration,tenexamplesareseletedto

3

TheusederrormeasurementistheBalanedErrorRate,formoredetailsseesetion

4.5

4

thenathesontheurvesofthegure3orrespondto4times thevarianeof the

results(

±2σ

^).

(11)

here,is unbalaned:there are92%ofpositiveorneutral emotions and8% of

negativeemotions.Toobserveorretlythelassiationprots(whenadding

labeledexamples),themodelevaluationisdoneusingtheareaunderROCurve

(AUC)on thetest set 5

.A ROCurveis alulatedfrom the detetionrate of

asinglelass. Conseently, weuse thesumofthe AUCsweightedbyreferene

lass'sprevaleneinthedata.

0.6 0.65 0.7 0.75 0.8 0.85 0.9

0 200 400 600 800 1000 1200

AUC

Random Strategy Uncertainty Maximization Error Reduction

Fig.3.Fousoftheresultsonthetestusing[0:1200℄trainingexamples

Theriskredutionisthestrategywhihmaximizesthequalityofthemodel

for a number of training examples in the range [2:100℄. Between100 and 700

the strategybased onunertaintywins. After600 trainingexamplesthe three

strategiesonvergetotheoptimalAUC(AreaUnderRoCurve).

Thetwoativestrategiesallowobtainingfasterthantherandomstrategythe

optimalresult(the optimalAUCis0.84usingthewhole trainingset).Theuse

ofativelearningis positivein thisreal problem.Howevertheresultsobtained

raisequestionswhihwill detailedinthenextsetion.

5

(12)

Thispapershowstheinterestofativelearningforaeldwhereaquisitionand

labeling of data are partiularly expensive. Obtained results show that ative

learningisrelevantforthedetetionofexpressedemotionsinspeeh.Butwhat-

everthe strategyonsidered (eventhe twostrategiesevokedin setion 3.4but

notderailedin thispaper)severalquestionexistandanberaised:

evaluation -Thequality ofan ative strategyisusually representedbya

urveassessingtheperformaneofthemodelversusthenumberoftraining

exampleslabeled (see Figure 3). The performane riterion used an take

several dierent ways aordingto theproblem. This typeof urveallows

onlyomparisons between strategiesin apuntual way, i.e. for apoint on

the urve (a given number of training examples). If two urves pass eah

other(asintheFigure3,itisimpossibletodetermineifastrategyisbetter

than another (on the total set of training examples). The elaboration of

ariterion whih measures the ontributionof a strategyompared to the

randomstrategyonthewholedatasetshouldbeinteresting.Thispointwill

bedisussedinafuturepaper.

testset-Ativelearningstrategiesare,often,usedwhendataaquisitionis

expensive.Therefore,inpratie,atestsetisnotavailable(otherwiseitan

beused to thetraining) andtheevaluation ofthemodelduring astrategy

isdiultto implement.

stopping riterion -Themaximal numberof examplesto labeled,oran

estimationoftheprogressof themodel,anbeused tostopthealgorithm.

Thisisverylinktotheuseofatestsetorthemodelemployed.Forexample

in theFigure 3thestrategy basedonthe riskgivesthe sameresultswhen

15 examples have been labeled than results using all the available data.

Inthis ase the ost using 15 examplesand 600 will benot the same... A

goodriterionshouldbeindependentofthemodelandofatestset.Atual

experiments (not yet published) willallow usto propose ariterionof this

typeattheendof2007.

number ofexamplesto belabeled:thestateoftheartseemstoinor-

porateanonlyoneexampleateahstepofthestrategy.Butinrealasethe

expertisahumanandwhenthemodelneedstimetolearnateahiteration

ofthestrategythisouldbenoteient.Sometimesmorethanoneexample

mustbeinorporated.Thisaspethasbeenalitlebitstudied in[26℄but it

hasto bemoreanalyseinthefuture.

unertainenvironment-Ifananswerouldbegiventothepointsabove

thenativestrategiesouldbeusedforon-line learninginanunertainen-

vironment.Forexampletotagpartofgraph(graphhereissoialnetwork).

Whenwritingthispaperwehopethatthispointwillaeptedandinorpo-

ratedin apropositionsenttoaFrenhProjet(andnanedbytheFrenh

NationalAgenyofResearh(ANR))groupingindustrialanduniversities.

Generally,ativelearningstrategiesestimatetheutilityoftrainingexamples.

(13)

Thisapproahwouldbeableto onsidernonstationaryproblemsanditisable

totrainamodelwhih adaptsitselftothevariationsof theobservedsystem.

Forthedetetionofexpressedemotionsinspeehouldbetreatedbyadouble

strategyreduingtheostlinkedtothedata:(i)avariablesseletionallowingto

preserveonlytheneessaryandsuientharateristisforlassiation;(ii)an

examplesseletionallowingtopreserveonly usefulinstanes for training.This

willbeexploredinfuture work.

Referenes

1. Harmon,M.: Reinforementlearning:atutorial. http://eureka1.aa.wpafb.af.mil/

rltutorial/(1996)

2. Jain,A.K.,Murty,M.N.,Flynn,P.J.: Datalustering:areview. ACMComputing

Surveys31(3) (1999)264323

3. Jamy, I., Jen, T.Y., Laurent, D., Loizou, G., Sy, O.: Extration de règles

d'assoiation pour la prédition de valeurs manquantes. Revue Afriaine de la

ReherheenInformatiqueetMathématiqueAppliquéeARIMASpéialCARI04

(2005)103124

4. Chapelle, O., Shölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press,

Cambridge,MA(inpress)(2006)http://www.kyb.tuebingen.mpg.de /ssl-book /

ssl_to.pdf.

5. Cohn,D.,Caruana,R.,MCallum,A.: Semi-supervisedlusteringwithuserfeed-

bak. TehnialReport1892,CornellUniversity(2003)

6. Chapelle,O., Zien, A.: Semi-supervisedlassiation by low density separation.

InProeedingsoftheTenthInternationalWorkshoponArtiialIntelligeneand

Statistis(2005)

7. Castro,R., Willett, R.,Nowak, R.: Faster rateinregressionvia ative learning.

In:NIPS(NeuralInformationProessingSystems),Vanouver(2005)

8. Singh,A.,Nowak,R.,Ramanathan,P.:Ativelearningforadaptivemobilesensing

networks. In:IPSN'06: Proeedings ofthe fth international onferene on In-

formationproessinginsensornetworks,NewYork,NY,USA,ACMPress(2006)

6068

9. Roy,N.,MCallum,A.: Towardoptimalativelearningthroughsamplingestima-

tionof errorredution. In:Pro. 18thInternationalConf. onMahine Learning,

MorganKaufmann,SanFraniso,CA(2001)441448

10. Muslea,I.:AtiveLearningWithMultipleView.Phdthesis,Universityofsouthern

alifornia (2002)

11. Lewis,D.,Gale, A.: A sequentialalgorithmfortrainingtext lassiers. InCroft,

W.B., vanRijsbergen, C.J., eds.: Proeedings of SIGIR-94, 17th ACMInterna-

tionalConfereneonResearhandDevelopmentinInformationRetrieval,Dublin,

SpringerVerlag,Heidelberg(1994)312

12. Thrun,S.B.,Möller,K.: Ativeexplorationindynamienvironments. InMoody,

J.E.,Hanson,S.J.,Lippmann,R.P.,eds.:AdvanesinNeuralInformationProess-

ingSystems.Volume4.,MorganKaufmannPublishers,In.(1992)531538

13. Cohn,D.A.,Ghahramani,Z.,Jordan,M.I.:Ativelearningwithstatistialmodels.

In Tesauro, G., Touretzky, D., Leen, T., eds.: Advanes in Neural Information

(14)

supervisedlearning usinggaussian eldsandharmoni funtions. In:ICML(In-

ternationalConfereneonMahine Learning),Washington(2003)

15. Parzen,E.: Onestimationofaprobabilitydensityfuntionandmode. Annalsof

MathematialStatistis33(1962)10651076

16. Freund,Y.,Seung,H.S.,Shamir,E.,Tishby,N.:Seletivesamplingusingthequery

byommitteealgorithm. MahineLearning28(2-3)(1997)133168

17. Seung,H.S.,Opper,M.,Sompolinsky,H.:Querybyommittee.In:Computational

LearningTheory.(1992)287294

18. Dasgupta,S.: Analysisofgreedyativelearningstrategy. In:NIPS(NeuralInfor-

mationProessingSystems),SanDiego(2005)

19. Cohn,D.A.,Atlas,L.,Ladner,R.E.:Improvinggeneralizationwithativelearning.

MahineLearning15(2)(1994)201221

20. Tong,S., Koller,D.: Supportvetormahineativelearning withappliations to

textlassiation.InLangley,P.,ed.:ProeedingsofICML-00,17thInternational

Conferene on Mahine Learning, Stanford, US, Morgan KaufmannPublishers,

SanFraniso,US(2000)9991006

21. Lisombe,J.,Riardi, G., Hakkani-Tür, D.: Using ontext to improveemotion

detetioninspokendialogsystems.In:InterSpeeh,Lisbon(2005)

22. Poulain,B.: Séletiondevariablesetmodélisationd'expressionsd'émotionsdans

desdialogueshommes-mahine.In:EGC(ExtrationetGestiondeConnaissane),

Lille.+TehnialReportavalaiblehere:http://perso.rd.franeteleom.fr/lemaire

(infrenh)(2006)

23. Boullé, M.: An enhaned seletive naive bayesmethod withoptimal disretiza-

tion. InGuyon, I.,Gunn,S.,Nikravesh,M., Zadeh,L.,eds.:Featureextration,

foundationsandAppliation. Springer(August2006)499507

24. Chappelle,O.: Ativelearningfor parzenwindows lassier. In:AI &Statistis,

Barbados(2005)4956

25. Guide, V., Rakotomamonjy, Canu, S.: Méthode à noyaux pour l'identiation

d'émotion. In: RFIA (Reonnaissane des Formes et Intelligene Artiielle).

(2003)

26. Bondu,A.,Lemaire, V.: Etudedel'inuenedunombred'exemplesàétiquetter

dans une proédure d'apprentissage atif. submitted to CAP 2006 (Conferene

franophonesurl'apprentissageautomatique)

"Active Learning Strategies: a case study for detection of emotions in speech",

[10, 4, 5..., 12]

[10, 4, 5..., 12]

M ∈ M

L

X ⊆ R n

x ∈ X

Y

y ∈ Y

x ∈ X

Φ ⊆ X

L x

U x

Φ ≡ U x ∪ L x

U x ∩ L x ≡ ∅

f : X → Y

f (x 1 )

x 1

f b : X → Y

L x

T

(x, f (x))

∀x ∈ L x , ∃!(x, f (x)) ∈ T

U tility(u, M)

u

M

U tility(u, M)

• M

L

• U x

L x

• n

• T

kT k < n

• U : X × M → ℜ

M

L

T

U x

q = argmax u∈U x U(u, M)

q

U x

f(q)

q

L x

(q, f (q))

T

kT k < n

y j

j

u

u

•

t

•

E(M)

X

t

Loss(M t , x)

x ∈ X

E(M t ) = Z

X

Loss(M t , x)P (x)dx

t + 1

M t+1 (x ⋄ ,y ⋄ )

(x ⋄ , y ⋄ )

y ⋄

x ⋄

t+1

Y

E(M t+1 x ⋄ ) = Z

X

Z

Y

P (y|x ⋄ )Loss(M t+1 (x ⋄ ,y) , x)P(x)dxdy

q

E(M t+1 x ⋄ )

X

P (x)

E(M b t ) = 1

X ⊆ R ⁿ

q = argmax u∈U _x U(u, M)

^t

Loss(M ^t , x)

E(M ^t ) = Z

Loss(M ^t , x)P (x)dx

M ^t+1 _(x ⋄ ,y ^⋄ )

(x ^⋄ , y ^⋄ )

y ^⋄

x ^⋄

E(M ^t+1 _x ⋄ ) = Z

P (y|x ^⋄ )Loss(M ^t+1 _(x ⋄ ,y) , x)P(x)dxdy

E(M ^t+1 _x ⋄ )

E(M b ^t ) = 1

Loss(M ^t , x i )

E(M b ^t+1 _(x

E(M b ^t+1 _(x

E(M b ^t+1 _(x

i ,y _j ) )

E(M b ^t+1 _x _i ) = P

y j ∈ Y E(M b ^t+1 _(x

i ,y _j ∗) ).P(y j |x i )

q = argmin u∈U _x E(M b ^t+1 x _i )

y _j ∈Y