detetion of emotions in speeh
AlexisBondu,VinentLemaire,andBarbaraPoulain
R&DFraneTeleom,
TECH/EASY/TSI
2avenuePierreMarzin22300Lannion
Abstrat. Mahine learning indiates methods and algorithms whih
allowa model tolearn abehavior thanksto examples.Ative learning
gathersmethodswhihseletexamplesusedtobuildatrainingsetforthe
preditivemodel.Allthestrategiesaimtousethelessexamplesaspossi-
bleandtoseletthemostinformativeexamples.Afterhavingformalized
theativelearningproblemandafterhavingloateditintheliterature,
this artile synthesizes inthe rst part the mainapproahes of ative
learning.Takingintoaount emotionsinHuman-mahineinterations
anbehelpfulfor intelligentsystemsdesigning.Themaindiulty,for
theoneptionofallsenter'sautomatishuntingsystem,istheostof
datalabeling.Thelastsetionofthispaperproposetoreduethisost
thankstotwoativelearningstrategies.Thestudyisbasedonrealdata
resultingfromtheuseofavoalstokexhangeserver.
1 Introdution
Ativelearningmethodsomefromaparallelbetweenativeeduationalmeth-
ods and learning theory. The learner is from now astatistial model and not
a student. The interations between the student and the teaher orrespond
to theopportunity(possibility)to themodelto interat withahumanexpert.
The examples are situations used by the model to generate knowledge on the
problem.
Ative learningmethods allow the model to interat with its environment
byseletingthemoreinformative situations. Thepurposeisto trainamodel
whih usesas litleas possibleexamples. Theelaboration of thetraining set is
doneininterationwithahumanexperttomaximizeprogressofthemodel.The
modelmustbeabletodetetthemoreinformativeexamplesforitslearningand
toask totheexpert:what shouldbedonein thesesituations.
Thepurposeofthispaperistopresenttwomainativelearningapproahes
found in thestateof theart. These approahesare presentedin ageneri way
withoutonsideredakindofmodel(the onewhih learnsusingexamplesdeliv-
eredbytheexpertaftereveryofitsrequests).Othersapproahesexistbutthey
arenotpresentedinthispaperalthoughreferenesaregivenforthereaderwho
ageneriwayandestablishmathematialnotationsused.Theaimofthissetion
istoplaeativelearningamongothersstatistiallearningmethods(supervised,
unsupervised...).Thefourthsetionpresentsindetailstwomainativelearning
approahes. These two strategies are then used in the fth setion on a real
problem.Finallythelastsetionisadisussiononquestionopeninthispaper.
2 Ative Learning
2.1 General remarks
Theobjetiveofstatistiallearning(unsupervised,semi-supervised,supervised 1
)
is to inulate a behavior to a model using observations (examples) and a
learning algorithm.The observations are pointsof viewon the problem to be
resolvedand onstitutethelearningdata. Atthe endof thetrainingstage the
model hastogeneralizeitslearningto unseensituationsinareasonableway.
Forexamplelet'simagineamodelwhihtrytodetethappyandunhappy
people from passport photo. If themodel realizes good preditions for unseen
peopleduringitstrainingstagethenthemodelorretlygeneralize.
Charateristisof used data hange depending on the learning mode. Un-
supervised learning is a method of mahine learning where a model is t to
observations.Itisdistinguishedfrom supervisedlearningbythefatthat there
are no apriori outputson data. The learnerhas to disoveritself orrelations
betweenexampleswhihareshowntoit.Inaseoftheexampleabove(happy
/unhappy people),themodelistrainedusingpassportphotosdepriveoflabel
and hasno indiation on what we try to make it learn. Among unsupervised
learningmethodsonendslusteringmethods[2℄andassoiationrulesmethods
[3℄.
Semi-supervisedlearning[4℄isalass oftehniques that makesuseof both
labeledandunlabeleddatafortraining;typiallyasmallamountoflabeleddata
andalargeamountofunlabeleddata.Amongpossibleutilizationofthislearning
mode, weould distinguish (i)semi-supervisedlustering whih tries to group
similar instanes but using information given by the small amount of labeled
data[5℄and(ii)semi-supervisedlassiation[6℄whihisbasedrstonlabeled
datato elaboratearstmodelandthenunlabeleddatatoimprovethemodel.
Supervisedlearningis amahine learningtehniqueforreatingafuntion
fromtrainingdata.Thetrainingdataonsistofpairsofinputobjets(typially
vetors), anddesired outputs.The outputof the funtion anbe aontinuous
value(alledregression),oranpreditalasslabeloftheinputobjet(alled
lassiation).Thetask ofthe supervisedlearner isto preditthevalueofthe
funtionforanyvalidinputobjetafterhavingseenanumberoftrainingexam-
ples (i.e. pairsof input and target output). Inase of the illustrativeexample
above,exampleswouldbepassport photosassoiatedtolabelshappy orun-
happy.
1
lesspassivethantheothersdesribedabove.Thisstrategyallowsthemodel to
onstrut its own training set in interation with ahuman expert. Thelearn-
ing startswithfewdesiredoutputs (lasslabelsfor lassiationorontinuous
value for regression).Then, the model selets examples (without desired out-
puts)thatitonsidersthemoreinformativeandaskstothehumanexperttheir
desiredoutputs. Inaseoftheillustrativeexample,themodelasks lasslabels
of passport photos presented to the human expert. In this paper, we restrit
ativelearningtolassiationbutitis obviousthatourpresentationof ative
learningstrategiesanbetransposeforregression.
Ativelearningisdierentfromallotherslearningmethodsbeauseitinter-
atswithitsenvironment;theexamplesarenotrandomlyhosen.Ativelearning
strategiesallowthemodeltolearnfaster(thelearnerrihthebestperformanes
usinglessdata)onsideringrstthemoreinformativeexamples.Thisapproah
ismorespeiallyattrativewhendataareexpensivetoobtainortolabel.
2.2 Twopossiblesenarios
Thedistintionbetweenrawdataanddatadesriptors(whihareassoiated)is
important.Intheillustrativeexampleabove,therawdata arepassport photos
andthedatadesriptorsareattributesdesribingthephotos(pixel,luminosity,
ontrasts, et.). The model makes the predition of the lass happy or un-
happy to everyvetorof desriptors. Theelaborationof desriptorsfrom raw
dataisnotalwaysbijetive;sometimesitisimpossibletoomposerawdataus-
inglistofdesriptors.Adaptivesampling andseletivesampling,whiharethe
twomain senariostosetativelearning[7℄,userespetivelydatadesriptors
andrawdata.
Inthe ase of adaptive sampling [8℄ the model requires of expert labels
orrespondingtovetorsofdesriptors.Themodelisnotrestritedandanex-
ploreallthespaeofvariationsofthedesriptors,searhingareato besampled
morenely.Adaptivesamplinganposeprobleminitsimplementationwhenit
is diultto knowifthe vetorsof desriptors(generated bythe model)have
ameaning with respet to theinitial problem. Letus suppose that themodel
requiresthelabelassoiatedwiththevetor
[10, 4, 5..., 12]
.Doesthisvetoror-respondto aset ofdesriptorswhihrepresentapassportphoto,ahumanfae
photo,aowerorsomethingelse?
Intheaseofseletivesampling[9℄,themodelobservesonlyonerestrited
part of theuniverse materialized by training examplesstripped of label. Con-
sequently, theinput vetorsseleted bythe model alwaysorrespondto araw
data. The image of a bag of instanes for whih the model an ask labels
assoiated(totheexamplesin thebag)isusuallyused.Themodelrequiresthe
label assoiated withthevetor
[10, 4, 5..., 12]
whih orresponds toapassportphoto.
ofunlabeledexamplesandforwhihlabelingisexpensive.Therefore,fromnow
thepointofviewofseletivesamplingisonlyonsidered.Inpratie,thehoie
of seletive oradaptivesampling depends primarily on theappliability where
themodelisauthorized,ornot,to generate newexamples.
2.3 Notations
M ∈ M
is the preditive model whih is trained thanks to an algorithmL
.X ⊆ R n representsallthepossibleinputexamplesofthemodelandx ∈ X
isa
partiularexamples.
Y
isthesetofthepossibleoutputs(answers)ofthemodel;y ∈ Y
alass label2related(assoiated)tox ∈ X
.Duringits training,themodel observes(see Figure1) onlyonepart
Φ ⊆ X
oftheuniverse.Thesetofexamplesislimitedandthelabelsassoiatedtothese
examplesare not neessarilyknown. The set of exampleswhere the labels are
known(atastepofthetrainingalgorithm)isalled
L x andthesetofexamples
wherethelabelsareunknownisalled
U xwithΦ ≡ U x ∪ L xandU x ∩ L x ≡ ∅
.
U x ∩ L x ≡ ∅
.Theoneptwhihislearnedanbeseenasafuntion,
f : X → Y
,withf (x 1 )
isthedesiredanswerofthemodelfortheexample
x 1andf b : X → Y
theobtained
answerofthemodel;anestimationoftheonept.Theelementsof
L x andthe
assoiatedlabelsonstituteatrainingset
T
.Thetrainingexamplesarepairsofinputvetorsanddesiredlabelssuhas
(x, f (x))
:∀x ∈ L x , ∃!(x, f (x)) ∈ T
.Fig.1.Notations
2
The word label is used here for a disrete value in lassiation problems or a
3.1 Introdution
TheproblemofseletivesamplingwasposedformallybyMuslea[10℄(seeAlgo-
rithm1).Itusesanutilityfuntion,
U tility(u, M)
,whihestimatestheutilityofanexample
u
forthetrainingofthemodelM
.Thankstothisfuntion,themodelpresentstotheexpertexamplesforwhihithopesthegreatestimprovementof
itsperformanes.
TheAlgorithm1is generiinsofarasonlythefuntion
U tility(u, M)
mustbemodiedtoexpressapartiularativelearningstrategy.Howtomeasurethe
interestofanexamplewillbedisussnow.
Considering:
• M
apreditivemodelprovidedwithatrainingalgorithmL
• U x
etL x
thesetsofexamplesrespetivelynotlabeledandlabeled• n
thedesirednumberoftrainingexamples• T
thetrainingsetwithkT k < n
• U : X × M → ℜ
thefuntionwhihestimatestheutilityofanexampleforthetrainingofthemodel
Repeat
(A)Trainthemodel
M
thankstoL
andT
(andpossiblyU x
).(B)Looktheexamplesuhas
q = argmax u∈U x U(u, M)
(C)Withdraw
q
ofU x
andaskthelabelf(q)
totheexpert.(D)Add
q
toL x
andadd(q, f (q))
toT
until
kT k < n
Algorithm1:Seletivesampling, Muslea2002
3.2 Unertainty sampling
Unertainty sampling is an ative learning strategy [11,12℄ whih is based on
ondene that themodel hasinits preditions.Themodel usedmust beable
to estimate the reliability of its answers, to provide the probabilities,
y j, to
observeeahlass(
j
)foranexamplesu
.Thusthemodelanmakeapreditionhoosingthemostprobablelassfor
u
.Thehoieofnewexamplestobelabeledproeedsin twosteps:
•
themodelavailableattheiterationt
isusedtopreditthelabelsoftheunlabeledexamples;
•
exampleswiththemoreunertainpreditionareseleted.The unertainty of a predition an also be dened using a threshold of
0and those whih will be lassied1.Theloser ananswerof themodel is to
thethresholdofdeision,themoreunertainisthedeision.
Thisrstapproahhastheadvantagetobeintuitive,easytoimplementand
fast.Theunertaintysamplingshowsitslimitshoweverwhentheproblemtobe
solved is not separable by the model. Indeed, this strategy will tend to selet
the examplesto be labeled in mixture zones,where there is nothing anymore
tolearn.
Fig.2.Abinarylassiationproblem:theboundaryplottedrepresentsthethreshold
ofdeision.Levelsetsareplottedtooandtheirdistanesfromtheboundarylinesindi-
atetheunertainty.Unlabeleddatalosetotheboundarylinearethemoreunertain
andwillbeseletedtobelabeledbytheexpert.
3.3 Riskredution
Thepurposeofthisapproahistoreduethegeneralizationerror,
E(M)
,ofthemodel [13℄.It hooses examplesto belabeled so as to minimize this error. In
pratiethiserrorannotbealulatedbeausethedistributionoftheexamples,
X
,is unknown.Howeveritanbewrite, at aniterationt
, usingalossfuntion(
Loss(M t , x)
)whihevaluatestheerrorofthemodelforagivenexamplex ∈ X
suhas:
E(M t ) = Z
X
Loss(M t , x)P (x)dx
The same model at the next iteration
t + 1
is dened as:M t+1 (x ⋄ ,y ⋄ ). This
modeltakesintoaountanewtrainingexample:
(x ⋄ , y ⋄ )
.Forrealproblems,theoutputofthemodel
y ⋄isunknownsinex ⋄isanotlabeleddata.Toestimatethe
generalizationerrorat
t+1
,allthepossibilitiesoftheY
sethavetobeonsideredE(M t+1 x ⋄ ) = Z
X
Z
Y
P (y|x ⋄ )Loss(M t+1 (x ⋄ ,y) , x)P(x)dxdy
This strategies selets the example
q
whih minimizesE(M t+1 x ⋄ )
. One la-beled,thisexampleisinorporatedto thetrainingset.Stepbystepthis proe-
duretriesto elaborateanoptimaltrainingset.
Niholas Roy [9℄ shows how to bring this strategy into play sine all the
elementsof
X
arenotknown.HeusesanuniformpriorforP (x)
whihgives:E(M b t ) = 1
kL x k
kL x k
X
i=1
Loss(M t , x i )
This strategy, where dierent lossfuntions anbeused, is summarized in
the algorithm 2. The model is, for all examples
i
, trained several times (||Y||
times), to estimate
E(M b t+1 (x
i ,y j ) ). The examplei
whih minimizes the expeted
lossfuntion(
E(M b t+1 (x
i ) ))willbeinorporatedinthetrainingset.
Considering:
• M
apreditivemodelprovidedwithatrainingalgorithmL
• U x
andL x
thesetsofexamplesrespetivelynotlabeledandlabeled• n
thedesirednumberoftrainingexamples• T
thetrainingsetwithkT k < n
• Y
thelabelsetwhihanbegiventotheexamplesofU x
• Loss : M → ℜ
thegeneralization error• Err : U x × M → ℜ
the expetedgeneralization error for the modelM
trainedwithanadditional example,
T ∪ (x i , f(x i ))
Repeat
(A)Trainthemodel
M
thankstoL
andT
For allexamples
x i ∈ U x
doFor all label
y j ∈ Y
doi)Trainthemodel
M i,j
thankstoL
and(T ∪ (x i , y j ))
ii)Computethegeneralization error
E(M b t+1 (x
i ,y j ) )
endFor
Computethegeneralizationerror
E(M b t+1 x i ) = P
y j ∈ Y E(M b t+1 (x
i ,y j ∗) ).P(y j |x i )
endFor
(B)Lookfortheexample
q = argmin u∈U x E(M b t+1 x i )
(C)Withdraw
q
ofU x
ansaskthelabelf(q)
totheexpert.(D)Add
q
toL x
andadd(q, f (q))
toT
until
kT k < n
thegeneralizationerror(
E(M)
)usingtheempirialrisk:E(M) = b R(M) = X N
n=1
X
y j ∈ Y
1
{f(l n )6=y j } P (y j |l n )P(l n )withl n ∈ L x
where
f
isthemodelwhihestimatestheprobabilitythatanexamplebelong toalass(aParzenwindow[15℄in[14℄),P(y i |l n )
therealprobabilitytoobserve the lassy i for the example l n ∈ L x, 1 the indiating funtion equal to 1
if
f (l n >) 6= y iandequalto0
ifnot.ThereforR(M)
isthesumoftheprobabilities
1
iff (l n >) 6= y iandequalto0
ifnot.ThereforR(M)
isthesumoftheprobabilities
that themodelmakesabaddeisiononthetrainingset(
L x).
Usinganuniformpriortoestimate
P (l n )
:R(M) = ˆ 1
N X N
n=1
X
y j ∈Y
1
{f(l n )6=y j } P ˆ (y j |l n )
Theexpetedost foranysingleexample
u
(u ∈ U x)addedto thetraining
set (forbinarylassiationproblem)isthen:
R(M ˆ +u ) = X
y j ∈Y
P(y ˆ j |u) ˆ R(M +(u,y j ) )
withu ∈ U x
3.4 Disussion
Both strategiesdesribedabove arenot theonly ones whih exist. Thereader
an see athird main strategy whih is based on Queryby Committee [16,17℄
andafourthonewhereauthorsfousonamodelapproahtoativelearningin
aversion-spaeofonepts[1820℄.
4 Appliation of ative learning to detetion of emotion
in speeh
4.1 Introdution
Thanks to reent tehniquesof speeh proessing, many automatiphone all
enters appear. Thesevoalserversareused byustomersto arryoutvarious
tasks onversing with a mahine. Companies aim to improve their ustomer's
satisfationbyrediretingthemtowardsahumanoperator,intheeventofdi-
ulty.Theshuntingofunsatisedusersamountsdetetingthenegativeemotions
intheirdialogueswiththemahine,undertheassumptionthataproblemofdi-
aloguegeneratesapartiularemotionalstatein thesubjet.
Thedetetion of expressed emotions in speeh is generally onsidered asa
supervisedlearning problem. The detetion of emotions is limited to a binary
lassiationsinetakingintoaountmorelassesrisesproblemoftheobjetiv-
ityoflabelingtask [21℄.The aquisitionandthelabelingofdataare expensive
in this framework. Ative learning an redue this ost by labeling only the
Thisstudyisbasedonapreviouswork[22℄whihharaterizesvoalexhanges,
inoptimalway,forthelassiationofexpressedemotionsinspeeh.Theobje-
tiveisto ontrolthedialoguebetweenusersandavoal server.Morepreisely,
thisstudy dealstherelevaneofvariablesdesribingdata,aordingtothede-
tetionofemotions.
Theuseddataresultfromanexperimentinvolving32userswhotestastok
exhangeservieimplementedonavoalserver.Aordingtotheuserspointof
view,thetestonsistsinmanagingavirtualwalletofstokoptions,thegoalis
to realizethe strongestprot. Theobtainedvoal traesonstitutetheorpus
of this study: 5496 speeh turns exhanged with the mahine. Speeh turns
are haraterizedby 200 aoustivariables, desribing variations of the sound
intensity, variationsofvoieheight,frequenyof eloution...et. Dataare also
haraterized by 8 dialogialvariables desribing the rankof a speeh turn in
agivendialogue (a dialogueontains several speeh turn), thedurationof the
dialogue...Eahspeehturn ismanuallylabeledasarryingpositiveornegative
emotions.
Thesubsetofthemostinformativevariableswithrespettothedetetionof
expressed emotions in speeh isgiventhanksto anaiveBayesianseletor [23℄.
Atthebeginningofthisproess(theseletionofthemostinformativevariables),
thesetofattributesisempty.Theattributewhihmostimprovesthepreditive
qualityofthemodelisthenaddedateahiteration.Thealgorithm stopswhen
theadditionof attributesdoesnotimproveanymorethequalityofthemodel.
Finally,20variableswereseletedtoharaterizevoalexhanges.Inthisartile,
used dataresultfrom thesameorpusand from thispreviousstudy. So,every
speehturnis haraterizedby20variables.
4.3 The hoie ofthe model
Thelargerangeofmodelsabletosolvelassiationproblems(andsometimes
the great number of parametersuseful to use them) may representdiulties
to measure the ontribution of a learning strategy. A Parzen window, with a
Gaussian kernel[15℄, is used in experimentsbelow sinethis preditivemodel
usesasingleparameterandisableto workwithfewexamples.Theoutput of
thismodelisanestimateoftheprobabilitytoobservethelabel
y j onditionally
totheinstaneu
:
P ˆ (y j |u) = P N
n=1
1{f(l n )=y j } K(u, l n ) P N
n=1 K(u, l n ) avec l n , ∈ L x et u ∈ U x ∪ L x
(1)where
K(u, l n ) = e ||u−ln||
2 2 σ 2
Theoptimal value (
σ 2=0.24) of the kernel parameterwasfound thanks to
rameter.Theresultsobtainedbythismodel(usingthewhole oftrainingdata)
aresimilarwiththepreviousresultsobtainedbyanaiveBayesianlassier[22℄.
Consequently,Parzenwindowsareonsideredsatisfyingandvalidforthefollow-
ing ative learning proedures.Kernel methods and loser neighbors methods
areusuallyusedin lassiationofexpressedemotionsin speeh[25℄.
The model must be able to assign a label
f ˆ (u)
to an input datau
, so adeisionthresholdnoted
T h(L x )
isalulatedat eah iteration.Thisthresholdminimizes the error of the model 3
on the available training set. The label at-
tributed is
f ˆ (u n ) = 1
if{ P(y ˆ 1 |u n ) > T h(L x )}
, elsef ˆ (u n ) = 0
. SinethesingleparameteroftheParzenwindowisxed, thetrainingstageisreduedtoount
instanes(within themeaningoftheGaussiankernel).Thestrategiesofexam-
ples seletion, withoutbeinginuened bythe trainingof the model, are thus
omparable.
4.4 Used Ative Learning strategies
TwoAtivelearningstrategiesare onsideredin thispaper;theativelearning
strategy whih tries to redue the generalization error of the model and the
strategyonsistsinseletingtheinstaneforwhih thepreditionofthemodel
ismostunertainhavebeentested.
Forthe rststrategy theParzen window estimates
P (y i |l n )
. Theempirialriskis approximatedadopting auniform apriorion the
P (l n )
. Thepurpose isto selet the unlabeled instane
u i ∈ U x whih will minimize the risk of the
next iteration.
R(M +u n )
the expeted risk resultingfrom thelabelingof theinstane
u n (iterationt + 1
)isestimated.Availablelabeleddataareused todo
thisestimationwhentheassumption
f (u n ) = y 1[resp f (u n ) = y 0℄ toestimate
R(M ˆ +(u n ,y 1 ) )
[respR(M ˆ +(u n ,y 0 ) )
℄ is done.For the seond strategy the unertainty of a predition is maximum when
theoutputprobabilityofthemodelapproahesthedeisionthreshold.
Apartfrom these two ative strategies, astohasti approah whih uni-
formlyseletstheexamplesaordingtotheirprobabilitydistributionisonsid-
ered.Thislastapproahplayaroleofrefereneusedtomeasuretheontribution
oftheativestrategies.
4.5 Results
Thepresentedresultsomefromseveralexperimentsonpreviouslearningstrate-
gies.Eahexperimenthasbeendonevetimes 4
.Atthebeginningoftheexper-
iments, thetraining set is onlyonstituted by two examples(one positiveand
onenegative)seletedrandomly.Ateahiteration,tenexamplesareseletedto
3
TheusederrormeasurementistheBalanedErrorRate,formoredetailsseesetion
4.5
4
thenathesontheurvesofthegure3orrespondto4times thevarianeof the
results(
±2σ
).here,is unbalaned:there are92%ofpositiveorneutral emotions and8% of
negativeemotions.Toobserveorretlythelassiationprots(whenadding
labeledexamples),themodelevaluationisdoneusingtheareaunderROCurve
(AUC)on thetest set 5
.A ROCurveis alulatedfrom the detetionrate of
asinglelass. Conseently, weuse thesumofthe AUCsweightedbyreferene
lass'sprevaleneinthedata.
0.6 0.65 0.7 0.75 0.8 0.85 0.9
0 200 400 600 800 1000 1200
AUC
Random Strategy Uncertainty Maximization Error Reduction
Fig.3.Fousoftheresultsonthetestusing[0:1200℄trainingexamples
Theriskredutionisthestrategywhihmaximizesthequalityofthemodel
for a number of training examples in the range [2:100℄. Between100 and 700
the strategybased onunertaintywins. After600 trainingexamplesthe three
strategiesonvergetotheoptimalAUC(AreaUnderRoCurve).
Thetwoativestrategiesallowobtainingfasterthantherandomstrategythe
optimalresult(the optimalAUCis0.84usingthewhole trainingset).Theuse
ofativelearningis positivein thisreal problem.Howevertheresultsobtained
raisequestionswhihwill detailedinthenextsetion.
5
Thispapershowstheinterestofativelearningforaeldwhereaquisitionand
labeling of data are partiularly expensive. Obtained results show that ative
learningisrelevantforthedetetionofexpressedemotionsinspeeh.Butwhat-
everthe strategyonsidered (eventhe twostrategiesevokedin setion 3.4but
notderailedin thispaper)severalquestionexistandanberaised:
evaluation -Thequality ofan ative strategyisusually representedbya
urveassessingtheperformaneofthemodelversusthenumberoftraining
exampleslabeled (see Figure 3). The performane riterion used an take
several dierent ways aordingto theproblem. This typeof urveallows
onlyomparisons between strategiesin apuntual way, i.e. for apoint on
the urve (a given number of training examples). If two urves pass eah
other(asintheFigure3,itisimpossibletodetermineifastrategyisbetter
than another (on the total set of training examples). The elaboration of
ariterion whih measures the ontributionof a strategyompared to the
randomstrategyonthewholedatasetshouldbeinteresting.Thispointwill
bedisussedinafuturepaper.
testset-Ativelearningstrategiesare,often,usedwhendataaquisitionis
expensive.Therefore,inpratie,atestsetisnotavailable(otherwiseitan
beused to thetraining) andtheevaluation ofthemodelduring astrategy
isdiultto implement.
stopping riterion -Themaximal numberof examplesto labeled,oran
estimationoftheprogressof themodel,anbeused tostopthealgorithm.
Thisisverylinktotheuseofatestsetorthemodelemployed.Forexample
in theFigure 3thestrategy basedonthe riskgivesthe sameresultswhen
15 examples have been labeled than results using all the available data.
Inthis ase the ost using 15 examplesand 600 will benot the same... A
goodriterionshouldbeindependentofthemodelandofatestset.Atual
experiments (not yet published) willallow usto propose ariterionof this
typeattheendof2007.
number ofexamplesto belabeled:thestateoftheartseemstoinor-
porateanonlyoneexampleateahstepofthestrategy.Butinrealasethe
expertisahumanandwhenthemodelneedstimetolearnateahiteration
ofthestrategythisouldbenoteient.Sometimesmorethanoneexample
mustbeinorporated.Thisaspethasbeenalitlebitstudied in[26℄but it
hasto bemoreanalyseinthefuture.
unertainenvironment-Ifananswerouldbegiventothepointsabove
thenativestrategiesouldbeusedforon-line learninginanunertainen-
vironment.Forexampletotagpartofgraph(graphhereissoialnetwork).
Whenwritingthispaperwehopethatthispointwillaeptedandinorpo-
ratedin apropositionsenttoaFrenhProjet(andnanedbytheFrenh
NationalAgenyofResearh(ANR))groupingindustrialanduniversities.
Generally,ativelearningstrategiesestimatetheutilityoftrainingexamples.
Thisapproahwouldbeableto onsidernonstationaryproblemsanditisable
totrainamodelwhih adaptsitselftothevariationsof theobservedsystem.
Forthedetetionofexpressedemotionsinspeehouldbetreatedbyadouble
strategyreduingtheostlinkedtothedata:(i)avariablesseletionallowingto
preserveonlytheneessaryandsuientharateristisforlassiation;(ii)an
examplesseletionallowingtopreserveonly usefulinstanes for training.This
willbeexploredinfuture work.
Referenes
1. Harmon,M.: Reinforementlearning:atutorial. http://eureka1.aa.wpafb.af.mil/
rltutorial/(1996)
2. Jain,A.K.,Murty,M.N.,Flynn,P.J.: Datalustering:areview. ACMComputing
Surveys31(3) (1999)264323
3. Jamy, I., Jen, T.Y., Laurent, D., Loizou, G., Sy, O.: Extration de règles
d'assoiation pour la prédition de valeurs manquantes. Revue Afriaine de la
ReherheenInformatiqueetMathématiqueAppliquéeARIMASpéialCARI04
(2005)103124
4. Chapelle, O., Shölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press,
Cambridge,MA(inpress)(2006)http://www.kyb.tuebingen.mpg.de /ssl-book /
ssl_to.pdf.
5. Cohn,D.,Caruana,R.,MCallum,A.: Semi-supervisedlusteringwithuserfeed-
bak. TehnialReport1892,CornellUniversity(2003)
6. Chapelle,O., Zien, A.: Semi-supervisedlassiation by low density separation.
InProeedingsoftheTenthInternationalWorkshoponArtiialIntelligeneand
Statistis(2005)
7. Castro,R., Willett, R.,Nowak, R.: Faster rateinregressionvia ative learning.
In:NIPS(NeuralInformationProessingSystems),Vanouver(2005)
8. Singh,A.,Nowak,R.,Ramanathan,P.:Ativelearningforadaptivemobilesensing
networks. In:IPSN'06: Proeedings ofthe fth international onferene on In-
formationproessinginsensornetworks,NewYork,NY,USA,ACMPress(2006)
6068
9. Roy,N.,MCallum,A.: Towardoptimalativelearningthroughsamplingestima-
tionof errorredution. In:Pro. 18thInternationalConf. onMahine Learning,
MorganKaufmann,SanFraniso,CA(2001)441448
10. Muslea,I.:AtiveLearningWithMultipleView.Phdthesis,Universityofsouthern
alifornia (2002)
11. Lewis,D.,Gale, A.: A sequentialalgorithmfortrainingtext lassiers. InCroft,
W.B., vanRijsbergen, C.J., eds.: Proeedings of SIGIR-94, 17th ACMInterna-
tionalConfereneonResearhandDevelopmentinInformationRetrieval,Dublin,
SpringerVerlag,Heidelberg(1994)312
12. Thrun,S.B.,Möller,K.: Ativeexplorationindynamienvironments. InMoody,
J.E.,Hanson,S.J.,Lippmann,R.P.,eds.:AdvanesinNeuralInformationProess-
ingSystems.Volume4.,MorganKaufmannPublishers,In.(1992)531538
13. Cohn,D.A.,Ghahramani,Z.,Jordan,M.I.:Ativelearningwithstatistialmodels.
In Tesauro, G., Touretzky, D., Leen, T., eds.: Advanes in Neural Information
supervisedlearning usinggaussian eldsandharmoni funtions. In:ICML(In-
ternationalConfereneonMahine Learning),Washington(2003)
15. Parzen,E.: Onestimationofaprobabilitydensityfuntionandmode. Annalsof
MathematialStatistis33(1962)10651076
16. Freund,Y.,Seung,H.S.,Shamir,E.,Tishby,N.:Seletivesamplingusingthequery
byommitteealgorithm. MahineLearning28(2-3)(1997)133168
17. Seung,H.S.,Opper,M.,Sompolinsky,H.:Querybyommittee.In:Computational
LearningTheory.(1992)287294
18. Dasgupta,S.: Analysisofgreedyativelearningstrategy. In:NIPS(NeuralInfor-
mationProessingSystems),SanDiego(2005)
19. Cohn,D.A.,Atlas,L.,Ladner,R.E.:Improvinggeneralizationwithativelearning.
MahineLearning15(2)(1994)201221
20. Tong,S., Koller,D.: Supportvetormahineativelearning withappliations to
textlassiation.InLangley,P.,ed.:ProeedingsofICML-00,17thInternational
Conferene on Mahine Learning, Stanford, US, Morgan KaufmannPublishers,
SanFraniso,US(2000)9991006
21. Lisombe,J.,Riardi, G., Hakkani-Tür, D.: Using ontext to improveemotion
detetioninspokendialogsystems.In:InterSpeeh,Lisbon(2005)
22. Poulain,B.: Séletiondevariablesetmodélisationd'expressionsd'émotionsdans
desdialogueshommes-mahine.In:EGC(ExtrationetGestiondeConnaissane),
Lille.+TehnialReportavalaiblehere:http://perso.rd.franeteleom.fr/lemaire
(infrenh)(2006)
23. Boullé, M.: An enhaned seletive naive bayesmethod withoptimal disretiza-
tion. InGuyon, I.,Gunn,S.,Nikravesh,M., Zadeh,L.,eds.:Featureextration,
foundationsandAppliation. Springer(August2006)499507
24. Chappelle,O.: Ativelearningfor parzenwindows lassier. In:AI &Statistis,
Barbados(2005)4956
25. Guide, V., Rakotomamonjy, Canu, S.: Méthode à noyaux pour l'identiation
d'émotion. In: RFIA (Reonnaissane des Formes et Intelligene Artiielle).
(2003)
26. Bondu,A.,Lemaire, V.: Etudedel'inuenedunombred'exemplesàétiquetter
dans une proédure d'apprentissage atif. submitted to CAP 2006 (Conferene
franophonesurl'apprentissageautomatique)