Active Learning Strategies: a case study for detection of emotions in speech

Academic year: 2022


Alexis Bondu, Vincent Lemaire, and Barbara Poulain

R&D France Telecom,
TECH/EASY/TSI
2 avenue Pierre Marzin, 22300 Lannion

Abstract. Machine learning covers methods and algorithms which allow a model to learn a behavior from examples. Active learning gathers methods which select the examples used to build a training set for the predictive model. All these strategies aim to use as few examples as possible and to select the most informative ones. After formalizing the active learning problem and situating it in the literature, this article first synthesizes the main approaches to active learning. Taking emotions into account in human-machine interactions can be helpful for the design of intelligent systems. The main difficulty in designing an automatic shunting system for call centers is the cost of data labeling. The last section of this paper proposes to reduce this cost by means of two active learning strategies. The study is based on real data resulting from the use of a vocal stock exchange server.

1 Introduction

Active learning methods come from a parallel between active educational methods and learning theory. The learner is now a statistical model rather than a student. The interactions between the student and the teacher correspond to the possibility for the model to interact with a human expert. The examples are situations used by the model to generate knowledge about the problem.

Active learning methods allow the model to interact with its environment by selecting the most informative situations. The purpose is to train a model using as few examples as possible. The training set is built in interaction with a human expert so as to maximize the progress of the model. The model must be able to detect the examples that are most informative for its learning and to ask the expert what should be done in these situations.

The purpose of this paper is to present two main active learning approaches found in the state of the art. These approaches are presented in a generic way, without assuming a particular kind of model (the one which learns using the examples delivered by the expert after each of its requests). Other approaches exist but are not presented in this paper, although references are given for the reader who [...]

[...] a generic way and establishes the mathematical notations used. The aim of this section is to place active learning among the other statistical learning methods (supervised, unsupervised...). The fourth section presents in detail two main active learning approaches. These two strategies are then used in the fifth section on a real problem. Finally, the last section is a discussion of the questions opened by this paper.

2 Active Learning

2.1 General remarks

The objective of statistical learning (unsupervised, semi-supervised, supervised) is to inculcate a behavior in a model using observations (examples) and a learning algorithm. The observations are points of view on the problem to be solved and constitute the learning data. At the end of the training stage, the model has to generalize what it has learned to unseen situations in a reasonable way.

For example, imagine a model which tries to distinguish happy and unhappy people from passport photos. If the model makes good predictions for people unseen during its training stage, then the model generalizes correctly.

The characteristics of the data used change depending on the learning mode. Unsupervised learning is a method of machine learning where a model is fit to observations. It is distinguished from supervised learning by the fact that there are no a priori outputs on the data. The learner has to discover by itself the correlations between the examples which are shown to it. In the case of the example above (happy / unhappy people), the model is trained using passport photos deprived of labels and has no indication of what we try to make it learn. Among unsupervised learning methods one finds clustering methods [2] and association rule methods [3].

Semi-supervised learning [4] is a class of techniques that makes use of both labeled and unlabeled data for training; typically a small amount of labeled data and a large amount of unlabeled data. Among the possible uses of this learning mode, one can distinguish (i) semi-supervised clustering, which tries to group similar instances using the information given by the small amount of labeled data [5], and (ii) semi-supervised classification [6], which first uses the labeled data to build a first model and then the unlabeled data to improve it.

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The output of the function can be a continuous value (called regression), or can predict a class label of the input object (called classification). The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). In the case of the illustrative example above, the examples would be passport photos associated with the labels "happy" or "unhappy".


[...] less passive than the others described above. This strategy allows the model to construct its own training set in interaction with a human expert. Learning starts with a few desired outputs (class labels for classification or continuous values for regression). Then the model selects the examples (without desired outputs) that it considers the most informative and asks the human expert for their desired outputs. In the case of the illustrative example, the model asks the human expert for the class labels of the passport photos it presents. In this paper, we restrict active learning to classification, but our presentation of active learning strategies can clearly be transposed to regression.

Active learning differs from all the other learning methods because it interacts with its environment; the examples are not randomly chosen. Active learning strategies allow the model to learn faster (the learner reaches the best performance using less data) by considering the most informative examples first. This approach is especially attractive when data are expensive to obtain or to label.

2.2 Two possible scenarios

The distinction between raw data and the associated data descriptors is important. In the illustrative example above, the raw data are passport photos and the data descriptors are attributes describing the photos (pixels, luminosity, contrast, etc.). The model predicts the class "happy" or "unhappy" for every vector of descriptors. The mapping from raw data to descriptors is not always bijective; sometimes it is impossible to reconstruct raw data from a list of descriptors. Adaptive sampling and selective sampling, which are the two main scenarios for setting up active learning [7], use data descriptors and raw data respectively.

In the case of adaptive sampling [8], the model asks the expert for labels corresponding to vectors of descriptors. The model is not restricted and can explore the whole space of variation of the descriptors, searching for areas to be sampled more finely. Adaptive sampling can be problematic to implement when it is difficult to know whether the vectors of descriptors (generated by the model) have a meaning with respect to the initial problem. Let us suppose that the model requires the label associated with the vector $[10, 4, 5, ..., 12]$. Does this vector correspond to a set of descriptors which represents a passport photo, a human face photo, a flower or something else?

In the case of selective sampling [9], the model observes only a restricted part of the universe, materialized by training examples stripped of their labels. Consequently, the input vectors selected by the model always correspond to raw data. The image of a "bag" of instances, for which the model can ask for the associated labels, is usually used. The model requires the label associated with the vector $[10, 4, 5, ..., 12]$, which corresponds to a passport photo.

[...] of unlabeled examples and for which labeling is expensive. Therefore, from now on only the point of view of selective sampling is considered. In practice, the choice between selective and adaptive sampling depends primarily on the application, i.e. on whether or not the model is allowed to generate new examples.

2.3 Notations

$M \in \mathcal{M}$ is the predictive model, which is trained thanks to an algorithm $\mathcal{L}$. $X \subseteq \mathbb{R}^n$ represents all the possible input examples of the model and $x \in X$ is a particular example. $Y$ is the set of the possible outputs (answers) of the model; $y \in Y$ is a class label (see footnote 2) related (associated) to $x \in X$.

During its training, the model observes (see Figure 1) only one part $\Phi \subseteq X$ of the universe. The set of examples is limited and the labels associated with these examples are not necessarily known. The set of examples whose labels are known (at a given step of the training algorithm) is called $L_x$ and the set of examples whose labels are unknown is called $U_x$, with $\Phi \equiv U_x \cup L_x$ and $U_x \cap L_x \equiv \emptyset$.

The concept to be learned can be seen as a function $f : X \to Y$, where $f(x_1)$ is the desired answer of the model for the example $x_1$, and $\hat{f} : X \to Y$ is the answer obtained from the model, an estimate of the concept. The elements of $L_x$ and the associated labels constitute a training set $T$. The training examples are pairs of input vectors and desired labels $(x, f(x))$: $\forall x \in L_x, \exists! (x, f(x)) \in T$.

Fig. 1. Notations

2 The word label is used here for a discrete value in classification problems or a [...]

3.1 Introduction

The problem of selective sampling was posed formally by Muslea [10] (see Algorithm 1). It uses a utility function, $Utility(u, M)$, which estimates the utility of an example $u$ for the training of the model $M$. Thanks to this function, the model presents to the expert the examples from which it expects the greatest improvement of its performance.

Algorithm 1 is generic insofar as only the function $Utility(u, M)$ must be modified to express a particular active learning strategy. How to measure the interest of an example is discussed next.

Considering:
• $M$ a predictive model provided with a training algorithm $\mathcal{L}$
• $U_x$ and $L_x$ the sets of examples respectively not labeled and labeled
• $n$ the desired number of training examples
• $T$ the training set, with $\|T\| < n$
• $U : X \times \mathcal{M} \to \mathbb{R}$ the function which estimates the utility of an example for the training of the model

Repeat
  (A) Train the model $M$ thanks to $\mathcal{L}$ and $T$ (and possibly $U_x$).
  (B) Look for the example $q = \operatorname{argmax}_{u \in U_x} U(u, M)$
  (C) Withdraw $q$ from $U_x$ and ask the expert for the label $f(q)$.
  (D) Add $q$ to $L_x$ and add $(q, f(q))$ to $T$
until $\|T\| = n$

Algorithm 1: Selective sampling, Muslea 2002
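The loop of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the callables `oracle` (standing in for the human expert), `train` (standing in for the training algorithm) and `utility` are hypothetical names, and the toy problem (integers from 0 to 100 with the concept "x >= 37") is invented for the example and has nothing to do with the speech data.

```python
def selective_sampling(pool, oracle, train, utility, n, seed_set):
    """Generic loop of Algorithm 1 (sketch, all argument names are assumptions).

    pool     -- unlabeled examples (U_x)
    oracle   -- stands in for the human expert, returns f(q)
    train    -- training algorithm: builds a model from the training set T
    utility  -- Utility(u, M), the only strategy-specific part
    n        -- desired number of training examples
    seed_set -- initial labeled pairs (x, f(x))
    """
    unlabeled = list(pool)
    training_set = list(seed_set)
    queried = []
    while len(training_set) < n and unlabeled:
        model = train(training_set)                           # step (A)
        q = max(unlabeled, key=lambda u: utility(u, model))   # step (B)
        unlabeled.remove(q)                                   # step (C)
        training_set.append((q, oracle(q)))                   # step (D)
        queried.append(q)
    return train(training_set), queried


# Toy model (hypothetical): a threshold placed halfway between the largest
# example known to be 0 and the smallest example known to be 1.
def train_threshold(training_set):
    largest_zero = max(x for x, y in training_set if y == 0)
    smallest_one = min(x for x, y in training_set if y == 1)
    return (largest_zero + smallest_one) / 2


# Toy utility (hypothetical): examples close to the current threshold are
# considered the most useful.
def closeness_to_boundary(u, threshold):
    return -abs(u - threshold)


model, queried = selective_sampling(
    pool=range(1, 100),
    oracle=lambda x: int(x >= 37),
    train=train_threshold,
    utility=closeness_to_boundary,
    n=9,
    seed_set=[(0, 0), (100, 1)],
)
print(model, queried)
```

Only `utility` changes from one strategy to another; the rest of the loop stays the same.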

3.2 Uncertainty sampling

Uncertainty sampling is an active learning strategy [11,12] based on the confidence that the model has in its predictions. The model used must be able to estimate the reliability of its answers, i.e. to provide the probabilities $y_j$ of observing each class $j$ for an example $u$. The model can thus make a prediction by choosing the most probable class for $u$. The choice of new examples to be labeled proceeds in two steps:

• the model available at iteration $t$ is used to predict the labels of the unlabeled examples;
• the examples with the most uncertain predictions are selected.

The uncertainty of a prediction can also be defined using a threshold of [...] 0 and those which will be classified 1. The closer an answer of the model is to the decision threshold, the more uncertain the decision.

This first approach has the advantage of being intuitive, easy to implement and fast. Uncertainty sampling shows its limits, however, when the problem to be solved is not separable by the model. Indeed, this strategy tends to select the examples to be labeled in mixture zones, where there is nothing left to learn.
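As a minimal sketch of uncertainty sampling, the utility of an example can be expressed as the negated distance between the predicted probability and the decision threshold (binary case), or as one minus the largest class probability (multi-class case). Both formulas and all function names are illustrative assumptions; the text above only states that predictions close to the threshold are the most uncertain.

```python
def uncertainty_binary(prob_positive, threshold=0.5):
    # The closer the predicted probability is to the decision threshold,
    # the higher (less negative) the utility.
    return -abs(prob_positive - threshold)


def uncertainty_multiclass(class_probs):
    # A low maximum probability means the model hesitates between classes.
    return 1.0 - max(class_probs)


def most_uncertain(pool_probs, threshold=0.5):
    """pool_probs maps each unlabeled example to its predicted P(class 1)."""
    return max(pool_probs,
               key=lambda u: uncertainty_binary(pool_probs[u], threshold))


pool_probs = {"a": 0.95, "b": 0.55, "c": 0.10}
print(most_uncertain(pool_probs))        # query chosen with threshold 0.5
print(most_uncertain(pool_probs, 0.2))   # query chosen with threshold 0.2
```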

Fig. 2. A binary classification problem: the plotted boundary represents the decision threshold. Level sets are plotted too, and their distance from the boundary line indicates the uncertainty. Unlabeled data close to the boundary line are the most uncertain and will be selected to be labeled by the expert.

3.3 Risk reduction

The purpose of this approach is to reduce the generalization error, $E(M)$, of the model [13]. It chooses the examples to be labeled so as to minimize this error. In practice this error cannot be calculated because the distribution of the examples, $X$, is unknown. However, at an iteration $t$, it can be written using a loss function $Loss(M^t, x)$ which evaluates the error of the model for a given example $x \in X$:

$$E(M^t) = \int_X Loss(M^t, x) \, P(x) \, dx$$

The same model at the next iteration $t+1$ is denoted $M^{t+1}_{(x^\diamond, y^\diamond)}$. This model takes into account a new training example $(x^\diamond, y^\diamond)$. For real problems, the output $y^\diamond$ is unknown since $x^\diamond$ is an unlabeled example. To estimate the generalization error at $t+1$, all the possibilities in the set $Y$ have to be considered [...]

$$E(M^{t+1}_{x^\diamond}) = \int_X \int_Y P(y | x^\diamond) \, Loss(M^{t+1}_{(x^\diamond, y)}, x) \, P(x) \, dx \, dy$$

This strategy selects the example $q$ which minimizes $E(M^{t+1}_{x^\diamond})$. Once labeled, this example is incorporated into the training set. Step by step, this procedure tries to build an optimal training set.

Nicholas Roy [9] shows how to put this strategy into practice, since not all the elements of $X$ are known. He uses a uniform prior for $P(x)$, which gives:

$$\hat{E}(M^t) = \frac{1}{\|L_x\|} \sum_{i=1}^{\|L_x\|} Loss(M^t, x_i)$$

This strategy, in which different loss functions can be used, is summarized in Algorithm 2. For every example $x_i$, the model is trained several times ($\|Y\|$ times) to estimate $\hat{E}(M^{t+1}_{(x_i, y_j)})$. The example $x_i$ which minimizes the expected loss $\hat{E}(M^{t+1}_{x_i})$ is incorporated into the training set.

Considering:
• $M$ a predictive model provided with a training algorithm $\mathcal{L}$
• $U_x$ and $L_x$ the sets of examples respectively not labeled and labeled
• $n$ the desired number of training examples
• $T$ the training set, with $\|T\| < n$
• $Y$ the set of labels which can be given to the examples of $U_x$
• $Loss : \mathcal{M} \to \mathbb{R}$ the generalization error
• $Err : U_x \times \mathcal{M} \to \mathbb{R}$ the expected generalization error for the model $M$ trained with an additional example, $T \cup (x_i, f(x_i))$

Repeat
  (A) Train the model $M$ thanks to $\mathcal{L}$ and $T$
      For all examples $x_i \in U_x$ do
          For all labels $y_j \in Y$ do
              i) Train the model $M_{i,j}$ thanks to $\mathcal{L}$ and $(T \cup (x_i, y_j))$
              ii) Compute the generalization error $\hat{E}(M^{t+1}_{(x_i, y_j)})$
          end For
          Compute the generalization error $\hat{E}(M^{t+1}_{x_i}) = \sum_{y_j \in Y} \hat{E}(M^{t+1}_{(x_i, y_j)}) \cdot P(y_j | x_i)$
      end For
  (B) Look for the example $q = \operatorname{argmin}_{x_i \in U_x} \hat{E}(M^{t+1}_{x_i})$
  (C) Withdraw $q$ from $U_x$ and ask the expert for the label $f(q)$.
  (D) Add $q$ to $L_x$ and add $(q, f(q))$ to $T$
until $\|T\| = n$
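The selection step of this risk reduction procedure (the two nested loops followed by the argmin) can be sketched as below. The callables `train`, `posterior` and `estimated_error` are hypothetical placeholders for the training algorithm, the estimate of $P(y_j | x_i)$ given by the current model, and the error estimate of a retrained model; none of these names come from the paper.

```python
def expected_error_query(unlabeled, training_set, labels,
                         train, posterior, estimated_error):
    """Return the unlabeled example minimizing the expected error, and that error.

    train(T)               -- builds a model from a list of (x, y) pairs
    posterior(model, x, y) -- estimate of P(y | x) given by the current model
    estimated_error(model) -- error estimate of a (re)trained model
    """
    model = train(training_set)                       # step (A)
    best_q, best_error = None, float("inf")
    for x in unlabeled:
        expected = 0.0
        for y in labels:
            # i) retrain with the hypothetical pair (x, y)
            candidate = train(training_set + [(x, y)])
            # ii) error of the retrained model, weighted by P(y | x)
            expected += posterior(model, x, y) * estimated_error(candidate)
        if expected < best_error:                     # step (B): argmin
            best_q, best_error = x, expected
    return best_q, best_error
```

Each call retrains the model once per pair (unlabeled example, label), which is why this strategy is much more expensive than uncertainty sampling unless retraining is cheap.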

[...] the generalization error ($E(M)$) using the empirical risk:

$$E(M) = \hat{R}(M) = \sum_{n=1}^{N} \sum_{y_j \in Y} 1_{\{f(l_n) \neq y_j\}} \, P(y_j | l_n) \, P(l_n)$$

with $l_n \in L_x$, where $f$ is the model which estimates the probability that an example belongs to a class (a Parzen window [15] in [14]), $P(y_j | l_n)$ is the real probability of observing the class $y_j$ for the example $l_n \in L_x$, and $1$ is the indicator function, equal to 1 if $f(l_n) \neq y_j$ and to 0 otherwise. Therefore $R(M)$ is the sum of the probabilities that the model makes a bad decision on the training set ($L_x$).

Using a uniform prior to estimate $P(l_n)$:

$$\hat{R}(M) = \frac{1}{N} \sum_{n=1}^{N} \sum_{y_j \in Y} 1_{\{f(l_n) \neq y_j\}} \, \hat{P}(y_j | l_n)$$

The expected cost for any single example $u$ ($u \in U_x$) added to the training set (for a binary classification problem) is then:

$$\hat{R}(M^{+u}) = \sum_{y_j \in Y} \hat{P}(y_j | u) \, \hat{R}(M^{+(u, y_j)})$$

with $u \in U_x$
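The two estimates above translate almost directly into code. The sketch below assumes that the decisions $f(l_n)$ and the estimated posteriors $\hat{P}(y_j | l_n)$ have already been computed; the data layout (a list of decisions and a list of dictionaries) and the function names are choices made for the illustration.

```python
def empirical_risk(decisions, posteriors):
    """Empirical risk under a uniform prior on P(l_n).

    decisions[n]     -- f(l_n), the class decided by the model for l_n
    posteriors[n][y] -- estimated probability of class y for l_n
    """
    total = 0.0
    for decided, posterior in zip(decisions, posteriors):
        # Sum the probability mass of every class other than the decided one.
        total += sum(p for y, p in posterior.items() if y != decided)
    return total / len(decisions)


def expected_risk(posterior_u, risk_if_label):
    """Expected risk after adding u, averaging over its possible labels.

    posterior_u[y]   -- estimated probability of class y for u
    risk_if_label[y] -- empirical risk of the model retrained with (u, y)
    """
    return sum(posterior_u[y] * risk_if_label[y] for y in posterior_u)


risk = empirical_risk([1, 0], [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}])
print(risk)
print(expected_risk({0: 0.25, 1: 0.75}, {0: 0.4, 1: 0.2}))
```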

3.4 Discussion

The two strategies described above are not the only ones which exist. The reader can refer to a third main strategy, based on Query by Committee [16,17], and to a fourth one in which the authors focus on a model approach to active learning in a version space of concepts [18-20].

4 Application of active learning to detection of emotion in speech

4.1 Introduction

Thanks to recent speech processing techniques, many automatic phone call centers are appearing. These vocal servers are used by customers to carry out various tasks by conversing with a machine. Companies aim to improve their customers' satisfaction by redirecting them towards a human operator in the event of difficulty. Shunting unsatisfied users amounts to detecting negative emotions in their dialogues with the machine, under the assumption that a dialogue problem generates a particular emotional state in the subject.

The detection of expressed emotions in speech is generally considered as a supervised learning problem. The detection of emotions is limited here to a binary classification, since taking more classes into account raises the problem of the objectivity of the labeling task [21]. The acquisition and the labeling of data are expensive in this framework. Active learning can reduce this cost by labeling only the [...]

This study is based on a previous work [22] which characterizes vocal exchanges, in an optimal way, for the classification of expressed emotions in speech. The objective is to control the dialogue between users and a vocal server. More precisely, that previous study deals with the relevance of the variables describing the data with regard to the detection of emotions.

The data used come from an experiment involving 32 users who tested a stock exchange service implemented on a vocal server. From the users' point of view, the test consists in managing a virtual portfolio of stock options, the goal being to make the largest profit. The vocal traces obtained constitute the corpus of this study: 5496 speech turns exchanged with the machine. Speech turns are characterized by 200 acoustic variables describing variations of sound intensity, variations of voice pitch, speech rate, etc. The data are also characterized by 8 dialogic variables describing the rank of a speech turn in a given dialogue (a dialogue contains several speech turns), the duration of the dialogue, etc. Each speech turn is manually labeled as carrying positive or negative emotions.

The subset of the most informative variables with respect to the detection of expressed emotions in speech is obtained thanks to a naive Bayesian selector [23]. At the beginning of this process (the selection of the most informative variables), the set of attributes is empty. At each iteration, the attribute which most improves the predictive quality of the model is added. The algorithm stops when adding attributes no longer improves the quality of the model. Finally, 20 variables were selected to characterize vocal exchanges. In this article, the data used come from the same corpus and from this previous study, so every speech turn is characterized by 20 variables.
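The greedy procedure described above (start from an empty set, add the attribute that most improves the model, stop when nothing improves) can be sketched generically. This is not the selective naive Bayes method of [23] itself: `score` is a hypothetical callable returning the predictive quality of a model built on a given attribute subset, and the attribute names and numbers in the usage lines are invented.

```python
def forward_selection(attributes, score):
    """Greedy forward selection of attributes (generic sketch)."""
    selected = []
    best = score(selected)          # quality with no attribute at all
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            break
        # Quality obtained by adding each remaining attribute in turn.
        quality = {a: score(selected + [a]) for a in candidates}
        best_attribute = max(quality, key=quality.get)
        if quality[best_attribute] <= best:
            break                   # no attribute improves the model any more
        selected.append(best_attribute)
        best = quality[best_attribute]
    return selected, best


# Invented toy score: each attribute brings a fixed gain, minus a cost of 5
# per selected attribute.
gain = {"pitch": 30, "energy": 20, "noise": 0}
toy_score = lambda subset: sum(gain[a] for a in subset) - 5 * len(subset)
print(forward_selection(list(gain), toy_score))
```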

4.3 The choice of the model

The large range of models able to solve classification problems (and sometimes the great number of parameters needed to use them) can make it difficult to measure the contribution of a learning strategy. A Parzen window with a Gaussian kernel [15] is used in the experiments below, since this predictive model uses a single parameter and is able to work with few examples. The output of this model is an estimate of the probability of observing the label $y_j$ conditionally on the instance $u$:

$$\hat{P}(y_j | u) = \frac{\sum_{n=1}^{N} 1_{\{f(l_n) = y_j\}} \, K(u, l_n)}{\sum_{n=1}^{N} K(u, l_n)} \quad \text{with } l_n \in L_x \text{ and } u \in U_x \cup L_x \qquad (1)$$

where

$$K(u, l_n) = e^{-\frac{\|u - l_n\|^2}{2\sigma^2}}$$

The optimal value ($\sigma^2 = 0.24$) of the kernel parameter was found thanks to [...] parameter. The results obtained by this model (using the whole of the training data) are similar to the previous results obtained by a naive Bayesian classifier [22]. Consequently, Parzen windows are considered satisfactory and valid for the following active learning procedures. Kernel methods and nearest neighbor methods are commonly used in the classification of expressed emotions in speech [25].
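A minimal sketch of the Parzen window estimate with a Gaussian kernel is given below, using only the standard library. The function names and the tuple-based data layout are assumptions; the default `sigma2=0.24` is the value reported in this section, and the fallback used when every kernel value underflows to zero is an addition made for the sketch, not something described in the paper.

```python
import math


def gaussian_kernel(u, l, sigma2):
    # K(u, l) = exp(-||u - l||^2 / (2 * sigma^2))
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, l))
    return math.exp(-squared_distance / (2.0 * sigma2))


def parzen_posterior(u, labeled, label, sigma2=0.24):
    """Estimate P(label | u) from labeled pairs (vector, label)."""
    weights = [(gaussian_kernel(u, x, sigma2), y) for x, y in labeled]
    total = sum(w for w, _ in weights)
    if total == 0.0:
        # Assumption of this sketch: fall back to the class frequency
        # when u is too far from every labeled example.
        return sum(1 for _, y in labeled if y == label) / len(labeled)
    return sum(w for w, y in weights if y == label) / total


labeled = [((0.0,), 0), ((1.0,), 1)]
print(parzen_posterior((0.5,), labeled, 1))   # halfway between both examples
print(parzen_posterior((0.0,), labeled, 0))   # on top of the class 0 example
```

With a fixed kernel parameter there is nothing to fit: "training" amounts to storing the labeled pairs, which is what keeps the repeated retraining of the risk reduction strategy affordable.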

The model must be able to assign a label $\hat{f}(u)$ to an input $u$, so a decision threshold, noted $Th(L_x)$, is calculated at each iteration. This threshold minimizes the error of the model (see footnote 3) on the available training set. The assigned label is $\hat{f}(u_n) = 1$ if $\hat{P}(y_1 | u_n) > Th(L_x)$, and $\hat{f}(u_n) = 0$ otherwise. Since the single parameter of the Parzen window is fixed, the training stage is reduced to counting instances (in the sense of the Gaussian kernel). The example selection strategies, not being influenced by the training of the model, are thus comparable.
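The choice of the decision threshold can be sketched as a search over candidate thresholds that minimizes the Balanced Error Rate on the labeled examples. The candidate set (midpoints between consecutive distinct probabilities, plus 0 and 1) and the function names are assumptions of this sketch; the paper does not say how the minimization is carried out.

```python
def balanced_error_rate(y_true, y_pred):
    # Mean of the error rates computed separately on each class.
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the BER")
    miss_rate = positives.count(0) / len(positives)
    false_alarm_rate = negatives.count(1) / len(negatives)
    return 0.5 * (miss_rate + false_alarm_rate)


def decide(prob_positive, threshold):
    # Decision rule of the text: label 1 when the probability exceeds Th.
    return 1 if prob_positive > threshold else 0


def best_threshold(probs, y_true):
    """Return (threshold, BER) minimizing the BER on the labeled examples."""
    scores = sorted(set(probs))
    midpoints = [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    candidates = [0.0] + midpoints + [1.0]

    def ber_at(threshold):
        return balanced_error_rate(y_true, [decide(p, threshold) for p in probs])

    threshold = min(candidates, key=ber_at)
    return threshold, ber_at(threshold)


print(best_threshold([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1]))
```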

4.4 Active learning strategies used

Two active learning strategies are considered in this paper: the strategy which tries to reduce the generalization error of the model, and the strategy which consists in selecting the instance for which the prediction of the model is most uncertain. Both have been tested.

For the first strategy, the Parzen window estimates $P(y_j | l_n)$. The empirical risk is approximated by adopting a uniform prior on $P(l_n)$. The purpose is to select the unlabeled instance $u_n \in U_x$ which will minimize the risk at the next iteration. $\hat{R}(M^{+u_n})$, the expected risk resulting from the labeling of the instance $u_n$ (iteration $t+1$), is estimated. The available labeled data are used to make this estimate: $\hat{R}(M^{+(u_n, y_1)})$ [resp. $\hat{R}(M^{+(u_n, y_0)})$] is estimated under the assumption $f(u_n) = y_1$ [resp. $f(u_n) = y_0$].

For the second strategy, the uncertainty of a prediction is maximal when the output probability of the model approaches the decision threshold.

Apart from these two active strategies, a stochastic approach, which selects the examples uniformly at random according to their probability distribution, is considered. This last approach serves as a reference to measure the contribution of the active strategies.
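For completeness, the random reference strategy can be sketched as a uniform draw without replacement from the pool of unlabeled examples. The function name, the batch size argument `k` and the optional seeded generator are assumptions added for reproducibility of the illustration.

```python
import random


def random_query(unlabeled, k=1, rng=None):
    """Draw k unlabeled examples uniformly at random, without replacement."""
    if rng is None:
        rng = random.Random()
    candidates = list(unlabeled)
    return rng.sample(candidates, min(k, len(candidates)))


batch = random_query(range(100), k=10, rng=random.Random(0))
print(batch)
```

Drawing uniformly from the pool means that, on average, the selected examples follow the same distribution as the data, which is what makes this strategy a natural baseline.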

4.5 Results

The results presented come from several experiments on the learning strategies described above. Each experiment has been run five times (see footnote 4). At the beginning of the experiments, the training set consists of only two examples (one positive and one negative) selected randomly. At each iteration, ten examples are selected to [...]

3 The error measure used is the Balanced Error Rate; for more details see section 4.5.

4 The notches on the curves of Figure 3 span four standard deviations of the results ($\pm 2\sigma$).

[...] here, is unbalanced: there are 92% of positive or neutral emotions and 8% of negative emotions. To observe the classification gains correctly (when adding labeled examples), the model is evaluated using the area under the ROC curve (AUC) on the test set. A ROC curve is calculated from the detection rate of a single class. Consequently, we use the sum of the AUCs weighted by the prevalence of the reference class in the data.
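The prevalence-weighted sum of per-class AUCs can be sketched as follows. The AUC is computed here with the rank-based (Mann-Whitney) formula, which is a choice of this sketch since the paper does not say how its ROC curves are integrated; the function names and the tiny data set are invented.

```python
def auc(scores, is_reference_class):
    """AUC for one reference class: probability that a member of the class
    gets a higher score than a non-member (ties count for one half)."""
    members = [s for s, flag in zip(scores, is_reference_class) if flag]
    others = [s for s, flag in zip(scores, is_reference_class) if not flag]
    wins = sum(1.0 if m > o else 0.5 if m == o else 0.0
               for m in members for o in others)
    return wins / (len(members) * len(others))


def prevalence_weighted_auc(class_scores, y_true):
    """class_scores[c][n] is the score of example n for reference class c."""
    total = 0.0
    for c, scores in class_scores.items():
        flags = [y == c for y in y_true]
        prevalence = sum(flags) / len(y_true)
        total += prevalence * auc(scores, flags)
    return total


y_true = [1, 1, 0, 0]
class_scores = {
    1: [0.9, 0.4, 0.6, 0.1],   # estimated probability of class 1
    0: [0.1, 0.6, 0.4, 0.9],   # estimated probability of class 0
}
print(prevalence_weighted_auc(class_scores, y_true))
```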

[Figure 3: AUC (y-axis, from 0.6 to 0.9) versus number of training examples (x-axis, from 0 to 1200) for the Random Strategy, Uncertainty Maximization and Error Reduction curves]

Fig. 3. Focus of the results on the test set using [0:1200] training examples

Risk reduction is the strategy which maximizes the quality of the model for a number of training examples in the range [2:100]. Between 100 and 700, the strategy based on uncertainty wins. After 600 training examples, the three strategies converge to the optimal AUC (area under the ROC curve).

The two active strategies reach the optimal result faster than the random strategy (the optimal AUC is 0.84 using the whole training set). The use of active learning is beneficial on this real problem. However, the results obtained raise questions which are detailed in the next section.


This paper shows the interest of active learning for a field where the acquisition and labeling of data are particularly expensive. The results obtained show that active learning is relevant for the detection of expressed emotions in speech. But whatever the strategy considered (even the two strategies mentioned in section 3.4 but not detailed in this paper), several questions remain and can be raised:

• evaluation - The quality of an active strategy is usually represented by a curve assessing the performance of the model versus the number of labeled training examples (see Figure 3). The performance criterion used can take several different forms according to the problem. This type of curve only allows pointwise comparisons between strategies, i.e. for one point on the curve (a given number of training examples). If two curves cross each other (as in Figure 3), it is impossible to determine whether one strategy is better than another (on the total set of training examples). It would be interesting to build a criterion which measures the contribution of a strategy compared to the random strategy on the whole data set. This point will be discussed in a future paper.

• test set - Active learning strategies are often used when data acquisition is expensive. Therefore, in practice, a test set is not available (otherwise it could be used for training) and the evaluation of the model during a strategy is difficult to implement.

• stopping criterion - The maximal number of examples to be labeled, or an estimate of the progress of the model, can be used to stop the algorithm. This is closely linked to the use of a test set or to the model employed. For example, in Figure 3 the strategy based on the risk gives the same results when 15 examples have been labeled as when all the available data are used. In this case the cost of using 15 examples and that of using 600 are not the same... A good criterion should be independent of the model and of a test set. Current experiments (not yet published) will allow us to propose a criterion of this type at the end of 2007.

• number of examples to be labeled - The state of the art seems to incorporate only one example at each step of the strategy. But in real cases the expert is a human, and when the model needs time to learn at each iteration of the strategy this may not be efficient. Sometimes more than one example must be incorporated. This aspect has been studied a little in [26] but needs further analysis in the future.

• uncertain environment - If answers could be given to the points above, then active strategies could be used for on-line learning in an uncertain environment, for example to tag part of a graph (a graph here being a social network). At the time of writing, we hope that this point will be accepted and incorporated in a proposal sent to a French project (financed by the French National Agency of Research (ANR)) grouping industrial partners and universities.

Generally, active learning strategies estimate the utility of training examples. [...] This approach would be able to consider non-stationary problems and to train a model which adapts itself to the variations of the observed system.

The detection of expressed emotions in speech could be treated by a double strategy reducing the cost linked to the data: (i) a variable selection which keeps only the characteristics necessary and sufficient for classification; (ii) an example selection which keeps only the instances useful for training. This will be explored in future work.

References

1. Harmon, M.: Reinforcement learning: a tutorial. http://eureka1.aa.wpafb.af.mil/rltutorial/ (1996)
2. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing Surveys 31(3) (1999) 264-323
3. Jamy, I., Jen, T.Y., Laurent, D., Loizou, G., Sy, O.: Extraction de règles d'association pour la prédiction de valeurs manquantes. Revue Africaine de la Recherche en Informatique et Mathématique Appliquée ARIMA Spécial CARI04 (2005) 103-124
4. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge, MA (in press) (2006) http://www.kyb.tuebingen.mpg.de/ssl-book/ssl_toc.pdf
5. Cohn, D., Caruana, R., McCallum, A.: Semi-supervised clustering with user feedback. Technical Report 1892, Cornell University (2003)
6. Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (2005)
7. Castro, R., Willett, R., Nowak, R.: Faster rate in regression via active learning. In: NIPS (Neural Information Processing Systems), Vancouver (2005)
8. Singh, A., Nowak, R., Ramanathan, P.: Active learning for adaptive mobile sensing networks. In: IPSN '06: Proceedings of the fifth international conference on Information processing in sensor networks, New York, NY, USA, ACM Press (2006) 60-68
9. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: Proc. 18th International Conf. on Machine Learning, Morgan Kaufmann, San Francisco, CA (2001) 441-448
10. Muslea, I.: Active Learning With Multiple View. PhD thesis, University of Southern California (2002)
11. Lewis, D., Gale, A.: A sequential algorithm for training text classifiers. In Croft, W.B., van Rijsbergen, C.J., eds.: Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, Springer Verlag, Heidelberg (1994) 3-12
12. Thrun, S.B., Möller, K.: Active exploration in dynamic environments. In Moody, J.E., Hanson, S.J., Lippmann, R.P., eds.: Advances in Neural Information Processing Systems. Volume 4., Morgan Kaufmann Publishers, Inc. (1992) 531-538

13. Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. In Tesauro, G., Touretzky, D., Leen, T., eds.: Advances in Neural Information [...]
14. [...] supervised learning using Gaussian fields and harmonic functions. In: ICML (International Conference on Machine Learning), Washington (2003)

15. Parzen, E.: On estimation of a probability density function and mode. Annals of Mathematical Statistics 33 (1962) 1065-1076
16. Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28(2-3) (1997) 133-168
17. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: Computational Learning Theory. (1992) 287-294
18. Dasgupta, S.: Analysis of greedy active learning strategy. In: NIPS (Neural Information Processing Systems), San Diego (2005)
19. Cohn, D.A., Atlas, L., Ladner, R.E.: Improving generalization with active learning. Machine Learning 15(2) (1994) 201-221
20. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In Langley, P., ed.: Proceedings of ICML-00, 17th International Conference on Machine Learning, Stanford, US, Morgan Kaufmann Publishers, San Francisco, US (2000) 999-1006
21. Liscombe, J., Riccardi, G., Hakkani-Tür, D.: Using context to improve emotion detection in spoken dialog systems. In: InterSpeech, Lisbon (2005)
22. Poulain, B.: Sélection de variables et modélisation d'expressions d'émotions dans des dialogues hommes-machine. In: EGC (Extraction et Gestion de Connaissance), Lille. + Technical Report available here: http://perso.rd.francetelecom.fr/lemaire (in French) (2006)
23. Boullé, M.: An enhanced selective naive Bayes method with optimal discretization. In Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L., eds.: Feature extraction, foundations and Application. Springer (August 2006) 499-507
24. Chapelle, O.: Active learning for Parzen windows classifier. In: AI & Statistics, Barbados (2005) 49-56
25. Guide, V., Rakotomamonjy, Canu, S.: Méthode à noyaux pour l'identification d'émotion. In: RFIA (Reconnaissance des Formes et Intelligence Artificielle). (2003)
26. Bondu, A., Lemaire, V.: Etude de l'influence du nombre d'exemples à étiqueter dans une procédure d'apprentissage actif. Submitted to CAP 2006 (Conférence francophone sur l'apprentissage automatique)
