
Purchase of data labels by batches: study of the impact on the planning of two active learning strategies

V. Lemaire, A. Bondu, and F. Clérot

France Télécom R&D Lannion, TECH/EASY/TSI
http://perso.rd.francetelecom.fr/lemaire
vincent.lemaire@orange-ftgroup.com

Abstract. Active machine learning algorithms are used when large numbers of unlabelled examples are available and getting labels for them is costly (e.g. requires a human expert). Active learning methods select examples to build a training set for a predictive model and aim at the most informative examples. The number of examples to be labelled at each iteration of the active strategy is, most often, randomly chosen or fixed to one. However, in practical situations this number is a parameter which influences the performance of the active strategy. This paper studies the influence of this parameter on two active learning strategies.

1 Introduction

Machine learning consists of methods and algorithms which teach a behaviour to a predictive model using training examples. Passive learning strategies use examples which are randomly chosen. Active learning strategies allow the predictive model to construct its training set in interaction with a human expert. The learning starts with few labelled examples. Then, the model selects the examples with no label which it considers the most informative and asks the human expert for their desired associated outputs. The model learns faster using active learning strategies, reaching the best performances using less data. Active learning is especially attractive for applications for which data is expensive to obtain or to label.

Active learning strategies are also useful on new problems, for instance classification problems where the informative examples or informative data are unknown. The question is how to obtain the information required to solve this new problem. An operational planning of an active algorithm applied to a new classification problem can be defined as the addition of individual costs, one per individual step, which allow one to catch the information needed to solve this new problem:

(I) an initialisation: which, how and how many labels have to be bought at the beginning (before the first learning);
(PP) a pre-partition [1];
(PS) a pre-selection [2];
(D) a diversification [3];
(B) the purchase of N example(s) (customarily N = 1);
(E) the iteration evaluation [4];
(M) the model used.

Planning the purchase of new examples (per packages) is a compromise (C) between these different steps, which includes the dilemma between exploration [5] and exploitation [6], such that:

$$C = EW(\alpha_1 I + \alpha_2 PP + \alpha_3 PS + \alpha_4 D + \alpha_5 B + \alpha_6 E + \alpha_7 M)$$

where $EW$ is the evaluation of the overall procedure. The quality of an active strategy is usually represented by a curve assessing the performance of the model versus the number of training examples labelled.

This approach can be used for the conception of an automatic shunting system (for phone servers) which takes into account emotions in speech [7], our new problem. In this case, data is composed of turns of speech which are exchanged between users and the machine. Each piece of data has to be listened to by a human expert to be labelled as containing (or not) negative emotions. The purpose of the active strategies which are considered in this article is to select the most informative unlabelled examples. These approaches minimize the labelling cost induced by the training of a predictive model. For this automatic shunting system, our corpus contains more than 100,000 turns of speech; the operational planning is therefore very important.

Two main active learning strategies are used in the literature (see section 2). We suspect that such active learners are good for exploitation (labelling examples near the boundary to refine it), but that they do not conduct exploration (searching for large areas of the instance space that they would incorrectly classify), and may even be worse than random sampling when labels are bought by packet. One way to examine the "exploration" behaviour of these two main strategies is to buy more than one label at every iteration (the weight $\alpha_5$ above); this is the purpose of this paper.

2 Active Learning

2.1 Notations

$M \in \mathcal{M}$ is the predictive model which is trained using an algorithm $L$. $\mathcal{X} \subseteq \mathbb{R}^n$ represents all the possible input examples of the model and $x \in \mathcal{X}$ is a particular example. $\mathcal{Y}$ is the set of the possible outputs of the model; $y \in \mathcal{Y}$ is a class label related to $x \in \mathcal{X}$.

During its training, the model observes only one part $\Phi \subseteq \mathcal{X}$ of the universe. The set of examples is limited and the associated labels are not necessarily known. The set of labelled examples (available for the training algorithm) is called $L_x$ and the set of examples for which the labels are unknown is called $U_x$, with $\Phi = U_x \cup L_x$ and $U_x \cap L_x \equiv \emptyset$.

The concept which is learned can be seen as a function $f : \mathcal{X} \rightarrow \mathcal{Y}$, where $f(x_1)$ is the desired answer of the model for the example $x_1$, and $\hat{f} : \mathcal{X} \rightarrow \mathcal{Y}$ is the answer obtained from the model, an estimation of the concept. The elements of $L_x$ and the associated labels constitute a training set $T$. The training examples are pairs of input vectors and desired labels such as $(x, f(x))$: $\forall x \in L_x, \exists (x, f(x)) \in T$.

2.2 Active Learning Methods

Introduction The point of view of selective sampling is adopted [8] in this article. The model observes only one restricted part of the universe $\Phi \subseteq \mathcal{X}$, which is materialized by training examples without labels. The image of a bag containing instances for which the model can ask the associated labels is usually used to describe this approach.

Considering:

• $M$ a predictive model provided with a training algorithm $L$
• $U_x$ and $L_x$ the sets of examples respectively not labelled and labelled
• $n$ the desired number of training examples
• $T$ the training set with $\|T\| < n$
• $U : \mathcal{X} \times \mathcal{M} \rightarrow \Re$ the function which estimates the utility of an example for the training of the model

Repeat
  (A) Train the model $M$ using $L$ and $T$ (and possibly $U_x$).
  (B) Find the example such that $q = \operatorname{argmax}_{u \in U_x} U(u, M)$.
  (C) Withdraw $q$ from $U_x$ and ask the label $f(q)$ from the expert.
  (D) Add $q$ to $L_x$ and add $(q, f(q))$ to $T$.
until $\|T\| \geq n$

Algorithm 1: Selective sampling, Muslea 2002

The problem of selective sampling was posed formally by Muslea [9] (see Algorithm 1). It uses a utility function, $Utility(u, M)$, which estimates the utility of an example $u$ for the training of the model $M$. Using this function, the model selects the examples for which it hopes for the greatest improvement of its performance, and shows these examples to the expert.
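As an illustration, the following Python sketch implements the loop of Algorithm 1. It is a minimal sketch, not the authors' implementation: the interfaces (fit/predict_proba on the model, the ask_expert oracle) are assumptions, and a batch_size parameter is added in anticipation of the batch purchases studied in section 3.

import numpy as np

def selective_sampling(model, utility, ask_expert, U_x, L_x, y_L, n_target, batch_size=1):
    # U_x: list of unlabelled examples; L_x, y_L: the (non-empty) initial
    # labelled set, since learning starts with a few labelled examples.
    # utility(u, model) -> float defines the active learning strategy.
    while len(L_x) < n_target and U_x:
        # (A) train the model on the current labelled training set
        model.fit(np.array(L_x), np.array(y_L))
        # (B) find the batch_size examples maximizing the utility
        scores = [utility(u, model) for u in U_x]
        chosen = set(np.argsort(scores)[-batch_size:])
        queries = [u for i, u in enumerate(U_x) if i in chosen]
        # (C) withdraw them from U_x and ask their labels f(q) from the expert
        U_x = [u for i, u in enumerate(U_x) if i not in chosen]
        # (D) add each pair (q, f(q)) to the training set
        for q in queries:
            L_x.append(q)
            y_L.append(ask_expert(q))
    return L_x, y_L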

Algorithm 1 is generic insofar as only the function $Utility(u, M)$ must be modified to express a particular active learning strategy. How to measure the utility of an example? The two main strategies of the literature are described below.

Uncertainty sampling This strategy is based on the confidence that the model has in its predictions. The model must be able to produce an output and to estimate the relevance of its answers. The model estimates the probability of observing each class, given an instance $x \in \mathcal{X}$. The prediction is the class which maximizes $\hat{P}(y_j|x)$ (with $y_j \in \mathcal{Y}$) among all possible classes. The weaker the probability of the predicted class, the more the prediction is considered uncertain. This strategy of active learning selects the unlabelled examples which maximize the uncertainty of the model. The uncertainty can be expressed as follows:

$$Uncertain(x) = 1 - \max_{y_j \in \mathcal{Y}} \hat{P}(y_j|x), \quad x \in \mathcal{X}$$
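With the same assumed interface as the sketch above, this utility is one line of NumPy; predict_proba is assumed to return $\hat{P}(y_j|x)$ for every class:

import numpy as np

def uncertainty_utility(x, model):
    # Uncertain(x) = 1 - max_j P_hat(y_j | x): largest when the predicted
    # class is barely more probable than the alternatives
    return 1.0 - np.max(model.predict_proba(np.array([x]))[0])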

Sampling by risk reduction The purpose of this approach is to reduce the generalization error, $E(M)$, of the model [11]. It chooses the examples to be labelled so as to minimize this error. In practice this error cannot be calculated because the distribution of the instances in $\mathcal{X}$ is unknown. Nicholas Roy [11] shows how to bring this strategy into play even though all the elements of $\mathcal{X}$ are not known. He uses a uniform prior for $P(x)$, which gives:

$$\hat{E}(M_t) = \frac{1}{|L|} \sum_{i=1}^{|L|} Loss(M_t, x_i)$$

In this article, one estimates the generalization error ($E(M)$) using the empirical risk [12], given by:

$$E(M) = \hat{R}(M) = \sum_{i=1}^{|L|} \sum_{y_j \in \mathcal{Y}} \mathbb{1}_{\{f(x_i) \neq y_j\}}\, P(y_j|x_i)\, P(x_i)$$

where $f$ is the model which estimates the probability that an example belongs to a class, $P(y_j|x_i)$ is the real probability of observing the class $y_j$ for the example $x_i \in L$, and $\mathbb{1}$ is the indicator function, equal to $1$ if $f(x_i) \neq y_j$ and equal to $0$ otherwise. Therefore $R(M)$ is the sum of the probabilities that the model makes a bad decision on the training set ($L$). Using a uniform prior to estimate $P(x_i)$, one can write:

$$\hat{R}(M) = \frac{1}{|L|} \sum_{i=1}^{|L|} \sum_{y_j \in \mathcal{Y}} \mathbb{1}_{\{f(x_i) \neq y_j\}}\, \hat{P}(y_j|x_i)$$

In order to select examples, the model is re-trained several times, considering one more fictive example each time. Each instance $x \in U$ and each label $y_j \in \mathcal{Y}$ can be associated to constitute this supplementary example. The expected cost for any single example $x \in U$ which is added to the training set is then:

$$\hat{R}(M_{+x}) = \sum_{y_j \in \mathcal{Y}} \hat{P}(y_j|x)\, \hat{R}(M_{+(x,y_j)}) \quad \text{with } x \in U$$

The strategy selects the example which minimizes this expected risk.
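The sketch below follows the formulas above under the same assumed interface; clone comes from scikit-learn, every other name is ours. Retraining one model copy per candidate example and per candidate label is what makes the computation time of this strategy grow so quickly, which matters for the choice of $n$ discussed in section 3.

import numpy as np
from sklearn.base import clone

def empirical_risk(model, X_L):
    # R_hat(M) = (1/|L|) sum_i (1 - max_j P_hat(y_j | x_i)): probability of
    # a bad decision on the labelled set, with a uniform prior on P(x_i)
    probs = model.predict_proba(X_L)
    return float(np.mean(1.0 - probs.max(axis=1)))

def risk_reduction_utility(x, model, X_L, y_L):
    # Expected risk R_hat(M_{+x}): retrain with the fictive example (x, y_j)
    # for every label y_j, weighted by the current estimate P_hat(y_j | x)
    p_x = model.predict_proba(np.array([x]))[0]
    X_aug = np.vstack([X_L, [x]])
    expected = 0.0
    for j, y_j in enumerate(model.classes_):
        m = clone(model).fit(X_aug, np.append(y_L, y_j))
        expected += p_x[j] * empirical_risk(m, X_aug)
    # a smaller expected risk means a more useful example; negate it so the
    # argmax-based selection of Algorithm 1 can be reused unchanged
    return -expected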

The reader can see a third main strategy which is based on Query by Committee [13], and a fourth one where the authors focus on a model approach to active learning in a version-space of concepts [14, 15].

3 Number of labelled examples at every iteration

In practice, the number of labelled examples at every iteration (noted $n$) is chosen in an arbitrary way. Nevertheless, this parameter influences the implementation of an active learning strategy. To understand the stakes of this problem, let us consider the two extreme situations. On the one hand, labelling a single example at each iteration, the computation time necessary for the example selection "explodes". In this case, the application of active learning strategies to large databases becomes problematic: the waiting time before an example is presented to the human expert becomes unreasonably long. On the other hand, labelling a large number of examples at every iteration, the contribution of an active learning strategy decreases. The regulation of the parameter $n$ can intuitively be seen as the search for a compromise between the computation time and the efficiency of an active learning strategy.

The purpose here is to measure the influence of the value of $n$. The experiments were carried out on several classification problems, using the same model and the two strategies defined in the previous section.

3.1 Evaluation criteria

The criterion which is used to estimate model performance is the area under the ROC curve [16] (AUC). ROC curves are usually built considering a single class. Consequently, one handles as many ROC curves as there are classes. To build ROC curves in an $m$-class problem, one considers a meta-class $Y_1 = y_i$ (which is the target); the other classes constitute the second meta-class $Y_2 = \bigcup_{j=1, j \neq i}^{m} y_j$. AUC is calculated for each ROC curve, and the global performance of the model is estimated by the mathematical expected value of the AUC over all classes:

$$AUC_{global} = \sum_{i=1}^{|\mathcal{Y}|} P(y_i) \cdot AUC(y_i) \qquad (1)$$

AUC can be seen as a proportion of the space in which ROC curves are defined. This area is equal to $1$ if the model is perfect and is equal to $\frac{1}{2}$ for random models. AUC has interesting statistical properties: it corresponds to the probability that the model attributes a higher score to an instance belonging to the target class than to an instance of another class [16].
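Equation (1) can be computed directly with scikit-learn. In this minimal sketch, scores[:, i] is assumed to hold the model's estimated probability of class $y_i$, and roc_auc_score builds each one-against-the-rest ROC curve:

import numpy as np
from sklearn.metrics import roc_auc_score

def auc_global(y_true, scores, classes):
    # AUC_global = sum_i P(y_i) * AUC(y_i), one ROC curve per target class
    total = 0.0
    for i, c in enumerate(classes):
        target = (y_true == c).astype(int)  # meta-class Y1 = y_i versus the rest
        total += np.mean(target) * roc_auc_score(target, scores[:, i])
    return total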

3.2 Protocol

Beforehand, data is normalized using mean and variance. At the beginning of each experiment, the first training examples are randomly chosen among the available data. At each iteration, $n$ examples are drawn from the dataset to be labelled and added to the training set. The first series of experiments adds 1 example at each iteration using an active strategy. The series is then repeated, increasing each time the number of added examples, i.e. the quantity of information brought to the model ($n$ = 1, 4, 8, 16).

The classifier is a Parzen window which uses a Gaussian kernel ($\sigma$, the parameter of the kernel, is adjusted using a cross validation as in [17]). Each experiment has been done ten times in order to obtain an average and a variance for every point of the result curves.
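A compact sketch of this protocol follows, under stated assumptions: learning_curve is a hypothetical helper (for instance built on the selective_sampling sketch of section 2.2) that runs one active learning experiment and returns the AUC at every iteration; all other names are ours.

import numpy as np

def run_protocol(X, y, make_model, strategies, batch_sizes=(1, 4, 8, 16), repeats=10):
    # beforehand, data is normalized using mean and variance
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    results = {}
    for name, utility in strategies.items():
        for n in batch_sizes:
            # each experiment is repeated ten times to obtain an average
            # and a variance for every point of the result curves
            curves = np.array([learning_curve(make_model(), utility, X, y, batch_size=n)
                               for _ in range(repeats)])  # hypothetical helper
            results[(name, n)] = (curves.mean(axis=0), curves.var(axis=0))
    return results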

3.3 Used model

The large range of models which are able to solve classification problems, and sometimes the great number of parameters needed to use them, may make it difficult to measure the contribution of a learning strategy.

A Parzen window with a Gaussian kernel [17] is used in the experiments below, since this predictive model uses a single parameter and is able to work with few examples. The output of this model is an estimate of the probability of observing the label $y_j$ conditionally on the instance $u$:

$$\hat{P}(y_j|u) = \frac{\sum_{n=1}^{N} \mathbb{1}_{\{f(l_n)=y_j\}}\, K(u, l_n)}{\sum_{n=1}^{N} K(u, l_n)} \quad \text{with } l_n \in L_x \text{ and } u \in U_x \cup L_x \qquad (2)$$

where $K(u, l_n) = e^{-\|u - l_n\|^2 / 2\sigma^2}$.
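Equation (2) translates directly into a few lines of NumPy. This minimal sketch (function and argument names are ours, not the paper's) returns $\hat{P}(y_j|u)$ for every class:

import numpy as np

def parzen_predict_proba(u, X_L, y_L, classes, sigma):
    # K(u, l_n) = exp(-||u - l_n||^2 / (2 sigma^2)) for every labelled l_n
    k = np.exp(-np.sum((X_L - u) ** 2, axis=1) / (2.0 * sigma ** 2))
    # P_hat(y_j | u): kernel mass of the examples labelled y_j, normalized
    return np.array([k[y_L == c].sum() for c in classes]) / k.sum()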

First, a Parzen window was trained using all the training examples, to check whether this model is able to solve the problem. For the three databases the answer was positive (a good value of the AUC was obtained). Consequently, Parzen windows are considered satisfactory and valid for the following active learning procedures with regard to the influence of $n$.

The optimal value of the kernel parameter was found using a cross-validation on the average quadratic error, using all the available training data [17]. Thereafter, this value is used to fix the Parzen window parameter. Since the single parameter of the Parzen window is fixed, the training stage is reduced to counting instances (within the support of the Gaussian kernel). The strategies of example selection are thus comparable, without being influenced by the training of the model.

4 Experimentations

4.1 Databases

Three public data sets which come from the UCI repository (http://www.ics.uci.edu/~mlearn/) are used:

Glass Identification Database: glasses are classified in terms of their oxide content (i.e. Na, Fe, K, etc). All attributes are numeric-valued. This data set includes 214 instances (Train: 146, Test: 68) characterized by 9 attributes which are continuously valued. The 6 classes are the types of glass. The Parzen window classifies an example into one of these 6 classes.

Iris Plant Database: The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. This data set includes 150 flowers (Train: 90, Test: 60) described by 4 attributes which are continuously valued. The Parzen window classifies an example into one of these 3 classes.

Image Segmentation Database: The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. This database includes 2310 instances (Train: 310, Test: 2000) characterized by 9 pixels. The Parzen window classifies an example into one of these 7 classes.

For the three data sets, which all contain more than two classes, the performances are evaluated using equation (1).

4.2 Results

Figures 1, 2 and 3 show the results obtained on the three data sets. On every figure: (i) from top to bottom and left to right, 1, 4, 8 or 16 examples are added at each iteration of the active algorithm; (ii) on each sub-figure the horizontal and vertical axes represent, respectively, the number of labelled examples used and the AUC (see section 3.1). On each sub-figure, test results using sampling based on uncertainty, sampling based on risk reduction and random sampling are plotted versus the number of labelled examples in the training set. The error bars represent the variance of the results ($\pm 2\sigma$). The results on AUC show that, on these three data sets, it is difficult to single out one strategy. If we consider adding:

- one example at every iteration: the uncertainty strategy wins on Glass but the risk strategy wins on the other data sets;
- four examples at every iteration: the uncertainty strategy wins on Glass but the risk strategy and the random strategy share the success on the other data sets;
- eight or sixteen examples at every iteration: the random strategy wins on the three data sets.

By increasing the number of examples labelled at each iteration, the active strategies become less and less competitive compared to the random strategy. We notice each time that: (i) the results do not look very different for the different batch sizes (but the active strategies allow one to obtain the optimal AUC with a smaller number of examples); (ii) the random strategy becomes more powerful than the two active strategies when $n$ becomes large, particularly for $n \geq 8$.

[Four sub-figures: AUC (0.5 to 0.9) versus number of labelled examples (0 to 140), curves Random, Uncertainty and Risk, for n = 1, 4, 8, 16.]

Fig. 1. Results on the data set Glass

[Four sub-figures: AUC (0.5 to 1) versus number of labelled examples (0 to 80), curves Random, Uncertainty and Risk, for n = 1, 4, 8, 16.]

Fig. 2. Results on the data set Iris

[Four sub-figures: AUC (0.86 to 0.98) versus number of labelled examples (0 to 300), curves Random, Uncertainty and Risk, for n = 1, 4, 8, 16.]

Fig. 3. Results on the data set Segment

5 Conclusion and future works

The results obtained show that the number of labelled examples at each iteration of an active learning procedure influences the quality of the involved model. The experiments which were carried out confirm the intuition that the contribution of an active strategy (relative to the random strategy) decreases when one increases the number of labelled examples. Methods which buy the same number ($n > 1$) of labels at each iteration exist [18, 3] and try to incorporate a part of exploration. But to our knowledge none optimizes the value of $N$ at each iteration (choosing a variable number of examples and/or selecting "packages" of examples in an optimal way). We are currently interested in this subject: for example, through the concept of "trajectory" of a model in the space of the decisions it has to take during its training.

The elaboration of an evaluation criterion ($EW$) which measures the contribution of a strategy compared to the random strategy on the whole data set would be interesting: the performance criterion used can take several different forms according to the problem. The usual type of curve only allows comparisons between strategies in a punctual way, i.e. for a single point on the curve (a given number of training examples). If two curves cross each other, it is very difficult to determine whether one strategy is better than the other (on the total set of training examples). This point will be discussed in a future paper.

Finally, we note that the maximal number of examples to be labelled, or an associated stopping criterion, also remains to be studied; the elaboration of a good criterion should be independent of the model and of a test set, and this is another direction for future work.

References

1. Nguyen, H.T., Smeulders, A.: Active learning using pre-clustering. In: International Conference on Machine Learning (ICML). (2003)
2. Gosselin, P.H., Cord, M.: Active learning techniques for user interactive systems: application to image retrieval. In: International Workshop on Machine Learning for MultiMedia (in conjunction with ICML). (2005)
3. Brinker, K.: Incorporating diversity in active learning with support vector machines. In: International Conference on Machine Learning (ICML). (2003) 59-66
4. Culver, M., Kun, D., Scott, S.: Active learning to maximize area under the ROC curve. In: International Conference on Data Mining (ICDM). (2006)
5. Thrun, S.: Exploration in active learning. In: Handbook of Brain Science and Neural Networks. Michael Arbib (2007) (to appear)
6. Osugi, T., Kun, D., Scott, S.: Balancing exploration and exploitation: A new algorithm for active machine learning. In: International Conference on Data Mining (ICDM). (2005)
7. Bondu, A., Lemaire, V., Poulain, B.: Active learning strategies: a case study for detection of emotions in speech. In: Industrial Conference on Data Mining (ICDM), Leipzig (2007)
8. Castro, R., Willett, R., Nowak, R.: Faster rates in regression via active learning. In: Neural Information Processing Systems (NIPS), Vancouver (2005)
9. Muslea, I.: Active Learning With Multiple Views. PhD thesis, University of Southern California (2002)
10. Thrun, S.B., Möller, K.: Active exploration in dynamic environments. In Moody, J.E., Hanson, S.J., Lippmann, R.P., eds.: Advances in Neural Information Processing Systems. Volume 4., Morgan Kaufmann Publishers, Inc. (1992) 531-538
11. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: International Conference on Machine Learning (ICML), Morgan Kaufmann, San Francisco, CA (2001) 441-448
12. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: International Conference on Machine Learning (ICML), Washington (2003)
13. Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28(2-3) (1997) 133-168
14. Dasgupta, S.: Analysis of a greedy active learning strategy. In: Neural Information Processing Systems (NIPS), San Diego (2005)
15. Cohn, D.A., Atlas, L., Ladner, R.E.: Improving generalization with active learning. Machine Learning 15(2) (1994) 201-221
16. Fawcett, T.: ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Labs (2003)
17. Chapelle, O.: Active learning for Parzen window classifier. In: AI & Statistics, Barbados (2005) 49-56
18. Lindenbaum, M., Markovitch, S., Rusakov, D.: Selective sampling for nearest neighbor classifiers. Machine Learning (2004)
