Impact on the planning of two active learning strategies

V. Lemaire, A. Bondu, and F. Clérot

France Télécom R&D Lannion, TECH/EASY/TSI
http://perso.rd.francetelecom.fr/lemaire
vincent.lemaire@orange-ftgroup.com
Abstract. Active machine learning algorithms are used when large numbers of unlabelled examples are available and getting labels for them is costly (e.g. requires a human expert). Active learning methods select examples to build a training set for a predictive model and aim at the most informative examples. The number of examples to be labelled at each iteration of the active strategy is, most often, randomly chosen or fixed to one. However, in practical situations this number is a parameter which influences the performance of the active strategy. This paper studies the influence of this parameter on two active learning strategies.
1 Introduction
Machine learning consists of methods and algorithms which teach behaviour to a predictive model using training examples. Passive learning strategies use examples which are randomly chosen. Active learning strategies allow the predictive model to construct its training set in interaction with a human expert. The learning starts with few labelled examples. Then the model selects the examples with no label which it considers the most informative, and asks the human expert for their desired associated outputs. The model learns faster using active learning strategies, reaching the best performance using less data. Active learning is especially attractive for applications in which data is expensive to obtain or to label.

Active learning strategies are also useful on new problems, for instance classification problems where the informative examples or informative data are unknown. The question is how to obtain the information required to solve this new problem. An operational planning of an active algorithm applied to a new classification problem can be defined as the addition of individual costs, one per step, which allow the algorithm to gather the information needed to solve this new problem:
• (I) an initialisation: which, how and how many labels have to be bought at the beginning (before the first learning);
• (PP) a pre-partition [1];
• (PS) a pre-selection [2];
• (D) a diversification [3];
• (B) the purchase of N example(s) (customarily N = 1);
• (E) the iteration evaluation [4];
• (M) the model used.

Planning the purchase of new examples (by packages) is a compromise (C) between these different steps, which include the dilemma between exploration [5] and exploitation [6], such that:

C = EW (α₁ I + α₂ PP + α₃ PS + α₄ D + α₅ B + α₆ E + α₇ M)
where EW is the evaluation of the overall procedure. The quality of an active strategy is usually represented by a curve assessing the performance of the model versus the number of training examples labelled.
This approach can be used for the conception of an automatic shunting system (for phone servers) which takes into account emotions in speech [7] (our new problem). In this case, the data is composed of speech turns which are exchanged between users and the machine. Each piece of data has to be listened to by a human expert to be labelled as containing (or not) negative emotions. The purpose of the active strategies considered in this article is to select the most informative unlabelled examples. These approaches minimize the labelling cost induced by the training of a predictive model. For this application our corpus contains more than 100 000 speech turns; the operational planning is therefore very important.
Two main active learning strategies are used in the literature (see section 2). We suspect that such active learners are good for exploitation (labelling examples near the boundary to refine it), but that they do not conduct exploration (searching for large areas in the instance space that they would incorrectly classify); they may even be worse than random sampling when labels are bought by packet. One way to examine the "exploration" behaviour of these two main strategies is to buy more than one label at every iteration (the weight α₅ above); this is the purpose of this paper.
2 Active Learning

2.1 Notations
M ∈ 𝕄 is the predictive model which is trained using an algorithm L. X ⊆ ℝⁿ represents all the possible input examples of the model and x ∈ X is a particular example. Y is the set of the possible outputs of the model; y ∈ Y is a class label related to x ∈ X. During its training, the model observes only one part Φ ⊆ X of the universe. The set of examples is limited and the associated labels are not necessarily known. The set of labelled examples (available to the training algorithm) is called L_x and the set of examples for which the labels are unknown is called U_x, with Φ = U_x ∪ L_x and U_x ∩ L_x ≡ ∅.

The concept which is learned can be seen as a function, f : X → Y, where f(x₁) is the desired answer of the model for the example x₁, and f̂ : X → Y is the answer obtained from the model; an estimation of the concept. The elements of L_x and the associated labels constitute a training set T. The training examples are pairs of input vectors and desired labels (x, f(x)), such that: ∀x ∈ L_x, ∃(x, f(x)) ∈ T.

2.2 Active Learning Methods
Introduction  The point of view of selective sampling [8] is adopted in this article. The model observes only one restricted part of the universe Φ ⊆ X, which is materialized by training examples without labels. The image of a bag containing instances, for which the model can ask the associated labels, is usually used to describe this approach.
Considering:
• M a predictive model provided with a training algorithm L;
• U_x and L_x the sets of examples respectively not labelled and labelled;
• n the desired number of training examples;
• T the training set, with ||T|| < n;
• U : X × M → ℝ the function which estimates the utility of an example for the training of the model.

Repeat
  (A) Train the model M using L and T (and possibly U_x).
  (B) Find the example such that q = argmax_{u ∈ U_x} U(u, M).
  (C) Withdraw q from U_x and ask the label f(q) from the expert.
  (D) Add q to L_x and add (q, f(q)) to T.
until ||T|| = n

Algorithm 1: Selective sampling, Muslea 2002
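As a concrete illustration, the loop of Algorithm 1 can be sketched in a few lines of Python. The `train`, `utility` and `oracle` callbacks, the threshold concept and all numeric values below are hypothetical toy choices, not part of the original experiments; the sketch only shows how steps (A)-(D) chain together.

```python
def selective_sampling(train, utility, oracle, pool, seed_labelled, n):
    # Generic selective sampling loop (Algorithm 1):
    #   train(T)      -> model M fitted on the labelled pairs T
    #   utility(u, M) -> estimated usefulness of unlabelled example u for M
    #   oracle(x)     -> the expert's label f(x)
    U_x = list(pool)
    T = list(seed_labelled)
    while len(T) < n and U_x:
        M = train(T)                               # (A) train on T
        q = max(U_x, key=lambda u: utility(u, M))  # (B) most useful example
        U_x.remove(q)                              # (C) withdraw q, query expert
        T.append((q, oracle(q)))                   # (D) add (q, f(q)) to T
    return train(T), T

# Toy usage (hypothetical setup): learn a threshold concept on [0, 1); the
# "model" is the midpoint between the largest known negative example and the
# smallest known positive one.
def train(T):
    pos = [x for x, y in T if y == 1]
    neg = [x for x, y in T if y == 0]
    return (min(pos) + max(neg)) / 2.0

oracle = lambda x: int(x >= 0.37)        # hidden concept f
pool = [i / 100.0 for i in range(100)]
seed = [(0.0, 0), (0.99, 1)]
uncertainty = lambda u, M: -abs(u - M)   # most useful = closest to the boundary
model, T = selective_sampling(train, uncertainty, oracle, pool, seed, n=10)
```

With this uncertainty-style utility the loop behaves like a bisection of the decision boundary: after a handful of queries the learned threshold is close to the hidden one, whereas a random draw of ten examples would typically leave it much coarser.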
The problem of selective sampling was posed formally by Muslea [9] (see Algorithm 1). It uses a utility function, Utility(u, M), which estimates the utility of an example u for the training of the model M. Using this function, the model selects the examples for which it hopes for the greatest improvement of its performance, and shows these examples to the expert.

Algorithm 1 is generic insofar as only the function Utility(u, M) must be modified to express a particular active learning strategy.

Sampling by uncertainty  This strategy is based on the confidence that the model has in its predictions. The model must be able to produce an output and to estimate the relevance of its answers. The model estimates the probability of observing each class, given an instance x ∈ X. This estimate is done by selecting the class which maximizes P̂(y_j | x) (with y_j ∈ Y) among all possible classes. The weaker the probability to observe the predicted class, the more the prediction is considered uncertain. This strategy of active learning selects the unlabelled examples which maximize the uncertainty of the model. The uncertainty can be expressed as follows:

Uncertain(x) = 1 − max_{y_j ∈ Y} P̂(y_j | x),   x ∈ X
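A minimal sketch of this selection rule, assuming a hypothetical `predict_proba` callback that returns the class probabilities P̂(y_j | x) for an instance:

```python
def most_uncertain(pool, predict_proba):
    # Uncertain(x) = 1 - max_{y_j} P(y_j | x): pick the unlabelled example
    # the model is least confident about
    return max(pool, key=lambda x: 1.0 - max(predict_proba(x)))

# Toy two-class scorer (hypothetical): P(y_1 | x) = x, P(y_2 | x) = 1 - x,
# so uncertainty peaks where the two probabilities meet, at P = 0.5
pool = [0.1, 0.45, 0.8]
proba = lambda x: [x, 1.0 - x]
chosen = most_uncertain(pool, proba)
```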
Sampling by risk reduction  The purpose of this approach is to reduce the generalization error, E(M), of the model [11]. It chooses the examples to be labelled so as to minimize this error. In practice this error cannot be calculated, because the distribution of instances in X is unknown. Nicholas Roy [11] shows how to bring this strategy into play even though all the elements of X are not known. He uses a uniform prior for P(x), which gives:

E(M̂_t) = (1 / |L|) Σ_{i=1}^{|L|} Loss(M_t, x_i)

In this article, one estimates the generalization error E(M) using the empirical risk [12], given by:

Ê(M) = R̂(M) = Σ_{i=1}^{|L|} Σ_{y_j ∈ Y} 1_{f(x_i) ≠ y_j} P(y_j | x_i) P(x_i)

where f is the model which estimates the probability that an example belongs to a class, P(y_j | x_i) is the real probability to observe the class y_j for the example x_i ∈ L, and 1 is the indicator function, equal to 1 if f(x_i) ≠ y_j and equal to 0 otherwise. Therefore R̂(M) is the sum of the probabilities that the model makes a bad decision on the training set (L). Using a uniform prior to estimate P(x_i), one can write:

R̂(M) = (1 / |L|) Σ_{i=1}^{|L|} Σ_{y_j ∈ Y} 1_{f(x_i) ≠ y_j} P̂(y_j | x_i)
In order to select examples, the model is re-trained several times, considering one more fictive example each time. Each instance x ∈ U and each label y_j ∈ Y can be associated to constitute this supplementary example. The expected cost for any single example x ∈ U which is added to the training set is then:

R̂(M_{+x}) = Σ_{y_j ∈ Y} P̂(y_j | x) R̂(M_{+(x, y_j)}),   with x ∈ U

The reader can find a third main strategy, based on Query by Committee [13], and a fourth one where the authors focus on a model approach to active learning in a version space of concepts [14, 15].
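The expected cost above can be sketched as follows. The `retrain` and `risk` callbacks and the toy numbers are hypothetical stand-ins for the actual re-training and empirical-risk computation; only the weighting of R̂(M₊₍ₓ,yⱼ₎) by P̂(y_j | x) is the point.

```python
def expected_risk(x, proba, retrain, risk, labels):
    # R̂(M_{+x}) = sum_j P̂(y_j|x) * R̂(M_{+(x,y_j)}): retrain once per candidate
    # label y_j and weight the resulting risk by the current belief P̂(y_j|x)
    return sum(p * risk(retrain(x, y)) for p, y in zip(proba(x), labels))

def select_by_risk_reduction(pool, proba, retrain, risk, labels):
    # Choose the unlabelled example whose addition minimizes the expected risk
    return min(pool, key=lambda x: expected_risk(x, proba, retrain, risk, labels))

# Hypothetical toy: two classes with P̂ = [1-x, x]; pretend the empirical risk
# of the retrained model shrinks as x approaches the decision boundary 0.5.
labels = [0, 1]
proba = lambda x: [1.0 - x, x]
retrain = lambda x, y: x            # the "model" stands in for its training set
risk = lambda m: abs(m - 0.5)       # stand-in for the empirical risk R̂
pool = [0.1, 0.45, 0.8]
best = select_by_risk_reduction(pool, proba, retrain, risk, labels)
```

Note the cost of this strategy in practice: one retraining per candidate example and per candidate label, which is precisely why the per-iteration batch size n matters for computation time.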
3 Number of labelled examples at every iteration
In practice, the number of examples labelled at every iteration (noted n) is chosen in an arbitrary way. Nevertheless, this parameter influences the implementation of an active learning strategy. To understand the stakes of this problem, let us consider the two extreme situations. On the one hand, labelling a single example at each iteration, the computation time necessary for the selection of examples "explodes". In this case, the application of active learning strategies to large databases becomes problematic: the waiting time before presenting an example to the human expert becomes unreasonably long. On the other hand, labelling a large number of examples at every iteration, the contribution of an active learning strategy decreases. The regulation of the parameter n can be seen, in an intuitive way, as the search for a compromise between the computation time and the efficiency of an active learning strategy.
Since the purpose here is to measure the influence of the value of n, the experiments were carried out on several classification problems, using the same model and the two strategies defined in the previous section.
3.1 Evaluation criteria

The criterion which is used to estimate model performances is the area under the ROC curve [16] (AUC). ROC curves are usually built considering a single class. Consequently, one handles as many ROC curves as there are classes. To build ROC curves in an m-class problem, one considers a meta-class Y₁ = y_i (which is the target); the other classes constitute the second meta-class Y₂ = ∪_{j=1, j≠i}^{m} y_j. The AUC is calculated for each ROC curve, and the global performance of the model is estimated by the mathematical expected value of the AUC over all classes:

AUC_global = Σ_{i=1}^{|Y|} P(y_i) · AUC(y_i)        (1)

The AUC can be seen as a proportion of the space in which ROC curves are defined. This area is equal to 1 if the model is perfect and equal to 1/2 for random models. The AUC has interesting statistical properties: it corresponds to the probability that the model attributes a more important score to an instance belonging to the good class than to an instance of another class [16].
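Equation (1) can be sketched as follows, using the rank-statistic view of the AUC quoted above; the per-class score function and the data below are hypothetical.

```python
def auc(pos_scores, neg_scores):
    # AUC as a rank statistic: the probability that a positive instance
    # receives a higher score than a negative one (ties count 1/2)
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def auc_global(data, score, classes):
    # Eq. (1): prior-weighted one-vs-rest AUC over all classes
    total = 0.0
    for c in classes:
        pos = [score(x, c) for x, y in data if y == c]
        neg = [score(x, c) for x, y in data if y != c]
        total += (len(pos) / len(data)) * auc(pos, neg)
    return total

# Hypothetical scorer that perfectly separates the two classes
data = [(0.1, 'a'), (0.2, 'a'), (0.8, 'b'), (0.9, 'b')]
score = lambda x, c: x if c == 'b' else -x
g = auc_global(data, score, ['a', 'b'])
```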
3.2 Protocol
Beforehand, the data is normalized using mean and variance. At the beginning of each experiment, the initial training examples are randomly chosen among the available data. At each iteration, n examples are drawn from the dataset to be labelled and added to the training set. The first series of experiments adds 1 example at each iteration using an active strategy. Then the other series of experiments are repeated, increasing each time the number of added examples, i.e. the quantity of information brought to the model (n = 1, 4, 8, 16).
The classifier is a Parzen window which uses a Gaussian kernel (σ, the parameter of the kernel, is adjusted using a cross-validation as in [17]). Each experiment has been done ten times in order to obtain an average and a variance for every point of the result curves.
3.3 Used model
The large range of models which are able to solve classification problems, and sometimes the great number of parameters needed to use them, may make it difficult to measure the contribution of a learning strategy.

A Parzen window with a Gaussian kernel [17] is used in the experiments below, since this predictive model uses a single parameter and is able to work with few examples. The output of this model is an estimate of the probability to observe the label y_j conditionally to the instance u:

P̂(y_j | u) = [ Σ_{n=1}^{N} 1_{f(l_n) = y_j} K(u, l_n) ] / [ Σ_{n=1}^{N} K(u, l_n) ],   with l_n ∈ L_x and u ∈ U_x ∪ L_x        (2)

where K(u, l_n) = exp(−‖u − l_n‖² / (2σ²)).
First, a Parzen window was built using all the training examples, to check whether this model is able to solve the problem. For the three databases the answer was positive (a good AUC value was obtained). Consequently, the Parzen window is considered satisfactory and valid for the following active learning procedures with regard to the influence of n.

The optimal value of the kernel parameter was found using a cross-validation on the average quadratic error, using all the available training data [17]. Thereafter, this value is used to fix the Parzen window parameter. Since the single parameter of the Parzen window is fixed, the training stage is reduced to counting instances (within the support of the Gaussian kernel). The strategies of example selection are thus comparable, without being influenced by the training of the model.
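Equation (2) translates directly into code. The toy labelled set L_x and the value of σ below are hypothetical; the sketch shows that, with σ fixed, "training" is indeed nothing more than storing and weighting the labelled instances.

```python
import math

def parzen_proba(u, labelled, classes, sigma):
    # Eq. (2): P(y_j|u) = sum_n 1{f(l_n)=y_j} K(u,l_n) / sum_n K(u,l_n)
    # with the Gaussian kernel K(u,l) = exp(-||u - l||^2 / (2 sigma^2))
    K = lambda a, b: math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b))
                              / (2.0 * sigma ** 2))
    w = [(K(u, x), y) for x, y in labelled]
    z = sum(k for k, _ in w)
    return {c: sum(k for k, y in w if y == c) / z for c in classes}

# Toy labelled set L_x (hypothetical data): two classes in the plane
L_x = [((0.0, 0.0), 'neg'), ((0.1, 0.0), 'neg'), ((1.0, 1.0), 'pos')]
p = parzen_proba((0.05, 0.0), L_x, ['neg', 'pos'], sigma=0.5)
```

A query point surrounded by 'neg' examples receives most of the kernel mass from that class, so p['neg'] is close to 1 while the two probabilities still sum to 1.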
4 Experiments

4.1 Databases

Three public data sets which come from the "UCI repository" (http://www. [...]) are used:

Glass Identification Database: the classification of types of glass in terms of their oxide content (i.e. Na, Fe, K, etc.). All attributes are numeric-valued. This data set includes 214 instances (Train: 146, Test: 68) characterized by 9 attributes which are continuously valued. The 6 classes are the types of glass. The Parzen window classifies an example into one of these 6 classes.

Iris Plant Database: the data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. This data set includes 150 flowers (Train: 90, Test: 60) described by 4 attributes which are continuously valued. The Parzen window classifies an example into one of these 3 classes.

Image Segmentation Database: the instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. This data set includes 2310 instances (Train: 310, Test: 2000) characterized by 9 pixel-based attributes. The Parzen window classifies an example into one of these 7 classes.

For the three data sets, which contain more than two classes, the performances are evaluated using equation 1.
4.2 Results
Figures 1, 2 and 3 show the results obtained on the three data sets. On every figure: (i) from top to bottom and left to right: 1, 4, 8 or 16 examples are added at each iteration of the active algorithm; (ii) on each sub-figure the horizontal and vertical axes represent respectively the number of labelled examples used and the AUC (see section 3.1). On each curve, test results using sampling based on uncertainty, sampling based on risk reduction and random sampling are plotted versus the number of labelled examples in the training set. The notches represent the variance of the results (±2σ).

The results on the AUC show that, on these three data sets, it is difficult to point to a single best strategy. If we consider adding:

• one example at every iteration: the uncertainty strategy wins on Glass but the risk strategy wins on the other data sets;
• four examples at every iteration: the uncertainty strategy wins on Glass but the risk strategy and the random strategy share the success on the other data sets;
• eight or sixteen examples at every iteration: the random strategy wins on the three data sets.
By increasing the number of examples labelled at each iteration, the active strategies become less and less competitive compared to the random strategy. We notice each time that: (i) the results do not look so different for different batch sizes (but the active strategies allow the optimal AUC to be obtained with a smaller number of examples); (ii) the random strategy becomes more powerful than the two active strategies when n becomes large, particularly for n ≥ 8.

Fig. 1. Results on the data set Glass [four sub-figures, one per batch size n = 1, 4, 8, 16: AUC (0.5-0.9) versus number of labelled examples (0-140); curves: Random, Uncertainty, Risk]

Fig. 2. Results on the data set Iris [four sub-figures, one per batch size n = 1, 4, 8, 16: AUC (0.5-1.0) versus number of labelled examples (0-80); curves: Random, Uncertainty, Risk]

Fig. 3. Results on the data set Segment [four sub-figures, one per batch size n = 1, 4, 8, 16: AUC (0.86-0.98) versus number of labelled examples (0-300); curves: Random, Uncertainty, Risk]
5 Conclusion and future works

The obtained results show that the number of examples labelled at each iteration of an active learning procedure influences the quality of the resulting model. The experiments which were carried out confirm the intuition that the contribution of an active strategy (relatively to the random strategy) decreases when one increases the number of labelled examples per iteration. Methods which buy the same number (n > 1) of labels at each iteration exist [18, 3] and try to incorporate a part of exploration. But to our knowledge none optimizes the value of n at each iteration (choosing a variable number of examples and/or selecting "packages" of examples in an optimal way). We are currently interested in this subject: for example through the concept of the "trajectory" of a model in the space of the decisions it has to take during its training.
The elaboration of an evaluation criterion (EW), which measures the contribution of a strategy compared to the random strategy on the whole data set, would be interesting: the performance criterion used can take several different forms according to the problem. The usual type of curve only allows comparisons between strategies in a punctual way, i.e. for one point on the curve (a given number of training examples). If two curves cross each other, it is very difficult to determine whether one strategy is better than another (on the total set of training examples). This point will be discussed in a future paper.
Finally, we note that the maximal number of examples to be labelled, or a [...] of a good criterion, should be independent of the model and of a test set; this is another direction for future work.
References

1. Nguyen, H.T., Smeulders, A.: Active learning using pre-clustering. In: International Conference on Machine Learning (ICML). (2003)
2. Gosselin, P.H., Cord, M.: Active learning techniques for user interactive systems: application to image retrieval. In: International Workshop on Machine Learning for MultiMedia (in conjunction with ICML). (2005)
3. Brinker, K.: Incorporating diversity in active learning with support vector machines. In: International Conference on Machine Learning (ICML). (2003) 59-66
4. Culver, M., Kun, D., Scott, S.: Active learning to maximize area under the ROC curve. In: International Conference on Data Mining (ICDM). (2006)
5. Thrun, S.: Exploration in active learning. In: to appear in: Handbook of Brain Science and Neural Networks. Michael Arbib (2007)
6. Osugi, T., Kun, D., Scott, S.: Balancing exploration and exploitation: A new algorithm for active machine learning. In: International Conference on Data Mining (ICDM). (2005)
7. Bondu, A., Lemaire, V., Poulain, B.: Active learning strategies: a case study for detection of emotions in speech. In: Industrial Conference on Data Mining (ICDM), Leipzig (2007)
8. Castro, R., Willett, R., Nowak, R.: Faster rates in regression via active learning. In: Neural Information Processing Systems (NIPS), Vancouver (2005)
9. Muslea, I.: Active Learning With Multiple Views. PhD thesis, University of Southern California (2002)
10. Thrun, S.B., Möller, K.: Active exploration in dynamic environments. In Moody, J.E., Hanson, S.J., Lippmann, R.P., eds.: Advances in Neural Information Processing Systems. Volume 4., Morgan Kaufmann Publishers, Inc. (1992) 531-538
11. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: International Conference on Machine Learning (ICML), Morgan Kaufmann, San Francisco, CA (2001) 441-448
12. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: International Conference on Machine Learning (ICML), Washington (2003)
13. Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28(2-3) (1997) 133-168
14. Dasgupta, S.: Analysis of a greedy active learning strategy. In: Neural Information Processing Systems (NIPS), San Diego (2005)
15. Cohn, D.A., Atlas, L., Ladner, R.E.: Improving generalization with active learning. Machine Learning 15(2) (1994) 201-221
16. Fawcett, T.: ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Labs (2003)
17. Chapelle, O.: Active learning for Parzen window classifier. In: AI & Statistics, Barbados (2005) 49-56
18. Lindenbaum, M., Markovitch, S., Rusakov, D.: Selective sampling for nearest neighbor classifiers. Machine Learning (2004)