LIDIC - UNSL's Participation at eRisk 2017: Pilot Task on Early Detection of Depression


Notebook for eRisk at CLEF 2017

Ma. Paula Villegas¹, Dario G. Funez¹, Ma. José Garciarena Ucelay¹, Leticia C. Cagnina¹,², and Marcelo L. Errecalde¹

¹ LIDIC Research Group, Universidad Nacional de San Luis, Argentina

² Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)

{villegasmariapaula74,funezdario,mjgarciarenaucelay}@gmail.com

{lcagnina,merrecalde}@gmail.com

Abstract. In this paper we describe the participation of the LIDIC Research Group of Universidad Nacional de San Luis (UNSL), Argentina, at the eRisk 2017 pilot task. The main goal of this task is to consider early risk detection scenarios, depression in this case, where the issue of getting timely predictions with a reasonable confidence level becomes critical. In the pilot task, systems must be able to sequentially process pieces of evidence and detect early traces of depression as soon as possible. The data set used is a collection of writings labeled as depressed or non-depressed that were released to the pilot task participants in two different stages: training and test. Our proposal for this task was based on a semantic representation of documents that explicitly considers the partial information that is made available in the different chunks to the early risk detection systems over time. That temporal approach was complemented with other standard text categorization models in some specific situations that seemed not to be correctly addressed by our approach alone. In the test stage, the resulting system obtained the lowest $ERDE_{50}$ error out of a total of 30 submissions from 8 different institutions. However, once the golden truth of the testing data set was released, we could verify that our temporal approach alone might have obtained very robust results and the lowest reported $ERDE$ error for both thresholds used in the pilot task.

Keywords: Early Risk Detection, Unbalanced Data Sets, Text Representations, Semantic Analysis Techniques.

1 Introduction

The widespread use of the Internet, social networks and other computer technologies has marked the beginning of a new era in communication among people. In this context, we now have available a lot of information that might be useful to detect, in a timely way, those situations that might be potentially dangerous or risky for the physical well-being and health of individuals, communities and societies [...] to depression, among others.

This type of situation has started to be studied in a new research field known as early risk detection (ERD), which has received increasing interest from scientific researchers worldwide due to the important impact that it could have on many relevant and current real-world problems. In this context, the first early risk prediction conference, eRisk 2017, was organized this year in the context of the CLEF 2017 Workshop. As part of this event, a pilot task was organized, and the present article describes our participation in this task.

The eRisk 2017 pilot task was organized into two different stages: a training stage and a test stage. It also assumed an ERD scenario, that is, data are sequentially read as a stream and the challenge consists in detecting risk cases as soon as possible. In order to reproduce this scenario, the eRisk 2017 organizers released the test data set following a sequential, chunk-by-chunk criterion; that is, the first chunk contained the oldest 10% of the messages, the second chunk contained the second oldest 10%, and so forth, up to completing the 10 chunks that represent the full writings of the analysed individuals.

To deal with this problem we implemented an original proposal that we named temporal variation of terms (TVT) [3]. However, at the training stage, TVT seemed to show some weakness at specific chunks, so we decided to complement it with other standard methods to help our approach in these specific situations. Thus, our implemented ERD system is in fact a combined system that strongly depends on TVT but also uses other methods as an additional source of opinions. The resulting system had a very acceptable performance on the data set released in the test stage, obtaining the lowest $ERDE_{50}$ error out of a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that TVT alone might have obtained very robust results and the lowest reported $ERDE$ error for both thresholds used in the pilot task. For this reason, we also include an additional section in this work with preliminary results obtained with TVT alone on the test set in order to observe the potential of this method for general ERD problems.

The rest of the article is organized as follows: Section 2 describes some general aspects of the data set used in the pilot task and the methods used in our ERD system. Next, Section 3 describes the activities carried out in the training stage and presents the justifications of the main design decisions made for our ERD system. Section 4 shows the results obtained with our proposal on the eRisk 2017 data set released in the test stage. Then, Section 5 presents some complementary results that show interesting aspects of the performance of TVT working alone on the test set. Finally, Section 6 presents the conclusions and potential future work.


2.1 Data Set

The data set used in the pilot task of the eRisk competition was initially described in [8]. It is a collection of writings (posts or comments) from a set of social media users. There are two categories of users, depressed and non-depressed, and, for each user, the collection contains a sequence of writings (in chronological order). For each user, the collection of writings has been divided into 10 chunks. The first chunk contains the oldest 10% of the messages, the second chunk contains the second oldest 10%, and so forth. This collection was split into a training set and a test set that we will refer to as $TR_{DS}$ and $TE_{DS}$, respectively. The $TR_{DS}$ set contained 486 users (83 positive, 403 negative) and the (test) $TE_{DS}$ set contained 401 users (52 positive, 349 negative). The users labeled as positive are those that have explicitly mentioned that they have been diagnosed with depression.

The pilot task was divided by its organizers into a training stage and a test stage. In the first one, the participating teams had access to the $TR_{DS}$ data set with all chunks of all training users. They could therefore tune their systems with the training data. Then, in the test stage, the 10 chunks of the $TE_{DS}$ data set were gradually released by the organizers, one by one, until completing all the chunks that correspond to the complete writings of the considered individuals. Each time a chunk $ch_i$ was released, participants in the pilot task were asked to give their predictions on the users contained in $TE_{DS}$, based on the partial information read from chunks $ch_1$ to $ch_i$.
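The chunk-by-chunk release described above can be illustrated with a minimal sketch. It assumes each user's writings are already sorted from oldest to newest; the function names `split_into_chunks` and `text_up_to` are hypothetical and are not part of the organizers' tooling.

```python
from typing import List


def split_into_chunks(writings: List[str], n_chunks: int = 10) -> List[List[str]]:
    """Split chronologically ordered writings into n_chunks contiguous parts."""
    total = len(writings)
    # Integer boundaries so that every writing falls in exactly one chunk.
    bounds = [i * total // n_chunks for i in range(n_chunks + 1)]
    return [writings[bounds[i]:bounds[i + 1]] for i in range(n_chunks)]


def text_up_to(chunks: List[List[str]], i: int) -> List[str]:
    """Return all writings in chunks ch_1..ch_i (i is 1-based)."""
    return [w for chunk in chunks[:i] for w in chunk]


# Toy user with 25 posts, oldest first (illustrative data only).
writings = [f"post {k}" for k in range(1, 26)]
chunks = split_into_chunks(writings)
partial = text_up_to(chunks, 3)  # what a system may read after the third release
```

At step $i$ a system only sees `text_up_to(chunks, i)`, which is the partial information on which its prediction must be based.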

2.2 Methods

In order to deal with the problem posed in the pilot task we proposed a new document representation named temporal variation of terms (TVT) [3]. TVT is an elaborate approach that requires enough space to be described in an adequate way. For this reason, in the present work we will only focus on the decision aspects that were considered at the pilot task to combine TVT with other well-known document representation approaches like Bag of Words (BoW). Thus, we give below a separate description of the method with its main characteristics, but the interested reader can obtain a detailed description of TVT in [3].

We used different document representations and learning algorithms in our experiments. Therefore, we start by giving some general ideas of those document representations and a slightly more detailed explanation of TVT. After that, some general considerations on the learning algorithms used are briefly introduced.

Document representations


[...] the language models most used in text categorization tasks. In BoW representations, features are words and documents are simply treated as collections of unordered words. Formally, a document $d$ is represented by the vector of weights $d_{BoW} = (w_1, w_2, \ldots, w_n)$, where $n$ is the size of the vocabulary of the data set. Each weight $w_i$ is a value that is assigned to each feature (word) according to whether the word appears in a document or how frequently this word appears. This popular representation is simple to implement, fast to obtain and can be used under different weighting schemes (boolean, term frequency ($tf$) or term frequency - inverse document frequency ($tf$-$idf$)). In our study we used BoW with the boolean weighting scheme.
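As an illustration of the boolean weighting scheme, the following minimal sketch builds $d_{BoW}$ for two toy documents. Whitespace tokenization and lowercasing are simplifying assumptions, and the helper names are hypothetical; the experiments reported here relied on Weka rather than on this code.

```python
from typing import List


def build_vocabulary(docs: List[str]) -> List[str]:
    """Sorted list of the distinct words in the collection (size n)."""
    return sorted({w for d in docs for w in d.lower().split()})


def boolean_bow(doc: str, vocab: List[str]) -> List[int]:
    """w_i = 1 if the i-th vocabulary word occurs in the document, else 0."""
    present = set(doc.lower().split())
    return [1 if w in present else 0 for w in vocab]


docs = ["i feel sad today", "today was a good day"]
vocab = build_vocabulary(docs)
vectors = [boolean_bow(d, vocab) for d in docs]
```

Each vector has one position per vocabulary word, so its length grows with the vocabulary size $n$.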

Concise Semantic Analysis (or Second Order Attributes (SOA)). Concise Semantic Analysis (CSA) is a semantic analysis technique that interprets words and text fragments in a space of concepts that are close (or equal) to the category labels. For instance, if documents in the data set are labeled with $q$ different category labels (usually no more than 100 elements), words and documents will be represented in a $q$-dimensional space. That space size is usually much smaller than standard BoW representations, which directly depend on the vocabulary size (more than 10000 or 20000 elements in general). CSA has been used in general text categorization tasks [6] and has been adapted to work in author profiling tasks under the name of Second Order Attributes (SOA) [7].
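The idea of representing words and documents in a $q$-dimensional concept space can be sketched as follows. This is a simplified illustration, not the exact formulation of [6] or [7]: the logarithmic term weighting and the per-term normalization are assumptions made for this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def term_vectors(docs: List[List[str]], labels: List[str],
                 classes: List[str]) -> Dict[str, List[float]]:
    """Map each term to a q-dimensional vector with one weight per class."""
    vecs: Dict[str, List[float]] = {}
    for tokens, label in zip(docs, labels):
        ci = classes.index(label)
        for term, tf in Counter(tokens).items():
            vec = vecs.setdefault(term, [0.0] * len(classes))
            vec[ci] += math.log2(1 + tf / len(tokens))  # assumed weighting
    # Normalize every term vector so that its weights sum to 1.
    return {t: [v / sum(vec) for v in vec] for t, vec in vecs.items()}


def doc_vector(tokens: List[str], vecs: Dict[str, List[float]], q: int) -> List[float]:
    """Document = frequency-weighted sum of the vectors of its known terms."""
    out = [0.0] * q
    for term, tf in Counter(tokens).items():
        for i, v in enumerate(vecs.get(term, [0.0] * q)):
            out[i] += (tf / len(tokens)) * v
    return out


classes = ["depressed", "non-depressed"]
docs = [["sad", "tired", "sad"], ["happy", "day"], ["tired", "day"]]
labels = ["depressed", "non-depressed", "non-depressed"]
vecs = term_vectors(docs, labels, classes)
new_doc = doc_vector(["sad", "happy"], vecs, q=len(classes))
```

With $q = 2$ classes, every document is reduced to two numbers regardless of the vocabulary size, which is the dimensionality advantage mentioned above.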

Character 3-grams. An $n$-gram is a sequence of $n$ characters obtained from each text in the data set. In the $n$-gram model, the dictionary contains all $n$-grams that occur in any term in the vocabulary. Representations using character $n$-grams have proved effective in many applications [5], where $n$-grams are considered the terms used in BoW representations.
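A minimal sketch of character $n$-gram extraction with $n = 3$ follows. For brevity it slides a window over the raw text (spaces included) instead of over each vocabulary term; that choice, like the function name, is an assumption of this sketch.

```python
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return every contiguous sequence of n characters in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = char_ngrams("sad day")
```

The resulting 3-grams can then play the role of terms in a BoW vector, as described above.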

LIWC-based features. Features derived from Linguistic Inquiry and Word Count (LIWC) [11,10] have been used in several studies related to psychological aspects of individuals. LIWC has been successfully used to identify depressed and non-depressed people by analyzing linguistic markers of depression such as the use of personal pronouns and positive-negative emotions [12]; the presence of words related to death (e.g., "dead", "kill", "suicide"), sex (e.g., "arouse", "makeout", "orgasm") and ingestion (e.g., "chew", "drink", "hunger"), besides emotions, has also proved useful [1]. Studies on suicidal individuals have incorporated LIWC as a valuable tool to extract information related to suicide and suicidal ideation by analyzing the categories death, health, sad, they, I, sexual, filler, swear, anger, and negative emotions [2,4]. Because we wanted to consider more meaningful features, we also considered in preliminary studies the most informative features belonging to linguistic dimensions (for example, personal pronouns and number of function words), summary language variables (for example, dictionary words and words with more than 6 letters), psychological processes (for example, negative emotions and affective processes) and grammar (verbs and [...]

[...] (TVT) method [3] is an approach for early risk detection based on using the variation of vocabulary along the different time steps as concept space for document representation. This method is the key component of our ERD system for the eRisk pilot task.

TVT is based on the concise semantic analysis (CSA) technique proposed in [6] and later extended in [7] for author profiling tasks. In this context, the underlying idea is that variations of the terms used in different sequential stages of the documents may carry relevant information for the classification task. With this idea in mind, this method enriches the documents of the minority class with the partial documents read in the first chunks. These first chunks of the minority class, along with their complete documents, are considered as a new concept space for a CSA method.

TVT naturally copes with the sequential characteristics of ERD problems and also gives a tool for dealing with unbalanced data sets. Preliminary results of this method in comparison to CSA and BoW representations [3] showed its potential to deal with ERD problems.

Learning algorithms. In preliminary studies we tested different learning algorithms and obtained the best results with Random Forest and Naïve Bayes in several comparisons with other popular methods like LIBSVM⁵. We also used in our system a decision tree (Weka's J48) obtained by first selecting the 100 words with the highest information gain and then removing from that list those words that were considered dependent on specific domains (names of politicians like "Obama" and countries like "China"). The J48 algorithm trained on this subset of features obtained a decision tree of only 39 nodes containing some interesting words like "meds", "depression", "therapist" and "cry", among others. As we will see later, this decision tree was used to assist the TVT method in the initial chunks.
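The word selection step can be illustrated with a small information gain computation for boolean word features. This is a plain Python sketch on toy data, not Weka's implementation; the function names and the four toy documents are hypothetical.

```python
import math
from typing import List, Set


def entropy(labels: List[int]) -> float:
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)


def information_gain(docs: List[Set[str]], labels: List[int], word: str) -> float:
    """H(class) - H(class | word present or absent)."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional


def top_k_words(docs: List[Set[str]], labels: List[int], k: int) -> List[str]:
    """The k words with the highest information gain."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w),
                  reverse=True)[:k]


# Toy collection: label 1 = depressed, 0 = non-depressed.
docs = [{"meds", "sleep"}, {"meds", "work"}, {"game", "work"}, {"game", "sleep"}]
labels = [1, 1, 0, 0]
best = top_k_words(docs, labels, k=2)
```

In this toy example "meds" separates the two classes perfectly and therefore reaches the maximum gain of 1 bit, while "sleep" carries no information about the class.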

Even though we carried out several comparative studies with LIWC features, character 3-grams, CSA representations and the LIBSVM algorithm, we decided to select only those approaches that seemed to be effective to assist TVT in some specific situations. For this reason, in our ERD system we only used the Random Forest and Naïve Bayes algorithms with BoW and TVT representations and the J48 decision tree previously explained.

3 Training Stage

The pilot task was divided into a training stage and a test stage. In the first one, the participating teams had access to the $TR_{DS}$ set with all chunks of all training users. Therefore, in this stage we could tune our system with the training data.

⁵ For all the tested algorithms we used the implementations provided by the Weka [...]

[...] several preliminary studies to give better results than BoW and CSA (see [3] for a detailed study), we lacked a robust criterion to solve a critical aspect in ERD problems that we will refer to as the classification time decision (CTD) issue. That is, an ERD system needs to consider not only which class should be assigned to a document; it also needs to decide when to make that assignment. In other words, ERD methods must have a criterion to decide when (in what situations) the classification generated by the system is considered the final/definitive decision on the evaluated instances. Although this aspect has been addressed with very simple heuristic rules⁶, we wanted to have a rule for the CTD issue that attempted to exploit the strengths of the TVT method and simultaneously alleviate its weaknesses. With this goal in mind, we began a thorough study to determine in which situations the TVT method behaved satisfactorily and when it needed to be improved.

We started our study by dividing the training set $TR_{DS}$ into a new training set that we will refer to as $TR_{DS}$-train and a test set named $TR_{DS}$-test. Those sets maintained the same proportions of posts per user and words per user as described in [8]. $TR_{DS}$-train and $TR_{DS}$-test were generated by randomly selecting around 70% of the writings for the first one and the remaining 30% for the second one. Thus, $TR_{DS}$-train resulted in 351 individuals (63 positive, 288 negative), while $TR_{DS}$-test contained 135 individuals (20 positive, 115 negative). The division into 10 chunks of the writings provided by the organizers was kept for both collections.

Then, we analyzed the performance of the TVT method by using the $TR_{DS}$-train set as training set and evaluating the obtained results with $TR_{DS}$-test. We started by assuming that the classification is made on a static chunk-by-chunk basis. That is, for each chunk $\hat{C}_i$ provided to the ERD systems we evaluated TVT's performance considering that the model is (simultaneously) applied to the writings received up to chunk $\hat{C}_i$. With this type of study it was possible to observe to what extent the TVT method was robust to the partial information in the different stages, at which moment it started to obtain acceptable results, and other interesting statistics. In this context, we considered that the $F_1$ measure, which combines precision and recall, would be a valuable measure to gain some insight into the performance of the TVT method along the different chunks.
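For reference, the $F_1$ measure is the harmonic mean of precision (π) and recall (ρ), $F_1 = 2\pi\rho / (\pi + \rho)$. The sketch below computes the three values from confusion counts; the counts used in the example are hypothetical and only serve to show the arithmetic.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the positive (depressed) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 15 true positives, 16 false positives, 5 false negatives.
pi, rho, f1 = precision_recall_f1(tp=15, fp=16, fn=5)
```

Because $F_1$ is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a useful single indicator for the unbalanced setting considered here.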

Figure 1 shows this type of information, where we can see that TVT representations with two different learning algorithms (Naïve Bayes (NB) and Random Forest (RF)) reached the highest $F_1$ values in chunks 3 and 4. In both cases, those values were obtained when an instance was classified as positive if the probability assigned by the classifier was greater than or equal to 0.6 (p ≥ 0.6). Besides, we could observe that in the first chunk, TVT representations produced a high number of false positives. That weakness of TVT methods is shown in Figure 2, where we can see the lowest precision values in chunk 1. That problem of TVT methods looks reasonable if we consider that TVT is based on the [...] chunks, where little information is available, TVT will need to be assisted by other complementary methods.

⁶ For instance, exceeding a specific confidence threshold in the prediction of the clas[...]

Fig. 1. $F_1$ values obtained with TVT models.

Fig. 2. Precision values obtained with TVT models.

Because we wanted to focus our predictions on those chunks where the best results were obtained (chunks 3 and 4), and assuming that after those chunks the penalization component in the $ERDE$ computation would affect TVT's results, we decided to set some chunk-by-chunk rules that satisfy certain basic properties in the different chunks:

1. Chunk 1: Here the ERD system should be extremely conservative and only classify an instance as positive (depressed) if there exists strong evidence of that. For this case, we used the criterion of only classifying an instance as positive if both models obtained with TVT-NB and TVT-RF classified the instance as positive with probability p = 1 and the text included all the words in a "white list". That list was obtained from the words with the highest information gain in the documents of the first chunk. It included the words "depression" and "diagnosed".

2. Chunk 2: Here the restriction of the "white list" could be relaxed and the TVT methods complemented with a more general approach. To this end, we used the predictions of the J48 decision tree explained above and classified an instance as positive if the three classifiers classified it in that way.

3. Chunk 3: In this chunk most of the classifiers obtained the best precision values. However, they obtained low recall values. In order to address this aspect, an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.

4. Chunk 4: [...] precision. We could also observe that when it was combined with the TVT method, this last method played the role of a filter, obtaining good results when both methods classified an instance as positive. On the other hand, many instances not detected by this combination were classified as positive by the J48 method. For this reason, any instance classified as positive by both the BoW and TVT methods with probability p ≥ 0.7, or by the J48 method, was classified as positive.

5. Chunks 5 to 10: From chunk 5 onward, we assumed that most of the relevant classifications had already been made in the previous chunks. However, we kept a monitoring system to identify those cases that needed much more additional information to determine whether an individual was depressed or not. For this purpose, we used the same rule of chunk 4 for chunks 5 and 6 but increasing the probability threshold of BoW and TVT to p ≥ 0.8. From chunks 7 to 10 an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.
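The chunk-by-chunk rules listed above can be summarized in a short sketch. Several details are not fully specified in the text, so the following are assumptions of this sketch only: the pool of voting classifiers is TVT-NB, TVT-RF and BoW; in chunk 2 a TVT model "classifies as positive" when its probability is at least 0.5; in chunks 4 to 6 the TVT condition holds if either TVT model reaches the threshold; and the white list contains only the two words named in the text. The function and variable names are hypothetical.

```python
from typing import Dict, Set

# Only these two white-list words are named in the text; the full list is unknown.
WHITE_LIST = {"depression", "diagnosed"}


def votes(probs: Dict[str, float], threshold: float) -> int:
    """Number of probabilistic classifiers reaching the threshold."""
    return sum(p >= threshold for p in probs.values())


def classify_positive(chunk: int, probs: Dict[str, float],
                      j48_positive: bool, words: Set[str]) -> bool:
    """probs maps 'TVT-NB', 'TVT-RF' and 'BoW' to positive-class probabilities."""
    if not 1 <= chunk <= 10:
        raise ValueError("chunk must be between 1 and 10")
    tvt = (probs["TVT-NB"], probs["TVT-RF"])
    if chunk == 1:
        # Both TVT models fully confident and every white-list word present.
        return all(p >= 1.0 for p in tvt) and WHITE_LIST <= words
    if chunk == 2:
        # Assumption: 0.5 is the default decision threshold of the TVT models.
        return all(p >= 0.5 for p in tvt) and j48_positive
    if chunk == 3 or chunk >= 7:
        return votes(probs, 0.9) >= 2
    # Chunks 4 to 6: BoW and TVT agree above the threshold, or J48 fires.
    threshold = 0.7 if chunk == 4 else 0.8
    tvt_ok = max(tvt) >= threshold  # assumption: either TVT model suffices
    return (probs["BoW"] >= threshold and tvt_ok) or j48_positive


probs = {"TVT-NB": 0.72, "TVT-RF": 0.10, "BoW": 0.75}
flag_chunk4 = classify_positive(4, probs, j48_positive=False, words=set())
flag_chunk5 = classify_positive(5, probs, j48_positive=False, words=set())
```

With the same probabilities, the user is flagged in chunk 4 (threshold 0.7) but not in chunk 5 (threshold 0.8), which reflects the increasingly strict monitoring from chunk 5 onward.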

With those empirically derived rules as the general criterion for the CTD issue, we then analyzed whether the TVT method combined with the other approaches in specific situations obtained a better performance than the individual methods working separately. The results obtained from this analysis are presented below.

3.1 Analysis of results

In order to study the performance of our combined approach against each method tested separately, we used the $TR_{DS}$-train and $TR_{DS}$-test sets in the experiments. Individual methods also need a criterion to deal with the CTD issue. In this case, we can directly use the probability (or some measure of confidence) assigned by the classifier to decide when to stop reading a document and give its classification. That approach, which in [8] is referred to as dynamic, only requires that this probability exceeds some particular threshold to classify the instance/individual as positive. In our studies, we also used this dynamic approach for the individual methods by considering different probability values: p = 1, p ≥ 0.9, p ≥ 0.8, p ≥ 0.7 and p ≥ 0.6.
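The dynamic approach can be sketched as a simple stopping rule over the sequence of per-chunk probabilities. The function name and the toy probabilities are hypothetical; returning `None` when the threshold is never reached (the user is then left as negative after the last chunk) is an assumption of this sketch.

```python
from typing import List, Optional


def dynamic_decision(probs_per_chunk: List[float], threshold: float) -> Optional[int]:
    """Return the 1-based chunk at which a positive decision is emitted, or None."""
    for i, p in enumerate(probs_per_chunk, start=1):
        if p >= threshold:
            return i  # stop reading: the positive decision is final
    return None


# Hypothetical positive-class probabilities after chunks 1..4 for one user.
probs = [0.20, 0.55, 0.65, 0.93]
stops = {t: dynamic_decision(probs, t) for t in (0.6, 0.9, 1.0)}
```

Lower thresholds trigger earlier decisions (here chunk 3 for 0.6 versus chunk 4 for 0.9) at the price of a higher risk of false positives.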

Tables 1, 2 and 3 report the values of precision (π), recall (ρ) and $F_1$-measure ($F_1$) of the target (depressed) class for each considered individual model and different probabilities. Statistics also include the early risk detection error (ERDE) measure proposed in [8] with the two values of the parameter $o$ used in the pilot task: $o = 5$ ($ERDE_5$) and $o = 50$ ($ERDE_{50}$).
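As a reminder of how the measure behaves, the sketch below follows our reading of the definition in [8]: a false positive costs $c_{fp}$, a false negative costs $c_{fn}$, a true negative costs nothing, and a true positive emitted after reading $k$ writings costs $lc_o(k) \cdot c_{tp}$ with $lc_o(k) = 1 - \frac{1}{1 + e^{k - o}}$. The default costs of 1 and the example value of `c_fp` are assumptions of this sketch; the reader should check [8] and [9] for the values used in the task.

```python
import math
from typing import List, Tuple


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1/(1 + exp(k - o)), written as 1/(1 + exp(o - k))."""
    return 1.0 / (1.0 + math.exp(o - k))


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Error for one user, given the decision emitted after k writings."""
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        return latency_cost(k, o) * c_tp
    return 0.0


def mean_erde(results: List[Tuple[bool, bool, int]], o: int, c_fp: float) -> float:
    """Average error over users; each result is (decision, truth, k)."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in results) / len(results)


# Hypothetical users: an early hit, a late hit, a miss and a false alarm.
users = [(True, True, 3), (True, True, 200), (False, True, 500), (True, False, 10)]
score_5 = mean_erde(users, o=5, c_fp=0.1)
score_50 = mean_erde(users, o=50, c_fp=0.1)
```

A correct positive decision taken well before $o$ writings costs almost nothing, while one taken long after $o$ writings costs almost as much as a miss, which is why $ERDE_5$ is much stricter about delay than $ERDE_{50}$.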

As we can see from Table 1, the TVT-NB model obtained the best performance when the classifier assigned instances to the positive class (depressed) with probability 1. For TVT-RF and BoW the statistics are different. In Table 2 the best $ERDE_{50}$ (and $F_1$ measure) was obtained when the Random Forest classifier used the lowest probability (p ≥ 0.6), but at the same time this probability gives the worst $ERDE_5$ error. The BoW model (Table 3) obtained the best $ERDE_{50}$ value considering p ≥ 0.8, and with this configuration the best $F_1$, π and ρ are obtained. The J48 model assigns the instances to the positive class using probability 1, so its performance is directly shown in the comparison in Table 4.

Table 1. Performance of the TVT-NB model ($TR_{DS}$-test set).

               p = 1    p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
$ERDE_5$       14.13    16.66     16.68     16.82     16.85
$ERDE_{50}$    11.25    15.34     15.21     15.24     15.07
$F_1$          0.40     0.27      0.28      0.27      0.28
π              0.47     0.50      0.48      0.43      0.44
ρ              0.35     0.18      0.19      0.19      0.20

Table 2. Performance of the TVT-RF model ($TR_{DS}$-test set).

               p = 1    p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
$ERDE_5$       16.92    16.85     16.92     16.96     17.14
$ERDE_{50}$    16.43    16.05     16.12     16.02     15.79
$F_1$          0.11     0.15      0.16      0.21      0.22
π              0.50     0.54      0.50      0.50      0.43
ρ              0.06     0.08      0.10      0.13      0.14

Table 3. Performance of the BoW model ($TR_{DS}$-test set).

               p = 1    p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
$ERDE_5$       20.79    20.83     21.05     21.16     21.27
$ERDE_{50}$    18.82    18.66     18.13     18.24     18.35
$F_1$          0.19     0.21      0.24      0.24      0.23
π              0.11     0.13      0.14      0.14      0.14
ρ              0.5      0.65      0.75      0.75      0.75

To choose a particular probability to compare the performance of the models, we selected the one for which the model obtained the best ERDE considering $o = 50$ ($ERDE_{50}$). We decided to use this metric because we knew in advance that the results in the pilot task would be evaluated with ERDE but we did not know the specific $o$ value. In this context, we considered that $ERDE_{50}$ would be more important/informative than the $ERDE_5$ error.
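This selection criterion amounts to taking, for each model, the probability setting with the minimum $ERDE_{50}$. The short sketch below applies it to the $ERDE_{50}$ rows of Tables 1, 2 and 3; the dictionary layout and variable names are only illustrative.

```python
# ERDE_50 values copied from Tables 1, 2 and 3 (TR_DS-test set).
erde50 = {
    "TVT-NB": {"p = 1": 11.25, "p >= 0.9": 15.34, "p >= 0.8": 15.21,
               "p >= 0.7": 15.24, "p >= 0.6": 15.07},
    "TVT-RF": {"p = 1": 16.43, "p >= 0.9": 16.05, "p >= 0.8": 16.12,
               "p >= 0.7": 16.02, "p >= 0.6": 15.79},
    "BoW": {"p = 1": 18.82, "p >= 0.9": 18.66, "p >= 0.8": 18.13,
            "p >= 0.7": 18.24, "p >= 0.6": 18.35},
}

# For each model keep the setting with the lowest (best) ERDE_50.
best_setting = {model: min(scores, key=scores.get) for model, scores in erde50.items()}
```

The selected settings are p = 1 for TVT-NB, p ≥ 0.6 for TVT-RF and p ≥ 0.8 for BoW.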

Table 4 shows the comparison of performance of all models. As we can observe [...] except for the $ERDE_5$ error, for which the TVT-NB model obtained the best value. It is interesting to note that, although TVT-NB obtained a good $ERDE_5$ value, in the other metrics it was slightly worse than the combined methods. These results confirm the adequacy of the combined approach for the ERD task over the other individual methods.

Table 4. Performance comparison of all models ($TR_{DS}$-test set).

                    $ERDE_5$   $ERDE_{50}$   $F_1$   π      ρ
Combined methods    15.36      9.16          0.59    0.48   0.75
J48                 17.01      12.94         0.4     0.33   0.5
BoW (p ≥ 0.8)       21.05      18.13         0.24    0.14   0.75
TVT-NB (p = 1)      14.13      11.25         0.4     0.47   0.35
TVT-RF (p ≥ 0.6)    17.14      15.79         0.22    0.43   0.14

4 Test Stage

The previous results were obtained by training the classifiers with the $TR_{DS}$-train data set and testing them with the $TR_{DS}$-test data set. In this section, we show the results obtained by the combined methods trained with the full training set of the pilot task ($TR_{DS}$) and tested with $TE_{DS}$, which was incrementally released during the testing phase of the pilot task.

In Table 5 we show the three submissions that obtained the best $ERDE_5$, $ERDE_{50}$ and $F_1$ values in the eRisk pilot task as reported in [9]. There, we can observe that our combined methods (UNSLA) obtained the best $ERDE_{50}$ value. On the other hand, FHDO-BCSGA obtained the best F-measure with an $ERDE_{50}$ value slightly worse than that of our approach. FHDO-BCSGB obtained the best $ERDE_5$ and the worst $F_1$ of the three approaches. With these results, we can conclude that our proposal is a competitive approach for ERD tasks.

Table 5. Best in the ranking of the pilot task ($TE_{DS}$ set).

                            $ERDE_5$   $ERDE_{50}$   $F_1$   π      ρ
Combined methods (UNSLA)    13.66      9.68          0.59    0.48   0.79
FHDO-BCSGA                  12.82      9.69          0.64    0.61   0.67
FHDO-BCSGB                  12.70      10.39         0.55    0.69   0.46


Our ERD system tested on the pilot task was derived from our analysis of the weaknesses and strengths of the TVT method on the training data. However, once the golden truth information of the $TE_{DS}$ set was made available by the organizers, we could analyze what the performance of the models would have been, in particular the TVT method working alone, if different probabilities had been selected.

Table 6 shows this type of information by reporting the results obtained with the TVT representation using Naïve Bayes and Random Forest as learning algorithms. Besides, different probability values were tested for the dynamic approaches to the CTD aspect. The obtained results are conclusive in this case. TVT shows high robustness in the $ERDE$ measures, independently of the algorithm used to learn the model and of the probability threshold. Most of TVT's $ERDE_5$ values were low and, in 7 out of 10 settings, the $ERDE_{50}$ values were lower than the best one reported in the pilot task (the combined methods (UNSLA) with 9.68). In this context, TVT achieves the best $ERDE_5$ value reported up to now (12.30) with the setting TVT-RF (p ≥ 0.8) and the lowest $ERDE_{50}$ value (8.17) with the model TVT-NB (p ≥ 0.8). The performance of the TVT methods on the test set was surprising for us because in the training stage we could not obtain such good results. This leads us to conclude that if the TVT method had participated alone in the pilot task, it would have obtained similar or better results than the ones obtained with the combined methods.

Table 6. TVT's performance on the $TE_{DS}$ set.

                    $ERDE_5$   $ERDE_{50}$   $F_1$   π      ρ
TVT-NB (p ≥ 0.6)    13.59      8.40          0.50    0.37   0.75
TVT-NB (p ≥ 0.7)    13.43      8.24          0.51    0.39   0.75
TVT-NB (p ≥ 0.8)    13.13      8.17          0.54    0.42   0.73
TVT-NB (p ≥ 0.9)    13.07      8.35          0.52    0.42   0.69
TVT-NB (p = 1)      12.38      9.84          0.42    0.50   0.37
TVT-RF (p ≥ 0.6)    12.46      8.37          0.55    0.49   0.63
TVT-RF (p ≥ 0.7)    12.49      8.52          0.55    0.50   0.62
TVT-RF (p ≥ 0.8)    12.30      8.95          0.56    0.54   0.58
TVT-RF (p ≥ 0.9)    12.34      10.28         0.47    0.55   0.40
TVT-RF (p = 1)      12.82      11.82         0.20    0.67   0.12
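The counts quoted for Table 6 can be checked directly from its $ERDE_5$ and $ERDE_{50}$ columns. The sketch below is only a verification aid; the variable names are illustrative.

```python
# (ERDE_5, ERDE_50) pairs copied from Table 6 (TE_DS set).
table6 = {
    "TVT-NB (p >= 0.6)": (13.59, 8.40),
    "TVT-NB (p >= 0.7)": (13.43, 8.24),
    "TVT-NB (p >= 0.8)": (13.13, 8.17),
    "TVT-NB (p >= 0.9)": (13.07, 8.35),
    "TVT-NB (p = 1)": (12.38, 9.84),
    "TVT-RF (p >= 0.6)": (12.46, 8.37),
    "TVT-RF (p >= 0.7)": (12.49, 8.52),
    "TVT-RF (p >= 0.8)": (12.30, 8.95),
    "TVT-RF (p >= 0.9)": (12.34, 10.28),
    "TVT-RF (p = 1)": (12.82, 11.82),
}
BEST_PILOT_ERDE50 = 9.68  # combined methods (UNSLA), Table 5

better = [name for name, (_, e50) in table6.items() if e50 < BEST_PILOT_ERDE50]
best_erde5 = min(table6, key=lambda name: table6[name][0])
best_erde50 = min(table6, key=lambda name: table6[name][1])
```

Seven of the ten settings fall below 9.68, with TVT-RF (p ≥ 0.8) giving the lowest $ERDE_5$ and TVT-NB (p ≥ 0.8) the lowest $ERDE_{50}$.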

6 Conclusions and future work

This article presents the participation of LIDIC - UNSL at the eRisk 2017 pilot task on Early Detection of Depression. We proposed an interesting representation [...] of the documents. Because the experiments in the training stage showed some weaknesses of TVT, we proposed a combined approach to assist TVT in some specific situations. The results obtained on the pilot task with our combined methods were good, obtaining the best $ERDE_{50}$ value over all participants. An interesting aspect was that the TVT representation alone obtained better results on the test set than the ones achieved by the combined approach.

As future work we plan to extend the analysis of the TVT representation to solve other ERD problems such as the identification of sexual predators and people with suicidal tendencies.

References

1. M. R. Capecelatro, M. D. Sacchet, P. F. Hitchcock, S. M. Miller, and W. B. Britton. Major depression duration reduces appetitive word use: An elaborated verbal recall of emotional photographs. Journal of Psychiatric Research, 47(6):809-815, 2013.

2. M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 2098-2110, New York, NY, USA, 2016. ACM.

3. M. L. Errecalde, M. P. Villegas, D. G. Funez, M. J. Garciarena Ucelay, and L. C. Cagnina. Temporal Variation of Terms as concept space for early risk prediction. In G. J. F. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 10456. Springer, 2017.

4. A. Garcia-Caballero, J. Jiménez, M. Fernandez-Cabana, and I. García-Lado. Last words: an LIWC analysis of suicide notes from Spain. European Psych., 27:1, 2012.

5. M. Koppel, J. Schler, and S. Argamon. Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1):9-26, 2009.

6. Z. Li, Z. Xiong, Y. Zhang, C. Liu, and K. Li. Fast text categorization using concise semantic analysis. Pattern Recogn. Lett., 32(3):441-448, February 2011.

7. A. P. López-Monroy, M. Montes y Gómez, H. J. Escalante, L. Villaseñor-Pineda, and Efstathios Stamatatos. Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89:134-147, 2015.

8. D. E. Losada and F. Crestani. A Test Collection for Research on Depression and Language Use, pages 28-39. Springer International Publishing, Cham, 2016.

9. D. E. Losada, F. Crestani, and J. Parapar. eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations. In Proceedings Conference and Labs of the Evaluation Forum CLEF 2017, Dublin, Ireland, 2017.

10. J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC2015. 2015.

11. J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547-577, 2003.

12. N. Ramirez-Esparza, C. K. Chung, E. Kacewicz, and J. W. Pennebaker. The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. In Proc. ICWSM 2008, 2008.