Editorial: Special issue on natural language processing and text analytics in industry


Recent years have witnessed an unprecedented growth in the volume of unstructured data expressed in natural language. This explosion in the volume of text can to a large extent be attributed to the emergence of Web 2.0 platforms and online communities, including blogs, forums, review sites, and social networks. Hidden within these novel channels is valuable information, which, if properly identified and extracted, can be employed to enhance a plethora of corporate activities. For example, product reviews written by consumers on blogs or retailer websites contain useful information about brand perception and product quality.

The scientific discipline of Natural Language Processing (or Computational Linguistics) and the applied field of Text Analytics, in which Natural Language Processing coalesces with Machine Learning and Data Mining, have also evolved over the years to keep pace with the proliferation of novel sources of text data. Natural Language Processing (NLP) and Text Analytics (TA) algorithms and techniques are increasingly being developed, adopted and deployed for addressing a wide spectrum of real-life, industrial problems. Typical examples include document classification, document clustering, topic detection and modeling, and opinion mining/sentiment analysis. However, at the same time, the successful application of such innovative technologies raises a number of pertinent research questions, with both theoretical and practical ramifications.

The main objective of this Special Issue (SI) was to collect and consolidate high-quality knowledge on state-of-the-art research in NLP/TA methods and their applications in industrial (enterprise) activities. Approximately 70 abstracts were received following the SI call. This relatively large number is a testament to the growing importance of the field. The abstracts were evaluated for their pertinence based on several criteria, including a sound scientific contribution and a clear industrial application. Around 30 abstracts were selected as part of this evaluation. The respective authors were requested to submit their full manuscripts, which were reviewed by at least 2 reviewers according to the journal's guidelines. Finally, 7 articles were selected for publication (after another review round) in this SI. In addition, the SI also includes an article from the guest editors, which discusses the trends and challenges of text analytics applications in industry. An overview of these articles is provided below.

Analyzing and evaluating the task of automatic tweet generation: Knowledge to business (Lloret and Palomar): This article presents an approach based on text summarization techniques to generate tweets. Furthermore, statistical measures are employed to estimate the informativeness and interestingness of tweets. The authors show that automatically generated tweets can be as informative and interesting as manually generated ones for the English language. Results are more mixed for Spanish.

A distributional approach to open questions in market research (Evert et al.): The authors present the Klugator Engine (TKE), a system for analyzing survey responses expressed in English and German. In essence, their analysis involves determining the sentiments of responses and clustering them according to their topical similarity, which are then displayed as nodes on a semantic map. It is worth noting that this system is already incorporated in a commercial solution.

Integrating a semantic-based retrieval agent into case-based reasoning systems: A case study of an online bookstore (Chang et al.): Techniques for short-text semantic similarity (STSS) and recognizing textual entailment (RTE) are integrated in a case-based reasoning system. Experimental evaluations revealed that the proposed system outperformed existing ones. Furthermore, in a case study, the accuracy of the proposed system in responding to users' requests (queries, questions) was also compared to that of other similar solutions found in typical e-commerce websites. The case study showed that the proposed system was more effective and provided more accurate responses to address the information needs of users.

Turning user generated health-related content into actionable knowledge through text analytics services (Martinez et al.): A system for monitoring health-related information for pharmacovigilance is presented. The proposed system integrates several text analytics solutions, such as GATE and MeaningCloud, for extracting pertinent information pertaining to adverse drug reactions from social media sources. Information extraction is mainly performed using named entity recognition and semantic relation extraction from Spanish texts. These tasks are performed by leveraging several domain-specific dictionaries and ontologies.

A methodology for traffic-related Twitter messages interpretation (Albuquerque et al.): Tweets are a valuable source of real-time traffic information. However, an important challenge lies in the analysis and interpretation of tweets, especially for traffic monitoring, which requires very precise information (e.g. exact location, number of vehicles involved in an accident). This challenge is addressed in this article, which presents a system that analyzes and interprets tweets, expressed in Portuguese, by transforming them into RDF triples based on a domain-specific ontology. The application has been deployed to monitor truck fleets operated by a liquid gas and a fuel distribution company.

Computers in Industry 78 (2016) 1–2
journal homepage: www.elsevier.com/locate/compind
http://dx.doi.org/10.1016/j.compind.2016.01.001

Text classification based filters for a domain-specific search engine (Schmidt et al.): Domain-specific search engines often rely on filters to narrow down their search results. This article discusses how text classification using Support Vector Machines (SVM) can be used to predict filters. In addition, it also shows how active learning can be applied to enhance the effectiveness of the classification task. The proposed system has been employed in a commercial German search engine for the domain of job offers and vacancies.

Natural language processing for aviation safety reports: From classification to interactive analysis (Tanguy et al.): This article presents a system for classifying aviation safety reports into incident categories. Classification is performed using SVM, and the performance is enhanced by adopting an active learning procedure. In addition to classification, the system also performs topic detection from the reports, using an approach based on Gibbs sampling. However, the results for topic detection were mixed, with much better performance achieved in the text classification task. Furthermore, the proposed system includes an information retrieval application, enabling users to search for reports describing similar incidents. The results (i.e. reports describing similar incidents) are displayed along a temporal axis on a 2-D scatter plot for easier visualization.

Text analytics in industry: Challenges, desiderata and trends (Ittoo et al.): The last article in this SI is a review article by the guest editors. The current state of the art in text analytics applications in industry is systematically reviewed and discussed along several dimensions, such as the application contexts, the techniques utilized and the evaluation procedure. From this review, the authors identify the challenges and constraints that real-world environments impose on text analytics applications, and subsequently identify a set of desiderata that text analytics applications should possess for their successful deployment in industry. Finally, the article discusses future trends in text analytics and their potential application in industry, including the revival in neural-network-based methods for deep learning, word embeddings and scene (image, video sequence) labeling.

Ashwin Ittoo has been an Assistant Professor in Information Systems at the HEC Management School, University of Liège, Belgium, since 2013. His research interests are minimally-supervised learning techniques for Machine Learning and Natural Language Processing and the application of these techniques for measuring socio-economic indicators. He received his Ph.D. degree from the University of Groningen, The Netherlands in 2012 and his Bachelors and Masters from the National University of Singapore and the Nanyang Technological University, Singapore.

Le Minh Nguyen is currently an Associate Professor at the School of Information Science, JAIST. He leads the lab on Machine Learning and Natural Language Understanding at JAIST. He received his B.Sc. degree in information technology from Hanoi University of Science, and M.Sc. degree in information technology from Vietnam National University, Hanoi in 1998 and 2001, respectively. He received his Ph.D. degree in information science from the School of Information Science, Japan Advanced Institute of Science and Technology (JAIST) in 2004. He was an assistant professor at the School of Information Science, JAIST from 2008 to 2013. His research interests include machine learning, text summarization, machine translation, natural language processing, and information retrieval.

Antal van den Bosch (PhD 1997, Universiteit Maastricht) is professor of language and speech technology at the Centre for Language Studies at Radboud University, Nijmegen, the Netherlands. His research interests include memory-based and exemplar-based natural language modeling, text analytics applied to historical texts and social media, and proofing tools. He is a member of the Netherlands Royal Academy of Arts and Sciences and an ECCAI Fellow.

Ashwin Ittoo a,*
Le Minh Nguyen b
Antal van den Bosch c

a HEC Management School, University of Liège, Liège, Belgium
b School of Information Science, Japan Advanced Institute of Science and Technology, Japan - Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
c Centre for Language Studies, Faculty of Arts, Radboud University Nijmegen, Nijmegen, The Netherlands

* Corresponding author
E-mail addresses: [email protected] (A. Ittoo), [email protected] (L.M. Nguyen), [email protected] (A. van den Bosch).

Available online 12 February 2016
