Editorial: Special issue on natural language processing and text analytics in industry

The recent years have witnessed an unprecedented growth in
the volume of unstructured data expressed in natural language.
This explosion in the volume of text can to a large extent be
attributed to the emergence of Web 2.0 platforms and online
communities, including blogs, forums, review sites, and social
networks. Hidden within these novel channels is valuable
information, which, if properly identified and extracted, can be
employed to enhance a plethora of corporate activities. For
example, product reviews written by consumers on blogs or
retailer websites contain useful information about brand
perception and product quality.

The scientific discipline of Natural Language Processing (or
Computational Linguistics) and the applied field of Text Analytics,
in which Natural Language Processing coalesces with Machine
Learning and Data Mining, have also evolved over the years to keep
pace with the proliferation of novel sources of text data. Natural
Language Processing (NLP) and Text Analytics (TA) algorithms and
techniques are increasingly being developed, adopted and
deployed for addressing a wide spectrum of real-life, industrial
problems. Typical examples include document classification,
document clustering, topic detection and modeling, and opinion
mining/sentiment analysis. However, at the same time, the
successful application of such innovative technologies raises a
number of pertinent research questions, with both theoretical and
practical ramifications.

The main objective of this Special Issue (SI) was to collect and
consolidate high-quality knowledge on state-of-the-art research in
NLP/TA methods and their applications in industrial (enterprise)
activities. Approximately 70 abstracts were received following the
SI call. This relatively large number testifies to the growing
importance of the field. The abstracts were evaluated for their
pertinence based on several criteria, including a sound scientific
contribution and a clear industrial application. Around 30 abstracts
were selected as part of this evaluation. The respective authors were
requested to submit their full manuscripts, which were reviewed
by at least 2 reviewers according to the journal’s guidelines.
Finally, 7 articles were selected for publication (after another
review round) in this SI. In addition, the SI also includes an article
from the guest editors, which discusses the trends and challenges
of text analytics applications in industry. An overview of these
articles is provided below.

Analyzing and evaluating the task of automatic tweet generation:
Knowledge to business (Lloret and Palomar): This article presents
an approach based on text summarization techniques to generate
tweets. Furthermore, statistical measures are employed to
estimate the informativeness and interestingness of tweets. The
authors show that automatically generated tweets can be as
informative and interesting as manually generated ones for the
English language. Results are more mixed for Spanish.

A distributional approach to open questions in market research
(Evert et al.): The authors present the Klugator Engine (TKE), a
system for analyzing survey responses expressed in English and
German. In essence, their analysis involves determining the
sentiments of responses and clustering them according to their
topical similarity; the results are then displayed as nodes on a
semantic map. It is worth noting that this system is already
incorporated in a commercial solution.

Integrating a semantic-based retrieval agent into case-based
reasoning systems: A case study of an online bookstore (Chang et al.):
Techniques for short-text semantic similarity (STSS) and
recognizing textual entailment (RTE) are integrated in a case-based
reasoning system. Experimental evaluations revealed that the
proposed system outperformed existing ones. Furthermore, in a
case-study, the accuracy of the proposed system in responding to
users’ requests (queries, questions) was also compared to that of
other similar solutions found in typical e-commerce websites. The
case-study showed that the proposed system was more effective
and provided more accurate responses to address the information
needs of users.

Turning user generated health-related content into actionable
knowledge through text analytics services (Martinez et al.): A
system for monitoring health-related information for
pharmacovigilance is presented. The proposed system integrates several
text analytics solutions, such as GATE and MeaningCloud, for
extracting information pertaining to adverse drug
reactions from social media sources. Information extraction is
mainly performed using named entity recognition and semantic
relation extraction from Spanish texts. These tasks are performed
by leveraging several domain-specific dictionaries and
ontologies.

Computers in Industry 78 (2016) 1–2
Journal homepage: www.elsevier.com/locate/compind
http://dx.doi.org/10.1016/j.compind.2016.01.001

A methodology for traffic-related Twitter messages interpretation
(Albuquerque et al.): Tweets are a valuable source of real-time
traffic information. However, an important challenge lies in the
analysis and interpretation of tweets, especially for traffic
monitoring, which requires very precise information (e.g., exact
location, number of vehicles involved in an accident). This challenge
is addressed in this article, which presents a system that analyzes
and interprets tweets, expressed in Portuguese, by transforming
them into RDF triples based on a domain-specific ontology. The
application has been deployed to monitor truck fleets operated by
a liquid gas and a fuel distribution company.
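
To make the idea of turning a tweet into RDF triples more concrete, the following is a minimal sketch and not the system described in the article. The namespace, the property names (`onRoad`, `atKilometre`, `vehiclesInvolved`), the regular expressions, and the English example tweet are all assumptions invented for illustration; the actual system works on Portuguese text and uses its own domain-specific ontology.

```python
import re

# Hypothetical namespace and vocabulary, invented for this illustration only.
NS = "http://example.org/traffic#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def tweet_to_triples(tweet_id, text):
    """Map a traffic tweet to a list of (subject, predicate, object) triples."""
    subject = f"<{NS}event/{tweet_id}>"
    triples = []
    # Event type: only accidents are recognized in this sketch.
    if re.search(r"\baccident\b", text, re.IGNORECASE):
        triples.append((subject, RDF_TYPE, f"<{NS}Accident>"))
    # Road identifier such as "BR-101" (assumed pattern).
    road = re.search(r"\b([A-Z]{2}-\d{3})\b", text)
    if road:
        triples.append((subject, f"<{NS}onRoad>", f'"{road.group(1)}"'))
    # Kilometre marker such as "km 45".
    km = re.search(r"\bkm\s*(\d+)", text, re.IGNORECASE)
    if km:
        triples.append((subject, f"<{NS}atKilometre>", f'"{km.group(1)}"'))
    # Number of vehicles such as "3 vehicles".
    vehicles = re.search(r"\b(\d+)\s+vehicles?\b", text, re.IGNORECASE)
    if vehicles:
        triples.append(
            (subject, f"<{NS}vehiclesInvolved>", f'"{vehicles.group(1)}"')
        )
    return triples


def to_ntriples(triples):
    """Serialize triples one per line in N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example = "Accident on BR-101 at km 45, 3 vehicles involved"
triples = tweet_to_triples("t1", example)
print(to_ntriples(triples))
```

Once the details of a tweet are held as triples, they can be queried like any other structured data, which is what makes precise items such as location or vehicle count usable for monitoring.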

Text classification based filters for a domain-specific search engine
(Schmidt et al.): Domain-specific search engines often rely on
filters to narrow down their search results. This article discusses
how text classification using Support Vector Machines (SVM) can
be used to predict filters. In addition, it also shows how active
learning can be applied to enhance the effectiveness of the
classification task. The proposed system has been employed in a
commercial German search engine for the domain of job offers and
vacancies.
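
The editorial does not say which active learning strategy the article uses. As an illustration only, the sketch below assumes margin-based uncertainty sampling, a common choice with SVMs: the unlabeled documents whose decision scores lie closest to the separating hyperplane are sent to a human annotator first. The function name and the scores are hypothetical.

```python
def select_most_uncertain(decision_scores, k):
    """Return the indices of the k items closest to the decision boundary.

    decision_scores are assumed to be signed distances to the hyperplane,
    as a linear SVM would produce for unlabeled documents.
    """
    ranked = sorted(
        range(len(decision_scores)),
        key=lambda i: abs(decision_scores[i]),
    )
    return ranked[:k]


# Hypothetical scores for five unlabeled documents.
scores = [2.1, -0.05, 0.8, -1.7, 0.3]
to_label = select_most_uncertain(scores, 2)
print(to_label)
```

In a full loop, the selected documents would be labeled, added to the training set, and the classifier retrained before the next selection round.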

Natural language processing for aviation safety reports:
From classification to interactive analysis (Tanguy et al.):
This article presents a system for classifying aviation safety
reports into incident categories. Classification is performed
using SVM, and the performance is enhanced by adopting an
active learning procedure. In addition to classification, the
system also performs topic detection from the reports, using an
approach based on Gibbs sampling. However, the results for
topic detection were mixed, with much better performance
achieved in the text classification task. Furthermore, the
proposed system includes an information retrieval application,
enabling users to search for reports describing similar incidents.
The results (i.e. reports describing similar incidents) are
displayed along a temporal axis on a 2-D scatter plot for easier
visualization.

Text analytics in industry: Challenges, desiderata and trends
(Ittoo et al.): The last article in this SI is a review article by the
guest editors. The current state of the art in text analytics
applications in industry is systematically reviewed and discussed
along several dimensions, such as the application contexts, the
techniques utilized and the evaluation procedure. From this
review, the authors identify the challenges and constraints
that real-world environments impose on text analytics
applications, and subsequently identify a set of desiderata that text
analytics applications should possess for their successful
deployment in industry. Finally, the article discusses future
trends in text analytics and their potential application in
industry, including the revival of neural-network-based
methods for deep learning, word embeddings and scene (image,
video sequence) labeling.

Since 2013, Ashwin Ittoo has been an Assistant Professor in
Information Systems at the HEC Management School,
University of Liège, Belgium. His research interests are
minimally-supervised learning techniques for Machine
Learning and Natural Language Processing and the
application of these techniques for measuring
socio-economic indicators. He received his Ph.D. degree from
the University of Groningen, The Netherlands in
2012 and his Bachelors and Masters from the National
University of Singapore and the Nanyang Technological
University, Singapore.

Le Minh Nguyen is currently an Associate Professor at the
School of Information Science, JAIST. He leads the lab on
Machine Learning and Natural Language Understanding
at JAIST. He received his B.Sc. degree in information
technology from Hanoi University of Science, and M.Sc.
degree in information technology from Vietnam
National University, Hanoi in 1998 and 2001,
respectively. He received his Ph.D. degree in information
science from the School of Information Science, Japan
Advanced Institute of Science and Technology (JAIST)
in 2004. He was an assistant professor at the School of
Information Science, JAIST from 2008 to 2013. His
research interests include machine learning, text
summarization, machine translation, natural language
processing, and information retrieval.

Antal van den Bosch (PhD 1997, Universiteit
Maastricht) is professor of language and speech technology
at the Centre for Language Studies at Radboud
University, Nijmegen, the Netherlands. His research
interests include memory-based and exemplar-based
natural language modeling, text analytics applied to
historical texts and social media, and proofing tools. He
is a member of the Netherlands Royal Academy of Arts
and Sciences and an ECCAI Fellow.

Ashwin Ittoo a,*
Le Minh Nguyen b
Antal van den Bosch c
a HEC Management School, University of Liège, Liège, Belgium
b School of Information Science, Japan Advanced Institute of Science
and Technology, Japan; Division of Data Science, Ton Duc Thang
University, Ho Chi Minh City, Vietnam
c Centre for Language Studies, Faculty of Arts, Radboud University
Nijmegen, Nijmegen, The Netherlands
* Corresponding author
E-mail addresses: ashwin.ittoo@ulg.ac.be (A. Ittoo),
nguyenml@jaist.ac.jp (L.M. Nguyen),
a.vandenbosch@let.ru.nl (A. van den Bosch).
Available online 12 February 2016