• Aucun résultat trouvé

Editorial: Special issue on natural language processing and text analytics in industry

N/A
N/A
Protected

Academic year: 2021

Partager "Editorial: Special issue on natural language processing and text analytics in industry"

Copied!
2
0
0

Texte intégral

(1)

Editorial:

Special

issue

on

natural

language

processing

and

text

analytics

in

industry

Therecentyearshavewitnessedanunprecedentedgrowthin

thevolumeofunstructureddataexpressedinnatural language.

This explosion in thevolume of text can to a large extent be

attributed to the emergence of Web 2.0 platforms and online

communities, including blogs, forums, review sites, and social

networks. Hidden within these novel channels is valuable

information,which, ifproperly identifiedand extracted,can be

employed to enhance a plethora of corporate activities. For

example, product reviews written by consumers on blogs or

retailerwebsitescontainusefulinformationaboutbrand

percep-tionandproductquality.

The scientific discipline of Natural Language Processing (or

ComputationalLinguistics)andtheappliedfieldofTextAnalytics,

in which Natural Language Processing coalesces with Machine

LearningandDataMining,havealsoevolvedovertheyearstokeep

pacewiththeproliferationofnovelsourcesoftextdata.Natural

LanguageProcessing(NLP)andTextAnalytics(TA)algorithmsand

techniques are increasingly being developed, adopted and

deployedforaddressing awide spectrumof real-life,industrial

problems. Typical examples include document classification,

documentclustering,topicdetectionandmodeling,andopinion

mining/sentiment analysis. However, at the same time, the

successful application of such innovative technologies raises a

numberofpertinentresearchquestions,withboththeoreticaland

practicalramifications.

ThemainobjectiveofthisSpecialIssue(SI)wastocollectand

consolidatehigh-qualityknowledgeonstateoftheartresearchin

NLP/TAmethodsandtheirapplicationsinindustrial(enterprise)

activities.Approximately70abstractswerereceivedfollowingthe

SIcall.Thisrelativelyhugenumberistestimonialtothegrowing

importanceofthefield.Theywereevaluatedfortheirpertinence

basedonseveralcriteria,includingasoundscientificcontribution

and a clear industrial application. Around 30 abstracts were

selectedaspartofthis evaluation.Therespective authorswere

requestedtosubmittheirfullmanuscripts,whichwerereviewed

by at least 2 reviewers according to the journal’s guidelines.

Finally, 7 articles were selected for publication (after another

reviewround)inthisSI.Inaddition,theSIalsoincludesanarticle

fromtheguest-editors,whichdiscussesthetrendsandchallenges

oftext analyticsapplicationsin industry. Anoverviewof these

articlesisprovidedbelow.

Analyzingandevaluatingthetaskofautomatictweetgeneration:

Knowledgetobusiness(LloretandPalomar):Thisarticlepresents

anapproachbasedontextsummarizationtechniquestogenerate

tweets. Furthermore, statistical measures are employed to

estimatetheinformativenessandinterestingnessoftweets.They

showthatautomaticallygeneratedtweetscanbeasinformative

and interesting as manually generated ones for the English

language.ResultsaremoremitigatedforSpanish.

Adistributionalapproachtoopenquestions inmarketresearch

(Evert et al.): The authorspresent theKlugator Engine (TKE),a

systemforanalyzingsurveyresponsesexpressedinEnglishand

German. In essence, their analysis involves determining the

sentimentsof responsesand clusteringthemaccording totheir

topicalsimilarity,whicharethendisplayedasnodesonasemantic

map.Itisworthnotingthatthissystemisalreadyincorporatedina

commercialsolution.

Integrating a semantic-based retrieval agent into case-based

reasoningsystems:Acasestudyofanonlinebookstore(Changetal.):

Techniquesforshort-textsemanticsimilarity(STSS)and

recogniz-ing textual entailment (RTE) are integrated in a case-based

reasoning system. Experimental evaluations revealed that the

proposedsystemoutperformedexistingones. Furthermore,ina

case-study,theaccuracyoftheproposedsysteminrespondingto

users’requests(queries,questions)wasalsocomparedtothatof

othersimilarsolutionsfoundintypicale-commercewebsites.The

case-studyshowedthattheproposedsystemwasmoreeffective

andprovidedmoreaccurateresponsestoaddresstheinformation

needsofusers.

Turning user generated health-related content into actionable

knowledge through text analytics services (Martinez et al.): A

systemformonitoringhealth-relatedinformationfor

pharmaco-vigilance ispresented. The proposedsystem integratesseveral

text analytics solution, such as GATE and MeaningCloud, for

extracting pertinent information pertaining to adverse drug

reactionsfrom social media sources. Informationextractionis

mainlyperformedusingnamedentityrecognitionandsemantic

relationextractionfromSpanishtexts.Thesetasksareperformed

by leveraging upon several domain-specific dictionaries and

ontologies.

Amethodologyfortraffic-relatedtwittermessagesinterpretation

(Albuquerque et al.): Tweets are a valuable sourceof real-time

ComputersinIndustry78(2016)1–2

ContentslistsavailableatScienceDirect

Computers

in

Industry

j ou rna l h ome p a ge : w ww . e l se v i e r. co m/ l oc a te / c om pi nd

http://dx.doi.org/10.1016/j.compind.2016.01.001

(2)

trafficinformation.However, animportantchallengeliesin the

analysis and interpretation of tweets, especially for traffic

monitoringwhichrequireverypreciseinformation(fore.g.exact

location,numberofvehiclesinvolvedinaccident).Thechallengeis

addressedinthis article,whichpresentsasystemthatanalyzes

andinterpretstweets,expressedinPortuguese,bytransforming

themintoRDFtriplesbasedona domain-specificontology.The

applicationhasbeendeployedtomonitortruckfleetsoperatedby

aliquidgasandafueldistributioncompany.

Textclassificationbasedfiltersforadomain-specificsearchengine

(Schmidt et al.): Domain-specific search engines often on rely

filterstonarrowdowntheirsearchresults.Thisarticlediscusses

howtextclassificationusingSupportVectorMachines(SVM)can

beusedtopredictfilters.Inaddition,italsoshowshowactive

learning can be applied to enhance the effectiveness of the

classificationtask.Theproposedsystemhasbeenemployedina

commercialGermansearchengineforthedomainofjoboffersand

vacancies.

Natural language processing for aviation safety reports:

From classification to interactive analysis (Tanguy et al.):

This article presents a system for classifying aviation safety

reports into incident categories. Classification is performed

usingSVM, and the performance is enhanced by adopting an

active learning procedure. In addition to classification, the

systemalsoperformstopicdetectionfromthereports,usingan

approach based on Gibbs sampling. However, the results for

topicdetectionweremitigated,withmuchbetterperformance

achieved in the text classification task. Furthermore, the

proposedsystemincludesaninformationretrievalapplication,

enablinguserstosearchforreportsdescribingsimilarincidents.

The results (i.e. reports describing similar incidents) are

displayedalongatemporalaxisona2-Dscatter-plotforeasier

visualization.

Text analytics in industry: Challenges, desiderata and trends

(Ittooetal.):ThelastarticleinthisSIisareviewarticlebythe

guest-editors. The current state of the art in text analytics

applicationsindustryis systematicallyreviewedanddiscussed

alongseveraldimensions,suchastheapplicationcontexts,the

techniques utilized and the evaluation procedure. From this

review, the authors identify the challenges and constraints

thatreal-worldenvironmentsimposeontextanalytics

applica-tions, andsubsequently, identify a set of desiderata that text

analytics applications should possess for their successful

deployment in industry. Finally, the article discusses future

trends in text analytics and their potential application in

industry,including therevivalin neural-network-based

meth-ods for deep learning, word-embeddings and scene (image,

videosequence)labeling.

Since 2013, Ashwin Ittoo is an Asst-Professor in

InformationSystemsattheHECManagementSchool,

UniversityofLie´ge,Belgium.Hisresearchinterestsare

minimally-supervisedlearningtechniquesforMachine

LearningandNaturalLanguageProcessingandinthe

applicationofthesetechniquesformeasuring

socio-economicindicators.HereceivedhisPh.D.degreefrom

the University of Groningen, The Netherlands in

2012andhisBachelorsandMastersfromtheNational

UniversityofSingaporeandtheNanyangTechnological

University,Singapore.

LeMinhNguyeniscurrentlyanAssociateProfessorof

SchoolofInformationScience,JAIST.Heleadsthelabon

MachineLearningandNaturallanguageUnderstanding

atJAIST.HereceivedhisB.Sc.degreeininformation

technologyfromHanoiUniversityofScience,andM.Sc.

degree in information technology from Vietnam

NationalUniversity,Hanoiin1998and2001,

respec-tively.He receivedhis Ph.D. degreein information

science from School of information science, Japan

AdvancedInstituteofScienceandTechnology(JAIST)

in2004. HewasanassistantprofessoratSchoolof

informationscience,JAIST from2008-2013. His

re-searchinterestsincludemachinelearning,text

sum-marization, machine translation, natural language

processing,andinformationretrieval.

AntalvandenBosch(PhD1997,Universiteit

Maas-tricht)isprofessoroflanguageandspeechtechnology

at the Centre for Language Studies at Radboud

University,Nijmegen,theNetherlands. His research

interestsincludememory-basedandexemplar-based

naturallanguage modeling,textanalyticsappliedto

historicaltextsandsocialmedia,andproofingtools.He

isamemberoftheNetherlandsRoyalAcademyofArts

andSciencesandECCAIFellow.

AshwinIttooa,*

LeMinhNguyenb

AntalvandenBoschc

aHECManagementSchool,UniversityofLie`ge,Lie`ge,Belgium

bSchoolofInformationScience,JapanAdvancedInsituteofScience

andTechnology,Japan-DivisionofDataScience,TonDucThang

University,HoChiMinhCity,Vietnam

cCentreforLanguageStudies,FacultyofArts,RadboudUniversity

Nijmegen,Nijmegen,TheNetherlands

*Correspondingauthor

E-mailaddresses:ashwin.ittoo@ulg.ac.be(A.Ittoo),

nguyenml@jaist.ac.jp(L.M.Nguyen),

a.vandenbosch@let.ru.nl(A.vandenBosch).

Availableonline12February2016

Editorial/ComputersinIndustry78(2016)1–2

Références

Documents relatifs

Therefore, our Text Insights web application offers analytics for investigating the publicly available data on a specific Facebook page on both the numerical and the textual level..

To this end, we designed a pipeline of natural language processing tech- niques including aspect extraction, sentiment analysis and text summa- rization to gather the reviews,

During the implementation of the above method, existing models for identifying the most informative components of the natural language text from the position of

automation of generating metadata files describing the semantic representation of a web page; semantic network construction for a given set of texts; semantic search execution for

Keywords: Natural Language Processing, Compositional Distributional Model of Meaning, Quantum Logic, Similarity Measures..

We add additional tags if special words (extreme, obligately, etc.) recognized in the texts. A BioNLP data is always domain-specific. All the texts in the corpus [4] are about

Observing that the social media mixed script posts often involve english as the mixing language and can involve at- most one more other language as the length of the messages are

We present a system that can generate natural language text in multiple different languages, in large quantities and at low costs due to the utilization