Computational fact-checking: Problems, state of the art and perspectives

(1)

HAL Id: hal-01995318

https://hal.inria.fr/hal-01995318

Submitted on 26 Jan 2019

Computational fact-checking: Problems, state of the art and perspectives

Ioana Manolescu

To cite this version:

Ioana Manolescu. Computational fact-checking: Problems, state of the art and perspectives. 19e Conférence Francophone sur l’Extraction et Gestion de Connaissances (EGC), Jan 2019, Metz, France.

⟨hal-01995318⟩

(2)

Computational fact-checking:

Problems, state of the art and perspectives

Ioana Manolescu

CEDAR team, Inria Saclay and Ecole polytechnique

http://pages.saclay.inria.fr/ioana.manolescu, @ioanamanol

Extraction et Gestion de Connaissances, Metz, 2019

24/01/19

(3)

MOTIVATION

(4)

Bad memories: Romania, 1989


(5)

Bad memories: Romania, 1989

Ceauşescu re-elected at the 14th congress!

(6)

Bad memories: Romania, 1989

Ceauşescu re-elected at the 14th congress!

He had held power since 1965.

(7)

Bad memories: Romania, 1989

(8)

Bad memories: Romania, 1985


(9)

Things get better

(10)

... kind of

1000 dead (approx.) No one convicted.

(11)

Democratic societies crucially need the press

•  To debate and express dissent

•  To analyze, confirm or refute public statements

Fact-checking

(Data) journalism

(12)

Democratic societies crucially need the press

•  To debate and express dissent

•  To analyze, confirm or refute public statements

•  To expose and explain how society functions

Fact-checking

(Data) journalism

(13)

DATA JOURNALISM AND

JOURNALISTIC FACT-CHECKING

(14)

Data journalism

Investigative journalism based on complex and/or large data

http://abonnes.lemonde.fr/les-decodeurs/portfolio/2017/04/18/les-fractures-francaises-1-5-le-logement-les-raisons-de-la-crise_5112859_4355770.html

(15)

Data journalism

Panama Papers (International Consortium of Investigative Journalists, ICIJ)

(16)

Fact-checking (since 1930 approx.)

“The day I became a fact-checker at The New Yorker, I received one set of red pencils […] for underlining passages on page proofs of articles that might contain checkable facts. […] confirmed with the help of reference books from the magazine’s library”

http://www.nytimes.com/2010/08/22/magazine/22FOB-medium-t.html

Fact-checking: verification of facts mentioned in media content

•  To protect media reputation and avoid legal action

(17)

Fact-checking (2012 – ongoing)

(18)

It's not just checking

•  Most aspects of modern reality are complex

•  Explaining can be as important and useful as checking

–  Helps also analyze the future


(19)

Libération Désintox: cost of saving Cyprus

$2-3 bn

$10 bn

(20)

Saving Cyprus: how much does it cost?

The article reads:

•  The money is not given but lent

•  The European Mechanism of Stability (EMS) will lend $9 bn

•  Out of which France contributes 20% (approximately $2 bn)

However, things are complicated because:

•  The initial contribution of France to the EMS ($16 bn) counts toward French public debt

•  The remaining contribution ($124 bn) does not
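The "approximately $2 bn" figure follows directly from the two numbers quoted in the article. A minimal sketch of that arithmetic, using only the amounts on the slide (the names `EMS_LOAN_BN`, `FRENCH_SHARE` and `share_of_loan` are illustrative, not taken from any source):

```python
# Figures quoted on the slide, in billions of dollars.
EMS_LOAN_BN = 9.0      # amount the EMS will lend
FRENCH_SHARE = 0.20    # France's share of that amount

def share_of_loan(loan_bn: float, share: float) -> float:
    """Return the part of a loan attributable to one contributor."""
    return loan_bn * share

france_bn = share_of_loan(EMS_LOAN_BN, FRENCH_SHARE)
print(f"French share of the loan: ${france_bn:.1f} bn")
```

The exact product is $1.8 bn, which the article rounds to $2 bn.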

(21)

Do we need to understand?

"Populism is telling people that there are simple answers to complex problems"

(22)

Fact checking vs. fake news detection

•  Fact checking is based on some background information source

–  Truth commonly agreed upon

•  Fake news detection may or may not use a source

•  E.g., a text classifier (true, fake) trained on major news agency content vs. fake content (often written in a virulent style)
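To make the "text classifier (true, fake)" idea concrete, here is a minimal sketch of a source-free, style-based detector. It assumes a bag-of-words naive Bayes model, which is one possible choice and not necessarily what any system mentioned in the talk uses. The four training sentences are invented for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]

def train(examples: List[Tuple[str, str]]) -> Model:
    """Count words per class from (text, label) pairs."""
    word_counts: Dict[str, Counter] = {}
    class_counts: Counter = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text: str, model: Model) -> str:
    """Return the most probable label under naive Bayes with Laplace smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)   # class prior
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training set: agency-style vs. virulent-style sentences.
model = train([
    ("officials confirmed the figures on monday", "true"),
    ("the ministry published the annual report", "true"),
    ("shocking scandal they hide the truth", "fake"),
    ("outrageous lie exposed share now", "fake"),
])
print(classify("shocking lie exposed", model))
```

Such a classifier looks only at writing style and never consults a reference source, which is exactly what separates it from fact checking.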

(23)

Most common fact-checking scenarios

•  "What is the value of metric X in space Y at time T"?

–  X=youth unemployment, Y=Germany, T=2018

–  X=illegal immigrants, Y=Italy, T=[2015-2018]

–  X=budget for research, Y=France, T=2019

–  X=average monthly wages, Y=China, T=2018

•  Comparison patterns
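The "metric X in space Y at time T" pattern maps naturally onto a keyed lookup in a reference table. The sketch below is one hypothetical way to encode it. `MetricClaim`, `REFERENCE`, `check` and the 5% tolerance are assumptions made for illustration, and the reference value is a placeholder, not a real statistic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class MetricClaim:
    metric: str    # X
    place: str     # Y
    year: int      # T
    value: float   # value asserted by the claim

# Hypothetical reference table; the number is a placeholder, not a real statistic.
REFERENCE: Dict[Tuple[str, str, int], float] = {
    ("youth unemployment", "Germany", 2018): 6.0,
}

def check(claim: MetricClaim, tolerance: float = 0.05) -> Optional[bool]:
    """True/False if a reference value exists, None if no source covers the claim."""
    ref = REFERENCE.get((claim.metric, claim.place, claim.year))
    if ref is None:
        return None
    # Accept the claim when it lies within a relative tolerance of the reference.
    return abs(claim.value - ref) <= tolerance * abs(ref)

print(check(MetricClaim("youth unemployment", "Germany", 2018, 6.1)))
print(check(MetricClaim("youth unemployment", "France", 2018, 6.1)))
```

Returning `None` rather than `False` when no source covers the claim keeps "unverifiable" distinct from "refuted".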

(24)

Most common fact-checking scenarios


(25)

Most common fact-checking scenarios

•  "What did X say about Y [at time T]"?

•  "Is X related [in sense S] to Y?"

(26)

A CONTENT MANAGEMENT PERSPECTIVE


(27)

Lines of past and current research

1.  Model fact-checking through a data and information management perspective

2.  Identify existing tools and techniques which could be directly applied

–  In a special journalistic context (see next)

3.  Devise new models, tools and techniques for

(28)

Projects and collaborations

•  Google Award (2015) with X. Tannier (LIMSI)

•  ANR ContentCheck (2016-2019) with X. Tannier (Sorbonne Université), S. Cazalens, P. Lamarre, J.-M. Petit, M. Plantevit (U. Lyon), F. Goasdoué (U. Rennes 1), Les Décodeurs (Le Monde)

•  Inria Associated Team WebClaimExplain with AIST Japan (Julien Leblay)

•  Collaborations with H. Galhardas (Technical U. Lisbon), former PhD student S. Zampetakis and others

http://contentcheck.inria.fr

(29)

Fact-checking as a content management problem

[Diagram]

•  Inputs: claim to be checked (text or data); media content; media context

•  Human actors (journalists, experts, crowd workers)

•  Verification tool (query, match, source search…)

•  Output: analysis result + proof, e.g. « True / rather true / rather false / false. See sources: http://dataref.com… »

(30)

Fact-checking as a content management problem

[Diagram]

•  Inputs: claim to be checked (text or data); media content; media context

•  Reference information sources 1, 2, …, n

•  Human actors (journalists, experts, crowd workers)

•  Verification tool (query, match, source search…)

•  Output: analysis result + proof, e.g. « True / rather true / rather false / false. See sources: http://dataref.com… »

•  Tasks: claim extraction; social network analysis; source selection; reconciliation, reputation; reference source construction, refinement, integration
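The components of the diagram can be sketched as a small set of types. This is a hypothetical encoding, not the architecture of any tool from the talk. `ReferenceSource`, `AnalysisResult`, `to_verdict`, `verify`, the support score in [0, 1] and its thresholds are all assumptions chosen for illustration. Only the four-level verdict scale and the "result + proof" output come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional

class Verdict(Enum):
    TRUE = "true"
    RATHER_TRUE = "rather true"
    RATHER_FALSE = "rather false"
    FALSE = "false"

@dataclass
class ReferenceSource:
    name: str
    url: str
    # Returns a support score in [0, 1], or None if the source is silent on the claim.
    lookup: Callable[[str], Optional[float]]

@dataclass
class AnalysisResult:
    verdict: Optional[Verdict]                       # None: no source covers the claim
    proof: List[str] = field(default_factory=list)   # URLs of the sources used

def to_verdict(score: float) -> Verdict:
    # Arbitrary thresholds, for illustration only.
    if score >= 0.75:
        return Verdict.TRUE
    if score >= 0.5:
        return Verdict.RATHER_TRUE
    if score >= 0.25:
        return Verdict.RATHER_FALSE
    return Verdict.FALSE

def verify(claim: str, sources: List[ReferenceSource]) -> AnalysisResult:
    """Query every source, average the scores, and keep the URLs as proof."""
    scores: List[float] = []
    proof: List[str] = []
    for src in sources:
        score = src.lookup(claim)
        if score is not None:
            scores.append(score)
            proof.append(src.url)
    if not scores:
        return AnalysisResult(verdict=None)
    return AnalysisResult(to_verdict(sum(scores) / len(scores)), proof)

# Usage with two made-up sources: one supports the claim, one is silent.
src_a = ReferenceSource("A", "http://example.org/a", lambda claim: 0.9)
src_b = ReferenceSource("B", "http://example.org/b", lambda claim: None)
result = verify("some claim", [src_a, src_b])
print(result.verdict.value, result.proof)
```

Keeping the source URLs alongside the verdict reflects the requirement that the analysis result comes with its proof.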

(31)

Fact-checking as a content management problem

[Diagram]

•  Inputs: claim to be checked (text or data); media content; media context

•  Reference information source

•  Human actors (journalists, experts, crowd workers)

•  Verification tool (query, match, source search…)

•  Output: analysis result + proof, e.g. « True / rather true / rather false / false. See sources: http://dataref.com… »

•  Tasks: claim extraction; social network analysis; reconciliation, reputation; source search / source selection

(32)

Fact-checking as a content management problem

[WWW2018] "A Content Management Perspective on Fact-Checking", S. Cazalens, J. Leblay, I. Manolescu, X. Tannier (fact-checking track)

[WWW2018 tutorial] "Computational fact-checking: problems, state of the art, and perspectives", J. Leblay, I. Manolescu, X. Tannier

[VLDB2018 tutorial] "Computational fact-checking: a content management perspective", S. Cazalens, J. Leblay, P. Lamarre, I. Manolescu, X. Tannier

(33)

Which existing technologies can be used?

•  Database queries?

–  Journalists do not, generally, build databases.

–  "Not part of our job"

–  Persisting data is novel to some

–  SQL not user-friendly

–  Not always helped by journal

(34)

Which existing technologies can be used?

•  Database queries?

–  Journalists do not, generally, build databases.

–  "Not part of our job"

•  Curse of the coverage: they need to cover (almost) any topic

•  Curse of noteworthiness: write about hot topics of today (or tomorrow)

•  They work under strong time pressure

(35)

Which existing technologies can be used?

•  IR/NLP?

–  Better options

•  The outcome of a fact-check has to be explainable

•  Not (only) "an ML algorithm said so"

•  But they don't like to train ML systems, either.

•  Extremely picky about data sources

(36)

Automatic source selection: FactMinder

[SIGMOD2013demo] F. Goasdoué, K. Karanasos, Y. Katsis, J. Leblay, I. Manolescu, S. Zampetakis: "Fact-checking and analyzing the Web"

•  Plug-in

•  Brings relevant data from document and knowledge bases

•  « Second screen »

(37)

Improving access to reference data sources

[BDA2018] T. D. Cao, I. Manolescu, X. Tannier. "Extracting Linked Data from statistic spreadsheets" (WebDB 2018, SBD 2017)

“Créations d’entreprises en France en 2015” ("Business creations in France in 2015") → we return:

(38)

Keyword search across heterogeneous sources

[VLDB2018demo] C. Chanial, R. Dziri, H. Galhardas, J. Leblay, M. Le Nguyen and I. Manolescu: "ConnectionLens: Finding Connections Across Heterogeneous Data Sources"

(39)

A data model for facts, statements and beliefs

With F. Goasdoué and L. Duroyon (U. Rennes 1)

Modeling who said what when

•  "On Jan 22, 2017, Le Canard Enchaîné wrote that François Fillon had stated on Jan 21 that Penelope worked for the NA from 2002 to 2008."

•  What has one actor heard of?
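The example sentence nests a fact inside a statement inside another statement. A minimal sketch of such a structure follows. It is not the authors' actual data model: `Fact`, `Statement`, `underlying_fact` and `heard_of` are hypothetical names, the year of the Jan 21 statement is assumed to be 2017, and "heard of" is interpreted here as "stated or relayed".

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: str   # start of the period over which the fact is said to hold
    valid_to: str     # end of that period

@dataclass(frozen=True)
class Statement:
    actor: str                           # who said it
    date: str                            # when it was said (ISO date)
    content: Union[Fact, "Statement"]    # what was said: a fact or another statement

def underlying_fact(s: Statement) -> Fact:
    """Unwrap nested statements down to the fact they are about."""
    content = s.content
    while isinstance(content, Statement):
        content = content.content
    return content

def heard_of(actor: str, statements: List[Statement]) -> List[Fact]:
    """Facts that an actor has stated or relayed, at any nesting depth."""
    found: List[Fact] = []
    for s in statements:
        node: Union[Fact, Statement] = s
        while isinstance(node, Statement):
            if node.actor == actor:
                found.append(underlying_fact(node))
                break
            node = node.content
    return found

# The example from the slide (year of the inner statement assumed to be 2017).
fact = Fact("Penelope", "worked for", "NA", "2002", "2008")
fillon = Statement("François Fillon", "2017-01-21", fact)
canard = Statement("Le Canard Enchaîné", "2017-01-22", fillon)
print(heard_of("Le Canard Enchaîné", [canard]))
```

Separating the statement date from the fact's validity period is what lets the model distinguish when something was said from when it is claimed to have held.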

(40)

Summary of our work

•  Automatically find most relevant reference data for a claim

–  RDF dump of full INSEE statistics

–  Table-aware full text search on the resulting graphs

–  Identifying statistic mentions in text (submitted)

•  Identify connections between entities across heterogeneous content: ConnectionLens

•  Modelling temporal facts, beliefs and statements

(41)

STATUS AND ROADMAP

(42)

Status

•  Any question answering framework which can be plugged into trusted data sources is useful

–  Usually neither time nor skills to integrate the data

•  No IT infrastructure can be counted upon

–  More "a jumble of tools"

•  ML promises a lot but delivery is hard(er)

–  Also: journalists' ambivalence

•  Strong and increasingly IT-literate community

–  E.g. https://www.poynter.org/channels/fact-checking

(43)

Roadmap: computer science research

•  DB, KR, IR, NLP

•  In its most general statement, fact-checking supposes perfect NLP → study sub-problems!

•  In fact-checking journalism, human writers choose topics, angle, style...

–  "A story wrapped around a query"

(44)

A vision of journalistic dataspaces

"Dataspaces": Franklin, Halevy and Maier, SIGMOD Record 2005

•  Ingest data of any nature: structured (relational), semistructured (JSON, XML, [social] graphs), unstructured (text), KB...

•  Storage, indexing

•  Search across the data

–  Dong and Halevy [SIGMOD 2007]: keyword search, result can come from any data source

–  Bonaque, Cautis, Goasdoué, Manolescu [EDBT2016]: document search with social score component

–  ConnectionLens: find answers in any combination of data sources; "ad-hoc linked data"
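Finding an answer "in any combination of data sources" can be illustrated with a toy graph search. The sketch below is not the ConnectionLens algorithm. It assumes each source has already been reduced to a list of labelled edges, links nodes across sources when their normalized labels match, and uses a plain breadth-first search. The file names and entity names are made up.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]

def build_graph(sources: Dict[str, List[Edge]]) -> Dict[str, Set[str]]:
    """Merge the edges of all sources into one undirected graph.
    Nodes with the same normalized label are treated as the same node."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for edges in sources.values():
        for a, b in edges:
            a, b = a.strip().lower(), b.strip().lower()
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connect(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a shortest path between two keywords, or None if they are not connected."""
    start, goal = start.strip().lower(), goal.strip().lower()
    if start not in graph or goal not in graph:
        return None
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Two made-up sources that only connect through a shared entity label.
sources = {
    "tweets.json": [("Alice", "ACME Corp")],
    "registry.csv": [("acme corp", "Bob")],
}
graph = build_graph(sources)
print(connect(graph, "Alice", "Bob"))
```

Neither source alone links the two keywords. The path only exists once both are loaded into the same graph, which is the point of searching across heterogeneous sources.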

(45)

Requirements for journalistic dataspaces

•  Time

–  Of data acquisition

–  Of events described in the data

•  Provenance

–  Authorship metadata

–  Annotation by users

•  Capacity to "derive" content (à la views)

•  (Semi-automatic) semantic annotation and classification

•  Social connections analysis

(46)

Roadmap: society

•  Educate: the general audience, journalists, other social scientists

–  "Fake news creation" games, e.g. http://getbadnews.com

(47)

Roadmap: society

•  Educate: the general audience, journalists, other social scientists

–  Media and internet education in schools

(48)

Is this worth it?

“Some people will never be convinced”

•  “Facts have a liberal bias” (Paul Krugman)

https://www.nytimes.com/2017/12/08/opinion/facts-have-a-well-known-liberal-bias.html

•  "Scientists and humanity scholars believe in a constructed, logical discourse, and believe humans yield to reason. Businesspeople know this is not true, in general. Businesspeople have thus an advantage in winning political competitions."

George Lakoff, former Berkeley professor

https://georgelakoff.com/2016/11/22/a-minority-president-why-the-polls-failed-and-what-the-majority-can-do/

•  Conspiracy theory adepts believe two obviously contradicting theories [Wood et al., 2012]

(49)

THANK YOU / QUESTIONS?

HTTP://CONTENTCHECK.INRIA.FR/
