• Aucun résultat trouvé

Democracy Big Bang: What data management can(not) do for journalism

N/A
N/A
Protected

Academic year: 2021

Partager "Democracy Big Bang: What data management can(not) do for journalism"

Copied!
47
0
0

Texte intégral

(1)

HAL Id: hal-01968347

https://hal.inria.fr/hal-01968347

Submitted on 2 Jan 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de

Democracy Big Bang: What data management can(not) do for journalism

Ioana Manolescu

To cite this version:

Ioana Manolescu. Democracy Big Bang: What data management can(not) do for journalism. 34ème Conférence sur la Gestion de Données – Principes, Technologies et Applications, Oct 2018, Bucarest, Romania. 2018. �hal-01968347�

(2)

Democracy Big Bang

What data management can(not) do for journalism

Ioana Manolescu

Inria Saclay and Ecole polytechique

24/10/18 Bases de Données Avancées, 2018 1

(3)

MOTIVATION

(4)

Bad memories: Romania, 1989

3 Bases de Données Avancées, 2018

24/10/18

(5)

Bad memories: Romania, 1989

Ceauşescu re-elected at the 14th congress!

(6)

Bad memories: Romania, 1989

5 Bases de Données Avancées, 2018

Ceauşescu re-elected at the 14th congress!

He was in power since 1965.

24/10/18

(7)

Bad memories: Romania, 1989

(8)

Bad memories: Romania, 1985

7 Bases de Données Avancées, 2018

24/10/18

(9)

Things get beYer

(10)

... kind of

9 Bases de Données Avancées, 2018

1000 dead (approx) No one convicted.

24/10/18

(11)

Democra]c socie]es crucially need the press

•  To debate and express dissent

•  To analyze, confim or refute public statements

•  To expose and explain society func]oning

Fact-checking (see next)

(12)

Democra]c socie]es crucially need the press

•  To debate and express dissent

•  To analyze, confim or refute public statements

•  To expose and explain society func]oning

Bases de Données Avancées, 2018 11

Fact-checking (see next)

24/10/18

(13)

DATA JOURNALISM AND

JOURNALISTIC FACT-CHECKING

(14)

Data journalism

Inves]ga]ve journalism based on complex and/or large data

Bases de Données Avancées, 2018 13

24/10/18 hYp://abonnes.lemonde.fr/les-decodeurs/poraolio/2017/04/18/

les-fractures-francaises-1-5-le-logement-les-raisons-de-la-crise_5112859_4355770.html

(15)

Data journalism

Panama Papers (Interna]onal Consor]um of Inves]ga]ve Journalism, ICIJ)

(16)

Data journalism

Bases de Données Avancées, 2018 15

Panama Papers (Interna]onal Consor]um

of Inves]ga]ve Journalism, ICIJ)

24/10/18

(17)

Fact-checking (since 1930 approx.)

“The day I became a fact-checker at The New Yorker, I received one set of red pencils […]

for underlining passages on page proofs of arFcles that might contain checkable facts. […]

confirmed with the help of reference books from the magazine’s library”

hYp://www.ny]mes.com/2010/08/22/magazine/22FOB- medium-t.html

Fact-checking: verification of facts mentioned in media content

q  To protect media reputation and avoid legal action

(18)

Fact-checking (2012 – ongoing )

Bases de Données Avancées, 2018 17

24/10/18

(19)

It's not just checking

•  Most aspects of modern reality are complex

•  Explaining can be as important and useful as checking

–  Helps also analyze the future

(20)

Libéra]on Désintox: cost of saving Cyprus

$2-3 bn

$10 bn

$15 bn

Bases de Données Avancées, 2018 19

24/10/18

(21)

Saving Cyprus: how much does it cost?

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20%

or a bit less than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

(22)

Bases de Données Avancées, 2018 21 24/10/18

Saving Cyprus: how much does it cost?

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20%

or a bit less than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

(23)

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20% or a bit less

than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

The ini]al statements are mostly false

Saving Cyprus: how much does it cost?

(24)

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20% or a bit less

than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

The ini]al statements are mostly false

The complete picture is complicated

Bases de Données Avancées, 2018 23

24/10/18

Saving Cyprus: how much does it cost?

(25)

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20% or a bit less

than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

The ini]al statements are mostly false

The complete picture is complicated

I would not have been able to give the answer

Saving Cyprus: how much does it cost?

(26)

The article reads:

q The money is not given but lent

q The European Mechanism of Stability will lend $9bn

q  Out of which France contributes 20% or a bit less

than $2bn

However, things are complicated because

q The initial contribution of France to the EMS ($16 bn) is counted toward French public debt

q  The remainder contribution ($124) is not

The ini]al statements are mostly false

The complete picture is complicated

Forming a competent opinion is non trivial

Bases de Données Avancées, 2018 25

24/10/18

Saving Cyprus: how much does it cost?

(27)

Do we need to understand?

"Populism is telling people that there are simple answers to complex problems"

(28)

Fact checking vs. fake news detec]on

•  Fact checking is based on some background informa]on source

•  Fake news detec]on may or may not use a source

•  E.g., text classifier (true, fake) trained with major news agency content, resp. known fake content (oren virulent style)

24/10/18 Bases de Données Avancées, 2018 27

(29)

A CONTENT MANAGEMENT

PERSPECTIVE

(30)

Objec]ves

1.  Model fact-checking through a data and informa]on management perspec]ve

2.  Iden]fy exisFng tools and techniques which could be directly applied

– In a special journalis]c context (see next)

3.  Devise new models, tools and techniques for fact-checking and data journalism problems

24/10/18 Bases de Données Avancées, 2018 29

(31)

ANR ContentCheck project

•  2016-2019

•  Partners:

–  Inria Saclay: I. Manolescu

–  LIMSI (now Sorbonne Université): Xavier Tannier

–  U. Lyon: Sylvie Cazalens, Philippe Lamarre, Jean-Marc Pe]t, M. Plantevit

–  U. Rennes 1: François Goasdoué –  Le Monde: Les Décodeurs

•  Also: WebClaimExplain team with AIST Japan (Julien Leblay)

(32)

Fact-checking as a content management problem

Claim to be checked

(text or data) Media

content

Media context

Reference informa]on

source 1

Human actors (journalists, experts, crowd

workers)

Reference informa]on

source 2

Reference informa]on

source n

Verifica]on tool (query, match, source search…)

Analysis result + proof

« True / rather true / rather false / false See sources: hYp://dataref.com… »

31 Bases de Données Avancées, 2018

24/10/18

(33)

Fact-checking as a content management problem

Claim to be checked

(text or data) Media

content

Media context

Reference informa]on

source 1

Human actors (journalists, experts, crowd

workers)

Reference informa]on

source 2

Reference informa]on

source n

Verifica]on tool (query, match, source search…)

Analysis result + proof

« True / rather true / rather false / false See sources: hYp://dataref.com… » Claim

extrac]on

Social network

analysis

Source selec]on

Reconcilia]on, reputa]on

(34)

Fact-checking as a content management problem

Claim to be checked

(text or data) Media

content

Media context

Reference informa]on

source 1

Human actors (journalists, experts, crowd

workers)

Reference informa]on

source 2

Reference informa]on

source n

Verifica]on tool (query, match, source search…)

Analysis result + proof

« True / rather true / rather false / false See sources: hYp://dataref.com… »

33

Claim extrac]on

Social network

analysis

Reconcilia]on, reputa]on

Source d’informa]on de référence n+1

Source d’informa]on de référence n+1

Reference informa]on source n+1

Source search / source selec]on

Bases de Données Avancées, 2018 24/10/18

Reference source construc]on, refinement, integra]on

(35)

Fact-checking as a content management problem

[WWW2018] "A Content Management Perspec]ve on Fact-Checking",

S. Cazalens, J. Leblay, I. Manolescu, X. Tannier (fact-checking track)

[WWW2018 tutorial] "Computa]onal fact-checking: problems, state of the art, and perspec]ves", J. Leblay, I. Manolescu, X. Tannier

[VLDB2018 tutorial] "Computa]onal fact-checking: a content management

perspec]ve", S. Cazalens, J. Leblay, P. Lamarre, I. Manolescu, X. Tannier

(36)

Which exis]ng technologies can be used?

•  Database queries?

– Journalists do not, generally, build databases.

– It's seen as "not part of our job"

– Persis]ng data is novel to some – SQL unfriendly

– Not always helped by journal informa]on systems

•  Remarkable excep]on:

Ouest France

24/10/18 Bases de Données Avancées, 2018 35

(37)

Which exis]ng technologies can be used?

•  Database queries?

– Journalists do not, generally, build databases.

– It's seen as "not part of our job"

•  Curse of the coverage: they need to cover (almost) any topic

•  Curse of noteworthiness: write about hot topics of today (or tomorrow)

•  Thet work under strong Fme pressure

(38)

Which exis]ng technologies can be used?

•  IR/NLP?

– BeYer op]ons

•  The outcome of fact-check has to be explainable

•  Not (only) "an ML algorithm said so"

•  But they don't like to train ML systems, either.

•  Extremely picky on data sources

– "A majority of Web sources claim it" is not an op]on

24/10/18 Bases de Données Avancées, 2018 37

(39)

Automa]c source selec]on: FactMinder

[SIGMOD2013demo] F. Goasdoué, K. Karanasos, Y. Katsis, J.

Leblay, I. Manolescu, S. Zampetakis: "Fact-checking and analyzing the Web"

•  Plug-in

•  Brings relevant data from document

and knowledge bases

(40)

Improving access to reference data sources

[BDA2018] T.-D. Cao, I. Manolescu, X. Tannier. "Extrac]ng Linked Data from sta]s]c spreadsheets" (WebDB 2018, SBD 2017)

“Créa3ons d’entreprises en France en 2015” à we return:

24/10/18 Bases de Données Avancées, 2018 39

(41)

Keyword search across heterogeneous sources

[VLDB2018demo] C. Chanial, R. Dziri, H. Galhardas, J. Leblay, M.

Le Nguyen and I. Manolescu: "Connec]onLens: Finding Connec]ons Across Heterogeneous Data Sources"

•  Also today

(42)

Temporal beliefs:

A structured data model for fact- checking

With F. Goasdoué and L. Duroyon (U. Rennes 1) Modeling who said what when

•  "On Jan 22, 2017, Le Canard Enchaîné wrote that François Fillon had stated on Jan 21 that Penelope worked for the NA from 2002 to 2008."

•  What has one actor heard of?

•  Who knew this at that ]me?

•  Who reversed their posi]on on a topic?

24/10/18 Bases de Données Avancées, 2018 41

(43)

WRAP-UP

WHERE WE STAND, WHAT CAN WE DO?

(44)

Perspec]ves

•  Any quesFon answering framework which can be plugged on trusted data sources is useful

–  Usually no ]me nor skills to integrate the data

•  No IT infrastructure can be counted upon

–  More "a jumble of tools"

•  ML promisses a lot but delivery is hard(er)

–  Also: journalists ambivalence

•  Educate: the general audience, journalists, other social scien]sts

•  Strong and increasingly IT-literate community

–  E.g. hYps://www.poynter.org/channels/fact-checking

24/10/18 Bases de Données Avancées, 2018 43

(45)

NLP is hard

(46)

Is this worth it?

“Some people will never be convinced”

•  Correct.

•  “Facts have a liberal bias” (Paul Krugman)

•  Source: hYps://www.ny]mes.com/2017/12/08/opinion/facts- have-a-well-known- liberal-bias.html

•  Scien]sts and humanity scholars believe in a constructed, logical discourse, and believe humans yield to reason.

Businesspeople know this is not true, in general. Businesspeople have thus an advantage in winning poli]cal compe]]ons

George Lakoff, former Berkeley professor

hYps://georgelakoff.com/2016/11/22/a-minority-president-why- the-polls-failed-and-what-the-majority-can-do/

•  Conspiracy theory adepts believe two obviously contradic]ng theories [Wood et al., 2012]

24/10/18 Bases de Données Avancées, 2018 45

(47)

THANK YOU / QUESTIONS?

HTTP://CONTENTCHECK.INRIA.FR/

Références

Documents relatifs

This author essentially agrees with my interpretation of the female genital abdominal segments (Deuve 2001a) by recognising in the same way the presence of

internalizes how the gain of issuing debt is hampered by default risk, which lowers bond prices (debt Laffer curve).. Debt Issuance

In his writings, Feldenkrais uses a constellation of notions linked together: body image, body schema, self- image, awareness and habits.. Schilder’s book is seminal for it

The political exercise of environmentalism (political ecology) offers a set of coherent values allowing to rethink political, economic and social processes, and presents new

Results show that moisture variations on one single face of the panel reduce the residual drying stress by activating the mechano-sorptive creep which turns into concave

“what forms of democracy can be found in the statutes and practices of cooperatives in France?” Answering this question will require specifying the democratic rules that exist in

We believe that three aspects are key for dealing with intensive Big Data management: (i) dealing with efficient simple storage systems that focus efficient read and write

In Theorem 7 we suppose that (EJ is a sequence of infinite- dimensional spaces such that there is an absolutely convex weakly compact separable subset of E^ which is total in E^, n =