• Aucun résultat trouvé

Uncertain Data Management Sources of Uncertain Data

N/A
N/A
Protected

Academic year: 2022

Partager "Uncertain Data Management Sources of Uncertain Data"

Copied!
23
0
0

Texte intégral

(1)

Uncertain Data Management Sources of Uncertain Data

Antoine Amarilli1, Silviu Maniu2 November 21st, 2017

(2)

Uncertain Data Management

Database systems usually assume that data iscorrectandcomplete

Incompleteandmissingdata

Imprecisedata

Noisydata

Untrustworthydata

→ Whichapplicationsproduce uncertain data nowadays?

(3)

Uncertain Data Management

Database systems usually assume that data iscorrectandcomplete

Incompleteandmissingdata

Imprecisedata

Noisydata

Untrustworthydata

→ Whichapplicationsproduce uncertain data nowadays?

(4)

Uncertain Data Management

Database systems usually assume that data iscorrectandcomplete

Incompleteandmissingdata

Imprecisedata

Noisydata

Untrustworthydata

→ Whichapplicationsproduce uncertain data nowadays?

(5)

Never-Ending Language Learning

NELL: Read the Web

(6)

Information extraction

DeepDive: extract facts from journal articles

����������������

�������������� ��������� ���������� �����

�����������������������������������������������

����

���

���

��� �������� ��������� ��������� ���� ���������

��� ��

���� ����

������� ������� �������

�������� ��������� ��� ��

�������� ��������� ��� �������

������������������� ����������

��������������

�����

��� ���������

�������� ���������

(7)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

Anaphora:“Obama told Hollande thathewas not a target”

Noise:

Incompleteness

(8)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

Anaphora:“Obama told Hollande thathewas not a target”

Noise:

Incompleteness

(9)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

Anaphora:“Obama told Hollande thathewas not a target”

Noise:

Incompleteness

(10)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

Anaphora:“Obama told Hollande thathewas not a target”

Noise:

Incompleteness

(11)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

“Obama told Hollande that was not a target”

Incompleteness

(12)

Many sources of uncertainty

Errors in sources:

Entity disambiguation:

“The place andfunction of Venusin Ovid...”

“Computed backscatteringfunction of Venusand the moon...”

Natural language parsing:

“such cities asNew York”, “such cities asMohenjo-Daro”

“the classification ofsuch cities asurban”

Anaphora:“Obama told Hollande thathewas not a target”

Noise:

(13)

Crowdsourcing

Amazon Mechanical Turk

→ Users areuntrustworthy!

(14)

Crowdsourcing

Amazon Mechanical Turk

(15)

Sentiment analysis

n Most positiven-grams Most negativen-grams

1 engaging; best; powerful; love; beautiful bad; dull; boring; fails; worst; stupid; painfully 2 excellent performances; A masterpiece; masterful

film; wonderful movie; marvelous performances worst movie; very bad; shapeless mess; worst thing; instantly forgettable; complete failure 3 an amazing performance; wonderful all-ages tri-

umph; a wonderful movie; most visually stunning for worst movie; A lousy movie; a complete fail- ure; most painfully marginal; very bad sign 5 nicely acted and beautifully shot; gorgeous im-

agery, effective performances; the best of the year; a terrific American sports movie; refresh- ingly honest and ultimately touching

silliest and most incoherent movie; completely crass and forgettable movie; just another bad movie. A cumbersome and cliche-ridden movie;

a humorless, disjointed mess 8 one of the bestfilms of the year; A love forfilms

shines through each frame; created a masterful piece of artistry right here; A masterfulfilm from a masterfilmmaker,

A trashy, exploitative, thoroughly unpleasant ex- perience ; this sloppy drama is an empty ves- sel.; quickly drags on becoming boring and pre- dictable.; be the worst special-effects creation of the year

(16)

Schema mappings

(a)

(b)

(c)

(17)

Scientific data

(18)

Speech recognition and OCR

→ Decoding output isuncertain

(19)

Robotics

(20)

Other applications

Data integration:combine data across sources

Data cleaning:fix errors in stale/outdated data

Machine learning:predictions are uncertain

Data mining:trends extracted from large datasets

Computational biology:genomic data management

... and much more!

(21)

Image Credits

Slide 5:http://rtw.ml.cmu.edu/

Slide 7:https://en.wikipedia.org/wiki/Template:Disputed

Slide 6: [Zhang, 2015], page 9

Slide 13:https://www.mturk.com/

Slide 15: [Socher et al., 2013], page 10

Slide 16: [Dong et al., 2009], page 4

Slide 17:https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF- 2015-041/fig_06b.png

Slide 18:https://code.google.com/p/transducersaurus/wiki/CascadeTutorial

(22)

References i

Dong, X. L., Halevy, A., and Yu, C. (2009).

Data integration with uncertainty.

The VLDB Journal—The International Journal on Very Large Data Bases, 18(2):469–500.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., and Potts, C. (2013).

Recursive deep models for semantic compositionality over a sentiment treebank.

InProc. EMNLP.

(23)

References ii

Zhang, C. (2015).

DeepDive: A Data Management System for Automatic Knowledge Base Construction.

PhD thesis, University of Winconsin–Madison.

https://cs.stanford.edu/people/czhang/zhang.thesis.pdf.

Références

Documents relatifs

Does query evaluation on probabilistic data have lower complexity when the structure of the data is close to a tree.. • Existing results on

Does query evaluation on probabilistic data have lower complexity when the structure of the data is close to a tree?. In

→ Lower bound: Tractable complexity in TID data and query seems unlikely, even for tree-shaped queries and data. → Can we compute lineages more efficiently in

Antoine Amarilli 1 , Silviu Maniu 2 November 21st, 2017... Class 1 (Nov 21): Introduction and

Π teacher,resp (Class) teacher resp Antoine Fabian Silviu Fabian.. → Duplicates

• In SQL, values can be True, False, or Unknown (NULL).. • Essentially anything that involves NULL

Booking date teacher room 21 Antoine Saphir 21 Silviu Saphir 21 Silviu C47 28 Antoine Saphir 28 Antoine C47 28 Silviu Saphir. Booking date teacher room 21 Antoine Saphir 21

• Extensional evaluation: find a safe plan so we can correctly compute all probabilities as the query is being evaluated. •