• Aucun résultat trouvé

Intensional data management

N/A
N/A
Protected

Academic year: 2022

Partager "Intensional data management"

Copied!
15
0
0

Texte intégral

(1)

1/8 Intensional data management

Intensional Data Management

Pierre Senellart

É C O L E N O R M A L E S U P É R I E U R E

RESEARCH UNIVERSITY PARIS

20 April 2018,BioQOP meeting

(2)

2/8 Intensional data management

Uncertain data is everywhere

Numerous sources ofuncertain data:

Measurement errors

Data integration from contradicting sources

Imprecise mappings between heterogeneous schemas

Imprecise automatic processes (information extraction, natural language processing, etc.)

Imperfect human judgment

Lies, opinions, rumors

(3)

3/8 Intensional data management

Structured data is everywhere

Data isstructured, not flat:

Variety of representation formats of data in the wild:

relational tables

trees, semi-structured documents

graphs, e.g., social networks or semantic graphs

data streams

complex views aggregating individual information

Heterogeneous schemas

Additionalstructural constraints: keys, inclusion dependencies

(4)

4/8 Intensional data management

Intensional data is everywhere

Lots of data sources can be seen asintensional: accessing all the data in the source (in extension) is impossibleorvery costly, but it is possible to access the data throughviews, with some access constraints, associated with someaccess cost.

Indexes over regular data sources

Deep Web sources: Web forms, Web services

The Web or social networks as partial graphs that can be expanded by crawling

Outcome of complex automated processes: information extraction, natural language analysis, machine learning, ontology matching

Crowd data: (very) partial views of the world

Logical consequences of facts, costly to compute

(5)

5/8 Intensional data management

Interactions between

uncertainty, structure, intensionality

If the data has complex structure, uncertain models should represent possible worlds over these structures (e.g.,

probability distributions over graph completions of a known subgraph in Web crawling).

If the data is intensional, we can use uncertainty to

represent prior distributions about what may happen if we access the data. Sometimes good enough to reach a

decision without having to make the access!

If the data is an RDF graph accessed by semantic Web services, each intensional data access will not give a single data point, but acomplex subgraph.

(6)

6/8 Intensional data management

Intensional Data Management

Jointly deal with Uncertainty, Structure, and the fact that access to data is limited and has acost (computational, network, monetary, privacy. . . ), to solve a user’s knowledge need

Lazy evaluation whenever possible

Evolving probabilistic, structured view of the current knowledge of the world

Solve at each step the problem: What is the best access to do next given my current knowledge of the world and the knowledge need

Knowledge acquisition plan (recursive, dynamic, adaptive) that minimizes access cost, and provides probabilistic guarantees

(7)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result Structured

source profiles

(8)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need

Current knowledge of the world

Knowledge acquisition plan

Uncertain access result Structured

source profiles

(9)

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledge need

Current knowledge of the world

Knowledge acquisition plan

Uncertain access result

Structured source profiles

(10)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result

Structured source profiles

(11)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result

Structured source profiles

(12)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result Structured

source profiles

(13)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result Structured

source profiles

(14)

formulation

answer & explanation

optimization

modeling priors

intensional access

knowledge update

Knowledge need Current

knowledge of the world

Knowledge acquisition plan

Uncertain access result Structured

source profiles

(15)

8/8 Intensional data management

BioQOP: Privacy as Intensionality

In the differential privacy framework, every query over the data has a cost: theprivacy budgetused by the query

Trade-off between utility(quality, certainty of the result) and privacy cost

Answering a complex knowledge need (query) requires combining multiple accesses, while minimizing the total privacy costand maximizing the overall utility

Do not necessarily neglect other forms of cost (computational cost, network cost in distributed computation)

Do not necessarily neglect other forms of uncertainty (uncertainty in the input data, quality of classification algorithms)

Références

Documents relatifs

Each of these ways, and every combination thereof, has a cost in terms of manpower, budget, processing time, bandwidth (the data is intensional ), has a precision both as a prior and

Relational DB with uncertain facts, independent probabilities CQ evaluation already #P-hard in the DB. But what if we restrict

“The place and function of Venus in Ovid...”. “Computed backscattering function of Venus and

Given a user’s query, the central question in intensional data management is: “What is the best thing to do next?” in order to answer the query, meaning, what is the best access

I It can be chosen such that the components with variance λ α greater than the mean variance (by variable) are kept. In normalized PCA, the mean variance is 1. This is the Kaiser

Cur- rent access control systems are based on a predened set of static rules in order to prohibit access to certain elds of a data repository; these access control rules cannot

Data collection; Data storage: the ability to organize heterogeneous data storage, providing real-time access to data, data storage using distributed storage (including

Kerstin Schneider received her diploma degree in computer science from the Technical University of Kaiserslautern 1994 and her doctorate (Dr. nat) in computer science from