Data Quality – Not Your Typical Database Problem

Proceedings ICWIT 2012

Mourad Ouzzani

Qatar Computing Research Institute

Abstract.

Textbook database examples are often wrong and simplistic. Unfortunately, data is never born clean or pure. Errors, missing values, repeated entries, inconsistent instances and unsatisfied business rules are the norm rather than the exception. Data cleaning (also known as data cleansing, record linkage, and many other terms) is growing into a major application requirement and an interdisciplinary research area.
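To make some of these error classes concrete, the following Python sketch flags three of them on a hypothetical toy table: missing values, repeated entries (exact duplicates only), and violations of a business rule expressed as a functional dependency. The records, the field names, the assumed rule that a zip code determines a city, and the helper functions `missing_values`, `exact_duplicates` and `fd_violations` are all illustrative assumptions, not part of the work described in this talk. Real duplicate detection usually requires approximate matching rather than exact equality, so treat this only as a minimal illustration.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"id": 1, "name": "Alice Smith", "zip": "10001", "city": "New York"},
    {"id": 2, "name": "Bob Jones", "zip": "10001", "city": "NYC"},
    {"id": 3, "name": "Carol White", "zip": None, "city": "Boston"},
    {"id": 4, "name": "Alice Smith", "zip": "10001", "city": "New York"},
]

def missing_values(rows, fields):
    """Return (id, field) pairs whose value is None or empty."""
    return [(r["id"], f) for r in rows for f in fields if r.get(f) in (None, "")]

def exact_duplicates(rows, key_fields):
    """Return groups of ids whose rows agree on every key field (exact match only)."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[f] for f in key_fields)].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

def fd_violations(rows, lhs, rhs):
    """Return lhs values mapped to more than one rhs value, i.e. where the
    assumed rule lhs -> rhs is broken; rows with a missing lhs are skipped."""
    seen = defaultdict(set)
    for r in rows:
        if r.get(lhs) is not None:
            seen[r[lhs]].add(r[rhs])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

print(missing_values(records, ["name", "zip", "city"]))    # [(3, 'zip')]
print(exact_duplicates(records, ["name", "zip", "city"]))  # [[1, 4]]
print(fd_violations(records, "zip", "city"))               # {'10001': ['NYC', 'New York']}
```

The functional-dependency check also surfaces an inconsistent instance ("NYC" versus "New York"), which hints at why these error classes overlap and why rule-based detection alone does not say which value is correct.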

In this talk, we will start by discussing some of the major issues and challenges in creating effective and efficient data cleaning solutions. We will then discuss open challenges and critique current conservative approaches to this critical problem. Finally, we will discuss some of our work at QCRI in this area.
