HD28
.M414
no.^3
WORKING
PAPER
ALFRED
P.SLOAN
SCHOOL
OF
MANAGEMENT
A
Data
Consumer-based Approach
toSupporting
Data
Quality
Judgement
December
1992WP
#3516-93CISL
WP#
92-05Y.Jang H.B.
Kon
Richard Y.
Wang
SloanSchool of
Management,
MIT
MASSACHUSETTS
INSTITUTE
OF
TECHNOLOGY
50
MEMORIAL
DRIVE
Published
in theProceedings
ofthe
Second
Annual
Workshop
on
Information
Technology and Systems (WITS)
Dallas,
TX
December
1992
A
Data
Consumer-based Approach
toSupporting
Data
Quality
Judgement
December
1992WP
#3516-93CISL
WP#
92-05Y.Jang H.B.
Kon
RichardY.
Wang
SloanSchoolof
Management,
MIT
*seepage bottom forcompleteaddress
Yeona
Jang E53-317Henry
B.Kon
E53-322 RichardY.Wang
E53-322Sloan Schoolof
Management
MassachusettsInstitute of
Technology
Cambridge,
MA
01239Publishedin theProceedingsoftheSecond
Annual
Workshop
on
InformationTechnology
and
Systems (WITS) Dallas, TexasDecember,
1992A
Data
Consumer-based
Approach
to
Supporting
Data
Quality
Judgment
Yoena
Jang
Henry
B.Kon
Richard
Y.Wang
December
1992
CISL-92-05
Composite
InformationSystems Laboratory
E53-320, SloanSchool
ofManagement
Massachusetts
Institute ofTechnology
Cambridge,
Moss.02139
ATTN: ProfessorRichard
Wang
Tel: (617) 253^3442 Fax:(617)734-2137
e-mail:
rwang@eagle.mit.edu
A
Knowledge-Based
Approach
to
Assisting
In
Data
Quality
Judgment
(Extended Abstract)
Yeona Jang Henry B.
Kon
Richard Y.Wang
LaboratoryforComputerScience Sloan SchoolofManagement SloanSchoolofManagement
MassachusettsInstituteofTechnology MassachusettsInstituteofTechnology MassachusettsInstituteofTechnology
Yeona®lcs.mit.edu hkonSfmiLedu rwangdmit.edu
Abstract
Astheintegration ofinformationsystemsenablesgreater accessibility todatafrom multiplesources, theissue
ofdata quality becomesincreasingly important.This pap>erattemptsto formallyaddress the data quality judgment problemwithaknowledge-basedapproach.Ouranalysishasidentified several related theoretical
and practical issues. For example, data quality is determined by several factors, referred to as quality
parameters. Qualityparametersare oftennotindependentofeachother, raising the issue ofhowtorepresent relationships amongqualityparametersandreason with such relationships to draw insightful knowledge
about the overall quality of data.
Inparticular, this paper presents a data quality reasoner. Thedata quality reasoner is a data quality
judgmentmodel basedon thenotionof a "censusofneeds." Itprovides a framework forderivinganoverall
dataquaUtyvaluefromlocal relationshipsamong quaUty parameters. The data quality reasoner will assist
dataconsumersinjudging dataquality.Thisisparticularlyimportantwhen a largeamount ofdata involved
indecision-makingcomefromdifferent,imfamUiarsources.
1.
Introduction
As
the integration of information systems has enabled dataconsumers
to gain access to both familiarand
unfamiliar data, there hasbeen growing
interestand
activity in the area of data quality.Even
ifeach individual data supplier v^ere to guarantee the integrity
and
consistency of data, datafrom
different suppliersmay
stillbe
of different quality levels—
due, for example, to different datamaintenance
policies. Unfortunately, asdemonstrated
in studies presented in the literature such as[Bonoma,
1985;Bumham,
1985;Johnson,1990;Laudon,1986], decisionsmade
basedon
inaccurateorout-of-date data can result in serious
economic
and
socialdamage.
The
problem
of data quality is thus increasingly critical.A
majority of previous research effortson
data quality has focusedon
providing to dataconsumers
"meta-data," i.e., data about data, that can facilitate thejudgment
of data quality; forexample, data source, creation time,
and
collectionmethod.
We
refer to these characteristics of thedata manufacturing process asquaUty indicators (see Table 1 for
examples
of quality indicators).Data-quality
judgment
is still,however,
left to the data consumers. Unfortunately, information overloadmakes
it difficult to analyze such dataand
draw
useful conclusions about data quality. This paper seeks to assist dataconsumers
in judging if the quality of datameets
his or her requirements,by
reasoningabout informationcritical todata qualityjudgment.Regarding
dataquality, thispaper focusesespeciallyon
theproblem
of assessinglevelsofdata quality, i.e., the degree towhich
datameets
desired characteristics of the datafrom
the user'sperspective. In considering the data quality assessment problem, our analysis has identified several
theoretical
and
practical issues:1)
What
are data quality requirements?2)
How
canrelationshipsbetween
dimensionsoftheserequirementsberepresented?3)
What
can beknown
aboutoverall data qualityfrom
suchrelationships,and
how?
The
study conducted
on
major
US
firms, in[Wang
&
Guarrascio, 1991], identified a relativelyexhaustivelist ofrequirements, suchas timeliness
and
credibility.Such
requirementsare referred to asquality parameters in this
paper
(see Table 2 forexamples
of data quality parameters). Unfortunately,Research presentedin thispaper was supportedinpartbythe International Financial ServicesResearch CenteratMTT, inpart bytheNational
HeartLung,andBloodIiwtituteunderthegrant numberROl HL33041, andinpartbytheNationalIiutituteofHealthunderthegrantnumberROl
requirementsofdata depjend tolargely
on
the intendedusageof the data. Forexample,consider patientrecords. Availability of the records
may
bemore
important than accuracy to hospital administration,while to physicians accuracy is as important as availability of the records for effective patient
management.
The
issue, then, ishow
to deal with such user- or application-specificity ofquality-parameter relationships. This paper attempts to address this issue with a
knowledge-based
approach. This raises the issue ofhow
to represent relationshipsamong
quality parameters.Another
importantissue is
how
to reason with such relationships todraw
insightfulknowledge
about overall dataquality. This paper focuses mainly
on
addressing the lasttwo
issues: representationaland
reasoningissues.
To do
so,we
assume
thatdata quality parameters, such asshown
in Table 2, are available for use.complete
enumeration
of quality parametersmay
contain toomuch
information toconvey
to dataconsumers any
insightsaboutoverall data quality. This paper providesamodel
to help dataconsumers
raise their levels of
knowledge
about the data they use,and
thusmake
informed decisions.Such
aprocess represents data quality filtering.
Our
projectinvolves an investigation of a data qualityjudgment
model, with theaim
ofraising related issuesand
describingmechanisms
behind the use ofknowledge
about local quality-parameter relationships indata quality judgment. Section 2 discussesa representation for specifying various localrelationships
between
quality parameters. Section 3 discusses the computationalcomponent
of the qualityjudgment
model. It includes amechanism
for reasoning with localdominance
relationships toidentify information critical to overall data quality. Finally, Section 4
summarizes
this researchand
suggests future directionsfor thefield ofdata quality evaluation.13. Related
Work
The
decision-analytic approach, assummarized
in [Keeney&
Raiffa, 1976],and
utilityanalysisunder
multiple objectives, assummarized
in[Chankong
&
Haimes,
1983], describe solution approaches forspecifyingpreferences
and
resolvingmultipleobjectives.The
preference structureofadecisionmaker
or evaluator is specified as a hierarchy of objectives.Through
a decomposition of objectives using eithersubjectively defined
mappings
or formal utility analyses, the hierarchy canbe reduced
toan
overall value.The
decision-analyticapproach
isgenerally builtaround
the presupposition of the existence ofcontinuousutility functions.
The approach
presented in thispaper,on
the otherhand,does
not requirethat
dominance
relationsbetween
quality parameters be continuousfunctions, orthat their interactionsbe completely specified. It only presupposes that
some
localdominance
relationshipsbetween
qualityparameters exist.
Representational
schemes
similar toone
presented in thispaper
are investigated, to represent preferences, in sub disciplines of Artificial Intelligence such as Planning [Wellman,1990,Wellman
&
Doyle, 1991].
The
research effort, however, has focused primarilyon
issues involved in representingpreferences,
and
much
lesssoon
computationalmechanisms
forreasoningwithsuchknowledge.2.
Data
Quality
Reasoner
This section discusses the data quality reasoner, called EX^R.
DQR
is a data qualityjudgment
model
which
derivesan
overall data quality value for a particular data element, basedon
the followinginformation:
1)
A
set,QP,
of underlyingquality parameters that affectdataquality:QP
=
{(Jj, (?2' •••'lj-2)
A
set,DR,
oflocaldominance
relationshipsbetween
qualityparameters inQP.
In particular, this paper addresses the following
fundamental
issues that arise in consideringthe useoflocal relationships
between
quality parametersindata quality judgment:1)
How
torepresentlocaldominance
relationshipsbetween
quality parameters.2)
What
todo
with suchlocaldominance
relationships.Section 2.1 presents a representation
scheme
for specifying localdominance
relationshipsbetween
quality parameters in order to facilitate data quality judgment. Section 2.2 discusses a computationalframework which
exploitssuch relationships todraw
insightsabout overall dataquality.2.1. RepresentationofLocal
Dominance
RelationshipsThis subsection discusses a representation oflocal
dominance
relationshipsbetween
quality parameters.To
facilitate further discussion, additional notations are introduced below. Forany
quality parameterq,, let
symbol
Vj denote the set ofvalues that(j, cantake on. In addition, the following notation is usedto describe value
assignments
for quality parameters. Forany
qualityparameter
(j,, the valueassignment
(J, .=v (forexample. Timeliness := High) represents the instantiation of the valueof q,as v,
"quality-parameter
value assignments".A
qualityparameter
with a particular value assigned to it is alsoreferred to as
an
instantiated quality parameter.For
some
quality parameters (j,, CJ2,...,(]„, forsome
integer n>\, qjnq^r^.-.nq^ represents aconjunction of quality parameters. Similarly, (yj.=z;jn
'?2'=^2'^-"'^'?n'=^n'^^^
some
v,in V., for all ;' =1,2, ... ,
and
n, represents a conjunction of quality-parameter value assignments.Note
that thesymbol
n
used
in theabove
statement denotes the logical conjunction, not setintersection, ofevents assertedby
instantiating quality parameters.Finally, notation '©' is
used
to state that data quality is affectedby
quality parameters.It is
represented as ®(£jjn(j2'^...'^(j„) that data quality is affected
by
quality parameters qi,q2'>
'^^^^n-Statement ®(i^,nij2n...nq^) iscalled a quality-merge statement,
and
is read as "the qualitymerge
of qi_ ^2.,
and q„." Simpler notation, ®(^i,(?2' •••''Jn)' '^ also used.A
quality-merge statement is said to beinstantiated, if all quality parameters in a quality-merge statement are instantiated to certain values.
For example, statement ®((7].=f,n ^2'~^2'^•••'^'?n'=^n'' '^
^"
instantiated quality-merge statement of®((jj ^2' •' lJ'^'^^
some
y,in V,, foralli=1, 2, ...,and
n.The
following defines alocaldominance
relationshipamong
quality parameters.Definition 1
(Dominance
relation): Let £jand
Ej betwo
conjunctions of quality-parameter valueassignments.£j issaid to
dominate
£2, denotedby
£; >i£2' ifa"d
only if®(£j
,£2,+)is reducible to®(£,,+),
where
"+" stands for the conjunction of value assignments for the rest of the quality parameters, inQP, which
areshown
neither inEj norin£2-Note
that as impliedby
"+," this definitionassumes
the context-insensitivity of reduction: (£,,£2,+)can be reduced to ®(£j,+), regardless of the values of the quality parameters, in
QP,
that are notinvolved in the reduction.
Moreover,
"+" implies that theseuninvolved
quality parameters inQP
remain
unaffectedby
the application of reduction. For example, consider a quality-merge statementwhich
consists of qualityparameters
Source-credibility, Interpretability, Timeliness,and
more.Suppose
thatwhen
Source-credibilityand
Timeliness are High,and
Interpretability isMedium,
Interpretability
dominates
theothertwo. Thisdominance
relationship can be represented asfollows: "Interpretability :=Medium
>jSource-Credibility :=High
nTimeliness := High. "Then, ©(Source-credibility := High, Interpretability :=
Medium,
Timeliness := High, +) is reducible toquality-merge statement ®(Interpretability :=
Medium,
+).As
mentioned
at the beginning of Section 2, the evaluation of the overall data quality for aparticular data element requires information about a set of quality parameters that play a role in
determining the overall quality,
QP
= (ijj (^2, ••, (?„),and
a setDR
of localdominance
relationshipsbetween
qualityparameters
inQP.
Information provided inQP
is interpretedby
DQR
as "theoverallquality is the resultof quality
merge
of quality parameters cj^,q^' ••'^nd
q„, i.e., ®(q^j^2---''ln^-" Localdominance
relationships inDR
are used toderivean overall data qualityvalue. Itmay
be unnecessaryor impossible,
however,
to explicitly state eachand
every plausible relationshipbetween
qualityparameters in
DR.
Assuming
incompleteness of preferences in quality parameter relationships, thispaper
approaches
the incompleteness issue with the following default assumption: Forany
two
conjunctions of quality parameters, ifno
informationon
dominance
relationshipsbetween
them
isavailable, then they are
assumed
to be in theindominance
relation.The indominance
relation isrepresented asfollows:
Definition 2
(Indominance
relation): Let £jand
£2 betwo
conjunctions of quality-parameter valueassignments.£j
and
£2 are said tobeintheindominance
relation,ifneither£, >^E2nor E^'^d^iWhen
two
conjunctions of qualityparametersareindominant,a dataconsumer
may
specifytheresultof22. Reasoning
Component
ofDQR
The
previoussubsection discussedhow
torepresent local relationshipsbetween
quality parameters.The
next question that arises is thenhow
to derive overall data qualityfrom
such localdominance
relationships, i.e.,
how
to evaluate a quality-merge statement basedon
such relationships. This task,simply referred toas the "data-quality-estimating problem," is
summarized
as follows:Data-Quality-Estimating Problem:
Let
DR
bea setof localdominance
relationshipsbetween
qualityparameters,q^ (jj, .,and
(j„.Compute
©((jj^2-''?n^'subject to localdominance
relationships inDR.
An
instance of the data-quality-estimatingproblem
is represented as a list of a quality-merge statementand
a correspondingsetoflocaldominance
relationships,i.e.,(©((^2^2•''?n^'
^^)-The
rest of this section presents aframework
for solving the data-quality-estimating problem, basedon
the notionof"reduction".The
followingaxiom
defines the data quality valuewhen
onlyone
quality parameter is involved in quality merge.Axiom
1 (Quality Merge): Forany
quality-merge statement ®(qj/f2'---''1n^' if m = 1, then the value of®{q^xi2-..,q„) isequal to that of q^.
Quality-merge statements with
more
thanone
quality parameter arereduced
toones
with a smallernumber
of quality parameters.The
following defineaxioms
which
provide a basis for the reduction.As
impliedby
Definition 1and
the default assumption,any
two
conjunctions ofquality-parameter value-assignments can be in either the
dominance
relation or in theindominance
relation.The
followingaxiom
specifiesthatany
two
conjunctionscannot be bothin thedominance
relationand
inthe
indominance
relation.Axiom
2(MutualExclusivity): Forany
two
conjunctions £jand
£3 of quality-parameter valueassignments,Ej
and
£2 are related to eachotherinexactlyone
of the followingways:l.£j>,E2 2.£2>,£,
3.£j
and
£2 are in theindominance
relation.The
followingaxiom
defines the precedence of thedominance
relation over theindominance
relation. This implies that while evaluating a quality-merge statement, quality parameters in the
dominance
relationareconsideredbefore thosenot inthedominance
relation.Axiom
3 (Precedence of>J:
The
dominance
relationtakesprecedence overtheindominance
relation.Reduction-Based
Evaluation:A
reduction-based evaluationscheme
isany
evaluationprocesswhere
thereduction operations take precedence over all other evaluation operations. Definition 1
and axiom
3allow the reduction-based evaluation strategy to be used to solve the data-quality-estimating
problem
forquality-mergestatements with
more
thanone
quality parameter.The
use ofdominance
relationships to reduce a quality-merge statement raises the issue ofwhich
localdominance
relationships to apply first, i.e., regarding theorder inwhich
localdominance
relationships are applied. Unfortunately, the reduction of a quality-merge statement is not
always
well-defined. In particular,a quality-mergestatementcan be reducedinmore
thanone way, depending
on
the order inwhich
the reduction is jjerformed. For example, consideran
instance of the data-quality-estimating problem,(®(<?i,<?2''?3'^4''?5''?6)' DR),
where
DR
consists of the following localThen, the
quahty-merge
statement ®('?j,^2''?3''?4''?5''?6^ '^^^ ^^reduced
tomore
thanone
irreduciblequality-merge statement.
The
followingshow some
ofthem:• Incasethat(jj >j (jj, CI4 >j'?s,
and
((Jjnij^) >/q3r>q^) are applied in that order,®(^i,(?2''?3''?4''?5''?6) isreducibleto®((j,,(j4), as follows:
=®('?i'^j'<?4''?5'^6)'
by
applying q^ >^q^=®((?j,<?j,(?4''?6)'
by
applying <j4 >_^(j5,=
®((?3,(?4),by
applying (^jn^j^) >d((f3n(jg).• Incasethatc]^ >^13'
<?s >d<?6'
and
((?2'^%) >/'?i'^^4) ^""^appHed
in thatorder,®('?i,(?2''?3''?4''?5''?6)'sreducibleto®((J2,(J5)/ as follows:
®((?M2'fl3''74''?S''?6)
=
®(<?i,^2''?4''?S''?6)'by
applying (j^ >^q3,=
®(<?2,(?2 '^^4'95)'by
applyingq^ >^q^,=
®<'?2''?5>'by
applying ((?2^'?5>>/l:^1i^-As
illustrated in this simple example, the reduction ofa quality-merge statement is notalways
well-defined.3.
First-Order
Data
Quality
Reasoner
This section investigatesa simpler data quality reasoner that guarantees thewell-defined reduction of
quality-merge statements,
by
making
certain simplifying assumptions.To
facilitate the next step of derivation,an
additional definition is introduced.Definition 3 (First-Order
Dominance
Relation):Forany two
conjunctionsE,and
Ej ofquality parameters such that £2and
E^ are in thedominance
relation, £jand
Ej are said to be in the first-orderdominance
relation,ifeachofEjand
E2 consists ofoneand
onlyone
qualityparameter.The
first-order data quality reasoner, in short calledDQRi
, is a data qualityjudgment
model
that satisfies the following:
First-order
Data
QualityReasoner
(DQRi
)
1.
Axioms
1,2,and
3 hold.2.
Only
indominance
and
first-orderdominance
relationshipsareallowed.3. <jis transitive (i.e., transitive
dominance
relation).In the first-order data quality reasoner, higher-order
dominance
relationships,such
as 'J2'^'?2'^'?3-*d'?4
orq^riq^ >d'?i'^'?2' ^^^ ^^^ allowed. In addition, the first-order data quality reasoner requires that the
dominance
relation be transitive. This implies that forany
conjunctions of quality-parameter valueassignments,Ei,E2,
and
D,
ifEj >jE2and
£2>jE^, then Ej >^ £3. Transitivity of thedominance
relationimplies the
need
foran
algorithm to verify that,when
presented with an instance of the quality-estimatingproblem
(®((j,,(j2, ..,(?„), DR),dominance
relationships inOR
do
not conflict with each other.Well-known graph
algorithmscan be used for performingthischeck (TH
Cormen,
Leiserson,&
Rivest, 1990).
Quality-merge
statements canbe
classified into groups, with respect to levels of thereducibility, as defined below.
Definition 4 (Irreducible
Quality-Merge
Statement): Forany
instantiated quality-merge statemente=®((j2.=i;2,^2'=^2'"''?n'='^n) such that n>2, for
some
r, in V^,*i = 1,2,...,and
n, e is said to beirreducible, if for
any
pair of quality-parameter value assignments in e, sayq,:=v^and
qj:=Vj, q,:=Viand
<1f=Vj are in theindominance
relation. Similarly,any
quality-merge statementwhich
consists ofone
and
onlyone
qualityparameterissaid tobeirreducible.Definition 5
(Completely-Reducible Quality-Merge
Statement): Forany
instantiated quality-merge statement e=®icji:=Vi,q2:=V2,-..,q„:=v„)such thatn>2, forsome
u,in V,, '^i = 1,2,...,and
n, e is said tobe completelyreducible, if for
any
pair ofquality-parameter value assignments in e, say (?,.=u,and
(j>=y,^i-^f,and
ff=Vj are in thedominance
relation.The
nexttwo
sub-sectionsdiscuss algorithms forevaluatinga qualitymerge
statementinDQRi
This process is
diagrammed
inFigure 1. AlgorithmQ-Merge
is the top level algorithmwhich
receives as input a quality-merge statementand
the corresponding quality parameter relationship set. WithinAlgorithm
Q-Merge,
there is atwo
stage process. First, the given quality-merge statement isinstantiated accordingly. It then calls Algorithm Q-Reduction toreduce the
QMS
intoits correspondingirreducibleform.
QualityMergeStatement E.G.©dnterpretability,
Timelmess,
Forexpositorypurposes,
suppose
thate=®(qj:=Vj,q2:=V2,---,'^„:=v„), forsome
v, in V,,for all i=1,2,... ,
and
n,and
letQ
be adynamic
set of quality-parameter value assignments,which
is initializedto [q^:=v^,q2:=V2,---,q„:=v„]. For
any
pair ofquality-parameter value assignmentsq,:=Vjand
^/•'^^i ^^ ^'if(?,.=f, >j'?;'=^i isa
memb)er
ofDR,
then eis reducible to a quality-merge statement with the quality
parametersin il less'?,'=^,>
by
Definition 1. This allowsremoving
(if=Vjfrom
ii,ifbothq,:=v,and
'?,'=f,areelementsinQ.. Continue theprocess of
removing dominated
qualityparameters, untilno
pairofthequality parameters in
Q
arerelated in thedominance
relation. Let Q' denotethe modifiedD
produced
at the
end
of thisremoval
process.The
qualitymerge
of the qualityparameters
inQ'
is the corresp>onding irreduciblequality-merge statementofe,and
thealgorithmreturns®(Q'). Itisprovenin(Jang
&
Wang,
1991)that AlgorithmQ-Reductionshown
inFigure 2alwaysresults in aunique outputinthefirst-order data-quality reasoner, in thatall
dominance
relationsmust
be first-order.3.2.
Algorithm
Q-Merge
When
presented withan
instance of the quality-estimatingproblem
{®{qj,q2,...,q„),DR)
forsome
integer n,Algorithm
Q-Merge
first instantiates the given quality-merge statement, accordingly.The
instantiated quality-merge statement is then reduced until the reduction processresults in anotherinstantiated quality-merge statement
which
cannot be reducedany
further (using Q-Reduction). Thisraises the issue of
how
to evaluatean irreducible quality-merge statement.Unfortunately, the evaluation of
an
irreducible quality-merge statement is notalways
well-defined.When
evaluatingan
irreducible quality-merge statement, thenumber
oforders inwhich
the qualitymerge
operation can beappliedgrows
exponentially with thenumber
of quality parameters inthestatement. In particular, certain quality-merge statements
may
bemerged
inmore
thanone way,
depending on
theorderinwhich
themerge
isperformed. Itispossiblethat thissetmight
includeevery elementof V-s. This paperevades thisproblem by
presenting quality-parameter value assignments in theirreducible quality-merge statementreturnedby
AlgorithmQ-Reduction
so thata usermay
usethisinformationpresented,according to hisorher needs. Figure3
summarizes
AlgorithmQ-Merge.
Algorithm
Q-Merge
Input: (e, DR),where
e =®((jj,(j2''<?„)/ forsome
integer nand
DR
is a setof localdominance
relationshipsbetween
qualityparametersqj,q2,—,and
q„.Output: Overall data quality value
produced
by
evaluatinge.1. Instantiate e.
;;Supposethat q^^^,...,andq^ are instantiated as v^,v^,...,and r^,respectively, forsome ninV^
y,forall1=1,2,...,andn. 2. e' <- IF(n=l)
THEN
eELSE
Q-Reduction(®((?j.-y,,(?2.=P2 q„-=v„) ,DR)
3. Present quality-valueassignments inef .
Figure3: Algorithm Q-Merge
4.
Discussion
We
have
presented aknowledge-based
framework
for data qualityjudgment
that: (1) allowsspecifying local
dominance
relationshipsbetween
quality parameters, (2)performs
the reduction ofquality-merge statements,
and
(3) derives a value for overall data quality.A
knowledge-based
approach
was
applied to data quality judgment, to provide significant flexibilityadvantages
inrepresenting data-consumer-specificrequirements
on
data,and
thereby tailoring data qualityjudgment
todata consumers'needs.
In addition, our analysis has identified issues that
must
be addressed in order for the qualityjudgment
model
presented in this paper to be of practical use.The
rest of this section considers thelimitationsof the
approach
explored in thispaper,and
suggests future directions for the field ofdata qualityjudgment.Higher-order data quality reasoner:
The problem
associated with the reduction of quality-mergestatements
was
discussed inSection 2.2.The
first-orderdata qualityreasonerevades theproblem
of ill-defined reductionby
prohibiting higher order relationships. Real-world problems,however,
often involvemore complex
relationships than first-order relationshipsbetween
qualityparameters. In order to deal with higher-order relationships, both the representational
and
algorithmiccomponents
ofthefirst-orderdata qualityreasonerwould
need
tobe extended.Data
AcquisitionyA hierarchy of quality indicatorsand
parameters: This researchassumed
thatvalues of quality parameters are available so that quality-merge statements can be instantiated
propterly. Issuesof
how
torepresentand
how
tostreamlinesuchvalues tothedata quality reasoner,however,
must
beaddressed.One
approachto theseissueswould
betoorganizequalityparametersand
quality indicators in a hierarchy. Then, to each dataelement
or type can be attached information abouthow
tocompute
a value ofa quality parameter.Such
a hierarchywould
allowthe derivation of a quality parameter value
from
its underlying quality parametersand
quality indicators.A
tool for automatically constructingsuch
a hierarchyand computing
quality-parameter values
would
enhance
the utility ofthe data quality reasoner.Knowledge
Acquisition:The
capability of using localdominance
relationships,which
are typicallyuser- or application-specific, allows us to build systems
more
adaptable to customers' needs.As
application
domains
are complex,however,
itbecomes
increasingly difficult to state all the relationships thatmust
be known. Such
knowledge
acquisition bottlenecks couldbe
alleviatedthrough
development
of acomputer
program
for guiding the process of acquiring relationshipsbetween
quality parameters.User
interface for cooperativeproblem
solving:As
mentioned
in Section 3, the evaluation ofan
irreducible quality-merge statement is not well-defined. Different orders inwhich
qualityparameters in
an
irreducible quality-merge statementare evaluatedmay
result in different values.This research dealt with the
need
to evaluate an irreducible quality-merge statementby
simply presenting informationon
qualityparameters
inan
irreducible quality-merge statement.Development
of a user interfacewhich
allows evaluating irreducible quality-merge cooperativelywithadata
consumer
couldlessentheproblem.In the
continuous
cycle ofmeasurement,
analysis,and
improvement
for data qualitymanagement,
it is crucial that amethodology
bedeveloped
for judging data quality. In particular,while each individual data supplier
may
maintain integrityand
consistency of itsown
data, suchlocal integrity
and
consistencydo
not necessarily guarantee that datafrom
different suppliersdisplay thesame
levelofquality.The development
of a systemthat canassistdataconsumers
injudging ifdatameets
their requirements is important, particularlywhen
decision-making involves datafrom
different,foreign sources.
The model
presented in thispaper providesa firststeptoward
sucha system.Acknowledgments
The
authorswould
like to thank Peter Szolovitsand
StuartMadnick
for their support,members
ofComposite
InformationSystems
Laboratoryfor theircomments,
and
Alexander Ishii forhisnumerous
References
[I]
Bonoma,
T. V.(1985).Case
researchinmarketing: opportunities, problems,and
a process.Journal ofMarketing Research, 22, pp. 199-208.
[2]
Burnham,
D. (1985). FBI Says 12,000 Faulty Reports on Suspects are Issued Each Day.NY.
[3]
Chankong,
V.&
Haimes,
Y. Y. (1983). Multiobjective Decision Making: Theory and Methodology.New
York, N.Y.: Elsevier Science.[4] Jang, Y.
&
Wang,
Y. R. (1991). Data Quality Calculus:A
data-consumer-based approach todelivering quality data. (CISL-91-08)
Composite
InformationSystems
Laboratory, Sloan School ofManagement,
MassachusettsInstituteofTechnology,Cambridge,
MA,
02139November
1991.[5] Johnson, J. R. (1990). Hallmark's
Formula
ForQuality. Datamation, , pp. 119-122.[6] Keeney, R. L.
&
Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and ValueTradeoffs .
New
York: JohnWiley
&
Son.[7]
Laudon,
K. C. (1986).DataQualityand
Due
Processin LargeInterorganizational Record Systems. Communications of theACM,
2£(1), pp. 4-11.[8]
T
H
Cormen,
T. H.,Leiserson, C. E.,&
Rivest, R. L.(1990). Introduction to Algorithms .Cambridge,
MA,
USA:
MIT
Press.[9]
Wang,
R. Y.&
Kon,
H. B. (1992).Toward
Quality Data:An
Attributes-based Approach to DataQuality. (CISL-92-04) June 1992.
[10]
Wang,
Y. R.&
Guarrascio, L.M.
(1991). Dimensions of Data Quality: Beyond Accuracy.(CISL-91-06)
Composite
InformationSystems
Laboratory,Sloan School ofManagement,
MassachusettsInstituteofTechnology,
Cambridge,
MA,
02139 June1991.[II]
Wellman, M.
P. (1990). Formulation of Tradeoffs in Planning Under Uncertainty .Pitman
and
Morgan
Kaufmann.
[12]
Wellman,
M.
P.&
Doyle,J. (1991). Preferential Semantics for Goals.The
Proceedings of the 9thNationalConference
on
Artificial Intelligence, 1991. pp.^
4
IU
8
Date
Due
MM'
8
1 I8S.«lU
MIT LIBRARIES