/
.^ ss !I
V',^,
\
\
y
M414
V_^
"MIT LIBRARIES
39080 00932 7294
Context
Interchange:
A
Lattice
Based
Approach
M.P.
Reddy
Amar
GuptaWP#3780
August 1994PROFIT
#94-19Productivity
From
InformationTechnology
"PROHT"
Research InitiativeSloan School of
Management
Massachusetts Instituteof
Technology
Cambridge,
MA
02139
USA
(617)253-8584 Fax: (617)258-7579
Copyright MassachusettsInstituteof
Technology
1994.The
researchdescribed herein has beensupported (inwhole
orinpart)by
theProductivityFrom
InformationTechnology
(PROFIT)
Research Initiative atMIT.
Thiscopy
isfor theexclusive useofProductivity
From
Information
Technology
(PROFIT)
The
Productivity
From
Information
Technology
(PROFIT)
Initiativewas
established
on
October
23,1992
by MIT
President Charles Vest
and
Provost
Mark
Wrighton
"tostudy
the
use
of
information technology
inboth
the private
and
public
sectorsand
to
enhance
productivity
inareas
ranging
from
finance
totransportation,
and from
manufacturing
totelecommunications."
At
the
time
of
itsinception,
PROFIT
took
over
the
Composite
Information
Systems
Laboratory
and
Handwritten
Character
Recognition
Laboratory.
These two
laboratories are
now
involved
inresearch
re-lated
tocontext
mediation
and
imaging
respectively.In
addition,
PROFFT
has
undertaken
joint effortswith
anumber
of
research
centers, laboratories,and programs
atMIT,
and
the
resultsof these
effortsare
documented
in
Discussion Papers published
by
PROFIT and/or
the collaborating
MFF
entity.Correspondence can be addressed
to:The
"PROFFF"
InitiativeRoom
E5
3-310,Mrr
50
Memorial
Drive
Cambridge,
MA
02142-1247
Tel:(617)
253-8584
Fax:(617)
258-7579
E-Mail:[email protected]
wlASSACHUSETTSINSTlTUTt OF TECHNOLOGYMAY
2 6 1995 LIBRARIESContext
Interchange:
A
Lattice
Based
Approach
M.
P.
Reddy* and
A.
Gupta
Massachusetts
Institute of
Technology
Cambridge,
MA
02139
Abstract
The
level of semantic data interoperability between a sourceand
a receiver is afunction of the context interchamge mech£inism that operates between the source
and
the receiver.
The
senicintic interoperabilitymechanisms
inexisting systemsare usuallystaticin nature
and
cjinnot cope up with changes in the semcintics ofdata eitherat the .source or at the receiver. In this paper,
we
propose a context interchcinge mechanism, based on lattice theory which can handle changes in the semantics ofdata at both thesource cind the receiver.
A
site-copy selection algorithmis also presented in this paper which selects the set of sources that can supply semantically mecmingful data to thequery ofa particular source.
Keywords:
semantic-interoperability, context-interchange, lattice theory,and
query-processing.1
Introduction
It
would
notbe an
exaggeration to claim that data required forany
application are availableat
some
data-source in the world.Data
in such data-sourcesmight
have
different contextthan
the application context. Ifit is possible to convert the contextofdata
in thedata-source to the context ofan
application, then it is less expensive to reuse these datathan
collectingthe required data for
an
applicationfrom
the scratch.Research on
datacommunications
has* Dr.
Reddy
isnow
withKenan
Systems
Corporation,One
Main
Street,Cambridge,
MA
02142-1517;
e-mail:[email protected]
concentrated
on
the physical obstacles to the reliable transfer of databetween
a sourceand
a receiver, but rarely
on
issues such as themismatch
between
the context of data-sourceand
the context of the data-receiver.
What
exactly constitutes the context is difficult toanswer
[Lyo81].The
concept ofcontext has
been
addressed inmany
areas such as sensory process, perception, language, concept learning, reccJland
recognition [Bur52,Coe77,
Tho88].The
main
reason for the contextassuming
a central role in these areas is that objectsand
their associated events constitutean
integral part oftheirenvironment and
cannotbe understood
in isolation ofthatenvironment.
In thispaper
we
do
notattempt
to give precise definition for this term,even
though
this is part of our longterm
research objective.We
assume
that contextknowledge
of a
data
item
is a triple givenby
the semanticknowledge
of the data, the organization ofthe data,
and
the qualityparameters
ofthe data. In this paper,we
concentrate onlyon
thesemantic
component
of the context,which
is formedly defined in Section 3.Consider the process
by which
a financial analyst accesses the prices for shares of a particularcompany.
He
or she needs to gather informationfrom
severed stockexchanges
located in different nations
and
must
overcome
semantic discrepencies at multiplelevels: the stock prices are stated in different currencies, the currencies are floating with respect to each other; the stock pricemay
be
the latest-price or the closing-price; etc.Such
semantics areimplicit in
many
existing databases. Unless these semantics aremade
explicit, it is difficultto identify
and
resolve underlyingsemamtic
incompatibilities.The
fundamental
question ishow
tomake
such
semantics explicitand
how
to quickly identify the incompatibilitiesand
resolve
them
ifpossible.A
number
of researchers [LR82,Tem89,
DAT87,
Ke88,
RJPS89, BT84]
have proposed
solutions that
aim
to identifysemantic
incompatibilities during the process ofschema
inte-gration. In this scenario, every application defines its
own
viewson
the integratedschema
and
these views are used to translate the semantics ofthe integratedschema
into application semantics. In this approach, the semsmtics ofdata in a database are first tratnslated into the semantics of the integratedschema and
then translated into the semantics of thebeing integrated. In [SM91], a rule-based
approach
was
proposed inwhich
these semanticdifferences are dynamically identified
and
resolved using the context information associated with the data items requiredby
an application. In this approach, the semantics of data in adatabase can
be
directly translated into the semantics of the application. In this paper,we
extend
thismodel
with a lattice-based representation for the context knowledge.We
believe that the lattice representation ismore
natural for representing the contextknowledge and
for cross comparison.
Our
context interchange scenario isshown
in Figure 1. This scenario consists of a setof applications, a set of data sources,
and
a context mediator.Each
application or data-source is associated with aContext
Data
Repository(CDR),
which
explicitly specifies thecontext of each concept relevant to it
from
its point of view.Whenever
an application requests for data, the contextmediator
ensures that the application receives semantically meaningful datafrom
variousdata
sources (ifit is possible).As
such, the context mediatormust
possess the capability to reason with different contexts (i.e., with differentCDRs)
simultaneously.
The
representationscheme
of context ineach
CDR
can enable the contextmediator
to simultaneously reason with differentCDRs.
Thisimposes
aneed
for systematic description of contextknowledge
of application requirementsand
the contextknowledge
of data-sources;
we
have adopted
the latticemodel
for such representation basedon
the following considerations:•
A
databzise can supply meaningful datatoan
application providedthe database contextis
more
general than the application context.The
more
general than relation can beelegantly
modeled
using the Lattice model. In a lattice, the contextbecomes
more
specific as
one
moves
upward
and
more
generalized asone
moves downward.
•
Context knowledge
canbe
economically represented in theform
of a lattice as theknowledge
present at a pcirticular context,which can be
sharedby
all contextswhich
are
more
specificthan
that context.•
Reasoning
with contextknowledge
in theform
of a lattice iseconomical
because angeneralized context
than
the application,and
themore
general than relation is easy totrace in a lattice.
Within
the latticebased
context interchangeframework,
conditionsunder
which
a data sourcecan
supply semanticaJlymeaningful
datato addressan
application's data requirements are derived in this paper.We
have proposed
a context driven site-copy selection algorithm,which
selects various candidate data-sources that can supplymeaningful
data to queriesinitiated
by
different applications.This
paper
is organized as follows: Section 2 presents a brief overview of representationschemes
for the contextknowledge.
The
approach
selected in thispaper
for representing contextknowledge
is presented in Section 3. In Section 4, amechanism
to describe contextdata
repository for databeisesand
their applications is discussed.A
procedure for context mediation is presented in Section 5.Query
processingand
associated issues are discussed inSection 6.
The
last section contains our conclusions.2
Lattice-based
Representation
of
Context
The
concept of context hasbeen
treatedfrom
different perspectiveby
various researchers. Forexample,
theTruth
Maintenance
System
(TMS)
[Doy89] <ind theAssumption-based
Truth Maintenance
System
(ATMS)[dK86a]
have adopted
different representationschemes
for representing the notion of context. In
TMS,
each
datum
is labeled eitherIN
orOUT,
as
determined
by
the givenboundary
condition.The
notion of context is implicit in theboundary
conditionsand
aU
thedata
itemswhich
are believed tobe
true in that context are labelledIN and
all thedata
itemswhich
are not believed tobe
true in that context are labelledOUT.
The
contextchanges
onlywhen
one
of the boundairy conditions changes.As
such,TMS
maintains onlyone
context (global context) atany
given point of time. In contrast,ATMS
provides all the contexts inwhich
thedata item
are believedand
can represent multiple contexts simultaneously.The
computational advantages
ofATMS
are discussed in[McD83, dK86a, dK86b,
dK86c].The
context interchangeproblem
issomewhat
contexts simultaneously.
2.1
Notation
and
Definitions
Let
X
and
Y
represent the contextknowledge about
a conceptfrom two
different viewpoints. Five possible relations can existamong
X
and
Y, as follows:(i)
X
=
Y
;X
and
Y
are in thesame
context.(ii)
X
C
Y
;X
is amore
generalized contextthan Y.
(iii)
X
D
Y
;X
is amore
specific contextthan
Y.(iv)
X
n
Y
7^and
(i), (ii), or (iii) are not satisfied;X
and
Y
possesssome
contextin
common.
(v)
X
n
Y
=
;X
and
Y
are totally disjoint contexts.Two
contexts arecomparable
ifeither (i), (ii) or (iii) is satisfied. Thereforeone
can define only partial orderamong
the set ofcontexts using the relationC
and form
a context-lattice.A
lattice is a partially ordered set, withX
C
Y
meaning
XflY
—
X
and
XuY=Y,
inwhich
each pair ofelements possesses a greatest lower
bound
and
a leastupper
bound
within the set. IfX
C
Y
thenone
says thatX
is as general context as contextY
[Sho91].A
context alone is not interesting; it is interesting onlywhen
it is linked with all asser-tionswhich
are true in that context.Contexts can be viewed
as envelopes enclosingsome
assertions. If
an
assertionA
is true in the contextX
then this information is represented asA^
. This assertionmay
be
true inone
context <ind untrue in another context. IfA^
is validand
ify
C
X
then
A^
is also vjJid. In other words, ifan
assertionA
is true in a particular context, sayX,
then it is true in all contextswhich
aremore
generalthan
X. This is thebasis for the context interchjinge
mechanism
presented in this paper.3
Semantic
Assertions
and
Context-Knowledge
Each
data-source is visualized as a set ofdistinct object typesand
each object isan
aggrega-tion ofa set ofproperties.
We
categorize properties intotwo
clttsses: primitive propertiesand
appUca-tions
and
data sources,whereas
the semantics of a non-primitive propertymay
be differentfor different applications
and
at different data sources.The
semantics of a non-primitive property are captured through a context-assertion lattice.The
context-assertiondomain
ofa property is the set ofcontexts relevant to the propertyand
aset of assertionswhich
may
be
true in each context.The
context-assertiondomain
ofaproperty is given
by
the context-assertion lattice of the property. This lattice is constructedfrom
the context latticeand
the semantic-assertionsdomain.
The
context lattice, these-mantic
assertiondomain, and
the context assertion lattice are described in the following subsections.3.1
Context-Lattice
As
mentioned
earlier, the semantics ofany
property cannotbe understood
in isolation ofthe
environment/context
inwhich
it is intended tobe
used. In other words, theenvi-ronment/context
associated with a property functionally determines the semantics of the property.As
observed in [SSR92], theenvironment
of a property canbe
occasioncdlyde-termined
by
other properties of the object.These
properties are collectively referred to asthe
environment
schema
of the non-primitive property.The
environment
schema
needs tobe
taggedwhenever
a non-primitive property ismoved
from one environment
to another.The
set of all possible environments/contexts inwhich
the property is defined is called the contextdomain
of the property.The
context domziin of a property is representedby
thecontext lattice.
Let
Environment(P)
denote theenvironment
scheme
ofa non-primitive property P. LetEj be
a property inEnvironment(P).
LetDom{Ej)
be
thedomain
or the set of legalval-ues associated with Ej.
The
assignment ofeach
Cj€ Dom(f^j)
to the property Ej sets adifferent context for the non-primitive property P. For
example,
letDom(Instrument-type)
be
{ Equity,Future
}and
Dom(Exchange)
be
{Nyse,Tokyo
}.Assignment
of Equity toInstrument-type
sets a particulaj context to property Trade-price,whereas
the assignment ofFuture
toInstrument-type
sets another context to the property Trade-price. Different contexts associated with the property Trade-price in the excimple areshown
in Table-1.
|A:»;€,Qi
fA3lCi.p^
T>'f^*i.
{AX}
{A.{}
In
some
context, say 'X', themeta-property
MP, may
have
ameta-valueMVi;
in another context,itmay
have
a meta-valueMV2
and
so on. Forexample,
the meta-value of currencymay
be
Dollars inone
contextand Yens
in another context. This implies thattwo
semantic assertions are possible for themeta-property
Currency. Similarly, ifone assumes
thatMeta-Values(Status) is given
by
{ 'Latest-price', 'Closing-price' }, then semantic assertions for 'Trade-price' willbe
asshown
in Table-2.3.3
Context-Assertion
Lattice
of
a
Non-primitive Property
The
context-assertion lattice of a property is generatedby
the cross-product of thecontext-lattice
and
the semantic assertiondomain.
Therefore, the context-assertion lattice couples thecontextsofaproperty with its associated assertions.As an example,
the context-assertionlattice ofTrade-price is as
shown
in Figure 3.A
node
n, in the context-assertionlatticeis apairdenoted
by
(i,,y,).The
first co-ordinate projection (denotedby PJ\)
is called the context co-ordinateand
the second co-ordinate projection (denotedby PJ2)
is Ccilled the semantic co-ordinate.These
two
co-ordinates are defined as follows:PJi{n,) -^ Xi
PJ2{n,) -^ yi
The
relationsC
and
D
among
two nodes
rigand
nj in the context-lattice are defined as follows:"a
Q
"d providedPJ\{na)
C
PJi{nd) and PJ2("a)
=
PJiind)-ria
D
n<J providedPJ\{na)
D
PJ\{nd) and PJzina)
—
•P./2(^d)-Given
any two
nodes, Uaand
n^in the context-assertionlattice,one
can define threerelations as follows:Total
context-lift:The
context ofnode
n^can be
totally lifted to the context ofnode
rio,provided 714
C
ng.Partial
context-lift:The
context ofnode
n^ canbe
partially lifted to the context of anode
rio,whenever
n^
D
no.No
context-lift:The
context ofnode
n^ cannotbe
lifted to anothernode
Ua, ifthe contextof
nj
cajinotbe
either totally or partially lifted to the context of rig.Node
Equivalence:
Given
anode nj
in the context-assertion lattice,we
define a node-equivalence classdenoted
by
[nj] as follows:[nj]
=
{ TioI
rio is in the context- assertion lattice
amd PJi(nj)
=
PJi{na) and
PJ2{'"-d)is convertible to
PJ\{na)
}If the context of
any
node
UkG
[ nj] canbe
lifted to the context ofn^ then everynode
^/
6
[ n<i ] can be lifted to the context of n^.The
contextand
its associated semanticassertions ofany
instance ofP
can be givenby
one of thenodes
of the context-assertion lattice.Our
model
requires a context-assertion lattice for each non-primitive property that is being shared (interoperable)among
applicationsand
data-sources.
The
use of context-assertion lattice in describing the contextknowledge
of adata-source or
an
application is discussed in the next section.4
Context
Definition
Repository
In the
proposed
context interchange architecture, each data-source or application needsto
have
aContext
DefinitionRepository(CDR). Semantic
assertionsand
their associated contexts foreach
and
every non-primitive property of adata
source collectively constitute theCDR
for the data-source. Similar definition holds foran
application'sCDR.
Consider a database, dbl,
which
supplies the instance values of the propertyTrade-price.
Assume
that the instances of Trade-price are defined in three different contexts,namely
Cl, C2,and
C3.The
contextCI
is definedby
Instrument-type=
'Equity'and
Exchange
=
'Nyse', the contextC2
is definedby
Exchange
=
'Tokyo',and
the contextC3
is definedby Instrument-type
=
'Future'. Let the semantic assertionsStatus=
'latest-price'and
Currency
=
'DoUars' are true in context Cl,and
Status=
'Closing-price'and Currency
=
'Yens' are true in C2,and
finally Status=
'Closing-price'and
Currency=
'Dollars' are true in context C3.These
contexts Cl, C2,C3
and
their respectivesemantic
assertions canbe
mapped
tonodes
n21, n20,and
n9
respectively in the context assertion lattice ofproperty Trade-price.The
collection of all contextsand
semantic assertionswhich
are true in these contexts for a non-primitive property in a particular data-sourceforms
the characterizingenvironment
for the non-primitive property at that particular data-source.Now
one
can define the characterizingenvironment
of the Trade-price in the data-sourcedbl
cis:7f,U-pr.c.
=
{"21, n20,n9}
As
such, the contextdata
repository of a data-sourcecan
be
describedby
defining the•o 3 ••
rj
(0 u^ eg
(9 «M .y
l<
if
es
"
^
I
at(9 (ME
Sif N.O
,'~|
10z
«2
^i?r
/"i
^-J
-Mas
g e t: u —83
513
iis
ri
1
r 4) c4V
(4a
o
V
OS CO ca •>X
u
•» C3O
«
characterizing
environments
for eachand
every non-primitive property that can be supplied.The
collection of such characterizing environments,one
for each non-primitive property relevant to a data-source forms theContext
Definition Repository(CDR)
for that data-source. Similarly, the characterizingenvironments
for eachand
every non-primitive property required for the application constitute theContext
Definition Repository(CDR)
for that application.5
Context
Comparison
One
aim
of contextcomparison
is to ensure that applications receive only meaningful datafrom
data-sources.To
accomplish this, thequery
processormust
analyze the query, identifyall the relevant
nodes
in theCDR
of the application,compare
thesenodes
with the nodesin the
CDRs'
ofdata
sources,and
identify potential sourceswhich
can supply meaningful data to the application.Checking
for these relations at run-time, that is during the queryevaluation time implies
more
burden
on
thequery
processor. Therefore, the contextcompar-ison task is ideally delegated to context-mediator.
The
context-mediator canmake
certaincomparisons
inadvance
and
the query-processor can utilize the results of thesecomparisons
at
query
evaluation time.In this scenario, the goal ofcontext-mediator can
be
conceptually outlined.The
context-mediator
is provided with a set ofCDRs
of databasesand
applications.The
task of the context-mediator
is to efficientlydetermine
the subset of databaseswhose
context canbe
lifted to
an
application's context. Efficiencycan be
achievedby
takingadvantage
oftwo
important
factors. First, the context-mediatorcan be
supplied withchanges
in theCDR
of the relevant application or database, allowing the context-mediation process to be incre-mentttl, with
updates
for thechamged
contexts only. Second, the query-processor usuadly needs toknow
onlywhich
data-sourcecan
supplymeaningful
data toan
application's query,and
only rarely the contents ofCDR
of a data-source.The
contextmediator
can thereforebe
constructed asan
intermediatedata
structurewhich can
make
very rapidcomparisons
of contexts. This
data
structuremust
also help the context-mediator in identifyingand
correcting the pre-compiled comparisons,
whenever
changes aremade
to the existingCDRs.
The
context-mediator associates everynode
in the context-assertion lattice withtwo
lists:a Source-list
and
aConsumer-list.
A
Source-list states all the databaseswhich
contain thesame
node
in their respectiveCDRs.
An
consumer-list states all the applicationswhich
are having the
same
node
in their respectiveCDRs.
These
lists are thesame
for allequivalent nodes in the context-assertion lattice.To
demonstrate
context comparison, consider three applications apl, ap2,and
ap3,whose
data requirements
span
over data-sources dbl, db2,and
db2.Assume
that applications apl, ap2,and ap3
require the instances of the non-primitive property 'Trade-price' in differentcontexts
and
furtherthat 'Trade-price'is available in dbl, db2,and
db3
in different contexts.The
semantics of 'Trade-price' in different applicationsand
in different databases are givenby
characterizingenvironments
in their respectiveCDRs.
7f.U-p,.«
=
{"21, n20,n9}
7f.l<^-p,.„
-
{n20,n22}
iTride-prrce
=
("21,n32}
7rrL-pr.ce
=
{"21, 7x33} 7rr!de-pr.«=
{"13}
Consider the context-zissertion lattice for 'Trade-price'
shown
in Figure 3. Using theabove
given characterizingenvironments
of 'Trade-price', source-listand
consumer-list foreach
node
in the context- assertion latticecan be
constructed.As
an example,
considernode
n21 in the context-assertion lattice of 'Trade-price'.
The
elementdbl
is entered into theSource-list
and
elements apland ap2
are entered into the Consumef-list of thenode
n21.The
same
source-listand
consumer-list will be attached to adlnodes which
are eqiiivalentto n21, i.e.,
nodes
like n23. In Figure 3, for each node,an arrow
that is leaving thenode
with
broken
lineshows
the source-listand
the consumer-list associated with the node. Forsimplicity, only
non-empty
sourceand consumer
lists areshown
in this figure.With
the help of Source-listand
Consumer-list, the contextmediator
can identify poten-tial sourceswhich can
provide semanticsdly meaningful solution toam
application's query.The
contextmediator
willbecome
aperformance
bottleneck if every application needs toconsult the context
mediator
for each of its queries.To
avoid thisperformance
bottleneck, the context-mediator is envisaged to distribute the requiredknowledge
of context mediationamong CDRs'
of applications.Every node
in theCDR
of an application isaugmented
withtwo
disjoint Usts,namely
Total-Suppliers-Listand
Particd-Supphers-List.The
derivation ofTotal-Suppliers-List
and
Partial-Suppliers-List is discussed in the following paragraphs.Total-Suppliers-lists:
The
total-supplier-list contains all the data-source nodeswhose
con-text canbe
totally lifted to the context ofan
application node.The
union of Source-listsassociated with each
node
present ineach
chainand
reaching a particular applicationnode
in the context assertion lattice forms the Total-suppliers-list for the application node.
These
chains aJso include the application
node
Partial-Suppliers-list:
The
Pajtial-Supplier-List gives all the data-sourcenodes which
can only
be
particdly lifted to the application node.The
union of Source-lists ofeachnode
present in
each
chain leaving the applicationnode
in the context assertion lattice forms the Partial-Support-List for the application node.These
chains exclude the application node.Consider
node n21
in theCDR
of application apl.There
aretwo
chcdns reaching thisnode
in the context assertion lattice,namely
{n21, nl3, nl,nO}
and
{n21, n5, nl, nO}.The
union ofsource-lists ofjdl
nodes
inthese chains is {dbl
}. Thereforethe Total-Suppliers-Listofn21 of application
apl
willbe
{dbl
}. Since there aieno
chains leaving n21, thePartial-Support-List of
n2I
in theCDR
of apl willbe empty.
Similarly, fornode
n32, there aretwo
chains reaching this node,
namely
{ n32, n20, n4,nO
}. Therefore the Total-Support-List ofn32
willbe
{ dbl,db2
},and
since there areno
chains leaving n21, the Partial-Suppliers-List willbe
empty.
In this scenario,
an
applicationcan
determine
the list of potential sourceswhich
can supply semanticallymeaningful
solution to its query.iT'r'a^-^.c.
=
{{n2l{dbl}{}),
{n22{dbl,db2}{})}
Ifthe application needs
data
related to 'Trade-price' in the context n21 then it will selectdbl,
however,
if the 'Trade-price'data needed
tire in the contextn32
then eitherdbl
ordb2
is selected.Similarly,
one
can write the characterizingenvironments
of Trade-price in apiand
ap2 with Total-Suppliers-listsand
Partial-Suppliers-lists.7t^:..-p..„
=
{{n22{dbl}{}), (n34{ dbl}{ })}{db\} frrl.-^,..=
{{nlH}{dbl,db3})}{}
From
theabove
characterizing environments, applicationsap2 and
ap3 can select data-sourceswhich
are compatible to the required context of the property 'Trade-price'.The
issue that still needs tobe
addressed is the process of ensuring the correctness ofTotal-Suppliers-list
and
Partial-Suppliers-lists that are distributedamong
CDRs
of several applications, in the eventofchangesin semantics either at data source levelor at application level. This issue is addressed in the next subsection.5.1
Integrity
of
Semantic
Mappings
Semantics
of a non-primitive property canchange
either atan
application or at a data source.A
mechanism
tocope
with changes in the semantics of a non-primitive property in theproposed
context interchange architecture is discussed in the following paragraphs.Changes
inApplication
Data
Semantic:
Whenever
the semantics of anon
primitive property inan
application chsinge, thesechanges
must
be reported to the contextmedia-tor.
The
contextmediator
updates consumer-lists of affected nodes in the context assertionlattice
and
using this context assertion lattice generates anew
list of Total-Suppliers-listsand
Partial-Suppliers-lists for allnew
nodes
in the characterizingenvironment
of the non-primitive property inCDR
of the application.The
application should not request for data related to the non-primitive property until the contextmediator
returns anew
set ofToted-Suppliers-lists
and
Partial-Suppliers-lists to the set ofnew
nodes
entered intothe applicationCDR.
Changes
inData
Source's
Data
Semantics:
Whenever
the semantics of anon
primitive property at adata source chsmge, suchchsmges
willbe
reported to the context mediator.The
contextmediator updates
source-lists ofnodes
in the context assertion lattice correspondingto the old
and
new
contexts of the non-primitive property. Concurrently, the context medi-ator identifies the set of affected applications using consumer-lists attached tonodes
in thecontext assertion lattice, evaluates
new
Total-Suppliers-listsand
Partial-Suppliers-lists,and
updates the
CDR
of each affected application. If a data-source changes the semantics of a non-primitive property, then it should not allowany
application to access this property untilthe context
mediator
resolves all semanticmismatches by
updating theCDRs
of all affectedapplications.
In the
above
discussion, amechanism
forsemanticinteroperabilitybetween an
applicationand
a data-source with respect to a single non-primitive property is discussed. In fact,an
application
query
may
consist ofone
ormore
non-primitive propertiesand
may
have
some
primitive properties. In such a situation, identification of data-sources
which
can supply a semanticallymeaningful
solution toan
applicationquery
in a cost-effectivemanner
is a challenging ta^k. In the following section,we
proposed an
algorithm to identify a set ofdata-sources
which can
supply semanticallymeaningful
data toan
application query.6
Query
Processing
An
application generates aquery based
on
itsdata
requirements.Each
query
consists oftwo
parts: thetarget part
and
thequalification part.An
applicationquery
canbe
interpreted as a request for the instances ofthe target properties within the context givenby
the qualificationof the query.
The
semantics of the properties referred to in thequery
canbe determined
by
theCDR
of the qualification of the query. Therefore,one
needs to identify semantic assertionswhich
are true in the context givenby
the qualification part of thequery
and
theCDR
of the application.The
context of a non-primitive property in thequery
canbe
defined only if thequery
is well-formed at the application.We
define aquery
tobe
well-formed with respect to a non-primitive property atan
application, if the semantics of the non-primitive property inthe
query
is derivablefrom
the qualification of thequery
and
theCDR
of the application.In other words, the quaJification of the
query
must
be complete
enough
todetermine
thesemantic
assertions of anon
primitive property referred to in the query.A
well-formedquery
with respect to a non-primitive property at adata
sourcecan be
defined in a similarmanner.
Once
semantic assertions of properties referred to in the query are determined, then the query processor identifies a set of siteswhich
can supply semantically meaningful solution to the query. In the following subsection,we
provide an algorithm to identify a set of such sites.6.1
Site-selection
Algorithm
The
process of site-copy selection is amechanism
for selecting a set of data sources for processing agivenquery
ifmore
thanone
candidate set are available.The
site-copy selectionalgorithm in
MERMAID
[Te87] concentratedon
optimization ofquery
execution costand
did not consider semantic heterogeneityaspects duringthe selection ofprocessing sites,
which
are instead resolvedby
the integratedschema
inMERMAID
architecture.The
MERMAID
algorithm selects the set consisting of theminimum
number
of sites to process the query out of all possible sets of candidate sites.The
rationale behind theminimum
number
ofsites is that, in general, the data
communication
requirements are likely tobe
minimum
ifthe
number
sites involved in thequery
processing areminimum,
which
in turn reduces the overallquery
processing cost. Since there isno
integratedschema
in our context interchange architecture, the Site-copy selection algorithmmust
deal with semantic heterogeneity issues.The
algorithmbelow
consideres such issues during the selection of processing sites.The
following algorithm selects aset ofcandidate sites for agivenquery which
can supplymeaningful
solution to the query.The
Total-Suppliers-Listand
Pcirtial-Suppliers-List usedin the following algorithm are defined in the previous section.
Candidate-sites(query, CDR(application))
{
Let
P
be
the set ofprimitive propertiesand
X
be
the set of non-primitive properties referred to in the query;/* see
Example-
1and Example-
2below
*/For
each
pi€
P, letSPi be
the set of data-sourceswhere
pi is available;Represent SPx
in disjunctivenormal
form. Foreach
Xj£
X
,whose
context is givenby
n,^/* see
Example-
1below
*/{ if total-suppliers list of n^ is not
empty
{ let
SXj
denote the Totcd-Suppliers-List of n^ ;Represent
SXj
in disjunctivenormal form
}/* see
Example-
2below
*/else if Partial-Suppliers-List of
n^
is notempty
{ let
SXj
denote the collection of Particd-Support-List of n^ ;Represent
each Sji€
SXj
in disjunctivenormal form
and
assign their conjunction toSXj
}}
Set Candidate-Sites to [Mp^^pSPi)
A
{\/j,^^xSXj)\Translate the expression Candidate-Sites into disjunctive
normal
form;select the conjective
term from
Candidate-Siteswhich
heisminimum
number
of sitesand
ifmore
than
one
suchterm
exists then selectany one
ofthem
and
return the selected term.}
Examples:
The
followingexamples
illustrate the selection ofcandidate sites to process a givenquery
by
takingsemantic
heterogeneity issues into consideration.Example-
1:Assume
that the followingquery
is receivedfrom
the application apl.Select
Trade-priceWhere
(Elxchange=Nyse and Instrument-type=Equity).
The
query
is well-formed at the application apl.P
=
{Exchange, Instrument-type
}X
=
{ Trade-price } SPsxchanae=
dblV
db2V
dbZS
Plnstrument-Type=
^blV
db2V
db3S
Xxrade-price=
dbl20
Ceindidate sources for processing the
complete subquery
is givenby
(dbl
V
db2V
db3) A {dblV
db2V
db3)A
(dbl)The
above
expression is translated into disjunctivenormal form
as follows:(dbl)
V
{dblA
db2)V
{dblA
db3)V
((f6l A db2A
</63)Since the first disjunction has
minimum
number
of elements, Data-sourcedbl
wiU be selected to process the query.Example-2: Assume
applicationap3
uses the query: Select Trade-priceWhere
Exchange=Nyse
The
query
is well-formed at the application ap3.P=
{Exchange
}X
=
{ Trade-price }SPsxchange
=
dblV
db2V
dbZSince, Trade-price in the application
ap3
hasno complete
solution atany
single data-source. It hcis only partial solution atdbl
and
at db3.SXTrade-pr^ce
=
{dbl)A
{db3)Candidate
sites to process thequery
is givenby
{dbl
V
db2V
dbZ)A
{{dbl)A
{dbZ))The
above
expression is translated into disjunctivenormal form
as follows:{dbl
A
dbZ)V
{db2A
dblA
dbZ)Data-sources,
dbl
amd db3
c&nbe
selected to process the query.7
Conclusion
In this paper,
we
provided alattice-basedframework
for the description ofcontextknowledge
for data-sources
and
applications. Since hierarchical nature is implicitamong
disparatecontexts, the lattice
model
is ideally suited for the required representation.Our
approach
requires that a context data
repository(CDR)
ofan
applicationand
another ofa data-sourcebe
generatedfrom
the given set of context-lattices. In other words, allCDRs
must
share thesame
set of context-lattices.Context comparison and
context evolution tasks can be handledefficiently using the lattice model.
The
lattice technique for representing contextknowledge
ismore
appropriate since it involves only set operations for their context comparison, unlikeas rule
based
representationwhich
uses inferencemechanism
toperform
context comparison.The
context-assertion lattice presents a set of legal contextsand
associated assertionswhich can be
used as a referenceset for aDatabase
Administrator or an application developer toframe
his or herCDRs.
If the application developer has the option to choosebetween
semantic
assertions then he or she can use the existing context-assertion lattice to identifywhat
kind of assertionwould
receive greater database support. Similarly adatabase designer can use the context-assertion lattice to fixsemantic
assertions in his or her data so that theycan
be
utilized meaningfiillyby
many
applications.Our
model
is powerfulenough
toaccommodate
semantic conversions during context interchangethrough
the notionofnode
equivalence. Iftwo
nodes
in acontext assertion latticeare equivalent then all applications associated with
one
of thenodes
can get semanticallymeaningful data
from
thedata
sources associated with the othernode and
vice versa.We
also derived conditionsunder which
a sourcecan
supplymeaningful
data toan
application.
Under
these conditions, the context-mediator maintains the consistency of theknowledge
presentin the application as well as the data-sourceCDRs. The
Query-processor, using theknowledge
present in theCDR
ofthe application, can identify a set ofdata-sourceswhich
can supply me!Lningful solution to the application query.As
such, the roles ofcontext-mediator
and
the query- processor are separated.A
context driven data-source selectionalgorithm for a given application
query was
also described.Since the storage
requirement
foreach
context- assertion lattice canbe
large, the latticemust
be
pruned
to itsminimum
size before it is physically stored. Allnodes
in a context-assertion lattice,which
<ire not present inany
CDR
can
be pruned from
the lattice. Since thelattice is a directed acyclic graph, welldefined storage representation
schemes
ajid algorithmsto include a given
node
into the latticeand
to search for a requirednode
in the lattice areavailable.
Acknowledgments:
The
authorsthank
anonymous
reviewersfor theirdetailed suggestions.M.
P.Reddy
recently joinedKenan
Systems
as a database architect. Earlier, heworked
as
Research
Associate at the Sloan School ofManagement,
Massachusetts Institute of Tech-nologyfrom
1991November
through 1994 January.He
received his doctorate incomputer
scienceand
masters in physicsfrom
the UniversityofHyderabad, Hyderabad,
India. Prior tojoining
MIT,
heworked
asan
Assistant Professor at the University ofHyderabad. At
MIT,
Dr.
Reddy
was
active in several research projects at theComposite
InformationSystems
Laboratory.
He
has published several articles in the areas of heterogeneous databasesand
data quality
management.
His current research interests include integration of heteroge-neous databases, context interchangeamong
heterogeneous information systems,knowledge
discovery in databases, data quality
management,
and
multidimensional databases.A.
Gupta
is Co-Director ofMIT's
Productivityfrom
InformationTechnology
(PROFIT)
Initiative
and
the firstand
ordy SeniorResearch
Scientist at theSloan SchoolofManagement,
Massachusetts
Institute of Technology.He
received his Ph. D. incomputer
sciencefrom
the Indian Institute of Technology,New
Delhi, a masters degree inmanagement
from
MIT
and
a Bachelors degree in electrical engineering
from
the Indian Institute ofTechnology,Kanpur.
Dr.
Gupta
serveson
the AdministrativeCommittee
of theIEEE
Industrial Electronics Societyand
hasbeen
assistantchcdrman
for severallECON
conferences.Sincejoining
MIT
in 1979, he hasbeen
activein the areas ofmultiprocessor architectures, distributedand
heterogeneous databaise systems,and automated
reading of handwritten information.He
has publishedmore
than
100 technical articlesand
papers,and produced
seven
books
in these areas.He
serves asan
advisor to anumber
of leading internationzd organizations.References
[BT84] Y. J. Breitbart
and
L. R.Tieman.
ADDS-heterogeneous
distributed database system. In Proceedings of Third InternationalSeminar
on
DistributedDatabase
Systems,
March
1984.[Bur52] E.
Burnswick,
editor.The Conceptual
Framework
of Psychology. University ofChicago
Press, Chicago, 1952.[Coe77] T. R. Coe,
W.
C. Sarbin.Hypnosis from
the standpoint ofa contextualist.Annals
ofthe
New
YorkAcademy
of Science, 296, 1977.[DAT87]
S.M.
Deen,
R. R.Amin,
and
M.
C. Taylor.Implementation
of a prototype forPRECI*.
Computer
Journal, 30(2), 1987.[dK86a] J.
de
Kleer.An
assumption based
TMS.
Artificial Intelligence, 28, 1986.[dK86b] J. de Kleer.
Extending
theATMS.
Artificial Intelligence, 28, 1986.[dK86c] J.
de
Kleer.Problem
Solving withATMS.
Artificial Intelligence, 28, 1986.[Doy89] R. Doyle.
A
Truth Maintenance System.
Artificial Intelligence, 12, 1989.[Ke88]
V.
Krishnamurthy
and
et al.IMDAS- An
integratedmanufacturing
dataadmin-istration system.
Data
and Knowledge
Engineering, 3, 1988.[LR82] T.
Landers
and
R.Rosenberg.
An
overview ofMULTIBASE.
In DistributedData
Bases, pages 153-183.
North
Holland, 1982.[Lyo81] J Lyons, editor.
Language,
Meaning and
Context.Fontzma Paperbacks, Great
Britain, 1981.
[McD83]
D.V.
McDermott.
Contexts
and Data
Dependencies:A
Synthesis.IEEE
Trans-actionson
Patteren Analysisand Machine
Intelligence, 5, 1983.[RJPS89]
M.
Rajinikanth,G.
Jakobson,
and
G.
Piatetsky-Shapiro.On
heterogeneousdatabase
integration:One
year experience in evaluatingCALIDA.
In Proceed-ings ofWorkshop
on Heterogeneous
Databases,December
1989.[Sho9l] Y. Shohami. Varieties of Context. In
Vladimir
Lifschitz, editor, Artificial In-telligenceand
Mathematical Theory
ofComputation:
papers inhonor
ofJohn
McCarthy.
Academic
Press, 1991.[SM91]
M.
Siegeland
S.Madnick.
A
Metadata
Approach
to ResolvingSemantic
Conflicts. In Proceeding of the 17th International Conferenceon
Very LargeData
Bases,September
1991.[SSR92] E. Sciore,
M.
Siegel,and
A. Rosenthal.Context
interchange using meta-attributes. InSubmission
to the 18th International Conferenceon
Very LargeData
Bases,August
1992.[Te87]
M.
Templeton
and
et al.Mermaid
- a front-end to distributed heterogeneous databases. In Proceedings of theIEEE,
volume
57, pages 695-708, 1987.[Tem89]
M.
Templeton.
Schema
integration inmermaid.
In Position Papers:NSF
Work-shop
on Heterogeneous
Databases,December
11-13, 1989.[Tho88] G.
M. Thomson,
D.M.
Davies. Introduction:Memory
in Context:Context
inMemory.
In Davies G.M.
and
Thomson
D. M., editors.Memory
in Context: Context inMemory. John
Wiley
&
Sons Ltd., 1988.39080 00932 7294