INFERRING MISSING SCHEMA FROM LINKED DATA USING FORMAL CONCEPT ANALYSIS (FCA)
MASTER THESIS
PRESENTED AS A PARTIAL REQUIREMENT FOR THE MASTER IN COMPUTER SCIENCE
BY
RAZIEH MEHRI DEHNAVI
Library Services (Service des bibliothèques)
Notice
This thesis is distributed with respect for the rights of its author, who has signed the form Autorisation de reproduire et de diffuser un travail de recherche de cycles supérieurs (SDU-522, Rév.01-2006). This authorization stipulates that "in accordance with Article 11 of Règlement no 8 des études de cycles supérieurs, [the author] grants the Université du Québec à Montréal a non-exclusive license to use and publish all or a substantial part of [his or her] research work for educational and non-commercial purposes. More specifically, [the author] authorizes the Université du Québec à Montréal to reproduce, distribute, lend, or sell copies of [his or her] research work for non-commercial purposes on any medium whatsoever, including the Internet. This license and this authorization do not entail a waiver by [the author] of [his or her] moral rights or intellectual property rights. Unless otherwise agreed, [the author] remains free to distribute and commercialize, or not, this work, of which [he or she] holds a copy."
French title: INFÉRENCE DU SCHÉMA MANQUANT À PARTIR DE DONNÉES LIÉES À L'AIDE DE L'ANALYSE DE CONCEPTS FORMELS (ACF)
I am deeply thankful to my supervisor, Dr. Petko Valtchev, for his guidance and encouragement throughout my Master studies at UQAM. He has been an excellent advisor and a constant source of knowledge, motivation, and encouragement during this dissertation work.
I would like to extend my thanks to Dr. Fatiha Sadat, my co-supervisor, for her guidance throughout this research work.
I am continuously grateful to my family, especially my parents, for their support, love and encouragement.
Finally, I would like to thank all the staff members of the Computer Science department at UQAM for their direct and indirect help during my studies at UQAM.
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
RÉSUMÉ
ABSTRACT
INTRODUCTION
CHAPTER I MAIN CONCEPTS
1.1 RDF
1.2 RDF Schema (RDFS)
1.3 FCA
1.3.1 Introduction to Formal Concept Analysis
1.3.2 Concept Lattice
1.4 DBpedia
1.4.1 Data Source
1.4.2 Data Structure
1.4.3 Data Access
1.5 Summary
CHAPTER II REVIEW OF THE LITERATURE
2.1.1 Name-based Techniques
2.1.2 Structure-based Techniques
2.1.3 Extensional Techniques
2.1.4 Semantic-based Techniques
2.2 RDF and Similarity
2.2.1 Similarity of RDF Graphs on Linked Open Data (Interlinking Tools)
2.2.2 Finding Similarity between RDF Individuals Using FCA
2.3 FCA and Semantic Web Applications
2.4 Summary
CHAPTER III METHODOLOGY AND IMPLEMENTATION
3.1 Approach
3.1.1 Converting RDF to FCA Input
3.1.2 Converting FCA Output to RDFS
3.1.3 Choosing Plausible Names for RDFS Classes Using DBpedia
3.2 Implementation
3.2.1 Java Frameworks and APIs
3.2.1.1 Jena
3.2.1.1.1 RDF API
3.2.1.1.2 SPARQL
3.2.1.2 Galicia
3.2.1.3 RDF Gravity
3.2.2 Generating RDFS from RDF data
3.2.2.1 Step One: Converting .rdf to .rcf.xml
3.2.2.2 Step Two: Converting lat.xml to RDFS
3.2.2.3 Step Three: Naming Classes Using DBpedia
3.3 Summary
CHAPTER IV EXPERIMENTS AND RESULTS
4.1 Dataset
4.2 Results
4.2.1 Binary Relation Table
4.2.2 Concept Lattice
4.2.3 RDFS Graph
4.3 Discussion of the Experiments
4.4 Summary
CONCLUSION
BIBLIOGRAPHY
LIST OF FIGURES
1.1 RDF graph example
1.2 RDFS graph example
1.3 Formal context example
1.4 Concept lattice example
1.5 Reduced labeling diagram of concept lattice example
1.6 Infobox of Portugal
1.7 The DBpedia dataset for Barack Obama
2.1 Similarity techniques
3.1 RDF graph of music dataset
3.2 Lattice of music dataset
3.3 Reduced labeling diagram of music dataset lattice
3.4 RDFS graph of music dataset
3.5 RDF Gravity representation of music dataset's RDFS graph
3.6 Objects of rdf:type predicate with dbpedia-owl prefix of Lake Onega and Neva River in DBpedia
3.7 Objects of dbpedia-owl:Wikipagesdirect predicate for Neva River in DBpedia
3.8 Jena framework
3.9 SPARQL Example
3.10 Galicia v.2 beta view
3.12 SNORQL
4.1 Lattice of Russia dataset by Galicia
4.2 Full RDFS graph of Russia dataset by RDF Gravity
4.3 Parts of RDFS graph of Russia dataset
LIST OF TABLES
1.1 Classes in DBpedia ontology
2.1 Tool comparison
3.1 Binary relation table of music dataset
3.2 RDF Gravity notations
ABBREVIATIONS
FCA: Formal Concept Analysis
RDF: Resource Description Framework
RDFS: RDF Schema
LOD: Linked Open Data
SW: Semantic Web
Galicia: GAlois Lattice-based Incremental Closed Itemset Approach
SPARQL: SPARQL Protocol and RDF Query Language
RDF Gravity: RDF GRAph VIsualization Tool
XML: Extensible Markup Language
W3C: World Wide Web Consortium
RÉSUMÉ
With the massive increase in the amount of data available on the web, detecting and analyzing information in web content has become very profitable. The deployment of structured data based on Semantic Web technologies has grown significantly online over the past two decades. Information extraction has therefore become a major concern among Semantic Web researchers. To publish structured data on the Web, data sources are described with the Resource Description Framework (RDF).
In this thesis, we seek to extract the conceptual structure of the Web of Data, that is, of RDF data in the Web of documents. The main objective is to learn the schema level from the instance level; in other words, we try to convert RDF data into RDF Schema (RDFS) by learning the conceptual structure induced by individuals described in RDF.
To build the concept lattice from RDF data, concepts are identified using Formal Concept Analysis (FCA). The number of concepts is based on the number of possible subsets containing similar RDF resources. By similar RDF resources, we mean the set of RDF resources that share a common set of attributes. After building the concept lattice, we take into account the properties and the data properties inferred from the RDF data in order to build the schema.
Another challenge in building the RDFS model is naming the RDFS classes. For this, we use DBpedia. DBpedia contains the structured information of Wikipedia, which holds very useful information that lets us learn the type of the output instances in the RDF data.
The methodology presented in this thesis extracts the maximum possible schema from the instance level of RDF data. By following the steps mentioned above, we gain the ability to exploit the conceptual structure of the Web of Data.
ABSTRACT
The amount of data available on the web has increased considerably in recent years, so detecting and analyzing useful information in its content has become very profitable. The deployment of structured data based on Semantic Web technologies has grown significantly online in the past two decades. Information extraction has therefore become a major concern among Semantic Web researchers. To publish structured data on the web, data sources are published using the Resource Description Framework (RDF) data model.
This thesis aims at extracting conceptual structures from the Web of Data, i.e., RDF data in the Web of documents. The main objective is to learn the schema level from the instance level of a dataset; in other words, we try to convert the RDF data into data with the RDF Schema (RDFS) model by learning the conceptual structure between RDF individuals at the instance level.
To construct a concept lattice from the RDF data, concepts are identified via Formal Concept Analysis (FCA). The number of concepts is based on the number of possible subsets containing similar RDF individuals. By similar RDF individuals we mean the set of RDF resources which share a common set of attributes. After detecting the concepts of the concept lattice (the classes of RDFS) and the hierarchical relations between them, we take into account the properties and the inferred data properties from the RDF data in order to construct the schema level.
Another challenge in building the RDFS model from data is naming the RDFS classes. We overcome this issue by using DBpedia. DBpedia contains the structured information from Wikipedia, which holds very useful information allowing us to learn the type of existing instances in the RDF data.
The methodology proposed in the thesis extracts the maximum possible schema from the instance level of RDF data. By adopting the aforementioned steps, we achieved the capability to exploit conceptual structure from the Web.
INTRODUCTION
Today, the Web of documents has expanded to the Web of Data since the appearance of the Semantic Web. The Web of Data is described as graphs of data. It rapidly produces large datasets containing billions of RDF triples from different domains of knowledge. Thus, with the fast-growing availability of structured data on the web, exploiting it becomes ever more interesting.
XML and HTML are more readable by humans than RDF, since RDF data does not explicitly follow hierarchical and sequential structure formats. The RDF model therefore lacks simple human readability and writability for its documents. We believe that extracting concepts from the Web of Data provided in RDF helps fulfill users' need for a better understanding of heterogeneous data on the web. Implementing this idea could improve the readability of RDF statements by ordering and grouping them.
FCA is a key technique for formally discovering and representing concept hierarchies, as well as for clustering the knowledge found on the web.
Motivation
Even though data sources are structurally defined on the Web of Data, non-conceptualized data suffers from a lack of vocabulary, which makes any effort to reduce the decentralization of such data worthwhile. In other words, extracting a schema from data becomes more interesting when the data has no explicit conceptualization. RDF describes resources without considering the taxonomies of their classes and properties. Discovering conceptual structure from the Web of Data represented as RDF triples is possible by using FCA.
Objective
Extracting a schema from RDF data could lead to the construction of an RDFS model, which contains richer vocabularies for describing the data. RDFS is an extension of the RDF model which allows the description of RDF terms in the form of class (types of the instances), subClass (relation between classes), property (properties which describe classes) and subProperty (relation between properties), as well as the domain and range of the properties. Obtaining an RDFS model from the RDF data helps us solve the problems of heterogeneity in the raw data of the web.
Structure of this dissertation
This dissertation is organized as follows. In Chapter 1, we define the basic concepts which are used throughout this thesis. The main concepts explained in the chapter include: Resource Description Framework (RDF), RDF Schema (RDFS), Formal Concept Analysis (FCA) and DBpedia.
Chapter 2 presents a review of the literature. It presents works related to our thesis and compares our work to them. First, the chapter discusses the basic similarity methods that exist for ontologies. These similarity methods are used for building interlinking tools, which are then briefly introduced. Finally, we present our approach in comparison to the other works for extracting similar RDF individuals.
Chapter 3 describes the full implementation of our methodology, in addition to an introduction to some Java platforms and APIs required during implementation.
Finally, in Chapter 4 the methodology is evaluated by three metric measurements: precision, recall and f-measure.
MAIN CONCEPTS
The current chapter provides background information on the technologies we benefited from in our approach. The first two sections give brief introductions to the RDF and RDFS models. The third section introduces FCA, which plays an important role in our methodology. Finally, DBpedia, which contains useful knowledge for generating our final output, is presented.
1.1 RDF
The Resource Description Framework (RDF) is a fundamental data model in Semantic Web technology [MM04]. It is designed to be read and understood by machines. As a generic data model, RDF represents the information on the web in the form of <subject-predicate-object> triples. Each triple is a sentence describing a resource. A resource is an entity which can be a subject, predicate or object in an RDF triple. Each resource on the web is uniquely identified by a Uniform Resource Identifier (URI). A URI identifies a resource via a location or a name or both.
The subject, or first part of an RDF triple, is the resource which the statement describes. The predicate, or second part of a triple, is a property or aspect which relates the resource to an object. The object is therefore the third part of a triple, which could be another resource or a literal value defined as a string, a number, a date, etc. [LS99].
An RDF graph consists of a set of triples where each triple represents an arc. Therefore, each RDF statement is a subgraph where each node is a subject or object, whereas arcs are predicates (the arc starts from the subject and is directed to the object). Further, RDF can use an XML-based syntax, i.e., RDF/XML, to create or modify RDF graphs [RDF04].
An example of an RDF graph is given in the following [Li13]. Suppose that a student named James Anderson has professor Paul Jones as his supervisor. The statements related to this information are represented as the RDF graph shown in Figure 1.1.
Figure 1.1: RDF graph example [Li13]
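As a minimal sketch of the graph view described above, the statements of the running example can be held as a set of (subject, predicate, object) tuples in Python. The variable names and the `outgoing` helper are illustrative assumptions, and the predicate URI for advises is written here with a "/" separator for readability.

```python
# Hypothetical in-memory model of the graph in Figure 1.1.
UNI = "http://www.mydomain.org/uni-ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object); here all three are URIs.
triples = {
    (UNI + "PaulJones", RDF_TYPE, UNI + "Professor"),
    (UNI + "PaulJones", UNI + "advises", UNI + "JamesAnderson"),
}

# Nodes are the subjects and objects of the triples.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

# Each triple is one directed arc from subject to object, labelled by the predicate.
arcs = sorted((s, o, p) for s, p, o in triples)


def outgoing(subject):
    """Return the (predicate, object) pairs of all arcs leaving `subject`."""
    return sorted((p, o) for s, p, o in triples if s == subject)


for predicate, obj in outgoing(UNI + "PaulJones"):
    print(predicate, "->", obj)
```

Storing the graph as a set mirrors the definition: a triple is either present or absent, and duplicates carry no meaning.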
The XML syntax of the RDF data is:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uni="http://www.mydomain.org/uni-ns">
  <rdf:Description rdf:about="http://www.mydomain.org/uni-ns/PaulJones">
    <rdf:type rdf:resource="http://www.mydomain.org/uni-ns/Professor"/>
    <uni:advises rdf:resource="http://www.mydomain.org/uni-ns/JamesAnderson"/>
  </rdf:Description>
</rdf:RDF>
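To make the link between this serialization and the triples concrete, the sketch below reads the document with Python's standard `xml.etree.ElementTree` module. It is only an illustration under strong assumptions: it handles just `rdf:Description` elements whose properties carry an `rdf:resource` attribute, which is the subset used in this example, and the function name `extract_triples` is hypothetical. Because RDF/XML forms a predicate URI by concatenating the namespace and the local name, the namespace as declared above yields the predicate `http://www.mydomain.org/uni-nsadvises`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uni="http://www.mydomain.org/uni-ns">
  <rdf:Description rdf:about="http://www.mydomain.org/uni-ns/PaulJones">
    <rdf:type rdf:resource="http://www.mydomain.org/uni-ns/Professor"/>
    <uni:advises rdf:resource="http://www.mydomain.org/uni-ns/JamesAnderson"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Read rdf:Description blocks whose properties point to resources."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree stores a qualified name as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get("{%s}resource" % RDF_NS)
            triples.append((subject, namespace + local, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

A full RDF/XML parser covers many more constructs (literals, blank nodes, typed nodes); a dedicated RDF library would be the practical choice for real data.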
http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://www.mydomain.org/uni-ns are XML namespaces. XML namespaces are used in RDF to identify the collection of names of resources and properties. For example, the xmlns:rdf declaration specifies that the elements with the rdf prefix come from the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, which is known as the namespace for the RDF vocabulary. Moreover, an XML Qualified Name is a shortcut for a URI; for example, we could use uni instead of the full URI http://www.mydomain.org/uni-ns.
In the graph, Paul Jones is connected to James Anderson by the predicate advises. Besides, there exists another relation which connects Paul Jones to the class Professor at the schema level: Paul Jones is connected to the class Professor by the predicate type. Again, the prefix rdf is used instead of writing the full URI http://www.w3.org/1999/02/22-rdf-syntax-ns#. The full example of the schema level is given in the following section.
1.2 RDF Schema (RDFS)
On top of RDF, which does not provide significant semantics, RDFS is an extensible knowledge representation language which adds vocabulary to RDF in order to express information about classes, subclasses and properties (relationships between classes) [BVM04]. This vocabulary covers class, subclass, relationship between classes, property, subproperty, relationship between properties, domains and ranges, etc.
The RDFS level of the example described in the previous section is given in the following graph [Li13] (drawn as Figure 1.2):
Figure 1.2: RDFS graph example [Li13]
The schema part of the RDF/XML syntax from RDFS data is:
<rdfs:Class rdf:ID="Person"/>
<rdfs:Class rdf:ID="Student">
  <rdfs:subClassOf rdf:resource="#Person"/>
</rdfs:Class>
<rdfs:Class rdf:ID="Professor">
  <rdfs:subClassOf rdf:resource="#Person"/>
</rdfs:Class>
<rdf:Property rdf:ID="advises">
  <rdfs:domain rdf:resource="#Professor"/>
  <rdfs:range rdf:resource="#Student"/>
</rdf:Property>
"#" is used instead of writing the full URI reference. The rdfs:domain and rdfs:range predicates relate a predicate to the class of instances which can be considered as the subject or object of the predicate, respectively. rdfs:subClassOf identifies the hierarchical relationship between classes at the schema level. In the above example, Professor and Student are subclasses of the Person class. advises is a property which has the classes Professor and Student as its domain and range, respectively.
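The effect of rdfs:domain, rdfs:range and rdfs:subClassOf can be illustrated with a small Python sketch. It assumes the schema of the example is stored in plain dictionaries with one superclass per class, and that an instance-level statement carries no explicit rdf:type; the helper names `superclasses` and `infer_types` are hypothetical. The sketch then derives the types that the standard RDFS semantics would entail.

```python
# Schema level of the running example (names follow Figure 1.2).
SUBCLASS_OF = {"Student": "Person", "Professor": "Person"}
DOMAIN = {"advises": "Professor"}
RANGE = {"advises": "Student"}

# Instance level: one statement without any explicit rdf:type.
statements = [("PaulJones", "advises", "JamesAnderson")]


def superclasses(cls):
    """Follow rdfs:subClassOf upwards, returning cls and all its ancestors."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def infer_types(statements):
    """Derive rdf:type facts from rdfs:domain, rdfs:range and rdfs:subClassOf."""
    types = {}
    for subject, predicate, obj in statements:
        if predicate in DOMAIN:
            # The subject of a statement belongs to the domain of its predicate.
            types.setdefault(subject, set()).update(superclasses(DOMAIN[predicate]))
        if predicate in RANGE:
            # The object of a statement belongs to the range of its predicate.
            types.setdefault(obj, set()).update(superclasses(RANGE[predicate]))
    return types


print(infer_types(statements))
```

In this reading, the single statement "Paul Jones advises James Anderson" is enough to place Paul Jones in Professor and Person, and James Anderson in Student and Person.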
1.3 FCA
1.3.1 Introduction to Formal Concept Analysis
FCA stands for Formal Concept Analysis, a formal representation of data that has the potential to be represented as a conceptual structure [GW99]. FCA is a data analysis technique that helps to identify the conceptual structure of data using formal contexts and concept lattices. Every dataset which consists of a binary relation between a set of objects and a set of attributes can be introduced as a formal context in FCA [WB04].
Definition 1 (Formal Context): A formal context is a triple $K := (G, M, I)$, where $G$ and $M$ are sets and $I$ is a relation between $G$ and $M$. The elements of $G$ and $M$ are called objects and attributes, respectively, and $(g, m) \in I$ is read as "an object $g$ has an attribute $m$".
A set of objects and their corresponding attributes, plus the relations that exist between those objects and attributes, can be shown in a formal context. A formal context can be represented as a table in which rows are objects, columns are attributes, and each cross in the table is a relation between an object and the corresponding attribute.
An example of a formal context can be seen in Figure 1.3. The example includes four objects and four attributes.
Figure 1.3: Formal context example
Further, the formal context can be represented as a conceptual structure, which will be explained in the next section.
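The tabular view of a formal context can be sketched in a few lines of Python. The context below is hypothetical: it has four objects and four attributes like the example, but its crosses are illustrative and are not taken from Figure 1.3. The function name `cross_table` is likewise an assumption.

```python
# Hypothetical formal context K = (G, M, I); the crosses are illustrative
# and are not the ones of Figure 1.3.
G = ["Obj1", "Obj2", "Obj3", "Obj4"]
M = ["Att1", "Att2", "Att3", "Att4"]
I = {
    ("Obj1", "Att1"), ("Obj1", "Att2"),
    ("Obj2", "Att1"), ("Obj2", "Att3"),
    ("Obj3", "Att1"), ("Obj3", "Att3"), ("Obj3", "Att4"),
    ("Obj4", "Att2"), ("Obj4", "Att4"),
}


def cross_table(objects, attributes, incidence):
    """Render the context as text: rows are objects, columns are attributes."""
    header = " " * 6 + " ".join(m.ljust(5) for m in attributes)
    rows = [header]
    for g in objects:
        # An "x" marks (g, m) in I, a "." marks its absence.
        cells = " ".join(
            ("x" if (g, m) in incidence else ".").ljust(5) for m in attributes
        )
        rows.append(g.ljust(5) + " " + cells)
    return "\n".join(rows)


print(cross_table(G, M, I))
```

Keeping $I$ as a set of pairs follows Definition 1 directly; the table is only one way to display it.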
1.3.2 Concept Lattice
A formal concept can be represented in a lattice of concepts in which each concept includes a set of objects and the related attributes. The definitions of formal concept and concept lattice are given in the following [WH06].
Definition 2 (' Operation): For a set $A \subseteq G$ of objects, we define:
$$A' = \{\, m \in M \mid \forall g \in A : (g, m) \in I \,\}$$
Correspondingly, for a set $B \subseteq M$ of attributes, we define:
$$B' = \{\, g \in G \mid \forall m \in B : (g, m) \in I \,\}$$
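The two derivation operators translate almost literally into set comprehensions. The following Python sketch assumes a small hypothetical context stored as a mapping from each object to its attribute set; the names `CONTEXT`, `common_attributes` and `common_objects` are illustrative.

```python
# Hypothetical context: object -> set of attributes it has.
CONTEXT = {
    "Obj1": {"Att1", "Att2"},
    "Obj2": {"Att1", "Att3"},
    "Obj3": {"Att1", "Att3", "Att4"},
    "Obj4": {"Att2", "Att4"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """A': the attributes shared by every object in A."""
    return {m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects)}


def common_objects(attributes):
    """B': the objects that have every attribute in B."""
    return {g for g in CONTEXT if attributes <= CONTEXT[g]}


print(sorted(common_attributes({"Obj2", "Obj3"})))
print(sorted(common_objects({"Att3"})))
```

Note the boundary cases implied by the universal quantifier: the empty set of objects derives to all attributes, and the empty set of attributes derives to all objects.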
The formal concept is defined as:
Definition 3 (Formal Concept): A formal concept $C$ in the formal context $(G, M, I)$ is a pair $(A, B)$, where $A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$. The set $A$ is called the extent and the set $B$ the intent of the concept $C$.
In other words, each concept is represented by a pair consisting of an extension and an intension, which are a set of objects and a set of attributes, respectively. As a general rule, the objects in the extension have all the attributes of the intension in common and have no other attributes in common. Further, all the attributes in the intension are shared by all the objects in the extension and by no other object outside of the extension.
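A brute-force enumeration shows how formal concepts follow from these conditions. The Python sketch below closes every subset of attributes of a small hypothetical context and keeps the distinct (extent, intent) pairs. It is exponential in the number of attributes and is meant only as an illustration; it is not the algorithm used by any particular FCA tool, and all names are assumptions.

```python
from itertools import combinations

# Hypothetical context: object -> set of attributes it has.
CONTEXT = {
    "Obj1": {"Att1", "Att2"},
    "Obj2": {"Att1", "Att3"},
    "Obj3": {"Att1", "Att3", "Att4"},
    "Obj4": {"Att2", "Att4"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent_of(attrs):
    """B': the objects having every attribute in attrs."""
    return frozenset(g for g in CONTEXT if attrs <= CONTEXT[g])


def intent_of(objs):
    """A': the attributes shared by every object in objs."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objs))


def formal_concepts():
    """Close every attribute subset and keep the distinct (extent, intent) pairs."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, size):
            extent = extent_of(set(attrs))
            concepts.add((extent, intent_of(extent)))
    return concepts


for extent, intent in sorted(formal_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each pair produced this way satisfies $A' = B$ and $B' = A$, because the intent is recomputed from the extent that the attribute subset selects.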
A concept lattice arises on top of the concepts derived from a formal context.
Definition 4 (Concept Lattice): For a formal context $K := (G, M, I)$ and two concepts $C_1 = (A_1, B_1)$ and $C_2 = (A_2, B_2)$, a hierarchical subconcept-superconcept relation is given by
$$(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1)$$
The set of all concepts in $K$ ordered by the $\le$ relation is called the concept lattice of $K$.
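The subconcept-superconcept order can be checked directly on (extent, intent) pairs. The short Python sketch below uses four concepts of a hypothetical context (not the one of Figure 1.3) and an illustrative helper `is_subconcept`; it also shows that comparing extents and comparing intents in the opposite direction give the same answer.

```python
# Four concepts (extent, intent) of a hypothetical context.
C_A = (frozenset({"Obj1", "Obj2", "Obj3"}), frozenset({"Att1"}))
C_B = (frozenset({"Obj2", "Obj3"}), frozenset({"Att1", "Att3"}))
C_C = (frozenset({"Obj3"}), frozenset({"Att1", "Att3", "Att4"}))
C_D = (frozenset({"Obj1", "Obj4"}), frozenset({"Att2"}))


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) holds when the extent A1 is a subset of A2."""
    (extent1, _), (extent2, _) = c1, c2
    return extent1 <= extent2


print(is_subconcept(C_C, C_B))  # C_C is more specific than C_B
print(is_subconcept(C_B, C_A))  # C_B is more specific than C_A
print(is_subconcept(C_D, C_A))  # C_D and C_A are incomparable
```

A smaller extent always comes with a larger intent, which is why more specific concepts sit lower in the lattice diagram.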
The concept lattice of the above example is shown in Figure 1.4.
Figure 1.4: Concept lattice example
The reduced labeling diagram shows each attribute and each object only once in the lattice diagram (Figure 1.5); it therefore makes data analysis easier for some applications.
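Reduced labeling can be sketched as follows, assuming the usual FCA convention: each object is written only at the smallest concept whose extent contains it, and each attribute only at the largest concept whose intent contains it. The context and all names in this Python sketch are hypothetical and do not reproduce Figure 1.5.

```python
# Hypothetical context: object -> set of attributes it has.
CONTEXT = {
    "Obj1": {"Att1", "Att2"},
    "Obj2": {"Att1", "Att3"},
    "Obj3": {"Att1", "Att3", "Att4"},
    "Obj4": {"Att2", "Att4"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent_of(attrs):
    """B': the objects having every attribute in attrs."""
    return frozenset(g for g in CONTEXT if attrs <= CONTEXT[g])


def intent_of(objs):
    """A': the attributes shared by every object in objs."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objs))


def reduced_labels():
    """Map each labelled concept to the objects and attributes written on it."""
    labels = {}
    for g in sorted(CONTEXT):
        # Object concept of g: the smallest concept with g in its extent.
        intent = intent_of({g})
        concept = (extent_of(intent), intent)
        labels.setdefault(concept, {"objects": [], "attributes": []})["objects"].append(g)
    for m in ATTRIBUTES:
        # Attribute concept of m: the largest concept with m in its intent.
        extent = extent_of({m})
        concept = (extent, intent_of(extent))
        labels.setdefault(concept, {"objects": [], "attributes": []})["attributes"].append(m)
    return labels


for (extent, intent), label in reduced_labels().items():
    print(sorted(extent), sorted(intent), label)
```

The full extent and intent of any concept can still be read off such a diagram: the extent collects the object labels at or below the concept, and the intent collects the attribute labels at or above it.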
1.4 DBpedia
DBpedia is a project that aims at extracting structured information from Wikipedia content. This open source dataset is available on the web as linked data (RDF triples) for human and machine usage. Since DBpedia is provided in structured form, it makes querying and exploring Wikipedia content much easier for users through a SPARQL endpoint. DBpedia is known as a central interlinking hub for published data on the web, and it evolves with the changes in Wikipedia [ABKLC08]. DBpedia includes around 3.5 million instances that belong to different categories. Also, DBpedia is available in 97 different languages. More information is given later in this section.
In the following, the structure of DBpedia and the source of its data will be described. Finally, the methods for accessing DBpedia are discussed.
Figure 1.5: Reduced labeling diagram of concept lattice example (node labels: Att 1; Att 4; Att 3; Obj 3, Obj 4; Obj 1)
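As a hedged illustration of querying DBpedia through a SPARQL endpoint, the Python sketch below only builds the request URL for a query asking for the rdf:type values of one resource; it sends nothing over the network. The endpoint address http://dbpedia.org/sparql, the choice of the Portugal resource, and the helper names `type_query` and `request_url` are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed address for this sketch).
ENDPOINT = "http://dbpedia.org/sparql"


def type_query(resource_uri):
    """Build a SPARQL query asking for the rdf:type values of a resource."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT ?type WHERE { <%s> rdf:type ?type }" % resource_uri
    )


def request_url(query):
    """Encode the query as an HTTP GET URL with a `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


url = request_url(type_query("http://dbpedia.org/resource/Portugal"))
print(url)
```

Passing the query text in a `query` parameter follows the SPARQL protocol for GET requests; result formats and any further parameters depend on the endpoint and are left out here.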
1.4.1 Data Source
DBpedia is a cross-domain ontology which has been built manually by the members of the DBpedia community. DBpedia uses Wikipedia as its source of knowledge. Wikipedia is one of the fastest-growing and largest collections of human knowledge ever assembled. Since some of the information in Wikipedia is unstructured, querying it requires a full-text search. The DBpedia community found a way to convert the contents of Wikipedia into RDF triples.
In addition to free text information, DBpedia also uses the different types of structured information from Wikipedia, including infobox templates (http://en.wikipedia.org/wiki/Help:Infobox), title, abstract, categorization information, images, geo information, and external URL links, and converts them into RDF triples. Figure 1.6 shows an example of extracting semantics from a Wikipedia infobox for Portugal's content in DBpedia. Currently, the DBpedia ontology is created based on several Wikipedia infobox templates, which are converted into 359 classes with 1,775 properties. As mentioned earlier, DBpedia includes 3.5 million instances, 2.35 million of which are classified in the DBpedia Ontology, including persons, places, works (contain music
(Figure content: Wikipedia infobox source for the Algarve region of Portugal, with fields such as map_caption, subdivision_type, subdivision_name, population_total, timezone and utc_offset.)