O VE F RANK
Composition and structure of social networks
Mathématiques et sciences humaines, tome 137 (1997), p. 11-23
<http://www.numdam.org/item?id=MSH_1997__137__11_0>
© Centre d’analyse et de mathématiques sociales de l’EHESS, 1997, tous droits réservés.
L’accès aux archives de la revue « Mathématiques et sciences humaines » (http://
msh.revues.org/) implique l’accord avec les conditions générales d’utilisation (http://www.
numdam.org/conditions). Toute utilisation commerciale ou impression systématique est consti- tutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
COMPOSITION AND STRUCTURE OF SOCIAL
NETWORKS1 Ove FRANK2
RÉSLJMÉ -
Composition
et structure des réseaux sociauxLes réseaux sociaux représentent une ou plusieurs relation entre des individus, et des informations sur ces
individus eux-mêmes. Les réseaux sociaux montrent à la fois la structure du réseau et des informations sur les
individus. Des modèles
probabilistes
peuvent être utilisés pour analyser les interrelations entre les variables structurelles et les variables individuelles - parexemple
pourexpliquer
comment la structure peut être interprétée par les informations dont ondispose
sur les individus ou comment la composition de celles-ci peut êtreinterprétée
par la structure. L’auteur discutedifférents
modèles et utilise diverses méthodes statistiquespour illustrer les interrelations entre des données concernant un réseau.
SUMMARY - Social networks representing one or more relationships between individuals and one or more categorical characteristics of the individuals exhibit both structure and composition. Probabilistic models
of such networks can be
used for
analyzing the interrelations between structural and compositional variables, for instance in order tofind
how structure can beexplained by
composition or how structureexplains
composition. Different models are discussed anddifferent
statistical methods areemployed
to illustrate suchinterrelationships in network data.
1. INTRODUCTION
Social networks are
composed
ofindividuals,
various individualattributes,
interindividualrelationships,
and various attributes of theserelationships.
The statisticaldescription
andanalysis
ofcompositional
and structural data on social networks can benefit from the use ofprobabilistic
models that formalize andseparate composition
and structure in various ways.The purpose of this article is to review and extend some of the most common social network models and illustrate how
composition
and structure can be reflected in theprobabilistic assumptions.
For other reviews andexpositions
of social networkmodels,
reference isgiven
to the books
by
Knoke and Kuklinski(1982),
Pattison(1993),
and Wasserman and Faust(1994)
and to the articlesby
Frank(1981, 1988a).
Section 2 introduces the
concept
of a coloredmultigraph
in order torepresent
a social network with attributes attached to individuals andrelationships.
A fewexamples
of the useof colored
multigraphs
are discussed to show thegenerality
andflexibility
of theconcept.
Inparticular,
coloredmultigraphs comprise simple graphs
anddigraphs
as well asgraphs
withboth directed and undirected
edges.
The
simplest probabilistic
models of coloredmultigraphs
haveindependent dyads (induced subgraphs
of ordertwo).
Sections 3 and 4give
some results forsimple graphs
anddigraphs
~ This article presents an extended version of a talk given at the International Sunbelt Social Network Conference in Charleston, SC, 1996.
2 Department of Statistics, Stockholm University, Sweden.
with
independent dyads
in order to settleterminology
and notation and toprovide
anadequate background
for moregeneral
models.Sections 5 and 6
generalize
the results of Sections 3 and 4 tosimple graphs
anddigraphs
withpossible dependence
between incidentdyads
and conditionalindependence
between non-incident
dyads,
i.e. Markovdependence
fordyads.
Section 7gives
some results forgraphs
and
digraphs
withgeneral dependence
between thedyads,
and Section 8 treats the extensionto colored
multigraphs.
Theinterplay
betweencompositional
and structuralmodeling
isdiscussed in Section 9
together
with adescription
of a few broad classes of network models.Section 10
gives
a brief introduction to dataanalytic
methods in social networkanalysis.
2. SOCIAL NETWORK DATA REPRESENTATIONS
To
analyse
simultaneous distributions of several attributes it is often convenient tocategorize
or
recategorize
each attribute to two or a fewcategories only.
All attributes in the social networks are here assumed to becategorical.
Ageneral specification
of a social network on nindividuals can be
given by
ap-dimensional
vector variable x defined on theindividuals,
aq-dimensional
vector variable y defined on the unorderedpairs
ofindividuals,
and an r-dimensional vector variable z defined on the orderedpairs
of individuals. Thus network data consist of individual data vectors Xi fori=l,...,n
andpairwise
data vectorsyj=yji
andzlj
fori=1,...,n
andj=1,...,n
with/~/.
If the p
components of x
haveal,...,ap categories, the q components of
y havebl,...,,b
categories, and
the rcom p onents of z
have cl"",crcategories, there
is apossible
totalot-
a=al,...,ap
combinedcategories
of individualattributes, b=bl,...,bq
combinedcategories
ofattributes of unordered
pairs
ofindividuals,
and c=Cl,,,.,cr combinedcategories
of attributes of orderedpairs
of individuals. If the network isrepresented
as agraph
on n vertices withn undirected
edges
andn (n-1 )
directededges,
each vertex isgiven
one of a distinct2 colors, each undirected edge
is given
one of b distinct colors,
and each directed edge
is given
n 2 )
one of c distinct colors. In total there is a
possibility of an h 2 cn(n-1)
distinct colored labeledmultigraphs.
For
dyads (colored
labeledmultigraphs
of ordertwo)
there area2bc2
distinctsversions,
and for triads(colored
labeledmultigraphs
of orderthree)
there area3b3c6
distinct versions. Inparticular, (a, b, c)=( 1,2,1 ) yields
2dyads (labeled graphs
of ordertwo)
and 8 triads(labeled graphs
of orderthree),
and(a,b,c)=(I,I,2) yields
4dyads (labeled digraphs
of ordertwo)
and64 triads
(labeled digraphs
of orderthree).
Structuralproperties
ofgraphs
are often defined asproperties
that are invariant underisomorphism,
and the class ofisomorphic
labeledgraphs
can be
represented by
an unlabeledgraph.
There are 2 undirecteddyads (unlabeled graphs
oforder
two)
and 4 undirected triads(unlabeled graphs
of orderthree),
and there are 3 directeddyads (unlabeled digraphs
of ordertwo)
and 16 directed triads(unlabeled digraphs
of orderi - - 1
three). According
to Frank(1988b)
there aredyads
andtriads for unlabeled colored
multigraphs.
The counts of thesedyads
and triads among all inducedsubgraphs
of order two and three,respectively,
contained in a coloredmultigraph
oforder n are
important
structural statistics. Inexploratory
social networkanalysis
thedyad
and triad counts
might
be convenient summarystatistics,
and underspecial probabilistic
assumptions they
can be shown to be sufficient statistics. See Frank and Strauss(1986),
Frank(1985, 1988a),
and Frank and Novicki(1993).
In order to illustrate the
generality
andflexibility
of the coloredmultigraph,
consider first thecase of a social network
comprising
individuals of two kinds and threesymmetrical binary
relations. This network can be
represented by
a coloredmultigraph
with(a, b, c)=(2,8,1 ).
The2 vertex colors
correspond
to the two kinds of individuals. The 8 undirectededge
colorscorrespond
to thepossible
combinations of occurrence or non-occurrence on each one of the threesymmetrical
relations. Thesingle
color on directededges corresponds
to the absence of anyunsymmetrical binary
relation.As a second
example
consider the case of a social networkcomprising
males and females of three age groups and twobinary
relations. Both the relations describe different kinds ofpairwise
contacts between theindividuals,
and contact intensities arereported
aslow, medium,
orhigh
from each individual to every other individual. Here the social network canbe
represented
as a coloredmultigraph
with(a, b, c)=(6,1,9).
The 6 vertex colorscorrespond
to the combinations of
gender
and age group. Thesingle
undirectededge
color means thatthere is no
symmetrical
relation. The 9 directededge
colorscorrespond
to the combinations ofcontact intensities for the two kinds of contacts.
Finally,
consider the case of a social network defined on individualscategorized
asbeing presently employed
or not, ashaving
ever beenemployed
or not, and asbeing healthy
or not.There is information about
kinship,
about father-sonrelationships,
and about brother and/or sisterrelationships.
Thisexample implies
that astraightforward
cross-classification of the attributesmight
lead tocategorical
combinations that are knownapriori
to beimpossible (structural
zeros among the cross-classificationfrequencies).
Itmight
beadvantageous
toavoid this kind of attribute
redundancy by defining
combined attributes so that the total number of combinedcategories
is reduced. Forinstance,
thestraightforward approach
is todefine three
binary
individual attributes(indicating present employment, previous
orpresent
employment,
andhealthy condition)
and threebinary relationships corresponding
to the twosymmetrical kinship
and brother and/or sisterrelationship,
and oneasymmetrical
father-sonrelationship.
Thisyields
a coloredmultigraph
with(a,b,c)=(8,4,2).
An alternativeapproach keeping
the same information is thefollowing.
Introduce anemployment
attribute with threecategories corresponding
to neveremployed, previously employed only,
andpresently employed. Keep
the health status attribute.Furthermore,
introduce akinship
attribute with fourcategories corresponding
to nokinship,
father-sonrelationship,
brother and/or sisterrelationship,
and other kinds ofkinship.
Thisyields
a coloredmultigraph
with(a, b, c)=(6,1,4). Thus, compared
to the initialapproach
a and b have been reduced but not c.Using
thedyad
formulareported
above it follows that the number ofnon-isomorphic dyads
have been reduced from 544 to 300. A further reduction to 234
non-isomorphic dyads
can beachieved
by modifying
the alternativeapproach
so that the initial father-sonrelationship
iskept
and the initial twosymmetrical relationship
arereplaced by
onesymmetrical relationship
with three
categories.
Thisyields
a coloredmultigraph
with(a,b,c)=(6,3,2).
Generally
it isimportant
to havemutually
exclusive and exhaustivecategories
forvertices,
undirected
edges,
and directededges.
As the lastexample illustrates,
a reduction of b thatimplies
an increase in c isguaranteed
neither to reduce nor to increase the number ofdyads.
3. INDEPENDENT DYAD MODELS FOR SIMPLE GRAPHS
Bollobas
(1985)
and Palmer(1985)
in their extensive accounts of thetheory
of randomgraphs
treatsimple parametric
models and uniform models which have influenced much research ongraphical limit theorems, graphical evolution,
and randomgraph
processes. Such models have notgot
asufficiently
richprobabilistic
structure for mostapplications
and needto be extended to
provide good
fit to statistical network data. Extensionsmight
be mixturedistributions of
simple
models or various kinds ofgeneralizations allowing
morecomplex specifications.
Even for the
simplest
network models the statisticalaspects
deserve some attentions. Network data offer numerouspossibilities
ofvarying
thesampling
and observationprocedures,
andthis
might
result in unconventional statisticalproblems.
Forinstance,
vertexsampling
andinduced
subgraph
observation or different kinds ofcomplete
andpartial
snowballsampling
have been discussed in the literature. See Frank
(1988a)
for references.Here interest is not focused on
sampling
or other sources ofexplanation
for the stochasticdependence prevailing
in netwotk data. Models with different kinds of structuredependence
are
investigated
withoutreferring
to whethersampling,
measurement errors or other sourcesof variation
explain
the randomness.Dyad independence
forsimple graphs
is astarting point.
Consider a finite vertex set
V--{ 1,...,n }
and asimple
randomgraph
on V definedby
its....
( n )
.symmetric adjacency
matrixY=(Yij)
withY=O.
There are 2 2possible
outcomes y ofY,
and under
dyad independence
theprobability
function isgiven by
where pij
is theprobability
ofedge { i, j }.
A convenientreparameterization
is obtainedby introducing
thelogodds 0E4
=logfpij 1 (1-Pj)1
so that J
where ,
is a
normalizing
constant.A
particular
case is thehomogeneous
model with alledge probabilities equal, pij=p,
which iscalled a Bernoulli
(Vp)
model.Setting q=1-p
anda=log(plq) implies that
where is the
edge frequency
of y. Here theedge frequency
minimal sufficient statistic with a binomial
n ,p distribution,
and the maximum likelihood 2estimator of p is
given by
theedge density
An alternative to
homogeneity
isgiven by
thedyad independence
model with amultiplicative edge probability decomposition according
topij
=p#1#,
where~1 ,..., #n
areactivity probabilities
of the vertices and p is a latentedge probability.
A manifestedge
occurs if and
only
if the latentedge
occurs and issupported by
active vertices. Thus then
2edge probabilities
arereplaced by
n+1probabilities p,#i,...,#n
of latentedge
and vertexacitivities. The
probability
function isgiven by
where r is the
edge frequency
as before and yi. =t yij
is thedegree
of vertex 1. The 7=1probability
function is invariant to admissibleparameter changes
that leaveV~ ~
invariantfor
1=1 ,..., n. Identifiability
of theparameters
can be achievedby imposing
the restriction This means that theexpected
numbers of manifest and latentedges
are thesame.
A similar
dyad independence
model is obtainedby assuming
an additivelogodds decomposition according
toctij
=o~+Y~+Y~ .
Thisimplies
that theprobability
function isequal
ton
where yi. is the degree of vertex i
in y.The parameters oc,Y 1 ,...,Y n restricted by Yi= 0
i=1 are identifiable and can be considered as overall and local
specifications
of Y. Thedegrees Y1....,Yn.
of Y are minimal sufficient statistics. Thus this modelmight
bepreferable
to theprevious
model with amultiplicative edge probability decomposition.
4. INDEPENDENT DYAD MODELS FOR SIMPLE DIGRAPHS
A
simple
directedgraph
on V is definedby
itsadjacency
matrixZ=(-7,j)
withZii=0.
Adyad
inducedby
iand j (in
thatorder)
isspecified by (Zij, 7;i).
There are2n~n-1)
outcomesz of
Z,
and underdyad independence
theprobability
function isgiven by
1
where
pij(0,0), p~(OJ),p~(l,0),
andp~j(1,1) are the probabilities of a dyad with no edges
between i
and j,
with anedge from j
to ionly,
with anedge
from ito j only,
and withmutual
edges
between i andj.
It is convenient to introducedyad probabilities pij~k,l)
for alli
and j
andput p#(k,1)=p,jl,k). By denoting ~1,0)=~,
andpt~~(1,1)=b~j
it follows that~(0,0)=1-~-~-~, p~j(0,1)=aj~ ,
and~=~.
Hencewhere
and c is a
normalizing
constantgiven by
A
particular
case is thehomogeneous
model with alldyad
distributionsequal,
that isaij=a
and
bij=b,
or,equivalently, 0E4=OE
andP~=[3
whereIt follows that the
frequcncies
of induceddyads
in Z of size0,1,
and 2 are, multinomial-distrihuted.
If thesefrequencies
are denotedthe maximum likclihood estimators of a and b
are given by â = N1 I n(n-1 )
and~ 1 .- 1
An alternative to
homogeneity
is thedyad independence
model withpartial homogeneity
~i,~=~3.
With no restrictions onocij
there aren(n-1)+1
parameters in the model. If an additiven
decomposition
ofcx’1j
is assumedaccording
toctii + a,; + 3 with ai = 0
and1= 1
n
1 Bi
= 0, then there are 2rr free parameters and sufficient statisticscorresponding
to in-j=1 1
and
outdegrees
and the total mutualadge frequency.
This is the well known model introducedby
Holland and Leinhardt(1981).
An extension is obtainedby relaxing
thepartial
homogeneity Pij=p
to an additivedecomposition according to fi ij
= yt +Yi
+Yj
withn
I Y; _
() This model has ~n-1 free parameters and sufficient statisticscorresponding
toin,
i=1 1
out-, and mutual
degrees
at every vertex, that isfor 1=1 ,...,
n.5. MARKOV DYAD MODELS FOR SIMPLE GRAPHS
The
assumption
of stochasticindependence
betweendyads might
seeminappropriate,
since anetwork is
usually
studied because there is an interest in the links and influences acrossseveral individuals in the network.
Dyad independence
means that all links and influencesinvolving
three or more individuals are the results of random effectsgoverned by
a set offixed values on
dyad parameters.
Therefore the structuralproperties beyond
those ofdyads
are
only indirectly
controlled andmight
fail to fit data. It should bepreferable
to have accessto
parameters retlecting dyad
interactions.The Markov
dyad
models introducedby
Frank and Strauss(1986)
assume that non-incidentdyads
areconditionally independent
but incidentdyads might
bedependent. They
show thatthis
implies
that there areparameters and sufficient statistics
corresponding
totriangles (3-cycles)
and stars. Theprobability
function isgiven by
where the last sum is over distinct
vertices,
and there are(n)
3triangle
parametersT’
(
Bn-1
starparameters
..., forin=2,..., n-1, and ! edge
parameters c,.. Theln / .." ’’" 2
normalizing
constant c is a function of the parameters determinated so that the2 2
probabilities p(y)
sum tounity.
Under
homogeneity
alltriangle
parameters areequal
and the star parametersdépend only
onthe order of the star: ’iik = T and This
implies
thatisomorphic graphs
y getthe
sameprobability
where
is the number of
triangles
in y,(with
distinct vertices in thesum)
is the number of m-stars(stars
of sizem)
in y form=2,...,
n-l,and sl
=2 1
y; is twice the numberof edges
in y.i j
A
simplified
version of thehomogeneity
model assumescrtn=O
for m>2. If theremaining
star parameters are
denoted al =p/2 and 0’2=0’, it
follows thatwhere r,s, t are the
frequencies
ofedges, 2-stars,
andtriangles
in y. This is asimple
modelfor a random
graph
withdependence
between incidentedges.
Inference for this model is discussedby
Frank and Strauss(1986),
Frank(1991),
and Frank and Nowicki(1993).
Without
assuming homogeneity
butassuming
that the starparameters
are 0 for all stars of size 3 or more, the model hasn
2 parameterscrij
y forij,
. n( 2 P n-1 parameters cr ijk
forjk,
andn }
parameters’Cijk
fori jk.
One way ofsimplifying
this model is to make 3additive
decomposition assumptions according
towith restrictions
Zoei
=Zfii
=ZYi
= 0. This model has 4n-1parameters.
Further feasiblesimplifications
are « 1 = « ,a~ _ (3 ~ = Yi, oci = 0, ~i 1= o, or Yi = 0.
The vertexparameters
Mi,
fii, Yi,
«1 can be considered ascontrolling
foredges, 2-paths, 3-cycles,
and 2-stars at vertex i.(A
2-star at i withedges to j
and k is also a2-path at j
and a2-path
atk.)
6. MARKOV DYAD MODELS FOR SIMPLE DIGRAPHS
In the directed case, conditional
independence
for non-incidentdyads implies
that a randomdigraph
withadjacency
matrix Z has aprobability
functionwhere the sum is over all
adjacency
matricesof subgraphs of z
and8(a)=0
unless a is anadjacency
matrix of any of thefollowing graphs
on V : asingle edge,
a2-cycle,
a star oforder 3 or more, a
triangle.
The parameter6(a)
is denotedp~?
for asingle edge, p~2~
for a2-cycle, 6~k’l~
for a star of order m+1 with centerio
andedges
fromio
to the first k vertices among/..., i,,,
i andedges
toio
from the last 1 vertices among/1..., i ,
,n andT(-)
ijk for atriangle
oftype
m. There aretriangles
of 7 types.Types
1 and 2 have size 3.Types 3, 4,
and 5 have size 4.
Type
6 has size 5 andType
7 has size 6.Type
2 is a3-cycle. Type
3 has avertex with two
outedges
andType
4 has a vertex with twoinedges.
In total there aren(n-1)
edges, ( n ) B
22-cycles, n(n-1 )3,n
Y in m-stars(stars ( )
of orderm+ 1)
form=2,...,n-l,and
> > >27 ( n
3
triangles.
The 27triangles
on each fixed set of three vertices consist of 6+2+3+3+6+6+1triangles
of the 7types.
Under
homogeneity
theparameters
are denotedand it follows that
where rl
and r2 are thefrequencies
ofedges
and2-cycles
in z, sklm is thefrequency
ofm-stars
having k outedges
and 1inedges in z, and tm
is thefrequency
oftriangles
ofType
m in z. The number of
parameters
and sufficient statistics is5+ n+2 .
3Only
15parameters
remain if star
parameters
are set to 0 for m>2. In this case the sufficient statistics are thenumbers of
edges, 2-cycles,
2-stars of sixkinds,
andtriangles
of seven kinds. Anequivalent
set of sufficient statistics is the set of triad counts. This set contains 16 kinds of
triads,
andtheir counts sum to
( ).
Thus the triad counts are sufficient statistics if we assume3
homogeneous
Markovdyads
with noparameters
for stars of order 4 or more.By dropping homogeneity
andassuming
additivedecompositions
of theparameters
it ispossible
to obtain models thatmight
be of interest as alternatives to the Holland-Leinhardt model.Assuming
.11 ’B
with
I:mi = ~ (3 j
= 0together
withpartial homogeneity
or zero values on other parameters shouldgive interesting
alternatives.7. CONDITIONAL INDEPENDENCE MODELS FOR SIMPLE GRAPHS AND DIGRAPHS
Consider first the
adjacency
matrixY=(Yij)
of asimple
randomgraph
on V. Definedijkl
equal
to 0 or 1according
to whether or notYij
andYkl
arestochastically independent
conditional on the rest of
Y,
that is conditional on all elements of Yexcept Y~j, Yj~, Ykl, F~.
Obviously dijkl=dklij
for alli, j, k,
1.Moreover, d~jkl=0
ifi j
ork=l,
andd#j=1
ifi~j.
Let
a=(a#)
be theadjacency
matrix of asimple graph
on V. There are2~ 2 ~
such matrices.Define a function
8 (a)
on the class ofadjacency
matrices in such a way that8 (a)=0
unlessfor all
i, j, k,
1. Then theprobability
function of Y can be shown to begiven by
where c is a
normalizing
constant determined so that thep(y)
sum to 1 for alladjacency
matrices y of
simple graphs
on V. There are terms in theexponent
for allsubgraphs a
of yhaving 8 (a)~0.
These termscorresponds
tosubgraphs a
that are restrictedby
the numbersin
d=(dijkl).
We can consider d as anadjacency
matrix of agraph
on V2 withloops
at all(1, j)e
V2 with1#j.
Thisgraph
is called thedependence graph
of Y. Ifa=(aij)
is consideredas an indicator of a subset of
V2,
the restriction on a in terms of d means that a shouldcorrespond
to aclique
of thedependence graph
d.Consider now the
adjacency
matrixZ--(Zij)
of asimple
randomdigraph
on V.Define d~ ~kt
equal
to 0 or 1according
to whether or not thedyads (Z#,Z,1)
and(Zkl, Z tk)
arestochastically
independent
conditional on the rest of Z.Proceeding
as above we define a function0 (a)
foradjacency
matricesa=(aij)
ofsimple digraphs
onV, such that 0(a)--O
unlessfor
all i, j, k,
1. It follows thatfor any
adjacency
matrix z of asimple digraph
on V.Again
the relevantparameters correspond
to subsets ofV2
that arecliques
of thedependence graph
d.8. CONDITIONAL INDEPENDENCE MODELS FOR COLORED MULTIGRAPHS
The conditional
independence
models forsimple graphs
anddigraphs
considered in theprevious
three sections can be extended to coloredmultigraphs by applying
conditionalindependence specifications
to moregeneral
multivariate random variables. The booksby
Whittaker
(1990)
and Edwards(1995)
discuss conditionalindependence modelling
in ageneral
context. The field is known asgraphical modelling
not because network data is ofconcern but because
graphs
are used torepresent
multivariate models.Consider a random colored
multigraph given by
a vectorX=(Xi) having
an outcomes ofvertex
colors,
asymmetrical
matrixY=(Y..) having b(2 )
outcomes of colors of undirectedvertex
colors,
asymmetrical
matrixY=(Yij) h a vin g b ( n ) 2
outcomes of colors of undirectededges,
and a matrixZ=(Z;ij) having cn(n-1)
outcomes of colors of directededges.
There arevarious
possibilities
ofspecifying
theprobabilistic
structure of(X, Y, Z).
One way is to condition on X and consider the
n
2dyad
variables(Y ij’ Zij, Zji)
forij
conditional on X. These variables are chosen to be the basic variables for which a
conditional
dependence
structure needs to be defined. An alternative is to consider the 3n
2variables
YU’ Zg, Zji
forij separately
conditional on X as the basic variables.Another way is to consider
the n
2dyad
variables(Xi Xj,
J JZji)
forij
as the basicvariables with a conditional
dependence
structure. Here all incidentdyads
arecertainly dependent
via their common vertex colors. An alternative is to consider all n+32
2variables
separately
as the basic variables with a conditionaldependence
structure.However,
the
symmetry
of theapproach
withdyad
variablesmight by
useful.A convenient choice in any
particular application
should be a way that offers asimple
conditional
dependence
structure. In addition tospecifying
a conditionaldependence
structure, there is also need for a number of parameters. This number
depends
notonly
on thenumber of basic variables but also on the number of outcomes of these variables.
In
general,
the conditionaldependence
structure of a set of randomcategorical
variableson V2 is
given by
adependence graph
onV2
withadjacency
matrix LetA be a subset of
V2
and thecorresponding
matrix of indicatorsaij
that are 1 or 0according
to whether or not(i, j)
E A. Such a subset A is aclique
of thedependence graph
ifand
only
ifdijkl
forall i, j, k,
1. Theprobability
function of W can begiven by
where c is a
normalizing
constant, and the functions8A
areidentically
zero for subsets A that are notcliques
of thedependence graph.
Moreover,9A(w) depends
onwj
for(i, j)
E Aonly,
and for allA,
w, and e A.By introducing
the submatrixwhich is w with
wi j replaced by
0 if it follows that9. NETWORK MODELLING
Network
modelling
canhelp
inunderstanding
orpredicting
network behaviour. Like in allmodelling, prior knowledge
is confronted withempirical facts,
and it is notalways
clearwhether observed
discrepancies
between model andreality
should call for a minor model modification or amajor change
to another class of models. Since no model isperfect
andthere
always
are statistics that do not follow the modelpattern,
it isgood practice
to beprepared
for future modelbuilding by collecting
also suchgeneral descriptive
statistics thatare not needed for
estimating
the model in use.Compositional
and relational structure should be reflected among suchgeneral
network statistics.Composition
statistics aremainly
varioussubgraph
counts: vertex counts,dyad
counts, triad counts, etc. Structure statistics are for instancedistances, reachability,
andconnectivity
of various kinds.Composition
refers to howmany elements of different kinds that make up the
network,
and structure refers to macroproperties involving
more than localproperties.
Theinterplay
between micro and macro,between
composition
and structure is theobject
of networkmodelling.
It is
possible
todistinguish
a few broad classes of network models that can be described asfollows.
Loglinear
models are models obtained for instanceby
conditionalindependence assumptions
as illustrated in this paper. It is also
possible
toformally apply
theloglinear
methods forcontingency
tableanalysis
tocategorical
counts ofvertices,
undirectededges,
and directededges
even ifappropriate independence assumptions
are not met. It istempting
to useeasily
available statistical
computer packages
for this kind ofanalysis
even if it is not yettheoretically justified
or discredited.Mixture models are models that express the
probability
function as aweighted
average of afamily
ofsimple probability
functions with unknownmixing weights. Usually
the number ofcomponents
in themixing
distribution is also unknown. Mixture models are often hard toestimate,
and mixture models for networks should be noexception.
See Frank(1989).
Block models refer to network models with structure
parameters
thatdepend
on individualsthrough
some kind of individualcategories only.
Thus the vertex set ispartitioned
intodisjoint
and exhaustivecategories,
and there is apartial homogeneity
within and between thevertex
categories.
Block models with randomcategories
can be considered as aspecial
kindof mixture models. For block model
testing,
see for instance Wellman et al.(1991).
Metric models for random colored
multigraphs
have a distance defined on the set ofoutcomes. The
probabilities
of the outcomes have a maximum at a certain centralgraph
anddecrease with
increasing
distance from thisgraph.
Such models areespecially appropriate
ifthe random variation is due to measurement or observation errors, and there is a fixed unknown colored
multigraph
to be estimated. See Banks andCarley (1994).
10. DATA ANALYTIC METHODS
Network
modelling
is oftensimplified
ifhomogeneity
can be assumed. Aprimary
concem inexploratory
networkanalysis
is therefore todecompose
the network into morehomogeneous
subnetworks. One way of
doing
this isby using
localdyad
counts, that is the number ofdyads
of different kinds at each vertex. The local
dyad
counts in coloredmultigraphs
can besubjected
to a clusteranalysis
in order to find out the similarities and dissimilarities between the vertices. In favourable cases,only
a few different kinds of vertices need to bedistinguished,
and separatemodelling
can be tried out within and between the clusters. Forfurther,discussion
and illustration of this method reference isgiven
toFrank, Komanska,
and Widaman(1985)
andFrank, Hallinan,
and Nowicki(1985).
Regression
methods are useful toexplain relationships
between variables. Withcategorical
variables
regression
and classification treesmight
be moreappropriate
than methods based onnumerical scales. A
general
reference is Breiman et al.(1984),
and anapplication
tograph
data is
given by
Frank(1986).
Ordering dyads, triads,
andhigher
order inducedsubgraphs according
todecreasing frequencies might give
someinsight
into the network structure.Finding
thelocally
mostcommon
dyads, triads,
etc. can for instance reveal that some individuals are central orspecial
in other
respects.
This can be useful if parametersvarying
with the individuals are relevant in the model.Dyad
and triad counts are also useful forcomparing
networks and fordetecting
structural
changes. See,
forinstance,
Frank(1987).
BIBLIOGRAPHY
BANKS,
D. andCARLEY,
K.,(1994),
"Metric inference for socialnetworks",
Journalof Classification 11,121-149.
BOLLOBAS,
B.,(1985),
RandomGraphs, London,
Academic Press.BREIMAN,
L.,
FRIEDMAN,J.H., OLSHEN, R.A.,
andSTONE, C.J., (1984), Classification
and
Regression Trees.,
Belmont, Wadsworth.EDWARDS, D., (1995),
Introduction toGraphical Modelling,
NewYork, Springer-Verlag.
FRANK, O., (1981),
"A survey of statistical methods forgraph analysis", Sociological Methodology, 1981,
editedby
S.Leinhardt,
SanFrancisco, Jossey-Bass,
110-155.FRANK, O., (1985),
"Random sets and randomgraphs",
Contributions toProbability
andStatistics in Honour
of Gunnar Blom,
editedby
J. Lanke and G.Lindgren, Lund, 113-120.
FRANK
O., (1986), "Growing
classification andregression
trees on networkdata",
Classification
as a Toolof Research,
editedby
W. Gaul and M.Schader, Amsterdam,
North-Holland, 137-143.
FRANK, O., (1987), "Multiple
relation dataanalysis", Operations
ResearchProceedings
1986,
editedby
H. Isermann etal, Berlin-Heidelberg, Springer-Verlag,
455-460.FRANK, O., (1988a),
"Randomsampling
and social networks - a survey of variousapproaches",
invited paperpresented
at Centre International de RecontresMathématiques, Marseille-Luminy, 1987, Mathématiques, Informatique
et Scienceshumaines , 26, 104,
19-33.
FRANK, O., (1988b),
"Triad countstatistics",
DiscreteMathematics , 72, 141-149.
FRANK, O., (1989),
"Randomgraph mixture",
Annalsof the
New YorkAcademy of Sciences, 576, 192-199.
FRANK, O., (1991),
"Statisticalanalysis
ofchange
innetworks",
StatisticaNeerlandica, 45,
283-293.
FRANK, O., HALLINAN, M..,
andNOWICKI, K., (1985), "Clustering of dyad
distributionsas a tool in network
modelling",
Journalof Mathematical Sociology, 11,
47-64.FRANK, O., KOMANSKA, H..,
andWIDAMAN, K., (1985),
"Clusteranalysis
ofdyad
distributions in
networks",
Journalof Classification, 2, 219-238.
FRANK,
O. andNOWICKI, K., (1993), "Exploratory
statisticalanalysis
ofnetworks",
Annals
of Discrete Mathematics , 55,
349-366.FRANK,
O. andSTRAUSS, D., (1986),
"Markovgraphs",
Journalof
the American StatisticalAssociation , 81,
832-842.HOLLAND,
P. andLEINHARDT, S., (1981),
"Anexponential family
ofprobability
distributions for directed