O VE F RANK
Random sampling and social networks. A survey of various approaches
Mathématiques et sciences humaines, tome 104 (1988), p. 19-33
<http://www.numdam.org/item?id=MSH_1988__104__19_0>
© Centre d’analyse et de mathématiques sociales de l’EHESS, 1988, tous droits réservés.
L’accès aux archives de la revue « Mathématiques et sciences humaines » (http://
msh.revues.org/) implique l’accord avec les conditions générales d’utilisation (http://www.
numdam.org/conditions). Toute utilisation commerciale ou impression systématique est consti- tutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
RANDOM SAMPLING AND SOCIAL NETWORKS A SURVEY OF VARIOUS APPROACHES
Ove FRANK1
1. GRAPH SAMPLING
In
statistics,
the concept of randomsampling
is used to describe selectionprocedures
that aregoverned by probability
laws.Thus,
two-stagesampling,
clustersampling,
Bernoullisampling,
and
simple
randomsampling
areexamples
of selectionprocedures
where the randomness is controlledby
thedesign.
Random
sampling
is also used to describe selection schemes influencedby
a series ofdifferent causes that are considered to be either irrelevant or
impossible
tocontrol,
but that aresupposed
to besufficiently
well describedby
someprobabilistic
model. Such selectionprocedures
controlledby
"nature" are often assumed toproduce
randomsamples
from aprobability
distributionbelonging
to somespecific parametric family
of distributions.Both
design-controlled
and the model-based randomsamples
are encountered in all areas ofapplied
statistics.Generally speaking,
statisticalanalysis
is not confined todesigns
or modelsbut is
applied
to a combination thereof. The randomsample
consists of observations from aprobability
distribution that contains controllabledesign
parameters as well as other parametersmodelling
the data.The
analysis
of observations of social networks and other systems ofinterrelationships
between some basic units
requires
randomgraphs
and statisticalprobability
distributions on setsof
graphs.
Randomgraph theory
hasdeveloped rapidly
since the appearance of the classical papersby
Erdos andRényi (1959, 1960).
Themonographs by
Bollobas(1985)
and Palmer(1985) give
excellent accounts of this field of research.Statistical
questions
related tograph sampling
cannot be handled without access toparametric
or non
parametric
families of distributions of randomgraphs.
Uniformprobability
distributionsover a set of
graphs
of fixed order and size areusually
notappropriate
for statisticalapplications.
A Bernoulli
graph
with a commonedge probability assigned
toindependent edge
occurrences isusually
not richenough
to modelempirical
data. A richerparametric family
or an even richernonparametric family
ofprobability
distributions isgenerally
needed as anappropriate
statisticalgraph
model.1 Department of Statistics, University of Stockholm, S-106 91 Stockholm, Sweden.
To define a statistical
graph model, simple
randomgraphs
can be used as components of mixture distributions and parts of latent structure combinations of various kinds. Some illustrations of this aregiven
in the next section. More elaborate random colorations andmultigraphs
are described in Section3,
and severalexamples
illustrate the use ofrandomly
colored
multigraphs.
This model focuses interest on the simultaneous consideration ofcomposition
and structure of a network. The idea ofseparating composition
and structure anddistinguishing
between whether or notthey
are random is used toclassify
various models.Section 4
gives
a survey of somemultiparametric
randomgraphs
introducedby
Holland and Leinhardt(1981),
Frank and Strauss(1986)
and others. Inparticular,
the conditionaldyad independence assumption
iscompared
to the Markovdependence assumption
at somelength.
Section 5 is devoted to random
graph
models in which theemphasis
is onsampling design.
Puredesign-based
inference as well asBayesian superpopulation approaches
are discussed.Finally,
Section 6
gives
someconcluding
views on research trends andprospects
in statisticalgraph theory.
2. SIMPLE RANDOM GRAPHS
This section describeç some basic random
graphs
andgives
a few illustrations of how to obtainmore elaborate models
by combining
such randomgraphs
to obtain mixtures and other latentstructures.
A
simple graph
g on a vertex set V can be identified with itsedge
set, thatis,
a subset of theset
V2
ofpairs
of vertices. Ifloops
are notallowed,
g is a subset of the setV(2)
ofpairs
ofdistinct vertices. If all
edges
areundirected,
g issymmetric
in the sense that(i,j)
E gimplies
that’,i)
E g. Undirectedgraphs
g can also be considered as subsets of the union of V and the set(V 2 )
2of the unordered
pairs
of distinct vertices. Ifloops
are notallowed,
g issimply
a subset of(V).
2The cardinalities of V and g are called the order and size of the
graph.
Let G be a
family
ofsimple graphs
defined on V. Set IVI=n. If G is the set of undirectedgraphs
of order n and size rhaving
1loops,
G containsgraphs.
If G is the set of directedgraphs
of order n and size rhaving
1loops,
G containsgraphs.
A randomgraph
Y in G is agraph
chosen from Gaccording
to aprobability
distribution. The
probability
function of Y is denotedby
If
p(y)
=1/IGI
for y EG,
Y is said to beuniform
on G. Forinstance,
a uniform random undirectedgraph
of order n and size rhaving
noloops
hasp(y) = 1/(.)
r where n’ =(n),
2 anduniform random undirected
graph
of order n and size rhaving
any number ofloops
hasp(y) =
1/(f)
where n’ =(n+1),
1/(
r w ere n = 2Let m denote the maximum number of
graphs
inG,
no two of which areisomorphic.
ThenG is the
disjoint
unionG1
U ...UGm
of m classes ofisomorphic graphs.
If Y has aprobability
distribution on G that is invariant under
isomorphism,
there areprobabilities
pl,...,pm such thatp 1+...+pm
= 1 andp(y)
=Pif IGil
for y EGi
andi=l,...,m.
If furthermore pi = 1/m fori=l,...,m,
Y is said to beisomorphically uniform
on G. Forinstance,
anisomorphically
uniform random undirected
graph
of order 4 and size 3having
noloops
hasp(y)
= 1/36 if y is apath,
andp(y) = 1/12
otherwise. Thecorresponding
uniform randomgraph
hasp(y) = 1/20
forthe same
graphs
y. Table 1 illustrates this and some othercomparisons
between uniform andisomorphically
uniform distributions.Generally,
uniform andisomorphically
uniformdistributions on G are
equal
if andonly
if any twographs
in G have the same number ofisomorphisms.
Table 1. Three
examples
of a uniform and anisomorphically
uniform distribution.Let g be a fixed
graph
onV,
and let Y be obtained from gby keeping
ordeleting
itsedges independently
withprobabilities
p and q=1-p, respectively.
Then Y is said to be a Bernoulli(g,p) graph
or a Bernoulli(p) subgraph
of g. Here(2g
denotes the set of all subsets ofg).
Inparticular,
a Bernoulli(p) subgraph
of thecomplete
undirected
graph
g= (V 2) 2)
is one of the randomgraphs
mostfrequently investigated ;
with p =r/(2),
it is often used as anapproximation
to the uniform random undirectedgraph
of order n andsize r
having
noloops.
For p= 1/2,
the Bernoulli(p) subgraph
of g is a uniform randomgraph
on the set of
subgraphs
of g. Theprobability
distribution of any Bernoulligraph
is invariant underisomorphism.
If a directed random
graph
Y on V ={ 1,...,n }
is obtainedby selecting independently
andrandomly
at each vertex i E V a number ai of vertices in V to bejoined by edges
fromi,
then Y is said to be a random(al,...,an) - mapping from
V.Analogously,
a random(bl,...,bn) - mapping
to V is definedby selecting independently
andrandomly
at each i E V a numberbi
ofvertices in V to be
joined by edges
to i. Inparticular,
a random( 1,...,1 ) - mapping
from V issimply
called a randommapping
on V.Let
G(al,...,an)
be the set of directedgraphs
on Vhaving
aç verticesjoined by edges
from ifor
i=l,...,n.
Thus a random(al,...,a,,,) - mapping
Y from V is a uniform randomgraph
onG(al,...,3n) having probability
functionIt is a
isomorphically
uniform if andonly
if all the ai areequal.
A random(al,...,an) - mapping
from V with no
loops
isanalogously
found to be a uniform randomgraph
on the setGo(ai,...,an)
0 1containing (n-1)...(n-1)
al an directedgraphs
on Vhaving a,
vertices other than ijoined by edges
from i fori=l,...,n.
If Y is a random
(a1,...,an) - mapping,
and Y’ is obtained from Yby reversing
all theedges,
the intersection
graph
Y ri Y’ consists of allpairs
of vertices that arejoined
both ways inY,
andthe union
graph
Y U Y’ consists of allpairs
of vertices that arejoined
in at least one direction inY. The undirected
graphs corresponding
to thesymmetric graphs
Y fl Y’ and Y U Y’ are calledthe
strongly
andweakly symmetrized
versions ofY, respectively.
Inparticular,
astrongly symmetrized
randommapping
consists of a number of isolatedloops
andedges corresponding
to the
loops
and2-cycles
in the randommapping.
Let
Yl,...,Yk
be randomgraphs
in G withprobability
functionswhere
i=l,...,k.
A randomgraph
Y is said to be a mixture ofY1,...,Yk
withmixing
probabilities 81 ,...,8k
if theprobability
function of Y isgiven by
Here
8~+...+8~
= 1. Inparticular,
consider a mixture of thestrongly
andweakly symmetrized
versions of a random
mapping
on V ={ 1,...,n ~
withmixing probabilities
8 and 1-8. This is arandom undirected
graph consisting
of connected components that are trees orsingle loops
orcycles
with or without trees attached to them. Alarge
value of 8implies
that isolatedloops
andedges
are common, while a small value of 8implies
that trees and attached trees are morefrequent.
Atypical
realization of such a mixture is shown inFigure
1.Figure
1. Atypical
realization of a mixture of thestrongly
andweakly symmetrized
versions of a randommapping.
Identifiability problems
forgeneral
mixtures are discussedby Titterington,
Smith and Makov(1985).
Frank(1986b)
discusses conditions on the parameters that guarantee theidentifiability
of a mixture of Bernoulli
graphs.
3. RANDOMLY COLORED MULTIGRAPHS
Simple graph
structures are not richenough
for many statisticalapplications involving
relationaldata. For
instance,
arelationship
likekinship
can be furtherspecified
assibling,
parent, etc. Arelationship
likesimilarity
can be furtherspecified by
somequantitative
measure of thedegree
ofsimilarity
orby
someseparation
of different aspects ofsimilarity.
Communication networks may need to havecapacities
or distances attached to theedges. Sociograms
and other contact patternsare often more
interesting
if some information about the individuals is also available.Consider several different
relationships
to be studiedsimultaneously
and inconjunction
withobservations on both individual and relational variables. One or more individual attributes can be combined into a vertex
variable,
and one or more relational attributes can be combined into anedge
variable. It is convenient to refer to the outcomes of these variables as colors andspeak
ofvertex and
edge
colorations.Only finitely
many colors are considered here.Symmetric
andasymmetric relationships
aredistinguished
so that thegeneral
structure is that of a coloredmultigraph consisting
of acomplete
undirectedgraph Ci)
2 and acomplete
directedgraph V(2)
defined on a common vertex set V
= 1,...,n}.
Forconvenience,
directededges
are now calledarcs. There are three random colorations : a vertex coloration that is a function X from V to
A,
an
edge
coloration that is a function Y from(V 2 )
2 toB,
and an arc coloration that is a function Z fromV(2)
to C. HereA,B
and C are color sets ofa,b
and ccolors, respectively. Consequently,
the set G of all
possible
coloredmultigraphs
containsoutcomes of the
randomly
coloredmultigraph (X,Y,Z)
on V. Threeexamples
of sets G areshown in
Figures
2-4.Figure
2 shows thegraphs
of order n=3 with a=l vertexcolor,
b=3edge
colors
(solid, dotted, nothing),
and c=l arc color(nothing). Figure
3 shows thegraphs
of ordern=3 with
a=l,
b=4corresponding
to thepossible
combinations of solid and dottededges,
andc=1.
Figure
4 shows thegraphs
of order n=3 witha=2, b=2,
and c=1. The number at eachgraph
is the number ofisomorphisms.
The
probability
function of(X,Y,Z)
is denotedby
where x =
(xi,...,x~),
y =(Yij : 1 1 j n), z = (Zij :
i :1=j)
are elements ofAn, B ,
andcn(n-1), respectively.
If theprobability
distribution is invariant underisomorphism,
and if thereare m
isomorphism
classesG;,
then there areprobabilities
Pi such that pi+~.+pm =1 andDistributions that are invariant under
isomorphism
are ofparticular
interest formodelling phenomena
thatdepend
not on the identities of the vertices butonly
on their colors. The combinatorialproblem
ofcounting
the number m ofnon-isomorphic graphs
can be solvedby
anapplication
of Burnside’s lemma(see Frank, 1986a)
and for n=2 and n=3 thefollowing
formulae
apply :
-
In
particular,
the last formula checks with the number ofgraphs
inFigures
2-4.Figure
2.Graphs
of order 3 with threeedge
colors.Figure
3.Graphs
of order 3 with fouredge
colors.Figure
4.Graphs
of order 3 with two vertex colors.Assume now that the
probability
distribution of(X,Y,Z)
is invariant underisomorphism.
Letthe colors be labeled
by integers
so that A ={ 1,...,a },
B ={ 1,...,b }
and C ={ 1,...,c } .
SetNi equal
to the number of vertices v E Vhaving Xv
=i,
setRljk equal
to the number ofedges { u,v }
E(v) 2 having X
=i, Xv j, Y = k (where Y = Y~ by définition),
and setSijk1 equal
to the number of arcs
(u,v)
EV(2) having Xu
=i, Xv
=j, Zuv
=k, Z~u
= 1. The vertex colorfrequencies (N1,...,Na)
are summary statistics of thecomposition
of thegraph,
and theedge
and arc color
frequencies (R;jk :
k =1,...,b)
and(S;jkl :
k =1,...,c ;
1 =1,...c)
between and within different vertex colors 1 ij
a are summary statistics of the structure of thegraph
conditional on the
composition.
Herewhere
Nil
=(1~’)
andNii
=NiNj
for i~ j.
Thecomposition
and structure summary statistics aresufficient statistics if the
edge
and arc colors(Y uv’ Zuv, z,u)
areindependent
for distinctpairs {u,v}
E(v)
2 conditional on the vertex colors(Xi,---,Xn).
Anexample
of such a model isdescribed in the next section.
It is
possible
toclassify general
randomgraph
modelsby distinguishing
between random and non-randomcomposition
and structure. Table 2 illustrates thistypology.
If both thecomposition
and the structure arenon-random,
thegraph
is deterministic and not of concernhere. If the
composition
is random but the structure isnon-random,
thegraph
can be consideredas a model for a random process on the vertices of a fixed
graph.
Randompercolation
processeson a lattice medium and random adhesion processes on the surface of a
polyhedron
are of thistype ; Markov
chains, Ising
models and moregeneral
Markov fields are otherexamples.
Suchmodels are
typically
concerned with someprobabilistic
processgoing
on in some fixedenvironment
having
aspecific neighborhood
structure.If the
composition
is non-random but the structure israndom,
thegraph
can be a classicaluniform random
graph
or a Bernoulligraph.
Also more elaborate statisticalgraph
models like theones in the next section
belong
to this type aslong
as there are no vertex colors oronly
deterministic vertex colors. Such models are
typically
concerned withprobabilistic interrelationships
or interactions between the units in a fixed set.If there is some kind of
probabilistic interdependence
between thedevelopment
of a process and the characteristics of the environment in which itarises,
we need a model in which bothconlposition
and structure are random. Several of the models in the next section are of this type.Table 2. Classification of random
graphs according
tocomposition
and structure.4. MULTIPARAMETRIC MODELS
Various attempts have been made to find a
multiparametric
model for social network data that is of sufficientgenerality
to fitreasonably
well in many different contexts ; see, forinstance,
Burt(1982)
and Knoke and Kuklinski(1982).
Animplicit
orexplicit
constraint in somemethodological
papers seems to be that the model should also be easy tohandle, preferably
viastandard
packages
for linear statistical models or othereasily
accessible programs. This doublegoal
ofgenerality
and readiness isgood
if itbrings empirical findings
and feedback fromwidespread applications.
Otherwise it mayjust
be toolimiting
and more of an obstacle to furtherdevelopment
ofspecific
models.This section surveys some of the
approaches
in the literature which aremainly exploratory
inpurpose and aim at
providing general
dataanalytic
tools for network data. A few more restricted models are also mentioned.Holland and Leinhardt
(1981) investigate
a classof probability
distributions for networks thatbelongs
to theexponential family
and contains parameters that can beinterpreted
as contactintensity,
individualattraction,
individualactivity,
and mutual contactintensity.
The basicassumption
is that contacts areindependent
between differentpairs
of individuals. Morespecifically,
let Y be a randomgraph
in the set G of directedgraphs
of order nhaving
noloops
or
multiple edges.
The vertex set is V= ~ 1,...,n},
and Y is considered either as a random subset ofV(2)
or as a randomadjacency
matrix ofedge
indicatorsYuV
UV for(u,v) E V(2).
The(n)
2random
dyads (Yuv, Yvu)
for 1 u v n areindependent,
and theprobability
function of Y isaccordingly given by
aproduct
Its
logarithm
isequal
toHere PUy
(1,0)
= Pvu(0,1 )
for u > v. The mostgeneral
model of this type wouldrequire
3(n)
parameters, but the Holland-Leinhardt model reduces this number to 2n
by assuming
where 1 0152v = L Py,
andÀuv
is anormalizing
constant.Holland and Leinhardt
(1981)
andFienberg, Meyer
and Wasserman(1985)
extend theprevious
modelof pure
structure to situations in whichcomposition
data are alsoavailable ;
thatis,
when thevertices
are colored todistinguish
between variouscategories
of vertices. LetX.
bethe color of vertex v and
Yuv
an indicator ofedge (u,v),
as before. Theprobability
function ofthe
graph (X,Y) having composition
X =(X1,...,Xn)
and structure Y =(Yuv :
u ~v)
can begiven
asif it is assumed that the vertices are colored
independently
andthat,
conditional on the vertexcolors,
thedyads
haveprobabilities
thatdepend only
on the colors of the two vertices in thepair,
and
dyads
areindependent
for differentpairs
of vertices.Now,
the numberof parameters
can berestricted
by
suitableparameterizatzon
so that it increases with the number of colorsonly
and notwith the number of vertices. This is a convenient way to avoid the
complications
due toincreasing degrees
of freedom that make it difficult to evaluateasymptotic
distributions forlarge graph
orders n.White,
Boorman andBreiger (1976), Arabie,
Boorman and Levitt(1978)
and others haveanalyzed
block models that aim at anintegrated representation
ofcomposition
and structure. The vertices can be colored todistinguish
between different blocks of vertices. Theblock-modelling approach
is combinatorial and notprobabilistic,
and its purpose is to find a blockcomposition
that in a combinatorial sense
gives
anoptimal
structure between and within the blocks.Stochastic
block-modelling
is similar to themodelling
ofrandomly
coloredgraphs.
Anearly
reference is
Holland, Laskey
and Leinhardt(1983)
and a more recent one isWang
andWong ( 1987).
The
approach
of Frank et al.(1986)
and Wellman et al.(1988)
considers colored verticesand colored
edges
and allows the order of thegraph
to be random. Thetypical application
is toan individual’s social support network
consisting
of intimate kins and friends and some kinds ofinterrelationships
between them. In order to include various attributes of theindividuals, relationships,
and networks in astudy,
we need models of theinterdependence
betweencomposition
and structure. The order of the network is taken as a truncated Poisson variable.Conditional on the
order,
the individual attributes aresupposed
to beindependent
for different members of the network. Conditional on the individualcomposition,
the attributes ofrelationships
aresupposed
to beindependent
for differentdyads.
This leads to aloglinear
modelthat can be fitted
by
astepwise testing procedure.
Another
exploratory technique
forinvestigating
network data has beenapplied by Frank, Hallinan,
and Nowicki(1985)
andFrank, Komanska,
and Widaman(1985).
The main idea here is to use clusteranalysis
to reduce the number ofdyad
distributions between and within vertexcategories.
Thistypically
leads to a substantial reduction of the initial number of parameters needed when all thedyad
distributions are distinct. Anadvantage
with thisapproach
is that noassumptions
are needed about theparametric
form of thedyad
distributions. The final model may contain a fewquite general dyad
distributions that refer to a coarser classification into vertexcategories.
As an
example
of a morespecific
model of theinterdependence
betweencomposition
andstructure of a
network,
let us now consider thefollowing
modelinvestigated by
Frank andHarary (1982).
The vertices in V= ~ 1,...,n}
areindependently
coloredaccording
to aprobability
distribution P1,...,Pr on r colors. An initialgraph
is formedby joining by edges
allthe vertices of the same color.
Edges
between vertices of the same color can be deleted with aprobability
a, and newedges
between vertices of different colors can be inserted with aprobability ~.
All the deletions and insertions aremutually independent
andindependent
of thevertex coloration. The final
graph
ofremaining
initialedges
and inserted newedges
is observed.This
graph
Y can be described as the union of agraph
that is Bernoulli(X,1-«)
and agraph
thatis Bernouilli
(X, p)
where X is thegraph
on V obtainedby joining
vertices of the same colorand
X
is itscomplement.
This model can be considered
appropriate
for a set of nobjects exhibiting
a latentclustering
into r clusters of similar
objects. Similarity
cannot,however,
be observed without error. There is aprobability
a ofobserving
a falsedissimilarity
and aprobability
ofobserving
a falsesimilarity.
If there is an unknown number r ofequally likely
clusters so that pi= 1/r,
then thereare parameters r,
a, P
that must be estimated to fit this model to data. Frank andHarary (1982)
discuss this and similar estimation
problems.
Further related material can also be found in Frank(1978a).
The models discussed so far have a common
assumption
ofindependence
betweendyads
conditional on the vertex colors.
Thus,
ingeneral
there can bedependence
betweenedge colors,
but this
dependence
is thenexplained by
the vertex colors. This means that structuraldependence
notexplicable by
thecomposition
cannot be handledby
these models. A kind of modelsgovemed by
a moregeneral
structuraldependence
is the Markovgraphs investigated by
Frank
(1985)
and Frank and Strauss(1986).
In a Markovgraph,
the colors of anypair
of non-incident
edges
areindependent
conditional on the colors of all otheredges,
but the colors of apair
of incidentedges
could beconditionally dependent.
A Markovgraph
with aprobability
function that is invariant under
isomorphism
is called ahomogeneous
Markovgraph.
Theminimal sufficient statistics of a
homogeneous
Markovgraph
Y aregiven by
the star andtriangle
counts, that is
by
thefrequencies
ofsubgraphs
of Y that areisomorphic
to various distinct starsand
triangles.
A star at center v E V isspecified by
the colors of all the verticesadjacent
to vand of all the
edges
incident to v. Atriangle
attu,v,wl
E(j)
isspecified by
the colors of these vertices and alledges joining
them. Stars andtriangles
are calledsufficient subgraphs
of aMarkov
graph. Figure
5 shows the sufficientsubgraphs
ofsimple
undirected Markovgraphs
oforder 5 with two vertex colors and two
edge
colors. Inferenceproblems
for Markovgraphs
arenot easy. Frank and Strauss
(1986)
havesuggested
a method ofestimating
Markovgraph
parameters which is based on simulations. The
computational complexity
involved seems to beprohibitive
for extensiveapplications.
Figure
5. Sufficientsubgraphs
of undirected Markovgraphs
of order 5 with two vertex colors.5. SUBGRAPH SAMPLING
A
particular
class ofgraph-sampling problems
arises when a fixedpopulation graph
isinvestigated by sampling
andobserving only
a part of it. This section describes varioussubgraph sampling designs
that have been consideredby
Bloemena(1964), Capobianco (1970, 1974)
and Frank(1969, 1977a,b,c, 1978b).
Let V =
{ 1,...,N ~
be apopulation
of N units and g =(x,y,z)
a coloredmultigraph
definedon V. Various
population
parameters of interest can be the color distributions of thevertices, edges,
and arcs, the number of inducedsubgraphs
of differentkinds,
etc. Sometimes the totals¿xi, lyij, lzij
can beinterpreted
asquantities
ofparticular
interest. Whenonly
asimple graph
y is defined
on V,
its size and number of connected components areexamples
ofpopulation
parameters that can be
given
as totals.Let S be a subset of V selected
according
to aspecific
randomsampling design having
inclusion
probabilities
In
particular, simple
randomsampling
of n unitsimplies
that ni =n/N, ng
=n(n-1 )
/N(N-1 )
for i ~
j,
and Bernoulli(p) sampling implies
that nç = p,ng
=p2
for i ~j.
If S is a vertexsample,
inducedsubgraph sampling
means that thesubgraph
of thepopulation graph
g inducedby
S is observed. Denote thissubgraph by g(S). Thus,
colors are observed of the vertices in thesample
and of theedges
and arcs betweenpairs
of vertices in thesample.
If g = y, theng(S)
=y û (S),
and a total T= 1 yij
can be estimatedby g( )
Y 2Y1 can be estimated by
Ywhere
Si
indicates i E S. This and otherHorvitz-Thompson
estimators forgraph
totals areinvestigated
in Frank{ 1977a,b,c).
An alternative
subgraph sampling procedure
based on a vertexsample
S is starsampling,
inwhich colors are observed of
vertices, edges,
and arcsadjacent
and incident to thesample.
Starsampling
based on S and inducedsubgraph sampling
based on thecomplement S
= V-S arecomplementary ;
thatis, g(S)
denotes the starsample
based on S. Thisrelationship
can be usedto derive results for star
sampling procedures
fromcorresponding
results for inducedsubgraph sampling procedures. Capobianco
and Frank(1982)
havecompared
different estimators basedon these
subgraph sampling procedures.
A
generalization
of starsampling
is snowballsampling
which is a successive extension ofan initial vertex
sample
to its star, of the vertex set of this star to its star, etc. Goodman(1961)
and Frank
(1979) give
further results on snowballsampling.
Other
sampling procedures
ingraphs
are based onedge sampling designs.
Forinstance,
incidentsubgraph sampling
based on anedge sample
S - c(~) 2
means that the colors areobserved of the vertices and
edges
incident to at least oneedge
in S. Anapplication
of thissubgraph sampling procedure
is thefollowing.
Let the vertices bepeople
and theedges telephone
callsduring
aspecific period
of time. Some calls aresampled,
and for eachsampled
call the
speakers
are asked to reportspecific
information about thepeople they
have beencalling
and the calls
they
have madeduring
the timeperiod.
The
subgraph sampling procedures
discussed so far have referred to a fixedpopulation graph.
As inordinary
surveysampling,
it is sometimes useful to introduce asuperpopulation
model
by
which thepopulation graph
is considered assampled
from somefamily (superpopulation)
ofpopulation graphs.
Forinstance,
the introduction of a Bernoullipopulation graph
makes itpossible
to useBayesian
methods forestimating
the size of apopulation graph.
The
exploration
ofBayesian
andempirical Bayesian
methods ingraph sampling
hashardly begun.
6. PROSPECTS
After the appearance of the
pioneering
papersby
Erdôs andRényi (1959, 1960)
randomgraph theory
hasdeveloped rapidly.
The review articleby
Karonski(1982)
contains about 250references,
and the textbookby
Bollobas(1985)
has more than 750 references. Randomgraph
evolution and limit theorems for random
graphs
continue toinspire
much research andproduce interesting
results.Computer
science is one of the mostimportant
sources for new andinteresting
randomgraph problems.
Randomgraphs
also findinteresting applications
in theoreticalchemistry
andbiology,
and these fields are
likely
to haveincreasing
influence onapplied
and theoreticalgraph theory
inthe future.
Families of random
graphs
orparametric graph
distributionsappropriate
for statisticalmodelling
ofgraph
data are not yet available to anylarge
extent.Important
contributions to thedevelopment
havemainly
been initiatedby applications
in the social and behavioral sciences.Other areas in which statistical
graph theory
can beexpected
to find futureapplications
are inpattern and
image modelling. Spatial
statistics and random fieldtheory
areexpanding
branchesof statistics that may also be of
importance
for thedevelopment
of statisticalgraph theory. See,
for
instance, Ripley (1981),
Adler(1981),
and Vanmarcke(1983).
Furthermore,
therapid development
of discrete mathematics and combinatorics may alsobring
with it anincreasing
interest in combinatorialconfigurations
other thangraphs.
Randomhypergraphs,
randompermutations,
randompartitions,
and random tesselations are a fewexamples
of suchobjects
that haveappeared already. See,
forinstance, Berge (1973),
Lovasz(1979),
andAhuja
and Schachter(1983).
REFERENCES
ADLER, R.,
TheGeometry of Random Fields,
NewYork, Wiley, 1981.
AHUJA, N.,
andSCHACHTER, B.,
PatternModels,
NewYork, Wiley, 1983.
ARABIE, P., BOORMAN, S.A.,
andLEVITT, P.R., "Constructing
block models : how andwhy",
Journalof Mathematical Psychology 17, 1978, 21-63.
BERGE, C., Graphs
andHypergraphs, Amsterdam, North-Holland, 1973.
BLOEMENA, A.R., Sampling from a Graph,
Mathematical CentreTracts, Amsterdam, 1964.
BOLLOBAS, B.,
RandomGraphs, London,
AcademicPress, 1985.
BURT, R.,
Toward a StructuralTheory of Action,
NewYork,
AcademicPress, 1982.
CAPOBIANCO, M.,
"Statistical inference in finitepopulations having structure",
Transactionsof the
New YorkAcademy of
Sciences32, 1970, 401-413.
CAPOBIANCO, M.,
"Recent progress instagraphics",
Annalsof
the New YorkAcademy of
Sciences, 231, 1974, 139-141.
CAPOBIANCO, M.,
andFRANK, O., "Comparison
of statisticalgraph-size estimators",
Journal
of Statistical Planning
andInference 6, 1982,
87-97.ERDOS, P.,
andRENYI, A.,
"On randomgraphs I",
Publ. Math. Debrecen6, 1959,
290-297.ERDOS, P.,
andRENYI, A.,
"On the evolution of randomgraphs",
Publicationsof
theMathematical Institute
of
theHungarian Academy of Sciences, 5, 1960, 17-61.
FIENBERG, S.E., MEYER, M.M.,
andWASSERMAN, S.,
"Statisticalanalysis
ofmultiple
sociometric
relations",
Journalof
the American StatisticalAssociation, 80, 1985, 51-67.
FRANK, O.,
"Structure inference and stochasticgraphs", FOA-Reports, 3:2, 1969, 1-8.
FRANK, O.,
"Estimation ofgraph totals",
Scandinavian Journalof Statistics, 4, 1977a,
81-89.FRANK, O.,
"A note on Bernoullisampling
ingraphs
andHorvitz-Thompson estimations",
Scandinavian Journal
of Statistics, 4, 1977b, 178-180.
FRANK, O., "Survey sampling
ingraphs",
Journalof
StatisticalPlanning
andInference, 1, 1977c,
235-264.FRANK, O.,
"Inferencesconcerning
clusterstructure", Proceedings
inComputational Statistics,
editedby
L.C.A. Corsten and J.Hermans, Vienne, Physica-Verlag, 1978a,
259-265.
FRANK, O.,
"Estimation of the number of connected components in agraph by using
asampled subgraph",
Scandinavian Journalof Statistics, 5, 1978b, 177-188.
FRANK, O.,
"Estimation ofpopulation
totalsby
use of snowballsamples", Perspectives
onSocial Network
Research,
editedby
P. Holland and S.Leinhardt,
NewYork,
AcademicPress, 1979,
319-347.FRANK, O.,
"Random sets and randomgraphs",
Contributions toProbability
and Statistics in Honourof
GunnarBlom,
editedby
J. Lanke and G.Lindgren, Lund, 1985, 113-120.
FRANK, O.,
"Triad countstatistics", Department
ofStatistics, University
ofStockholm,
1986a. To appear in Discrete Mathematics.
FRANK, O.,
"Randomgraph mixtures", Department
ofStatistics, University
ofStockholm,
1986a. To appear in the Annals
of the
New YorkAcademy of
Sciences.FRANK, O., HALLINAN, M.,
andNOWICKI, K., "Clustering of dyad
distributions as a tool in networkmodelling",
Journalof
MathematicalSociology, 11, 1985, 47-64.
FRANK, O.,
andHARARY, F.,
"Cluster inferenceby using transitivity
indices inempirical graphs",
Journalof
the American StatisticalAssociation, 77,1982, 835-840.
FRANK, O., KOMANSKA, H.,
andWIDAMAN, K.,
"Clusteranalysis
ofdyad
distributions innetworks",
Journalof Classification, 2, 1985, 219-238.
FRANK, O., LUNDQUIST, S., WELLMAN, B.,
andWILSON, C., "Analysis
ofcomposition
and structure of socialnetworks", Department
ofStatistics, University
ofStockholm, 1986.
FRANK, O.,
andSTRAUSS, D.,
"Markovgraphs",
Journalof
the American StatisticalAssociation, 81, 1986,
832-842.GOODMAN, L.A.,
"Snowballsampling",
Annalsof
MathematicalStatistics, 32, 1961,
148- 170.HOLLAND, P.W., LASKEY, K.B.,
andLEINHARDT, S.,
"Stochasticblockmodels,
Firststeps",
SocialNetworks, 5, 1983,
109-138.HOLLAND, P.W.,
andLEINHARDT, S.,
"Anexponential family of probability
distributions for directedgraphs",
Journalof
the American StatisticalAssociation, 76, 1981, 33-65 (with discussion).
KARONSKI, M.,
"A review of randomgraphs",
Journalof Graph Theory, 6, 1982, 349-389.
KNOKE, D.,
andKUKLINSKI, J.H.,
NetworkAnalysis, Beverly Hills, Sage Publications, Ca.,
1982.LOVASZ, L.,
Combinatorial Problems andExercices, Amsterdam, North-Holland,1979.
PALMER, E., Graphical Evolution,
NewYork, Wiley, 1985.
RIPLEY, B., Spatial Statistics,
NewYork, Wiley, 1981.
TITTERINGTON, D.M., SMITH, A.F.M.,
andMAKOV, U.E.,
StatisticalAnalysis of
FiniteMixture