WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

THE APPLICATION OF SOCIOMETRIC AND EVENT-HISTORY MODELING TO BIBLIOMETRIC DATA: THE CASE OF TRANSGENE PLANTS

Koenraad Debackere, Bart Clarysse and Michael A. Rappa

15 February 1994
Sloan WP # 3656-94

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
Massachusetts Institute of Technology, 1994

Alfred P. Sloan School of Management
Massachusetts Institute of Technology
50 Memorial Drive, E52-538
THE APPLICATION OF SOCIOMETRIC AND EVENT-HISTORY MODELING TO BIBLIOMETRIC DATA: THE CASE OF TRANSGENE PLANTS

KOENRAAD DEBACKERE, BART CLARYSSE AND MICHAEL A. RAPPA*

This paper examines the determinants of researcher contribution-spans. The contribution-span is the number of years spanning a researcher's first and last known publications in a field. As a consequence, it serves as a unique and useful measure of researchers' persistence in a field. Based on co-authorship data, several sociometric indices are created and their impact on researchers' persistence in their efforts to develop a technology is examined. Evidence is provided from 2,876 researchers active in the field of transgene plants over eleven years. The findings lend support to the proposition that an individual researcher's network position is an important determinant of his or her persistence in the development of a new technology.

INTRODUCTION
New scientific and technological disciplines typically emerge as a result of the efforts of researchers engaged in problem-solving activities that lead to the development of new ideas and techniques that constitute scientific and technological knowledge (Allen, 1966; Constant, 1980; Laudan, 1984; Layton, 1974 & 1977; Root-Bernstein, 1989; Rosenberg, 1982). As a consequence, a possible approach to study the processes of scientific and technological advance consists of understanding the creation and diffusion of knowledge among researchers. Widely accepted models of the growth of knowledge view this process as a cumulative progression of ideas and techniques (Crane, 1972; Nelson and Winter, 1982; Sahal, 1981) with only infrequent major disruptions or discontinuities (Dosi, 1982; Kuhn, 1970; Rosenberg, 1982; Tushman and Anderson, 1986).

This cumulative character of the growth of scientific and technological knowledge is important to understand the behavior of the researchers who produce it. More specifically, researchers have to master a core stock of (often 'tacit') knowledge before they can contribute to the growth of their field (Collins, 1974; Polanyi, 1958). Herbert Simon estimates that it may take anywhere up to ten years to become an expert in an area of work. Thus, researchers have to persist in their efforts to develop a new technology in order to contribute to its development. Unfortunately, undue persistence often leads to diminishing marginal yields in a stock of knowledge (Ziman, 1987). The danger of undue persistence is that the researcher's knowledge base becomes obsolete. Katz and Allen (1982) have provided extensive evidence on how 'persistence' eventually undermines research performance through the not-invented-here (NIH) syndrome. Along similar lines, Gieryn (1978) has shown that 'productive' researchers switch research agendas regularly in order to avoid obsolescence.
Not astonishingly, many scholars have documented the creation of new knowledge by researchers from outside the system (Ben-David and Collins, 1966; Edge and Mulkay, 1976; Gieryn and Hirsch, 1983). These 'outsiders' offer ideas that may run counter to prevailing practice in a field. If the new ideas attain legitimacy, though, they may replace existing traditions of practice.

What then determines researchers' persistence with a technology? This question obviously has clear implications for the management of R&D (Rappa, Debackere and Garud, 1992). More specifically, in this paper, we want to address the question to what extent researchers' positions in a network of R&D collaborations influence their persistence. To this end, we use the literature as a source of data on the length of a researcher's association with a field of technological inquiry to examine persistence behavior (Rappa and Debackere, 1992a). Persistence is hypothesized to depend upon individual, relational and collective aspects of the knowledge creation process that technological change entails. Before describing the methodological approach developed in this paper, we first provide an overview of the knowledge creation process within a researcher community.

* Koenraad Debackere (senior researcher N.F.W.O.) and Bart Clarysse are with the Vlerick School for Management, University of Gent. Michael Rappa is with the MIT Sloan School of Management. An earlier version of this paper was presented at the Fourth International Conference on Bibliometrics, Informetrics and Scientometrics, Berlin,

COLLECTIVE DIMENSIONS OF KNOWLEDGE CREATION

It is realistic to view researchers as working in collaborative relationships with one another (Constant, 1980; Hagstrom, 1965; Hughes, 1989; Pelz and Andrews, 1966). As technological progress often depends on a synthesis of different competencies, collaboration between researchers becomes imperative to solve the complex problems that researchers individually are not able to address. The creation of knowledge by researchers engaged in collaborative relationships with peers results in a steady accumulation of knowledge that other researchers can build upon. Thus, the development of a new technology is not only a cumulative problem-solving process, but also a collective endeavor.

The collective dimension of knowledge creation most obviously appears in the acceptance of practices and procedures (e.g. the 'search heuristics' described by Nelson and Winter, 1982) that become institutionalized within a community of researchers. The outcome of this process of institutionalization is an increase in legitimacy of the technology being developed. This creates a technological momentum by attracting new researchers to the field, which in turn augments the rapidity with which new technological knowledge is created (Debackere and Rappa, 1993).

In our previous research, we have introduced the notion of the researcher community to integrate the various aspects of the knowledge creation and diffusion process (Rappa and Debackere, 1992b). The researcher community is defined as a group of scientists and engineers who are committed to solve a set of inter-related scientific and technological problems. These researchers may be organizationally dispersed in public and private sector organizations but, and this is a vital characteristic of every researcher community, they communicate with each other. Within the community, knowledge
relationships with others. Consequently, persistence is in some way determined by individual, relational, and collective aspects of the knowledge creation and diffusion process within this community.

Previous research (Rappa, Debackere and Garud, 1992; Rappa and Garud, 1992) has explored the directions in which individual and collective aspects of knowledge creation affect the contribution-spans of researchers in three different fields of technology development: cochlear implants, polypropylene catalyst development and EPDM rubber catalyst development. At the individual level, the researcher's cumulative productivity was a highly significant predictor (p<0.001) of his or her contribution-span. At the collective level, the size of the researcher community was a highly significant predictor of the duration of one's contribution-span (p<0.001). A more detailed analysis including a second-order term for the size of the researcher community further revealed a U-shaped relationship between population size and researcher contribution-spans, which was interpreted by the authors as suggesting the existence of a critical mass in a field:

'The data indicate that when the community is small, population size is negatively related to the length of contribution-spans and increasingly so until it reaches a size of about 250 individuals, at which point the slope of the curve turns positive. This result suggests that there may be a point of critical mass for the community, where its size is sufficiently large to become significant and thereby increase contribution-spans.' (Rappa and Garud, 1992: 345).

The research reported in this paper intends to take these analyses one step further by studying the influence of relational aspects of the knowledge creation and diffusion process on researcher contribution-spans. These relational aspects capture the extent to which researchers associated with the development of a technology collaborate with one another. Collaboration represents the joint creation of ideas by individual researchers.

Students of research communities (Beaver and Rosen, 1979; Edge and Mulkay, 1976; Small and Greenlee, 1986; Stewart, 1990) often found that collaboration between researchers in a field steadily increases as the field matures. Consequently, Beaver and Rosen (1978) concluded that the growth of collaboration was the result of the profession's socialization process:

'Within this system of stratification and domination, collaboration becomes a mechanism for both gaining and sustaining access to recognition in the professional community. Collaboration provides a means of demonstrating one's ability to those already in a position to 'recognize' others as well as keeping up one's output from such a position. Thus collaboration acts as a social regulator: it provides possible avenues of mobility for those who seek recognition; it also maintains and solidifies recognition for those who have received it.' (Beaver and Rosen, 1978: 69)

At the same time, though, professionalization entails a division of labor among
scientific and technological problems individually. Hence, we hypothesize that collaborative efforts increase the reach of individual researchers, thereby allowing them to persist with their contributions. This hypothesis is now further explored in the rest of this paper.

RESEARCH SITE

We have chosen the field of transgene plants as a first case to explore the influence of relational aspects on contribution-spans. Transgene plants are a subdomain of the new biotechnology. It is one of four different biotechnology subdomains we are studying at the moment. Transgene plant research has resulted in two major application areas: (1) plant crop protection, and (2) plant quality improvement.

Interest in plant quality improvement was first aroused in the 1950s as a result of the research into tissue cultures and restrictions of tissue cultures. The emergence of genetic engineering in the 1970s, combined with the specification of the Tumor Inducing Plasmid (Ti-Plasmid) in 1974, caused a renewed interest in the field. More specifically, the identification of the Ti-Plasmid laid the foundations of the field that would become known as plant genetic engineering in the 1980s.

The first plants to be genetically engineered appeared in 1983. Ever since, transgene plant research has shown two major foci of interest. Plant crop protection aims at developing virus-free plants or crops with increased stress, herbicide or disease resistance. Plant quality improvement aims at the production of hybrids and at protein improvement. Both areas have generated their first commercial products in the early 1990s. Thus, between the early 1980s and 1993, transgene plants have moved from being a scientific curiosity to a promising commercial activity.

Data collection

Journal articles, conference papers and patents in a given field represent a detailed, self-reported archival record of the efforts generated by researchers to solve the scientific and technological problems confronting them. Furthermore, the published literature is an appealing source of data in several respects: the publication conventions ensure a level of quality and authenticity; the data can be collected unobtrusively; the findings can be replicated and tested for reliability; and the data are publicly available and not very expensive to collect.

When taken together, the literature can be viewed as a unique chronology of the efforts of researchers to establish a new field, and can provide information about the researchers involved, where they were employed, who they collaborated with, what problems they pursued, and when they were active in the field. Clearly, it would be difficult to match the comprehensive scope and longitudinal nature of the literature using other data collection techniques.

The databases of the Institute for Scientific Information (Philadelphia, U.S.) were used to identify publications related to the field of transgene plants. The databases were
used by transgene plant researchers. The search strategy and the search results were further validated through a detailed scrutiny by three experts in the field.

The data collection procedure resulted in the identification of 1,274 unique documents related to transgene plants published between 1982 and 1992. The database was then used to identify each researcher who contributed to the field over the eleven-year period. The growth of the field, in terms of number of researchers, is shown in Figure 1. This procedure yielded a total of 2,876 researchers. A statistical database was created containing time-varying covariates for each researcher.

Figure 1
Growth of the Transgene Plant Research Community, 1982-1992
(Horizontal axis: year, 1982 to 1992; vertical axis ticks: 200 to 1,000.)

It should be noted, though, that the documents require extensive editing before they can be used as a source of data. This is necessary in order to create consistency among author names and affiliation names. Although such a lack of standardization might not be a problem for the typical user of an electronic literature database, it would be a major source of error in determining the duration of researcher contribution-spans. Therefore, it was essential to meticulously inspect the name of each author and affiliation.

METHODS

The dependent variable for the analysis, the contribution-span, is calculated as the number of years that have elapsed from the first to the last known publication for each author. For example, if a researcher first published in 1985 and last published in 1989, the researcher's contribution-span would be calculated as five years. Furthermore, it is assumed that a researcher who publishes in only one year has a contribution-span of one year. As a consequence, a researcher's contribution-span is relatively unaffected by his or

Of course, calculating contribution-spans is straightforward enough. Nevertheless, certain methodological issues arise that need further explanation. It is obvious that at the moment the study is done, not all transgene plant researchers have left the field. For those researchers who are active in the field at the moment of the study, the ultimate length of their contribution-span is indeterminate. Since these individuals have not yet left the field, it is only known that the length of their contribution-span is some minimum value (i.e., the entry year to the present year). To account for this 'censoring,' event history statistics were used to analyze the data (Allison, 1990; Yamaguchi, 1991). These techniques adjust for the biases that right-censored data create.
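
The span calculation and the censoring flag can be illustrated with a minimal Python sketch. The function name `contribution_span` and its parameters are hypothetical, introduced here only for illustration. The inclusive count (1985 to 1989 gives five years, a single year gives one) follows the definition in the text; the default two-year censoring window mirrors the rule the paper reports in its results (authors active within the last two years of the data are treated as censored), and the exact boundary handling is an assumption.

```python
from typing import Iterable, Tuple


def contribution_span(pub_years: Iterable[int],
                      study_end: int,
                      censor_window: int = 2) -> Tuple[int, bool]:
    """Return (span_in_years, is_censored) for one researcher.

    pub_years     -- calendar years in which the researcher published
    study_end     -- last year covered by the data (e.g. 1992)
    censor_window -- assumption: a researcher whose last publication falls
                     within the final `censor_window` years of the data is
                     treated as still active (right-censored)
    """
    years = sorted(set(pub_years))
    if not years:
        raise ValueError("at least one publication year is required")
    first, last = years[0], years[-1]
    # Inclusive count: 1985..1989 -> 5 years; a single year -> 1 year.
    span = last - first + 1
    censored = last > study_end - censor_window
    return span, censored


# Illustrative publication records (not taken from the paper's data).
example_exited = contribution_span([1985, 1987, 1989], study_end=1992)
example_active = contribution_span([1988, 1992], study_end=1992)
```

For a censored researcher the returned span is only a lower bound on the eventual contribution-span, which is why event-history methods rather than ordinary regression are applied to these data.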
Moreover, determining whether or not a researcher is still active in the field can be difficult in certain cases because most of them typically do not publish every year. Thus, an author's contribution-span in a field can be characterized by 'gaps' of one or more years in duration in which there are no publications to their credit. The existence of discontinuities in publication records raises the issue of how frequently a researcher must publish from year to year in order to be considered an active contributor to the field. This issue is important to determine the proper censoring scheme to use in the analysis. The question arises: how long after someone ceases to publish is it reasonable to assume that they are no longer in the field? The answer to this question is necessary in order to determine who has exited the field and who continues to be a participant.

Inspection of the dataset showed that less than five percent of the researchers present have a gap between publications of longer than three years in duration, and still fewer have an exceptionally large gap such as five years or more. These large gaps may be indicative of an individual who does not contribute continuously to the field. Thus, although his/her contribution-span would seem long, it is not indicative of his/her actual participation in the field. Therefore, we considered all contribution-spans with gaps of no more than three years as contiguous. If a researcher had a gap longer than three years, we treated it as if the case exited and then subsequently re-entered the field.

Having determined the contribution-spans, it is interesting to examine the factors affecting them. To this end, different analytical approaches are possible. Using the literature, a number of explanatory variables were constructed. As outlined in the previous sections of this paper, we were primarily interested in constructing covariates that capture the relational dimensions in the researcher community under investigation. These covariates were computed using the sociometric network analysis package STRUCTURE™ (Version 4.2) developed by Burt at Columbia University (Burt, 1991). In the next section, we provide a detailed overview of the different relational covariates used in the present analyses.
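
The three-year gap rule described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_into_spells` is hypothetical, and a gap is assumed to be the number of calendar years without a publication between two consecutive publication years.

```python
from typing import Iterable, List, Tuple


def split_into_spells(pub_years: Iterable[int],
                      max_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a publication record into (entry_year, exit_year) spells.

    Assumption: the gap between two publication years is the number of
    years in between with no publication (1982 and 1986 -> gap of 3).
    Gaps of at most `max_gap` years keep the spell contiguous; a longer
    gap is treated as an exit followed by a re-entry.
    """
    years = sorted(set(pub_years))
    if not years:
        return []
    spells = []
    start = prev = years[0]
    for year in years[1:]:
        gap = year - prev - 1
        if gap > max_gap:
            spells.append((start, prev))  # close the current spell
            start = year                  # re-entry opens a new spell
        prev = year
    spells.append((start, prev))
    return spells


# Illustrative records (not taken from the paper's data).
contiguous = split_into_spells([1982, 1986])            # gap of 3 years
re_entered = split_into_spells([1983, 1984, 1990, 1991])  # gap of 5 years
```

Each returned spell can then be given its own contribution-span and censoring status.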
A first approach to implement the event history analysis is to use a 'single-spell' model. In this approach, the value for each explanatory variable is taken according to the last year in the author's contribution-span. The 'single-spell' approach was implemented using the LIFEREG procedure of SAS and using the LIMDEP software package (version 6.0) developed by Greene at New York University. This approach was chosen for all 2,876 unique authors in the original database. The results of this

In addition, the explanatory variables can be treated as time-varying covariates. In this second approach, each explanatory variable has values that vary in the course of an author's contribution-span. From a computational perspective, though, this 'multiple-spell' approach is much more complex. The LIMDEP program allowed us to examine the factors affecting researcher contribution-spans (with time-varying covariates) for 689 unique researchers in the database. Hence, the results of this 'multiple-spell' approach are also reported below, only now for a sample of 689 researchers from the original database (note: t-test and Wilcoxon non-parametric comparisons showed that the distribution of contribution-spans of the 689-researcher sample did not differ significantly from the contribution-span distribution in the complete 2,876-researcher population). It should also be noted that we are currently installing a more powerful event history package on a mainframe computer (RATE), which will allow us to run the 'multiple-spell' approach on the complete 2,876-author dataset.

Besides the sociometric covariates, only a few other explanatory variables were introduced in the present analyses. First, the kind of organization in which each author is employed was coded according to whether they reside in an academic, government, established industrial, or new biotechnology company research laboratory. Given the tremendous growth in the number of new biotechnology firms over the last years, we found it necessary and helpful to distinguish between researchers residing in the research laboratories of (mostly large) established firms (e.g. MONSANTO) and their colleagues employed by those new start-ups (e.g. Plant Genetic Systems). These organization variables were introduced as dummy variables in the explanatory analyses.

Second, at the individual-level, a variable was constructed to reflect an author's cumulative productivity in the field, measured by the cumulative number of publications to his/her credit.

Third, at the community-level (in order to capture the collective aspects of the knowledge creation and diffusion processes in the researcher community) two variables were created. One reflects the population size of the transgene plants field in each year. Population size is measured in terms of the number of individual authors who publish in the field in a given year. In addition, a second-order variable, the square of population size, was introduced in order to capture any quadratic association between population size and contribution-span.
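
A short worked sketch shows what the squared term makes possible. The symbols $P$ (population size), $b_1$, $b_2$ and $P^*$ are notation introduced here for illustration only, and all other covariates are held fixed.

```latex
% Contribution of population size P to the modeled outcome Y:
Y = \dots + b_1 P + b_2 P^2

% The slope with respect to P changes sign where the derivative is zero:
\frac{\partial Y}{\partial P} = b_1 + 2 b_2 P = 0
\quad\Longrightarrow\quad
P^* = -\frac{b_1}{2 b_2}
```

If $b_1 < 0$ and $b_2 > 0$, the relationship is U-shaped with its minimum at $P^*$: contribution-spans shorten as the community grows up to $P^*$ and lengthen beyond it. If the signs are reversed, the curve is an inverted U.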
The other covariates were then constructed to capture the relational dimensions of the knowledge creation and diffusion process.

INDICATORS OF RELATIONAL DIMENSIONS OF KNOWLEDGE CREATION

As stated previously, we are mainly interested in investigating the relationship between relational covariates and author contribution-spans. Relational covariates can be determined by analyzing the authors' sociometric position in the network of co-authorships that emerges from the original database. In order to define and compute these sociometric indicators, we used the theory and the programs developed by Burt
sociometric indicators has been developed over the last decade (e.g. Freeman, 1977 & 1979; Knoke and Kuklinski, 1990; Rogers and Kincaid, 1981; Yamagishi et al., 1988), the indicators used in the present analysis are based on Burt's approach to sociometric modeling.

The network variables were computed using the JEDIT module of STRUCTURE (Burt, 1991). The JEDIT module allows one to analyze joint involvement data. It infers relations from involvement in the same events, or affiliations with the same groups. In this approach, two actors i and j are tied together to the extent that they are involved or affiliated with the same events or groups. Where scientific articles are events and the authors of the article are actors, $z_{ij}$ measures the connection between persons i and j in the metric of number of co-authored articles. These connections are subsequently used to compute different sociometric indices.

For the analyses reported in this paper, the sociometric indices were treated as time-varying covariates. This means they were computed for each year (or number of years) of an author's contribution-span. The time-periods used to compute the time-varying network variables were, in part, determined by the computational limitations of the program (which pose constraints on the maximum number of actors in the network to be analyzed simultaneously). Hence, the network indices were computed for the periods 1982-84, 1985-87, 1988-89, and the years 1990, 1991, and 1992. Thus, an author who is active in the years 1982, 1983 and 1984 will have the same value for the sociometric index in each year. If the author is still active in 1985, his or her sociometric indices may change, though. This aggregation of years into longer time-periods is, of course, to a certain extent arbitrary. However, sensitivity analyses on the time periods used did not reveal any major alterations in the results discussed below.
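
The period aggregation can be sketched as a simple lookup. The period boundaries are those listed in the text; the names `PERIODS`, `period_of` and `index_for_year`, and the numeric index values in the example, are hypothetical.

```python
from typing import Dict, Tuple

# Aggregation periods listed in the text: three multi-year periods,
# then single years from 1990 onward.
PERIODS = [(1982, 1984), (1985, 1987), (1988, 1989),
           (1990, 1990), (1991, 1991), (1992, 1992)]


def period_of(year: int) -> Tuple[int, int]:
    """Return the (start, end) aggregation period containing `year`."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    raise ValueError(f"year {year} is outside the 1982-1992 study window")


def index_for_year(index_by_period: Dict[Tuple[int, int], float],
                   year: int) -> float:
    """Look up a time-varying network index for a given calendar year."""
    return index_by_period[period_of(year)]


# Hypothetical index values for one author, for illustration only.
author_index = {(1982, 1984): 0.25, (1985, 1987): 0.40}
same_in_1982_and_1984 = (index_for_year(author_index, 1982)
                         == index_for_year(author_index, 1984))
```

An author active in 1982 and 1984 receives the same index value in both years, while the value may change once the author enters the 1985-87 period.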
First, five variables (called autonomy variables) are produced to describe author i's network, analyzed as ego. These variables provide measures of the size, the density, and the brokerage opportunities in an individual's network. The first variable simply reflects the size of the ego-network, N. It is a count of ego's contacts [CTACTS]. This is everyone who is connected with i. In the present analyses, this variable is equal to the number of co-authors each individual author has during the time-periods considered in the analyses.

The second variable, nonredundant contacts, is a count of the number of independent contacts in ego's network [NONRCTACTS]. The meaning of this variable is easy to grasp. Let us assume that author i appears on one particular article with two co-authors, j and k. In this case, the first contact variable [CTACTS] would equal 2. It is simply the number of co-authors with whom i is involved. The second variable [NONRCTACTS], though, will equal 1 for author i, since both j and k are strongly tied. Thus, by counting the connection between i and j, the connection between i and k is implicit given the strong connection between j and k. The distinction between both variables describing the author's network is thus based on the concept of transitive relationships.
The third variable describing ego's network is called contact efficiency [CTACTEFF].

The fourth variable describes the network density [NDENS]. It is the average marginal strength of relations between contacts (Burt, 1991):

$$\mathrm{NDENS} = \frac{\sum_j \sum_q \left[ z_{jq} / \max_k(z_{jk}) \right]}{N(N-1)}, \quad j \neq q$$

where $\max_k(z_{jk})$ is the largest of j's relations to anyone, so density ranges from 0 (no relations between contacts) to 1 (maximum strength relations between all contacts).

The fifth variable is a proportional density measure [PRDENS]. It is the proportion of contact pairs that have some kind of connection with one another (Burt, 1991):

$$\mathrm{PRDENS} = \frac{\sum_j \sum_q \delta_{jq}}{N(N-1)}, \quad j \neq q$$

where $\delta_{jq}$ is 1 if $z_{jq}$ is nonzero, otherwise $\delta_{jq}$ equals 0. This measure of density varies from 0 (no relations between contacts) to 1 (every pair of contacts is connected). In a network of contacts all connected by weak relations, NDENS is low and PRDENS is high. Given the nature of our data (strong ties based on co-authorships), both density measures can be expected to be correlated.
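
The two density measures can be computed directly from their definitions. The sketch below is an illustration under stated assumptions, not the STRUCTURE implementation: the function name `ego_densities` is hypothetical, relations are stored as a symmetric nested dictionary `z[j][q]`, and an ego with fewer than two contacts is assigned zero density because no contact pairs exist.

```python
from typing import Dict, Tuple


def ego_densities(z: Dict[str, Dict[str, int]], ego: str) -> Tuple[float, float]:
    """Return (NDENS, PRDENS) for ego's network of contacts.

    z must be symmetric, so every contact j has z[j][ego] > 0 and
    max_k z[j][k] is therefore positive.
    """
    contacts = [j for j, v in z.get(ego, {}).items() if v > 0 and j != ego]
    n = len(contacts)
    if n < 2:
        return 0.0, 0.0  # assumption: no contact pairs, so density is zero
    strength_sum = 0.0
    connected = 0
    for j in contacts:
        # j's strongest relation to anyone, inside or outside ego's network
        max_j = max(v for k, v in z[j].items() if k != j)
        for q in contacts:
            if q == j:
                continue
            z_jq = z[j].get(q, 0)
            strength_sum += z_jq / max_j
            connected += 1 if z_jq > 0 else 0
    pairs = n * (n - 1)
    return strength_sum / pairs, connected / pairs


# Hypothetical network: ego i has strong ties (3) to j and k,
# while j and k share only one article.
z_weak = {
    "i": {"j": 3, "k": 3},
    "j": {"i": 3, "k": 1},
    "k": {"i": 3, "j": 1},
}
ndens, prdens = ego_densities(z_weak, "i")
```

In this toy network every contact pair is connected, so PRDENS is 1, while the tie between j and k is weak relative to each contact's strongest relation, so NDENS is 1/3. This is the pattern the text describes for contacts connected by weak relations.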
For the co-authorship data used in the present analyses, the correlations between the five network indices are shown in Table 1.

Table 1
Correlations among Five Indicators of Ego's Autonomy Position in

$$\text{STATUS of } i = \frac{\sum_j \delta_{ji}}{N-1}, \quad j \neq i$$

where N is the number of authors in the whole community (not just those connected to i), and $\delta_{ji}$ equals 1 if j can reach i ($z_{ji} > 0$), otherwise $\delta_{ji}$ equals 0. STATUS varies from 0 (when no one reaches i) to 1 (when everyone else reaches i). This can be misleading when two individuals are the object of relations from the same number of others, but one receives weak relations while the other receives strong relations. Therefore, Burt's second power index weights the relationships (read: co-authorships) by their strength (where $\max_k(z_{jk})$ is j's strongest relation to anyone else):

$$\mathrm{EXTREL} = \text{extensive relations to } i = \frac{\sum_j \left[ z_{ji} / \max_k(z_{jk}) \right]}{N-1}, \quad j \neq i, k$$

This variable ranges from 0 (when i receives no relations) to 1 (when i is the object of a maximum strength relation from every actor). Given the nature of our data structure (strong ties based on co-authorships), we can expect both power indicators to be strongly correlated (see also Table 2). Therefore, only one of them is used in the event-history analyses.
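
The two power indices defined above can be illustrated with a short Python sketch. It is a direct reading of the formulas, not the STRUCTURE code: the function name `status_and_extrel` is hypothetical, relations are held in a nested dictionary `z[j][i]`, and 'reach' is taken to mean a direct tie ($z_{ji} > 0$), as the parenthetical in the text suggests.

```python
from typing import Dict, Iterable, Tuple


def status_and_extrel(z: Dict[str, Dict[str, int]],
                      actors: Iterable[str],
                      i: str) -> Tuple[float, float]:
    """Return (STATUS, EXTREL) for actor i over the whole community."""
    actors = list(actors)
    n = len(actors)
    if n < 2:
        raise ValueError("the community needs at least two actors")
    reached = 0        # number of others j with a relation to i
    weighted = 0.0     # sum of z_ji / max_k z_jk
    for j in actors:
        if j == i:
            continue
        row = z.get(j, {})
        z_ji = row.get(i, 0)
        if z_ji > 0:
            reached += 1
            max_j = max(v for k, v in row.items() if k != j)
            weighted += z_ji / max_j
    return reached / (n - 1), weighted / (n - 1)


# Hypothetical four-author community, for illustration only.
community = ["a", "b", "c", "d"]
z_toy = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 2},
    "c": {"a": 1, "d": 3},
    "d": {"c": 3},
}
status_a, extrel_a = status_and_extrel(z_toy, community, "a")
```

Here two of the three other authors are tied to a, so STATUS is 2/3. The tie from c is weak relative to c's strongest relation (1 against 3), which pulls EXTREL down to 4/9.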
Two additional power indices are computed by STRUCTURE. One of them is the author's exclusive prominence in the community [EXCLREL]. This index is relevant because a member of a highly connected clique has a high score on extensive prominence [EXTREL], but so does everyone else in the clique. Therefore, the exclusive prominence indicator measures the extent to which author i is the object of exclusive relations from everyone. It varies from 0 (when i receives no relations) to 1 (when i is the only contact for every other actor). It distinguishes between the 'top' and the 'bottom' of the social structure in a given system.

Finally, we computed the 'ultimate' power of each author [POWER] as the extent to which he/she is the object of exclusive relations from powerful others. The correlations between the four power variables are shown in Table 2.

Table 2
Correlations among Four Indicators of Ego's Power Position in

RESULTS

Non-parametric estimates of the survival and hazard functions

Using the published literature data on transgene plants for the period 1982-1992, the contribution-span for 2,876 researchers and several explanatory variables associated with each author were compiled into a statistical database. The non-parametric survival and hazard rates were analyzed using the LIFETEST procedure of SAS. Of the 2,876 cases, 1,303 (45.3%) were active within the last two years of the data and were classified as censored.

Using the LIFETEST procedure, the first step in the analysis consisted of making non-parametric estimates of the survival and hazard functions for the data. The life table approach was chosen. The results of this procedure are shown in Figures 2 and 3. The median survival time is about two years: that is, half the sample has left the field within two years of their first publication. The survival function diminishes rapidly and levels off at about 0.20 for contribution-spans of eight years or more.
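
For readers unfamiliar with the life table approach, the following Python sketch implements the standard actuarial estimator, in which cases censored during an interval count as at risk for half of it. It is an illustration on hypothetical toy data, not a reproduction of the SAS output, and the function name `life_table` is introduced here.

```python
from typing import List, Sequence, Tuple


def life_table(durations: Sequence[float],
               censored: Sequence[bool],
               width: float = 1.0) -> List[Tuple[float, float, float]]:
    """Actuarial life table.

    Returns one row per interval [start, start + width):
    (start, survival at interval start, hazard at interval midpoint).
    """
    if len(durations) != len(censored):
        raise ValueError("durations and censored must have the same length")
    rows = []
    surv = 1.0
    at_risk = len(durations)
    horizon = max(durations) if len(durations) else 0.0
    start = 0.0
    while at_risk > 0 and start <= horizon:
        end = start + width
        flags = [c for d, c in zip(durations, censored) if start <= d < end]
        exits = sum(1 for c in flags if not c)      # observed exits
        withdrawn = sum(1 for c in flags if c)      # censored cases
        effective = at_risk - withdrawn / 2.0       # effective number at risk
        q = exits / effective                       # conditional exit probability
        hazard = exits / (width * (effective - exits / 2.0))
        rows.append((start, surv, hazard))
        surv *= 1.0 - q
        at_risk -= exits + withdrawn
        start = end
    return rows


# Hypothetical contribution-spans in years; True marks a censored case.
spans = [1, 1, 2, 3, 3, 5]
is_censored = [False, False, False, True, False, True]
table = life_table(spans, is_censored)
```

Reading the survival column of `table` row by row gives the estimated proportion of researchers still in the field at the start of each year of duration, which is the kind of curve plotted in a survival-function figure.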
Figure 2
Non-Parametric Estimate of Survival Function for Researcher

Figure 3
Non-Parametric Estimate of the Hazard Function for Researcher Contribution Spans in Transgene Plants
(Horizontal axis: Duration of Contribution Span (years), 2 to 12.)

Parametric models of contribution-spans: the single-spell approach

The next step in the analysis was to determine the parametric model that best fits the distribution of contribution-spans. The basic model adopted for this analysis was:

$$Y = X\beta + \sigma\varepsilon$$

where $Y$ is the log of the contribution-span, $X$ is the matrix of explanatory variables, $\beta$ is a vector of unknown regression parameters, $\sigma$ is a scale parameter and $\varepsilon$ is a vector of errors from an assumed distribution. Specifically, we evaluated three different types of distributions: the exponential, Weibull, and log-logistic distribution. The parameters were estimated by maximum-likelihood using a Newton-Raphson algorithm. The overall fit of a model is represented by the log-likelihood function. On the basis of the log-likelihood score, the log-logistic distribution appeared to provide the best overall fit. Hence, this distribution was chosen as the basis for estimating the regression coefficients of the explanatory variables in the model (see Table 3). The model was estimated both with the LIFEREG procedure of SAS and using LIMDEP.
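
To make the estimation target concrete, the sketch below writes out the log-likelihood of the model above when the errors follow a standard logistic distribution (which makes the contribution-span log-logistic) and some observations are right-censored. This is a textbook formulation offered as an assumption about what the packages maximize, not their actual code. The function name is hypothetical, and the density is taken on the log-time scale, which differs from the density of the span itself only by a term that does not depend on the parameters.

```python
import math
from typing import Sequence


def loglogistic_aft_loglik(times: Sequence[float],
                           censored: Sequence[bool],
                           X: Sequence[Sequence[float]],
                           beta: Sequence[float],
                           sigma: float) -> float:
    """Log-likelihood of Y = X beta + sigma * eps with logistic eps.

    Uncensored case: log f(w) - log(sigma), f(w) = e^w / (1 + e^w)^2
    Censored case:   log S(w),              S(w) = 1 / (1 + e^w)
    where w = (log t - x'beta) / sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for t, is_cens, x in zip(times, censored, X):
        mu = sum(b * v for b, v in zip(beta, x))
        w = (math.log(t) - mu) / sigma
        # Numerically stable log(1 + e^w)
        softplus = max(w, 0.0) + math.log1p(math.exp(-abs(w)))
        if is_cens:
            total += -softplus
        else:
            total += w - 2.0 * softplus - math.log(sigma)
    return total


# Hypothetical data: intercept-only design, one exit and one censored case.
ll = loglogistic_aft_loglik(times=[1.0, 1.0],
                            censored=[False, True],
                            X=[[1.0], [1.0]],
                            beta=[0.0],
                            sigma=1.0)
```

A numerical optimizer such as Newton-Raphson would search over `beta` and `sigma` for the values that maximize this function; comparing the maximized values across candidate error distributions is the model-selection step described in the text.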
The results of these analyses are shown in Table 3. Among the dummy variables that control for organization type, the variables that distinguish academic laboratories and the laboratories of established industrial firms from other types of organizations are both significant (p<0.05). The positive coefficients suggest that researchers employed in academic or established firm laboratories have longer contribution-spans compared to researchers with other types of affiliations. More
significance. Although these types of organizations are often claimed to play a vital role in the development of a new technology, especially in biotechnology, researchers employed at these firms do not appear to have longer contribution-spans than their colleagues at other organizations.

At the community-level, we find the first- and second-order population terms to be statistically significant (p<0.001). The negative coefficient for population size combined with the positive coefficient for the second-order term implies a U-shaped relationship between population size and researcher contribution-spans. This result supports previous research which suggests that 'there may be a point of critical mass for the community, where its size is sufficiently large to become significant and thereby increase contribution-spans' (Rappa and Garud, 1992).

At the individual-level, an author's cumulative productivity is highly significant (p<0.001). Thus, and as might be expected, researchers who accumulate a greater number of publications will have longer contribution-spans.

Perhaps more interesting are the relational variables that have been included in the analyses. Three out of the five 'autonomy' indices attain statistical significance (NONRCTACTS, p<0.01; CTACTEFF, p<0.05; NDENS, p<0.05). They indicate that an individual author's position in a network of co-authors influences his or her contribution-span. More specifically, the more a researcher is embedded in communal collaborations with his or her peers in the community, the higher his or her longevity in the field. Also, network density is positively related to researcher contribution-spans.

The importance of network positions on contribution-spans is further emphasized by the 'power' indices. Exclusive prominence (EXCLREL) is positively related to a researcher's contribution-span (p<0.001), suggesting that a researcher's contribution-span increases when he belongs to the 'top' of the social structure within his or her field. The only variable which does not behave as one might expect is the STATUS-variable. Although it is statistically significant (p<0.01), its negative coefficient requires further exploration.

Parametric models of contribution-spans: the multiple-spell approach

Introducing time-varying covariates is the next logical step in the present analysis. In a multiple-spell approach, the covariates are allowed to take on different values for each year the author is active in the field. Due to their computational complexity, in particular when the dataset is fairly large, multiple-spell models are only rarely used. For instance, for the 2,876 researchers in the database, a multiple-spell model results in 4,833 spells.
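The data preparation behind a multiple-spell model can be sketched as follows: each author's record is split into one spell per active year, and each spell carries the covariate values that hold in that year. The Python sketch below is a minimal illustration under stated assumptions, not the LIMDEP procedure used in the analyses. In particular, it assumes that an 'active' year is a year with at least one publication and that an author last seen in the final year of the observation window is treated as right-censored; the records, class, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Spell:
    author: str
    year: int      # calendar year of the spell
    cum_pubs: int  # time-varying covariate: publications up to and including this year
    exit: int      # 1 if the author is last observed in this year, else 0


def to_spells(author: str, pubs_by_year: Dict[int, int], window_end: int) -> List[Spell]:
    """Split one author's publication record into one spell per active year."""
    years = sorted(pubs_by_year)
    spells = []
    cum_pubs = 0
    for year in years:
        cum_pubs += pubs_by_year[year]
        is_last_year = year == years[-1]
        # ASSUMPTION: a last spell that coincides with the end of the
        # observation window is right-censored, so no exit is recorded.
        exited = int(is_last_year and year < window_end)
        spells.append(Spell(author, year, cum_pubs, exited))
    return spells


# HYPOTHETICAL records: {author: {year: number of publications}}.
RECORDS = {
    "author_1": {1985: 2, 1986: 1, 1988: 3},
    "author_2": {1991: 1, 1992: 2},
}

ALL_SPELLS = [
    spell
    for author, pubs in RECORDS.items()
    for spell in to_spells(author, pubs, window_end=1992)
]

for spell in ALL_SPELLS:
    print(spell)
```

Because every author contributes as many rows as he or she has active years, the number of spells exceeds the number of authors, which is the source of the computational burden noted above.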
We implemented the multiple-spell approach using LIMDEP. Due to computational limitations, both the number of covariates and the number of spells had to be restricted. The results of this exploratory analysis are reported in Table 4. Although the multiple-spell results should be considered preliminary (only 689 unique authors out of the 2,876 author-database could be used, resulting in 1,104 spells), it is interesting to note that the findings reported in Table 4 are in line with the [...]
Table 3

Table 4

As far as organization-type is concerned, authors in academic laboratories or researchers residing in established firms contribute longer than their peers in other research organizations (p<0.01).
The author's cumulative productivity has the same significant effect as in the single-spell model (p<0.001). The relational variables included in the analysis are all statistically significant (p<0.05), except for the STATUS variable. The directions of their influence are to a large extent similar to the ones reported with the single-spell model (see Table 3), except for the network density variable, which has a negative coefficient. This is due to the multiple-spell approach. Indeed, the longer the contribution-span of the author (over the period 1982-1992), the larger the size of the community has grown (see Figure 1) and, almost naturally, the smaller the network density. The multiple-spell approach unveils this dynamic, whereas the single-spell approach does not. Also, the result for the first-order term of the population variables is slightly different from the one reported in Table 3.

To conclude, although the multiple-spell results are promising, we should keep in mind that, at present, they were obtained from a limited sample of the total population with only a restricted number of explanatory, albeit time-varying, covariates included in the model.
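The remark that density falls 'almost naturally' as the community grows can be checked with simple arithmetic. Density is the number of existing ties divided by the N(N-1)/2 possible ties; if each researcher keeps roughly the same number of co-authors while N increases, the numerator grows linearly and the denominator quadratically. The Python sketch below illustrates this for a whole-network density. The community sizes and the mean number of co-authors are hypothetical, and the constant-degree assumption is ours, introduced only to isolate the effect of size.

```python
def network_density(n_nodes: int, n_ties: float) -> float:
    """Existing ties divided by the n*(n-1)/2 possible undirected ties."""
    if n_nodes < 2:
        return 0.0
    return n_ties / (n_nodes * (n_nodes - 1) / 2)


def density_at_constant_degree(n_nodes: int, mean_degree: float) -> float:
    """Density of a network in which every node has mean_degree ties on average."""
    n_ties = n_nodes * mean_degree / 2  # each tie is shared by two nodes
    return network_density(n_nodes, n_ties)


# HYPOTHETICAL community sizes and mean number of co-authors (illustration only).
MEAN_COAUTHORS = 4
SIZES = [101, 501, 1001]

DENSITIES = [density_at_constant_degree(n, MEAN_COAUTHORS) for n in SIZES]

for n, d in zip(SIZES, DENSITIES):
    print(f"community size {n:5d}  density {d:.4f}")
```

Under this assumption density equals the mean number of co-authors divided by N - 1, so a tenfold increase in community size reduces density roughly tenfold even though no individual collaborates less.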
Discussion and Conclusion
In this paper, we have investigated a number of determinants of contribution-spans of researchers in the field of transgene plants, using the published literature as a source of data. Non-parametric estimates of the survival rate and hazard rate were made, and it was found that a majority of researchers tend to leave the field within two years of their first publication, while those who persist are less likely to leave the field at all. The fact that it is the early years that are most critical for a researcher to leave a field is understandable, since at that point not much of a researcher's career has been invested in the field.
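For readers unfamiliar with non-parametric survival and hazard estimates, the Python sketch below computes a product-limit (Kaplan-Meier) survival curve together with the discrete hazard, i.e. the share of researchers still at risk who exit at each span length. It is a generic textbook estimator offered as an illustration; the section does not state which estimator was used in the analyses. The contribution-spans and censoring flags are hypothetical, and the function name is introduced here for illustration only.

```python
def kaplan_meier(durations, observed):
    """Return one row (t, at_risk, exits, hazard, survival) per distinct exit time.

    durations: contribution-span of each researcher, in years.
    observed:  1 if the exit from the field was observed, 0 if right-censored.
    """
    exit_times = sorted({t for t, d in zip(durations, observed) if d})
    survival = 1.0
    rows = []
    for t in exit_times:
        at_risk = sum(1 for u in durations if u >= t)
        exits = sum(1 for u, d in zip(durations, observed) if u == t and d)
        hazard = exits / at_risk
        survival *= 1.0 - hazard
        rows.append((t, at_risk, exits, hazard, survival))
    return rows


# HYPOTHETICAL contribution-spans in years; the two 11-year spans are
# treated as right-censored (still active when observation ends).
SPANS = [1, 1, 1, 2, 2, 3, 5, 8, 11, 11]
OBSERVED = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

ROWS = kaplan_meier(SPANS, OBSERVED)

for t, at_risk, exits, hazard, survival in ROWS:
    print(f"t={t:2d}  at risk={at_risk:2d}  exits={exits}  "
          f"hazard={hazard:.3f}  survival={survival:.3f}")
```

In this toy sample the hazard is highest in the first two years and half of the researchers have left after two years, which mirrors the qualitative pattern described above without reproducing the paper's figures.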
An examination of the relationship between several covariates and the length of contribution-spans indicates the importance of organization-type: researchers residing in academic laboratories or in the laboratories of (mostly large) established firms tend to contribute longer than, for instance, their colleagues employed by new biotechnology firms. Not astonishingly, an author's cumulative productivity has a positive effect on his contribution-span. At the community-level, population size appears to have effects that should not be neglected.
The major focus of the paper, though, was to start investigating the relationship between the relational dimensions of a research community and researcher contribution-spans. To this end, sociometric network techniques were used to compute a range of variables reflecting the authors' embeddedness within their research community. Co-authorship data provided the starting-point to compute sociometric indicators. Notwithstanding the preliminary nature of the results reported in the paper, the findings lend support to the hypothesis that a researcher's position in the [...] contribution-span. Thus, the better a researcher is embedded in research collaborations within a field, the higher the likelihood he will persist.
The analyses were pursued using single- and multiple-spell approaches. It is obvious that the multiple-spell approach is the methodologically more correct one. For the present analyses, both approaches yielded comparable results. However, the multiple-spell approach is currently being explored in greater detail.
Finally, we believe the application of sociometric and event history techniques to bibliometric data offers new perspectives to the management of R&D activities. Indeed, analyzing contribution-spans may eventually serve as a useful approach to unveil the fundamental dynamics of new technology development. By focusing on the determinants of researcher contribution-spans, our aim is to shift attention away from predicting the technological future and towards understanding the fundamentals of researcher behavior. Improvements in our understanding of survival and hazard rates for researchers in a field may ultimately lead to the identification of critical factors and events (e.g. with regard to networking, cfr. the E.C. Human Capital and Mobility Program) that can inform our decisions regarding emerging technologies. A comprehensive and systematic insight into the sustained commitment of researchers to the ideas they are pursuing may therefore prove invaluable.

References
Allen, T.J. 1966. 'Studies of the problem-solving process in engineering design,' IEEE Transactions on Engineering Management, Vol. 18, pp. 72-83.
Allison, P.D. 1990. Event History Analysis. Applied Social Research Methods Series, Vol. 46, Newbury Park: Sage Publications (fifth printing).
Beaver, D. and R. Rosen. 1978. 'Studies in scientific collaboration,' Scientometrics, Vol. 1, pp. 65-84.
Ben-David, J. and R. Collins. 1966. 'Social factors in the origins of a new science: the case of psychology,' American Sociological Review, Vol. 31, pp. 451-465.
Burt, R.S. 1976. 'Positions in networks,' Social Forces, Vol. 55, pp. 93-122.
Burt, R.S. 1980. 'Models of network structure,' Annual Review of Sociology, Vol. 6, pp. 79-141.
Burt, R.S. 1991. Structure Reference Manual: Version 4.2. Columbia University's Center for the Social Sciences.
Burt, R.S. 1992. Structural Holes: The Social Structure of Competition. Cambridge, Mass.: Harvard University Press.
Collins, H.M. 1974. 'The TEA-set: tacit knowledge and scientific networks,' Science Studies, Vol. 4, pp. 165-186.
Constant, E.W. 1980. The Origins of the Turbojet. Baltimore: The Johns Hopkins University Press.
Crane, D. 1972. Invisible Colleges: Diffusion of Knowledge in Scientific Communities.
Debackere, K. and M.A. Rappa. 1993. 'Institutional variations in problem-choice and persistence among researchers in an emerging field,' forthcoming in Research Policy.
Dosi, G. 1982. 'Technological paradigms and technological trajectories,' Research Policy, Vol. 11, pp. 147-162.
Edge, D.O. and M.J. Mulkay. 1976. Astronomy Transformed: The Emergence of Radio Astronomy in Britain. New York: John Wiley & Sons.
Freeman, L.C. 1977. 'A set of measures of centrality based on betweenness,' Sociometry, Vol. 40, pp. 35-41.
Freeman, L.C. 1979. 'Centrality in social networks. I: Conceptual clarifications,' Social
Networks, Vol. 1, pp. 215-239.
Gieryn, T.F. 1978. 'Problem retention and problem change in science,' Sociological Inquiry, Vol. 48, pp. 96-115.
Gieryn, T.F. and R.F. Hirsch. 1983. 'Marginality and innovation in science,' Social Studies of Science, Vol. 13, pp. 87-106.
Granovetter, M.S. 1973. 'The strength of weak ties,' American Journal of Sociology, Vol. 78, pp. 1360-1380.
Hagstrom, W.O. 1965. The Scientific Community. New York: Basic Books Inc.
Hughes, T.P. 1989. American Genesis: A Century of Invention and Technological Enthusiasm. New York: Viking.
Katz, R. and T.J. Allen. 1982. 'Investigating the Not-Invented-Here (NIH) syndrome: a look at the performance, tenure and communication patterns of 50 R&D groups,' R&D Management, Vol. 12, pp. 7-19.
Knoke, D. and J.H. Kuklinski. 1990. Network Analysis. Applied Social Research Methods Series, Vol. 28, Newbury Park: Sage Publications.
Kuhn, T.S. 1970. The Structure of Scientific Revolution. Chicago: The University of Chicago Press (2nd ed.).
Laudan, R. 1984. The Nature of Technological Knowledge. Dordrecht: Reidel.
Layton, E. 1974. 'Technology as knowledge,' Technology and Culture, Vol. 15, pp. 31-41.
Layton, E. 1977. 'Conditions of technological development,' in Science, Technology and Society: A Cross-Disciplinary Perspective, I. Spiegel-Rosing and D. de Solla Price (eds.), London and Beverly Hills: Sage Publications.
Nelson, R.R. and S.G. Winter. 1982. An Evolutionary Theory of Economic Change. Cambridge, Mass.: Harvard University Press.
Pelz, D.C. and F.M. Andrews. 1966. Scientists in Organizations: Productive Climates for R&D. New York: John Wiley.
Polanyi, M. 1958. Personal Knowledge. London: Routledge and Kegan Paul.
Rappa, M.A., Debackere, K. and R. Garud. 1992. 'Technological progress and the duration of contribution spans,' Technological Forecasting and Social Change, Vol. 42, pp. 133-145.
Rappa, M.A. and K. Debackere. 1992a. 'Monitoring progress in R&D communities,' in [...] Winterhager (eds.). Leiden: DSWO Press, Lisbon Institute, University of Leiden, pp. 253-265.
Rappa, M.A. and K. Debackere. 1992b. 'Technological Communities and the Diffusion of Knowledge,' R&D Management, Vol. 22, No. 3, pp. 209-220.
Rappa, M.A. and R. Garud. 1992. 'Modeling contribution-spans of scientists in a field: the case of cochlear implants,' R&D Management, Vol. 22, No. 4, pp. 337-348.
Rogers, E.M. and D.L. Kincaid. 1981. Communication Networks: Toward a New Paradigm for Research. New York: MacMillan.
Root-Bernstein, R.S. 1989. Discovering: Inventing and Solving Problems at the Frontier of Scientific Knowledge. Cambridge, Mass.: Harvard University Press.
Rosenberg, N. 1982. Inside the Black Box. Cambridge, UK: Cambridge University Press.
Sahal, D. 1981. Patterns of Technological Innovation. Reading, Mass.: Addison-Wesley Publishing Co.
Small, H. and E. Greenlee. 1986. 'Collagen research in the 1970s,' Scientometrics, Vol.
10, pp. 95-117.
Stewart, J.A. 1990. Drifting Continents and Colliding Paradigms. Bloomington: Indiana University Press.
Tushman, M.L. and P. Anderson. 1986. 'Technological discontinuities and organizational environments,' Administrative Science Quarterly, Vol. 31, pp. 439-465.
Yamagishi, T., Gillmore, M.R. and K.S. Cook. 1988. 'Network connections and the distribution of power in exchange networks,' American Journal of Sociology, Vol. 93, pp. 833-851.
Yamaguchi, K. 1991. Event History Analysis. Applied Social Research Methods Series, Vol. 28, Newbury Park: Sage Publications.
Ziman, J. 1987. Knowing Everything about Nothing. Cambridge, UK: Cambridge