HAL Id: hal-01834038
https://hal-amu.archives-ouvertes.fr/hal-01834038
Submitted on 11 Apr 2019
Title length
Yann Bramoullé, Lorenzo Ductor
To cite this version:
Yann Bramoullé, Lorenzo Ductor. Title length. Journal of Economic Behavior and Organization, Elsevier, 2018, 150, pp. 311-324. ⟨10.1016/j.jebo.2018.01.014⟩. ⟨hal-01834038⟩
Title length
Yann Bramoullé a, Lorenzo Ductor b,∗
a Aix-Marseille University (Aix-Marseille School of Economics), CNRS & EHESS, Centre de la Vieille Charité, Marseille, France
b Middlesex University London, The Burroughs, NW4 4BT, London, United Kingdom
Abstract
We document strong and robust negative correlations between the length of the title of an economics article and different measures of scientific quality. Analyzing all articles published between 1970 and 2011 and referenced in EconLit, we find that articles with shorter titles tend to be published in better journals, to be more cited and to be more innovative. These correlations hold controlling for unobserved time-invariant and observed time-varying characteristics of teams of authors.
1. Introduction
We document a strong and robust negative correlation between the length of the title of an economics article and its scientific quality. Articles with shorter titles tend to be published in better journals. Controlling for journal quality, articles with shorter titles tend to receive more citations. To illustrate, papers published between 1970 and 2011 in economics journals had a title composed, on average, of 78 characters. Over the same period, the average title length of papers in the American Economic Review was 60 characters. Within the AER, the top 20 most cited articles had an average title length of 52 characters.
These correlations may have different explanations. Title length could have a causal impact on quality measures. For instance, a short title could make a paper easier to memorize, which could directly increase its citations. Title length and quality could also have common determinants. Authors’ research and writing skills, in particular, could affect both the paper’s measured outcomes and the length of its title. Good researchers may systematically adopt shorter titles and publish well-cited papers in good journals. Other common determinants may be harder to measure. We notably expect more novel papers to have shorter titles. Seminal articles often have very short titles; consider “A Theory of Production” (Cobb and Douglas, 1928), with 22 characters, or “Economic Growth and Income Inequality” (Kuznets, 1955), with 37 characters. Further studies on these topics had to adopt longer titles. Novelty is not easy to measure, however, and could be imperfectly correlated with journal quality or citation counts.
∗ Corresponding author.
E-mail address: [email protected] (L. Ductor).
We provide an in-depth analysis of the relationship between title length and scientific quality. We examine all articles published between 1970 and 2011 and referenced in EconLit. Our initial sample contains more than 580,000 articles.1 We consider three measures of article quality. We look at an impact factor of the journal in which the article is published and at the number of citations received by the article. Citations capture the impact of the article on subsequent research. Some recent studies indicate, however, that citations may be only weakly correlated with the article’s novelty, see Lee et al. (2015). We build on recent advances in bibliometric analysis and construct an index of novelty based on keyword atypicality, as in Boudreau et al. (2016) and Sreenivasan (2013). Keywords capture central aspects of research and more novel papers likely have atypical keywords or keyword combinations. These three measures - journal quality, citations and novelty - capture different dimensions of the scientific quality of an article. Our main objective is to better understand the relationship between title length and these different dimensions.
We estimate regressions of an article’s quality measure on title length and on an expanding set of controls. We control, first, for the article’s number of pages and for JEL code fixed effects, capturing systematic differences across fields. When studying citations and novelty, we control for journal fixed effects. We thus analyze how citations or novelty are related to title length controlling for journal unobserved heterogeneity. Importantly, we also control for characteristics of the authors. We include a full set of dummies at the team level.2 These team-of-authors dummies capture time-invariant (observed and unobserved) characteristics of the teams, such as team ability and gender composition. We also control for observable time-varying team characteristics such as specialization, past team output and average past output of the team members. We expect these team dummies and time-varying characteristics to account reasonably well for the impact of authors’ skills and ability. Finally, we include novelty as a control in the regressions on journal quality and citations to see whether the observed relationships between title length and these measures are mediated through novelty.
In all specifications, our main estimates are negative and statistically and quantitatively significant. Articles with shorter titles tend to be published in better journals. They tend to be more cited and to get higher novelty scores. Moreover, these tendencies are more pronounced in better journals. Controlling for team characteristics has a strong impact in the regressions on journal quality. The main estimate, which remains statistically and quantitatively significant, is approximately divided by three. This shows that better teams indeed have a strong systematic tendency to publish articles in better journals and with shorter titles. By contrast, the main estimates are essentially unchanged when controlling for team characteristics in regressions on citations or novelty. Articles with shorter titles tend to be more cited and more novel than articles with longer titles, controlling for authors’ skills and abilities.
Moreover, including novelty in the regressions on journal quality and citations has essentially no impact on the estimates. This means that the observed relation between title length and citations is not explained by the fact that novel articles tend to have shorter titles and to be more cited. Together, these results show that title length correlates well with the overall scientific quality of a paper.
Our analysis contributes to the scientific study of the academic process.3 Two recent studies document a negative association between title length and citations on large samples, controlling for journal fixed effects. Letchford et al. (2015) look at highly cited articles in the Web of Science across all disciplines. Independently of our analysis, Gnewuch and Wohlrabe (2017) consider economics articles published between 1980 and 2015 and referenced in the Web of Science.4
We build on these earlier studies and make two main contributions to the study of the relation between title length and scientific quality. First, we analyze two dimensions of scientific quality other than citations. We provide the first large-scale study of the relation between title length and the impact factor of the journal in which the article is published. Perhaps more importantly, we build an original index of novelty, based on keyword atypicality, and provide the first analysis of the link between novelty and title length. We show that title length is also strongly negatively correlated with these two measures. Second, we provide the first analysis controlling for time-invariant and observed time-varying characteristics of teams of authors. This allows us to account for main determinants of scientific quality, which were ignored in previous studies. We show that two thirds of the association between title length and journal quality is explained by team characteristics while, surprisingly, the observed relation between title length and citations is quantitatively robust to the inclusion of team characteristics in the regressions.5 Overall, our results reveal a strong and robust negative relation between the length of the title of an article and its scientific quality.
The remainder of the paper is organized as follows. We describe the data in Section 2. We present our empirical specifications in Section 3. We study the relation between title length, journal quality, citations and novelty in Section 4 and conclude in Section 5.
1 We provide descriptive statistics and figures based on this initial sample in Section 2. In our regressions, the largest sample studied contains about 490,000 articles.
2 Suppose, for instance, that author a has published single-authored articles, articles with author b, articles with authors b and c, and articles with author d. This leads to the inclusion of four different dummies in the regressions: δ{a}, δ{a,b}, δ{a,b,c} and δ{a,d} where, for instance, δ{a,b} = 1 for articles written by a and b and 0 otherwise.
3 In economics, see, for instance, Bramoullé et al. (2014), Ductor et al. (2014) and Fafchamps et al. (2010).
4 Researchers have been looking for correlates of articles’ quality and impact, see, e.g., Haslam et al. (2008), Hartley et al. (1988), Smart and Bayer (1986) and Webster et al. (2009). For instance, article length appears to be a good predictor of citations, across different periods and disciplines, see Card and DellaVigna (2013), Falagas et al. (2013) and Fox et al. (2016). The negative relation with title length was noticed in some early studies on small samples, see Jacques and Sebire (2009), Jamali and Nikzad (2011) and Habibzadeh and Yadollahie (2010).
5 We also control for field fixed effects, which are ignored in Letchford et al. (2015) and Gnewuch and Wohlrabe (2017).
[Figure 1: kernel density plot. Horizontal axis: number of title's characters (0 to 150); vertical axis: density. Series: AA, A, B, Unranked.]
Fig. 1. Distribution of title length by journal quality. Notes: The sample consists of all regular articles published between 1970 and 2011 and referenced in EconLit, 580,055 articles. We consider the Epanechnikov kernel.
2. The data
We analyse two main datasets. We first use information on all articles published between 1970 and 2011 in journals listed in EconLit, the bibliography of economics journals compiled by the American Economic Association. We exclude from the sample articles containing in their title the words “note”, “comment”, “preface”, “remark”, “reply” and “foreword”.6 This first dataset contains 580,055 articles published in 1617 journals, and is thus composed of the entire body of academic economics publications over the period. We measure journal quality through a standard impact factor introduced by Kodrzycki and Yu (2006). The Kodrzycki and Yu index (hereafter, KY) uses an iterative process, excluding self-citations of the journal and weighting citations by the influence of the citing journal. As this index is not available for all the journals listed in EconLit, we use the predicted impact factor derived in Ductor et al. (2014) for journals not listed in KY.7 Note that this measure is constant and does not vary over time. Computing a time-varying impact factor for the 1617 journals listed in EconLit would be quite challenging as most of these journals are relatively new and citations are not easily available for most of them. In addition, changes in journals’ impact factors appear to be quite small in economics, both in absolute terms and relative to other disciplines, see Althouse et al. (2009). As a robustness check, we consider alternative measures of journal quality. We notably analyze two rankings based on coarse categories, which should be unaffected by marginal changes in specific journals’ impact factors, see Supplementary Appendix A. We find that results are robust to the quality measure.
Since EconLit does not provide information on citations, we retrieved citations from the Web of Science (Thomson Reuters 2014). Due to the expensive and time-consuming nature of the retrieval process, we focus on journals appearing in the list of the Tinbergen Institute. This list contains 117 journals ranked in three categories: AA (5 journals), A (27 journals), and B (85 journals), see Appendix A. It arguably contains most of the established journals of our discipline. The citation dataset includes information on 161,699 articles and the number of citations they received yearly until 2013. The most cited paper, “Prospect Theory: an Analysis of Decision under Risk” (Kahneman and Tversky, 1979), received 8571 citations, while 13% of the papers have no citations.
We measure the novelty of an article by building an index based on atypicality of keyword combinations, following Boudreau et al. (2016) and Sreenivasan (2013). Keywords capture central aspects of the research. This index measures the relative infrequency of pairs of keywords of the article, compared to previously published articles. Formally, let $K_i$ denote the set of keywords of article $i$. Let $N_{t;k_1,k_2}$ denote the number of articles published in year $t$ or before which contain the two keywords $k_1$ and $k_2$. Let $N_t$ denote the overall number of pairs of keywords contained in articles published in year $t$ or
6 These atypical articles only represent 1.1% of the full sample and tend to have longer titles and to be less cited. Including them in the analysis leads to estimates of the relation between title length and scientific quality that are slightly greater in magnitude.
7 To illustrate, the KY impact factor is 27.1 for the American Economic Review, 6.6 for the Journal of Development Economics and 0.6 for the Pacific Economic Review.
[Figure 2: kernel density plot. Horizontal axis: number of title's characters (0 to 150); vertical axis: density. Series by citation percentile: >90th, 10-90th, <10th.]
Fig. 2. Distribution of title length by citations. Notes: The sample consists of all regular articles published between 1970 and 2011 in a journal listed in the Tinbergen Institute list, 161,699 articles. We consider the Epanechnikov kernel.
before. Define the probability of observing the pair of keywords $\{k_1,k_2\}$ before year $t$ as

$$P_t(k_1,k_2) = \frac{N_{t;k_1,k_2}}{N_t}.$$

Note that $-\ln(P_t)$ measures the atypicality of the pair of keywords $\{k_1,k_2\}$ in this period. Novelty of article $i$ published in year $t$ is then equal to the normalized average atypicality of its pairs of keywords. Formally,

$$Nov_i = -\frac{\sum_{k_1 \neq k_2 \in K_i} \ln\big(P_t(k_1,k_2)\big)}{|K_i|\,|K_i+1|\,\ln(N_t)},$$

where $|K_i|$ is the number of keywords of article $i$. Note that $0 \le -\ln(P_t) \le \ln(N_t)$ and hence $Nov_i$ varies between 0 and 1. An article’s novelty gets close to its maximal value of 1 when all its pairs of keywords are extremely atypical.8
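The construction of the novelty index can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy corpus and the treatment of unseen pairs are assumptions made for illustration. In particular, the sketch averages the atypicality over the distinct unordered keyword pairs of the article and divides by ln(N_t), which keeps the score between 0 and 1 but does not reproduce the exact normalizing constant of the formula above.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(corpus):
    """Count how many articles contain each unordered keyword pair.

    `corpus` is a list of keyword lists, one per article published in
    year t or before (hypothetical input format).
    """
    counts = Counter()
    for keywords in corpus:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def novelty(keywords, counts):
    """Average atypicality -ln(P_t) over the article's pairs, scaled by ln(N_t)."""
    total = sum(counts.values())  # N_t: overall number of keyword pairs
    pairs = list(combinations(sorted(set(keywords)), 2))
    if not pairs or total <= 1:
        return None  # undefined without at least one pair and N_t > 1
    atypicality = []
    for pair in pairs:
        n = counts.get(pair, 0)
        # Assumption: a pair never seen before gets the maximal value ln(N_t).
        atypicality.append(math.log(total) if n == 0 else -math.log(n / total))
    return sum(atypicality) / (len(pairs) * math.log(total))


corpus = [
    ["growth", "inequality"],
    ["growth", "inequality"],
    ["growth", "trade"],
    ["auction", "trade"],
]
counts = pair_counts(corpus)
common = novelty(["growth", "inequality"], counts)  # frequent pair, lower score
rare = novelty(["auction", "trade"], counts)        # rare pair, higher score
```

In this toy corpus the frequent pair scores lower than the pair that appears only once, which is the ordering the index is meant to capture.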
We measure the length of an article’s title by simply counting the number of characters, including spaces and punctuation. Average title length in the EconLit sample is 78; some papers have very long titles.9 Descriptive statistics give us a first idea of the link between title length and scientific quality. Over the whole period, average title length is 61 for AA journals, 66 for A journals, 70 for B journals and 81 for unranked journals. Fig. 1 shows the distributions of title length across journal categories using the entire EconLit sample, 580,055 articles. We observe a clear ordering of the distributions by journal quality. Statistically, any notion that distributions are not ordered through First-Order Stochastic Dominance relations aligned with quality is rejected. Fig. 2 shows the distributions of title length for articles in the first decile of the citation distribution, those in the last decile and all other articles using the TI sample, 161,699 articles. Again, these distributions are unambiguously ordered and articles with more citations appear to have shorter titles. Fig. 3 shows the distribution of title length across different percentiles of the novelty distribution, for the sample of articles with at least one keyword, 464,835 articles. Clearly, articles in the top 10% of novelty have shorter titles than the rest of the articles.
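The title length measure is simple enough to state in one line of Python. The sketch below assumes titles are stored as plain strings without leading or trailing whitespace; the function name is illustrative. It reproduces the two character counts quoted in the introduction.

```python
def title_length(title: str) -> int:
    """Number of characters in the title, counting spaces and punctuation."""
    return len(title)


cobb_douglas = title_length("A Theory of Production")             # 22
kuznets = title_length("Economic Growth and Income Inequality")   # 37
```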
Fig. 4 shows the evolution of average title length over time and by journal quality using the EconLit sample from 1974 to 2011, 567,218 articles. For unranked and B ranked journals, we observe a clear increasing tendency. The average title length of articles published in these two categories has increased substantially over time (despite a slight temporary decrease around 1990). This increase is consistent with the large rise in the number of articles published over the period and with a general tendency towards more specialization. As authors develop finer and finer extensions of seminal works, they likely need to adopt longer titles to describe their research. Interestingly, however, A and AA ranked journals display different trends. For top field and top generalist journals, we also observe a similar increasing trend at the beginning of the period. However, a break occurs at some time in the 1980s for the two categories. Average title length then becomes slightly decreasing with time for articles published in A ranked journals and stays essentially invariant for articles published in AA
8 For robustness, we also look at two alternative indices: one built from relative frequencies of keyword pairs compared to articles published in the same year and another based on the atypicality of individual keywords. The results are robust to these alternative measures of novelty, see Supplementary Appendix A.
9 For instance, the following title has 203 characters: “Analysis of the carbon sequestration costs of afforestation and reforestation agro-forestry practices and the use of cost curves to evaluate their potential for implementation of climate change mitigation”, Torres et al. (2010).
[Figure 3: kernel density plot. Horizontal axis: number of title's characters (0 to 150); vertical axis: density. Series by novelty percentile: >90th, 10-90th, <10th.]
Fig. 3. Distribution of title length by novelty. Notes: The sample consists of all regular articles published between 1970 and 2011 in EconLit with at least one keyword, 464,835 articles. We consider the Epanechnikov kernel.
[Figure 4: line plot. Horizontal axis: year of publication (1980 to 2010); vertical axis: average title length (60 to 85 characters). Series: AA, A, B, unranked.]
Fig. 4. Average title length over time and by journal quality. Notes: Five year moving averages of title length. The sample consists of all regular articles published between 1974 and 2011 and referenced in EconLit, 567,218 articles. We consider the Epanechnikov kernel.
ranked journals. A possible explanation behind these distinct tendencies could be the documented increase in competition over the period (Card and DellaVigna, 2013).10 As publication slots become more and more scarce in top journals, authors likely increase the quality of their submitted papers, leading to shorter titles.
10 Previously documented patterns consistent with increased competition include an increase in the number of submissions to the top 5 (Card and DellaVigna, 2013), in number of coauthors (Ductor 2015), in papers’ length (Card and DellaVigna 2014) and in turnaround time, Ellison (2002).
3. Empirical model
In this section, we present the main empirical specifications that we use to study the relationship between title length and article quality. For journal quality, we consider variants of the following econometric model:

$$\log(q_i) = \beta_0 + \beta_1\,\text{length}_i + \beta_2\,\text{pages}_i + \sum_{f=1}^{F} \lambda_f\,\text{JEL}_{i,f} + \beta_3 H_{r,t-1} + \beta_4 T_{r,t-1} + \beta_5 A_{r,t-1} + \delta_r + \mu_t + \epsilon_i, \qquad (1)$$

where $i$ denotes an article, $r$ the article’s research team and $t$ the year in which this article is published. The research team of an article is the set of its author(s). For instance, $r=\{a\}$ for all articles single-authored by author $a$ while $r=\{a,b\}$ for articles written by authors $a$ and $b$. Note that authors are identified in the data on the basis of their first and last names. In the relatively infrequent cases where first names are not available, we apply the name disambiguation algorithm designed by Van der Leij (2006).11
The left-hand side variable, $\log(q_i)$, denotes the log of the impact factor of the journal in which article $i$ is published. On the right-hand side, $\text{length}_i$ is the length of the title of the article, $\text{pages}_i$ is the number of pages and $\text{JEL}_{i,f}$ denotes the proportion of the JEL codes of the article in category $f$.12 Next, the variables $H_{r,t-1}$, $T_{r,t-1}$ and $A_{r,t-1}$ represent time-varying characteristics of the article’s research team. $H_{r,t-1}$ is the degree of specialization of research team $r$, computed over all publications by research team $r$ published in year $t-1$ or before.13 To measure specialization, we use the Herfindahl index as in Ductor (2015). Formally, let $n_{t-1,f}$ denote the number of articles written by the team in field $f$ and published in year $t-1$ or before. Let $n_{t-1} = \sum_{f=1}^{F} n_{t-1,f}$ denote the overall number of articles published by the team until $t-1$. Articles with multiple JEL codes are divided and assigned proportionally to each field. Then,

$$H_{r,t-1} = \sum_{f=1}^{F} \left(\frac{n_{t-1,f}}{n_{t-1}}\right)^{2}.$$

This measure is maximum and equal to 1 when all the team’s papers are written in a unique field. A lower value indicates a lower degree of specialization.
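As a hedged illustration of this specialization measure, the Python sketch below computes field shares from JEL codes and the resulting Herfindahl index. The function names and the input format (each article given as a list of JEL code strings whose first letter identifies the field) are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def field_shares(jel_codes):
    """Split one article proportionally across fields (first letter of each JEL code)."""
    shares = defaultdict(float)
    for code in jel_codes:
        shares[code[0]] += 1.0 / len(jel_codes)
    return dict(shares)


def specialization(past_articles):
    """Herfindahl index over the team's past articles.

    Returns 0 when the team has no past publication, following the
    convention stated in the paper's footnote.
    """
    if not past_articles:
        return 0.0
    totals = defaultdict(float)  # n_{t-1,f}: fractional article counts per field
    for jel_codes in past_articles:
        for field, share in field_shares(jel_codes).items():
            totals[field] += share
    n = sum(totals.values())     # n_{t-1}: overall number of articles
    return sum((count / n) ** 2 for count in totals.values())


shares = field_shares(["B2", "C1", "C5"])          # B gets 1/3, C gets 2/3
one_field = specialization([["C1"], ["C5"]])       # all papers in field C
two_fields = specialization([["B2"], ["C1"]])      # evenly split across B and C
```

A team that always publishes in the same field reaches the maximum of 1, while an even split across two fields gives 0.5.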
$T_{r,t-1}$ is the past output of the research team, computed as the overall number of papers, weighted by journal quality, published by research team $r$ until $t-1$. $A_{r,t-1}$ captures the average past output of the different authors in the team, computed as the overall number of papers, weighted by journal quality, published by all authors of the research team. Note that these two measures coincide in some specific cases, for instance when $r$ has a single author who has never written with someone else or when two authors write all their papers together. They generally differ, however. $T_{r,t-1}$ measures the track record of the team itself while $A_{r,t-1}$ captures the track record of the team’s members. Thus, a team composed of two senior researchers who start collaborating will have a high $A_{r,t-1}$ and a low $T_{r,t-1}$.
We also include research team dummies $\delta_r$. For instance, if authors $a$ and $b$ write three articles $i$, $j$ and $k$ together, the dummy variable $\delta_{\{a,b\}}$ is equal to 1 for articles $i$, $j$ and $k$ and to 0 for all other articles. These dummies control for time-invariant characteristics of the research teams, such as innate abilities or ethnic and gender composition. We also present results without team dummies, but with dummies for the number of authors of the article. Finally, we include year of publication dummies, $\mu_t$, for each year from 1970 to 2011. These year dummies account for any possible time trends in journal quality. In some specifications, we further interact year dummies, or a linear time trend, with title length, to detect differences in the effect of title length across time.
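To make the role of the team dummies concrete, the following Python sketch keys each article by its exact set of authors and demeans a variable within team. In a linear regression, including a full set of team dummies is equivalent to this within-team demeaning of every variable, so a team observed only once contributes nothing to the estimate. The dictionary keys ("authors", "title_length") and the toy data are hypothetical.

```python
from collections import defaultdict


def team_key(authors):
    """A team is the exact set of authors, irrespective of author order."""
    return frozenset(authors)


def within_team_demean(articles, variable):
    """Subtract the team-level mean of `variable` from each article's value."""
    groups = defaultdict(list)
    for article in articles:
        groups[team_key(article["authors"])].append(article[variable])
    means = {team: sum(values) / len(values) for team, values in groups.items()}
    return [a[variable] - means[team_key(a["authors"])] for a in articles]


articles = [
    {"authors": ["a"], "title_length": 40},
    {"authors": ["a", "b"], "title_length": 60},
    {"authors": ["b", "a"], "title_length": 80},   # same team as the previous article
    {"authors": ["a"], "title_length": 50},
    {"authors": ["d"], "title_length": 70},        # singleton team: demeaned value is 0
]
demeaned = within_team_demean(articles, "title_length")
```

Here author a alone, the pair {a, b} and author d alone are three distinct teams, mirroring the way the dummies are defined in the text.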
To study citations and novelty, we augment the previous model with journal fixed effects. For instance, let $c_i$ denote the number of citations gathered by article $i$ and $j$ denote the journal in which the article is published. We base our citation regressions on the following model:

$$\log(c_i+1) = \beta_0 + \beta_1\,\text{length}_i + \beta_2\,\text{pages}_i + \sum_{f=1}^{F} \lambda_f\,\text{JEL}_{i,f} + \beta_3 H_{r,t-1} + \beta_4 T_{r,t-1} + \beta_5 A_{r,t-1} + \delta_r + \nu_j + \mu_t + \epsilon_i, \qquad (2)$$

where journal fixed effects, $\nu_j$, now capture systematic differences in citation patterns across journals. We adopt a similar model to analyze an article’s novelty, $Nov_i$. The log plus one transformation of the number of citations mitigates the effect of highly cited papers, see Ductor (2015).14
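The dependent variable of Eq. (2) can be sketched in a few lines of Python. Besides compressing the right tail, the transformation is defined for articles with zero citations, which matters here since 13% of the papers in the citation sample are never cited. The function name is illustrative.

```python
import math


def log_citations(citations: int) -> float:
    """log(c + 1): defined at zero and compresses very large citation counts."""
    if citations < 0:
        raise ValueError("citation counts cannot be negative")
    return math.log1p(citations)


uncited = log_citations(0)         # an uncited paper maps to 0
most_cited = log_citations(8571)   # the most cited paper in the sample
```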
11 We also apply this algorithm to account for middle initials, see the Appendix of Chapter 2 in Van der Leij (2006) for details.
12 For instance, if an article has JEL codes B2, C1 and C5, then $\text{JEL}_{i,B} = 1/3$, $\text{JEL}_{i,C} = 2/3$ and $\text{JEL}_{i,f} = 0$ if $f \notin \{B, C\}$.
13 By assumption, $H_{r,t-1} = T_{r,t-1} = 0$ if research team $r$ has not published any article in year $t-1$ or before.
14 Our results are robust to looking at the number of citations directly and to count data models such as Poisson and negative binomial, see Supplementary Appendix A.
Table 1
Title length and journal quality.

| Variables/Samples | (1) All | (2) All | (3) TFE | (4) TFE | (5) TFE | (6) TFE | (7) TFE |
|---|---|---|---|---|---|---|---|
| Title length | −0.0064*** | −0.0060*** | −0.0062*** | −0.0019*** | −0.0018*** | −0.0020*** | −0.0020*** |
| | (0.0001) | (0.0001) | (0.0001) | (0.0001) | (0.0001) | (0.0001) | (0.0001) |
| Pages | | 0.0149*** | 0.0147*** | 0.0036*** | 0.0037*** | 0.0039*** | 0.0038*** |
| | | (0.0005) | (0.0008) | (0.0003) | (0.0003) | (0.0004) | (0.0004) |
| Number of authors = 2 | | 0.2895*** | 0.4103*** | | | | |
| | | (0.0069) | (0.0112) | | | | |
| Number of authors = 3 | | 0.3349*** | 0.4867*** | | | | |
| | | (0.0082) | (0.0195) | | | | |
| Number of authors > 3 | | 0.2532*** | 0.4828*** | | | | |
| | | (0.0153) | (0.0704) | | | | |
| Team specialization | | | | | 0.1422*** | 0.1170*** | 0.1167*** |
| | | | | | (0.0100) | (0.0127) | (0.0127) |
| Past output team | | | | | −0.1889*** | −0.1868*** | −0.1867*** |
| | | | | | (0.0041) | (0.0047) | (0.0047) |
| Avg. past output authors | | | | | 0.0205*** | 0.0291*** | 0.0291*** |
| | | | | | (0.0051) | (0.0060) | (0.0060) |
| Novelty | | | | | | | −0.0152 |
| | | | | | | | (0.0110) |
| Observations | 488,649 | 488,649 | 279,239 | 279,239 | 279,239 | 216,494 | 216,494 |
| R-squared | 0.0274 | 0.1306 | 0.1371 | 0.6276 | 0.6359 | 0.6397 | 0.6397 |
| Year dummies | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Fields shares | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Team dummies | No | No | No | Yes | Yes | Yes | Yes |

Notes: In columns 1–7, we estimate the relationship between the impact factor of the journal where the article is published and the article’s title length; the dependent variable is in log. In columns 1 and 2 (All) we consider the full sample of articles. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. In columns 6 and 7 the sample is restricted to articles published by a research team (and sole authors) with at least two publications and articles with at least one keyword. Pages is the number of pages of the article; Number of authors = 2, = 3, > 3 are dummy variables equal to one if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a Herfindahl index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past; Novelty is an index of the novelty of the article obtained by measuring the atypicality of the keywords of the article. All the regressions use clustered standard errors at the team level. *** p < 0.01, ** p < 0.05, * p < 0.1.
4. Results
We now present our main results, analyzing the relation between title length and the three different quality measures in turn. Table 1 presents estimates of variants of Eq. (1), regressing the logarithm of the impact factor of the journal in which the article is published on the length of the article’s title and expanding sets of controls.15 Column (1) presents results of estimations without controls. We focus in our regressions on articles published in journals whose extended KY impact factor can be computed and with JEL code information. About 13.7% of the papers in the original sample are published in journals whose impact factor cannot be computed and a further 2.8% do not have information on JEL codes. Our main regression sample thus contains information on 488,649 articles. The raw estimate is −0.0064. To get a sense of the magnitude, this estimate implies that a switch from the last decile (121 characters) to the first decile (39 characters) of the title length distribution is associated with an increase of about 52% in the journal’s impact factor (0.0064 × 82 ≈ 0.52 in logs). In the estimations underlying Column (2), we control for the article’s number of pages, dummies for different numbers of authors, JEL code dummies, and year dummies. Controlling for these factors has little impact on the estimate.
We then include research team dummies and three time-varying characteristics of the research teams (degree of specialization, past output jointly published by the team and average past output of all the authors in the research team). The team dummies control for heterogeneity at the level of the team of authors. Note that identification now relies on articles published by teams with at least two publications. To detect potential selection effects, we report in Column (3) results of estimation on this subsample without team fixed effects and characteristics. The estimate is essentially unchanged. We then report results of estimations including team fixed effects in Column (4). The estimate is roughly divided by 3. Therefore, two thirds of the original effect is due to composition: better teams tend to publish papers in better journals and with shorter titles. The remaining third captures the fact that, even controlling for time-invariant team heterogeneity, papers with shorter titles tend to be published in better journals. Note that the effect is still quantitatively significant. A switch from the last to the first decile of the title length distribution is now associated with a 16% increase in journal impact factor.
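The magnitudes quoted above (52% and 16%) follow from multiplying the title length coefficient by the 82-character gap between the first decile (39) and the last decile (121). The Python sketch below reproduces this arithmetic; the function names are illustrative. It also reports the exact relative change implied by a log-linear model, exp(Δ) − 1, which is larger than the log-point figure when the gap is this wide; reading the paper's percentages as log-point approximations is an interpretation, not something the text states.

```python
import math

FIRST_DECILE, LAST_DECILE = 39, 121  # title length deciles reported in the text


def log_point_gap(beta: float, length_from: int, length_to: int) -> float:
    """Change in the log outcome when title length moves from one value to another."""
    return beta * (length_to - length_from)


def exact_relative_change(beta: float, length_from: int, length_to: int) -> float:
    """Exact proportional change in the outcome implied by the log-linear model."""
    return math.exp(log_point_gap(beta, length_from, length_to)) - 1.0


# Moving from a long title (last decile) to a short one (first decile).
raw_gap = log_point_gap(-0.0064, LAST_DECILE, FIRST_DECILE)      # Column (1) of Table 1
team_fe_gap = log_point_gap(-0.0019, LAST_DECILE, FIRST_DECILE)  # Column (4) of Table 1
raw_exact = exact_relative_change(-0.0064, LAST_DECILE, FIRST_DECILE)
```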
15 Our results are robust to regressing journal quality over powers of the article’s title length, see Supplementary Appendix A.
Table 2
Title length and citations.

| Variables/Samples | (1) All | (2) All | (3) TFE | (4) TFE | (5) TFE | (6) TFE | (7) TFE |
|---|---|---|---|---|---|---|---|
| Title length | −0.0017*** | −0.0008*** | −0.0013*** | −0.0015*** | −0.0014*** | −0.0014*** | −0.0014*** |
| | (0.0001) | (0.0001) | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0002) |
| Pages | 0.0224*** | 0.0164*** | 0.0269*** | 0.0217*** | 0.0217*** | 0.0232*** | 0.0231*** |
| | (0.0017) | (0.0016) | (0.0029) | (0.0030) | (0.0030) | (0.0027) | (0.0027) |
| Number of authors = 2 | 0.2769*** | 0.2270*** | | | | | |
| | (0.0099) | (0.0084) | | | | | |
| Number of authors = 3 | 0.3943*** | 0.3170*** | | | | | |
| | (0.0121) | (0.0107) | | | | | |
| Number of authors > 3 | 0.5548*** | 0.4223*** | | | | | |
| | (0.0234) | (0.0213) | | | | | |
| Team specialization | | | | | −0.0774*** | −0.0934*** | −0.0940*** |
| | | | | | (0.0239) | (0.0295) | (0.0295) |
| Past output team | | | | | −0.0507*** | −0.0551*** | −0.0551*** |
| | | | | | (0.0044) | (0.0051) | (0.0051) |
| Avg. past output authors | | | | | 0.0230*** | 0.0321*** | 0.0322*** |
| | | | | | (0.0063) | (0.0073) | (0.0073) |
| Novelty | | | | | | | −0.0600*** |
| | | | | | | | (0.0210) |
| Observations | 161,699 | 161,699 | 87,728 | 87,728 | 87,728 | 66,823 | 66,823 |
| R-squared | 0.1502 | 0.2858 | 0.2847 | 0.5848 | 0.5859 | 0.5991 | 0.5992 |
| Year dummies | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fields shares | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Journals dummies | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Team dummies | No | No | No | Yes | Yes | Yes | Yes |

Notes: In columns 1 to 7, we estimate the relationship between cumulative citations from year of publication to 2013 and the article’s title length; the dependent variable is in log(x + 1). In columns 1 and 2 (All) we consider the full sample of articles. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. In columns 6 and 7 the sample is restricted to articles published by a research team (and sole authors) with at least two publications and articles with at least one keyword. Pages is the number of pages of the article; Number of authors = 2, = 3, > 3 are dummy variables equal to one if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a Herfindahl index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past; Novelty is an index of the novelty of the article obtained by measuring the atypicality of the keywords of the article. All the regressions use clustered standard errors at the team level. *** p < 0.01, ** p < 0.05, * p < 0.1.
Adding time-varying team characteristics, in Column (5), leaves the estimate associated with title length essentially unaffected. These results reveal some interesting effects, however. Specialization is positively associated with journal quality, which is consistent with the existence of positive returns to specialization in the publication process. Authors’ past output is also positively associated with journal quality, which could reflect a positive impact of experience and past success on publications. Controlling for authors’ past output, however, team past output is negatively associated with journal quality. Authors with diverse sets of coauthors thus tend to publish in better journals than authors who always publish together.
Finally, we investigate whether the observed negative relationship between title length and journal quality can be explained by novelty, as more novel articles may have shorter titles and be published in better journals. To compute the novelty score, we need to further restrict the sample to articles for which keywords are available. In Column (6), we report results from the regressions underlying Column (5) on this new subsample. The main estimate is little affected. We then add the novelty score to the controls and report results in Column (7). The estimate remains unchanged. Therefore, the negative association between title length and journal quality is not mediated through novelty.
We next turn to the relation between title length and citations. Following Eq. (2), we regress the logarithm of the number of citations of the article in 2011 plus one against the length of the article’s title and controls. Year dummies control for the fact that citations accumulate over time.16 Column (1) of Table 2 presents results of regressions including the same controls as in Column (2) of Table 1. The estimate is −0.0017. A switch from the last to the first decile of the title length distribution is then associated with a 14% increase in the number of citations.
16 Our results are robust to alternative ways to account for this accumulation process, such as looking at citations per year, number of citations during the first five years after publication, or the percentile in the citation distribution; see Supplementary Appendix A.

Table 3
Title length and citations by journal quality.

                                    (1) TFE       (2) AA        (3) A         (4) B
Title length                        −0.0006**     −0.0023**     −0.0026***    −0.0013***
                                    (0.0003)      (0.0009)      (0.0005)      (0.0002)
Journal Quality IF                  0.2540***
                                    (0.0098)
Title length × Journal quality IF   −0.0004***
                                    (0.0001)
Pages                               0.0230***     0.0388***     0.0105***     0.0381***
                                    (0.0028)      (0.0085)      (0.0032)      (0.0013)
Team Specialization                 −0.0668***    0.0656        −0.1314**     −0.0795***
                                    (0.0242)      (0.1282)      (0.0623)      (0.0283)
Past output team                    −0.0584***    −0.0726***    −0.0511***    −0.0568***
                                    (0.0047)      (0.0158)      (0.0088)      (0.0077)
Avg. past output authors            0.0236***     0.0067        0.0328**      0.0364***
                                    (0.0066)      (0.0231)      (0.0137)      (0.0094)
Observations                        87,728        7,431         18,998        46,343
R-squared                           0.5639        0.5776        0.5844        0.5883
Year dummies                        Yes           Yes           Yes           Yes
Fields shares                       Yes           Yes           Yes           Yes
Journals dummies                    No            Yes           Yes           Yes
Team dummies                        Yes           Yes           Yes           Yes

Notes: In column 1, the results are obtained using teams of authors with more than one observation. Columns 2–4 report the results using the sample of articles published in AA-ranked journals, A-ranked journals and B-ranked journals, respectively. The dependent variable is in log(x + 1); Journal quality IF is in log. Pages is the number of pages of the article; Number of authors = 2, 3, > 3 are dummy variables indicating whether the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a Herfindahl index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past. All the regressions use clustered standard errors at the team level. *** p < 0.01, ** p < 0.05, * p < 0.1

In the next set of regressions, we include journal dummies. Results are reported in Column (2) of Table 2. The estimate decreases in magnitude and is now −0.0008. Comparing Columns (1) and (2), we see that an important fraction of the association between title length and citations is explained by time-invariant characteristics specific to the journal where the article is published. We then include research team dummies and time-varying team characteristics; we report results in Columns (4) and (5). Interestingly, the estimate, which is still negative and statistically significant, is now larger in magnitude.17 Therefore, articles with shorter titles tend to be more cited, even controlling for journal time-invariant characteristics and for team time-invariant and time-varying observed characteristics. To illustrate, consider a research team publishing two articles in the American Economic Review in 2001, one with 39 characters in its title and the other with 122 characters. The article with the shorter title is predicted to receive 15.3 citations in 2013 while the one with the longer title would receive 13.6 citations.18
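The two predicted citation counts in this illustration can be turned back into a slope. The sketch below is a back-of-the-envelope calculation, assuming the predictions come from the log(citations + 1) specification with all other controls held fixed; the function name is hypothetical and the result is an implied value, not a coefficient quoted from Table 2.

```python
import math


def implied_slope(cites_short: float, cites_long: float,
                  len_short: int, len_long: int) -> float:
    """Slope of log(citations + 1) in title length implied by two predictions."""
    log_gap = math.log1p(cites_long) - math.log1p(cites_short)
    return log_gap / (len_long - len_short)


# Values from the illustration: 39 vs 122 characters, 15.3 vs 13.6 citations.
slope = implied_slope(15.3, 13.6, 39, 122)
relative_gap = 15.3 / 13.6 - 1  # extra citations for the shorter title

print(f"implied slope per character: {slope:.6f}")
print(f"relative citation gap: {relative_gap:.3f}")
```

The shorter title is thus predicted to receive 12.5% more citations, and the implied slope is roughly −0.0013 per character of title.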
Controlling for team characteristics therefore has a very different effect depending on whether we look at journal quality or citations. A large part of the negative association between title length and journal quality is due to the fact that teams with better ability and, for instance, more experienced authors publish articles in better journals and with shorter titles. This conforms to our initial expectation since team characteristics are primary determinants of an article’s outcomes and content. Surprisingly, however, controlling for team characteristics has little impact on the association between title length and citations.
We also find it quite interesting to contrast the results reported in Column (5) of Table 2 with those reported in Column (5) of Table 1. Specialization is now negatively associated with citations, in contrast to its positive association with journal quality. The articles published by more specialized teams therefore receive fewer citations, even though they tend to be published in better journals. This could reflect the fact that they tend to reach a narrower audience. By contrast, estimates of past team output and past output of team members have similar signs. Articles published by authors with better track records thus tend to receive more citations and to be published in better journals, consistent with the Matthew effect (Merton, 1968). And articles published by authors who publish exclusively with the same coauthors tend to receive fewer citations and to be published in lower-ranked journals.
Finally, we include novelty as a control and report results in Column (7). The main estimate is unchanged. The negative association between title length and citations is therefore not explained by the fact that more novel papers tend to have shorter titles and to be more cited. Surprisingly, the relationship between novelty and citations is negative and statistically significant. Controlling for title length, journal time-invariant characteristics and team characteristics, more novel articles therefore tend to be less cited. These findings complement recent evidence on novelty in research:
17 As shown by results reported in Column (3), part of this increase is due to a composition effect. The association between title length and citations is stronger on the subset of articles published by research teams with at least two publications.
18 To get these predictions, the other control factors are evaluated at the mean.
Table 4
Title length and novelty.

Variables/Samples:  (1)  (2)  (3)  (4)  (5)
                    All  All  TFE  TFE  TFE
Title length  −0.0003***  −0.0004***  −0.0004***  −0.0003***  −0.0003***
  (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0000)
Pages  −0.0014***  −0.0012***  −0.0012***  −0.0010***  −0.0010***
  (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)
Number of authors = 2  −0.0080***  −0.0049***
  (0.0010)  (0.0009)
Number of authors = 3  −0.0057***  −0.0028**
  (0.0013)  (0.0013)
Number of authors > 3  −0.0078***  −0.0078***
  (0.0028)  (0.0028)
Team Specialization  −0.0235***
  (0.0030)
Past output team  −0.0001
  (0.0008)
Avg. past output authors  0.0028***
  (0.0011)
Observations  464,815  464,614  246,824  246,824  246,824
R-squared  0.2985  0.3300  0.3299  0.5567  0.5569
Year dummies  Yes  Yes  Yes  Yes  Yes
Fields shares  Yes  Yes  Yes  Yes  Yes
Journals dummies  No  Yes  Yes  Yes  Yes
Team dummies  No  No  No  Yes  Yes

Notes: In columns 1–5, we estimate the relationship between the novelty of the articles, obtained as keywords atypicality, and the article’s title length. In columns 1 and 2 (All) we consider the full sample of articles with keywords. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. Pages is the number of pages of the article; Number of authors = 2, 3, > 3 are dummy variables indicating whether the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a Herfindahl index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Team experience is the number of papers that the research team has published together; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past. All the regressions use clustered standard errors at the team level. *** p < 0.01, ** p < 0.05, * p < 0.1
Boudreau et al. (2016) show that novel research grant proposals are associated with lower evaluations by committees while Lee et al. (2015) show in a different context that novel articles indeed appear to accumulate citations at a lower pace.
We further look at how the relation between title length and citations varies by journal quality. We first include the journal’s impact factor and its interaction with title length in a citation regression. The results are reported in Column (1) of Table 3. The negative coefficient of the interaction term shows that the relation between title length and citations is stronger for journals with higher impact factor. We then estimate separate citation regressions for each journal category. The results are reported in Columns (2), (3) and (4) of Table 3. The effect is twice as large in A and AA journals as in B journals. The relation between title length and citations is therefore stronger in better journals.
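To see what the interaction term implies, the sketch below evaluates the title length slope at a few impact factors using the two Column (1) estimates of Table 3. It assumes that the interaction uses the natural logarithm of the impact factor, as the table notes suggest; the impact factor values and the function name are hypothetical and chosen only for illustration.

```python
import math

B_TITLE = -0.0006        # Title length, Column (1) of Table 3
B_INTERACTION = -0.0004  # Title length x Journal quality IF, Column (1) of Table 3


def title_length_slope(impact_factor: float) -> float:
    """Slope of log(citations + 1) in title length at a given impact factor.

    Assumes the regressor is the natural log of the impact factor.
    """
    return B_TITLE + B_INTERACTION * math.log(impact_factor)


# Hypothetical impact factors, not taken from the paper.
for impact_factor in (1.0, 2.0, 5.0):
    print(impact_factor, round(title_length_slope(impact_factor), 6))
```

The slope becomes more negative as the impact factor rises, which is the sense in which the association is stronger in better journals.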
Third, we study the association between title length and the novelty of the research article measured using the atypicality of keyword pairs. The results presented in Table 4 reveal a robust negative association between title length and novelty. The estimate is equal to −0.0003 in the specification that controls for team characteristics, see Column (5), and changes very little across specifications. Quantitatively, a switch from the first decile to the last decile of the title length distribution is associated with a 0.0247 increase in novelty, corresponding to a +8.3% increase compared to average novelty, equal to 0.31. Articles with shorter titles therefore tend to score higher on the novelty index, even when controlling for journal fixed effects and team characteristics.19
Fourth, we analyze whether the relation between title length and scientific quality varied over time. We modify our preferred empirical specifications (including journal and team fixed effects and team time-varying characteristics) in two ways. We first include linear time trends (rather than year fixed effects) and linear time trends interacted with title length in the regressions. Columns (1)–(3) of Table 5 report results of these regressions on the three different quality measures studied: journal quality, citations and novelty. We then relax the linearity assumption and include year dummies as well as year dummies interacted with title length in the regressions. Results from this more flexible specification are reported in Columns (4)–(6) of Table 5.
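The flexible specification lets both the intercept and the title length slope differ by year. The sketch below illustrates that idea on made-up data, without any of the paper's controls or fixed effects. When year dummies and their interactions with title length are the only regressors, the fitted slopes coincide with those from separate regressions by year, which is what the hypothetical helper slopes_by_year computes. The numbers are illustrative and are not results from Table 5.

```python
from collections import defaultdict


def slopes_by_year(rows):
    """Return the OLS slope of outcome on title length, separately by year.

    rows: iterable of (year, title_length, outcome) tuples.
    """
    groups = defaultdict(list)
    for year, title_len, outcome in rows:
        groups[year].append((title_len, outcome))

    slopes = {}
    for year, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes[year] = sxy / sxx  # simple-regression slope within the year
    return slopes


# Toy data (assumption): outcomes lie exactly on a year-specific line.
toy_rows = [
    (1980, 40, 2.00), (1980, 80, 1.96), (1980, 120, 1.92),
    (2010, 40, 2.50), (2010, 80, 2.42), (2010, 120, 2.34),
]
slopes = slopes_by_year(toy_rows)
for year in sorted(slopes):
    print(year, round(slopes[year], 6))
```

The linear-trend variant replaces the year-specific slopes with a single slope that shifts by a constant amount each year.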
19 In line with expectations, team specialization is negatively associated with novelty while team members’ past output is positively associated with novelty.