Title length

(1)

HAL Id: hal-01834038

https://hal-amu.archives-ouvertes.fr/hal-01834038

Submitted on 11 Apr 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Title length

Yann Bramoullé, Lorenzo Ductor

To cite this version:

Yann Bramoullé, Lorenzo Ductor. Title length. Journal of Economic Behavior and Organization,

Elsevier, 2018, 150, pp.311 - 324. �10.1016/j.jebo.2018.01.014�. �hal-01834038�

(2)

Title length

Yann Bramoullé

^a

, Lorenzo Ductor

^b^,^∗

aAix-Marseille University (Aix-Marseille School of Economics), CNRS & EHESS, Centre de la Vieille Charité, Marseille, France

bMiddlesex University London, The Burroughs, NW4 4BT, London, United Kingdom

a b s t r a c t

Wedocumentstrongandrobustnegativecorrelationsbetweenthelengthofthetitleofan

economicsarticleand differentmeasuresofscientiﬁcquality.Analyzingallarticlespub-lishedbetween1970and 2011and referenced inEconLit, weﬁndthatarticleswithshorter

titlestendto bepublishedinbetterjournals,to bemorecited andtobemore innova-

tive.Thesecorrelationsholdcontrollingforunobservedtime-invariantandobservedtime- varyingcharacteristicsofteamsofauthors.

1. Introduction

Wedocumenta strongandrobust negativecorrelation betweenthelength ofthetitle ofan economicsarticleandits scientiﬁcquality.Articleswithshortertitlestendtobepublishedinbetterjournals.Controllingforjournalquality,articles with shorter titles tend to receive more citations. To illustrate, papers publishedbetween 1970 and 2011 in economics journalshadatitlecomposed,onaverage,of78characters.Overthesameperiod,theaveragetitlelengthofpapersinthe AmericanEconomicReviewwas60characters.WithintheAER,thetop20mostcitedarticleshadan averagetitlelengthof 52characters.

Thesecorrelations mayhave different explanations. Title length could have a causal impact on quality measures. For instanceashorttitle could makea papereasiertomemorize,whichcould directlyincrease itscitations. Title lengthand qualitycouldalsohavecommondeterminants.Authors’research andwritingskills,inparticular,could affectboththepa- per’smeasured outcomesandthe lengthofitstitle. Goodresearchersmaysystematicallyadoptshortertitlesandpublish well-citedpapersingoodjournals.Othercommondeterminantsmaybehardertomeasure.Wenotablyexpectmorenovel papers tohave shorter titles.Seminal articles oftenhave very short titles;consider “A Theoryof Production” (Cobband Douglas,1928), with22 charactersor“EconomicGrowthandIncomeInequality” (Kuznets,1955) with37 characters. Fur- therstudiesonthesetopicshadtoadoptlongertitles.Novelty isnoteasyto measure,however,andcouldbe imperfectly correlatedwithjournalqualityorcitationcounts.

∗ Corresponding author.

E-mail address: [email protected] (L. Ductor).

(3)

Weprovidean in-depthanalysisoftherelationshipbetweentitle lengthandscientiﬁcquality.Weexamine allarticles publishedbetween1970and2011andreferencedinEconLit.Ourinitial samplecontainsmorethan 580,000articles.¹ We consider three measures ofarticle quality. We look atan impact factor of the journalin which the article is published andatthenumberofcitationsreceivedby thearticle.Citationscapturetheimpact ofthearticleonsubsequentresearch.

Somerecentstudiesindicate,however,thatcitationsmaybelittlecorrelatedwiththearticle’snovelty,seeLeeetal.(2015). We build onrecentadvances in bibliometricanalysis andconstructan indexofnovelty basedon keywordatypicality, as in Boudreau et al. (2016) and Sreenivasan (2013). Keywords capture central aspects of research and more novel papers likely have atypical keywords or keyword combinations. These three measures - journalquality, citations and novelty - capturedifferentdimensionsofthescientiﬁcqualityofanarticle.Ourmainobjectiveistobetterunderstandtherelationship betweentitlelengthandthesedifferentdimensions.

Weestimateregressionsofanarticle’squalitymeasureontitlelengthandonanexpandingsetofcontrols.Wecontrol, first,forthe article’snumberofpages andforJELcode fixedeffects, capturingsystematicdifferencesacrossfields.When studyingcitationsandnovelty,wecontrolforjournalfixedeffects.Wethusanalyzehowcitationsornoveltyarerelatedto titlelengthcontrollingforjournalunobserved heterogeneity.Importantly,wealsocontrolforcharacteristicsoftheauthors.

We include a full set of dummies atthe team level.² Theseteam of authors dummies capturetime-invariant (observed andunobserved)characteristicsoftheteams,suchasteamabilityandgendercomposition.Wealsocontrolforobservable time-varyingteamcharacteristicssuch asspecialization,past teamoutputandaveragepast outputoftheteammembers.

Weexpecttheseteamdummiesandtime-varyingcharacteristicstoaccountreasonablywellfortheimpactofauthors’skills andability.Finally,we includenoveltyasacontrol intheregressionsonjournalquality andcitationstoseewhetherthe observedrelationshipsbetweentitlelengthandthesemeasuresaremediatedthroughnovelty.

Inallspecifications,ourmainestimatesarenegativeandstatisticallyandquantitativelysignificant.Articleswithshorter titlestendtobepublishedinbetterjournals.Theytendtobemorecitedandtogethighernoveltyscores.Moreover,these tendencies aremorepronouncedinbetter journals.Controllingforteamcharacteristicshasastrongimpactintheregres- sions onjournalquality.Themainestimate, whichremainsstatisticallyandquantitativelysignificant, isapproximatelydi- videdbythree.Thisshowsthatbetterteamsindeedhaveastrongsystematictendencytopublisharticlesinbetterjournals andwithshortertitles.Bycontrast,themainestimatesareessentiallyunchangedwhencontrollingforteamcharacteristics inregressionsoncitations ornovelty.Articleswithshortertitlestendtobe morecitedandmorenovelthan articleswith longertitles,controllingforauthors’skillsandabilities.

Moreover,includingnoveltyintheregressionsonjournalqualityandcitationshasessentiallynoimpactontheestimates.

Thismeansthattheobservedrelationbetweentitlelengthandcitationsisnotexplainedbythefactthatnovelarticlestend to haveshortertitlesandtobe morecited. Together,theseresultsshow that title lengthcorrelates well withtheoverall scientiﬁcqualityofapaper.

Ouranalysiscontributestothescientiﬁcstudyoftheacademicprocess.³ Tworecentstudiesdocumentanegativeasso- ciationbetweentitlelengthandcitationsonlargesamples,controllingforjournalﬁxedeffects.Letchfordetal.(2015)look at highly cited articles in the Web of Science across all disciplines. Independently of our analysis, Gnewuch and Wohlrabe(2017)considereconomicsarticlepublishedbetween1980and2015andreferencedintheWebofScience.⁴

Webuildontheseearlierstudiesandmaketwomaincontributionstothestudyoftherelationbetweentitlelengthand scientificquality.First,weanalyzetwootherdimensionsofscientificqualitythancitations.Weprovidethefirstlarge-scale studyofthe relationbetweentitle length andtheimpactfactor ofthejournalin whichthearticle ispublished. Perhaps moreimportantly,webuildan originalindexofnovelty,basedonkeywordatypicality,andprovidethefirstanalysisofthe linkbetweennoveltyandtitlelength.Weshowthattitlelengthisalsostronglynegatively correlatedwiththesetwomea- sures.Second,weprovidethefirstanalysiscontrollingfortime-invariantandobservedtime-varyingcharacteristicsofteams ofauthors.This allowsusto account formaindeterminantsof scientificquality,which wereignored inprevious studies.

We show that twothirds oftheassociation betweentitle length andjournalquality isexplained by teamcharacteristics whilesurprisinglytheobservedrelationbetweentitlelengthandcitationsisquantitativelyrobusttotheinclusionofteam characteristics intheregressions.⁵ Overall,ourresults reveala strongandrobust negativerelationbetweenthe lengthof thetitleofanarticleanditsscientiﬁcquality.

Theremainder ofthepaperisorganizedasfollows.We describethedatainSection 2.Wepresentourempiricalspec- iﬁcations inSection 3.We studytherelation betweentitle length,journalquality,citations andnoveltyin Section4 and concludeinSection5.

1We provide descriptive statistics and ﬁgures based on this initial sample in Section 2 . In our regressions, the largest sample studied contains about 490,0 0 0 articles.

2Suppose, for instance, that author a has published single-authored articles, articles with author b , articles with authors b and c , and articles with author d . This leads to the inclusion of four different dummies in the regressions: δ{a}, δ{a,b}, δ{a,b,c}and δ{d}where, for instance, δ{a,b}= 1 for articles written by a and b and 0 otherwise.

3In economics, see, for instance, Bramoullé et al. (2014), Ductor et al. (2014) and Fafchamps et al. (2010) .

4Researchers have been looking for correlates of articles’ quality and impact, see, e.g., Haslam et al. (2008) , Hartley et al. (1988) , Smart and Bayer (1986) and Webster et al. (2009) . For instance, article length appears to be a good predictor of citations, across different periods and disciplines, see Card and DellaVigna (2013) , Falagas et al. (2013) and Fox et al. (2016) . The negative relation with title length was noticed in some early studies on small samples, see Jacques and Sebire (2009) , Jamali and Nikzad (2011) and Habibzadeh and Yadollahie (2010) .

5We also control for ﬁeld ﬁxed effects, which are ignored in Letchford et al. (2015) and Gnewuch and Wohlrabe (2017) .

(4)

0.005.01.015.02Density

0 50 100 150

Number of title's characters

AA A

B Unranked

Fig. 1. Distribution of title length by journal quality. Notes : The sample consists of all regular articles published between 1970 and 2011 and referenced in EconLit , 580,055 articles. We consider the Epanechnikov kernel.

2. Thedata

Weanalyse two main datasets. We firstuse informationon all articles publishedbetween1970 and 2011in journals listedinEconLit,thebibliographyofeconomicsjournalscompiled bythe AmericanEconomicAssociation.We excludefrom thesamplearticlescontainingintheirtitlethewords“note” , “comment” ,“preface” ,“remark” ,“reply” and“foreword”.⁶ Thisfirst datasetcontains 580,055 articles published in 1617 journals,and is thus composed of the entire body of aca- demiceconomicspublicationsovertheperiod.Wemeasurejournalqualitythroughastandardimpactfactorintroducedby KodrzyckiandYu (2006).The Kodrzycki andYuindex(hereafter,KY)uses an iterativeprocess,excluding self-citations of thejournalandweighting citationsontheinfluenceofthecitingjournal.Asthisindexisnotavailableforallthejournals listedinEconLit,weusethepredictedimpactfactorderivedinDuctor etal.(2014)forjournalsnotlistedinKY.⁷ Notethat thismeasureisconstantanddoesnotvaryovertime.Computingatime-varyingimpactfactorforthe1617journalslisted inEconLit wouldbe quite challengingas mostofthese journalsare relatively new andcitations are not easily available formostofthem.In addition,changesinjournalsimpact factorsappeartobe quitesmallineconomics, bothinabsolute termandrelativelytootherdisciplines,seeAlthouseetal.(2009).Asarobustnesscheck,weconsideralternativemeasures ofjournalquality.We notably analyzetwo rankings basedon coarse categories,whichshould be unaffected by marginal changesinspecific journalsimpact factors,seeSupplementary AppendixA.We findthatresults arerobust tothequality measure.

SinceEconLitdoesnotprovideinformationoncitations,weretrievedcitationsfromtheWebofScience(ThomsonReuters 2014).Duetotheexpensiveandtime-consumingnatureoftheretrievalprocess,wefocusonjournalsappearinginthelist oftheTinbergenInstitute.Thislistcontains117journalsrankedinthreecategories:AA(5journals),A(27journals),andB (85journals), seeAppendixA.Itarguablycontainsmostoftheestablished journalsofourdiscipline.The citationdataset includesinformationon161,699articlesandthenumberofcitationstheyreceivedyearlyuntil2013.Themostcitedpaper,

“ProspectTheory:anAnalysisofDecisionunderRisk” (KahnemanandTversky,1979),received8571citations,while13%of thepapershavenocitations.

We measure the novelty of an article by building an index based on atypicality of keyword combinations, following Boudreauetal.(2016) andSreenivasan (2013).Keywords capturecentralaspects oftheresearch.Thisindexmeasures the relativeinfrequencyofpairs ofkeywordsofthe article,comparedtopreviously publishedarticles.Formally,let K_i denote thesetofkeywordsofarticlei.LetN_t_;_k

1,k₂ denotethenumberofarticlespublishedinyeart orbefore whichcontainthe twokeywordsk₁andk₂.LetN_t denotetheoverallnumberofpairsofkeywordscontainedinarticlespublishedinyeartor

6These atypical articles only represent 1.1% of the full sample and tend to have longer titles and to be less cited. Including them in the analysis leads to estimates of the relation between title length and scientiﬁc quality that are slightly greater in magnitude.

7To illustrate, the KY impact factor is 27.1 for the American Economic Review , 6.6 for the Journal of Development Economics and 0.6 for the Paciﬁc Economic Review .

(5)

0.005.01.015Density

0 50 100 150

>90th 10-90th

<10th

Fig. 2. Distribution of title length by citations. Notes : The sample consists of all regular articles published between 1970 and 2011 in a journal listed in the Tinbergen Institute list, 161,699 articles. We consider the Epanechnikov kernel.

before.Deﬁnetheprobabilityofobservingthepairofkeywords{k₁,k₂}beforeyeartas Pt(^k1,k2)=N_t;k₁_,k₂

Nt

Notethat−ln(^Pt)^measures^theatypicalityofthepairofkeywords{k₁,k₂}inthisperiod.Novelty ofarticleipublishedin yeartisthenequaltothenormalizedaverageatypicalityofitspairsofkeywords.Formally,

No

v

i=−

k1=k2∈Kiln(^Pt(^k1,k2))

|

^Ki

||

^Ki+1

|

^ln(^Nt)

where|K_i|isthenumberofkeywordsofarticlei.Notethat0≤ −ln(^Pt)≤ln(^Nt)^and^hence^Novivariesbetween0and1.An article’snoveltygetsclosetoitsmaximalvalueof1whenallitspairsofkeywordsareextremelyatypical.⁸

Wemeasurethe lengthofanarticle’stitle bysimplycountingthenumberofcharacters, includingspacesandpunctu- ations. Averagetitle length in theEconlit sampleis 78; some papershave very longtitles.⁹ Descriptive statistics give us afirst ideaofthelinkbetweentitlelength andscientificquality.Overthewholeperiod,averagetitle lengthis61 forAA journals,66 forA journals,70for B journalsand81 forunrankedjournals. Fig.1 showsthe distributions oftitle length acrossjournalcategoriesusingtheentireEconLitsample,580,055articles.Weobserveaclearorderingofthedistributions byjournalquality.Statistically,anynotionthatdistributionsarenotorderedthroughFirst-OrderStochasticDominancerela- tionsalignedwithqualityisrejected.Fig.2showsthedistributionsoftitlelengthforarticlesinthefirstdecileofthecitation distribution,thoseinthelastdecileandallotherarticlesusingtheTIsample,161,699articles.Again,thesedistributionsare unambiguouslyorderedandarticleswithmorecitationsappeartohaveshortertitles.Fig.3showsthedistributionoftitle lengthacrossdifferentpercentilesofthenoveltydistribution,forthesampleofarticleswithatleastonekeyword,464,835 articles.Clearly,thetitlelengthofarticlesinthetop10%ofnoveltyhaveshortertitlesthantherestofarticles.

Fig.4showstheevolution ofaveragetitle lengthovertimeandbyjournalqualityusingtheEconLitsamplefrom1974 to 2011, 567,218articles. For unrankedand B ranked journals, we observe a clearincreasing tendency. The average title length ofarticles publishedinthesetwo categorieshasincreasedsubstantially over time (despiteaslighttemporary de- creasearound1990).Thisincreaseisconsistentwiththelargeriseinthenumberofarticlespublishedovertheperiodand withageneraltendencytowardsmorespecialization.Asauthorsdevelopfinerandfinerextensionsofseminalworks,they likelyneed toadoptlonger titlestodescribe their research.Interestingly, however,AandAAranked journalsdisplay dif- ferenttrends.Fortop fieldandtop generalistjournals,wealsoobservea similarincreasing trendatthebeginningofthe period.However,abreakoccursatsometimeinthe1980’sforthetwocategories.Averagetitlelengththenbecomesslightly decreasingwithtimeforarticlespublishedinArankedjournalsandstaysessentiallyinvariantforarticlespublishedinAA

8For robustness, we also look at two alternative indices: one built from relative frequencies of keyword pairs compared to articles published in the same year and another based on the atypicality of individual keywords. The results are robust to these alternative measures of novelty, see Supplementary Appendix A.

9For instance, the following title has 203 characters: “Analysis of the carbon sequestration costs of afforestation and reforestation agro-forestry practices and the use of cost curves to evaluate their potential for implementation of climate change mitigation”, Torres et al. (2010) .

(6)

0.005.01.015Density

0 50 100 150

>90th 10-90th

<10th

Fig. 3. Distribution of title length by novelty. Notes : The sample consists of all regular articles published between 1970 and 2011 in EconLit with at least one keyword, 464,835 articles. We consider the Epanechnikov kernel.

606570758085

1980 1990 2000 2010

Year of publication

AA A

B unranked

Fig. 4. Average title length over time and by journal quality. Notes : Five year moving averages of title length. The sample consists of all regular articles published between 1974 and 2011 and referenced in EconLit , 567,218 articles. We consider the Epanechnikov kernel.

rankedjournals.Apossibleexplanationbehindthesedistincttendenciescould bethedocumentedincreaseincompetition overtheperiod(CardandDellaVigna,2013).¹⁰ Aspublication slotsbecomemore andmorescarceintop journalsauthors likelyincreasethequalityoftheirsubmittedpapers,leadingtoshortertitles.

10 Previously documented patterns consistent with increased competition include an increase in the number of submissions to the top 5 ( Card and DellaVigna, 2013 ), in number of coauthors ( Ductor 2015 ), in papers’ length ( Card and DellaVigna 2014 ) and in turnaround time, Ellison (2002) .

(7)

3. Empiricalmodel

Inthissection,wepresentthemainempiricalspeciﬁcationsthatwe usetostudytherelationshipbetweentitlelength andarticlequality.Forjournalquality,weconsidervariantsofthefollowingeconometricmodel:

log(^qi)=

β

0+

β

1length_i+

β

2pages_i+ F f=1

λ

fJEL_i_,_f

+

β

³^H^r,t−1+

β

⁴^T^r,t−1+

β

⁵^A^r,t−1+

δ

^r⁺

μ

^t⁺

i, (1) wherei denotesan article,rthearticle’sresearch teamandtthe yearwherethisarticle ispublished.The researchteam ofanarticleisthesetofits author(s).Forinstance,r={â}^forâllârticlessingle-authored byauthora whiler={â,b}^for

articleswrittenbyauthorsaandb.Notethatauthorsareidentifiedinthedataonthebasisoftheirfirstandlastnames.In therelativelyinfrequentcaseswherefirstnamesarenotavailable,we applythenamedisambiguationalgorithm designed byVanderLeij(2006).¹¹

The left-handside variable, log(q_i),denotes the logofthe impactfactor ofthejournal inwhicharticle i ispublished.

On the righthand side, length_i is thelength ofthe title of the article,pages_i is numberof pages and JEL_i,_f denotes the proportionoftheJELcodesofthearticlesincategoryf.¹²Next,thevariablesH_r_,_t₋₁,T_r_,_t₋₁andA_r_,_t₋₁ representtime-varying characteristics of thearticle’s research team. H_r_,_t₋₁ is the degree of specialization of research team r,computed over all publicationsbyresearch teamrpublishedinyeart−1orbefore.¹³Tomeasurespecialization,weusetheHerﬁndahlindex asinDuctor (2015).Formally,letn_t₋₁_,_f denotethenumberofarticleswrittenbytheteaminﬁeldfandpublishedinyear t−1orbefore.Letn_t₋₁=F

f=1n_t₋₁_,_f denotetheoverallnumberofarticlespublishedbytheteamuntilt−1.Articleswith multipleJELcodesaredividedandassignedproportionallytoeachﬁeld.Then,

Hr,t−1= F

f=1

_n

t−1,f

nt−1

2

.

Thismeasureismaximumandequalto1whenalltheteam’spapersarewritteninauniqueﬁeld.Alowervalueindicates alowerdegreeofspecialization.

T_r_,_t₋₁ is thepastoutput ofthe researchteam,computedastheoverall numberofpapers,weightedby journalquality, publishedbyresearch teamruntilt−1.A_r,t−1capturestheaveragepastoutputofthedifferentauthorsintheteam,com- putedastheoverallnumberofpapers,weightedbyjournalquality,publishedbyallauthorsoftheresearchteam.Notethat thesetwo measurescoincide insomespeciﬁc cases,forinstancewhenrhasa singleauthorwhohasneverwritten with someoneelseorwhentwoauthorswritealltheirpaperstogether.Theygenerallydiffer,however.T_r,t−1measuresthetrack recordoftheteamitselfwhileA_r_,_t₋₁capturesthetrackrecordoftheteam’smembers.Thus,ateamcomposedoftwosenior researcherswhostartcollaboratingwillhaveahighA_r_,_t₋₁andalowT_r_,_t₋₁.

Wealsoincluderesearchteamdummiesδ^r^.^Forînstance,îfâuthorsâând^b ^write^threeârticlesî,^j ând^k ^together,^the

dummyvariableδ{a,b}isequal to1forarticlesi, j andk andto0forall otherarticles. Thesedummiescontrolfortime- invariant characteristicsoftheresearch teams,suchasinnateabilitiesorethnic andgendercomposition.We alsopresent results without team dummies, but with dummiesfor the number ofauthors of the article. Finally, we include yearof publication dummies, μt, foreach yearfrom 1970 to2011. Theseyear dummiesaccount forany possibletime trendsin journalquality.Insomespeciﬁcations,wefurtherinteractyeardummies,oralineartimetrend,withtitlelength,todetect differencesintheeffectoftitlelengthacrosstime.

Tostudycitationsandnovelty,weaugmentthepreviousmodelwithjournalﬁxedeffects.Forinstance,letc_idenotethe numberofcitations gatheredby articleiandj denotethejournalinwhichthe articleispublished. Webaseourcitation regressionsonthefollowingmodel:

log(^ci+1)=

β

⁰⁺

β

¹^lengthi+

β

²^pagesi+ F

f=1

λ

fJEL_i,_f

+

β

3Hr,t−1+

β

4Tr,t−1+

β

5Ar,t−1+

δ

r+

ν

j+

μ

t+

i (2) wherejournalﬁxedeffects,νj,nowcapturesystematicdifferencesincitationpatternsacrossjournals.Weadoptasimilar modeltoanalyzean article’snovelty, Nov_i.Thelogplusonetransformationofthenumberofcitations mitigatestheeffect ofhighlycitedpapers,seeDuctor(2015).¹⁴

11We also apply this algorithm to account for middle initials, see the Appendix of Chapter 2 in Van der Leij (2006) for details.

12 For instance if an article has JEL codes B 2, C 1 and C 5, then JEL i,B= 1 / 3 , JEL i,C = 2 / 3 and JEL i,f= 0 if f ∈ { B, C }.

13By assumption, H _r,t−1= T _r,t−1= 0 if research team r has not published any article in year t −1 or before.

14Our results are robust to looking at the number of citations directly and to count data models such as poisson and negative binomial, see Supplemen- tary Appendix A.

(8)

Table 1

Title length and journal quality.

Variables/Samples: (1) (2) (3) (4) (5) (6) (7)

All All TFE TFE TFE TFE TFE

Title length −0.0064 ^∗∗∗ −0.0060 ^∗∗∗ −0.0062 ^∗∗∗ −0.0019 ^∗∗∗ −0.0018 ^∗∗∗ −0.0020 ^∗∗∗ −0.0020 ^∗∗∗

(0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01)

Pages 0.0149 ^∗∗∗ 0.0147 ^∗∗∗ 0.0036 ^∗∗∗ 0.0037 ^∗∗∗ 0.0039 ^∗∗∗ 0.0038 ^∗∗∗

(0.0 0 05) (0.0 0 08) (0.0 0 03) (0.0 0 03) (0.0 0 04) (0.0 0 04) Number of authors = 2 0.2895 ^∗∗∗ 0.4103 ^∗∗∗

(0.0069) (0.0112) Number of authors = 3 0.3349 ^∗∗∗ 0.4867 ^∗∗∗

(0.0153) (0.0704)

Team specialization 0.1422 ^∗∗∗ 0.1170 ^∗∗∗ 0.1167 ^∗∗∗

(0.0100) (0.0127) (0.0127)

Past output team −0.1889 ^∗∗∗ −0.1868 ^∗∗∗ −0.1867 ^∗∗∗

(0.0041) (0.0047) (0.0047)

Avg. past output authors 0.0205 ^∗∗∗ 0.0291 ^∗∗∗ 0.0291 ^∗∗∗

(0.0051) (0.0060) (0.0060)

Novelty −0.0152

(0.0110)

Observations 4 88,64 9 4 88,64 9 279,239 279,239 279,239 216,494 216,494

R-squared 0.0274 0.1306 0.1371 0.6276 0.6359 0.6397 0.6397

Year dummies No Yes Yes Yes Yes Yes Yes

Fields shares No Yes Yes Yes Yes Yes Yes

Team dummies No No No Yes Yes Yes Yes

Notes : In columns 1–7, we estimate the relationship between the impact factor of the journal where the article is published and the article’s title length, the dependent variable is in log. In columns 1 and 2 (All) we consider the full sample of articles. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. In columns 6 and 7 the sample is restricted to articles published by a research team (and sole authors) with at least two publications and articles with at least one keyword. Pages is the number of pages of the article; Number of authors == 2, 3, > 3 are dummy variables if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a herfindhal index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past; Novelty is an index of the novelty of the article obtained measuring the atypicality of the keywords of the article. All the regressions use clustered standard errors at the team level. ^∗∗∗p < 0.01, ^∗∗p < 0.05, ^∗p < 0.1

4. Results

Wenowpresentourmainresults,analyzingtherelationbetweentitlelengthandthethreedifferentqualitymeasuresin turn.Table1presentsestimatesofvariantsofEq.(1),regressingthelogarithmoftheimpactfactorofthejournalinwhich thearticleispublishedoverthelengthofthearticle’stitleandexpandingsetsofcontrols.¹⁵Column(1)presentsresultsof estimationswithoutcontrols.WefocusinourregressionsonarticlespublishedinjournalswhoseextendedKYimpactfactor canbecomputedandwithJELcodeinformation.About13.7%ofthepapersintheoriginalsamplearepublishedinjournals whoseimpactfactorcannotbe computedandafurther2.8%doesnothaveinformationonJEL codes.Ourmainregression samplethus containsinformationon488,649articles.Therawestimate is−0.0064.Togeta senseofthemagnitude,this estimateimpliesthataswitchfromtheﬁrstdecile(39)tothelastdecile(121)ofthetitlelengthdistributionisassociated witha 52% increasein the journal’simpactfactor. In theestimations underlyingColumn (2),we control forthe article’s numberofpages,dummiesfordifferentnumbersofauthors,JEL codedummies, andyeardummies.Controllingforthese factorshaslittleimpactontheestimate.

Wethen includeresearch teamdummiesandthree time-varyingcharacteristics oftheresearch teams (degreeofspe- cialization,pastoutputjointlypublishedbytheteamandaveragepastoutputofalltheauthorsintheresearchteam).The teamdummiescontrolforheterogeneityattheleveloftheteamofauthors.Notethat identificationnowreliesonarticles publishedbyteamswithatleasttwopublications.Todetectpotential selectioneffects,we reportinColumn(3)resultsof estimationonthissubsamplewithoutteamfixedeffectsandcharacteristics.Theestimateisessentiallyunchanged.Wethen reportresultsofestimationsincludingteamfixedeffectsinColumn(4).Theestimateisroughlydividedby3.Therefore,two thirdsoftheoriginaleffectisduetocomposition:betterteamstendtopublishpapersinbetterjournalsandwithshorter titles.Theremainingthirdcapturesthefactthat,evencontrollingfortime-invariantteamheterogeneity,paperswithshorter titlestendtobepublishedinbetterjournals.Notethattheeffectisstillquantitativelysignificant.Aswitchfromfirsttolast decileofthetitlelengthdistributionisnowassociatedwitha16%increaseinjournalimpactfactor.

15 Our results are robust to regressing journal quality over powers of the article’s title length, see Supplementary Appendix A.

(9)

Table 2

Title length and citations.

Variables/Samples: (1) (2) (3) (4) (5) (6) (7)

All All TFE TFE TFE TFE TFE

Title length −0.0017 ^∗∗∗ −0.0 0 08 ^∗∗∗ −0.0013 ^∗∗∗ −0.0015 ^∗∗∗ −0.0014 ^∗∗∗ −0.0014 ^∗∗∗ −0.0014 ^∗∗∗

(0.0 0 01) (0.0 0 01) (0.0 0 02) (0.0 0 02) (0.0 0 02) (0.0 0 02) (0.0 0 02) Pages 0.0224 ^∗∗∗ 0.0164 ^∗∗∗ 0.0269 ^∗∗∗ 0.0217 ^∗∗∗ 0.0217 ^∗∗∗ 0.0232 ^∗∗∗ 0.0231 ^∗∗∗

(0.0017) (0.0016) (0.0029) (0.0030) (0.0030) (0.0027) (0.0027) Number of authors = 2, 0.2769 ^∗∗∗ 0.2270 ^∗∗∗

(0.0121) (0.0107) Number of authors > 3 0.5548 ^∗∗∗ 0.4223 ^∗∗∗

(0.0234) (0.0213)

Team Specialization −0.0774 ^∗∗∗ −0.0934 ^∗∗∗ −0.0940 ^∗∗∗

(0.0239) (0.0295) (0.0295)

Past output team −0.0507 ^∗∗∗ −0.0551 ^∗∗∗ −0.0551 ^∗∗∗

(0.0044) (0.0051) (0.0051)

Avg. past output authors 0.0230 ^∗∗∗ 0.0321 ^∗∗∗ 0.0322 ^∗∗∗

(0.0063) (0.0073) (0.0073)

Novelty −0.0600 ^∗∗∗

(0.0210)

Observations 161,699 161,699 87,728 87,728 87,728 66,823 66,823

R-squared 0.1502 0.2858 0.2847 0.5848 0.5859 0.5991 0.5992

Year dummies Yes Yes Yes Yes Yes Yes Yes

Fields shares Yes Yes Yes Yes Yes Yes Yes

Journals dummies No Yes Yes Yes Yes Yes Yes

Team dummies No No No Yes Yes Yes Yes

Notes : In columns 1 to 7, we estimate the relationship between cumulative citations from year of publication to 2013 and the article’s title length, the dependent variable is in log(x + 1). In columns 1 and 2 (All) we consider the full sample of articles. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. In columns 6 and 7 the sample is restricted to articles published by a research team (and sole authors) with at least two publications and articles with at least one keyword. Pages is the number of pages of the article; Number of authors == 2, 3, > 3 are dummy variables if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a herfindhal index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past; Novelty is an index of the novelty of the article obtained measuring the atypicality of the keywords of the article. All the regressions use clustered standard errors at the team level. ^∗∗∗p < 0.01, ^∗∗p < 0.05, ^∗p < 0.1

Addingtime-varyingteamcharacteristics,inColumn(5),leavestheestimateassociatedwithtitle lengthessentiallyun- affected.Theseresultsreveal someinteresting effects,however.Specializationispositivelyassociatedwithjournalquality, whichisconsistentwiththeexistence ofpositivereturnstospecializationinthepublication process.Authors’pastoutput isalso positivelyassociated withjournalquality,whichcould reﬂecta positiveimpact ofexperienceandpast successon publications.Controllingforauthors’pastoutput,however,teampastoutput isnegatively associatedwithjournalquality.

Authorswithdiversesetsofcoauthorsthustendtopublishinbetterjournalsthanauthorswhoalwayspublishtogether.

Finally,we investigatewhethertheobserved negativerelationshipbetweentitle length andjournalqualitycan be explained by novelty, as more novel articles may have shortertitles and be published in better journals. To compute the noveltyscore,weneedtofurtherrestrictthesampletoarticlesforwhichkeywordsareavailable.InColumn(6),wereport resultsfromtheregressionsunderlyingColumn(5)onthisnewsubsample.Themainestimateislittleaffected.Wethenadd thenoveltyscoretothecontrolsandreportresultsinColumn(7).Theestimateremainsunchanged.Therefore,thenegative associationbetweentitlelengthandjournalqualityisnotmediatedthroughnovelty.

Wenextturntotherelationbetweentitlelengthandcitations.FollowingEq.(2),weregressthelogarithmofthenumber ofcitationsofthearticlein2011plusoneagainstthelengthofthearticle’stitleandcontrols.Yeardummiescontrolforthe factthatcitationsaccumulateovertime.¹⁶Column(1)ofTable2presentsresultsofregressionsincludingthesamecontrols asinColumn(2)ofTable1.Theestimateis−0.0017.Aswitchfromﬁrsttolastdecileofthetitlelengthdistributionisthen associatedwitha14%increaseinthenumberofcitations.

Inthenextsetofregressions,weincludejournaldummies. ResultsarereportedinColumn(2)ofTable2.Theestimate decreasesinmagnitudeandisnow−0.0008.ComparingColumns(1)and(2)weseethatanimportantfractionoftheas- sociationbetweentitle lengthandcitations isexplainedbytimeinvariant characteristicsspeciﬁctothejournalwherethe articleis published. Wethen includeresearch team dummiesandtime varyingteam characteristics,we report resultsin Column (4)and(5).Interestinglythe estimate,whichis stillnegativeandstatisticallysigniﬁcant,is nowlarger inmagni-

16Our results are robust to alternative ways to account for this accumulation process, such as looking at citations per year, number of citations during the ﬁrst ﬁve years after publication, or the percentile in the citation distribution, see Supplementary Appendix A.

(10)

Table 3

Title length and citations by journal quality.

(1) TFE (2) AA (3) A (4) B

Title length −0.0 0 06 ^∗∗ −0.0023 ^∗∗ −0.0026 ^∗∗∗ −0.0013 ^∗∗∗

(0.0 0 03) (0.0 0 09) (0.0 0 05) (0.0 0 02) Journal Quality IF 0.2540 ^∗∗∗

(0.0098) Title length ^∗Journal quality IF −0.0 0 04 ^∗∗∗

(0.0 0 01)

Pages 0.0230 ^∗∗∗ 0.0388 ^∗∗∗ 0.0105 ^∗∗∗ 0.0381 ^∗∗∗

(0.0028) (0.0085) (0.0032) (0.0013) Team Specialization −0.0668 ^∗∗∗ 0.0656 −0.1314 ^∗∗ −0.0795 ^∗∗∗

(0.0242) (0.1282) (0.0623) (0.0283) Past output team −0.0584 ^∗∗∗ −0.0726 ^∗∗∗ −0.0511 ^∗∗∗ −0.0568 ^∗∗∗

(0.0047) (0.0158) (0.0088) (0.0077) Avg. past output authors 0.0236 ^∗∗∗ 0.0067 0.0328 ^∗∗ 0.0364 ^∗∗∗

(0.0066) (0.0231) (0.0137) (0.0094)

Observations 87,728 7431 18,998 46,343

R-squared 0.5639 0.5776 0.5844 0.5883

Year dummies Yes Yes Yes Yes

Fields shares Yes Yes Yes Yes

Journals dummies No Yes Yes Yes

Team dummies Yes Yes Yes Yes

Notes : In column 1, the results are obtained using team of authors with more than one ob- servation. Column 2–4 report the result using the sample of articles published in AA-ranked journals, A-ranked journals and B-ranked journals, respectively. The dependent variable is in

log(x + 1), Journal quality IF is in log. Pages is the number of pages of the article; Number

of authors == 2, 3, > 3 are dummy variables if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a herfindhal index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg. past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past. All the regressions use clustered standard errors at the team level. ^∗∗∗p < 0.01, ^∗∗p < 0.05, ^∗p < 0.1

tude.¹⁷Therefore,articleswithshortertitlestendtobemorecited,evencontrollingforjournaltime-invariantcharacteristics andforteamtime-invariantandtime-varyingobservedcharacteristics.Toillustrate,consideraresearchteampublishingtwo articlesintheAmericanEconomicReviewin2001,onewith39charactersinitstitleandtheotherwith122characters.The articlewiththeshortertitleispredictedtoreceive15.3citationsin2013whiletheonewiththelongertitlewouldreceive 13.6citations.¹⁸

Controllingforteamcharacteristicsthereforehasaverydifferenteffectdependingonwhetherwelookatjournalquality orcitations. Alargepartofthenegativeassociationbetweentitlelength andjournalqualityisduetothefactthat teams withbetterability and, forinstance,more experienced authorspublish articlesin betterjournals andwithshortertitles.

Thisconformsto ourinitial expectationsince teamcharacteristics are primarydeterminants ofan article’soutcomesand content.Surprisingly,however,controllingforteamcharacteristicshaslittleimpactontheassociationbetweentitlelength andcitations.

WealsoﬁnditquiteinterestingtocontrastresultreportedinColumn(5)ofTable2withthosereportedinColumn(5)of Table1.Specializationisnownegativelyassociatedwithcitations,incontrasttoitspositiveassociationwithjournalquality.

Thearticlespublishedbymorespecializedteamsthereforereceivelesscitations,eventhoughtheytendtobepublishedin betterjournals.Thiscouldreﬂectthefactthattheytendtoreachanarroweraudience.Bycontrast,estimatesofpastteam outputandpastoutput ofteam membershavesimilarsigns.Articlespublishedbyauthors withbetter trackrecordsthus tendtoreceivemorecitations andtobepublishedinbetterjournals,consistentlywiththeMattheweffect(Merton,1968).

Andarticlespublishedbyauthorswhopublishexclusivelywiththesamecoauthors tendtoreceivelesscitationsandtobe publishedinlower-rankedjournals.

Finally, we include novelty as a control and report results in Column (7). The main estimate is unchanged. The negative association between title length and citations is therefore not explained by the fact that more novel papers tend to have shorter titles and to be more cited. Surprisingly, the relationship between novelty and citations is nega- tiveandstatisticallysigniﬁcant. Controllingfortitlelength, journaltimeinvariant characteristics andteam characteristics, more novel articles therefore tend to be lesscited. Theseﬁndings complement recent evidence on novelty in research:

17 As shown by results reported in Column (3), part of this increase is due to a composition effect. The association between title length and citations is stronger on the subset of articles published by research teams with at least two publications.

18 To get these predictions, the other control factors are evaluated at the mean.

(11)

Table 4

Title length and novelty.

Variables/Samples: (1) (2) (3) (4) (5)

All All TFE TFE TFE

Title length −0.0 0 03 ^∗∗∗ −0.0 0 04 ^∗∗∗ −0.0 0 04 ^∗∗∗ −0.0 0 03 ^∗∗∗ −0.0 0 03 ^∗∗∗

(0.0 0 0 0) (0.0 0 0 0) (0.0 0 0 0) (0.0 0 0 0) (0.0 0 0 0) Pages −0.0014 ^∗∗∗ −0.0012 ^∗∗∗ −0.0012 ^∗∗∗ −0.0010 ^∗∗∗ −0.0010 ^∗∗∗

(0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01) (0.0 0 01) Number of authors = 2, −0.0080 ^∗∗∗ −0.0049 ^∗∗∗

(0.0010) (0.0 0 09) Number of authors = 3 −0.0057 ^∗∗∗ −0.0028 ^∗∗

(0.0013) (0.0013) Number of authors > 3 −0.0078 ^∗∗∗ −0.0078 ^∗∗∗

(0.0028) (0.0028)

Team Specialization −0.0235 ^∗∗∗

(0.0030)

Past output team −0.0 0 01

(0.0 0 08)

Avg. past output authors 0.0028 ^∗∗∗

(0.0011)

Observations 464,815 464,614 246,824 246,824 246,824

R-squared 0.2985 0.3300 0.3299 0.5567 0.5569

Year dummies Yes Yes Yes Yes Yes

Fields shares Yes Yes Yes Yes Yes

Journals dummies No Yes Yes Yes Yes

Team dummies No No No Yes Yes

Notes : In columns 1–5, we estimate the relationship between the novelty of the articles obtained as keywords atypicality and the article’s title length. In columns 1 and 3 (All) we consider the full sample of articles with keywords. In columns 3–5 (TFE) the sample is restricted to articles published by a research team (and sole authors) with at least two publications. Pages is the number of pages of the article;

Number of authors == 2, 3, > 3 are dummy variables if the number of authors publishing the article is 2, 3, or more than 3, respectively; Team specialization is a herfindhal index obtained using the shares of past publications in different fields in economics, as defined by the first digit of the JEL codes; Team experience is the number of papers that the research team has published together; Past output team is the number of papers adjusted by quality that the research team has published together in the past; Avg.

past output authors is the average number of papers adjusted by quality published by the authors of the research team in the past. All the regressions use clustered standard errors at the team level. ^∗∗∗p < 0.01,

∗∗p < 0.05, ^∗p < 0.1

Boudreauetal.(2016)showthatnovelresearchgrantproposalsareassociatedwithlowerevaluationsbycommitteeswhile Leeetal.(2015)showinadifferentcontextthatnovelarticlesindeedappeartoaccumulatecitationsatalowerpace.

We further lookathow the relationbetween title length andcitations variesby journalquality.We ﬁrst includethe journal’simpactfactoranditsinteractionwithtitlelengthinacitationregression.TheresultsarereportedinColumn(1)of Table3.Thenegativecoeﬃcientoftheinteractiontermshowsthattherelationbetweentitlelengthandcitationsishigher forjournalswithhigherimpactfactor.Wethenestimateseparatecitationregressionsforeachjournalcategory.Theresults arereportedinColumns(2),(3)and(4)ofTable3.The effectistwice aslargeinAandAAjournalsasinB journals.The relationbetweentitlelengthandcitationsisthereforestrongerinbetterjournals.

Third,westudytheassociationbetweentitlelengthandthenoveltyoftheresearcharticlemeasuredusingtheatypicality ofkeywords pairs.The resultspresentedinTable4 reveala negativerobustassociation betweentitlelength andnovelty.

The estimate is equal to −0.0003 inthe specification that control forteam characteristics, seeColumn (5), andchanges verylittleacrossspecifications.Quantitatively,aswitchfromthefirstdeciletothelastdecileofthetitlelengthdistribution is associatedwitha 0.0247 increase innovelty, corresponding to a+8.3% increase comparedto average novelty, equalto 0.31.Articleswithshortertitlesthereforetendtoscorehigheronthenoveltyindex,evenwhencontrollingforjournalfixed effectsandteamcharacteristics.¹⁹

Fourth, we analyzewhether the relation betweentitle length andscientific quality varied over time. We modify our preferredempirical specifications(including journalandteam fixed effectsand teamtime-varyingcharacteristics)in two ways.Wefirstincludelineartimetrends(ratherthanyearfixedeffects)andlineartimetrendsinteractedwithtitlelength in theregressions.Columns (1)–(3) of Table5 report resultsoftheseregressions on thethree differentquality measures studied:journalquality,citationsandnovelty.Wethenrelaxthelinearityassumptionandincludeyeardummiesaswellas yeardummiesinteractedwithtitle length intheregressions.Results fromthismoreflexible specificationarereportedin Column(4)–(6)ofTable5.

19In line with expectations, team specialization is negatively associated with novelty while team members past output is positively associated with novelty.