www.elsevier.es/rpto

Journal of Work and Organizational Psychology

Corrections for criterion reliability in validity generalization: The consistency of Hermes, the utility of Midas

Jesús F. Salgado (a, *), Silvia Moscoso (a), Neil Anderson (b)

(a) University of Santiago de Compostela, Spain

(b) Brunel University, U.K.

Article info

Article history:

Received 23 November 2015; accepted 3 December 2015; available online 4 February 2016

Keywords:

Interrater; Reliability; Validity generalization; Job performance; Ratings

Abstract

There is criticism in the literature of the use of interrater coefficients to correct for criterion reliability in validity generalization (VG) studies, and dispute over whether .52 is an accurate and non-dubious estimate of the interrater reliability of overall job performance (OJP) ratings. We present a second-order meta-analysis of three independent meta-analytic studies of the interrater reliability of job performance ratings and make a number of comments and reflections on LeBreton et al.’s paper. The results of our meta-analysis indicate that the interrater reliability for a single rater is .52 (k = 66, N = 18,582, SD = .105). Our main conclusions are: (a) the value of .52 is an accurate estimate of the interrater reliability of overall job performance for a single rater; (b) it is not reasonable to conclude that past VG studies that used .52 as the criterion reliability value have a less than secure statistical foundation; (c) based on interrater reliability, test-retest reliability, and coefficient alpha, supervisor ratings are a useful and appropriate measure of job performance and can be confidently used as a criterion; (d) validity correction for criterion unreliability has been unanimously recommended by “classical” psychometricians and I/O psychologists as the proper way to estimate predictor validity, and is still recommended at present; (e) the substantive contribution of VG procedures to informing HRM practices in organizations should not be lost in these technical points of debate.

© 2015 Colegio Oficial de Psicólogos de Madrid. Published by Elsevier España, S.L.U. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Correction for criterion reliability in validity generalization: The consistency of Hermes, the utility of Midas

Keywords:

Interrater; Reliability; Validity generalization; Job performance; Ratings

Abstract

The literature criticizes the use of interrater coefficients to correct for criterion reliability in validity generalization (VG) studies and questions whether .52 is an accurate and non-dubious estimate of the interrater reliability of overall job performance ratings. In this article, we present a second-order meta-analysis of three independent meta-analytic studies of the interrater reliability of job performance ratings, and we make several comments and reflections on the article by LeBreton et al. The results of our meta-analysis indicate that the interrater reliability is .52 (k = 66, N = 18,582, SD = .105) for a single supervisor. Our main conclusions are: (a) the value of .52 is an accurate estimate of the interrater reliability of overall job performance for a single rater; (b) it is not reasonable to conclude that the VG studies that have used .52 as the criterion reliability value have an insecure statistical foundation; (c) on the basis of interrater reliability, test-retest reliability, and coefficient alpha, supervisor judgments are a useful and appropriate measure of job performance and can be used confidently as a criterion; (d) correction of validity for criterion unreliability has been unanimously recommended by “classical” psychometricians and industrial psychologists as the correct method of estimating predictor validity, and it is still recommended today; and (e) the substantive contribution of VG procedures to guiding human resource practices in organizations should not be lost in these technical points of debate.

* Corresponding author. Department of Organizational Psychology, Faculty of Labor Relations, University of Santiago de Compostela, Campus Vida, 15782 Santiago de Compostela, A Coruña, Spain.

E-mail address: [email protected] (J.F. Salgado).

http://dx.doi.org/10.1016/j.rpto.2015.12.001

1576-5962/© 2015 Colegio Oficial de Psicólogos de Madrid. Published by Elsevier España, S.L.U. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

LeBreton, Scherer, and James (2014) have written a challenging lead article in which they make a series of criticisms of the use of interrater coefficients to correct for criterion reliability in validity generalization (VG) studies, and dispute whether .52 is an accurate and non-dubious estimate of the interrater reliability of overall job performance (OJP) ratings. As researchers who have conducted several meta-analytic (MA) and VG studies in which the value of interrater reliability was estimated, we here make a number of comments and reflections on LeBreton et al.’s paper. We organize our comments under six points: (1) whether .52 is in fact a dubious interrater reliability value for OJP, (2) their criticism that corrected coefficients were wrongly labelled as uncorrected coefficients, (3) some labelling errors in LeBreton et al. itself, (4) whether it is appropriate to correct observed validity for criterion reliability, (5) whether interrater reliability is the appropriate coefficient to correct for criterion reliability in VG studies, and (6) wider issues over the value of VG studies for informing policies and practices in organizations.

In combination, we argue that these points indicate unequivocally that the case of LeBreton et al. (2014) is logically flawed and, on closer inspection, has been built up piecemeal on a number of outlier interpretations, non sequiturs of logical progression, and impractical calls for data set treatment in VG studies. Following their recommendations risks “throwing the baby out with the bathwater” and reducing the likelihood that VG studies will continue to have important positive benefits for practice in employee selection and other areas of I/O psychology.

Is .52 a Dubious Interrater Reliability Value?

LeBreton et al. (2014) doubt whether .52 is a legitimate and accurate estimate of interrater reliability. To quote, they argue that “the past VG studies which relied on this dubious criterion reliability value have a less than secure statistical foundation”, and that they “suspect that researchers would conclude that .52 is not a credible estimate”. The problem here is that these are simply opinions without an empirical basis, or indeed any supporting rationale. LeBreton et al. do not provide any empirical support for rejecting .52 as a credible value beyond their suspicion. Should we accept this opinion and unilaterally jettison this well-established and widely used value without any supporting reasoning or empirical foundation? We believe absolutely not, especially when one considers the evidence on which the use of this interrater reliability value has been based.

Viswesvaran, Ones, and Schmidt (1996), for instance, found values of .52 (k = 40, N = 14,650) for interrater reliability, .81 for coefficients of stability (k = 12, N = 1,374), and .86 for coefficient alpha (k = 89, N = 17,899). These coefficients estimate three different sources of measurement error (Schmidt & Hunter, 1996; Viswesvaran, Schmidt, & Ones, 2002). Not all researchers agree that the interrater coefficient is the appropriate estimate of reliability. For instance, Murphy and DeShon (2000) suggested that it is not the appropriate coefficient. However, it is one thing to believe that another coefficient is the appropriate one, as Murphy and DeShon have suggested, and quite another to dispute that .52 is a credible and non-dubious estimate of interrater reliability, as LeBreton et al. (2014) have suggested. The only way to support this claim is to demonstrate beyond reasonable doubt that Viswesvaran et al. (1996) made errors when they calculated their estimates or, alternatively, to provide another estimate of the interrater correlation based on an independent database.

In her large-sample study (N = 9,975) of the interrater reliability of overall performance ratings, Rothstein (1990) found that the average interrater correlation was .52. The meta-analysis by Salgado et al. (2003, Table 2) provided another estimate of the interrater reliability of overall job performance with a European set of interrater coefficients. They found exactly the same value of .52 (k = 18, N = 1,936). In a third and more recent meta-analysis, Salgado and Tauriz (2014) found that the interrater reliability of overall performance ratings was .52 (k = 8, N = 1,996), using an independent data set. The estimates of Viswesvaran et al. (1996), Salgado, Anderson, and Tauriz (2015), and Salgado and Tauriz (2014) differed only in their standard deviations, which were .095, .19, and .05, respectively. That three MAs produced an identical interrater reliability estimate using entirely different samples of primary studies is more than coincidental: it suggests that this estimate is reasonable and accurate. In a previous meta-analysis, Salgado and Moscoso (1996) estimated the interrater reliability for composite and single supervisory rating criteria. They found mean interrater reliabilities of .618 and .402, respectively (average ryy = .51). Table 1 reports the results of a second-order meta-analysis of the three independent meta-analyses described above; Salgado and Moscoso’s (1996) meta-analysis was not included because it does not report the sample sizes. As can be seen, the interrater reliability is .52 and the combined standard deviation is .105, which is very close to the figure found by Viswesvaran et al. (1996). To determine the standard deviation of the three distributions combined, we used the formula given by McNemar (1962, p. 24).

Table 1

Second-Order Meta-Analysis of the Interrater Reliability of Job Performance Ratings

N         k     ryy    SD       99% CI
18,582    66    .52    .1056    .518/.522

Note. N = total sample size; k = number of independent coefficients; ryy = sample-size-weighted average interrater reliability; SD = standard deviation of ryy; 99% CI = 99% confidence interval of interrater reliability.
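As a rough check of Table 1, the pooling can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors’ computation: it assumes a sample-size-weighted mean, the standard formula for the standard deviation of combined groups (each study’s variance plus the squared deviation of its mean from the grand mean, weighted by N), which we take to correspond to McNemar’s (1962), and a 99% interval of mean ± 2.576·SD/√N, one formula that reproduces the reported .518/.522. The per-study k, N, mean, and SD values are those quoted in the text; the function name `second_order_meta` is hypothetical.

```python
import math


def second_order_meta(studies, z=2.576):
    """Pool (k, N, mean, SD) summaries from independent meta-analyses.

    Assumptions (illustrative, not taken from the article):
    - mean: sample-size-weighted average of the study means;
    - SD: SD of combined groups, i.e. total variance = N-weighted
      within-study variance + N-weighted squared deviation of each
      study mean from the grand mean;
    - CI: mean +/- z * SD / sqrt(N), with z = 2.576 for 99%.
    """
    total_k = sum(k for k, _, _, _ in studies)
    total_n = sum(n for _, n, _, _ in studies)
    mean = sum(n * m for _, n, m, _ in studies) / total_n
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for _, n, m, s in studies) / total_n
    pooled_sd = math.sqrt(variance)
    half_width = z * pooled_sd / math.sqrt(total_n)
    return total_k, total_n, mean, pooled_sd, (mean - half_width, mean + half_width)


# (k, N, mean interrater r, SD), in the order the text lists them:
# Viswesvaran et al. (1996), the European data set, Salgado and Tauriz (2014).
studies = [(40, 14_650, 0.52, 0.095), (18, 1_936, 0.52, 0.19), (8, 1_996, 0.52, 0.05)]

k, n, mean, sd, (lo, hi) = second_order_meta(studies)
print(f"N = {n:,}  k = {k}  ryy = {mean:.2f}  SD = {sd:.4f}  99% CI = {lo:.3f}/{hi:.3f}")
# -> N = 18,582  k = 66  ryy = 0.52  SD = 0.1056  99% CI = 0.518/0.522
```

Because all three means are .52, the between-study term is zero here and the pooled SD is driven entirely by the within-study SDs, weighted heavily toward Viswesvaran et al.’s large sample.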

Murphy and DeShon (2000, p. 896) suggested that the correlation of .52 can be the result of contexts that encourage disagreement among raters and that encourage substantial rating inflation and, consequently, range restriction. Assuming that one rater uses the entire scale and the other only the top half of the scale, Murphy and DeShon estimated that the correlation between raters corrected for range restriction alone would be .68 and, corrected also for unreliability using Viswesvaran et al.’s (1996) coefficient alpha estimate of .86, would be .79. Assuming that one rater uses the entire scale and the other only the top third of the scale, their estimated values would be .91 and 1, respectively.

A problematic point in Murphy and DeShon’s (2000) examples is that, in addition to assuming that the interrater correlation is a validity coefficient, they applied Thorndike’s formula for Case II (Thorndike, 1949, p. 173) to correct for range restriction. However, in their examples the proper formula to correct for range restriction would be Thorndike’s formula for Case I, because the restriction is in the criterion (see the formula in the Appendix).

Applying this formula, in Murphy and DeShon’s first example the correlation corrected for range restriction alone would be .82, and corrected also for unreliability in Y1 and Y2 would be .95 (using alpha = .86). In the second example, the correlation corrected for range restriction would be .96 and corrected also for unreliability would be 1.12 (using alpha = .86), which is an impossible value. Moreover, it should be taken into account that, if the ratings are restricted in range, Murphy and DeShon should have attenuated the reliability proportionally in order to correct for unreliability. This can be done using the formula developed independently by Kelley (1921) and Otis (1922) and reproduced in many books and articles (see the Otis-Kelley formula in the Appendix). Applying this formula would result in an alpha coefficient of .68 in the first case and of -.26 in the second. Repeating the calculation for the first example with the attenuated reliability value gives .82 divided by the square root of .86 × .68, which equals 1.08.

In other words, the corrections carried out for the two examples yield two impossible values, which casts doubt both on Murphy and DeShon’s rationale and on the realism of the assumed values of range restriction.
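The arithmetic of the two Murphy and DeShon (2000) examples can be retraced with a short Python sketch. It assumes the form R = √(1 − (1 − r²)/U²) for Thorndike’s Case I, r_restricted = 1 − U²(1 − R_unrestricted) for the Otis-Kelley relation (error variance held constant), and division by √(α₁α₂) for unreliability in both raters. U = 1.5 is the value the text attributes to the first example; U = 3 is our assumption for the second, chosen because it reproduces the .96 and -.26 quoted above. The Appendix formulas are not reproduced in this section, and the helper names are hypothetical. Intermediate values are not rounded, so a second decimal can differ slightly from the figures in the text.

```python
import math


def case_i(r, u):
    """Case I form assumed here: R = sqrt(1 - (1 - r**2) / U**2), U = unrestricted/restricted SD."""
    return math.sqrt(1 - (1 - r ** 2) / u ** 2)


def otis_kelley_restricted(rel_unrestricted, u):
    """Otis-Kelley (constant error variance): restricted reliability = 1 - U**2 * (1 - R)."""
    return 1 - u ** 2 * (1 - rel_unrestricted)


def disattenuate(r, rel_1, rel_2):
    """Classical correction for unreliability in both measures: r / sqrt(rel_1 * rel_2)."""
    return r / math.sqrt(rel_1 * rel_2)


ALPHA = 0.86         # coefficient alpha from Viswesvaran et al. (1996)
R_INTERRATER = 0.52  # observed interrater correlation

# U = 1.5 is stated in the text for the first example; U = 3 is an assumption
# for the second example that reproduces the .96 and -.26 quoted in the text.
for label, u in [("first example, U = 1.5", 1.5), ("second example, U = 3", 3.0)]:
    r_rr = case_i(R_INTERRATER, u)               # range restriction only
    r_alpha = disattenuate(r_rr, ALPHA, ALPHA)   # plus unreliability, alpha = .86 for both raters
    alpha_rr = otis_kelley_restricted(ALPHA, u)  # alpha attenuated for the same restriction
    if alpha_rr > 0:
        r_ok = f"{disattenuate(r_rr, ALPHA, alpha_rr):.2f}"
    else:
        r_ok = "undefined (negative reliability)"
    print(f"{label}: Case I {r_rr:.2f}; corrected with alpha {r_alpha:.2f}; "
          f"restricted alpha {alpha_rr:.2f}; corrected with restricted alpha {r_ok}")
```

With these inputs the first example exceeds 1 once the restricted alpha is used, and the second exceeds 1 even with the unrestricted alpha while its restricted alpha is negative, which is the pattern the text describes.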

If one accepts Murphy and DeShon’s (2000) rationale, then it is surely unrealistic to think that, if the context produces range restriction, one rater uses the entire scale and the second rater only a fraction of the scale. It would be more realistic to think that both raters were affected by range restriction and, consequently, that both would use a fraction of the scale: for example, one using the top ¾ of the scale and the other the top half. This appears to us a more realistic case. However, this case would need a correction for double range restriction. In this regard, six years after developing the formulas later popularized by Thorndike (1949), Pearson (1908) developed the formula for correcting for double range restriction, which is the formula to be applied in this case (see Formula 3 in the Appendix). Applying this formula, the corrected interrater correlation would be .88.

However, if we accept that the interrater correlation is a reliability estimate (as LeBreton et al., 2014, do) and we apply the Otis-Kelley formula, this gives values of .79 and .95 as interrater reliability coefficients for the first and second cases of Murphy and DeShon (2000), respectively. To estimate the predictive validity of a test while accepting that the criterion distribution is restricted in range, the correction should then be made on both the criterion reliability and the predictor’s restricted validity. For example, if the observed correlation between a predictor (e.g., GMA) and overall job performance ratings is .25, and the value of range restriction is U = 1.5, as in Murphy and DeShon’s first example, the fully corrected validity would be .77 (rounded) using the Case I formula (because the restriction is in the criterion). This requires three steps: (a) correcting the interrater reliability of .52 for range restriction using the Otis-Kelley formula, which results in .79; (b) correcting the validity of .25 by dividing it by the square root of .79, which gives .28; and (c) correcting .28 for range restriction using a U value of 1.5, which produces a corrected validity of .77. If we use the formula for disattenuation only, with the criterion reliability value of .52, the corrected validity would be .35 (rounded). In other words, correcting the interrater reliability for range restriction implies also correcting the validity coefficient for range restriction with the Case I formula, and the consequence is that a larger (and unrealistic) validity value is obtained. Moreover, this would still fail to correct properly for range restriction in the predictor.
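The three-step correction and the disattenuation-only alternative can be sketched in Python under assumed formula forms (Case I: R = √(1 − (1 − r²)/U²); Otis-Kelley solved for the unrestricted reliability: R = 1 − (1 − r)/U²). The inputs are the worked-example values from the text (observed validity .25, criterion reliability .52, U = 1.5); the function names are hypothetical and the Appendix formulas are not reproduced here.

```python
import math


def case_i(r, u):
    """Case I form assumed here: R = sqrt(1 - (1 - r**2) / U**2)."""
    return math.sqrt(1 - (1 - r ** 2) / u ** 2)


def otis_kelley_unrestricted(rel_restricted, u):
    """Otis-Kelley solved for the unrestricted reliability: R = 1 - (1 - r) / U**2."""
    return 1 - (1 - rel_restricted) / u ** 2


r_obs, r_yy, u = 0.25, 0.52, 1.5  # worked-example values from the text

rel_u = otis_kelley_unrestricted(r_yy, u)       # step (a): reliability corrected for restriction
r_step_b = r_obs / math.sqrt(rel_u)             # step (b): disattenuate with that reliability
r_full = case_i(r_step_b, u)                    # step (c): Case I correction of the validity
r_disattenuated_only = r_obs / math.sqrt(r_yy)  # alternative: no range-restriction correction

print(f"(a) {rel_u:.2f}  (b) {r_step_b:.2f}  (c) {r_full:.2f}  "
      f"disattenuation only {r_disattenuated_only:.2f}")
# -> (a) 0.79  (b) 0.28  (c) 0.77  disattenuation only 0.35

for u_case in (1.5, 3.0):  # the two Murphy and DeShon cases
    print(f"U = {u_case}: unrestricted interrater reliability "
          f"{otis_kelley_unrestricted(r_yy, u_case):.2f}")
# -> U = 1.5: unrestricted interrater reliability 0.79
# -> U = 3.0: unrestricted interrater reliability 0.95
```

The contrast between .77 and .35 is the point of the paragraph: once the criterion reliability is treated as range-restricted, the validity must also be corrected with the Case I formula, and the result grows sharply.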

With regard to criterion reliability, sixty-five years ago Thorndike (1949, pp. 106-107) wrote that “it is not of critical importance that the reliability of a criterion be high as long as it is established as definitely greater than zero. Even when the reliability of a criterion is quite low, given that it is definitely greater than zero, it is still possible to obtain fairly substantial correlations between that criterion and reliable tests and to carry out useful statistical analyses in connection with the prediction of that criterion. Given a test or composite of tests with a reliability of .90 and a criterion with reliability of .40, it is theoretically possible to obtain a correlation of .60 between the two... It is more important that the reliability of a criterion measure be known than that it be high.”
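Thorndike’s .60 is the ceiling implied by classical test theory: an observed correlation cannot exceed the square root of the product of the two reliabilities. A one-line Python check (the function name is hypothetical):

```python
import math


def max_observable_r(rel_x, rel_y):
    """Classical-test-theory ceiling on an observed correlation: sqrt(rel_x * rel_y)."""
    return math.sqrt(rel_x * rel_y)


print(f"{max_observable_r(0.90, 0.40):.2f}")  # -> 0.60
```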

According to Thorndike (1949) and many classical psychometricians and I/O psychologists (e.g., Ghiselli, Campbell, & Zedeck, 1981; Guilford, 1954; Guion, 1965, 1998; Gulliksen, 1950; Nunnally, 1978, among others), when the criterion measure is unreliable, what is of critical importance is that the sample size be increased in order to allow for sampling fluctuations and to obtain stability in the relative size of the validity coefficients.

In summary, while no more accurate estimate of interrater reliability is available, researchers can be confident that .52 is currently a robust, accurate, and useful estimate of the interrater reliability of overall job performance for a single rater.

Were Corrected Coefficients Labelled as Uncorrected Coefficients?

LeBreton et al. (2014, p. 492) write that “coefficients that have been corrected should be so denoted (ρ̂) rather than simply labelled as observed correlation coefficients (r) or referred to as ‘validities’ without clearly articulating the various corrections made to these correlations (cf. Hunter & Hunter, 1984, Table 10; Schmidt & Hunter, 1998, Table 1). Labelling corrected coefficients as uncorrected coefficients could lead some psychologists (or HR managers) to draw improper inferences from meta-analyses.” We are not aware that this constitutes an endemic or even frequent problem in VG studies, and many published papers can be cited to demonstrate this. In addition, we strongly disagree with the use that LeBreton et al. have made of Table 1 of Schmidt and Hunter (1998) and Table 10 of Hunter and Hunter (1984). In the footnote to Table 1, Schmidt and Hunter wrote the following (the same text is repeated in Table 2):

All of the validities in this table are for the criterion of overall job performance. Unless otherwise noted, all validity estimates are corrected for the downward bias due to measurement error in the measure of job performance (emphasis added) and range restriction on the predictor in incumbent samples relative to applicant populations.

The correlations between GMA and other predictors are corrected for range restriction but not for measurement error in either measure (thus they are smaller than fully corrected mean values in the literature). These correlations represent observed score correlations between selection methods in applicant populations.

With regard to Hunter and Hunter’s (1984) Table 10, once again, LeBreton et al. (2014) are not fair, and instead adopt an extreme interpretation and position. Hunter and Hunter wrote:

If the predictors are to be compared, the criterion for job performance must be the same for all. This necessity inexorably leads to the choice of supervisor ratings (with correction for measurement error) as the criterion because there are prediction studies for supervisory ratings for all predictors (p. 89).

Therefore, Hunter and Hunter (1984) explained that they made corrections for criterion unreliability. Consequently, Hunter and Hunter (1984) and Schmidt and Hunter (1998) properly labelled the corrected coefficients, and they have not led psychologists (or HR managers) to draw improper inferences from meta-analyses. In other words, if a reader draws improper inferences, it is because he or she has not properly read the footnote and the explanations. The responsibility is that of the reader, not of the writers, and it is, ironically, LeBreton et al. (2014) who may have misguided readers in the
