Introduction:Item nonresponse and data quality

(1)

Allgemeines Statistisches Archiv 89, 3–5 c

Physica-Verlag 2005, ISSN 0002-6018

Introduction:

Item nonresponse and data quality

By Regina T. Riphahn

To describe a population and to analyze its characteristics and behav-iors, empirical analysis needs data. Surveys provide such data, typically by measuring the characteristics of samples drawn from the relevant pop-ulation. However, if these samples are not drawn at random, or if not all answers from a randomly drawn sample are available, the data may not be representative, and the description of the population may turn out to be unreliable. Therefore, empirical sciences and data users are concerned about data quality. Data quality can be aﬀected by a variety of aspects: Respondents may refuse to provide information, they may make mistakes when answering, or their answers may be imprecise. But problems can also result from unreliable and even cheating interviewers and poorly designed survey questionnaires.

Even though most surveys suffer from data quality and nonresponse problems, these issues find little attention in applied research and in con-sequence continue to affect empirical studies. In order to promote our un-derstanding of nonresponse and data quality problems I initiated an inter-national workshop on ‘Item Nonresponse and Data Quality in Large Social Surveys’ at the University of Basel in October of 2003. The workshop was supported by the German Socioeconomic Panel, SIDOS (Swiss Information and Data Archive), the Swiss Household Panel, the Swiss National Commis-sion for UNESCO, the Swiss National Science Foundation, and the WWZ Forum at the University of Basel. The 37 papers presented at the workshop indicated the breadth of the data quality problem. This selection of papers reflects this field of research and the numerous angles from which the data quality problem can be addressed and investigated.

The ﬁve contributions gathered here cover the entire process of data col-lection, empirical application, and correction of data quality problems. The contribution by Schraepler and Wagner draws attention to the behav-ior of interviewers and discusses interviewer cheating. Building on evidence from various samples within the German Socioeconomic Panel (GSOEP) the authors describe the occurrence of completely faked interviews and the characteristics of cheating interviewers. Due to the recontacting of house-holds in the framework of annual panel surveys the probability of detecting interviewers who invent entire interviews is high. The analysis shows that cheating interviewers generated information which did not diﬀer much from the true answers in terms of means. However, faked interviews yielded dif-ferent correlations between the respondent characteristics compared to the true interviews.

(2)

4 REGINA T. RIPHAHN

When interviewers approach and interview the intended respondents, data quality may be aﬀected by the design of questionnaires. Redline,

Dillman, Carley-Baxter, and Creecy investigate the relevance of

ques-tion complexity and of quesques-tionnaire design for response quality. They focus on respondents’ correct adherence to questionnaire branching instructions. In an experiment they provided diﬀerent questionnaires to respondents and analysed which questionnaire features were correlated with the propensity to erroneously answer and to erroneously not answer a question. The re-sults yield substantial eﬀects of questionnaire design on the probability of following the instructions correctly.

Given that individuals are interviewed and provide information, there are still further dimensions of data quality that demand research attention:

Hanisch investigates the extent to which respondents provide rounded

in-stead of precise information as well as the determinants of rounding behav-ior. He points out that while the effect of rounding on mean values is likely to be negligible, it biases variance estimates. Hanisch finds very little sys-tematic determinants of rounding, as the explanatory variables in his model do not explain much of the observed heterogeneity in rounding behaviors. This finding is fortunate for applied researchers as it suggests that round-ing may justly be considered a random event comparable to unsystematic measurement error.

The remaining two studies address the problem of item nonresponse, where individuals refuse to provide answers to certain survey questions.

Frick and Grabka investigate the frequency of item nonresponse with

re-spect to income questions in the German Socioeconomic Panel. They ﬁnd that nonresponse is particularly frequent at the tails of the income distri-bution which – if unadjusted – aﬀects estimates of cross sectional income inequality and of income mobility over time. Also, item nonresponse appears to be a precursor of unit nonresponse in subsequent panel surveys.

Whereas Frick and Grabka describe the effect of item nonresponse on the empirical analysis, Spiess and Goebel focus on alternative treatments of the nonresponse problem. They compare the coefficient estimates in panel wage regressions obtained when applying two alternative mechanisms to handle missing data: The first, complete case analysis, simply deletes all observations with incomplete data while the second applies two alternative multiple imputation procedures. The authors find that the parameter esti-mates are sensitive to the choice of a mechanism and recommend to apply multiple imputation procedures.

These ﬁve papers illustrate the range of topics addressed in the ﬁeld of data quality research. They provide important insights with respect to the design of questionnaires, survey quality control, and estimation with imperfect survey data. Hopefully, the contributions will stimulate further research in the area.

(3)

ITEM NONRESPONSE AND DATA QUALITY 5 Regina T. Riphahn WWZ – University of Basel Petersgraben 51 4003 Basel Switzerland regina.riphahn@unibas.ch