The handling of missing binary data in language research

François Pichette
Sébastien Béland
Shahab Jolani
Justyna Leśniewska


Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data replacement and deletion methods are common in research. Language researchers declared that when faced with missing answers of the yes/no type (which translate into zero or one in data tables), the three most common solutions they adopt are to exclude the participant’s data from the analyses, to leave the cell empty, or to fill it in with zero, as for an incorrect answer. This study then examines the impact on Cronbach’s α of five types of data insertion, using simulated and actual data with various numbers of participants and missing percentages. Our analyses indicate that the three most common methods we identified among language researchers are the ones with the greatest impact on Cronbach’s α coefficients; in other words, they are the least desirable solutions to the missing data problem. On the basis of our results, we make recommendations for language researchers concerning the best way to deal with missing data. Given that none of the most common simple methods works properly, we suggest that the missing data be replaced either by the item’s mean or by the participant’s overall mean to provide a better, more accurate image of the instrument’s internal consistency.
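The methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce their simulation design or all five insertion methods; it only shows, under stated assumptions, how Cronbach’s α might be computed after participant exclusion, zero filling, item-mean replacement, and participant-mean replacement. The function names, the simulated response model, and the 10% missing rate are all hypothetical choices made for illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a complete persons-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def listwise_delete(data):
    """Exclude every participant (row) with at least one missing answer."""
    return data[~np.isnan(data).any(axis=1)]


def impute_zero(data):
    """Treat each missing answer as incorrect (0)."""
    return np.where(np.isnan(data), 0.0, data)


def impute_item_mean(data):
    """Replace each missing answer with the mean of its item (column)."""
    item_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), item_means, data)


def impute_person_mean(data):
    """Replace each missing answer with the participant's (row) mean."""
    person_means = np.nanmean(data, axis=1, keepdims=True)
    return np.where(np.isnan(data), person_means, data)


if __name__ == "__main__":
    # Hypothetical simulation: 200 participants, 20 binary items,
    # answers generated from a simple logistic model (an assumption).
    rng = np.random.default_rng(0)
    ability = rng.normal(size=(200, 1))
    difficulty = rng.normal(size=(1, 20))
    p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
    complete = (rng.random((200, 20)) < p_correct).astype(float)

    # Remove 10% of the answers completely at random (an assumption).
    data = complete.copy()
    data[rng.random(data.shape) < 0.10] = np.nan

    print("complete data:     ", round(cronbach_alpha(complete), 3))
    for name, method in [
        ("listwise deletion", listwise_delete),
        ("zero fill", impute_zero),
        ("item mean", impute_item_mean),
        ("participant mean", impute_person_mean),
    ]:
        print(f"{name + ':':19}", round(cronbach_alpha(method(data)), 3))
```

Comparing each printed value with the α of the complete matrix gives a rough sense of how much a given method distorts the estimate; the size and direction of the distortion depend on the simulated data and should not be read as the paper's results.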



Article Details

How to Cite
Pichette, F., Béland, S., Jolani, S., & Leśniewska, J. (2015). The handling of missing binary data in language research. Studies in Second Language Learning and Teaching, 5(1), 153-169.
Author Biographies

François Pichette, UER Sciences humaines, lettres et communication, Téluq, 455, rue du Parvis, Québec (QC), G1K 9H6
François Pichette is Professor of Linguistics at Téluq - Université du Québec, Canada. His current teaching and research interests include first- and second-language acquisition, L2 reading and writing, early bilingualism, language testing, and second-language vocabulary acquisition.

Sébastien Béland, Département d’administration et fondements de l’éducation, Université de Montréal, 2900 Boulevard Edouard-Montpetit, Montréal, QC H3T1J4
Sébastien Béland is a lecturer at Université de Montréal, Canada. His research interests are in the field of learning assessment and evaluation, and revolve around measurement models in education, item response theory, missing data, Bayesian approaches, detection of aberrant response patterns, and differential item functioning.

Shahab Jolani, Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Padualaan 14, 3584CH Utrecht
Shahab Jolani is a researcher in the Department of Methodology and Statistics at the University of Utrecht, the Netherlands. His primary research interest is in the analysis of incomplete data, particularly in longitudinal settings. His expertise lies in imputation of missing data, causal inference, longitudinal data analysis, Bayesian computational statistics, analysis of incomplete data, and analysis of time to event data.

Justyna Leśniewska, Institute of English Studies, Jagiellonian University, ul. Łojasiewicza 4, 30-348 Kraków
Justyna Leśniewska teaches at the Institute of English Studies, Jagiellonian University, Poland. Her research interests are in applied linguistics and include second language vocabulary acquisition, collocation competence development, corpus-based linguistics, early bilingualism and EFL teaching.


  1. Allison, P. D. (1987). Estimation of linear models with incomplete data. Sociological Methodology, 17, 71-103.
  2. Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
  3. Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5-37.
  4. Blom, E., & Unsworth, S. (Eds.) (2010). Experimental methods in language acquisition research. Amsterdam: Benjamins.
  5. Cronbach, L. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
  6. Curtis, D. A., & Harwell, M. (1996). Training graduate students in educational statistics. Paper presented at the annual meeting of the American Educational Research Association, New York, USA.
  7. Fallon, D. (2006). The buffalo upon the chimneypiece: The value of evidence. Journal of Teacher Education, 57(2), 139-154.
  8. Finch, W. H. (2008). Estimation of item response theory parameters values in the presence of missing data. Journal of Educational Measurement, 45, 225-246.
  9. Florez-Lopez, R. (2010). Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data. The Journal of the Operational Research Society, 61, 486-501.
  10. Graham, B. S. (2011). Efficiency bounds for missing data models with semiparametric restrictions. Econometrica, 79, 437-452.
  11. Harvey, A. C., & Pierse, R. G. (1984). Estimating missing observations in economic time series. Journal of the American Statistical Association, 79, 125-131.
  12. Horowitz, J. L., & Manski, C. F. (1998). Censoring of outcomes and regressors due to survey non-response: Identification and estimation using weights and imputation. Journal of Econometrics, 84, 37-58.
  13. King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49-69.
  14. Laurier, M. D., Froio, L., Paero, C., & Fournier, M. (1998). L’élaboration d’un test provincial pour le classement des étudiants en anglais langue seconde au collégial. Québec, Canada: Direction générale de l’enseignement collégial, ministère de l’Éducation du Québec.
  15. Lazaraton, A. (2000). Current trends in research methodology and statistics in applied linguistics. TESOL Quarterly, 34, 175-181.
  16. Lazaraton, A., Riggenbach, H., & Ediger, A. (1987). Forming a discipline: Applied linguists’ literacy in research methodology and statistics. TESOL Quarterly, 21, 263-277.
  17. Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.
  18. Marsh, H. W. (1998). Pairwise deletion for missing data in structural equation models: Nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling, 5, 22-36.
  19. Nicoletti, C., Peracchi, F., & Foliano, F. (2011). Estimating income poverty in the presence of missing data and measurement error. Journal of Business & Economic Statistics, 29, 61-72.
  20. Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381-391.
  21. Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525-556.
  22. Philipson, T. (2001). Data markets, missing data, and incentive pay. Econometrica, 69, 1099-1111.
  23. Pichette, F., Béland, S., Lafontaine, M., & de Serres, L. (2014). Measuring L2 reading comprehension ability: The Sentence Verification Technique. The Quantitative Methods in Psychology, 10(2), 95-106.
  24. Rietveld, A. C. M., & van Hout, R. (2006). Statistics for language research: Analysis of variance. Berlin: Gruyter Mouton.
  25. Robitzsch, A., & Rupp, A. A. (2009). Impact of missing data on the detection of differential item functioning: The case of Mantel-Haenszel and logistic regression analysis. Educational and Psychological Measurement, 69, 18-34.
  26. Schafer, J. L., & Graham, J. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
  27. Schmidtke, J., Spino, L. A., & Lavolette, B. (2012, October). How statistically literate are we? Examining SLA professors’ and graduate students’ statistical knowledge and training. Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.
  28. Schrapler, J.-P. (2004). Respondent behavior in panel studies: A case study for income-nonresponse by means of the German Socio-Economic Panel (SOEP). Sociological Methods & Research, 33, 118-156.
  29. SCImago. (2014, October 27). SJR — SCImago journal & country rank. Retrieved from
  30. Shaver, J. P. (2001). The future of research in social studies – For what purpose? In W. B. Stanley (Ed.), Critical issues in social studies for the 21st century (pp. 231-252). Greenwich, CT: IAP.
  31. Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13(3), 298-317.
  32. Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107-120.
  33. Sterling, S., Wolff, D., & Papi, M. (2012, October). Students’ and professors’ views of statistics in SLA – A call for a change? Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.
  34. Tufis, C. D. (2008). Multiple imputation as a solution to the missing data problem in social science. Metode de cercetare, 1-2, 199-212.
  35. Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.
  36. Winship, C., & Mare, R. D. (1989). Loglinear models with missing data: A latent class approach. Sociological Methodology, 19, 331-367.
  37. Yang, K. (2010). Making sense of statistical methods in social research. London: Sage.
  38. Zhang, B., & Walker, C. M. (2008). Impact of missing data on person model fit and person trait estimation. Applied Psychological Measurement, 32, 466-479.