The handling of missing binary data in language research

Keywords

missing data
Cronbach’s alpha
participant exclusion
second language testing

How to Cite

Pichette, F., Béland, S., Jolani, S., & Leśniewska, J. (2015). The handling of missing binary data in language research. Studies in Second Language Learning and Teaching, 5(1), 153–169. https://doi.org/10.14746/ssllt.2015.5.1.8

Abstract

Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data replacement and deletion methods are common in research. Language researchers declared that when faced with missing answers of the yes/no type (that translate into zero or one in data tables), the three most common solutions they adopt are to exclude the participant’s data from the analyses, to leave the square empty, or to fill in with zero, as for an incorrect answer. This study then examines the impact on Cronbach’s α of five types of data insertion, using simulated and actual data with various numbers of participants and missing percentages. Our analyses indicate that the three most common methods we identified among language researchers are the ones with the greatest impact on Cronbach’s α coefficients; in other words, they are the least desirable solutions to the missing data problem. On the basis of our results, we make recommendations for language researchers concerning the best way to deal with missing data. Given that none of the most common simple methods works properly, we suggest that the missing data be replaced either by the item’s mean or by the participants’ overall mean to provide a better, more accurate image of the instrument’s internal consistency.
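To make the replacement strategies named in the abstract concrete, the following is a minimal Python sketch, not the authors' procedure. It computes Cronbach's α with the standard formula on a small invented data set after participant exclusion, zero filling, item-mean replacement, and participant-mean replacement. The function names, the `None` coding for a missing answer, and the toy scores are all assumptions made for illustration; the sketch also assumes every participant and every item has at least one observed answer, and that total scores are not all identical.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a complete participants-by-items score table."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def exclude_incomplete(rows):
    """Participant exclusion: drop every row that has a missing answer."""
    return [r for r in rows if None not in r]

def impute(rows, method):
    """Fill missing answers (None) by 'zero', 'item_mean' or 'person_mean'."""
    k = len(rows[0])
    # Mean of the observed answers for each item (column).
    item_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(k)]
    filled = []
    for r in rows:
        # Mean of the observed answers for this participant (row).
        person_mean = mean(x for x in r if x is not None)
        new_row = []
        for j, x in enumerate(r):
            if x is not None:
                new_row.append(x)
            elif method == "zero":
                new_row.append(0)
            elif method == "item_mean":
                new_row.append(item_means[j])
            elif method == "person_mean":
                new_row.append(person_mean)
            else:
                raise ValueError(f"unknown method: {method}")
        filled.append(new_row)
    return filled

# Invented toy data: 6 participants, 4 binary items, None = unanswered.
data = [
    [1, 1, 1, None],
    [1, 0, 1, 1],
    [0, 0, None, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, None, 1, 1],
]

print("exclusion", round(cronbach_alpha(exclude_incomplete(data)), 3))
for method in ("zero", "item_mean", "person_mean"):
    print(method, round(cronbach_alpha(impute(data, method)), 3))
```

The values printed for this toy table only show the mechanics of each method; they say nothing about the size or direction of the effects reported in the study, which rest on simulated and actual data sets. Leaving the cell empty is not shown, because its effect depends on how the statistical software treats empty cells.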
https://doi.org/10.14746/ssllt.2015.5.1.8

References

Allison, P. D. (1987). Estimation of linear models with incomplete data. Sociological Methodology, 17, 71-103.

Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.

Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5-37.

Blom, E., & Unsworth, S. (Eds.) (2010). Experimental methods in language acquisition research. Amsterdam: Benjamins.

Cronbach, L. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Curtis, D. A., & Harwell, M. (1996). Training graduate students in educational statistics. Paper presented at the annual meeting of the American Educational Research Association, New York, USA.

Fallon, D. (2006). The buffalo upon the chimneypiece: The value of evidence. Journal of Teacher Education, 57(2), 139-154.

Finch, W. H. (2008). Estimation of item response theory parameters values in the presence of missing data. Journal of Educational Measurement, 45, 225-246.

Florez-Lopez, R. (2010). Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data. The Journal of the Operational Research Society, 61, 486-501.

Graham, B. S. (2011). Efficiency bounds for missing data models with semiparametric restrictions. Econometrica, 79, 437-452.

Harvey, A. C., & Pierse, R. G. (1984). Estimating missing observations in economic time series. Journal of the American Statistical Association, 79, 125-131.

Horowitz, J. L., & Manski, C. F. (1998). Censoring of outcomes and regressors due to survey non-response: Identification and estimation using weights and imputation. Journal of Econometrics, 84, 37-58.

King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49-69.

Laurier, M. D., Froio, L., Paero, C., & Fournier, M. (1998). L’élaboration d’un test provincial pour le classement des étudiants en anglais langue seconde au collégial. Québec, Canada: Direction générale de l’enseignement collégial, ministère de l’Éducation du Québec.

Lazaraton, A. (2000). Current trends in research methodology and statistics in applied linguistics. TESOL Quarterly, 34, 175-181.

Lazaraton, A., Riggenbach, H., & Ediger, A. (1987). Forming a discipline: Applied linguists’ literacy in research methodology and statistics. TESOL Quarterly, 21, 263-277.

Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

Marsh, H. W. (1998). Pairwise deletion for missing data in structural equation models: Nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling, 5, 22-36.

Nicoletti, C., Peracchi, F., & Foliano, F. (2011). Estimating income poverty in the presence of missing data and measurement error. Journal of Business & Economic Statistics, 29, 61-72.

Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381-391.

Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525-556.

Philipson, T. (2001). Data markets, missing data, and incentive pay. Econometrica, 69, 1099-1111.

Pichette, F., Béland, S., Lafontaine, M., & de Serres, L. (2014). Measuring L2 reading comprehension ability: The Sentence Verification Technique. The Quantitative Methods for Psychology, 10(2), 95-106.

Rietveld, A. C. M., & van Hout, R. (2006). Statistics for language research: Analysis of variance. Berlin: Gruyter Mouton.

Robitzsch, A., & Rupp, A. A. (2009). Impact of missing data on the detection of differential item functioning: The case of Mantel-Haenszel and logistic regression analysis. Educational and Psychological Measurement, 69, 18-34.

Schafer, J. L., & Graham, J. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

Schmidtke, J., Spino, L. A., & Lavolette, B. (2012, October). How statistically literate are we? Examining SLA professors’ and graduate students’ statistical knowledge and training. Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.

Schrapler, J.-P. (2004). Respondent behavior in panel studies: A case study for income-nonresponse by means of the German Socio-Economic Panel (SOEP). Sociological Methods & Research, 33, 118-156.

SCImago. (2014, October 27). SJR — SCImago journal & country rank. Retrieved from http://www.scimagojr.com

Shaver, J. P. (2001). The future of research in social studies – For what purpose? In W. B. Stanley (Ed.), Critical issues in social studies for the 21st century (pp. 231-252). Greenwich, CT: IAP.

Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13(3), 298-317.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107-120.

Sterling, S., Wolff, D., & Papi, M. (2012, October). Students’ and professors’ views of statistics in SLA – A call for a change? Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.

Tufis, C. D. (2008). Multiple imputation as a solution to the missing data problem in social science. Metode de cercetare, 1-2, 199-212.

Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.

Winship, C., & Mare, R. D. (1989). Loglinear models with missing data: A latent class approach. Sociological Methodology, 19, 331-367.

Yang, K. (2010). Making sense of statistical methods in social research. London: Sage.

Zhang, B., & Walker, C. M. (2008). Impact of missing data on person model fit and person trait estimation. Applied Psychological Measurement, 32, 466-479.