The handling of missing binary data in language research

François Pichette; Sébastien Béland; Shahab Jolani; Justyna Leśniewska

doi:10.14746/ssllt.2015.5.1.8

Vol. 5 No. 1 (2015), Articles

Published 2015-01-01

François Pichette

UER Sciences humaines, lettres et communication, Téluq, 455, rue du Parvis, Québec (QC), G1K 9H6

Canada

Sébastien Béland

Département d’administration et fondements de l’éducation, Université de Montréal, 2900 Boulevard Edouard-Montpetit, Montréal, QC H3T1J4

Canada

Shahab Jolani

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Padualaan 14, 3584CH Utrecht

Netherlands

Justyna Leśniewska

Institute of English Studies, Jagiellonian University, ul. Łojasiewicza 4, 30-348 Kraków

Poland

Keywords

missing data
Cronbach’s alpha
participant exclusion
second language testing

How to Cite

Pichette, F., Béland, S., Jolani, S., & Leśniewska, J. (2015). The handling of missing binary data in language research. Studies in Second Language Learning and Teaching, 5(1), 153–169. https://doi.org/10.14746/ssllt.2015.5.1.8

Abstract

Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data replacement and deletion methods are common in research. Language researchers declared that when faced with missing answers of the yes/no type (which are coded as zero or one in data tables), the three solutions they most commonly adopt are to exclude the participant’s data from the analyses, to leave the cell empty, or to fill in a zero, as for an incorrect answer. This study then examines the impact on Cronbach’s α of five types of data insertion, using simulated and actual data with various sample sizes and percentages of missing data. Our analyses indicate that the three most common methods we identified among language researchers are the ones with the greatest impact on Cronbach’s α coefficients; in other words, they are the least desirable solutions to the missing data problem. On the basis of our results, we make recommendations for language researchers concerning the best way to deal with missing data. Given that none of the most common simple methods works properly, we suggest that the missing data be replaced either by the item’s mean or by the participants’ overall mean to provide a more accurate image of the instrument’s internal consistency.
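To make the comparison concrete, the sketch below computes Cronbach’s α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of the participants’ total scores. It does this after four ways of handling missing 0/1 answers: excluding participants (listwise deletion), scoring missing answers as 0, replacing them with the item mean, and replacing them with a participant mean. This is a minimal illustration, not the authors’ simulation code, and it rests on several assumptions. The function names and the toy matrix are hypothetical. `None` marks a missing answer. "Participant mean" is read here as each participant’s own mean over the items they answered, which is an interpretation rather than a detail stated in the abstract. The "leave the cell empty" option is not modelled, because α then depends on how the analysis software treats empty cells.

```python
from statistics import mean, pvariance
from typing import List, Optional

# Rows = participants, columns = items; 1 = correct, 0 = incorrect, None = missing.
Matrix = List[List[Optional[float]]]


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach's alpha for a complete participants x items matrix.

    Population variances are used for the items and the total score; the
    n vs. n-1 choice cancels out in the ratio. Raises ZeroDivisionError if
    every participant has the same total score.
    """
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def listwise(data: Matrix) -> List[List[float]]:
    # Exclude any participant with at least one missing answer.
    return [list(r) for r in data if None not in r]


def zero_fill(data: Matrix) -> List[List[float]]:
    # Score every missing answer as incorrect (0).
    return [[0 if x is None else x for x in r] for r in data]


def item_mean_fill(data: Matrix) -> List[List[float]]:
    # Replace a missing answer with the mean of the observed answers to that item
    # (fails with StatisticsError if an item has no observed answers at all).
    k = len(data[0])
    means = [mean(r[j] for r in data if r[j] is not None) for j in range(k)]
    return [[means[j] if x is None else x for j, x in enumerate(r)] for r in data]


def person_mean_fill(data: Matrix) -> List[List[float]]:
    # Assumed reading: replace a missing answer with that participant's mean
    # over the items they did answer.
    out = []
    for r in data:
        m = mean(x for x in r if x is not None)
        out.append([m if x is None else x for x in r])
    return out


# Hypothetical toy data: 5 participants answering 3 yes/no items.
data: Matrix = [
    [1, 1, None],
    [1, 0, 1],
    [0, 0, 0],
    [1, None, 1],
    [0, 1, 0],
]

for name, method in [("listwise deletion", listwise), ("zero fill", zero_fill),
                     ("item mean", item_mean_fill), ("person mean", person_mean_fill)]:
    print(f"{name:18s} alpha = {cronbach_alpha(method(data)):.3f}")
```

The toy matrix is far too small for the printed values to mean anything. A study like the one described would compare α across many simulated and real datasets with different sample sizes and percentages of missing data.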

References

Allison, P. D. (1987). Estimation of linear models with incomplete data. Sociological Methodology, 17, 71-103.

Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.

Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5-37.

Blom, E., & Unsworth, S. (Eds.) (2010). Experimental methods in language acquisition research. Amsterdam: Benjamins.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Curtis, D. A., & Harwell, M. (1996). Training graduate students in educational statistics. Paper presented at the annual meeting of the American Educational Research Association, New York, USA.

Fallon, D. (2006). The buffalo upon the chimneypiece: The value of evidence. Journal of Teacher Education, 57(2), 139-154.

Finch, W. H. (2008). Estimation of item response theory parameters values in the presence of missing data. Journal of Educational Measurement, 45, 225-246.

Florez-Lopez, R. (2010). Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data. The Journal of the Operational Research Society, 61, 486-501.

Graham, B. S. (2011). Efficiency bounds for missing data models with semiparametric restrictions. Econometrica, 79, 437-452.

Harvey, A. C., & Pierse, R. G. (1984). Estimating missing observations in economic time series. Journal of the American Statistical Association, 79, 125-131.

Horowitz, J. L., & Manski, C. F. (1998). Censoring of outcomes and regressors due to survey non-response: Identification and estimation using weights and imputation. Journal of Econometrics, 84, 37-58.

King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49-69.

Laurier, M. D., Froio, L., Paero, C., & Fournier, M. (1998). L’élaboration d’un test provincial pour le classement des étudiants en anglais langue seconde au collégial. Québec, Canada: Direction générale de l’enseignement collégial, ministère de l’Éducation du Québec.

Lazaraton, A. (2000). Current trends in research methodology and statistics in applied linguistics. TESOL Quarterly, 34, 175-181.

Lazaraton, A., Riggenbach, H., & Ediger, A. (1987). Forming a discipline: Applied linguists’ literacy in research methodology and statistics. TESOL Quarterly, 21, 263-277.

Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

Marsh, H. W. (1998). Pairwise deletion for missing data in structural equation models: Nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling, 5, 22-36.

Nicoletti, C., Peracchi, F., & Foliano, F. (2011). Estimating income poverty in the presence of missing data and measurement error. Journal of Business & Economic Statistics, 29, 61-72.

Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381-391.

Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525-556.

Philipson, T. (2001). Data markets, missing data, and incentive pay. Econometrica, 69, 1099-1111.

Pichette, F., Béland, S., Lafontaine, M., & de Serres, L. (2014). Measuring L2 reading comprehension ability: The Sentence Verification Technique. The Quantitative Methods in Psychology, 10(2), 95-106.

Rietveld, A. C. M., & van Hout, R. (2006). Statistics for language research: Analysis of variance. Berlin: Mouton de Gruyter.

Robitzsch, A., & Rupp, A. A. (2009). Impact of missing data on the detection of differential item functioning: The case of Mantel-Haenszel and logistic regression analysis. Educational and Psychological Measurement, 69, 18-34.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

Schmidtke, J., Spino, L. A., & Lavolette, B. (2012, October). How statistically literate are we? Examining SLA professors’ and graduate students’ statistical knowledge and training. Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.

Schräpler, J.-P. (2004). Respondent behavior in panel studies: A case study for income-nonresponse by means of the German Socio-Economic Panel (SOEP). Sociological Methods & Research, 33, 118-156.

SCImago. (2014, October 27). SJR — SCImago journal & country rank. Retrieved from http://www.scimagojr.com

Shaver, J. P. (2001). The future of research in social studies – For what purpose? In W. B. Stanley (Ed.), Critical issues in social studies for the 21st century (pp. 231-252). Greenwich, CT: IAP.

Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13(3), 298-317.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107-120.

Sterling, S., Wolff, D., & Papi, M. (2012, October). Students’ and professors’ views of statistics in SLA – A call for a change? Paper presented at the 31st Second Language Research Forum, Pittsburgh, USA.

Tufis, C. D. (2008). Multiple imputation as a solution to the missing data problem in social science. Metode de cercetare, 1-2, 199-212.

Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.

Winship, C., & Mare, R. D. (1989). Loglinear models with missing data: A latent class approach. Sociological Methodology, 19, 331-367.

Yang, K. (2010). Making sense of statistical methods in social research. London: Sage.

Zhang, B., & Walker, C. M. (2008). Impact of missing data on person model fit and person trait estimation. Applied Psychological Measurement, 32, 466-479.

License

1.1. The Author hereby warrants that he/she is the owner of all the copyright and other intellectual property rights in the Work and that, within the scope of the present Agreement, the paper does not infringe the legal rights of another person. The owner of the copyright work also warrants that he/she is the sole and original creator thereof and that he/she is not bound by any legal constraints in regard to the use or sale of the Work.

1.2. The Publisher warrants that it is the owner of the PRESSto platform for open access journals, hereinafter referred to as the PRESSto Platform.

2. The Author grants the Publisher a non-exclusive, free-of-charge license for unlimited use worldwide, for an unspecified period of time, in the following areas of exploitation:

2.1. production of multiple copies of the Work produced according to the specific application of a given technology, including printing, reproduction of graphics through mechanical or electrical means (reprography) and digital technology;

2.2. placing on the market, lending, or leasing of the original or copies thereof;

2.3. public performance, public performance in the broadcast, video screening, media enhancements as well as broadcasting and rebroadcasting, made available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them;

2.4. inclusion of the Work into a collective work (i.e. with a number of contributions);

2.5. inclusion of the Work in the electronic version to be offered on an electronic platform, or any other conceivable introduction of the Work in its electronic version to the Internet;

2.6. dissemination of the Work in its electronic version online, in a collective work or independently;

2.7. making the Work in the electronic version available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them, in particular by making it accessible via the Internet, Intranet, Extranet;

2.8. making the Work available under the appropriate license, Attribution 4.0 International (CC BY 4.0), as well as under another language version of this license or any later version published by Creative Commons.

3. The Author grants the Publisher permission to reproduce a single copy (print or download) and royalty-free use and disposal of rights to compilations of the Work and these compilations.

4. The Author grants the Publisher permission to send metadata files related to the Work, including to commercial and non-commercial journal-indexing databases.

5. The Author represents that, on the basis of the license granted in the present Agreement, the Publisher is entitled and obliged to:

5.1. allow third parties to obtain further licenses (sublicenses) to the Work and to other materials, including derivatives thereof or compilations made, based on or including the Work, whereas the provisions of such sublicenses will be the same as those of the Attribution 4.0 International (CC BY 4.0) Creative Commons license, another language version of this license, or any later version of this license published by Creative Commons;

5.2. make the Work available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them, without any technological constraints;

5.3. appropriately inform members of the public to whom the Work is to be made available about sublicenses in such a way as to ensure that all parties are properly informed (appropriate informing messages).

6. Because of the royalty-free provision of services of the Author (resulting from the scope of obligations stipulated in the present Agreement), the Author shall not be entitled to any author’s fee due and payable on the part of the Publisher (no fee or royalty is payable by the Publisher to the Author).

7.1. In the case of third-party claims or actions for indemnity against the Publisher owing to any infractions related to any form of infringement of intellectual property rights protection, including copyright infringements, the Author is obliged to take all possible measures necessary to protect against these claims. When, as a result of legal action, the Publisher, or any third party licensed by the Publisher to use the Work, has to abandon using the Work in its entirety or in part or, following a court ruling in a legal challenge, to pay damages to a third party, whatever the legal basis of liability, the Author is obliged to redress the damage resulting from claims made by the third party, including costs and expenditures incurred in the process.

7.2. The Author will immediately inform the Publisher about any damage claims related to intellectual property infringements, including the author’s proprietary rights pertaining to a copyrighted work, filed against the Author.

7.3. In all matters not settled herein, the provisions of the Polish Civil Code and the Polish Copyright and Related Rights Act shall apply.
