Analyzing randomized controlled interventions: Three notes for applied linguists

Jan Vanhove


I discuss three common practices that obfuscate or invalidate the statistical analysis of randomized controlled interventions in applied linguistics. These are (a) checking whether randomization produced groups that are balanced on a number of possibly relevant covariates, (b) using repeated measures ANOVA to analyze pretest-posttest designs, and (c) using traditional significance tests to analyze interventions in which whole groups were assigned to the conditions (cluster randomization). The first practice is labeled superfluous, and taking full advantage of important covariates regardless of balance is recommended. The second is needlessly complicated, and analysis of covariance is recommended as a more powerful alternative. The third produces dramatic inferential errors, which are largely, though not entirely, avoided when mixed-effects modeling is used. This discussion is geared towards applied linguists who need to design, analyze, or assess intervention studies or other randomized controlled trials. Statistical formalism is kept to a minimum throughout.
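As a rough illustration of point (c), the sketch below computes the standard design effect for equally sized clusters, 1 + (m - 1) × ICC, and the effective sample size it implies. It is not taken from the article: the function names and the example values (10 classes of 20 learners, an intraclass correlation of 0.15) are assumptions chosen only for illustration.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation for equally sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations that carry the same information."""
    total_n = n_clusters * cluster_size
    return total_n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical study: 10 intact classes of 20 learners each, ICC = 0.15.
    deff = design_effect(cluster_size=20, icc=0.15)
    n_eff = effective_sample_size(n_clusters=10, cluster_size=20, icc=0.15)
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {n_eff:.1f} (nominal n = 200)")
```

Under these assumed values the design effect is 1 + 19 × 0.15 = 3.85, so the 200 learners carry about as much information as 52 independently sampled ones. A test that treats all 200 as independent therefore overstates its precision, which is one way to see why ignoring cluster assignment can distort inference.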


How to Cite
Vanhove, J. (2015). Analyzing randomized controlled interventions: Three notes for applied linguists. Studies in Second Language Learning and Teaching, 5(1), 135-152.

Author Biography

Jan Vanhove, University of Fribourg, Department of Multilingualism, Rue de Rome 1, CH-1700 Fribourg
Jan Vanhove is an Oberassistent at the Department of Multilingualism in Fribourg, Switzerland. He completed his PhD in 2014 with a thesis entitled "Receptive Multilingualism Across the Lifespan: Cognitive and Linguistic Factors in Cognate Guessing" and blogs semi-regularly about statistical issues and research design in applied linguistics and multilingualism research.

