The hidden problem in Big Data: even infinite information does not guarantee consistent measurement
Society Register, volume 8, no. 4 (2024)

Keywords

measurement
Big Data
measurement error
measurement crisis
simulation
Social Capital Benchmark Surveys (SCBS)


Abstract

The social sciences depend heavily on the measurement of abstract constructs for quantifying effects, identifying associations between variables, and testing hypotheses. In data science, constructs are also often used for forecasting, and the recent big data revolution promises to improve the accuracy of such forecasts by leveraging the constantly increasing stream of digital information around us. However, the possibility of optimizing various social indicators implicitly hinges on our ability to reliably reduce complex and abstract constructs (such as life satisfaction or social trust) to numeric measures. While many scientists are aware of the issue of measurement error, there is a widespread, implicit hope that access to more data will eventually render this issue irrelevant. This paper examines the nature of measurement error under quasi-ideal conditions. We show, both mathematically and through simulations, that single measurements fail to converge even when we can access progressively more information. Then, using real-world data from the Social Capital Benchmark Surveys, we demonstrate how adding new information increases the dimensionality of the measured construct quasi-indefinitely, further contributing to measurement divergence. We conclude by discussing implications and future research directions for addressing this problem.

https://doi.org/10.14746/sr.2024.8.4.01

Funding

DC is grateful for support from the project “CoCi: Co-Evolving City Life”, which was funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 833168.
