Abstract
Given the challenges posed to the social sciences by the massive volume of research papers now being published, it is worth looking at novel, though no longer so new, machine learning methods for the purposes of literature review. To this end, I explore a probabilistic topic model called Latent Dirichlet Allocation (LDA) in the context of the epistemological challenge of analysing texts on social welfare. This paper aims to describe how the LDA algorithm works on large text corpora, along with its advantages and disadvantages. This preliminary characterisation of an inductive method for automated text analysis is intended to give a brief overview of how LDA can be used in the social sciences.
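As a rough illustration of how LDA infers topics from a corpus, the following is a minimal, unoptimised Python sketch that fits the model with collapsed Gibbs sampling, one common inference method for LDA. It is not presented as the procedure used in this paper. It assumes documents are already tokenised into lists of words, and the function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, their default values and the toy corpus are all illustrative choices.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens (strings).
    Returns (vocab, theta, phi): theta[d][k] is the estimated share of topic k
    in document d; phi[k][v] is the probability of word vocab[v] under topic k.
    alpha and beta are symmetric Dirichlet priors (illustrative defaults).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments):
                # (weight of topic t in this document) * (weight of word v in topic t).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: two "welfare" documents and two "method" documents.
    docs = [
        "pension benefit welfare state pension".split(),
        "welfare state benefit pension".split(),
        "topic model corpus word topic".split(),
        "corpus word topic model".split(),
    ]
    vocab, theta, phi = lda_gibbs(docs, n_topics=2, n_iter=100)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(f"topic {k}:", [vocab[v] for v in top])
    for d, row in enumerate(theta):
        print(f"doc {d}:", [round(p, 2) for p in row])
```

Each sweep reassigns every word token to a topic in proportion to how prominent that topic already is in the token's document and how prominent the word already is in that topic. The smoothed counts then yield the per-document topic proportions (theta) and the per-topic word distributions (phi) that a researcher would inspect and interpret. For large corpora, an optimised library implementation would normally be preferred over a pure-Python loop like this one.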
Funding
This work was supported by the National Science Centre, Poland, under research project “Social welfare in the light of topic modelling: A preliminary study”, no 2021/05/X/HS6/00067.
License
If the text is accepted for publication, the Author agrees to transfer the copyright of this article to the publisher (see the Open Access Policy). The author retains the right to use the content of the article published by the journal in further scholarly and popular-science work, provided that the source of publication is cited.