Epistemological aspect of topic modelling in the social sciences: Latent Dirichlet Allocation

Keywords

Latent Dirichlet Allocation (LDA)
topic modelling
social sciences
social welfare
automated text analysis

How to Cite

Baranowski, M. (2022). Epistemological aspect of topic modelling in the social sciences: Latent Dirichlet Allocation. Critical Review, 4(1), 7–16. https://doi.org/10.14746/pk.2022.4.1.1

Abstract

Given the challenge that the massive volume of published research poses for the social sciences, it is worth looking at novel, though no longer brand-new, machine learning methods for the purposes of literature review. To this end, I explore a probabilistic topic model called Latent Dirichlet Allocation (LDA) in the context of the epistemological challenge of analysing texts on social welfare. This paper aims to describe how the LDA algorithm works on large text corpora, along with its advantages and disadvantages. This preliminary characterisation of an inductive method for automated text analysis is intended to give a brief overview of how LDA can be used in the social sciences.


Funding

This work was supported by the National Science Centre, Poland, under the research project “Social welfare in the light of topic modelling: A preliminary study”, no. 2021/05/X/HS6/00067.
