GOOD AND BAD SOCIOLOGY: DOES TOPIC MODELLING MAKE A DIFFERENCE?

Keywords

unsupervised text analysis
LDA
topic modelling
sociological methods
big data sociology

Abstract

Social reality is changing and becoming increasingly digitally networked, which calls for new research methods capable of analysing large bodies of data, including textual data. This development poses a challenge for sociology, whose primary ambition is to describe and explain social reality. Because traditional sociological research methods focus on relatively small datasets, the discipline's existential challenge today is to embrace new methods and techniques that yield valuable insights into large volumes of data quickly. One such emerging area of investigation is the application of Natural Language Processing and machine learning to text mining, which allows for swift analyses of vast bodies of textual content. The paper’s main aim is to probe whether one such novel approach, namely topic modelling based on the Latent Dirichlet Allocation (LDA) algorithm, can find meaningful applications within sociology and whether its adoption helps sociology perform its tasks better. To outline the context in which LDA is applicable in the social sciences and humanities, an analysis was conducted of abstracts of articles on topic modelling published in journals indexed in Elsevier’s Scopus database. This study, based on 1,149 abstracts, not only showed the diversity of topics undertaken by researchers but also helped to answer the question of whether sociology using topic modelling is “good” sociology, in the sense that it provides opportunities to explore topic areas and data that would not otherwise be examined.

https://doi.org/10.14746/sr.2021.5.4.01

Funding

This work was supported by grants awarded by the National Science Centre, Poland [no. 2021/05/X/HS6/00067] and [no. 2018/31/B/HS6/00403].
