Abstract
Social reality is changing and increasingly digitally networked, which calls for new research methods capable of analysing large bodies of data, including textual data. This development poses a challenge for sociology, whose ambition is primarily to describe and explain social reality. Because traditional sociological research methods focus on analysing relatively small datasets, the existential challenge of today is to embrace new methods and techniques that yield valuable insights into large volumes of data at speed. One such emerging area of investigation is the application of Natural Language Processing and Machine Learning to text mining, which allows for swift analyses of vast bodies of textual content. The paper’s main aim is to probe whether one such novel approach, namely topic modelling based on the Latent Dirichlet Allocation (LDA) algorithm, can find meaningful applications within sociology and whether adopting it helps sociology perform its tasks better. To outline the context in which LDA is applicable in the social sciences and humanities, abstracts of articles on topic modelling published in journals indexed in Elsevier’s Scopus database were analysed. This study, based on 1,149 abstracts, not only showed the diversity of topics undertaken by researchers but also helped to answer the question of whether sociology using topic modelling is “good” sociology, in the sense that it provides opportunities to explore topic areas and data that would not otherwise be studied.
Funding
This work was supported by grants awarded by the National Science Centre, Poland [no. 2021/05/X/HS6/00067] and [no. 2018/31/B/HS6/00403].
License
Manuscript authors are responsible for obtaining copyright permissions for any copyrighted materials included within manuscripts. The authors must provide permission letters, when appropriate, to the Society Register Editors.
In addition, all papers in Society Register are published under a Creative Commons Attribution-NonCommercial 4.0 International License.
1.1. The Author hereby warrants that he/she is the owner of all the copyright and other intellectual property rights in the Work and that, within the scope of the present Agreement, the paper does not infringe the legal rights of another person. The owner of the copyright work also warrants that he/she is the sole and original creator thereof and that he/she is not bound by any legal constraints in regard to the use or sale of the work.
1.2. The Publisher warrants that it is the owner of the PRESSto platform for open access journals, hereinafter referred to as the PRESSto Platform.
2. The Author grants the Publisher a non-exclusive, free-of-charge license for unlimited worldwide use over an unspecified period of time in the following areas of exploitation:
2.1. production of multiple copies of the Work produced according to the specific application of a given technology, including printing, reproduction of graphics through mechanical or electrical means (reprography) and digital technology;
2.2. marketing authorisation, loan or lease of the original or copies thereof;
2.3. public performance, public performance in the broadcast, video screening, media enhancements as well as broadcasting and rebroadcasting, made available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them;
2.4. inclusion of the Work into a collective work (i.e. with a number of contributions);
2.5. inclusion of the Work in the electronic version to be offered on an electronic platform, or any other conceivable introduction of the Work in its electronic version to the Internet;
2.6. dissemination of the Work in its electronic version online, in a collective work or independently;
2.7. making the Work in the electronic version available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them, in particular by making it accessible via the Internet, Intranet, Extranet;
2.8. making the Work available under the CC BY-NC 4.0 license, another language version of this license, or any later version published by Creative Commons.
3. The Author grants the Publisher permission to reproduce a single copy (print or download) and royalty-free use and disposal of rights to compilations of the Work and these compilations.
4. The Author grants the Publisher permission to send metadata files related to the Work, including to commercial and non-commercial journal-indexing databases.
5. The Author represents that, on the basis of the license granted in the present Agreement, the Publisher is entitled and obliged to:
5.1. allow third parties to obtain further licenses (sublicenses) to the Work and to other materials, including derivatives thereof or compilations made, based on or including the Work, whereas the provisions of such sublicenses will be the same as those of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Creative Commons license, another language version of this license, or any later version of this license published by Creative Commons;
5.2. make the Work available to the public in such a way that members of the public may access the Work from a place and at a time individually chosen by them, without any technological constraints;
5.3. appropriately inform members of the public to whom the Work is to be made available about sublicenses in such a way as to ensure that all parties are properly informed (appropriate informing messages).
6. Because of the royalty-free provision of services of the Author (resulting from the scope of obligations stipulated in the present Agreement), the Author shall not be entitled to any author’s fee due and payable on the part of the Publisher (no fee or royalty is payable by the Publisher to the Author).
7.1. In the case of third party claims or actions for indemnity against the Publisher owing to any infractions related to any form of infringement of intellectual property rights protection, including copyright infringements, the Author is obliged to take all possible measures necessary to protect against these claims and, when as a result of legal action, the Publisher, or any third party licensed by the Publisher to use the Work, will have to abandon using the Work in its entirety or in part or, following a court ruling in a legal challenge, to pay damages to a third party, whatever the legal basis of liability, the Author is obliged to redress the damage resulting from claims made by third party, including costs and expenditures incurred in the process.
7.2. The Author will immediately inform the Publisher about any damage claims related to intellectual property infringements, including the author’s proprietary rights pertaining to a copyrighted work, filed against the Author.
7.3. Any matters not settled herein shall be governed by the provisions of the Polish Civil Code and the Polish Copyright and Related Rights Act.