Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Twitter is one of the most popular social media platforms, where users share their opinions on a wide range of subjects. The high volume of data generated on the platform daily makes it a rich source for text mining. Industries such as telecommunications can leverage Twitter data to better understand their markets and make appropriate business decisions. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA) and benchmarks the results against another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics from a Twitter dataset containing user tweets about South African telcos. The results show that LSI is much faster than LDA. However, LDA yields better results, with topic coherence 8% higher for the best-performing model in this experiment. A higher topic coherence score indicates better model performance.
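To illustrate what a topic coherence score captures, the following is a minimal sketch in Python of a UMass-style coherence measure, averaged over word pairs. It rests on several assumptions: the abstract does not state which coherence measure the study used, so the UMass formulation is chosen here only as an example; the function name `umass_coherence`, the toy tweets, and the two word lists are hypothetical and are not taken from the study's data or results.

```python
import math

def umass_coherence(topic_words, documents):
    """UMass-style coherence, averaged over word pairs (illustrative assumption).

    For each pair (w_i, w_j) with j < i in the topic's word list, the score is
    log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents containing the words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")

# Hypothetical tokenized tweets, invented purely for illustration.
docs = [
    ["network", "down", "again", "data"],
    ["data", "bundle", "price", "expensive"],
    ["network", "signal", "down"],
    ["data", "price", "increase"],
]

# Words that tend to co-occur versus words that rarely appear together.
coherent_topic = ["data", "price", "bundle"]
mixed_topic = ["data", "signal", "expensive"]

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(coherent_score > mixed_score)
```

In this toy setting, the topic whose words co-occur in the same tweets receives the higher (less negative) score, which is the sense in which a higher coherence score indicates a better topic model.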




