Extracting Attributes for Twitter Hashtag Communities

Organisations often need to understand discussions on social media, such as which topics are trending and the characteristics of the people engaged in the discussion. A number of approaches have been proposed to extract attributes that characterise a discussion group. However, these approaches are largely based on supervised learning and therefore require a large amount of labelled data. In this paper, we propose an approach that does not require labelled data but instead relies on lexical sources to detect meaningful attributes for online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.




