Fake Account Detection in Twitter Based on Minimum Weighted Feature Set

Social networking sites such as Twitter and Facebook attract over 500 million users across the world. For those users, social life and even practical life have become intertwined with these sites, and this interaction has had a lasting effect on their lives. Accordingly, social networking sites have become one of the main channels for the wide dissemination of different kinds of information during real-time events. This popularity has led to several problems, including the exposure of users to incorrect information through fake accounts, which results in the spread of malicious content during such events. This situation can cause considerable real-world damage to society in general, including citizens, business entities, and others. In this paper, we present a classification method for detecting fake accounts on Twitter. The study determines a minimized set of the main factors that influence the detection of fake accounts on Twitter, and then applies these factors using different classification techniques. The results of these techniques are compared, and the most accurate algorithm is selected. The study has been compared with recent research in the same area, and this comparison has confirmed the accuracy of the proposed approach. We claim that this approach can be applied continuously on Twitter to detect fake accounts automatically; moreover, it can be applied to other social networking sites such as Facebook with minor changes that depend on the nature of the network, which are discussed in this paper.
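
The pipeline described above (weight the candidate factors, keep a minimal set, then compare classifiers by accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the toy data, the use of information gain as the weight, the 0.1 cut-off, and the two one-line rules standing in for real classifiers are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the labels on one discrete feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_features(rows, labels, min_weight):
    """Rank features by weight and keep those at or above min_weight."""
    weights = {f: information_gain(rows, labels, f) for f in rows[0]}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(f, w) for f, w in ranked if w >= min_weight]

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical toy data: 1 = fake account, 0 = genuine account.
rows = [
    {"default_image": 1, "has_bio": 0, "long_name": 1},
    {"default_image": 1, "has_bio": 0, "long_name": 0},
    {"default_image": 1, "has_bio": 1, "long_name": 1},
    {"default_image": 1, "has_bio": 0, "long_name": 0},
    {"default_image": 0, "has_bio": 1, "long_name": 1},
    {"default_image": 0, "has_bio": 1, "long_name": 0},
    {"default_image": 0, "has_bio": 0, "long_name": 1},
    {"default_image": 0, "has_bio": 1, "long_name": 0},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Step 1: weight the features and keep the minimal set above the cut-off.
selected = select_features(rows, labels, min_weight=0.1)

# Step 2: compare candidate classifiers (trivial stand-in rules here)
# and keep the most accurate one. Scored on the toy data itself for brevity.
candidates = {
    "image_rule": lambda r: r["default_image"],
    "bio_rule": lambda r: 1 - r["has_bio"],
}
scores = {name: accuracy(fn, rows, labels) for name, fn in candidates.items()}
best = max(scores, key=scores.get)

print("selected features:", selected)
print("accuracy per classifier:", scores)
print("most accurate:", best)
```

In practice the stand-in rules would be replaced by trained classifiers, and accuracy would be measured on held-out accounts rather than on the data used for weighting.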



