Fake Account Detection in Twitter Based on Minimum Weighted Feature set
Social networking sites such as Twitter and Facebook
attract over 500 million users across the world. For these users,
social life, and even practical life, has become intertwined with these
sites, and their interaction with social networking has changed their
lives. Accordingly, social networking sites have become one of the main
channels responsible for the wide dissemination of different kinds
of information during real-time events. This popularity of social
networking has led to several problems, including the possibility of
exposing users to incorrect information through fake accounts,
which results in the spread of malicious content during live events.
This situation can cause serious real-world damage to
society in general, including citizens, business entities, and others.
In this paper, we present a classification method for detecting
fake accounts on Twitter. The study determines the minimal set of
the main factors that influence the detection of fake accounts on
Twitter, and the determined factors are then used with different
classification techniques. The results of these techniques are
compared, and the algorithm with the highest accuracy is
selected. The study has been
compared with recent research in the same area; this
comparison confirmed the accuracy of the proposed study. We claim
that this study can be continuously applied to the Twitter social network
to automatically detect fake accounts; moreover, the study can be
applied to other social networking sites such as Facebook with minor
changes according to the nature of the social network, which are
discussed in this paper.
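
The workflow described above (weight the candidate factors, keep a minimal set, then compare classifiers by accuracy) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four feature names, the eight toy accounts, the use of information gain as the feature weight, the 0.3 threshold, the two simple classifiers, and leave-one-out evaluation.

```python
import math
from collections import Counter

# Hypothetical toy data: (binary features, label); label 1 = fake, 0 = genuine.
# Feature names and values are illustrative assumptions, not the paper's data.
FEATURES = ["has_profile_image", "has_bio", "follows_many", "few_followers"]
DATA = [
    ((0, 0, 1, 1), 1), ((0, 1, 1, 1), 1), ((1, 0, 1, 1), 1), ((0, 0, 1, 0), 1),
    ((1, 1, 0, 0), 0), ((1, 1, 1, 0), 0), ((1, 0, 0, 0), 0), ((1, 1, 0, 1), 0),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, index):
    """Reduction in label entropy obtained by splitting on one feature."""
    remainder = 0.0
    for value in {x[index] for x, _ in rows}:
        subset = [y for x, y in rows if x[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([y for _, y in rows]) - remainder

def nearest_neighbor(train, x):
    """1-nearest-neighbour by Hamming distance (first match wins on ties)."""
    return min(train, key=lambda row: sum(a != b for a, b in zip(row[0], x)))[1]

def naive_bayes(train, x):
    """Bernoulli naive Bayes with Laplace smoothing."""
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        rows = [f for f, y in train if y == label]
        score = math.log((len(rows) + 1) / (len(train) + 2))
        for j, value in enumerate(x):
            matches = sum(1 for f in rows if f[j] == value)
            score += math.log((matches + 1) / (len(rows) + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def loo_accuracy(classify, rows):
    """Leave-one-out accuracy: train on all rows but one, test on that one."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        hits += classify(rows[:i] + rows[i + 1:], x) == y
    return hits / len(rows)

# Step 1: weight every feature and keep those above an assumed threshold.
weights = [information_gain(DATA, i) for i in range(len(FEATURES))]
THRESHOLD = 0.3  # arbitrary cut-off chosen for this toy example
selected = [i for i, w in enumerate(weights) if w >= THRESHOLD]

# Step 2: project the data onto the reduced feature set.
reduced = [(tuple(x[i] for i in selected), y) for x, y in DATA]

# Step 3: compare classifiers on the reduced set and pick the most accurate.
results = {
    "nearest_neighbor": loo_accuracy(nearest_neighbor, reduced),
    "naive_bayes": loo_accuracy(naive_bayes, reduced),
}
best = max(results, key=results.get)

print("selected features:", [FEATURES[i] for i in selected])
print("accuracy per classifier:", results)
print("chosen classifier:", best)
```

With real account data, the threshold and the candidate classifiers would need to be tuned and validated on held-out accounts; the sketch only shows the shape of the select-then-compare procedure.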
@article{"International Journal of Information, Control and Computer Sciences:71660",
  author = "Ahmed El Azab and Amira M. Idrees and Mahmoud A. Mahmoud and Hesham Hefny",
  title = "Fake Account Detection in Twitter Based on Minimum Weighted Feature set",
  abstract = "Social networking sites such as Twitter and Facebook
attract over 500 million users across the world. For these users,
social life, and even practical life, has become intertwined with these
sites, and their interaction with social networking has changed their
lives. Accordingly, social networking sites have become one of the main
channels responsible for the wide dissemination of different kinds
of information during real-time events. This popularity of social
networking has led to several problems, including the possibility of
exposing users to incorrect information through fake accounts,
which results in the spread of malicious content during live events.
This situation can cause serious real-world damage to
society in general, including citizens, business entities, and others.
In this paper, we present a classification method for detecting
fake accounts on Twitter. The study determines the minimal set of
the main factors that influence the detection of fake accounts on
Twitter, and the determined factors are then used with different
classification techniques. The results of these techniques are
compared, and the algorithm with the highest accuracy is
selected. The study has been
compared with recent research in the same area; this
comparison confirmed the accuracy of the proposed study. We claim
that this study can be continuously applied to the Twitter social network
to automatically detect fake accounts; moreover, the study can be
applied to other social networking sites such as Facebook with minor
changes according to the nature of the social network, which are
discussed in this paper.",
  keywords = "Fake accounts detection, classification algorithms, Twitter accounts analysis, features based techniques.",
  volume = "10",
  number = "1",
  pages = "13-6",
}