Web Proxy Detection via Bipartite Graphs and One-Mode Projections

As the Internet becomes the dominant channel for business and daily life, IP addresses are increasingly masked behind web proxies for illegal purposes such as propagating malware, serving phishing pages to steal sensitive data, or redirecting victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, detecting proxy services has become increasingly challenging because they are updated dynamically and offer high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications observed in network traffic, and we build one-mode projections of these bipartite graphs to discover the social-behavior similarity of web proxy users. Based on the similarity matrices of end users derived from the one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent behavior clusters of web proxy users. The URL of a web proxy may change over time, but the underlying interests of its users do not. Based on this intuition, we use our private tools, implemented with WebDriver, to examine whether the top URLs visited by web proxy users are themselves web proxies. Experimental results on real datasets show that the behavior clusters not only reduce the number of URLs that must be analyzed but also provide an effective way to detect web proxies, especially previously unknown ones.
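The pipeline described above (bipartite host-destination graph, one-mode projection onto hosts, spectral clustering of the resulting similarity matrix) can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the host names and destinations in `EDGES` are hypothetical, similarity is taken to be the number of destinations two hosts share, and the clustering step is simplified to a two-way split on the sign of the second eigenvector of the normalized affinity matrix, found by power iteration. The paper's actual similarity measure and clustering variant may differ.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical (host, destination) observations; real input would come
# from flow records or proxy logs.
EDGES = [
    ("h1", "proxy-a.example"), ("h1", "news.example"),
    ("h2", "proxy-a.example"), ("h2", "proxy-b.example"),
    ("h3", "proxy-a.example"), ("h3", "proxy-b.example"),
    ("h4", "mail.example"), ("h4", "shop.example"), ("h4", "news.example"),
    ("h5", "mail.example"), ("h5", "shop.example"),
    ("h6", "mail.example"), ("h6", "shop.example"),
]


def one_mode_projection(edges):
    """Project the bipartite graph onto hosts.

    Returns (hosts, W), where W[i][j] is the number of destinations
    shared by hosts i and j (an assumed similarity measure).
    """
    hosts_by_dest = defaultdict(set)
    for host, dest in edges:
        hosts_by_dest[dest].add(host)
    hosts = sorted({host for host, _ in edges})
    index = {host: i for i, host in enumerate(hosts)}
    n = len(hosts)
    W = [[0.0] * n for _ in range(n)]
    for members in hosts_by_dest.values():
        for a, b in combinations(sorted(members), 2):
            W[index[a]][index[b]] += 1.0
            W[index[b]][index[a]] += 1.0
    return hosts, W


def spectral_bipartition(W, iters=500):
    """Split hosts into two clusters by the sign of the second eigenvector
    of D^{-1/2} W D^{-1/2}, computed by power iteration with deflation."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Shift by the identity so all eigenvalues are non-negative and power
    # iteration converges to the algebraically largest remaining one.
    S = [[W[i][j] * inv_sqrt[i] * inv_sqrt[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is proportional to sqrt(degree); remove it.
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    # Deterministic start vector; it must not be orthogonal to the target.
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]
        x = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    return [0 if a >= 0 else 1 for a in x]


hosts, W = one_mode_projection(EDGES)
labels = spectral_bipartition(W)
clusters = defaultdict(list)
for host, label in zip(hosts, labels):
    clusters[label].append(host)
for label in sorted(clusters):
    print(label, clusters[label])
```

In this toy input, hosts that share proxy destinations are only weakly linked to the remaining hosts through a single common destination, so the sign split separates the two groups. Which group receives label 0 depends on the sign of the computed eigenvector and carries no meaning. For more than two clusters, a full eigendecomposition followed by k-means on the leading eigenvectors would be the usual choice.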




