Speedup Breadth-First Search by Graph Ordering

Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes’ overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads.We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.


Authors:



References:
[1] LAW: The laboratory for web algorithmics. http://http://law.di.unimi.it.
[2] V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. Scalable graph
exploration on multicore processors. In SC’10: Proceedings of the 2010
ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis, pages 1–11. IEEE, 2010.
[3] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. Dbmss on
a modern processor: Where does time go? In VLDB’99, Proceedings
of 25th International Conference on Very Large Data Bases, September
7-10, 1999, Edinburgh, Scotland, UK, number CONF, pages 266–277,
1999.
[4] J. Arai, H. Shiokawa, T. Yamamuro, M. Onizuka, and S. Iwamura.
Rabbit order: Just-in-time parallel reordering for fast graph analysis.
In 2016 IEEE International Parallel and Distributed Processing
Symposium (IPDPS), pages 22–31. IEEE, 2016.
[5] D. A. Bader and K. Madduri. Designing multithreaded algorithms
for breadth-first search and st-connectivity on the cray mta-2. In
2006 International Conference on Parallel Processing (ICPP’06), pages
523–530. IEEE, 2006.
[6] V. Balaji and B. Lucia. When is graph reordering an optimization?
studying the effect of lightweight graph reordering across applications
and input graphs. In 2018 IEEE International Symposium on Workload
Characterization (IISWC), 2018.
[7] P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A
scalable fully distributed web crawler. Software: Practice & Experience,
34(8):711–726, 2004.
[8] P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation:
A multiresolution coordinate-free ordering for compressing social
networks. In Proceedings of the 20th international conference on World
wide web, pages 587–596, 2011.
[9] J. Cheng, Z. Shang, H. Cheng, H. Wang, and J. X. Yu. K-reach: who
is in your small world. arXiv preprint arXiv:1208.0090, 2012.
[10] B. V. Cherkassky, A. V. Goldberg, and T. Radzik. Shortest
paths algorithms: Theory and experimental evaluation. Mathematical
programming, 73(2):129–174, 1996.
[11] J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey. Fast and
efficient graph traversal algorithm for cpus: Maximizing single-node
efficiency. In 2012 IEEE 26th International Parallel and Distributed
Processing Symposium, pages 378–389. IEEE, 2012.
[12] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi,
and P. Raghavan. On compressing social networks. In Proceedings of the
15th ACM SIGKDD international conference on Knowledge discovery
and data mining, pages 219–228, 2009.
[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction
to algorithms. MIT press, 2009.
[14] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric
matrices. In Proceedings of the 1969 24th national conference, pages
157–172, 1969.
[15] S. Dolev, Y. Elovici, and R. Puzis. Routing betweenness centrality.
Journal of the ACM (JACM), 57(4):1–27, 2010.
[16] M. Frasca, K. Madduri, and P. Raghavan. Numa-aware graph
mining techniques for performance and energy efficiency. In SC’12:
Proceedings of the International Conference on High Performance
Computing, Networking, Storage and Analysis, pages 1–11. IEEE, 2012.
[17] A. George. Nested dissection of a regular finite element mesh. SIAM
Journal on Numerical Analysis, 10(2):345–363, 1973.
[18] J. A. George. Computer implementation of the finite element method.
Technical report, STANFORD UNIV CA DEPT OF COMPUTER
SCIENCE, 1971.
[19] S. Hong, T. Oguntebi, and K. Olukotun. Efficient parallel graph
exploration on multi-core cpu and gpu. In 2011 International Conference
on Parallel Architectures and Compilation Techniques, pages 78–88.
IEEE, 2011.
[20] K. I. Karantasis, A. Lenharth, D. Nguyen, M. J. Garzaran, and K. Pingali.
Parallelization of reordering algorithms for bandwidth and wavefront
reduction. In SC’14: Proceedings of the International Conference for
High Performance Computing, Networking, Storage and Analysis, pages
921–932. IEEE, 2014.
[21] J. Kunegis. The koblenz network collection. http://konect.uni-koblenz.
de, 2017.
[22] K. Lakhotia, S. Singapura, R. Kannan, and V. Prasanna. Recall:
Reordered cache aware locality based graph processing. In 2017 IEEE
24th International Conference on High Performance Computing (HiPC),
pages 273–282. IEEE, 2017.
[23] D. LaSalle and G. Karypis. Multi-threaded graph partitioning. In
2013 IEEE 27th International Symposium on Parallel and Distributed
Processing, pages 225–236. IEEE, 2013.
[24] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network
dataset collection. http://snap.stanford.edu/data, June 2014.
[25] Y. Lim, U. Kang, and C. Faloutsos. Slashburn: Graph compression and
mining beyond caveman communities. IEEE Transactions on Knowledge
and Data Engineering, 26(12):3077–3089, 2014.
[26] K. Madduri, D. Ediger, K. Jiang, D. A. Bader, and D. Chavarria-Miranda.
A faster parallel algorithm and efficient multithreaded implementations
for evaluating betweenness centrality on massive datasets. In 2009 IEEE
International Symposium on Parallel & Distributed Processing, pages
1–8. IEEE, 2009.
[27] J. Malicevic, B. Lepers, and W. Zwaenepoel. Everything you always
wanted to know about multicore graph processing but were afraid to ask.
In 2017 {USENIX} Annual Technical Conference ({USENIX}{ATC}
17), pages 631–643, 2017.
[28] A. McLaughlin and D. A. Bader. Scalable and high performance
betweenness centrality on the gpu. In SC’14: Proceedings of the
International Conference for High Performance Computing, Networking,
Storage and Analysis, pages 572–583. IEEE, 2014.
[29] V. Ufimtsev and S. Bhowmick. Application of group testing in
identifying high betweenness centrality vertices in complex networks.
In Eleventh Workshop on Mining and Learning with Graphs, pages 1–8,
2013.
[30] H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by
graph ordering. In Proceedings of the 2016 International Conference
on Management of Data, pages 1813–1828, 2016.
[31] Y. Zhang, V. Kiriansky, C. Mendis, S. Amarasinghe, and M. Zaharia.
Making caches work for graph analytics. In 2017 IEEE International
Conference on Big Data (Big Data), pages 293–302. IEEE, 2017.