Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Many-core GPUs provide high compute throughput and substantial memory bandwidth; however, optimizing irregular applications such as SpMV on GPUs remains a difficult but worthwhile task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. We introduce a new storage format, HYB-R, which exploits the GPU architecture more efficiently. While building the HYB-R format, the COO portion of the matrix is partitioned recursively into an ELL portion and a COO portion, so that as many non-zeros as possible are stored in ELL format. How to partition the matrix is an important problem for the HYB-R kernel, so we also tune the partitioning parameters for higher performance. Experimental results show that our method outperforms the fastest kernel (HYB) in NVIDIA's SpMV library, with speedups of up to 17%.
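To make the recursive partitioning concrete, the following is a minimal sketch of the HYB-R idea described above. It is not the paper's implementation: the fixed per-level ELL width `k` and the `min_nnz` stopping threshold are hypothetical parameters introduced here for illustration (the paper tunes the partitioning parameters), and the kernel is written as plain Python rather than CUDA.

```python
def build_hyb_r(rows, k, min_nnz=8):
    """Recursively partition a sparse matrix into ELL slabs plus a final COO part.

    rows    : list of rows, each a list of (col, val) pairs.
    k       : ELL width per level (hypothetical fixed width; tuned in practice).
    min_nnz : stop recursing when the overflow has fewer non-zeros than this.
    Returns (slabs, coo): each slab is a k-wide, zero-padded ELL table
    (column -1 marks padding); coo is a flat list of (row, col, val) triples.
    """
    slabs = []
    while True:
        overflow = [r[k:] for r in rows]          # entries that do not fit in ELL
        ell = [r[:k] + [(-1, 0.0)] * (k - len(r[:k])) for r in rows]
        slabs.append(ell)
        if sum(len(r) for r in overflow) < min_nnz:
            coo = [(i, c, v) for i, r in enumerate(overflow) for (c, v) in r]
            return slabs, coo
        rows = overflow                           # recurse on the COO portion


def spmv_hyb_r(slabs, coo, x, n_rows):
    """Compute y = A*x from the HYB-R layout produced above."""
    y = [0.0] * n_rows
    for ell in slabs:                             # regular, ELL-style traversal
        for i, row in enumerate(ell):
            for c, v in row:
                if c >= 0:                        # skip padding entries
                    y[i] += v * x[c]
    for i, c, v in coo:                           # irregular COO remainder
        y[i] += v * x[c]
    return y
```

For example, a 3-row matrix whose longest row has 4 non-zeros, split with `k=2`, yields two ELL slabs and an empty COO remainder, so the entire product is computed by the regular ELL-style loops.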



