Abstract: Many-core GPUs provide high computing ability and
substantial bandwidth; however, optimizing irregular applications
like SpMV on GPUs becomes a difficult but meaningful task. In this
paper, we propose a novel method to improve the performance of
SpMV on GPUs. A new storage format called HYB-R is proposed to
exploit GPU architecture more efficiently. The COO portion of the
matrix is partitioned recursively into a ELL portion and a COO
portion in the process of creating HYB-R format to ensure that there
are as many non-zeros as possible in ELL format. The method of
partitioning the matrix is an important problem for HYB-R kernel, so
we also try to tune the parameters to partition the matrix for higher
performance. Experimental results show that our method can get
better performance than the fastest kernel (HYB) in NVIDIA-s
SpMV library with as high as 17% speedup.