Additionally, performance evaluations are available for the implementations based on the
different matrix formats, compared against the BRC format on different GPU devices (SP/DP =
single/double precision).
Conclusion:
Sparse matrix-vector multiplication (SpMV) is best optimised by evenly balancing the load
of all non-zero computations across all processing elements. This would ideally require a
hybrid blocked format derived from CSR, parallelised using the merge-path algorithm
described. Such an implementation would give optimal speedup, with the tradeoff being
consistent throughput. Power requirements across different platforms need further
exploration.
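
To make the load-balancing idea concrete, here is a minimal sequential sketch of a merge-path
decomposition over plain CSR, in the spirit of [1]. It is an illustration under stated
assumptions, not the authors' GPU implementation and not the hybrid blocked format proposed
above: the function names, the `num_workers` parameter and the Python loop standing in for
parallel threads are all hypothetical choices made for this example.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Return the (row, nonzero) coordinate where the merge path crosses `diagonal`.

    The two merged lists are the CSR row-end offsets and the nonzero indices 0..nnz-1.
    """
    lo = max(diagonal - nnz, 0)
    hi = min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        # The second list holds the natural numbers, so its element is just the index.
        if row_end[mid] <= diagonal - 1 - mid:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def spmv_merge_path(row_ptr, col_idx, values, x, num_workers=4):
    """Compute y = A @ x for a CSR matrix, splitting rows + nonzeros evenly.

    Assumes num_workers >= 1. Workers are simulated one after another here.
    """
    num_rows = len(row_ptr) - 1
    nnz = len(values)
    row_end = row_ptr[1:]
    y = [0.0] * num_rows
    total = num_rows + nnz                    # length of the merge path
    per_worker = -(-total // num_workers)     # ceiling division
    carries = []                              # (row, partial sum) per worker

    for w in range(num_workers):
        d0 = min(w * per_worker, total)
        d1 = min(d0 + per_worker, total)
        i, j = merge_path_search(d0, row_end, nnz)
        i_end, j_end = merge_path_search(d1, row_end, nnz)
        acc = 0.0
        # Rows that end inside this worker's share of the path.
        while i < i_end:
            while j < row_end[i]:
                acc += values[j] * x[col_idx[j]]
                j += 1
            y[i] = acc
            acc = 0.0
            i += 1
        # Leading part of a row that is finished by a later worker.
        while j < j_end:
            acc += values[j] * x[col_idx[j]]
            j += 1
        carries.append((i, acc))

    # Fix-up: add each worker's partial sum to the row it left unfinished.
    for i, acc in carries:
        if i < num_rows:
            y[i] += acc
    return y
```

Each simulated worker receives at most ceil((rows + non-zeros) / num_workers) units of work
regardless of how the non-zeros are distributed across rows, which is the property the
conclusion relies on. For example, for the 4x3 matrix [[1,0,2],[0,0,0],[0,3,0],[4,5,6]] and
x = [1,2,3], calling `spmv_merge_path([0,2,2,3,6], [0,2,1,0,1,2], [1,2,3,4,5,6], [1,2,3], 3)`
returns `[7.0, 0.0, 6.0, 32.0]`.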
References:
[1] Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication
(SpMV) using the CSR storage format." Proceedings of the 21st ACM SIGPLAN
Symposium on Principles and Practice of Parallel Programming. ACM, 2016.
[2] Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication
(SpMV) using the CSR storage format." Proceedings of the 21st ACM SIGPLAN
Symposium on Principles and Practice of Parallel Programming. ACM, 2016.
[3] Bell, Nathan, and Michael Garland. "Implementing sparse matrix-vector multiplication on
throughput-oriented processors." Proceedings of the Conference on High Performance
Computing Networking, Storage and Analysis. ACM, 2009.
[4] Davis, Timothy A., and Yifan Hu. "The University of Florida sparse matrix collection."
ACM Transactions on Mathematical Software (TOMS) 38.1 (2011): 1.