Graph Attention Network-based Sparse Format Selection for Accelerating SpMM on GPUs
Description
Sparse Matrix-Matrix Multiplication (SpMM) is widely used in many scientific and engineering applications, such as numerical simulations and graph neural networks. Previous researchers have proposed numerous sparse formats and corresponding algorithms to enhance performance on GPUs. However, no single SpMM solution consistently outperforms the others due to the complexity of sparsity patterns. In this paper, we propose using a Graph Attention Network (GAT) to learn these patterns and select an optimal sparse format for SpMM acceleration on GPUs. First, a sparse matrix can inherently be treated as the adjacency matrix of a graph, which naturally transforms the task of format selection into a graph classification problem. Second, we employ GAT to learn the intricate relationships between the characteristics of sparse matrices and the performance of GPU kernels. Our approach preserves most of each matrix's structural information and incorporates performance-related statistics as node embeddings, enabling the attention mechanism and message passing of GAT to focus effectively on potential latency bottlenecks. Extensive experiments show that our method outperforms state-of-the-art SpMM GPU kernels, delivering an average 1.3x to 1.6x GFLOPs speedup across a diverse set of over 1700 sparse matrices derived from real applications.
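The pipeline sketched in the abstract (sparsity pattern as adjacency matrix, performance-related node statistics as embeddings, a GAT layer, then graph-level classification into a format label) can be illustrated with a minimal numpy sketch. Everything here is an illustrative assumption, not the authors' trained model: the random weights stand in for learned parameters, the two node statistics (row nnz and its deviation from the mean, a load-imbalance hint) are example features, and the format list is a placeholder.

```python
import numpy as np

def gat_layer(adj, feats, W, a, leaky=0.2):
    """One single-head graph attention layer over a dense 0/1 adjacency
    matrix (dense for clarity; a real pipeline would stay sparse)."""
    h = feats @ W                       # (n, d') projected node features
    n = adj.shape[0]
    d = h.shape[1]
    # e_ij = LeakyReLU(a^T [h_i || h_j]), computed for all pairs at once
    src = h @ a[:d]                     # contribution of h_i
    dst = h @ a[d:]                     # contribution of h_j
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, leaky * e)   # LeakyReLU
    # mask non-edges; add self-loops so every row has a neighbor
    mask = (adj + np.eye(n)) > 0
    e = np.where(mask, e, -1e9)
    # row-wise softmax -> attention coefficients alpha_ij
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e) * mask
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, np.tanh(alpha @ h)    # attention weights, updated features

def select_format(A, formats=("CSR", "ELL", "COO")):
    """Toy end-to-end selection: pattern -> node stats -> GAT layer ->
    mean pooling -> format class. Random weights are stand-ins for
    trained parameters, so the chosen label is arbitrary."""
    rng = np.random.default_rng(0)
    adj = (A != 0).astype(float)
    nnz = adj.sum(axis=1)               # per-row nonzero count
    feats = np.stack([nnz, nnz - nnz.mean()], axis=1)
    W = rng.normal(size=(2, 4))
    a = rng.normal(size=8)
    alpha, h = gat_layer(adj, feats, W, a)
    logits = h.mean(axis=0) @ rng.normal(size=(4, len(formats)))
    return formats[int(np.argmax(logits))], alpha
```

Because pooling reduces the node features to a single graph-level vector, matrices of any size map to the same fixed-length classifier input, which is what makes format selection a graph classification problem.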
Event Type
Work-in-Progress Poster
Time
Wednesday, June 26, 5:00pm - 6:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
IP
Security