Graph Attention Network-based Sparse Format Selection for Accelerating SpMM on GPUs
Description
Sparse Matrix-Matrix Multiplication (SpMM) is widely used in many scientific and engineering applications, such as numerical simulations and graph neural networks. Previous researchers have proposed numerous sparse formats and corresponding algorithms to enhance performance on GPUs. However, no single SpMM solution consistently outperforms the others due to the complexity of sparsity patterns. In this paper, we propose using a Graph Attention Network (GAT) to learn these patterns and select an optimal sparse format for SpMM acceleration on GPUs. First, a sparse matrix can inherently be treated as the adjacency matrix of a graph, which naturally transforms the task of format selection into a graph classification problem. Second, we employ GAT to learn the intricate relationships between the characteristics of sparse matrices and the performance of GPU kernels. Our approach preserves most of each matrix's structural information and incorporates performance-related statistics as node embeddings, enabling the attention mechanism and message passing of GAT to focus effectively on potential latency bottlenecks. Extensive experiments show that our method outperforms state-of-the-art SpMM GPU kernels, delivering an average 1.3x to 1.6x GFLOPs speedup across a diverse set of over 1700 sparse matrices derived from real applications.
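The pipeline sketched in the abstract (sparsity pattern as adjacency matrix, performance-related node statistics as embeddings, a GAT layer, then graph-level classification into a format label) can be illustrated with a minimal numpy sketch. Everything here is an illustrative assumption, not the authors' trained model: the random weights stand in for learned parameters, the two node statistics (row nnz and its deviation from the mean, a load-imbalance hint) are example features, and the format list is a placeholder.

```python
import numpy as np

def gat_layer(adj, feats, W, a, leaky=0.2):
    """One single-head graph attention layer over a dense 0/1 adjacency
    matrix (dense for clarity; a real pipeline would stay sparse)."""
    h = feats @ W                       # (n, d') projected node features
    n = adj.shape[0]
    d = h.shape[1]
    # e_ij = LeakyReLU(a^T [h_i || h_j]), computed for all pairs at once
    src = h @ a[:d]                     # contribution of h_i
    dst = h @ a[d:]                     # contribution of h_j
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, leaky * e)   # LeakyReLU
    # mask non-edges; add self-loops so every row has a neighbor
    mask = (adj + np.eye(n)) > 0
    e = np.where(mask, e, -1e9)
    # row-wise softmax -> attention coefficients alpha_ij
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e) * mask
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, np.tanh(alpha @ h)    # attention weights, updated features

def select_format(A, formats=("CSR", "ELL", "COO")):
    """Toy end-to-end selection: pattern -> node stats -> GAT layer ->
    mean pooling -> format class. Random weights are stand-ins for
    trained parameters, so the chosen label is arbitrary."""
    rng = np.random.default_rng(0)
    adj = (A != 0).astype(float)
    nnz = adj.sum(axis=1)               # per-row nonzero count
    feats = np.stack([nnz, nnz - nnz.mean()], axis=1)
    W = rng.normal(size=(2, 4))
    a = rng.normal(size=8)
    alpha, h = gat_layer(adj, feats, W, a)
    logits = h.mean(axis=0) @ rng.normal(size=(4, len(formats)))
    return formats[int(np.argmax(logits))], alpha
```

Because pooling reduces the node features to a single graph-level vector, matrices of any size map to the same fixed-length classifier input, which is what makes format selection a graph classification problem.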
Event Type
Work-in-Progress Poster
Time
Wednesday, June 26, 5:00pm - 6:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
IP
Security