
SpARC: Token Similarity-Aware Sparse Attention Transformer Accelerator via Row-wise Clustering
Description
In this paper, we propose SpARC, a sparse attention transformer accelerator that enhances throughput and energy efficiency by reducing the computational complexity of the self-attention mechanism. Our approach exploits inherent row-level redundancies in transformer attention maps to reduce the overall self-attention computation. By employing row-wise clustering, attention scores are calculated only once per cluster to achieve approximate attention without seriously compromising accuracy. To leverage the high parallelism of the proposed clustering approximate attention, we develop a fully pipelined accelerator with a dedicated memory hierarchy.
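To make the row-wise clustering idea concrete, the following NumPy sketch groups similar query rows and computes attention scores once per cluster, reusing each cluster's output for all of its member tokens. The clustering method (k-means over query rows), the function name clustered_attention, and the cluster count are illustrative assumptions; they are not the paper's actual grouping metric or hardware implementation.

import numpy as np

def clustered_attention(Q, K, V, num_clusters, iters=10):
    # Approximate self-attention: compute scores once per cluster of
    # similar query rows instead of once per token.
    n, d = Q.shape
    rng = np.random.default_rng(0)
    # Simple k-means over the rows of Q (assumed clustering method).
    centroids = Q[rng.choice(n, num_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign each query row to its nearest centroid.
        dists = np.linalg.norm(Q[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = Q[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # One attention computation per cluster centroid.
    scores = centroids @ K.T / np.sqrt(d)               # (num_clusters, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    cluster_out = weights @ V                            # (num_clusters, d_v)
    # Every token reuses its cluster's attention output.
    return cluster_out[labels]

# Example: 128 tokens, 64-dim heads, 16 clusters -> 8x fewer score rows.
Q = np.random.randn(128, 64); K = np.random.randn(128, 64); V = np.random.randn(128, 64)
out = clustered_attention(Q, K, V, num_clusters=16)
print(out.shape)  # (128, 64)

Because each cluster's score row is shared by all of its member tokens, the number of score computations scales with the number of clusters rather than the sequence length, which is the source of the throughput and energy savings described above.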
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:30pm - 2:45pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML Architecture Design