
SpARC: Token Similarity-Aware Sparse Attention Transformer Accelerator via Row-wise Clustering
Description
In this paper, we propose SpARC, a sparse attention transformer accelerator that enhances throughput and energy efficiency by reducing the computational complexity of the self-attention mechanism. Our approach exploits inherent row-level redundancies in transformer attention maps to reduce the overall self-attention computation. By employing row-wise clustering, attention scores are calculated only once per cluster to achieve approximate attention without seriously compromising accuracy. To leverage the high parallelism of the proposed clustering approximate attention, we develop a fully pipelined accelerator with a dedicated memory hierarchy.
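To make the row-wise clustering idea concrete, the following NumPy sketch groups similar query rows and computes attention scores once per cluster, reusing each cluster's output for all of its member tokens. The clustering method (k-means over query rows), the function name clustered_attention, and the cluster count are illustrative assumptions; they are not the paper's actual grouping metric or hardware implementation.

import numpy as np

def clustered_attention(Q, K, V, num_clusters, iters=10):
    # Approximate self-attention: compute scores once per cluster of
    # similar query rows instead of once per token.
    n, d = Q.shape
    rng = np.random.default_rng(0)
    # Simple k-means over the rows of Q (assumed clustering method).
    centroids = Q[rng.choice(n, num_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign each query row to its nearest centroid.
        dists = np.linalg.norm(Q[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = Q[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # One attention computation per cluster centroid.
    scores = centroids @ K.T / np.sqrt(d)               # (num_clusters, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    cluster_out = weights @ V                            # (num_clusters, d_v)
    # Every token reuses its cluster's attention output.
    return cluster_out[labels]

# Example: 128 tokens, 64-dim heads, 16 clusters -> 8x fewer score rows.
Q = np.random.randn(128, 64); K = np.random.randn(128, 64); V = np.random.randn(128, 64)
out = clustered_attention(Q, K, V, num_clusters=16)
print(out.shape)  # (128, 64)

Because each cluster's score row is shared by all of its member tokens, the number of score computations scales with the number of clusters rather than the sequence length, which is the source of the throughput and energy savings described above.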
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:30pm - 2:45pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML Architecture Design