ViT-slice: End-to-end Vision Transformer Accelerator with Bit-slice Algorithm
Description
Vision Transformers have demonstrated remarkable performance on a variety of vision tasks. However, general-purpose processors such as CPUs and GPUs struggle to handle Vision Transformer inference efficiently. To address this, prior work has focused on accelerating only attention, since attention dominates the computational cost in NLP Transformers. In Vision Transformers, by contrast, the linear modules, such as linear transformation, linear projection, and the feed-forward network (FFN), incur a higher computational cost than attention. In this paper, we present ViT-slice, an algorithm-architecture co-design that improves end-to-end performance and energy efficiency by optimizing not only attention but also the linear modules. At the algorithm level, we propose bit-slice compression, which avoids storing redundant most significant bits (MSBs). We also present a bit-slice dot product with early skip that efficiently computes dot products over bit-sliced data; early skip is enabled by a trainable threshold. At the hardware level, we introduce a specialized bit-slice dot product unit (BSDPU) that efficiently executes the bit-slice dot product with early skip, along with a bit-slice encoder and decoder for on-chip bit-slice compression. ViT-slice achieves end-to-end speedups of 244x, 35.3x, 16.8x, 10.4x, and 5.0x over a Xeon CPU, an EdgeGPU, a TITAN Xp GPU, the Sanger accelerator, and the ViTCoD accelerator, respectively.
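To make the two algorithm-level ideas concrete, the following is a minimal Python sketch of bit-slice compression and the early-skip bit-slice dot product. It assumes unsigned 8-bit operands split into two 4-bit slices and a scalar threshold; the function names, slice width, and skip rule (pruning a score to zero when the MSB-only partial product falls below the threshold) are illustrative assumptions, not the paper's actual design, which must also handle signed values and learn the threshold during training.

import numpy as np

SLICE_BITS = 4   # assumed slice width; the paper's choice may differ
NUM_SLICES = 2   # 8-bit values split into two 4-bit slices

def bit_slice(x):
    """Split unsigned 8-bit values into NUM_SLICES slices, MSB slice first.
    Returns an array of shape (NUM_SLICES, len(x))."""
    x = np.asarray(x, dtype=np.uint8)
    mask = (1 << SLICE_BITS) - 1
    return np.stack([(x >> (SLICE_BITS * s)) & mask
                     for s in reversed(range(NUM_SLICES))])

def compress(slices):
    """Drop an all-zero MSB slice (the 'redundant MSBs'), keeping a flag
    so the decoder can re-insert it. Returns (kept_slices, msb_present)."""
    msb_present = bool(slices[0].any())
    kept = slices if msb_present else slices[1:]
    return kept, msb_present

def decompress(kept, msb_present):
    """Inverse of compress: re-insert the dropped all-zero MSB slice."""
    if msb_present:
        return kept
    zeros = np.zeros_like(kept[0])
    return np.vstack([zeros[None, :], kept])

def dot_with_early_skip(q_slices, k_slices, threshold):
    """Bit-slice dot product, MSB slices first. If the MSB-only partial
    result is already below the (trainable) threshold, the low-order
    slice work is skipped and the score is pruned to zero. This is an
    approximation; the trained threshold is meant to absorb the error."""
    msb_shift = SLICE_BITS * (NUM_SLICES - 1)
    # Partial product from the MSB slices only.
    partial = int(np.dot(q_slices[0].astype(np.int64),
                         k_slices[0].astype(np.int64))) << (2 * msb_shift)
    if partial < threshold:
        return 0, True            # early skip: LSB computation avoided
    # Otherwise finish the exact product over all slice pairs.
    acc = 0
    for i in range(NUM_SLICES):
        for j in range(NUM_SLICES):
            shift = SLICE_BITS * ((NUM_SLICES - 1 - i) +
                                  (NUM_SLICES - 1 - j))
            acc += int(np.dot(q_slices[i].astype(np.int64),
                              k_slices[j].astype(np.int64))) << shift
    return acc, False

# Example: slice two vectors, then compute a score with early skip.
q = np.array([200, 3, 130], dtype=np.uint8)
k = np.array([180, 7, 90], dtype=np.uint8)
score, skipped = dot_with_early_skip(bit_slice(q), bit_slice(k),
                                     threshold=50_000)

In hardware, the skip decision maps naturally onto a BSDPU-style pipeline: the MSB slice pairs are processed first, and a below-threshold partial sum gates off the remaining slice multiplications, which is where the energy savings come from.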
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:15pm - 2:30pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML Architecture Design