BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180033Z
LOCATION:3003\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240625T141500
DTEND;TZID=America/Los_Angeles:20240625T143000
UID:dac_DAC 2024_sess158_RESEARCH1297@linklings.com
SUMMARY:ViT-slice: End-to-end Vision Transformer Accelerator with Bit-slic
 e Algorithm
DESCRIPTION:Research Manuscript\n\nDongjin Shin, Insu Choi, and Joon-Sung 
 Yang (Yonsei University)\n\nVision Transformers have demonstrated remarkab
 le performance in various vision tasks. However, general-purpose processor
 s, such as CPUs and GPUs, face challenges in efficiently handling the infe
 rence of Vision Transformers. To address the issue, prior works have focus
 ed on accelerating only attention due to its high computational cost in NL
 P Transformers. In Vision Transformers, by contrast, linear modules such
  as linear transformation, linear projection, and the Feed-Forward
  Network (FFN) incur a higher computational cost than attention. In this
  paper, we present ViT-slice, an algorithm-architecture co-design that enh
 ances end-to-end performance and energy efficiency by optimizing not only 
 attention but also linear modules. At the algorithm level, we propose bit-
 slice compression that avoids storing the redundant most significant bits 
 (MSBs). Additionally, we present bit-slice dot product with early skip to 
 efficiently compute the dot product using bit-sliced data. To enable early
  skip during the dot product computation, we leverage a trainable threshol
 d. At the hardware level, we introduce a specialized bit-slice dot product
  unit (BSDPU) to efficiently process the bit-slice dot product with early 
 skip algorithm. Additionally, we present a bit-slice encoder and decoder f
 or on-chip bit-slice compression. ViT-slice achieves 244×, 35.3×, 16.8×, 1
 0.4×, and 5.0× end-to-end speedup over Xeon CPU, EdgeGPU, TITAN Xp GPU,
  Sanger accelerator and ViTCoD accelerator, respectively.\n\nTopic: AI,
  Design\n\
 nKeyword: AI/ML Architecture Design\n\nSession Chair: Hyoukjun Kwon (Unive
 rsity of California, Irvine)
END:VEVENT
END:VCALENDAR
