BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180002Z
LOCATION:3003\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240625T133000
DTEND;TZID=America/Los_Angeles:20240625T150000
UID:dac_DAC 2024_sess158@linklings.com
SUMMARY:Efficient Acceleration Strategies for Transformers: From Token Sim
 ilarity to Weight Sparsity
DESCRIPTION:Research Manuscript\n\nRecent advances in transformer model
 s have improved performance in language modeling and vision tasks. T
 ransformers are equipped with an attention mechanism that extracts u
 seful dependency information between input tokens. Due to the sequen
 tial nature of processing, running a transformer is bounded by off-c
 hip memory bandwidth. For vision transformers, the feedforward netwo
 rk that follows the attention module adds significant runtime overhe
 ad. This session discusses several distinct approaches and their ass
 ociated hardware architectures, including proactively skipping compu
 tations for tokens with low probability, leveraging token similariti
 es, bit-slice compression, and exploiting sparsity in transformers.
 \n\nSpARC: Token Similarity-
 Aware Sparse Attention Transformer Accelerator via Row-wise Clustering\n\n
 In this paper, we propose SpARC, a sparse attention transformer accelerato
 r that enhances throughput and energy efficiency by reducing the computati
 onal complexity of the self-attention mechanism. Our approach exploits inh
 erent row-level redundancies in transformer attention maps to reduce the o
 vera...\n\n\nHan Cho, Dongjun Kim, Seungeon Hwang, and Jongsun Park (Korea
  University)\n---------------------\nToken-Picker: Accelerating Attention 
 in Text Generation with Minimized Memory Transfer via Probability Estimati
 on*\n\nThe attention mechanism in text generation is memory-bounded due to
  its sequential characteristics. Therefore, off-chip memory accesses shoul
 d be minimized for faster execution. Although previous methods addressed t
 his by pruning unimportant tokens, they fall short in selectively removing
  tokens wit...\n\n\nJunyoung Park, Myeonggu Kang, Yunki Han, Yang-Gon Kim,
  Jaekang Shin, and Lee-Sup Kim (Korea Advanced Institute of Science and Te
 chnology (KAIST))\n---------------------\nCSTrans-OPU: An FPGA-based Overl
 ay Processor with Full Compilation for Transformer Networks via Sparsity E
 xploration\n\nIn this work, we propose CSTrans-OPU, an FPGA-based overlay 
 processor with full compilation for transformer networks via sparsity expl
 oration. Specifically, we customize a multi-precision processing element (
 PE) array with DSP-packing for unified computation format with full resour
 ce utilization. Ad...\n\n\nYueyin Bai, Keqing Zhao, Yang Liu, Hongji Wang,
  Hao Zhou, Xiaoxing Wu, Jun Yu, and Kun Wang (Fudan University)\n---------
 ------------\nFLAME: Fully Leveraging MoE Sparsity for Transformer on FPGA
 \n\nMoE (Mixture-of-Experts) mechanism has been widely adopted in transfor
 mer-based models to facilitate further expansion of model parameter size a
 nd enhance generalization capabilities. However, the practical deployment 
 of MoE mechanism for transformer on resource-constrained platforms, such a
 s FPGA, ...\n\n\nXuanda Lin, Huinan Tian, Wenxiao Xue, Lanqi Ma, Jialin Ca
 o, Manting Zhang, Jun Yu, and Kun Wang (Fudan University)\n---------------
 ------\nFNM-Trans: Efficient FPGA-based Transformer Architecture with Full
  N:M Sparsity\n\nTransformer models have become popular in various AI appl
 ications due to their exceptional performance. However, their impressive p
 erformance comes with significant computing and memory costs, hindering ef
 ficient deployment of Transformer-based applications. Many solutions focus
  on leveraging sparsi...\n\n\nManting Zhang, Jialin Cao, Kejia Shi, Keqing
  Zhao, Genhao Zhang, Jun Yu, and Kun Wang (Fudan University)\n------------
 ---------\nViT-slice: End-to-end Vision Transformer Accelerator with Bit-s
 lice Algorithm\n\nVision Transformers have demonstrated remarkable perform
 ance in various vision tasks. However, general-purpose processors, such as
  CPUs and GPUs, face challenges in efficiently handling the inference of V
 ision Transformers. To address the issue, prior works have focused on acce
 lerating only attentio...\n\n\nDongjin Shin, Insu Choi, and Joon-Sung Yang
  (Yonsei University)\n\nTopic: AI, Design\n\nKeyword: AI/ML Architecture D
 esign\n\nSession Chair: Hyoukjun Kwon (University of California, Irvine)
END:VEVENT
END:VCALENDAR
