Presentation

Heterogeneous Vector Accelerator for Matrix Multiplications on FPGA
Description
Large matrix multiplications are crucial in transformers, especially in self-attention. We propose a heterogeneous vector systolic accelerator in which the processing elements (PEs) have different vector lane widths, rather than one lane width shared by all PEs. We partition the input matrices into sub-matrices for efficient mapping onto the PEs, optimizing resource utilization and minimizing latency. We implement the design on an AMD-Xilinx ZCU104 FPGA. The heterogeneous architecture reports 1.68x better throughput and latency than a homogeneous architecture, with 23% better resource utilization. When using heterogeneous vector tiles, we prefer tiles with larger lane widths for optimal throughput.
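The partitioning idea in the description can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' design: the abstract does not say how sub-matrices are assigned, so the proportional column split below, the per-step cost model, and the names `partition_columns`, `pe_multiply`, and `heterogeneous_matmul` are all hypothetical.

```python
def partition_columns(n_cols, lane_widths):
    """Split n_cols output columns among PEs in proportion to lane width.

    Assumption: a proportional split; the poster does not specify the policy.
    """
    total = sum(lane_widths)
    bounds, start = [], 0
    for i, width in enumerate(lane_widths):
        if i == len(lane_widths) - 1:
            end = n_cols  # last PE takes the remainder so all columns are covered
        else:
            end = min(n_cols, start + round(n_cols * width / total))
        bounds.append((start, end))
        start = end
    return bounds


def pe_multiply(A, B, col_start, col_end, lane_width):
    """Model one PE computing C[:, col_start:col_end].

    Each "vector step" broadcasts one element of A across up to
    lane_width columns of B (hypothetical cost model).
    """
    rows, inner = len(A), len(B)
    out = [[0] * (col_end - col_start) for _ in range(rows)]
    steps = 0
    for i in range(rows):
        for c0 in range(col_start, col_end, lane_width):
            c1 = min(c0 + lane_width, col_end)
            for k in range(inner):
                a = A[i][k]
                for j in range(c0, c1):
                    out[i][j - col_start] += a * B[k][j]
                steps += 1
    return out, steps


def heterogeneous_matmul(A, B, lane_widths):
    """Compute C = A x B by giving each PE a column block sized to its lanes."""
    n_cols = len(B[0])
    C = [[0] * n_cols for _ in range(len(A))]
    steps_per_pe = []
    for (start, end), width in zip(partition_columns(n_cols, lane_widths), lane_widths):
        block, steps = pe_multiply(A, B, start, end, width)
        for i, row in enumerate(block):
            C[i][start:end] = row
        steps_per_pe.append(steps)
    return C, steps_per_pe


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[1, 0, 2, 1, 0, 1, 3, 2],
         [0, 1, 1, 2, 1, 0, 1, 1]]
    C, steps = heterogeneous_matmul(A, B, lane_widths=[4, 2, 2])
    print(C)
    print(steps)
```

In this toy model, a PE with four lanes receives twice as many columns as a PE with two lanes, so all three PEs finish in the same number of vector steps. Real hardware behavior depends on details the abstract does not give.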
Event Type
Work-in-Progress Poster
Time
Wednesday, June 26, 5:00pm - 6:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
IP
Security