BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180033Z
LOCATION:Level 2 Lobby
DTSTART;TZID=America/Los_Angeles:20240626T180000
DTEND;TZID=America/Los_Angeles:20240626T190000
UID:dac_DAC 2024_sess237_RESEARCH2084@linklings.com
SUMMARY:Heterogeneous Vector Accelerator for Matrix Multiplications on FPG
 A
DESCRIPTION:Work-in-Progress Poster\n\nJay Shah and Nanditha Rao (Internat
 ional Institute of Information Technology, Bangalore)\n\nLarge matrix mult
 iplications are crucial in transformers, especially in self-attention. We 
 propose a heterogeneous vector systolic accelerator where each processing 
 element (PE) has varying vector lane widths, diverging from homogeneous la
 ne widths across all PEs. We partition input matrices into sub-matrices fo
 r efficient mapping onto PEs, optimizing resource utilization and minimizi
 ng latency. We implement the design on an AMD-Xilinx ZCU104 FPGA. The hete
 rogeneous architecture achieves 1.68x better throughput and latency compar
 ed to a homogeneous architecture, with 23% better resource utilization. Wh
 en using heterogeneous vector tiles, we prefer tiles with larger lane widt
 hs for optimal throughput.\n\nTopic: AI, Autonomous Systems, Cloud, De
 sign, EDA, Embedded Systems, IP, Security
END:VEVENT
END:VCALENDAR
