BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180033Z
LOCATION:2012\, 2nd Floor
DTSTART;TZID=America/Los_Angeles:20240624T134500
DTEND;TZID=America/Los_Angeles:20240624T140000
UID:dac_DAC 2024_sess203_IP010@linklings.com
SUMMARY:Hybrid Tiled Vector Systolic Architecture to
DESCRIPTION:IP\n\nJay Shah and Nanditha Rao (International Institute of In
 formation Technology, Bangalore)\n\nTo accelerate convolutions in CNNs
  and matrix multiplications in Transformers, we propose an FPGA-based
  Vector Systolic Array (VSA) Accelerator. This custom IP employs 
 adaptable vector lane-width to enable parallel data processing for enhance
 d throughput. We enhance this architecture by introducing a hybrid tiled v
 ector systolic design which utilizes LUTs and DSPs in a complementary fash
 ion by using a unique data mapping strategy. Results show a 7x and 1.26x i
 ncrease in throughput for single-tile and multi-tile configurations, respe
 ctively. The hybrid tile approach achieves competitive throughputs of 1165
  GOPs and 1072 GOPs for Vector-6 and Vector-8, outperforming related work
  by 3.8x
 . Additionally, we designed this architecture with a novel convolution met
 hod to reduce latency and packaged it as a customizable IP targeted for an
  FPGA accelerator. This design reduces memory access latency while maintai
 ning competitive throughput by reusing kernels and by partitioning image m
 atrices to suit the different lane widths.\n\nTopic: Engineering Tracks, I
 P\n\nSession Chair: Barun Bikash Paul (Broadcom)
END:VEVENT
END:VCALENDAR
