Presentation

Hybrid Tiled Vector Systolic Architecture to
Description
To accelerate convolution for CNNs and matrix multiplication for Transformers, we propose an FPGA-based Vector Systolic Array (VSA) accelerator. This custom IP uses an adaptable vector lane width to process data in parallel and raise throughput. We extend the architecture with a hybrid tiled vector systolic design that uses LUTs and DSPs in a complementary fashion through a unique data mapping strategy. Results show 7x and 1.26x increases in throughput for single-tile and multi-tile configurations, respectively. The hybrid tile approach achieves competitive throughputs of 1165 GOPs and 1072 GOPs for Vector-6 and Vector-8, outperforming related work by 3.8x. Additionally, we designed the architecture with a novel convolution method to reduce latency and packaged it as a customizable IP targeted at FPGA accelerators. The design reduces memory access latency while maintaining competitive throughput by reusing kernels and by partitioning image matrices to suit the different lane widths.
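As a rough software illustration of partitioning operands to suit a lane width, the sketch below splits the shared dimension of a matrix product into fixed-width chunks (zero-padding the last one) and accumulates partial dot products chunk by chunk. It is not the presented hardware design or its data mapping strategy; the function names `partition_columns` and `vector_matmul` and the `lane_width` parameter are hypothetical and chosen only for this example.

```python
def partition_columns(matrix, lane_width):
    """Split every row into chunks of lane_width, zero-padding the last chunk."""
    n_cols = len(matrix[0])
    chunks = []
    for start in range(0, n_cols, lane_width):
        block = []
        for row in matrix:
            piece = row[start:start + lane_width]
            # Pad so every chunk fills the (hypothetical) vector lane exactly.
            piece = piece + [0] * (lane_width - len(piece))
            block.append(piece)
        chunks.append(block)
    return chunks


def vector_matmul(a, b, lane_width):
    """Multiply a (m x k) by b (k x n), consuming k in lane_width-sized steps."""
    m, k, n = len(a), len(b), len(b[0])
    # Transpose b so both operands are chunked along the shared k dimension.
    b_t = [[b[r][c] for r in range(k)] for c in range(n)]
    a_chunks = partition_columns(a, lane_width)
    b_chunks = partition_columns(b_t, lane_width)
    out = [[0] * n for _ in range(m)]
    for a_blk, b_blk in zip(a_chunks, b_chunks):
        for i in range(m):
            for j in range(n):
                # One lane-wide multiply-accumulate per output element.
                out[i][j] += sum(x * y for x, y in zip(a_blk[i], b_blk[j]))
    return out


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = vector_matmul(a, b, lane_width=2)
```

The result is independent of `lane_width`; only the number of accumulation steps and the amount of zero padding change, which is the trade-off a lane-width choice such as Vector-6 or Vector-8 would expose in this simplified model.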
Event Type
IP
Time
Monday, June 24, 1:45pm - 2:00pm PDT
Location
2012, 2nd Floor
Topics
Engineering Tracks
IP