Presentation

Enabling Multi-Tensor Fused Dataflow for Transformer Models on Spatial Accelerators
Description
In transformer models, data reuse within a single operator is insufficient, which motivates more aggressive fusion of multiple tensor-wise operators (multi-tensor fusion). Due to the complexity of tensor-wise operator dataflow, conventional fusion techniques often fall short, offering limited dataflow options and short fusion lengths. In this study, we first identify three challenges in multi-tensor fusion that lead to inferior fusions. We then propose dataflow adaptive tiling (DAT), a novel inter-operator dataflow that enables efficient fusion of multiple operators connected in any form and chained to any length. We further broaden dataflow exploration from intra-operator to inter-operator and develop an exploration framework that quickly finds the best dataflow on spatial accelerators for a given on-chip buffer size. Experimental results show that DAT delivers speedups of 2.24X and 1.74X and energy savings of 35.5% and 15.5% on average for edge and cloud accelerators, respectively, compared to the state-of-the-art dataflow explorer FLAT. In addition, the DAT exploration framework will be open-sourced.
Event Type
Research Manuscript
Time
Wednesday, June 26, 11:15am - 11:30am PDT
Location
3008, 3rd Floor
Topics
AI
Design
Keywords
AI/ML System and Platform Design