Presentation

Enabling Multi-Tensor Fused Dataflow for Transformer Models on Spatial Accelerators
Description
In transformer models, data reuse within a single operator is insufficient, which motivates more aggressive fusion of multiple tensor-wise operators (multi-tensor fusion). Due to the complexity of tensor-wise operator dataflow, conventional fusion techniques often fall short, offering limited dataflow options and short fusion lengths. In this study, we first identify three challenges in multi-tensor fusion that lead to inferior fusions. We then propose dataflow adaptive tiling (DAT), a novel inter-operator dataflow that enables efficient fusion of multiple operators connected in any form and chained to any length. We further broaden dataflow exploration from intra-operator to inter-operator and develop an exploration framework that quickly finds the best dataflow on spatial accelerators for a given on-chip buffer size. Experimental results show that DAT delivers speedups of 2.24X and 1.74X and energy savings of 35.5% and 15.5% on average for edge and cloud accelerators, respectively, compared to the state-of-the-art dataflow explorer FLAT. In addition, the DAT exploration framework will be open-sourced.
Event Type
Research Manuscript
Time
Wednesday, June 26, 11:15am - 11:30am PDT
Location
3008, 3rd Floor
Topics
AI
Design
Keywords
AI/ML System and Platform Design