
SPARK: An Efficient Hybrid Acceleration Architecture with Run-Time Sparsity-Aware Scheduling for TinyML Learning
Description
Currently, most TinyML devices focus only on inference, since training requires far more hardware resources. In this paper, we introduce SPARK, an efficient hybrid acceleration architecture with run-time sparsity-aware scheduling for TinyML learning. Besides a stand-alone accelerator, an in-pipeline acceleration unit is integrated within the CPU pipeline to support simultaneous forward and backward propagation. To better exploit sparsity and improve hardware utilization, a sparsity-aware acceleration scheduler distributes the workload between the two acceleration units. A unified memory system is also constructed to support transposable data fetch, reducing memory accesses. We implement SPARK in TSMC 22nm technology and evaluate it on different TinyML tasks. Our work is the first architecture to utilize two acceleration units for on-device learning. Compared with the baseline accelerator, SPARK achieves a 4.1x average performance improvement with only 2.27% area overhead. SPARK also outperforms off-the-shelf edge devices by 9.4x in performance with 446.0x higher efficiency.
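The core scheduling idea described above can be illustrated with a toy dispatcher. The following is a minimal sketch, assuming a per-tile zero count and a fixed sparsity threshold; the tile format, unit names, and threshold are hypothetical illustrations and do not reflect SPARK's actual hardware interfaces or cost model.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tile:
    """A block of activations/gradients to be processed by one unit (hypothetical)."""
    data: List[float]

def sparsity(tile: Tile) -> float:
    """Fraction of zero elements, measured at run time."""
    zeros = sum(1 for x in tile.data if x == 0.0)
    return zeros / len(tile.data)

def schedule(tiles: List[Tile], threshold: float = 0.5) -> Tuple[List[Tile], List[Tile]]:
    """Dispatch each tile based on its measured sparsity.

    In this sketch, highly sparse tiles go to the in-pipeline unit
    (low launch overhead, can skip zeros), while dense tiles go to the
    stand-alone accelerator (higher peak throughput). The threshold is
    a hypothetical knob, not a value from the paper.
    """
    in_pipeline, standalone = [], []
    for t in tiles:
        (in_pipeline if sparsity(t) >= threshold else standalone).append(t)
    return in_pipeline, standalone

if __name__ == "__main__":
    sparse_tile = Tile([0.0, 0.0, 0.0, 1.2])   # 75% zeros
    dense_tile = Tile([0.3, 1.1, -0.7, 2.4])   # fully dense
    ip, sa = schedule([sparse_tile, dense_tile])
    print(f"in-pipeline: {len(ip)} tile(s), standalone: {len(sa)} tile(s)")
```

In this framing, run-time measurement matters because activation and gradient sparsity vary across layers and training steps, so a static assignment to either unit would leave one of them underutilized.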
Event Type
Research Manuscript
Time
Tuesday, June 25, 3:45pm - 4:00pm PDT
Location
3008, 3rd Floor
Topics
Design
Keywords
Design of Cyber-physical Systems and IoT