
SPARK: An Efficient Hybrid Acceleration Architecture with Run-Time Sparsity-Aware Scheduling for TinyML Learning
Description
Currently, most TinyML devices focus only on inference, since training requires far more hardware resources. In this paper, we introduce SPARK, an efficient hybrid acceleration architecture with run-time sparsity-aware scheduling for TinyML learning. Besides a stand-alone accelerator, an in-pipeline acceleration unit is integrated within the CPU pipeline to support simultaneous forward and backward propagation. To better exploit sparsity and improve hardware utilization, a sparsity-aware acceleration scheduler distributes the workload between the two acceleration units. A unified memory system is also constructed to support transposable data fetch, reducing memory accesses. We implement SPARK in TSMC 22nm technology and evaluate it on different TinyML tasks. Our work is the first architecture to utilize two acceleration units for on-device learning. Compared with the baseline accelerator, SPARK achieves a 4.1x average performance improvement with only 2.27% area overhead. SPARK also outperforms off-the-shelf edge devices by 9.4x in performance with 446.0x higher efficiency.
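The core scheduling idea described above can be illustrated with a toy dispatcher. The following is a minimal sketch, assuming a per-tile zero count and a fixed sparsity threshold; the tile format, unit names, and threshold are hypothetical illustrations and do not reflect SPARK's actual hardware interfaces or cost model.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tile:
    """A block of activations/gradients to be processed by one unit (hypothetical)."""
    data: List[float]

def sparsity(tile: Tile) -> float:
    """Fraction of zero elements, measured at run time."""
    zeros = sum(1 for x in tile.data if x == 0.0)
    return zeros / len(tile.data)

def schedule(tiles: List[Tile], threshold: float = 0.5) -> Tuple[List[Tile], List[Tile]]:
    """Dispatch each tile based on its measured sparsity.

    In this sketch, highly sparse tiles go to the in-pipeline unit
    (low launch overhead, can skip zeros), while dense tiles go to the
    stand-alone accelerator (higher peak throughput). The threshold is
    a hypothetical knob, not a value from the paper.
    """
    in_pipeline, standalone = [], []
    for t in tiles:
        (in_pipeline if sparsity(t) >= threshold else standalone).append(t)
    return in_pipeline, standalone

if __name__ == "__main__":
    sparse_tile = Tile([0.0, 0.0, 0.0, 1.2])   # 75% zeros
    dense_tile = Tile([0.3, 1.1, -0.7, 2.4])   # fully dense
    ip, sa = schedule([sparse_tile, dense_tile])
    print(f"in-pipeline: {len(ip)} tile(s), standalone: {len(sa)} tile(s)")
```

In this framing, run-time measurement matters because activation and gradient sparsity vary across layers and training steps, so a static assignment to either unit would leave one of them underutilized.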
Event Type
Research Manuscript
Time
Tuesday, June 25, 3:45pm - 4:00pm PDT
Location
3008, 3rd Floor
Topics
Design
Keywords
Design of Cyber-physical Systems and IoT