Presentation

Accelerating DNN Execution via Weight and Data Adaptive N:M Pruning
Description
Balancing accuracy and hardware efficiency remains a challenge for traditional pruning methods. N:M sparsity is a recent approach that offers a compromise, allowing up to N non-zero weights in each group of M consecutive weights.
However, N:M pruning enforces a uniform sparsity level of N/M across all layers, which does not align well with the sparse nature of deep neural networks (DNNs). To achieve a more flexible sparsity pattern and a higher overall sparsity level, we present JointNF, a novel joint N:M and structured pruning algorithm that enables fine-grained structured pruning with adaptive sparsity levels across the DNN layers. Moreover, we show for the first time that N:M pruning can also be applied to the input activations for further performance enhancement.
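As an illustration of the N:M pattern described above (a minimal magnitude-based sketch, not the JointNF algorithm itself; the function name and the use of NumPy are illustrative assumptions), the following code keeps only the N largest-magnitude weights in every group of M consecutive weights:

```python
# Illustrative N:M weight pruning: within each group of M consecutive weights,
# keep the N largest-magnitude entries and zero out the rest.
import numpy as np

def prune_n_m(weights: np.ndarray, n: int, m: int) -> np.ndarray:
    """Return a copy of `weights` with at most n non-zeros per group of m
    consecutive elements along the last axis (whose size must be divisible by m)."""
    flat = weights.reshape(-1, m)                      # one row per group of m weights
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)       # mark dropped entries
    return (flat * mask).reshape(weights.shape)

# Example: 2:4 sparsity, i.e. every group of 4 weights keeps its 2 largest.
w = np.random.randn(4, 8)
w_pruned = prune_n_m(w, n=2, m=4)
```

JointNF, as described in the abstract, additionally adapts the sparsity level per layer and applies the same grouping idea to input activations; this sketch only shows the basic uniform N:M weight case.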
Event Type
Work-in-Progress Poster
Time
Wednesday, June 26, 5:00pm - 6:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
IP
Security