Research Manuscript: Do-More-with-Less: Optimizing AI Models for Inference Efficiencies
Description
With AI models' continued growth, inference efficiency has become more important than ever. This session presents six papers on model optimization techniques for improving inference efficiency. The first three papers optimize the model's network topology through path selection, residual optimization, and graph substitution/parallelization. The next two papers present improvements in quantization: variable-length quantization and quantization for point-cloud networks. The sixth paper presents a novel attention mechanism for transformer models.
Event Type
Research Manuscript
Time
Wednesday, June 26, 1:30pm - 3:00pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML System and Platform Design