Presentation

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting
Description
Efficiently adapting Large Language Models (LLMs) on resource-constrained devices, such as edge devices, is vital for applications requiring continuous and privacy-preserving adaptation. However, existing solutions fall short due to the high memory and computational overhead associated with LLMs. To address this, we introduce an LLM tuning framework, Edge-LLM, that features three core components: (1) a unified compression method offering cost-effective layer-wise pruning ratios and quantization policies, (2) an adaptive tuning and voting scheme that selectively adjusts a subset of layers during each iteration and then adaptively combines their outputs for the final inference, thus reducing backpropagation depth and memory overhead during adaptation, and (3) a complementary search space that optimizes device workload and utilization. Experimental results demonstrate that Edge-LLM achieves efficient on-device adaptation with performance comparable to vanilla tuning methods.
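The sketch below illustrates, in a toy PyTorch model, the general idea behind component (2): each tuning step backpropagates only through a shallow slice of layers, and at inference the predictions of several exit layers are combined ("voted"). This is not the authors' implementation; the model, hyperparameters, and the simple logit-averaging vote are hypothetical stand-ins.

```python
# Minimal sketch of "adaptive tuning and voting" (assumed details, not Edge-LLM's code).
import torch
import torch.nn as nn

class TinyLayerwiseLM(nn.Module):
    def __init__(self, vocab=100, dim=32, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One lightweight prediction head per layer, so any layer can serve as an exit.
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(n_layers))

    def forward(self, x, max_layer=None):
        h = self.embed(x)
        logits_per_layer = []
        for i, (layer, head) in enumerate(zip(self.layers, self.heads)):
            h = layer(h)
            logits_per_layer.append(head(h))
            if max_layer is not None and i + 1 >= max_layer:
                break  # stop early: backprop this step only spans the first layers
        return logits_per_layer

model = TinyLayerwiseLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randint(0, 100, (2, 8))  # toy token ids
y = torch.randint(0, 100, (2, 8))  # toy next-token targets

# Adaptive tuning: each iteration trains only up to a (varying) exit layer,
# so backpropagation never spans the full network.
for step in range(4):
    depth = 2 + step % 3                    # hypothetical schedule over exit depths
    logits = model(x, max_layer=depth)[-1]  # loss taken at the chosen exit layer
    loss = nn.functional.cross_entropy(logits.reshape(-1, 100), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Voting at inference: combine the logits of several exit layers (here, a plain average).
with torch.no_grad():
    all_logits = model(x)
    voted = torch.stack(all_logits[-3:]).mean(dim=0)
    pred = voted.argmax(dim=-1)
```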
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:42pm - 3:00pm PDT
Location
3001, 3rd Floor
Topics
AI
Keywords
AI/ML Algorithms