Presentation

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting
Description
Efficiently adapting Large Language Models (LLMs) on resource-constrained devices, such as edge devices, is vital for applications requiring continuous and privacy-preserving adaptation. However, existing solutions fall short due to the high memory and computational overhead associated with LLMs. To address this, we introduce an LLM tuning framework, Edge-LLM, that features three core components: (1) a unified compression method offering cost-effective layer-wise pruning ratios and quantization policies, (2) an adaptive tuning and voting scheme that selectively adjusts a subset of layers during each iteration and then adaptively combines their outputs for the final inference, thus reducing backpropagation depth and memory overhead during adaptation, and (3) a complementary search space that optimizes device workload and utilization. Experimental results demonstrate that Edge-LLM achieves efficient on-device adaptation with performance comparable to vanilla tuning methods.
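The sketch below illustrates, in a toy PyTorch model, the general idea behind component (2): each tuning step backpropagates only through a shallow slice of layers, and at inference the predictions of several exit layers are combined ("voted"). This is not the authors' implementation; the model, hyperparameters, and the simple logit-averaging vote are hypothetical stand-ins.

```python
# Minimal sketch of "adaptive tuning and voting" (assumed details, not Edge-LLM's code).
import torch
import torch.nn as nn

class TinyLayerwiseLM(nn.Module):
    def __init__(self, vocab=100, dim=32, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One lightweight prediction head per layer, so any layer can serve as an exit.
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(n_layers))

    def forward(self, x, max_layer=None):
        h = self.embed(x)
        logits_per_layer = []
        for i, (layer, head) in enumerate(zip(self.layers, self.heads)):
            h = layer(h)
            logits_per_layer.append(head(h))
            if max_layer is not None and i + 1 >= max_layer:
                break  # stop early: backprop this step only spans the first layers
        return logits_per_layer

model = TinyLayerwiseLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randint(0, 100, (2, 8))  # toy token ids
y = torch.randint(0, 100, (2, 8))  # toy next-token targets

# Adaptive tuning: each iteration trains only up to a (varying) exit layer,
# so backpropagation never spans the full network.
for step in range(4):
    depth = 2 + step % 3                    # hypothetical schedule over exit depths
    logits = model(x, max_layer=depth)[-1]  # loss taken at the chosen exit layer
    loss = nn.functional.cross_entropy(logits.reshape(-1, 100), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Voting at inference: combine the logits of several exit layers (here, a plain average).
with torch.no_grad():
    all_logits = model(x)
    voted = torch.stack(all_logits[-3:]).mean(dim=0)
    pred = voted.argmax(dim=-1)
```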
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:42pm - 3:00pm PDT
Location
3001, 3rd Floor
Topics
AI
Keywords
AI/ML Algorithms