BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180034Z
LOCATION:Level 2 Lobby
DTSTART;TZID=America/Los_Angeles:20240626T180000
DTEND;TZID=America/Los_Angeles:20240626T190000
UID:dac_DAC 2024_sess237_RESEARCH818@linklings.com
SUMMARY:Distributed Inference of DL Workloads on CIM-based Heterogeneous A
 ccelerators
DESCRIPTION:Work-in-Progress Poster\n\nMojtaba AlShams and Kamilya Smagulo
 va (King Abdullah University of Science and Technology (KAUST)), Mohammed 
 Fouda (Rain Neuromorphics Inc.), and Ahmed Eltawil (King Abdullah Universi
 ty of Science and Technology (KAUST))\n\nThe remarkable advancements in Ne
 ural Networks' precision have ignited a revolution in their architecture, 
 demanding ever-expanding memory and computational resources. As we confron
 t the limitations posed by current hardware, such as memory and processing
  capabilities, one innovative solution emerges: the distribution of neural
  network model inference across multiple devices. While most prior efforts
  have focused on optimizing single-device inference or partitioning models
  to enhance inference throughput, this work proposes a framework that sear
 ches for optimal model splits and distributes the partitions across the co
 mbination of a given set of devices taking into consideration the throughp
 ut and energy. Participating devices are strategically grouped into homoge
 neous and heterogeneous clusters consisting of general-purpose CPU and GPU
  architectures, as well as emerging Compute-In-Memory (CIM) accelerators. 
 The framework simultaneously optimizes inference throughput and energy con
 sumption with a weighting control parameter. Compared to the performance o
 f a single GPU, it helps to achieve up to 4x speedup with approxima
 tely 4x per-device energy reduction in a heterogeneous setup. The a
 lgorithm also finds a smooth Pareto-like curve on the throughput-energy sp
 ace for CIM devices.\n\nTopic: AI, Autonomous Systems, Cloud, Design, EDA,
  Embedded Systems, IP, Security
END:VEVENT
END:VCALENDAR
