BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180033Z
LOCATION:3001\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240625T160000
DTEND;TZID=America/Los_Angeles:20240625T161500
UID:dac_DAC 2024_sess105_RESEARCH1518@linklings.com
SUMMARY:Beyond Inference: Performance Analysis of DNN Server Overheads for
  Computer Vision
DESCRIPTION:Research Manuscript\n\nAhmed AbouElhamayed (Cornell Universi
 ty), Susanne Balle and Deshanand Singh (Intel Corporation), and Moha
 med Abdelfattah (Cornell University)\n\nDeep neural network (DNN) inf
 erence has become an important part of many data-center workloads. T
 his has prompted focused efforts to design ever-faster deep learni
 ng accelerators such as GPUs and TPUs. However, an end-to-end visio
 n application contains more than just DNN inference, including inpu
 t decompression, resizing, sampling, normalization, and data transf
 er. In this paper, we perform a thorough evaluation of computer vis
 ion inference requests performed on a throughput-optimized serving sys
 tem. We quantify the performance impact of server overheads such as da
 ta movement, preprocessing, and message brokers between two DNNs produ
 cing outputs at different rates. Our empirical analysis encompasses ma
 ny computer vision tasks including image classification, segmentatio
 n, detection, depth estimation, and more complex processing pipeline
 s with multiple DNNs. Our results consistently demonstrate that end-
 to-end application performance can easily be dominated by data proce
 ssing and data movement functions (up to 56% of end-to-end latency i
 n a medium-sized image, and ~80% impact on system throughput in a la
 rge image), even though these functions have been conventionally ove
 rlooked in deep learning system design. Our work identifies importan
 t performance bottlenecks in different application scenarios, achiev
 es 2.25× better throughput compared to prior work, and paves the wa
 y for more holistic deep learning system design.\n\nTopic: AI\n\nKeywor
 d: AI/ML Application and Infrastructure\n\nSession Chairs: Hongxiang F
 an (Imperial College London; Samsung AI Center, UK) and Xiaoxuan Yan
 g (University of Virginia, Stanford University)
END:VEVENT
END:VCALENDAR
