BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180034Z
LOCATION:3008\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240627T143000
DTEND;TZID=America/Los_Angeles:20240627T144500
UID:dac_DAC 2024_sess150_RESEARCH927@linklings.com
SUMMARY:PipeSSD: A Lock-free Pipelined SSD Firmware Design for Multi-core 
 Architecture
DESCRIPTION:Research Manuscript\n\nZelin Du (The Chinese University of Hon
 g Kong), Shaoqi Li (Shenzhen University), Zixuan Huang and Jin Xue (The Ch
 inese University of Hong Kong), Tianyu Wang (Shenzhen University), and Kec
 heng Huang and Zili Shao (The Chinese University of Hong Kong)\n\nModern S
 SD firmware is continuously optimized for higher parallelism to match the 
 growing frontend PCIe bandwidth with more backend flash channels. Although
  a multi-core microprocessor is typically adopted to concurrently process 
 independent NVMe requests from multiple NVMe queues, the existing one-to-m
 any thread-request mapping model with each thread serving one or more inco
 ming I/O requests has poor scalability due to severe lock contention, espe
 cially in cache management.\n\nIn this paper, we first conduct pre
 liminary experiments on an open-channel NVMe SSD to exhibit the lock conte
 ntion problem in the one-to-many thread-request mapping model. When a thre
 ad locks a cache line and is waiting for a long-latency flash read to upda
 te this cache line, subsequent tasks on other threads that require the sam
 e cache line are all blocked to guarantee correctness. To mitigate this, w
 e propose PipeSSD, a lock-free pipeline-based SSD firmware design with a m
 any-to-one thread-request mapping model that assigns multiple threads to s
 erve different stages of each I/O request in a pipelined way. It is worth 
 noting that PipeSSD only performs cache updates in the last pipeline stage
  to eliminate dependency loops in the pipeline while maintaining a pilot f
 or each cache line in the beginning pipeline stage to indicate the cache l
 ine status. With a multi-core architecture, different pipeline stages are 
 processed on different cores communicating via several FIFO queues,
  which can ensure the processing sequence and data consistency without
  any cache l
 ine locks. We implement PipeSSD on real hardware and evaluate its performa
 nce on a multi-core NVMe SSD prototype. The evaluation results show that o
 n an 8-core system, PipeSSD has a significant throughput improvement compa
 red to the state-of-the-art multi-core SSD firmware.\n\nTopic: Embedded Sy
 stems\n\nKeyword: Embedded Memory and Storage Systems\n\nSession Chair: Fi
 lippo Carloni (Politecnico di Milano)
END:VEVENT
END:VCALENDAR
