Presentation

Rethinking DRAM Failure Prediction In Memory Reliability: An Efficient Deep Image Classification Perspective
Description
Dynamic Random Access Memory (DRAM) failures cause a significant number of server crashes in large-scale cloud centers, resulting in service interruptions and substantial economic losses. In this paper, we reframe DRAM failure prediction as a deep image classification (DIC) task. We propose a method that uses DIC algorithms, followed by a post-enhancement stage, to model the relationship between Correctable Errors (CEs) and Uncorrectable Errors (UEs). First, we encode the spatial positions of CEs into distinct blocks distributed across designated channels; each block holds a value representing the CE count. Then, we design an extensible post-enhancement stage to capture the patterns that the first stage misses. In experiments on a dataset from a real-world production cloud center, our approach shows a significant improvement and achieves state-of-the-art performance. We release all source code as open source.
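The encoding step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function name `encode_ce_image`, the choice of one channel per bank, the DRAM geometry defaults, and the 32x32 block grid are all assumptions made for illustration. It only shows the general idea of mapping CE positions to blocks whose values are CE counts.

```python
def encode_ce_image(ce_events, n_banks=16, n_rows=32768, n_cols=1024, grid=(32, 32)):
    """Encode CE positions as a multi-channel count image (illustrative only).

    ce_events: iterable of (bank, row, col) tuples, one per correctable error.
    Assumed layout: one channel per bank, each channel a grid of blocks,
    each block covering a contiguous range of rows and columns.
    """
    height, width = grid
    # image[bank][block_row][block_col] holds the CE count for that block.
    image = [[[0] * width for _ in range(height)] for _ in range(n_banks)]
    for bank, row, col in ce_events:
        block_row = row * height // n_rows  # scale the row address to the grid
        block_col = col * width // n_cols   # scale the column address to the grid
        image[bank][block_row][block_col] += 1
    return image


# Hypothetical CE log: two errors close together in bank 0, one in bank 3.
events = [(0, 0, 0), (0, 10, 5), (3, 32767, 1023)]
image = encode_ce_image(events)
```

The resulting nested list can be converted to a tensor and passed to any image classifier. The post-enhancement stage mentioned in the abstract is not sketched here, since the listing gives no detail about it.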
Event Type
Work-in-Progress Poster
Time
Wednesday, June 26, 5:00pm - 6:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
IP
Security