
Control Flow Divergence Optimization by Exploiting Tensor Cores
Description
Kernels are scheduled on Graphics Processing Units (GPUs) at the granularity of a warp, a group of concurrently executing threads. When a kernel contains conditional branches, threads within a warp may execute different branches sequentially, resulting in considerable utilization loss and unpredictable execution time, a problem known as control flow divergence. This paper proposes a novel method that predicts each thread's execution path before the kernel launch by deploying a branch prediction network on the GPU's tensor cores, which can run in parallel with the CUDA cores. Combined with a well-designed thread data reorganization algorithm, this solution mitigates the GPU's control flow divergence problem.
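The abstract does not include implementation details, but the idea of reorganizing thread data around pre-launch branch predictions can be illustrated with a minimal CUDA sketch. Here, a hypothetical `pred` array stands in for the output of the branch prediction network, and `build_remap`/`remap` are illustrative names for the reorganization step; this is an assumption-based sketch, not the authors' implementation.

```cuda
#include <algorithm>
#include <numeric>
#include <vector>
#include <cuda_runtime.h>

// Host-side reorganization (sketch): group element indices by their predicted
// branch outcome (0 = "else" path, 1 = "if" path) so that elements expected to
// take the same path end up packed into the same warps.
std::vector<int> build_remap(const std::vector<int>& pred) {
    std::vector<int> idx(pred.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return pred[a] < pred[b]; });
    return idx;
}

// Device kernel (sketch): each thread reads a remapped element, so the branch
// below is ideally uniform within a warp when the predictions are accurate.
__global__ void divergent_kernel(const float* in, float* out,
                                 const int* remap, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int i = remap[tid];
    float x = in[i];

    if (x > 0.0f) {
        out[i] = x * x;   // "taken" path
    } else {
        out[i] = -x;      // "not taken" path
    }
}
```

In this sketch the prediction and reordering happen before the kernel launch, which is where the paper's use of tensor cores running alongside the CUDA cores would fit; how the network itself is evaluated is not shown here.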
Event Type
Research Manuscript
Time
Wednesday, June 26, 4:15pm - 4:30pm PDT
Location
3001, 3rd Floor
Topics
Embedded Systems
Keywords
Embedded Software