A Parallelization of Non-Serial Polyadic Dynamic Programming on GPU

Tausif Diwan, Jitendra Tembhurne

Abstract


Parallelization of Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e. non-optimal mapping between the sub-problems of NPDP and the processing elements of the GPU. NPDP exhibits non-uniformity in the number of subproblems as well as computational complexity across the phases. In NPDP parallelization, phases are computed sequentially whereas subproblems of each phase are computed concurrently. Therefore, it is essential to effectively map the subproblems of each phase to the processing elements while implementing thread level parallelism. We propose an adaptive Generalized Mapping Method (GMM) for NPDP parallelization that utilizes the GPU for efficient mapping of subproblems onto processing threads in each phase. Input-size and targeted GPU decide the computing power and the best mapping for each phase in NPDP parallelization. The performance of GMM is compared with different conventional parallelization approaches. For sufficiently large inputs, our technique outperforms the state-of-the-art conventional parallelization approach and achieves a significant speedup of a factor 30. We also summarize the general heuristics for achieving better gain in the NPDP parallelization.


Keywords


dynamic programming, parallel computing, GPU, CUDA, NPDP

Full Text:

PDF


Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Crossref Similarity Check logo

Crossref logologo_doaj