ComputingReviews.com

A GPU-based branch-and-bound algorithm using IntegerVectorMatrix data structure
Gmys J., Mezmaz M., Melab N., Tuyttens D. Parallel Computing59(C):119-139,2016.Type:Article

Date Reviewed: 05/17/17

Gmys et al. address the parallelization of the popular branch-and-bound (B&B) algorithm for solving combinatorial optimization problems using graphics processing units (GPUs). The irregular structure of B&B makes it challenging for GPU parallelization.

The novel contribution of the paper is the use of so-called integer–vector–matrix (IVM) data structure instead of the traditional linked list to store and manage the pool of subproblems. This allows the authors to avoid the data transfer bottleneck between central processing unit (CPU) and GPU.

Since the IVM-based parallel B&B is still highly irregular in terms of workload, control flow, and memory access patterns, the authors address also the reduction of thread divergence that arises in CUDA’s single instruction, multiple data (SIMD) execution model because of control flow irregularities. They propose alternative mapping schemes for the algorithm with the aim of reducing thread divergence.

For the experimental evaluation of the proposed solutions, the authors use the flow shop scheduling problem, which is implemented using the CUDA programming model and run on a computer equipped with a NVIDIA Tesla K20m GPU. The GPU algorithm using the suggested IVM data structure decomposes on average 3.3 times more nodes per second than its counterpart that uses the traditional linked list data structure. The performance of the IVM-based B&B implementation is analyzed for different mapping choices (one-thread-per-IVM and one-warp-per-IVM), a varying number of IVM structures, and different work stealing strategies.

Reviewer: Sergei Gorlatch

Review #: CR145285 (1707-0484)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™
Terms of Use | Privacy Policy