Computing Reviews, the leading online review service for computing literature.

Search

GPU-accelerated Hungarian algorithms for the linear assignment problem
Date K., Nagi R. Parallel Computing57 (C):52-72,2016.Type:Article

Date Reviewed: May 11 2017

The paper is devoted to a fundamental problem in the area of combinatorial optimization--the assignment problem. The goal is to find a maximum (or minimum) weight matching in a weighted bipartite graph. The authors focus on the Hungarian algorithm, which solves the linear assignment problem within polynomial time with respect to the number of agents. Alternative approaches that could be used instead of the Hungarian algorithm include, for example, adaptations of the primal simplex algorithm and the auction algorithm. The main contribution of the paper is the development and evaluation of parallel versions of the Hungarian algorithm. The parallelization is performed in CUDA and targets graphics processing units (GPU) by the NVIDIA Corporation. The most time-intensive phase of the Hungarian algorithm is the augmenting path search; therefore, the authors develop its efficient implementation and they provide detailed pseudocode for both the classical Hungarian algorithm and the alternating tree Hungarian algorithm. The paper contains a detailed description of parallelization for each step (initial reduction, optimality check, augmenting patch search, dual update, and so on) of the algorithm. In order to target a broad variety of parallel architectures, the authors develop parallel variants for: a single central processing unit (CPU) using OpenMP, a single GPU using CUDA, and multiple GPUs using CUDA. The paper addresses the challenging aspect of storing the cost matrix in the GPU by suggesting a grid-like architecture with multiple compute nodes, each containing one CPU-GPU pair. For implementing the communication between CPUs, the message passing interface (MPI) is employed. The experimental evaluation is conducted on problems of small and large size; it shows that the implementation versions using GPU are much faster than both the sequential and the OpenMP implementation using a common multicore CPU.

Reviewer: Sergei Gorlatch	Review #: CR145269 (1707-0485)

Graphics Processors (I.3.1 ... )

Parallel Algorithms (G.1.0 ... )

Would you recommend this review?

yes

Other reviews under "Graphics Processors":	Date

Introduction to volume rendering Lichtenbelt B., Crane R., Naqvi S., Prentice-Hall, Inc., Upper Saddle River, NJ, 1998. Type: Book (9780138616830)	May 1 1999

Time/space tradeoffs for polygon mesh rendering Bar-Yehuda R., Gotsman C. ACM Transactions on Graphics (TOG) 15(2): 141-152, 1996. Type: Article	Jul 1 1997

A programmable vertex shader with fixed-point SIMD datapath for low power wireless applications Sohn J., Woo R., Yoo H. Graphics hardware (Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, Grenoble, France, Aug 29-30, 2004)107-114, 2004. Type: Proceedings	Jul 8 2005

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy