Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Best of 2016 Recommended by Editor Recommended by Reviewer Recommended by Reader
Search
GPU-accelerated Hungarian algorithms for the linear assignment problem
Date K., Nagi R. Parallel Computing57 (C):52-72,2016.Type:Article
Date Reviewed: May 11 2017

The paper is devoted to a fundamental problem in the area of combinatorial optimization--the assignment problem. The goal is to find a maximum (or minimum) weight matching in a weighted bipartite graph. The authors focus on the Hungarian algorithm, which solves the linear assignment problem within polynomial time with respect to the number of agents. Alternative approaches that could be used instead of the Hungarian algorithm include, for example, adaptations of the primal simplex algorithm and the auction algorithm.

The main contribution of the paper is the development and evaluation of parallel versions of the Hungarian algorithm. The parallelization is performed in CUDA and targets graphics processing units (GPU) by the NVIDIA Corporation. The most time-intensive phase of the Hungarian algorithm is the augmenting path search; therefore, the authors develop its efficient implementation and they provide detailed pseudocode for both the classical Hungarian algorithm and the alternating tree Hungarian algorithm. The paper contains a detailed description of parallelization for each step (initial reduction, optimality check, augmenting patch search, dual update, and so on) of the algorithm.

In order to target a broad variety of parallel architectures, the authors develop parallel variants for: a single central processing unit (CPU) using OpenMP, a single GPU using CUDA, and multiple GPUs using CUDA. The paper addresses the challenging aspect of storing the cost matrix in the GPU by suggesting a grid-like architecture with multiple compute nodes, each containing one CPU-GPU pair. For implementing the communication between CPUs, the message passing interface (MPI) is employed. The experimental evaluation is conducted on problems of small and large size; it shows that the implementation versions using GPU are much faster than both the sequential and the OpenMP implementation using a common multicore CPU.

Reviewer:  Sergei Gorlatch Review #: CR145269 (1707-0485)
Bookmark and Share
  Editor Recommended
 
 
Graphics Processors (I.3.1 ... )
 
 
Parallel Algorithms (G.1.0 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Graphics Processors": Date
Introduction to volume rendering
Lichtenbelt B., Crane R., Naqvi S., Prentice-Hall, Inc., Upper Saddle River, NJ, 1998. Type: Book (9780138616830)
May 1 1999
Time/space tradeoffs for polygon mesh rendering
Bar-Yehuda R., Gotsman C. ACM Transactions on Graphics (TOG) 15(2): 141-152, 1996. Type: Article
Jul 1 1997
A programmable vertex shader with fixed-point SIMD datapath for low power wireless applications
Sohn J., Woo R., Yoo H.  Graphics hardware (Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, Grenoble, France, Aug 29-30, 2004)107-114, 2004. Type: Proceedings
Jul 8 2005
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy