The Assignment Problem: Exploring Parallelism

Timothy J Rolfe
Department of Computer Science
Eastern Washington University
319F Computing & Engineering Bldg.
Cheney, Washington USA
Timothy.Rolfe@mail.ewu.edu

Abstract: The linear assignment problem requires the determination of an optimal permutation vector for the assignment of tasks to agents. Even the backtracking implementation supports a rather powerful bounding function. Since the processing of permutation families (based on low-subscripted vector assignments) can be done independently of each other, one may examine parallel processing strategies and discover instances in which parallel execution is a very bad idea. Because of article size limitations, this article discusses only backtracking in parallel. A later article will address branch-and-bound in parallel.

Categories and Subject Descriptors: F.2.2 Nonnumerical Algorithms and Problems: Computations on discrete structures; G.2.1 [Discrete Mathematics]: Combinatorics (permutations and combinations); G.2.3 [Discrete Mathematics]: Applications

General Terms: Algorithms, Performance

Keywords: Parallel Processing, Backtracking

1. INTRODUCTION

The Assignment Problem provides a useful basis for exploring parallel solution of problems, since it appears to be an embarrassingly parallel problem: one in which one may work on subproblems independently because they do not depend on each other. Specifically, the Linear Assignment Problem [1] may be briefly described thus (either as a minimization problem or as a maximization problem): given a set of n agents, a set of n tasks, and a matrix giving the result of assigning each agent to each task, find the optimal assignment of agents to tasks in which each agent has a single task and each task has a single agent. As a minimization problem, the matrix would be the cost matrix, giving the cost of assigning agent [j] to task [k], and one wishes the minimum-cost assignment.
As a maximization problem, the matrix would be the benefit matrix, and one wishes the maximum-benefit assignment. Since the two forms are equivalent, this discussion is framed in terms of the maximization problem with a benefit matrix. [Except for permute, all code segments below can be viewed in the context of their full programs through the following URL:]

inroads SIGCSE Bulletin Volume 41, Number June

2. BOUNDED BACKTRACKING [2]

The pure brute-force solution would be to examine all n! permutations and find a maximal-benefit permutation ("a" rather than "the", since the permutations need not all have distinct benefits). As the basis for discussion, here is one possible permutation generation algorithm. [3]

    void permute (int index, int n, int perm[])
    {
       if (index == n)   // All cells assigned
       {
          process(perm, n);
       }
       else
       {  // March available values thru [index]
          int k, hold;
          for (k = index; k < n; k++)
          {
             swap (perm, index, k);
             // Put bounding logic here if some
             // partial permutations can be
             // immediately discarded.
             permute (index+1, n, perm);
          }
          // Above code right rotates by one,
          // so perform the left rotation.
          hold = perm[index];
          for (k = index+1; k < n; k++)
             perm[k-1] = perm[k];
          perm[n-1] = hold;
       }
    }

We can now start pruning the decision tree. First, it is possible to examine the two easiest permutations and obtain a lower bound on the solution: simply make the assignments consistent with traversing the benefit matrix along the diagonal (row = col) or the anti-diagonal (row + col = n-1). Each of these is a valid solution to the problem,
and the higher-benefit solution provides a lower bound for all subsequent work.

Once there is a lower bound on the solution, backtracking allows early pruning of the decision tree within the logic that generates the permutations. As noted in the permute() code above, one can insert bounding logic into the permutation generation: make the recursive call permute(index+1, n, perm) only if the partial permutation [0..index] can still be the basis for a complete solution that beats the current lower bound. First compute the benefit for just these assignments; they will be the fixed portion of every permutation built on this basis. It is then possible to find an upper bound for this prefix without completely solving the problem: examine the unassigned cells and get the largest possible additional value from the column maxima, looking at rows in the range [index+1..n-1] and columns perm[index+1] through perm[n-1]. In terms of the following code segment (which assumes a global benefit matrix and a global problem size), this would be obtained by maxadditional = colmaxsum(index+1, perm).

    int colmaxsum(int start, int perm[])
    {
       int sum = 0, k;
       // For each still-unassigned column perm[k] ...
       for (k = start; k < size; k++)
       {
          // ... find its maximum over the still-unassigned rows
          int columnmaximum = benefit[start][perm[k]], row;
          for (row = start+1; row < size; row++)
          {
             if (columnmaximum < benefit[row][perm[k]])
                columnmaximum = benefit[row][perm[k]];
          }
          sum += columnmaximum;
       }
       return sum;
    }

Thus, for any partial permutation that has fixed perm[0] through perm[index], no complete permutation built on it can exceed the total of benefit[k][perm[k]] for k from 0 to index plus the above colmaxsum, where start is index+1. The initial best permutation comes from the diagonal or anti-diagonal permutation mentioned above, and even with this basis some partial permutations can be discarded. As calculation proceeds, however, the processing of completed permutations allows successive refinement of the maximal-benefit permutation, and each refinement raises the bar on the lower bound used in discarding partial permutations.
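The rotation bookkeeping in permute() is easy to get wrong, so a small check may help. Below is a hypothetical Java port; the class PermuteSketch and its all() helper, which records each permutation as a string where the C code calls process(), are illustrative names and not part of the paper's programs. Started from the identity vector, it should produce all n! orderings (in lexicographic order, as it happens) and leave perm unchanged after every call.

```java
// Hypothetical Java port of permute(): each available value is marched
// through perm[index]; the net right rotation is undone afterwards.
class PermuteSketch {
    // Returns every permutation of 0..n-1, recorded where process() would run.
    static java.util.List<String> all(int n) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        java.util.List<String> out = new java.util.ArrayList<>();
        permute(0, n, perm, out);
        return out;
    }

    static void permute(int index, int n, int[] perm, java.util.List<String> out) {
        if (index == n) {                 // all cells assigned
            out.add(java.util.Arrays.toString(perm));
            return;
        }
        for (int k = index; k < n; k++) {
            swap(perm, index, k);         // value from slot k moves to slot index
            permute(index + 1, n, perm, out);
        }
        // The swaps right-rotated perm[index..n-1] by one: rotate it back left.
        int hold = perm[index];
        for (int k = index + 1; k < n; k++) perm[k - 1] = perm[k];
        perm[n - 1] = hold;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Note that no swap is undone inside the loop; the single left rotation after the loop is what restores the caller's view of perm.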
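Putting the pieces of this section together, a minimal sequential sketch in Java might look like the following. It assumes a square int benefit matrix and uses hypothetical names (BoundedAssign, solve, best); the seed from the diagonal and anti-diagonal permutations, the colmaxsum upper bound, and the rotation-based undo follow the text, but this is an illustration, not the author's program.

```java
// Hypothetical sketch: bounded backtracking for the maximization form.
//  * lower bound seeded from the diagonal and anti-diagonal permutations
//  * upper bound = fixed prefix benefit + colmaxsum over unassigned cells
class BoundedAssign {
    private final int[][] benefit;
    private final int size;
    private int lowerlimit;       // best complete benefit found so far
    private int[] solution;       // permutation achieving lowerlimit

    BoundedAssign(int[][] benefit) {
        this.benefit = benefit;
        this.size = benefit.length;
        int[] diag = new int[size], anti = new int[size];
        int dsum = 0, asum = 0;
        for (int r = 0; r < size; r++) {
            diag[r] = r;            dsum += benefit[r][r];
            anti[r] = size - 1 - r; asum += benefit[r][size - 1 - r];
        }
        // The better of the two easy permutations seeds the lower bound.
        if (dsum >= asum) { lowerlimit = dsum; solution = diag; }
        else              { lowerlimit = asum; solution = anti; }
    }

    int solve() {
        int[] vect = new int[size];
        for (int i = 0; i < size; i++) vect[i] = i;
        explore(0, vect, 0);
        return lowerlimit;
    }

    int[] best() { return solution.clone(); }

    // Sum over unassigned columns vect[start..] of each column's maximum
    // over unassigned rows start..size-1 (an optimistic estimate).
    private int colmaxsum(int start, int[] vect) {
        int sum = 0;
        for (int k = start; k < size; k++) {
            int colmax = benefit[start][vect[k]];
            for (int row = start + 1; row < size; row++)
                colmax = Math.max(colmax, benefit[row][vect[k]]);
            sum += colmax;
        }
        return sum;
    }

    // vect[0..index-1] is fixed and is worth 'fixed'.
    private void explore(int index, int[] vect, int fixed) {
        if (index == size) {                      // complete permutation
            if (fixed > lowerlimit) { lowerlimit = fixed; solution = vect.clone(); }
            return;
        }
        for (int k = index; k < size; k++) {
            swap(vect, index, k);
            int prefix = fixed + benefit[index][vect[index]];
            // Recurse only if this prefix could still beat the current best.
            if (prefix + colmaxsum(index + 1, vect) > lowerlimit)
                explore(index + 1, vect, prefix);
        }
        // Undo the net right rotation of vect[index..size-1].
        int hold = vect[index];
        for (int k = index + 1; k < size; k++) vect[k - 1] = vect[k];
        vect[size - 1] = hold;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For the 3x3 matrix {{5, 9, 1}, {10, 3, 2}, {8, 7, 4}}, both easy permutations are worth 12; the search raises the bar to 14 and then to 23 with the permutation {1, 0, 2}, after which the remaining branch beginning with column 2 is pruned at once.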
Parallel Execution

If one has access to a multi-processor / multi-core computer running a Unix variant, the Unix fork() function provides easy access to parallel execution. At the point at which the statement child = fork() is executed, another process, the child process, is created and receives zero from the function, while the original process receives back the process ID of the child process. The child process is created in exactly the same state as the parent process except for the value returned by fork(). For all practical purposes it has a copy of all variables in the parent in its own space (that is, there is no shared memory), and it also shares any files the parent has opened for output. Each process can then work on a subproblem of the overall problem and save its results in a shared output file.

In the context of the assignment problem, one can generate a small number of processes for ranges of values in perm[0]. Each can find the optimal permutation flowing from its initial states and save that permutation in a shared file.

    // Break the problem into nprocs pieces;
    // each process explores perm[0] values in [lo, hi).
    lo = 0;
    hi = size / nprocs;
    for (proc = 1; proc < nprocs;)
    {
       child = fork();
       if ( child != 0 )   // Parent keeps the current range
          break;
       // Child takes the next range and may fork again
       lo = hi;
       hi = size * ++proc / nprocs;
    }
    // Explore each option for this set
    for (k = lo; k < hi; k++)
    {
       swap(vect, 0, k);
       // [1..n-1] done sequentially
       explore(1, vect, benefit[0][vect[0]]);
       swap(vect, k, 0);   // undo swap
    }

This provides a chain of parent/child processes, each working on a different set of initial permutations. The use of a shared output file for communication is discussed in the article "Bargain-Basement Parallelism" [4]. Note, however, that each process develops its own lower limit, so that there is significantly less decision tree pruning than in the sequential program. [Note: in both the above C code and in the following Java code, the permute method above becomes the explore method.
The problem size is a global variable, so that the arguments to explore are the index being assigned, the permutation vector, and finally the total benefit for the portion of the permutation already fixed.]

On the other hand, within Java one can use separate Java threads to compute ranges of initial permutations. In addition, Java provides a shared-memory environment, so one does not need to send messages through files. One simply needs to ensure that the update of the optimal permutation and its benefit can be executed by only one thread at a time, that is, that it be a synchronized method. An additional bonus is that the global lowerlimit can be shared by all threads, so that they can prune their decision trees based on solutions discovered by other threads. The generation of the threads and their execution is simple.

    static void threadrun(int nthreads)
    {
       int thrd, lo = 0, hi;
       // Give each thread a contiguous range of perm[0] values
       for (thrd = 0; thrd < nthreads; thrd++)
       {
          hi = size * (thrd+1) / nthreads;
          engine[thrd] = new Compute(lo, hi, thrd);
          lo = hi;
       }
       try
       {
          for (thrd = 0; thrd < nthreads; thrd++)
             engine[thrd].start();
          // Wait for every thread to finish
          for (thrd = 0; thrd < nthreads; thrd++)
             engine[thrd].join();
       }
       catch (Exception e)
       {
          e.printStackTrace();
       }
    }

    // Nested class uses the outer class's static methods
    // and data; it must be static because threadrun()
    // is static and creates it without an outer instance.
    static class Compute extends Thread
    {
       int start, finish,   // Index range
           position;        // Thread ID
       int[] vect;          // Working solution

       Compute(int lo, int hi, int thread)
       {
          start = lo; finish = hi; position = thread;
          // Working permutation from the solution:
          // ensure that ALL threads start the same
          vect = (int[]) solution.clone();
       }

       public void run()
       {
          for (int k = start; k < finish; k++)
          {
             swap(vect, 0, k);
             explore(1, vect, benefit[0][vect[0]]);
             swap(vect, 0, k);
          }
       }
    }

The assignment of tasks to processors is handled for the programmer by the Unix operating system in the fork example and by the Java Virtual Machine in the Java thread example.

Alternative: Thread Self-Scheduling

Rather than working on a block of states, the compute engines can work on one initial state at a time. They get their work from a synchronized method that doles out the individual permutations, an approach referred to as self-scheduling. Each compute engine then processes several permutations before terminating. The method void threadrun(int []perm, int nthreads) would initialize from perm the global vector pending, holding the present state of the permutation, set the global variable knext to 0, and start the requested number of threads without sending them the permutation. Instead, each thread will invoke a boolean getproblem method.

    // Ensure that only one thread at a time
    // accesses this. Return a boolean false
    // when all permutations have been
    // distributed.
    // These are the global variables:
    //   int pending[], knext;
    synchronized boolean getproblem(int []work)
    {
       if (knext >= size)
          return false;
       swap(pending, 0, knext++);
       System.arraycopy(pending, 0, work, 0, size);
       return true;
    }

The individual threads will invoke this outer class method to get the permutations they work on, and to receive the signal that they can terminate.

    // Inner class accesses outer class methods
    // and data.
    class Compute extends Thread
    {
       public Compute()
       {
          ;   // Nothing to do
       }

       public void run()
       {
          int []perm = new int[size];
          while (getproblem(perm))
             explore(1, perm, benefit[0][perm[0]]);
       }
    }

3. EXPERIMENTAL RESULTS

The programs discussed above were run on three different random data sets, one involving a 30x30 random benefit grid and two involving 32x32 random benefit grids, where the permutations involve values in the ranges [0..29] and [0..31] respectively. In the 30x30 grid the highest-value permutation occurs about a fifth of the way through the permutations (its first three terms are 5, 23, and 4). One of the 32x32 grids has its highest-value permutation very early (the first three terms are 1, 4, and 30), while the other has its solution nearly 80% of the way through the permutations (the first three terms are 24, 9, and 14).

The programs were run under Ubuntu's Linux version smp on quad-processor hyperthreading computers with Intel Xeon CPUs. Because of the hyperthreading, these appear to Linux to be eight CPUs running at 2.80GHz. Programs were run in their pure sequential form and then in their parallel form with two through eight processes cooperating in the solution.

The observed behavior of the parallel execution in C based on the Unix fork makes clear that there are dependencies among the calculations for different permutation vectors: in the sequential calculation the lower limit for solutions is updated throughout the calculation, whereas each forked process benefits only from the improvements it finds itself.
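The dependency can be made concrete by counting nodes. The following Java sketch (hypothetical class PruningSplit) splits the perm[0] values into ranges the way the fork and threadrun() code do, then explores each range either with its own lower limit restarted from the diagonal/anti-diagonal seed, mimicking the forked processes, or with one lower limit carried from range to range, mimicking the shared lowerlimit. It runs the ranges one after another rather than in parallel so that the counts are repeatable; it illustrates the pruning effect and is not a timing model.

```java
// Hypothetical sketch: explore the perm[0] ranges one after another, either
// with a private lower limit per range (like the forked processes) or with one
// limit carried across ranges (like the threads' shared lowerlimit).
class PruningSplit {
    final int[][] benefit;
    final int size;
    int lowerlimit;   // limit used while exploring the current range
    long nodes;       // partial permutations examined

    PruningSplit(int[][] benefit) {
        this.benefit = benefit;
        this.size = benefit.length;
    }

    // Better of the diagonal and anti-diagonal permutations.
    int seed() {
        int d = 0, a = 0;
        for (int r = 0; r < size; r++) {
            d += benefit[r][r];
            a += benefit[r][size - 1 - r];
        }
        return Math.max(d, a);
    }

    // Returns {best benefit found, partial permutations examined}.
    long[] run(int nparts, boolean shared) {
        nodes = 0;
        int best = seed();
        int[] vect = new int[size];
        for (int i = 0; i < size; i++) vect[i] = i;
        for (int p = 0; p < nparts; p++) {
            int lo = size * p / nparts, hi = size * (p + 1) / nparts;
            lowerlimit = shared ? best : seed();   // fork-style ranges restart
            for (int k = lo; k < hi; k++) {
                swap(vect, 0, k);
                explore(1, vect, benefit[0][vect[0]]);
                swap(vect, 0, k);                    // undo swap
            }
            best = Math.max(best, lowerlimit);
        }
        return new long[] { best, nodes };
    }

    private int colmaxsum(int start, int[] vect) {
        int sum = 0;
        for (int k = start; k < size; k++) {
            int colmax = benefit[start][vect[k]];
            for (int row = start + 1; row < size; row++)
                colmax = Math.max(colmax, benefit[row][vect[k]]);
            sum += colmax;
        }
        return sum;
    }

    private void explore(int index, int[] vect, int fixed) {
        nodes++;
        if (index == size) {
            if (fixed > lowerlimit) lowerlimit = fixed;
            return;
        }
        for (int k = index; k < size; k++) {
            swap(vect, index, k);
            int prefix = fixed + benefit[index][vect[index]];
            if (prefix + colmaxsum(index + 1, vect) > lowerlimit)
                explore(index + 1, vect, prefix);
        }
        int hold = vect[index];                  // undo the right rotation
        for (int k = index + 1; k < size; k++) vect[k - 1] = vect[k];
        vect[size - 1] = hold;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Both modes reach the same optimum, but the carried-over limit can never examine more partial permutations than the restarted one: any subtree it prunes has an upper bound no larger than a value already found, so that subtree could not have raised the per-range limit either.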
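For experimenting with the self-scheduling variant being timed here, a compact Java sketch follows (hypothetical class SelfScheduled). getproblem() follows the paper's version; the synchronized update() method, the Runnable lambdas used in place of the Compute subclass, and the volatile modifier on lowerlimit are assumptions of this sketch. Reading the limit outside the lock is safe here because an out-of-date value can only cause less pruning, never a wrong answer; the improvement itself is re-checked under the lock.

```java
// Hypothetical sketch: self-scheduling worker threads that share one lower
// limit. getproblem() follows the paper; update(), the lambdas and the
// volatile modifier are assumptions of this sketch.
class SelfScheduled {
    final int[][] benefit;
    final int size;
    final int[] pending;          // permutation state being doled out
    int knext = 0;                // next value to rotate into pending[0]
    volatile int lowerlimit;      // shared; read without locking
    int[] solution;               // best permutation; guarded by 'this'

    SelfScheduled(int[][] benefit) {
        this.benefit = benefit;
        size = benefit.length;
        pending = new int[size];
        int[] anti = new int[size];
        int d = 0, a = 0;
        for (int r = 0; r < size; r++) {
            pending[r] = r;
            anti[r] = size - 1 - r;
            d += benefit[r][r];
            a += benefit[r][size - 1 - r];
        }
        // Seed with the better of the diagonal and anti-diagonal permutations.
        lowerlimit = Math.max(d, a);
        solution = (d >= a) ? pending.clone() : anti;
    }

    // Only one thread at a time; false once every perm[0] value is handed out.
    synchronized boolean getproblem(int[] work) {
        if (knext >= size) return false;
        swap(pending, 0, knext++);
        System.arraycopy(pending, 0, work, 0, size);
        return true;
    }

    // Only one thread at a time may replace the best solution.
    synchronized void update(int value, int[] perm) {
        if (value > lowerlimit) {
            solution = perm.clone();
            lowerlimit = value;
        }
    }

    synchronized int[] best() { return solution.clone(); }

    int solve(int nthreads) {
        Thread[] engine = new Thread[nthreads];
        for (int t = 0; t < nthreads; t++) {
            engine[t] = new Thread(() -> {
                int[] perm = new int[size];
                while (getproblem(perm))
                    explore(1, perm, benefit[0][perm[0]]);
            });
            engine[t].start();
        }
        for (Thread t : engine) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return lowerlimit;
    }

    private int colmaxsum(int start, int[] vect) {
        int sum = 0;
        for (int k = start; k < size; k++) {
            int colmax = benefit[start][vect[k]];
            for (int row = start + 1; row < size; row++)
                colmax = Math.max(colmax, benefit[row][vect[k]]);
            sum += colmax;
        }
        return sum;
    }

    private void explore(int index, int[] vect, int fixed) {
        if (index == size) {
            if (fixed > lowerlimit) update(fixed, vect);   // re-checked under lock
            return;
        }
        for (int k = index; k < size; k++) {
            swap(vect, index, k);
            int prefix = fixed + benefit[index][vect[index]];
            if (prefix + colmaxsum(index + 1, vect) > lowerlimit)
                explore(index + 1, vect, prefix);
        }
        int hold = vect[index];                            // undo right rotation
        for (int k = index + 1; k < size; k++) vect[k - 1] = vect[k];
        vect[size - 1] = hold;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Calling new SelfScheduled(b).solve(4) on the 3x3 matrix {{5, 9, 1}, {10, 3, 2}, {8, 7, 4}} should return 23 however the threads interleave; when several permutations tie for the maximum, which one best() reports may vary from run to run.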
In the following table the wall-clock time required for solutions is shown for the pure sequential program and then for the two-process through eight-process parallel executions.

[Table: Unix C Fork Implementation. Columns: Sequential, 2 processes through 8 processes]

When the permutation vectors are divided among more processes, the decision tree pruning in each process is based only on the highest-value permutation within its own set, and (as is usual for parallel processing) the total time required comes from the slowest process. Because the three data sets are generated randomly, the permutation values are also randomly distributed. It would appear that in the 32x32a data set the seven-way split generates a subproblem with very little decision tree pruning.

The Java threaded implementations, however, do not have the same problem as the C fork one: each thread has access to the shared static variable (lowerlimit) that drives the decision tree pruning, so that a high-valued permutation discovered by one thread provides pruning for all other threads as well. The first approach discussed was the one in which all threads process approximately the same number of permutations, very similar to the C fork version but with Java threads and shared memory. We do observe speed-up rather than slow-down in this case.

[Table: Java Threads, Equal Number of Permutations. Columns: Sequential, 2 processes through 8 processes]

Given the random distribution of permutation values, there is also a random distribution of speed-up ratios. In general, though, it appears that adding processes tends to speed things up. One caveat: it is the hyperthreading that makes the four processors appear to be eight. In pure compute-bound benchmarks, the speed-up ratios from one to four processes are better than the ratios from five to eight processes.

In the self-scheduling approach, the threads do not necessarily process the same number of permutations, but instead process individual permutations until all have been computed.
This still allows for a random case in which, at the end, a single thread is processing the last permutation set, again delaying completion.

[Table: Java Threads, Self-Scheduling Permutations. Columns: Sequential, 2 processes through 8 processes]

From these limited experimental data, neither of the Java thread implementations is clearly better than the other.

4. BRANCH-AND-BOUND PREVIEW

The preferred method for solving this problem is through Branch-and-Bound. That will be covered in a subsequent paper. The power of the best-fit-first approach can be seen in the following experimental results:

[Table: Java Threads, Branch-and-Bound Version. Columns: Sequential, 2 processes through 8 processes]

5. SUMMARY

In this case study, the linear assignment problem is brought into parallel solution by means of backtracking. In the process, the C fork approach revealed that there is dependence among the individual solutions in the extent to which decision tree pruning is possible, a problem that does not affect the Java threads approach. A second paper will continue the case study with the solution of this problem through the preferred branch-and-bound algorithmic strategy.

6. WEB RESOURCE

This page provides access to this paper and to the subsequent paper on Branch-and-Bound. It also provides
access to the programs discussed above and an Excel workbook giving the results of the numerical experiments that produced the above tables.

ACKNOWLEDGEMENTS

These results were obtained using equipment within the Computer Science Department at Eastern Washington University.

REFERENCES

[1] Gilles Brassard and Paul Bratley, Fundamentals of Algorithmics (Prentice-Hall, Inc., 1996), pp. See also Anany Levitin, Introduction to the Design & Analysis of Algorithms (2nd ed.; Pearson Education, Inc., 2007), pp. 116, 118.

[2] The complete chapter on backtracking from Sartaj Sahni's Data Structures, Algorithms, and Applications in Java (Silicon Press, 2004) is at

[3] Timothy Rolfe, "Backtracking Algorithms," Dr. Dobb's Journal, Vol. 29, No. 5 (May 2004), pp. 48, Text of the article is available through Source code is available through ftp:// /sourcecode/ddj/2004/0405.zip

[4] Timothy Rolfe, "Bargain-Basement Parallelism," Dr. Dobb's Journal, Vol. 28, No. 2 (February 2003), pp. 46, 48, 50. Text of the article is available through Source code is available through ftp:// /sourcecode/ddj/2003/0302.zip