A Genetic Algorithm for the Number Partitioning Problem

Jordan Junkermeier
Department of Computer Science, St. Cloud State University, St. Cloud, MN 56301 USA

Abstract

The Number Partitioning Problem (NPP) is an NP-hard problem of combinatorial optimization in which a set of positive integers must be partitioned into two subsets such that the sums of the subsets are as equal as possible (Hayes, 2002). The longest processing time heuristic is a known greedy approach to solving this problem. In this paper, a Genetic Algorithm (GA) is presented as a solution to the NPP and is compared to this greedy heuristic. The GA encodes candidate solutions as binary strings, uses k-tournament selection to choose parent chromosomes, and uses two-point crossover and probabilistic bitwise mutation as operators on the encodings. The two algorithms are compared on identical input sets, and the results show that the genetic algorithm produced superior solutions to those of the greedy heuristic. However, the results of the genetic algorithm are probabilistic, in that each successive execution of the genetic program may produce a different overall best partition.

1. The Problem

1.1. Background

The Number Partitioning Problem (NPP), sometimes referred to as "the easiest hard problem" (Hayes, 2002; Mertens, 2003), is an NP-hard problem of combinatorial optimization. Given a set of positive integers, the problem is to partition the integers into two subsets such that the sums of the subsets are as equal as possible (Hayes, 2002). Formally, given a set $a_1, a_2, \ldots, a_n$ of positive integers, find a partition $\beta \subseteq \{1, \ldots, n\}$ such that $E(\beta) = \left| \sum_{i \in \beta} a_i - \sum_{i \notin \beta} a_i \right|$ is minimized (Mertens, 2003). A perfect partition is a partition in which the sums of the two subsets are equal, such that $E = 0$ (Mertens, 2003). A perfect partition is always desired, yet not all sets have a perfect partition, as shown in the second example below.

The following examples demonstrate the Number Partitioning Problem. Take the set of positive integers A = {1, 2, 3, 4}. The optimal partition for this set forms subsets A_1 = {2, 3} and A_2 = {1, 4}, such that E(A_1) = 0; a perfect partition. In another, less favorable example, the set B = {4, 6} can, at best, be partitioned to form subsets B_1 = {4} and B_2 = {6}, such that E(B_1) = 2. While this is not a perfect partition, it is still the optimal partition, since the difference between the sums is minimized. In this example, a perfect partition is not possible.

1.2. Complexity

The computational complexity of the Number Partitioning Problem depends on the type of numbers in the input set A = {a_1, a_2, ..., a_N}. If each a_i is a positive integer bounded by a constant B, then the difference E between the partitioned subsets can take at most NB different values, such that the search space is $O(NB)$ instead of $O(2^N)$. This is known as pseudo-polynomiality (Mertens, 2003). However, a typical input set is comprised of independently and identically distributed random numbers, such that the minimal difference $E_1$ is a stochastic variable with median value $O(\sqrt{N}\, 2^{-N})$ (Karmarkar et al., 1986). Surprisingly, heuristic algorithms for the NPP are of poor quality (Johnson et al., 1991; Ruml et al., 1996). The differencing method is the best polynomial time heuristic, which, for an input set A of real-valued a_i, yields discrepancies $O(N^{-\alpha \log N})$ for some positive constant $\alpha$ (Yakir, 1996). Due to the NP-hardness of the NPP, for any input set A bounded by $B = 2^{\kappa N}$, the worst-case complexity of any exact algorithm is exponential in N for all $\kappa > 0$ (Mertens, 2003).

1.3. Applications

The Number Partitioning Problem can be applied to areas including task scheduling (Mertens, 2003), public key cryptography (Merkle, 1978; Mertens, 2003), and team selection for sporting events (Hayes, 2002). Specifically, the NPP can be utilized in multiprocessor scheduling and VLSI circuit size and delay minimization (Coffman & Lueker, 1991; Tsai, 1992).
Additionally, the Number Partitioning Problem is one of the six basic NP-hard problems that are fundamental to the theory of NP-completeness (Garey & Johnson, 1997; Mertens, 2002) and is often used in NP-completeness proofs for problems such as knapsack problems, quadratic programming, bin packing, and multiprocessor scheduling (Mertens, 2003).

2. The Greedy Heuristic

Because the Number Partitioning Problem is NP-hard, exact solutions with known algorithms are only possible for small problem instances (Pedroso & Kubo, 2008). Therefore, the idea of an exact solution should be abandoned and approximate heuristic algorithms should be implemented instead (Mertens, 2003). One such heuristic is the longest processing time heuristic, commonly used in the multiprocessor scheduling problem (Pedroso & Kubo, 2008). In this algorithm, the largest number in the original set is placed into one of the two subsets. The largest remaining number is then placed into the subset that has the smaller total sum. This continues until all numbers have been assigned to a subset. The aim of this heuristic is to keep the sum discrepancy as small as possible with each successive decision (Mertens, 2003). Figure 1 shows the heuristic applied to the set A = {4, 5, 6, 7, 8}.

    A = {4, 5, 6, 7, 8}

    Time
     |    A_1 = {8}          A_2 = { }
     |    A_1 = {8}          A_2 = {7}
     |    A_1 = {8}          A_2 = {7, 6}
     |    A_1 = {8, 5}       A_2 = {7, 6}
     v    A_1 = {8, 5, 4}    A_2 = {7, 6}

    Final sum discrepancy = 4

Figure 1. The longest processing time heuristic applied to the set A = {4, 5, 6, 7, 8}.

For this instance, the heuristic produces a partition with a sum discrepancy of four. The optimal partition for the set in the above example is {7, 8} {4, 5, 6}, a perfect partition with a sum discrepancy of zero. In addition to the greedy heuristic's failure to produce the optimal partition, the partition {6, 8} {4, 5, 7}, with a discrepancy of two, was also missed. In short, while this greedy heuristic may be acceptable, it is not ideal.

This algorithm's time complexity is O(N log N), the time complexity of sorting N numbers (Mertens, 2003). As in the example depicted in Figure 1, the worst situation arises when the sums of the two subsets are equal just before the last insertion. In this case, the final discrepancy will necessarily be equal to the last number inserted.
This is a motivation for the assignment of numbers in decreasing order, which gives the scaling $O(N^{-1})$ of the result for real-valued $a_j$ (Mertens, 2003).

3. The Genetic Algorithm

In addition to greedy heuristics, genetic algorithms (GAs) can also be used to produce adequate solutions to NP-hard problems. Genetic algorithms are heuristics based on biological evolution that simulate reproduction with variation and selection according to fitness, like that of a true biological population (Julstrom, 2015). The genetic algorithm created to solve the NPP is implemented in the C# programming language and follows the general structure of a typical genetic algorithm, shown in Figure 2 (Julstrom, 2015). In this GA, the program iterates through a set number of generations and then halts, reporting the best overall solution.

    Generate random initial population;
    While (not done) {
        For i = 1 to population size {
            Select two parents;
            Crossover to produce an offspring;
            Mutate the offspring;
            Insert offspring into new generation;
        }
        Offspring replace parents;
        Report the best solution in the population;
    }
    Report the best overall solution;

Figure 2. The general structure of a genetic algorithm.

3.1. Encoding Candidate Solutions

In a GA designed for the Number Partitioning Problem, candidate solutions can be represented as binary strings, parallel to an array of the input numbers, such that each character in the binary string represents one number in the input array. Each number in the array is represented by the character in the binary string whose position in the string is the same as that number's index in the array. For each binary character in a candidate solution, if the character's value is 0, the corresponding number is a member of the first subset in the partition. Otherwise (the binary character's value is 1), the number belongs to the second subset. This relationship between candidate solutions and their binary string encodings is illustrated in Figure 3. A Chromosome struct with data members genome and fitness holds each candidate solution and its associated fitness, respectively. A population array holds every Chromosome in the population.
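The encoding described above can be sketched in a few lines. This is a minimal illustration in Python rather than the C# of the paper's implementation, and the function name `decode` is hypothetical; it only assumes the stated convention that a 0 selects the first subset and a 1 selects the second.

```python
def decode(genome, numbers):
    """Split `numbers` into two subsets according to a binary-string genome.

    A '0' at position i places numbers[i] in the first subset and a '1'
    places it in the second, following the convention stated in the text.
    """
    if len(genome) != len(numbers):
        raise ValueError("genome and input array must have the same length")
    first = [n for bit, n in zip(genome, numbers) if bit == "0"]
    second = [n for bit, n in zip(genome, numbers) if bit == "1"]
    return first, second


numbers = [4, 5, 6, 7, 8]
first, second = decode("00011", numbers)
print(first, second)  # [4, 5, 6] [7, 8]
```

Because the two subsets of a partition are interchangeable, a genome and its bitwise complement describe the same partition.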

    A = {4, 5, 6, 7, 8}

    Chromosome i = 11001    Partition i = {4, 5, 8} {6, 7}
    Chromosome j = 11100    Partition j = {4, 5, 6} {7, 8}
    Chromosome k = 00000    Partition k = { } {4, 5, 6, 7, 8}

Figure 3. Three chromosomes and their corresponding partitions for the binary string encoding of set A = {4, 5, 6, 7, 8}.

3.2. Fitness

In this genetic algorithm, a chromosome's fitness is equal to the sum discrepancy between the two subsets that its binary string encoding represents. Therefore, fitness should be minimized, such that chromosomes with smaller fitnesses are better solutions. The fitness for each chromosome is also stored within the Chromosome struct. Fitnesses are calculated by taking the absolute value of the difference between the two sums the chromosome represents. The characters in the binary string encoding are iterated through and the represented numbers are added to the appropriate subset sum. The resulting difference becomes that chromosome's fitness.

At the end of each generation, once the offspring chromosomes replace the parent chromosomes, the chromosome with the smallest fitness is reported as program output. This chromosome is also compared to the overall best chromosome found so far. If the best chromosome in the current population has a smaller fitness than the overall best, the local best chromosome becomes the overall best. At the end of the program's execution, the overall best chromosome and its fitness are reported as output.

3.3. Selection

The genetic algorithm uses k-tournament selection to determine which chromosomes in the population will become parents, with k = 2 for the problem instances used during testing and during comparison with the greedy heuristic. To determine a parent, an array of size k of candidate parents is initialized, and chromosomes are randomly chosen from the population and added to the array until it is full. The candidates' fitnesses are then compared, and the chromosome with the smallest fitness becomes the parent.

3.4. Crossover

Crossover in the genetic algorithm is accomplished via two-point crossover, in which two indices, X_1 and X_2, of the binary string encoding are chosen, and each of the two parent chromosomes is cut at those indices. The offspring chromosome is created by taking the characters at indices [0, X_1) from the first parent, appending the characters at indices [X_1, X_2) from the second parent, and appending the characters at index X_2 onward from the first parent. This process is illustrated in Figure 4. In this particular genetic algorithm, X_1 and X_2 occur approximately one-third and two-thirds of the way through the chromosome, respectively.

[Figure 4: bit strings for parent 0, parent 1, and the resulting offspring, with the cut points X_1 and X_2 marked.]

Figure 4. An example of two-point crossover in a genetic algorithm.

3.5. Mutation

After an offspring chromosome has been created, a probabilistic bitwise mutation is performed on that chromosome's binary string encoding. For each character in the binary string, there is a 1% chance that the character will be flipped. For each flip, an existing 1 becomes a 0, and an existing 0 becomes a 1. This way, each new chromosome that enters the population still maintains some heritability from its parents, while also allowing for the introduction of new traits into the population.

4. Comparison of Algorithms

In this section, several problem instances of the NPP are described, and the results of both the genetic algorithm and the longest processing time greedy heuristic on those instances are stated and compared. For these tests, both of the algorithms have been implemented in the C# programming language and run as console applications.

4.1. A Small Test Instance

The set A = {4, 5, 6, 7, 8} was used above in Figure 1 to illustrate the longest processing time heuristic. Therefore, it was the first NPP problem instance used in the algorithm comparison. The greedy heuristic produced the partition {8, 5, 4} {7, 6} with sum discrepancy E = 4. When A was used as input to the genetic program with a population size of 1 and 5 generations, the perfect partition {4, 5, 6} {7, 8} with E = 0 was achieved.

It should be noted that, unlike the greedy heuristic, which always produces the same end partition for a given input set, the results of the genetic algorithm are probabilistic, in that each successive run of the program may produce a different overall best partition. In the previous example, the perfect partition was achieved on the first execution of the program. Subsequent executions yielded equivalent results. See Table 1 and Table 2 for more details.

While this example may be trivial, it demonstrates the comparative effectiveness of the genetic algorithm on a basic level. Additionally, this input set was the input used to initially test the correctness of the genetic program.

4.2. Additional Comparisons

For another demonstration, an input set of 1 random integers in the range [1, 1] was generated and used as input to both of the programs. As Table 1 shows, the genetic program again produced the better solution, even on the first program execution. On its first execution, the genetic program formed a partition with a sum discrepancy of 1, while the greedy program only managed to produce a partition with a discrepancy of 58. Moreover, whenever the genetic program was executed multiple times, perfect partitions were achieved. However, the greedy program (using Quicksort as the sorting mechanism) had a total execution time of ms, while a single execution of the genetic program lasted 1ms. Execution time summary statistics are detailed in Table 2.

Similarly, an input set of 5 random integers in the range [1, 1] was also generated and used as input. The results are shown in Table 1 and Table 2.

Table 1. Summary Statistics for the Comparison of the Longest Processing Time greedy heuristic and the proposed Genetic Algorithm (with 5 generations)

    Input Set: {4, 5, 6, 7, 8}; 1 [1, 1]; 5 [1, 1]
    Algorithm   (1 execution)   (25 executions)   (5 executions)   (1 executions)
    Greedy 58 58 58 58
    Genetic (population = 1)
    Genetic (population = 1) 1
    Greedy 1 1 1 1
    Genetic (population = 1) 2

Table 2. Program Execution Time Summary Statistics for the Comparison of the Longest Processing Time greedy heuristic and the proposed Genetic Algorithm (with 5 generations)

    Input Set: {4, 5, 6, 7, 8}; 1 [1, 1]; 5 [1, 1]
    Algorithm   Execution Time (1 execution) (milliseconds)
    Greedy 2
    Genetic (population = 1) 19
    Genetic (population = 1) 1
    Genetic (population = 1) 1,17
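The two procedures compared in this section can be sketched together as follows. This is a hedged illustration in Python, not the paper's C# programs: all function names are hypothetical, ties in the greedy step go to the first subset, tournament candidates are drawn with replacement, and the default population size, generation count, and mutation rate are illustrative values rather than the settings used in the experiments.

```python
import random


def lpt_greedy(numbers):
    """Longest processing time heuristic: largest first, into the lighter subset."""
    first, second = [], []
    for n in sorted(numbers, reverse=True):
        target = first if sum(first) <= sum(second) else second
        target.append(n)
    return first, second


def fitness(genome, numbers):
    """Sum discrepancy between the '0' subset and the '1' subset (smaller is better)."""
    sum0 = sum(n for bit, n in zip(genome, numbers) if bit == "0")
    sum1 = sum(n for bit, n in zip(genome, numbers) if bit == "1")
    return abs(sum0 - sum1)


def tournament(population, numbers, k, rng):
    """k-tournament selection: the fittest of k randomly chosen chromosomes."""
    candidates = [rng.choice(population) for _ in range(k)]
    return min(candidates, key=lambda g: fitness(g, numbers))


def two_point_crossover(parent0, parent1):
    """Cut at about one-third and two-thirds; the middle comes from parent1."""
    x1, x2 = len(parent0) // 3, 2 * len(parent0) // 3
    return parent0[:x1] + parent1[x1:x2] + parent0[x2:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in genome
    )


def genetic_partition(numbers, pop_size=20, generations=50, k=2, rate=0.01, seed=None):
    """Generational GA; returns the best genome seen and its fitness."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice("01") for _ in numbers) for _ in range(pop_size)
    ]
    best = min(population, key=lambda g: fitness(g, numbers))
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent0 = tournament(population, numbers, k, rng)
            parent1 = tournament(population, numbers, k, rng)
            child = two_point_crossover(parent0, parent1)
            offspring.append(mutate(child, rate, rng))
        population = offspring  # offspring replace parents
        generation_best = min(population, key=lambda g: fitness(g, numbers))
        if fitness(generation_best, numbers) < fitness(best, numbers):
            best = generation_best
    return best, fitness(best, numbers)


numbers = [4, 5, 6, 7, 8]
first, second = lpt_greedy(numbers)
print("greedy:", first, second, abs(sum(first) - sum(second)))
# greedy: [8, 5, 4] [7, 6] 4
genome, discrepancy = genetic_partition(numbers, seed=1)
print("genetic:", genome, discrepancy)
```

Passing a fixed `seed` makes a single run repeatable; leaving it as `None` reproduces the run-to-run variation discussed above, so the genetic result should be read as one sample rather than a guaranteed optimum.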

5. Conclusion

From the results collected from the test problem instances described in Section 4, Comparison of Algorithms, and shown in Table 1, the genetic algorithm produced superior solutions to those of the greedy heuristic on the instances tested. However, the results of the genetic algorithm are probabilistic, in that each successive execution of the genetic program may produce a different overall best partition. This depends on the random initial population, the population size, the randomly selected parents, random mutation, and more. Therefore, there is a chance that the genetic program will not produce the optimal partition for a given input set. To increase the chances of producing a superior final partition, subsequent executions of the program should be performed, as demonstrated in this paper. Additionally, the number of offspring generations in each program execution may be altered for different results. The major difference between altering the number of program executions and altering the number of generations is that each generation has some heritability from its parent chromosomes, while the initial generation in a subsequent program execution is randomized, such that there is no heritability from one execution to the next.

As detailed in Table 2, the genetic algorithm's superior solutions are contrasted by its inferior execution time. Execution time could possibly be reduced through minor, more efficient alterations to the algorithm.

In conclusion, this genetic algorithm can be used to generate adequate solutions to instances of the Number Partitioning Problem, whereas exact solutions to the problem would be computationally hard to produce, and solutions provided by the longest processing time heuristic are inferior.

References

Coffman, E. & Lueker, G. S. (1991). Probabilistic Analysis of Packing and Partitioning Algorithms. John Wiley & Sons. New York.
Garey, M. R. & Johnson, D. S. (1997). Computers and Intractability. A Guide to the Theory of NP-Completeness. W.H. Freeman. New York.
Hayes, B. (2002). The Easiest Hard Problem. American Scientist. 90, 113.
Johnson, D. S., Aragon, C. R., McGeoch, L. A., & Schevon, C. (1991). Operations Research. 39, 378.
Julstrom, B. (2015). Evolutionary Computation. Lecture Notes.
Karmarkar, N., Karp, R. M., Lueker, G. S., & Odlyzko, A. M. (1986). J. Appl. Prob. 23, 626.
Merkle, R. C. & Hellman, M. E. (1978). IEEE Transactions on Information Theory. 24, 525.
Mertens, S. (2002). Computing in Science and Engineering. 4, 31.
Mertens, S. (2003). The Easiest Hard Problem: Number Partitioning. Magdeburg, Germany.
Pedroso, J. P. & Kubo, M. (2008). Heuristics and Exact Methods for Number Partitioning. Technical Report Series: DCC-2008-03.
Ruml, W., Ngo, J., Marks, J., & Shieber, S. (1996). Journal of Optimization Theory and Applications. 89, 251.
Tsai, L.-H. (1992). SIAM J. Comput. 21, 59.
Yakir, B. (1996). Math. Oper. Res. 21, 85.