Subset Sum - A Dynamic Parallel Solution


Team Cthulu - Project Report

Tushar Iyer
Rochester Institute of Technology
Rochester, New York
txi9546@rit.edu

Aziel Shaw
Rochester Institute of Technology
Rochester, New York
ats9095@rit.edu

ABSTRACT
The subset sum problem is an NP-Complete problem which, depending on the size of the set and the target sum, can take a long time to solve. This paper presents a student team's research project on speeding up the Subset Sum computation by performing it in parallel using PJ2. We discuss the reasons for picking this topic, then the design of our sequential and parallel programs. We then present the results of the project as strong and weak scaling measurements, state our hypothesis for why the scaling is not ideal, and conclude with our future plans and the lessons learned over the course of the project.

1 TOPIC AREA
Our chosen problem falls under the umbrella of NP-Complete problems. An NP-Complete problem is one that is both NP-Hard and in NP. The problem can be solved in pseudo-polynomial time, and until the P = NP question is settled it is unknown whether a polynomial time solution can exist. In this problem area, the time taken to find a solution increases in proportion to the number of inputs multiplied by the sum of their values, so the computational effort grows rapidly with the size of the input. Our approach to this growth in computation size and running time is to rewrite the solution in parallel: by splitting the computation amongst multiple cores, we should be able to reduce the time taken to find a solution.

2 COMPUTATIONAL PROBLEM
The computational problem we aim to solve is Subset Sum. Subset Sum asks: given a set X and a target sum Y, does there exist a subset Z of X such that the sum of all elements in Z equals Y? We build our solution around a restricted version of the problem in which all elements of X, as well as the target Y, must be positive integers greater than zero. Furthermore, we consider the solution of Subset Sum that uses Dynamic Programming. We chose the dynamic algorithm because, by definition, a dynamic algorithm simplifies a problem by dividing it into smaller sub-parts which are more computationally manageable. This matches how we would normally approach building a parallel solution for any other computational problem, and it can help provide a boost in computational performance. We are using the Parallel Java 2 library (PJ2) [2] to build our solution. This library features a specific syntax which allows a for loop to be executed in parallel; since the dynamic algorithm for the Subset Sum problem features multiple for loops, we hope to take advantage of this structure and this library to obtain a significant speedup.
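
Concretely, the dynamic algorithm fills a boolean table. The recurrence is implied by the description above rather than written out in the report; in our notation, let T[i][j] be true exactly when some subset of the first i elements x_1, ..., x_i of X sums to j. Then

\[
T[i][0] = \text{true}, \qquad
T[i][j] = T[i-1][j] \;\lor\; \bigl( j \ge x_i \,\wedge\, T[i-1][j - x_i] \bigr),
\]

and the answer for the instance (X, Y) is T[n][Y], with n = |X|. Filling the table touches n × (Y + 1) cells, which is exactly the pseudo-polynomial running time described above.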

3 RESEARCH PAPER ANALYSIS

3.1 Research Paper One
Parallel Solution of the Subset-sum Problem: An Empirical Study [1] by Saniyah S. Bokhari [Ohio State University (2011)]

3.1.1 Problem. The paper parallelizes the subset sum problem on three different architectures (Cray XMT, IBM x3755, NVIDIA FX 5800), and the study resulted in two different parallel algorithms. The original hypothesis was that the performance gain would not be a fixed factor but would vary with the computational size of the problem. After creating the solution, Bokhari evaluates it on the three machines, discusses the performance, and confirms the hypothesis that performance varies.

3.1.2 Solution. All three architectures gave interesting results. The IBM x3755 scaled decently up to 8 cores. The Cray XMT scaled well up to 16 nodes, and its 128-bit processor was better suited to larger problem sizes. Unlike the other two machines, the Cray XMT also features word-locking, which means that no two threads can access the current word simultaneously; to take advantage of this, Bokhari [1] wrote a modified implementation that uses word-locking as part of the parallel solution. Most interestingly, the NVIDIA FX 5800 was not the best performer for medium-sized problems, but it performed well on smaller problem sizes, essentially performing decently whenever the problem fits in memory. Figure 1 shows the comparison of the performance of all three architectures.

3.1.3 Use in Our Project. The dynamic algorithm used for the Cray XMT is similar to the algorithm we used for both the sequential and parallel implementations in this project, and the study thereby served as a guideline for further development.

Figure 1: Performance of the three architectures [1]

3.2 Research Paper Two
Parallel implementation of the modified subset sum problem in CUDA [4] by Z. Ristovski, I. Mishkovski, S. Gramatikov and S. Filiposka [Telecommunications Forum Telfor (TELFOR)]

3.2.1 Problem. This paper presented a solution for a modified version of the Subset Sum problem, parallelized so that the solution takes advantage of a GPU to perform the computation necessary to find a solution. The CUDA design is presented before delving into the results.

3.2.2 Solution. Using a CUDA-enabled GPU, their solution to the Subset Sum problem showed that a GPU is highly effective, with processing speedups of up to a factor of 20. Ristovski et al. [4] discuss the current shift in the parallel programming paradigm as more and more work is done by a CPU and GPU working in tandem, so that each architecture is used for the advantages it uniquely offers, in pursuit of the greater goal of solving difficult problems with efficient parallel computing. Figure 2 compares the sequential and parallel implementations of Ristovski et al.'s [4] modified version of Subset Sum.

Figure 2: Sequential vs. parallel performance [4]

3.2.3 Use in Our Project. Upon researching this paper, we had considered using it to see whether parallelization through CUDA was feasible in our case, and later decided that we were more likely to produce a good result with a multicore parallel solution. We did, however, take away from this paper the effect of proper GPU computing and how it can shift the computational load from a CPU onto thousands of CUDA threads.

3.3 Research Paper Three
Parallel Implementation of the Modified Subset Sum Problem in OpenCL [3] by D. Petkovski and I. Mishkovski [ICT Innovations 2015 Web Proceedings]

3.3.1 Problem. This paper presented a solution for a modified version of the Subset Sum problem, parallelized in such a manner that memory allocation was made dynamic so that less memory would be required for a solution. It describes how the parallelization was done using OpenCL before discussing the resulting implementation.

3.3.2 Solution. The modified version of Subset Sum that Petkovski et al. [3] use takes three inputs, X, Y and Z. The solution works through the set to find all vectors of X elements in which each element is less than Y and the elements sum to the target Z. The main goal of this implementation was a working solution that also reduces the number of permutations run by each thread as much as possible. Another focus was to allocate memory only where absolutely necessary and to store no redundant results or computations, cutting down the cost of the problem. Petkovski et al. [3] present results which show significant speedup and also confirm the base hypothesis that Subset Sum scales with growth in the problem size. In OpenCL, an application is structured so that multiple data structures can interact with a host application. Unlike a C/CUDA program, which is locked to a device with a compatible NVIDIA GPU, OpenCL applications can run on hardware from a variety of manufacturers; this makes OpenCL a much more flexible choice when hardware limitations exist prior to building the application. Figure 3 depicts the structure of a generic OpenCL application with the data structures it interacts with.

Figure 3: Structure of a generic OpenCL application [3]

3.3.3 Use in Our Project. This algorithm was focused on memory efficiency, and we used it to help develop our own algorithm so that we, too, could attempt a solution that uses only the smallest amount of memory necessary to find an answer.

4 ALGORITHM DESIGN
This section covers the two main algorithms used in our implementation of the subset sum parallel program. The first is the sequential version of the algorithm, where we explain how we will parallelize it. The second is the actual parallelized version, along with a short explanation and our initial hypothesis.

4.1 Sequential Program

Algorithm 1 SubsetSumSeq
Ensure: Reports whether a subset adds up to the given sum.
 1: procedure SubsetSeq(set, sum)
 2:   for idx: 0 to n do
 3:     dynarray[idx][0] = TRUE            // a sum of 0 is always reachable
 4:   for idx: 1 to n do
 5:     curp = set[idx]
 6:     for idx2: 0 to setlen - 1 do       // copy the previous row forward
 7:       dynarray[idx][idx2] = dynarray[idx - 1][idx2]
 8:     for idx2: 0 to sum do              // extend sums reachable without set[idx]
 9:       newp = curp + idx2
10:       if getbit(dynarray[idx - 1], idx2) == 1 and newp <= sum then
11:         setbit(dynarray[idx], newp)

In this sequential program (Algorithm 1), we see only the fragment of the program that deals with the Subset Sum problem. By the time we reach this point, we have checked that the constructor argument was passed in, that the SubsetSpec argument was created, and that all bounds-checking has passed; passing these checks lets us continue with a high degree of confidence.

We instantiate our 2D dynamic table and start by initializing the first column of each row to True. This is one of the loops we will later parallelize; while it will not greatly help the performance of the subset search itself, it does reduce the time needed to initialize a large 2D table, which helps when we work with large set sizes. The second loop (line 4) will not be parallelized, since the loops within it carry a sequential dependency from one row to the next. The inner loops, however, can and will be parallelized, and this is where parallelization provides the biggest speedup: both inner loops are free of sequential dependencies. Parallelizing the outer loop instead would violate its sequential dependency and make the results incorrect. We keep track of the current index and, in both loops, build the 2D table in a bottom-up fashion, as is commonly done in dynamic programming. As is also typical of dynamic programming, the complexity of the algorithm is roughly proportional to the dimensions of the dynamic table, i.e. O(n × sum).

At the end, we check the final cell to see whether its value is True or False. This tells us whether a subset was found whose elements add up to the target sum. Note that there can be multiple subsets which total the target sum; a result of True only tells us that at least one of the potentially many subsets exists.

Figure 4: Sequential flow
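
As an illustration of Algorithm 1 in plain Java: a minimal sketch, assuming a simple boolean table rather than the bit-packed rows implied by getbit/setbit, and with class and method names that are ours rather than the report's:

public class SubsetSumSeqSketch {
    /** Returns true iff some subset of set sums exactly to sum. */
    static boolean subsetSumSeq(int[] set, int sum) {
        int n = set.length;
        // dyn[i][j] == true iff a subset of the first i elements sums to j.
        boolean[][] dyn = new boolean[n + 1][sum + 1];
        for (int i = 0; i <= n; ++i)
            dyn[i][0] = true;                      // the empty subset sums to 0
        for (int i = 1; i <= n; ++i) {
            int cur = set[i - 1];
            for (int j = 0; j <= sum; ++j)
                // Either j was already reachable without element i, or it is
                // reachable by adding element i to a smaller sum j - cur.
                dyn[i][j] = dyn[i - 1][j] || (j >= cur && dyn[i - 1][j - cur]);
        }
        return dyn[n][sum];                        // the final cell checked at the end
    }

    public static void main(String[] args) {
        int[] set = {3, 34, 4, 12, 5, 2};
        System.out.println(subsetSumSeq(set, 9));  // true: 4 + 5 = 9
    }
}

The explicit row-copy of lines 6-7 disappears here because each cell of row i is computed directly from row i - 1; that same property is what makes the inner loop safe to parallelize in Section 4.2.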

4.2 Parallel Program

Algorithm 2 SubsetSumSmp
Ensure: Reports whether a subset adds up to the given sum.
 1: procedure SubsetSmp(set, sum)
 2:   parallelfor idx: 0 to n do
 3:     dynarray[idx][0] = TRUE
 4:   end
 5:   for idx: 1 to n do                   // sequential: row idx depends on row idx - 1
 6:     curp = set[idx]
 7:     parallelfor idx2: 0 to setlen - 1 do
 8:       dynarray[idx][idx2] = dynarray[idx - 1][idx2]
 9:     end
10:     parallelfor idx2: 0 to sum do
11:       newp = curp + idx2
12:       if getbit(dynarray[idx - 1], idx2) == 1 and newp <= sum then
13:         setbit(dynarray[idx], newp)
14:     end

As with the sequential program, we see only the fragment of the program that deals with the Subset Sum problem. By the time we reach this point, we assume that the constructor argument has been passed in, the SubsetSpec argument has been created and all bounds-checking has passed. This fragment and the program flow diagram look almost identical to the sequential version, except that the initial for loop and the two inner for loops have now been made into parallelfor loops. These loops run in parallel, and the results are written into the 2D dynamic table as before. At the end, we check the final cell to see whether its value is True or False, which tells us whether or not a subset was found.

Figure 5: Parallel flow

4.2.1 Initial Hypothesis. We set out on this project with a couple of ideas in mind about how this problem would behave. We believe that parallelization will help, but only to an extent; beyond that point, we expect to see slower running times.
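
The report's parallel loops use PJ2's parallelfor construct. As a library-free stand-in, the same structure can be sketched with java.util.stream parallel streams (a sketch under that substitution; names are ours, and the real code uses PJ2 [2]):

import java.util.stream.IntStream;

public class SubsetSumSmpSketch {
    /** Parallel DP sketch: the row loop stays sequential (row i depends on
        row i - 1); the column loop runs in parallel and is race-free because
        each iteration writes a distinct cell of row i and only reads row i - 1. */
    static boolean subsetSumSmp(int[] set, int sum) {
        int n = set.length;
        boolean[][] dyn = new boolean[n + 1][sum + 1];
        // Parallel initialization of column 0, mirroring the first parallelfor.
        IntStream.rangeClosed(0, n).parallel().forEach(i -> dyn[i][0] = true);
        for (int i = 1; i <= n; ++i) {      // sequential dependency: keep serial
            final int row = i, cur = set[i - 1];
            IntStream.rangeClosed(0, sum).parallel().forEach(j ->
                dyn[row][j] = dyn[row - 1][j] || (j >= cur && dyn[row - 1][j - cur]));
        }
        return dyn[n][sum];
    }

    public static void main(String[] args) {
        int[] set = {3, 34, 4, 12, 5, 2};
        System.out.println(subsetSumSmp(set, 30));  // false: no subset sums to 30
    }
}

With a plain boolean row there are no shared words to update concurrently, which sidesteps the thread synchronization the report later identifies as a scaling overhead for its bit-packed rows.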

5 MANUALS

5.1 Developer Manual
For this project all our code was tested on the tardis computer provided by Dr. Kaminsky. Compiling the program requires first making sure that it is compiled with Java 1.7 and that the PJ2 [2] distribution is properly included in the class path.

To compile with Java 1.7, first set the path to include the 1.7 SDK. This can be done in two ways, depending on the shell:

For the bash shell:
export PATH=/usr/local/dcs/versions/1.7.0_51/bin:$PATH

For the csh shell:
setenv PATH /usr/local/dcs/versions/1.7.0_51/bin:$PATH

To compile against PJ2 [2], set the class path to include the PJ2 distribution:

For the bash shell:
export CLASSPATH=.:/var/tmp/parajava/pj2/pj2.jar

For the csh shell:
setenv CLASSPATH .:/var/tmp/parajava/pj2/pj2.jar

After this, make a directory build in which all the compiled class files will be stored:
mkdir build

Now compile the Java class files:
javac -d ./build *.java

This compiles all files with a .java extension and places the class files in the build directory. Change into that directory with cd build and then build the final jar:
jar cvf proj.jar *

This packages all the class files into a single jar called proj.jar. When running the program with the PJ2 launcher, the user must give the name of this jar file in the command line arguments.

5.2 User Manual
In order to run the Subset Sum program on tardis you will first have to set the path to include the Java 1.7 SDK and the class path to include the PJ2 [2] distribution, as above:

For the bash shell:
export PATH=/usr/local/dcs/versions/1.7.0_51/bin:$PATH
export CLASSPATH=.:/var/tmp/parajava/pj2/pj2.jar

For the csh shell:
setenv PATH /usr/local/dcs/versions/1.7.0_51/bin:$PATH
setenv CLASSPATH .:/var/tmp/parajava/pj2/pj2.jar

Now you are ready to run the Subset Sum program. The program takes a single string argument which is a constructor for a SubsetSpec class. The various SubsetSpec classes were used to generate different sets for the Subset Sum problem and were used for testing. The main spec class is called RandomSet; its arguments define the bounds by which the pseudo-random number generator constructs the set for the program.

The various spec constructors are:

RandomSet - creates a set of random numbers [main spec object designed for this program]
LinearSet - creates an increasing set built by a step counter
SameSet - creates a set of length size of one repeating number [used only for testing]
FibonacciSet - creates a set of the first size numbers in the sequence [made for fun only, as a large enough set allows any number to be found as a sum]

The command line structure uses PJ2 [2] as the launcher for the program, and it takes multiple arguments:

(1) Name of the jar [Required]
(2) Makespan argument to print out the runtime [Optional]
(3) Class name (SubsetSumSeq or SubsetSumSmp) [Required]
(4) SubsetSpec constructor in quotes [Required]

Below is an example of the full command line for both the sequential and parallel versions of this program using the RandomSet constructor:

java pj2 jar=proj.jar debug=makespan SubsetSumSeq "RandomSet(<lb>,<ub>,<seed>,<size>,<sum>)"
java pj2 jar=proj.jar debug=makespan cores=<k> SubsetSumSmp "RandomSet(<lb>,<ub>,<seed>,<size>,<sum>)"

NOTE: each command should be written on one line; arguments wrap here only because of report formatting.

The arguments inside the RandomSet(lb,ub,seed,size,sum) constructor break down as follows:

<lb> - Lower bound integer greater than zero
<ub> - Upper bound integer greater than the lower bound
<seed> - Random integer seed for the PRNG
<size> - Positive integer size for the global set
<sum> - Positive integer target sum for the problem

Below is an example of the full command line for both versions using the LinearSet constructor:

java pj2 jar=proj.jar debug=makespan SubsetSumSeq "LinearSet(<start>,<step>,<size>,<sum>)"
java pj2 jar=proj.jar debug=makespan cores=<k> SubsetSumSmp "LinearSet(<start>,<step>,<size>,<sum>)"

The arguments inside the LinearSet(start,step,size,sum) constructor break down as follows:

<start> - Lower bound integer greater than zero
<step> - Interval step integer greater than zero (a step of 1 yields a consecutive set)
<size> - Positive integer size for the global set
<sum> - Positive integer target sum for the problem

Below are screen shots of our program running in a terminal window:

Figure 6: Screenshot - No Solution
Figure 7: Screenshot - Solution Found
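
The SubsetSpec classes themselves are not reproduced in the report. Purely as an illustration of the spec-object pattern described above, a RandomSet-like class might look as follows (a hypothetical sketch assuming java.util.Random as the PRNG; this is not the report's actual code):

import java.util.Random;

/** Hypothetical sketch of a RandomSet-style spec object. */
public class RandomSetSketch {
    final int[] set;   // the generated global set
    final int sum;     // the target sum carried along with the set

    public RandomSetSketch(int lb, int ub, long seed, int size, int sum) {
        if (lb < 1 || ub <= lb || size < 1 || sum < 1)
            throw new IllegalArgumentException("bounds check failed");
        Random prng = new Random(seed);                // seeded: runs are reproducible
        set = new int[size];
        for (int i = 0; i < size; ++i)
            set[i] = lb + prng.nextInt(ub - lb + 1);   // uniform in [lb, ub]
        this.sum = sum;
    }
}

Seeding the PRNG is what makes scaling runs repeatable: the same constructor string always yields the same set.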

6 PERFORMANCE

6.1 Strong Scaling
Strong Scaling is the scaling procedure where performance is measured by keeping the problem size the same while continually increasing the number of cores on which the program runs. Ideally, on K cores the program should take 1/K of the single-core time, i.e. an ideal speedup of K.

For our project, we ran strong scaling tests for two variants. In the first, we fix the sum so large that a solution will never be found; in the second, we fix the sum small enough to be found. For the Not Found variant we make the sum equal to the upper bound multiplied by the set size, so that no solution can exist; to ensure that a solution is found, we fix the sum to half of that value.

Below, starting at Figure 8, we see the graph output for strong scaling where no solution is possible:

Figure 8: Running Time vs. Cores
Figure 9: Speedup vs. Cores
Figure 10: Efficiency vs. Cores

From the figures above we observe that the running time decreased with each core added, but the curve flattens out towards the bottom: beyond approximately 7 cores the additional speedup was not significant enough to warrant using more cores. The speedup graph shows a fairly linear trend, with slightly higher speedups at some core counts than at others. Our efficiency graph shows that adding more cores lowered efficiency, which we attribute to the added parallelism not paying off at the set sizes and sums we were using. Overall, the output indicates that going from the sequential program to the parallel program running on 4 cores provides the best overall speedup.
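
For reference, the quantities plotted in these figures follow the standard strong-scaling definitions. Writing T(K) for the running time on K cores:

\[
\mathit{Speedup}(K) = \frac{T(1)}{T(K)}, \qquad
\mathit{Efficiency}(K) = \frac{\mathit{Speedup}(K)}{K},
\]

so ideal scaling gives Speedup(K) = K and Efficiency(K) = 1.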

Below we now present the graphs for strong scaling obtained when the sum was fixed so that a solution would always be found:

Figure 11: Running Time vs. Cores
Figure 12: Speedup vs. Cores
Figure 13: Efficiency vs. Cores

These graphs show the results for runs where a subset is found. A direct comparison with the first set of graphs reveals that the trends in running times, speedups and efficiency follow the same general shape, but the program in fact performs worse when a solution is found. Comparing the running times shows that even going from one core to two, the drop in running time is not as significant when there is a solution as when there is not. Speedup takes on a slight curve, with performance tending towards a plateau as more cores are added. Efficiency shows that running the program with 12 cores results in quite a large drop in performance.

Below are the tables containing the numbers used to generate the above graphs. The figures list the runtimes for both variants of strong scaling side by side:

Figure 14: Problem Size 1: No Solution
Figure 15: Problem Size 1: Solution Found
Figure 16: Problem Size 2: No Solution
Figure 17: Problem Size 2: Solution Found
Figure 18: Problem Size 3: No Solution

Figure 19: Problem Size 3: Solution Found
Figure 20: Problem Size 4: No Solution
Figure 21: Problem Size 4: Solution Found
Figure 22: Problem Size 5: No Solution
Figure 23: Problem Size 5: Solution Found

6.2 Strong Scaling Discussion
With strong scaling, the ideal result would be that the time taken when the work is spread amongst X cores decreases by a factor of X. However, we can see from the results that the scaling is non-ideal. When we increase the number of cores to 2, the time taken drops to almost half of the single-core time, but with 3 cores we do not see the time drop to near a third of the original. One reason is that the program is not partitioning the workload equally amongst all three cores; we have simply added another core to use, and in that case the speedup is closer to 50% of the ideal. Another likely contributor to the non-ideal scaling is thread synchronization overhead.

6.3 Weak Scaling
Weak Scaling is the scaling procedure where performance is measured by increasing the problem size in the same proportion as the number of cores. Ideally, a program that takes X time on one core should take the same X time when the problem size is K times larger and the program runs on K cores. This is measured as the program's sizeup.
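
Sizeup is the weak-scaling counterpart of speedup. Writing N(K) for the problem size that K cores can solve in (approximately) the time one core takes for the size-N(1) problem, the usual definitions are:

\[
\mathit{Sizeup}(K) = \frac{N(K)}{N(1)}, \qquad
\mathit{Efficiency}(K) = \frac{\mathit{Sizeup}(K)}{K},
\]

with ideal values Sizeup(K) = K and Efficiency(K) = 1.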

For our project, we ran weak scaling tests for two variants, as with strong scaling. In the first, we fix the sum so large that a solution will never be found; in the second, we fix the sum small enough to be found. For the Not Found variant we take the set size at 12 cores and multiply it by the upper bound; for the Found variant we make the sum equal to the sum of all elements when using only 1 core. Within each variant, we ran three iterations of scaling in which the program was tested on all core counts at three different levels of computation. Graphed, this shows multiple lines per graph, letting us see how the program performs on small, medium and large problem sets.

Below we see the graph outputs for weak scaling where no solution is possible:

Figure 24: Running Time vs. Cores
Figure 25: Sizeup vs. Cores
Figure 26: Efficiency vs. Cores

From this first set of graphs, we see that running time increases under weak scaling. Our sizeup graph shows a curve tending towards a plateau, which we take as reinforcement of our hypothesis that this program is subject to diminishing returns beyond a certain number of cores. The sizeup also shows an outlier at 8 cores. The efficiency graph reflects the same outlier and generally shows a decrease in program efficiency as more cores are added.

Below we now present figures which show the weak scaling test results when the sum is fixed such that a solution will always be found:

Figure 27: Running Time vs. Cores
Figure 28: Sizeup vs. Cores
Figure 29: Efficiency vs. Cores

We see some new activity here! Running times show a slightly steeper upward trend, but for the medium-difficulty problem the running time goes down at 4 cores; 4 cores, as we noted in Strong Scaling, seems to be the sweet spot for this program. Our sizeup graph reflects the 4-core sweet spot with a burst in sizeup, but while there is a bell-curve trend, the actual improvements are scattered. As with Strong Scaling, the efficiency graph when a solution is present shows quite a steep drop in efficiency as more cores are added.

Below is the tabular data for the Weak Scaling runtimes:

Figure 30: Problem Size 1: No Solution

Figure 31: Problem Size 1: Solution Found
Figure 32: Problem Size 2: No Solution
Figure 33: Problem Size 2: Solution Found
Figure 34: Problem Size 3: No Solution

Figure 35: Problem Size 3: Solution Found
Figure 36: Problem Size 4: No Solution
Figure 37: Problem Size 4: Solution Found
Figure 38: Problem Size 5: No Solution

Figure 39: Problem Size 5: Solution Found

This concludes the tabular data for Weak Scaling when a solution is available.

6.4 Weak Scaling Discussion
With weak scaling, the ideal result would be that the time taken for a certain problem size on a single core is approximately equal to the time taken for double that problem size on two cores. However, we can see from the graphs and tables above that the scaling is again not ideal. This happens for a similar reason as with strong scaling: while the work and the number of cores increase together, the new workload is not spread across the cores as evenly as possible. This leads to only a partial sizeup which, as the graphs show, tends towards a plateau much faster than with strong scaling. Again, we think thread synchronization is another contributor to the observed scaling.

7 FUTURE WORK
As in any project, there is room for more features to be added in the future. Given the time, good additions to this project could include modifier flags passed as command line arguments to select parameters of the subset algorithm itself. For example, a boolean flag pos could toggle whether the algorithm should allow both positive and negative integers in the global set and consider them when creating the subsets, or work with only positive integers. A flag numtype could be set to int, long or double to force the SubsetSpec object to create sets of numbers using only the data type passed as a parameter.

Future work could also include the ability to run the Subset Sum algorithm on one or more GPUs, such as the NVIDIA units connected to the kraken computer. This would involve writing another version of the Subset Sum algorithm that interfaces with a kernel written in C/CUDA.

8 KNOWLEDGE GAINED
This project allowed us to learn more about the Subset Sum algorithm and the variety of implementations that exist. It honed our understanding of how to write a parallel program using PJ2 [2], a library we have become very interested in using for other projects, both personal and academic. The project taught us how to handle situations where incorrect results arise from not fully checking loops for sequential dependencies before parallelizing them. The class and the project also gave us experience in creating spec objects: rather than passing in an entire set, we pass in a constructor for a class that creates a set based on the parameters the constructor accepts.

9 TEAM BREAKDOWN
Both of us were involved in choosing Subset Sum as our project topic, and we researched the articles together that would later guide our program development and teach us about other implementations of the algorithm. We stored the project in a git repository so that both of us could contribute to all source files. Scaling tests, presentations and the report were likewise a fully shared effort.

REFERENCES
[1] Saniyah S. Bokhari. Parallel Solution of the Subset-sum Problem: An Empirical Study. Ohio State University, Columbus, OH, 2011. Accessed September 24, 2018.
[2] Alan Kaminsky. Parallel Java 2 Library.
[3] Dushan Petkovski and Igor Mishkovski. Parallel Implementation of the Modified Subset Sum Problem in OpenCL. ICT Innovations 2015, Web Proceedings, 2015. Accessed October 3, 2018.
[4] Z. Ristovski, I. Mishkovski, S. Gramatikov, and S. Filiposka. Parallel Implementation of the Modified Subset Sum Problem in CUDA. Telecommunications Forum (TELFOR), Nov 2014. Accessed September 19, 2018.
