Subset Sum - A Dynamic Parallel Solution


Team Cthulu - Project Report

Tushar Iyer
Rochester Institute of Technology
Rochester, New York
txi9546@rit.edu

Aziel Shaw
Rochester Institute of Technology
Rochester, New York
ats9095@rit.edu

ABSTRACT
The subset sum problem is an NP-Complete problem which, depending on the size of the set and the target sum, can take a long time to solve. This paper presents a student team's research project on speeding up the Subset Sum computation by performing it in parallel using PJ2. We discuss the reasons for picking this topic, then the design of our sequential and parallel programs. We then present the results of the project as strong and weak scaling measurements, state our hypothesis for why the scaling is not ideal, and conclude with our future plans and the lessons learned over the course of the project.

1 TOPIC AREA
Our chosen problem falls under the umbrella of NP-Complete problems. An NP-Complete problem is one that is both NP-Hard and in NP. The problem can be solved in pseudo-polynomial time, and until the P = NP question is settled it is unknown whether a polynomial time solution can exist. In this problem area, the time taken to find a solution increases in proportion to the number of inputs multiplied by the sum of their values, so the computational effort grows rapidly with the size of the input. Our approach to this growth in computation size and running time is to rewrite the solution in parallel: by splitting the computation amongst multiple cores, we should be able to reduce the time taken to find a solution.

2 COMPUTATIONAL PROBLEM
The computational problem we aim to solve is Subset Sum. Subset Sum asks: given a set X and a target sum Y, does there exist a subset Z of X such that the sum of all elements in Z equals Y? We build our solution around a restricted version of the problem in which all elements of X, as well as the target Y, must be positive integers greater than zero. Furthermore, we consider the solution of Subset Sum that uses Dynamic Programming. We chose the dynamic algorithm because, by definition, a dynamic algorithm simplifies a problem by dividing it into smaller sub-parts which are more computationally manageable. This matches how we would normally approach building a parallel solution for any other computational problem, and it can help provide a boost in computational performance. We are using the Parallel Java 2 library (PJ2) [2] to build our solution. This library features a specific syntax which allows a for loop to be executed in parallel; since the dynamic algorithm for the Subset Sum problem features multiple for loops, we hope to take advantage of this structure and this library to obtain a significant speedup.
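
Concretely, the dynamic algorithm fills a boolean table. The recurrence is implied by the description above rather than written out in the report; in our notation, let T[i][j] be true exactly when some subset of the first i elements x_1, ..., x_i of X sums to j. Then

\[
T[i][0] = \text{true}, \qquad
T[i][j] = T[i-1][j] \;\lor\; \bigl( j \ge x_i \,\wedge\, T[i-1][j - x_i] \bigr),
\]

and the answer for the instance (X, Y) is T[n][Y], with n = |X|. Filling the table touches n × (Y + 1) cells, which is exactly the pseudo-polynomial running time described above.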

3 RESEARCH PAPER ANALYSIS

3.1 Research Paper One
Parallel Solution of the Subset-sum Problem: An Empirical Study [1] by Saniyah S. Bokhari [Ohio State University (2011)]

3.1.1 Problem. The paper parallelizes the subset sum problem on three different architectures (Cray XMT, IBM x3755, NVIDIA FX 5800), and the study resulted in two different parallel algorithms. The original hypothesis was that the performance gain would not be a fixed factor but would vary with the computational size of the problem. After creating the solution, Bokhari evaluates it on the three machines, discusses the performance, and confirms the hypothesis that performance varies.

3.1.2 Solution. All three architectures gave interesting results. The IBM x3755 scaled decently up to 8 cores. The Cray XMT scaled well up to 16 nodes, and its 128-bit processor was better suited to larger problem sizes. Unlike the other two machines, the Cray XMT also features word-locking, which means that no two threads can access the current word simultaneously; to take advantage of this, Bokhari [1] wrote a modified implementation that uses word-locking as part of the parallel solution. Most interestingly, the NVIDIA FX 5800 was not the best performer for medium-sized problems, but it performed well on smaller problem sizes, essentially performing decently whenever the problem fits in memory. Figure 1 shows the comparison of the performance of all three architectures.

3.1.3 Use in Our Project. The dynamic algorithm used for the Cray XMT is similar to the algorithm we used for both the sequential and parallel implementations in this project, and the study thereby served as a guideline for further development.

Figure 1: Performance of the three architectures [1]

3.2 Research Paper Two
Parallel implementation of the modified subset sum problem in CUDA [4] by Z. Ristovski, I. Mishkovski, S. Gramatikov and S. Filiposka [Telecommunications Forum Telfor (TELFOR)]

3.2.1 Problem. This paper presented a solution for a modified version of the Subset Sum problem, parallelized so that the solution takes advantage of a GPU to perform the computation necessary to find a solution. The CUDA design is presented before delving into the results.

3.2.2 Solution. Using a CUDA-enabled GPU, their solution to the Subset Sum problem showed that a GPU is highly effective, with processing speedups of up to a factor of 20. Ristovski et al. [4] discuss the current shift in the parallel programming paradigm as more and more work is done by a CPU and GPU working in tandem, so that each architecture is used for the advantages it uniquely offers, in pursuit of the greater goal of solving difficult problems with efficient parallel computing. Figure 2 compares the sequential and parallel implementations of Ristovski et al.'s [4] modified version of Subset Sum.

Figure 2: Sequential vs. parallel performance [4]

3.2.3 Use in Our Project. Upon researching this paper, we had considered using it to see whether parallelization through CUDA was feasible in our case, and later decided that we were more likely to produce a good result with a multicore parallel solution. We did, however, take away from this paper the effect of proper GPU computing and how it can shift the computational load from a CPU onto thousands of CUDA threads.

3.3 Research Paper Three
Parallel Implementation of the Modified Subset Sum Problem in OpenCL [3] by D. Petkovski and I. Mishkovski [ICT Innovations 2015 Web Proceedings]

3.3.1 Problem. This paper presented a solution for a modified version of the Subset Sum problem, parallelized in such a manner that memory allocation was made dynamic so that less memory would be required for a solution. It describes how the parallelization was done using OpenCL before discussing the resulting implementation.

3.3.2 Solution. The modified version of Subset Sum that Petkovski et al. [3] use takes three inputs, X, Y and Z. The solution works through the set to find all vectors of X elements in which each element is less than Y and the elements sum to the target Z. The main goal of this implementation was a working solution that also reduces the number of permutations run by each thread as much as possible. Another focus was to allocate memory only where absolutely necessary and to store no redundant results or computations, cutting down the cost of the problem. Petkovski et al. [3] present results which show significant speedup and also confirm the base hypothesis that Subset Sum scales with growth in the problem size. In OpenCL, an application is structured so that multiple data structures can interact with a host application. Unlike a C/CUDA program, which is locked to a device with a compatible NVIDIA GPU, OpenCL applications can run on hardware from a variety of manufacturers; this makes OpenCL a much more flexible choice when hardware limitations exist prior to building the application. Figure 3 depicts the structure of a generic OpenCL application with the data structures it interacts with.

Figure 3: Structure of a generic OpenCL application [3]

3.3.3 Use in Our Project. This algorithm was focused on memory efficiency, and we used it to help develop our own algorithm so that we, too, could attempt a solution that uses only the smallest amount of memory necessary to find an answer.

4 ALGORITHM DESIGN
This section covers the two main algorithms used in our implementation of the subset sum parallel program. The first is the sequential version of the algorithm, where we explain how we will parallelize it. The second is the actual parallelized version, along with a short explanation and our initial hypothesis.

4.1 Sequential Program

Algorithm 1 SubsetSumSeq
Ensure: Reports whether a subset adds up to the given sum.
 1: procedure SubsetSeq(set, sum)
 2:   for idx: 0 to n do
 3:     dynarray[idx][0] = TRUE            // a sum of 0 is always reachable
 4:   for idx: 1 to n do
 5:     curp = set[idx]
 6:     for idx2: 0 to setlen - 1 do       // copy the previous row forward
 7:       dynarray[idx][idx2] = dynarray[idx - 1][idx2]
 8:     for idx2: 0 to sum do              // extend sums reachable without set[idx]
 9:       newp = curp + idx2
10:       if getbit(dynarray[idx - 1], idx2) == 1 and newp <= sum then
11:         setbit(dynarray[idx], newp)

In this sequential program (Algorithm 1), we see only the fragment of the program that deals with the Subset Sum problem. By the time we reach this point, we have checked that the constructor argument was passed in, that the SubsetSpec argument was created, and that all bounds-checking has passed; passing these checks lets us continue with a high degree of confidence.

We instantiate our 2D dynamic table and start by initializing the first column of each row to True. This is one of the loops we will later parallelize; while it will not greatly help the performance of the subset search itself, it does reduce the time needed to initialize a large 2D table, which helps when we work with large set sizes. The second loop (line 4) will not be parallelized, since the loops within it carry a sequential dependency from one row to the next. The inner loops, however, can and will be parallelized, and this is where parallelization provides the biggest speedup: both inner loops are free of sequential dependencies. Parallelizing the outer loop instead would violate its sequential dependency and make the results incorrect. We keep track of the current index and, in both loops, build the 2D table in a bottom-up fashion, as is commonly done in dynamic programming. As is also typical of dynamic programming, the complexity of the algorithm is roughly proportional to the dimensions of the dynamic table, i.e. O(n × sum).

At the end, we check the final cell to see whether its value is True or False. This tells us whether a subset was found whose elements add up to the target sum. Note that there can be multiple subsets which total the target sum; a result of True only tells us that at least one of the potentially many subsets exists.

Figure 4: Sequential flow
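
As an illustration of Algorithm 1 in plain Java: a minimal sketch, assuming a simple boolean table rather than the bit-packed rows implied by getbit/setbit, and with class and method names that are ours rather than the report's:

public class SubsetSumSeqSketch {
    /** Returns true iff some subset of set sums exactly to sum. */
    static boolean subsetSumSeq(int[] set, int sum) {
        int n = set.length;
        // dyn[i][j] == true iff a subset of the first i elements sums to j.
        boolean[][] dyn = new boolean[n + 1][sum + 1];
        for (int i = 0; i <= n; ++i)
            dyn[i][0] = true;                      // the empty subset sums to 0
        for (int i = 1; i <= n; ++i) {
            int cur = set[i - 1];
            for (int j = 0; j <= sum; ++j)
                // Either j was already reachable without element i, or it is
                // reachable by adding element i to a smaller sum j - cur.
                dyn[i][j] = dyn[i - 1][j] || (j >= cur && dyn[i - 1][j - cur]);
        }
        return dyn[n][sum];                        // the final cell checked at the end
    }

    public static void main(String[] args) {
        int[] set = {3, 34, 4, 12, 5, 2};
        System.out.println(subsetSumSeq(set, 9));  // true: 4 + 5 = 9
    }
}

The explicit row-copy of lines 6-7 disappears here because each cell of row i is computed directly from row i - 1; that same property is what makes the inner loop safe to parallelize in Section 4.2.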

4.2 Parallel Program

Algorithm 2 SubsetSumSmp
Ensure: Reports whether a subset adds up to the given sum.
 1: procedure SubsetSmp(set, sum)
 2:   parallelfor idx: 0 to n do
 3:     dynarray[idx][0] = TRUE
 4:   end
 5:   for idx: 1 to n do                   // sequential: row idx depends on row idx - 1
 6:     curp = set[idx]
 7:     parallelfor idx2: 0 to setlen - 1 do
 8:       dynarray[idx][idx2] = dynarray[idx - 1][idx2]
 9:     end
10:     parallelfor idx2: 0 to sum do
11:       newp = curp + idx2
12:       if getbit(dynarray[idx - 1], idx2) == 1 and newp <= sum then
13:         setbit(dynarray[idx], newp)
14:     end

As with the sequential program, we see only the fragment of the program that deals with the Subset Sum problem. By the time we reach this point, we assume that the constructor argument has been passed in, the SubsetSpec argument has been created and all bounds-checking has passed. This fragment and the program flow diagram look almost identical to the sequential version, except that the initial for loop and the two inner for loops have now been made into parallelfor loops. These loops run in parallel, and the results are written into the 2D dynamic table as before. At the end, we check the final cell to see whether its value is True or False, which tells us whether or not a subset was found.

Figure 5: Parallel flow

4.2.1 Initial Hypothesis. We set out on this project with a couple of ideas in mind about how this problem would behave. We believe that parallelization will help, but only to an extent; beyond that point, we expect to see slower running times.
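
The report's parallel loops use PJ2's parallelfor construct. As a library-free stand-in, the same structure can be sketched with java.util.stream parallel streams (a sketch under that substitution; names are ours, and the real code uses PJ2 [2]):

import java.util.stream.IntStream;

public class SubsetSumSmpSketch {
    /** Parallel DP sketch: the row loop stays sequential (row i depends on
        row i - 1); the column loop runs in parallel and is race-free because
        each iteration writes a distinct cell of row i and only reads row i - 1. */
    static boolean subsetSumSmp(int[] set, int sum) {
        int n = set.length;
        boolean[][] dyn = new boolean[n + 1][sum + 1];
        // Parallel initialization of column 0, mirroring the first parallelfor.
        IntStream.rangeClosed(0, n).parallel().forEach(i -> dyn[i][0] = true);
        for (int i = 1; i <= n; ++i) {      // sequential dependency: keep serial
            final int row = i, cur = set[i - 1];
            IntStream.rangeClosed(0, sum).parallel().forEach(j ->
                dyn[row][j] = dyn[row - 1][j] || (j >= cur && dyn[row - 1][j - cur]));
        }
        return dyn[n][sum];
    }

    public static void main(String[] args) {
        int[] set = {3, 34, 4, 12, 5, 2};
        System.out.println(subsetSumSmp(set, 30));  // false: no subset sums to 30
    }
}

With a plain boolean row there are no shared words to update concurrently, which sidesteps the thread synchronization the report later identifies as a scaling overhead for its bit-packed rows.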

5 MANUALS

5.1 Developer Manual
For this project all our code was tested on the tardis computer provided by Dr. Kaminsky. Compiling the program requires first making sure that it is compiled with Java 1.7 and that the PJ2 [2] distribution is properly included in the class path.

To compile with Java 1.7, first set the path to include the 1.7 SDK. This can be done in two ways, depending on the shell:

For the bash shell:
export PATH=/usr/local/dcs/versions/1.7.0_51/bin:$PATH

For the csh shell:
setenv PATH /usr/local/dcs/versions/1.7.0_51/bin:$PATH

To compile against PJ2 [2], set the class path to include the PJ2 distribution:

For the bash shell:
export CLASSPATH=.:/var/tmp/parajava/pj2/pj2.jar

For the csh shell:
setenv CLASSPATH .:/var/tmp/parajava/pj2/pj2.jar

After this, make a directory build in which all the compiled class files will be stored:
mkdir build

Now compile the Java class files:
javac -d ./build *.java

This compiles all files with a .java extension and places the class files in the build directory. Change into that directory with cd build and then build the final jar:
jar cvf proj.jar *

This packages all the class files into a single jar called proj.jar. When running the program with the PJ2 launcher, the user must give the name of this jar file in the command line arguments.

5.2 User Manual
In order to run the Subset Sum program on tardis you will first have to set the path to include the Java 1.7 SDK and the class path to include the PJ2 [2] distribution, as above:

For the bash shell:
export PATH=/usr/local/dcs/versions/1.7.0_51/bin:$PATH
export CLASSPATH=.:/var/tmp/parajava/pj2/pj2.jar

For the csh shell:
setenv PATH /usr/local/dcs/versions/1.7.0_51/bin:$PATH
setenv CLASSPATH .:/var/tmp/parajava/pj2/pj2.jar

Now you are ready to run the Subset Sum program. The program takes a single string argument which is a constructor for a SubsetSpec class. The various SubsetSpec classes were used to generate different sets for the Subset Sum problem and were used for testing. The main spec class is called RandomSet; its arguments define the bounds by which the pseudo-random number generator constructs the set for the program.

The various spec constructors are:

RandomSet - creates a set of random numbers [main spec object designed for this program]
LinearSet - creates an increasing set built by a step counter
SameSet - creates a set of length size of one repeating number [used only for testing]
FibonacciSet - creates a set of the first size numbers in the sequence [made for fun only, as a large enough set allows any number to be found as a sum]

The command line structure uses PJ2 [2] as the launcher for the program, and it takes multiple arguments:

(1) Name of the jar [Required]
(2) Makespan argument to print out the runtime [Optional]
(3) Class name (SubsetSumSeq or SubsetSumSmp) [Required]
(4) SubsetSpec constructor in quotes [Required]

Below is an example of the full command line for both the sequential and parallel versions of this program using the RandomSet constructor:

java pj2 jar=proj.jar debug=makespan SubsetSumSeq "RandomSet(<lb>,<ub>,<seed>,<size>,<sum>)"
java pj2 jar=proj.jar debug=makespan cores=<k> SubsetSumSmp "RandomSet(<lb>,<ub>,<seed>,<size>,<sum>)"

NOTE: each command should be written on one line; arguments wrap here only because of report formatting.

The arguments inside the RandomSet(lb,ub,seed,size,sum) constructor break down as follows:

<lb> - Lower bound integer greater than zero
<ub> - Upper bound integer greater than the lower bound
<seed> - Random integer seed for the PRNG
<size> - Positive integer size for the global set
<sum> - Positive integer target sum for the problem

Below is an example of the full command line for both versions using the LinearSet constructor:

java pj2 jar=proj.jar debug=makespan SubsetSumSeq "LinearSet(<start>,<step>,<size>,<sum>)"
java pj2 jar=proj.jar debug=makespan cores=<k> SubsetSumSmp "LinearSet(<start>,<step>,<size>,<sum>)"

The arguments inside the LinearSet(start,step,size,sum) constructor break down as follows:

<start> - Lower bound integer greater than zero
<step> - Interval step integer greater than zero (a step of 1 yields a consecutive set)
<size> - Positive integer size for the global set
<sum> - Positive integer target sum for the problem

Below are screen shots of our program running in a terminal window:

Figure 6: Screenshot - No Solution
Figure 7: Screenshot - Solution Found
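
The SubsetSpec classes themselves are not reproduced in the report. Purely as an illustration of the spec-object pattern described above, a RandomSet-like class might look as follows (a hypothetical sketch assuming java.util.Random as the PRNG; this is not the report's actual code):

import java.util.Random;

/** Hypothetical sketch of a RandomSet-style spec object. */
public class RandomSetSketch {
    final int[] set;   // the generated global set
    final int sum;     // the target sum carried along with the set

    public RandomSetSketch(int lb, int ub, long seed, int size, int sum) {
        if (lb < 1 || ub <= lb || size < 1 || sum < 1)
            throw new IllegalArgumentException("bounds check failed");
        Random prng = new Random(seed);                // seeded: runs are reproducible
        set = new int[size];
        for (int i = 0; i < size; ++i)
            set[i] = lb + prng.nextInt(ub - lb + 1);   // uniform in [lb, ub]
        this.sum = sum;
    }
}

Seeding the PRNG is what makes scaling runs repeatable: the same constructor string always yields the same set.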

6 PERFORMANCE

6.1 Strong Scaling
Strong Scaling is the scaling procedure where performance is measured by keeping the problem size the same while continually increasing the number of cores on which the program runs. Ideally, on K cores the program should take 1/K of the single-core time, i.e. an ideal speedup of K.

For our project, we ran strong scaling tests for two variants. In the first, we fix the sum so large that a solution will never be found; in the second, we fix the sum small enough to be found. For the Not Found variant we make the sum equal to the upper bound multiplied by the set size, so that no solution can exist; to ensure that a solution is found, we fix the sum to half of that value.

Below, starting at Figure 8, we see the graph output for strong scaling where no solution is possible:

Figure 8: Running Time vs. Cores
Figure 9: Speedup vs. Cores
Figure 10: Efficiency vs. Cores

From the figures above we observe that the running time decreased with each core added, but the curve flattens out towards the bottom: beyond approximately 7 cores the additional speedup was not significant enough to warrant using more cores. The speedup graph shows a fairly linear trend, with slightly higher speedups at some core counts than at others. Our efficiency graph shows that adding more cores lowered efficiency, which we attribute to the added parallelism not paying off at the set sizes and sums we were using. Overall, the output indicates that going from the sequential program to the parallel program running on 4 cores provides the best overall speedup.
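
For reference, the quantities plotted in these figures follow the standard strong-scaling definitions. Writing T(K) for the running time on K cores:

\[
\mathit{Speedup}(K) = \frac{T(1)}{T(K)}, \qquad
\mathit{Efficiency}(K) = \frac{\mathit{Speedup}(K)}{K},
\]

so ideal scaling gives Speedup(K) = K and Efficiency(K) = 1.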

Below we now present the graphs for strong scaling obtained when the sum was fixed so that a solution would always be found:

Figure 11: Running Time vs. Cores
Figure 12: Speedup vs. Cores
Figure 13: Efficiency vs. Cores

These graphs show the results for runs where a subset is found. A direct comparison with the first set of graphs reveals that the trends in running times, speedups and efficiency follow the same general shape, but the program in fact performs worse when a solution is found. Comparing the running times shows that even going from one core to two, the drop in running time is not as significant when there is a solution as when there is not. Speedup takes on a slight curve, with performance tending towards a plateau as more cores are added. Efficiency shows that running the program with 12 cores results in quite a large drop in performance.

Below are the tables containing the numbers used to generate the above graphs. The figures list the runtimes for both variants of strong scaling side by side:

Figure 14: Problem Size 1: No Solution
Figure 15: Problem Size 1: Solution Found
Figure 16: Problem Size 2: No Solution
Figure 17: Problem Size 2: Solution Found
Figure 18: Problem Size 3: No Solution

Figure 19: Problem Size 3: Solution Found
Figure 20: Problem Size 4: No Solution
Figure 21: Problem Size 4: Solution Found
Figure 22: Problem Size 5: No Solution
Figure 23: Problem Size 5: Solution Found

6.2 Strong Scaling Discussion
With strong scaling, the ideal result would be that the time taken when the work is spread amongst X cores decreases by a factor of X. However, we can see from the results that the scaling is non-ideal. When we increase the number of cores to 2, the time taken drops to almost half of the single-core time, but with 3 cores we do not see the time drop to near a third of the original. One reason is that the program is not partitioning the workload equally amongst all three cores; we have simply added another core to use, and in that case the speedup is closer to 50% of the ideal. Another likely contributor to the non-ideal scaling is thread synchronization overhead.

6.3 Weak Scaling
Weak Scaling is the scaling procedure where performance is measured by increasing the problem size in the same proportion as the number of cores. Ideally, a program that takes X time on one core should take the same X time when the problem size is K times larger and the program runs on K cores. This is measured as the program's sizeup.
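
Sizeup is the weak-scaling counterpart of speedup. Writing N(K) for the problem size that K cores can solve in (approximately) the time one core takes for the size-N(1) problem, the usual definitions are:

\[
\mathit{Sizeup}(K) = \frac{N(K)}{N(1)}, \qquad
\mathit{Efficiency}(K) = \frac{\mathit{Sizeup}(K)}{K},
\]

with ideal values Sizeup(K) = K and Efficiency(K) = 1.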

For our project, we ran weak scaling tests for two variants, as with strong scaling. In the first, we fix the sum so large that a solution will never be found; in the second, we fix the sum small enough to be found. For the Not Found variant we take the set size at 12 cores and multiply it by the upper bound; for the Found variant we make the sum equal to the sum of all elements when using only 1 core. Within each variant, we ran three iterations of scaling in which the program was tested on all core counts at three different levels of computation. Graphed, this shows multiple lines per graph, letting us see how the program performs on small, medium and large problem sets.

Below we see the graph outputs for weak scaling where no solution is possible:

Figure 24: Running Time vs. Cores
Figure 25: Sizeup vs. Cores
Figure 26: Efficiency vs. Cores

From this first set of graphs, we see that running time increases under weak scaling. Our sizeup graph shows a curve tending towards a plateau, which we take as reinforcement of our hypothesis that this program is subject to diminishing returns beyond a certain number of cores. The sizeup also shows an outlier at 8 cores. The efficiency graph reflects the same outlier and generally shows a decrease in program efficiency as more cores are added.

Below we now present figures which show the weak scaling test results when the sum is fixed such that a solution will always be found:

Figure 27: Running Time vs. Cores
Figure 28: Sizeup vs. Cores
Figure 29: Efficiency vs. Cores

We see some new activity here! Running times show a slightly steeper upward trend, but for the medium-difficulty problem the running time goes down at 4 cores; 4 cores, as we noted in Strong Scaling, seems to be the sweet spot for this program. Our sizeup graph reflects the 4-core sweet spot with a burst in sizeup, but while there is a bell-curve trend, the actual improvements are scattered. As with Strong Scaling, the efficiency graph when a solution is present shows quite a steep drop in efficiency as more cores are added.

Below is the tabular data for the Weak Scaling runtimes:

Figure 30: Problem Size 1: No Solution

Figure 31: Problem Size 1: Solution Found
Figure 32: Problem Size 2: No Solution
Figure 33: Problem Size 2: Solution Found
Figure 34: Problem Size 3: No Solution

Figure 35: Problem Size 3: Solution Found
Figure 36: Problem Size 4: No Solution
Figure 37: Problem Size 4: Solution Found
Figure 38: Problem Size 5: No Solution

Figure 39: Problem Size 5: Solution Found

This concludes the tabular data for Weak Scaling when a solution is available.

6.4 Weak Scaling Discussion
With weak scaling, the ideal result would be that the time taken for a certain problem size on a single core is approximately equal to the time taken for double that problem size on two cores. However, we can see from the graphs and tables above that the scaling is again not ideal. This happens for a similar reason as with strong scaling: while the work and the number of cores increase together, the new workload is not spread across the cores as evenly as possible. This leads to only a partial sizeup which, as the graphs show, tends towards a plateau much faster than with strong scaling. Again, we think thread synchronization is another contributor to the observed scaling.

7 FUTURE WORK
As in any project, there is room for more features to be added in the future. Given the time, good additions to this project could include modifier flags passed as command line arguments to select parameters of the subset algorithm itself. For example, a boolean flag pos could toggle whether the algorithm should allow both positive and negative integers in the global set and consider them when creating the subsets, or work with only positive integers. A flag numtype could be set to int, long or double to force the SubsetSpec object to create sets of numbers using only the data type passed as a parameter.

Future work could also include the ability to run the Subset Sum algorithm on one or more GPUs, such as the NVIDIA units connected to the kraken computer. This would involve writing another version of the Subset Sum algorithm that interfaces with a kernel written in C/CUDA.

8 KNOWLEDGE GAINED
This project allowed us to learn more about the Subset Sum algorithm and the variety of implementations that exist. It honed our understanding of how to write a parallel program using PJ2 [2], a library we have become very interested in using for other projects, both personal and academic. The project taught us how to handle situations where incorrect results arise from not fully checking loops for sequential dependencies before parallelizing them. The class and the project also gave us experience in creating spec objects: rather than passing in an entire set, we pass in a constructor for a class that creates a set based on the parameters the constructor accepts.

9 TEAM BREAKDOWN
Both of us were involved in choosing Subset Sum as our project topic, and we researched the articles together that would later guide our program development and teach us about other implementations of the algorithm. We stored the project in a git repository so that both of us could contribute to all source files. Scaling tests, presentations and the report were likewise a fully shared effort.

REFERENCES
[1] Saniyah S. Bokhari. Parallel Solution of the Subset-sum Problem: An Empirical Study. Ohio State University, Columbus, OH, 2011. Accessed September 24, 2018.
[2] Alan Kaminsky. Parallel Java 2 Library.
[3] Dushan Petkovski and Igor Mishkovski. Parallel Implementation of the Modified Subset Sum Problem in OpenCL. ICT Innovations 2015, Web Proceedings, 2015. Accessed October 3, 2018.
[4] Z. Ristovski, I. Mishkovski, S. Gramatikov, and S. Filiposka. Parallel Implementation of the Modified Subset Sum Problem in CUDA. Telecommunications Forum (TELFOR), Nov 2014. Accessed September 19, 2018.
