Parallel local search on GPU and CPU with OpenCL Language


Omar ABDELKAFI, Khalil CHEBIL, Mahdi KHEMAKHEM
LOGIQ, University of Sfax, Sfax, Tunisia

Abstract. Real-world optimization problems are complex and NP-hard. The modeling of such problems is in constant evolution in terms of constraints and objectives, and their resolution is expensive in computation time. With all these changes, even metaheuristics, well known for their efficiency, begin to be overtaken by the explosion of data. Recently, thanks to the publication of languages such as OpenCL and CUDA, the development of parallel metaheuristics on the GPU platform has attracted growing interest. In this paper, we propose a parallelization of a local search at the iteration level. The contribution of this work is a robust local search built on two popular neighborhood structures, applied to combinatorial problems and adapted to the GPU platform. Several techniques are proposed to accelerate memory access, control divergence and maximize parallelization. Multiple versions have been implemented with the OpenCL language to test the parallelization on both GPU and CPU. The computational performance of this parallel local search is reported and compared to the sequential version.

Keywords: GPU, OpenCL, Optimization, Parallel local search, Knapsack problem, Traveling salesman problem

1. INTRODUCTION

Many optimization problems in different areas, including data mining, communication, logistics and transport, are NP-hard. These problems have been solved successfully with optimization approaches such as metaheuristics (generic heuristics). Local search (LS) is one of the basic metaheuristics, well known for its efficiency. LS is based on a single-solution process: starting from a single solution, at each iteration the heuristic replaces the current solution by a neighbor that improves the objective function. The search stops when a local optimum is reached.
In the literature, we can find various LS-based heuristics such as hill climbing, simulated annealing, tabu search, iterated local search and variable neighborhood search. A state of the art of LS algorithms can be found in [1]. Many methods have been proposed to parallelize LS. Ref. [1] proposed three major parallel models to design parallel metaheuristics: the algorithmic level, the iteration level and the solution level. Our work focuses on the iteration level, whose main goal is to parallelize the neighborhood generation. During the last decade, the Graphics Processing Unit (GPU) has grown faster than the Central Processing Unit (CPU). This is due to the explosion of the video game industry and its greedy demand for graphic power. GPUs provide great computing power at a low cost, which is one of the main reasons for the success of this architecture. Since 2003, the semiconductor industry has settled on two main trajectories for designing microprocessors: the multi-core trajectory, like current CPUs, and the many-core trajectory, like current GPUs [7]. Nowadays, a many-core GPU such as the NVIDIA Tesla K20X has 2688 cores and can provide a theoretical performance of 3.95 Tflops. On the other hand, a multi-core CPU such as the AMD FX 8350 has 8 cores and a performance of up to 256 Gflops. The use of the GPU in non-graphic applications has caused a rapid increase of interest in the scientific community. Indeed, the GPU has become one of the most interesting platforms to implement parallel metaheuristics. As academic and industrial combinatorial optimization problems keep increasing in size and complexity, the field of parallel metaheuristics has to follow this evolution of high-performance computing. Currently, the three biggest vendors of GPUs are Intel, Nvidia and AMD [2]. This market sharing makes the Open Computing Language (OpenCL) a very interesting choice for GPU implementations and for dealing with the financial challenge.
Indeed, thanks to the Khronos Group, OpenCL has become an industrial standard and supports many GPU vendors. In addition, OpenCL allows us to run the same parallel code on both GPU and CPU to compare the efficiency of the two platforms. In this paper, we present a design of robust local search implementations on GPU using the OpenCL language, applied to the Knapsack Problem (KP) [10] and the Traveling Salesman Problem (TSP) [13]. Three contributions are proposed: the first one is the acceleration of memory access on the GPU, the second is the maximization of the data parallelized and the last one is the control of divergence. The rest of the paper is organized as follows. The second section introduces the OpenCL programming model on GPU and the different strategies to implement efficient metaheuristics. In section three, we present several techniques to design a parallel local search for the KP and the TSP. In section four, we present the results of the computational experiments. The last section concludes the manuscript.

2. OPEN COMPUTING LANGUAGE ON GPU

Programmable GPU hardware started with programmable units (pixel and vertex shaders). The Open Computing Language was created to help programmers use the GPU platform. The main advantage of this language is its portability and its capacity to run on the most popular GPU vendors such as AMD, NVIDIA and Intel [2]. To get good results on GPU, we need different strategies to design efficient metaheuristics with OpenCL. For this reason, we need to understand how the GPU executes a parallel program. Like the other GPU languages, OpenCL is based on cooperation between the host, which represents the CPU, and one or many devices, such as the GPU in our case. The parallel code is implemented in a kernel that is duplicated when the program is executed. This kernel is executed by several Work-Items (WI), and these WI are grouped into a set of Work-Groups (WG). WI in the same WG can be synchronized with a synchronization barrier. The most important concept to deal with on the GPU is the management of memories. With OpenCL we have access to four types of memory. The first one is the global memory, which is very slow but very large; we should use it sparingly because multiple accesses to this memory affect performance. The second is the constant memory, slow and large, used only when all the WI have to access the same memory address. The third is the local memory, fast and small; it is shared between the WI of the same WG. If it is well used, it can be one of the fastest memories on the GPU; however, overusing it can also hurt performance. The final type is the private memory (also called registers), very fast but very small; it must be used carefully because its abuse causes the failure of the execution. The GPU architecture is very well designed for the parallel model, and the CPU architecture is very powerful in the sequential model.
Indeed, the CPU has few arithmetic and logic units (ALU), but they are powerful at performing sequential work, while the GPU has many less powerful ALU to increase throughput and perform the parallel work. The aim is to use the CPU to execute each iteration sequentially and to use the GPU for the generation of the neighborhood in a parallel way. The GPU is composed of a set of compute units (CU); each CU is composed of a set of processing elements (PE), considered as the cores of the GPU architecture. Every PE has its own private memory, and every CU shares a local memory between its PE. All transfers between the host and the device are performed through the global and the constant memory (see Fig. 1).

Fig. 1. Architecture of the GPU in OpenCL
Fig. 2. Divergence of instructions with a SIMD system

To optimize execution and have an efficient design on GPU, several strategies need to be considered. We present four strategies that guide our work. The first strategy is to reduce transfers between the host and the device: access to the global memory is faster than a CPU-GPU transfer, so we need to learn how to reuse data and reduce transfers [3]. The second strategy is to use the local memory instead of the global memory whenever possible, to accelerate access to GPU memory. The third strategy is to maximize the number of WI, to take advantage of the parallel architecture of the GPU. The last strategy is the control of divergence: since the GPU uses a Single Instruction Multiple Data (SIMD) system, all data execute a single instruction at the same time, so we need to use as few divergent code fragments (If-Else or loops) as possible in the kernel (see Fig. 2) [3]. An investigation was conducted in [4]: for the decade between 2002 and 2012, they found only 8 works on LS on GPU, and none of them was implemented with OpenCL.
In 2011, [5] was the first group to propose results on a LS on GPU; they concentrated on binary problems and mapped one neighbor to one single thread. The thread in the CUDA language is the equivalent of the WI in the OpenCL language. They proposed a technique to transform two or three indices into one, and one index into two or three; the objective of such a technique is to reduce the transfers. In 2012, [8] used LS to accelerate the TSP with the 2-opt and 3-opt structures [12]. More recently, [6] proposed several techniques and tools to optimize the transfers between the host and the device in LS. To the best of our knowledge, this is the first work that implements LS using the OpenCL language and compares the results of the parallelization on CPU and GPU with the same code.

3. PARALLEL LOCAL SEARCH

As already said, the focus of this work is the iteration level [1]. One way to parallelize the neighborhood generation is to assign one neighbor to one WI [5]. This is the most efficient approach, but it still has a memory problem. Indeed, in the case of a large neighborhood, only recent GPUs have enough memory to support a large number of WI, because of the reduced number of registers. Since our focus is on portability and robustness, we choose another approach consisting in representing every WI as an item of the solution, which leads to a parallel execution of the neighbor generation. With this approach we can solve very large instances and produce a robust LS, but the evaluation of the neighbors is still performed sequentially in each WI. The sequential and parallel pseudo-code versions of the local search with the best-improving strategy are shown in Listings 1 and 2, respectively.

Procedure Sequential local search
Let T: current iteration, S(T): the current solution,
    N(T): the number of neighbors of S(T),
    V(i): the i-th neighbor of S(T),
    V*(T): the best neighbor of S(T);
Begin
  Build an initial solution S(0);
  Specific local search pre-treatment;
  T := 0;
  Repeat
    V*(T) := V(1);
    For i = 2 to N(T) Do
      Generate V(i);
      If V(i) is better than V*(T)
        V*(T) := V(i);
      Endif
    Endfor
    S(T+1) := V*(T);
    T := T + 1;
  Until S(T-1) better than S(T)
  Return S(T);
End
Listing 1: Sequential local search

Procedure Parallel local search
Let T: current iteration, S(T): the current solution,
    N(T): the number of neighbors of S(T), equal to the number of WI,
    V(i): the i-th neighbor of S(T), produced by WI(i),
    V*(T): the best neighbor of S(T);
Begin
  Build an initial solution S(0);
  Specific local search pre-treatment;
  T := 0;
  Allocate and copy problem data inputs on device memory;
  Repeat
    Allocate and copy the current solution on device memory;
    Allocate and copy additional information on device memory;
    Parallel execution of all WI(i) (i := 1 to N(T)) to generate V(i);
    V*(T) := V(1);
    For i = 2 to N(T) Do
      If V(i) is better than V*(T)
        V*(T) := V(i);
      Endif
    EndFor
    S(T+1) := V*(T);
    T := T + 1;
  Until S(T-1) better than S(T)
  Return S(T);
End
Listing 2: Parallel local search with OpenCL

Fig. 3. Traffic reduction of memory accesses

Our first contribution consists in loading some information from the global memory into the local memory to accelerate access time. Every WG has a local memory shared between its WI. Information used by all the WI at the same time is loaded into the local memory instead of the global memory. This technique reduces the access traffic, and memory access becomes faster (see Fig. 3). Listing 3 shows the loading of the information from the global to the local memory.

Procedure Reduce memory traffic
Let N: the size of the instance,
    GM[N]: information from the global memory,
    LM: a local memory variable
Begin
  For i := 0 to N Do
    LM := GM[i];
    Synchronize WI;
    Perform compute operations with LM instead of GM[i];
    Synchronize WI;
  Endfor
End
Listing 3: Traffic reduction of memory accesses

A. Local search for KP

The 0-1 knapsack problem is defined as follows: given a knapsack with capacity c and a set of n items, where each item j has a profit p_j and a weight w_j, the objective is to select a subset of the items so as to

maximize   sum_{j=1..n} p_j x_j
subject to sum_{j=1..n} w_j x_j <= c,
           x_j in {0, 1}, j = 1, ..., n,

where x_j = 1 if item j is selected and 0 otherwise.

KP is the most important knapsack problem and one of the most intensively studied. The literature on KP is vast, but it is covered very well in [9][10]. For this problem we use Drop and Add moves to generate the neighborhood: we drop each item that is selected in the knapsack and add another item, not selected yet, hoping to find a better combination (see Fig. 5). Both the weight and the profit of every item are loaded into the local memory as shown in Fig. 3 and Listing 3. Then, to maximize the number of WI launched, we create two kernels and count the number of potential Drop and Add moves. If the number of Drop moves is bigger than the number of Add moves, we launch every Drop move on a WI and evaluate the potential item to add (Kernel 1). Otherwise, we launch every Add move on a WI and evaluate the potential item to drop (Kernel 2). At the end of the parallel execution, we choose the best combination of Drop and Add moves. This contribution gives us two advantages: the first one is to maximize the number of WI launched, to exploit the parallel architecture of the GPU, and the second one is to reduce the sequential work in each WI.

B. Local search for TSP

The TSP can be defined on a complete undirected graph G = (V, E). The set V = {1, ..., n} is the vertex (city) set and E = {(i, j) : i, j in V} is the edge set. A cost c_ij is defined on E as the Euclidean distance between two vertices i and j. The TSP consists in finding a least-cost sequence in which to visit a set of cities, starting and ending at the same city, in such a way that each city is visited exactly once (the lowest-cost Hamiltonian circuit). Ref. [11] formulated this problem with the Miller-Tucker-Zemlin subtour elimination constraints:

minimize   sum_{i,j} c_ij x_ij
subject to sum_{j} x_ij = 1 for all i,
           sum_{i} x_ij = 1 for all j,
           u_i - u_j + n x_ij <= n - 1 for 2 <= i != j <= n,
           x_ij in {0, 1},

where x_ij = 1 if the tour goes directly from city i to city j, and the u_i are auxiliary ordering variables.

For this problem we use the Switch neighborhood structure: every city is switched with all the others, hoping to find a shorter circuit (see Fig. 4). To accelerate the access time using the local memory, the distances between each city to switch and its neighbors are loaded into the local memory as shown in Fig. 3 and Listing 3.
Fig. 5. Neighborhood for KP with the Drop-Add structure

To control the divergence, we create two kernels: Kernel 1 performs the adjacent permutations and Kernel 2 the rest of the permutations. The two kernels are executed in parallel. At the end of the parallel execution, we choose the best Switch move.

4. EXPERIMENTAL RESULTS

For the KP, we generate 50 strongly correlated, randomly generated instances from 10^3 to 10^4 items, and we make 5 tests for every instance. For the TSP, we use 10 well-known instances from the TSPLIB [15] to validate our work. To perform the experimentation, we use a CPU and a GPU configuration. Our objective is to see what we can gain from parallelization using the same machine, without spending money on other configurations. Table 1 shows the specifications of each configuration. Unlike the GPU, the number of CU is equal to the number of PE for the CPU. The frequency is the number of operations performed by the processor in one second; the turbo frequency is the frequency of the processor when only one core is working (sequential model). Finally, the theoretical performance is calculated in single precision (SP) for the two configurations to compare them. The execution depends on this performance and on the speed of memory access. For a fair comparison, in both KP and TSP the execution on the parallel CPU does not use the local memory (Listing 3), because it disadvantages the CPU execution.

A. Knapsack Problem

Table 2 summarizes the acceleration of the parallel CPU and GPU for solving the KP; every instance in Table 2 is the average of 5 tests. We can see that the acceleration of the GPU is better than that of the CPU. The KP is a problem that uses only two memory accesses for an Add move and two other accesses for a Drop move, and it needs computing power to calculate the new weight and profit. In this specific problem, we can see that the theoretical performance in single precision is very significant.
We can also see that the acceleration on the GPU appears, for the instance with items, from the 47th iteration. This is thanks to the capability of the GPU to adapt quickly to parallelization. For the CPU, the acceleration begins only on the instance with items, at the 77th iteration. The best result recorded is an acceleration of 6.70 times for the GPU against 3.69 times for the CPU, when the LS performs 152 iterations.

Fig. 4. Neighborhood for TSP with the Switch structure

Table 1. Specifications of the configurations

Specifications         | CU/PE | Frequency (Turbo) | Performance (SP)
Intel Core i7-2630QM   | 8/8   | 2 (2.9) GHz       | 128 GFLOPS
Nvidia GeForce GT 525M | 2/    | GHz               | GFLOPS

Table 2. Acceleration for the knapsack problem

Instances (number of items) | Avg. iterations | AccCPU | AccGPU

Fig. 6. Curve of acceleration for KP

Fig. 6 shows the evolution of the three models. We can see the clear advantage of the two parallel models over the sequential model. Indeed, the curve of the sequential model is exponential, while the two parallel models are more stable, with a constant advantage for the GPU. For the KP, we can conclude that the recorded performances are very interesting. Indeed, the proposition is robust because it depends on the instance and not on the size of the neighborhood (Listing 2). The acceleration grows when the GPU platform is more powerful and the LS performs more iterations. Few works have considered the KP on GPU; indeed, to the best of our knowledge, we found only two works on exact methods solving the KP with GPU [16][17].

B. Traveling Salesman Problem

The second experiment is on the TSP. Table 3 shows the acceleration of the parallel CPU and GPU. In this specific problem, the memory bound is very important. For this reason the CPU is more efficient, but with the GPU we record good accelerations considering the theoretical performance of the GPU used. We can see that we get acceleration with the GPU before the parallel CPU. The GPU stays more efficient up to 3496 cities; this efficiency is the consequence of the fast adaptation of the GPU to the parallel model. The CPU takes the lead on the rest of the instances thanks to its fast memory access and its efficiency on sequential work. The GPU continues to grow slowly, until reaching an acceleration of 3.04 times against 5.91 times for the parallel CPU when the LS performs 711 iterations on the last instance. Fig. 7 shows the curves of the three models of execution on the TSP.
We can see the exponential evolution of the sequential model.

Fig. 7. Curve of acceleration for TSP

Table 3. Acceleration for the traveling salesman problem

Instances (number of cities) | Iterations | AccCPU | AccGPU
dsj
fra
mu
nu
dlb
Xua
fnl
ca
eg
ym

The two parallel models for the TSP do not have the same stability as for the KP. The challenge with the TSP is the memory access; as we can see, the theoretical performances of the configurations are not very significant here. For this specific problem, we try to compensate for the memory accesses with the control of divergence. This works better for the parallel execution on CPU, but with this platform we still do not obtain performance on small instances with few iterations; the GPU platform is better in this case. The last experiment uses both the parallel CPU and the GPU: the idea is to run the first kernel, which handles the adjacent-city permutations, on the GPU, and the second kernel, which handles the other permutations, on the CPU. The objective is to exploit the best of the two platforms. The idea came from the work of AMD and Intel on a new fusion of CPU and GPU in the same chip. To perform this proposition, we create two separate devices. In our case, we need to do the transfers twice and create two separate OpenCL programs to execute the two devices. All this additional work could be avoided if the GPU and the CPU were in the same chip. Despite this, we obtain very interesting results with this approach. Table 4 summarizes the acceleration of this proposition. We can see that the acceleration begins earlier than with the parallel CPU model alone. Indeed, from 3496 cities we have acceleration, and, most interestingly, we have a very significant acceleration for 3694 cities, with an acceleration of 4.00 times when the LS performs 244 iterations. We did not obtain a better acceleration than the parallel CPU, but the results are very close and more stable. With this solution we can execute very large instances, and the LS becomes efficient more quickly than with the parallelization on CPU alone. The TSP is a very challenging problem: for the decade between 2002 and 2012, 16 publications worked on the parallelization of the TSP on GPU, using different population-based and single-solution-based metaheuristics [4]. In 2011, [14] proposed a design of a LS representing one neighbor on one single thread, as in their earlier work [5], but this time they concentrated on the management of memories and minimized the transfers between the CPU and the GPU; they studied four optimization problems, including the TSP with sizes between 101 and 5915 cities.
In 2012, [8] used LS to accelerate the TSP with the 2-opt and 3-opt structures. They used 13 TSPLIB instances [15] with sizes between 100 and 4461 cities, and chose to dedicate one thread to one 2-opt (or 3-opt) swap calculation. The most important contribution of that work is to recompute data from the coordinates of the points instead of accessing a pre-calculated distance matrix: they recompute the data using the high peak computational power of the GPU, and the coordinates of the points are stored in the local memory. The problem is that this approach is limited to approximately 4800 cities because of the small size of the local memory. On our side, we propose this approach of fusion between platforms, and we believe that it can be very efficient, especially for large-neighborhood instances when the GPU is not very powerful. It works around the problem of the small number of registers and gives good accelerations. More interestingly, it could be a very efficient solution if, in the future, we have a chip with both GPU and CPU.

Table 4. Acceleration for TSP with CPU-GPU

Instances (number of cities) | Iterations | AccCPU+GPU
dsj
fra
mu
nu
dlb
Xua
fnl
ca
eg
ym

5. CONCLUSION AND FUTURE WORKS

The aim of this paper was to propose a robust local search that can produce results for large-scale problems. We proposed several techniques to accelerate the execution on GPU, and we adapted these techniques to the KP and the TSP. The proposition was experimented on both CPU and GPU thanks to the portability of OpenCL. For the two problems, we can clearly see the stability of the parallel models against the exponential evolution of the sequential model. Through this work, we can conclude that the parallelization depends on two main factors. The first one is the characteristics of the platform used; in our case, the power of the GPU for parallel execution and the power of the CPU for sequential execution. The second factor is the nature of the problem.
Indeed, we can observe that when the problem is compute-bound, the theoretical performance of the platform is significant; on the other hand, the bandwidth is more important when the problem is memory-bound. For the KP, very good accelerations on GPU were recorded with respect to the number of iterations performed. For the TSP, better accelerations were recorded with the parallel CPU, but the fusion of GPU and CPU is a promising approach. Our next objective is to adapt our parallel LS to various other combinatorial problems. Each problem can be studied and allow us to find new techniques to apply. Many other metaheuristics, based on a single solution or on a population, can be explored. For other metaheuristics based on a single-solution process, like tabu search, it is easy to adapt our techniques to these methods: for example, the tabu list can be stored in the local memory, and the other techniques are perfectly applicable. Other levels of parallelization, like the algorithmic level and the solution level, can be used to accelerate the execution and to improve the results [1]. Also, constructive methods and classification methods can be parallelized on GPU; these methods are used to help the metaheuristics solve combinatorial problems. Several advanced techniques can be used to optimize the transfers between the host and the device, like the overlap between execution and transfer [6]. Another very interesting objective is the use of many GPUs (a GPU cluster), which can produce a very powerful system at a reasonable cost.

References

[1] E.G. Talbi, Metaheuristics: from Design to Implementation, John Wiley and Sons Inc.
[2] R.B. André, R.H. Trond, L.S. Martin, Graphics processing unit (GPU) programming strategies and trends in GPU computing, J. Parallel Distrib. Comput., 73: 4-13.
[3] R.B. André, R.H. Trond, S. Christian, H. Geir, GPU computing in discrete optimization. Part I: Introduction to the GPU, EURO Journal on Transportation and Logistics, 2.
[4] S. Christian, H. Geir, R.B. André, R.H. Trond, GPU computing in discrete optimization. Part II: Survey focused on routing problems, EURO Journal on Transportation and Logistics, 2.
[5] T.V. Luong, N. Melab, E.G. Talbi, Large neighborhood local search optimization on graphics processing units, Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, 1-8.
[6] S. Christian, Efficient local search on the GPU: investigations on the vehicle routing problem, J. Parallel Distrib. Comput., 73: 14-31.
[7] D.B. Kirk, W.M. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann.
[8] K. Rocki, R. Suda, Accelerating 2-opt and 3-opt local search using GPU in the traveling salesman problem, High Performance Computing and Simulation (HPCS), 2012.
[9] S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations, Series in Discrete Mathematics and Optimization, Wiley Interscience.
[10] H. Kellerer, U. Pferschy, D. Pisinger, Knapsack Problems, Springer.
[11] M. Desrochers, G. Laporte, Improvements and extensions to the Miller-Tucker-Zemlin subtour elimination constraints, Operations Research Letters, 10: 27-36.
[12] S. Lin, B.W. Kernighan, An effective heuristic algorithm for the traveling-salesman problem, Operations Research, 21.
[13] G. Gutin, A.P. Punnen, The Traveling Salesman Problem and Its Variations, Springer.
[14] T.V. Luong, N. Melab, E.-G. Talbi, GPU computing for parallel local search metaheuristic algorithms, IEEE Transactions on Computers, 63.
[15] G. Reinelt, TSPLIB: a traveling salesman problem library, ORSA Journal on Computing, Vol. 3, No. 4, 1991.
[16] M.E. Lalami, D. El-Baz, GPU implementation of the branch and bound method for knapsack problems, IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, 1-9, 2012.
[17] V. Boyer, D. El Baz, M. Elkihel, Solving knapsack problems on GPU, Computers and Operations Research, 39: 42-47, 2012.


A Steady-State Genetic Algorithm for Traveling Salesman Problem with Pickup and Delivery A Steady-State Genetic Algorithm for Traveling Salesman Problem with Pickup and Delivery Monika Sharma 1, Deepak Sharma 2 1 Research Scholar Department of Computer Science and Engineering, NNSS SGI Samalkha,

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Parallel Computing in Combinatorial Optimization

Parallel Computing in Combinatorial Optimization Parallel Computing in Combinatorial Optimization Bernard Gendron Université de Montréal gendron@iro.umontreal.ca Course Outline Objective: provide an overview of the current research on the design of parallel

More information

Parallel Implementation of the Max_Min Ant System for the Travelling Salesman Problem on GPU

Parallel Implementation of the Max_Min Ant System for the Travelling Salesman Problem on GPU Parallel Implementation of the Max_Min Ant System for the Travelling Salesman Problem on GPU Gaurav Bhardwaj Department of Computer Science and Engineering Maulana Azad National Institute of Technology

More information

Hardware-Software Codesign

Hardware-Software Codesign Hardware-Software Codesign 4. System Partitioning Lothar Thiele 4-1 System Design specification system synthesis estimation SW-compilation intellectual prop. code instruction set HW-synthesis intellectual

More information

Theorem 2.9: nearest addition algorithm

Theorem 2.9: nearest addition algorithm There are severe limits on our ability to compute near-optimal tours It is NP-complete to decide whether a given undirected =(,)has a Hamiltonian cycle An approximation algorithm for the TSP can be used

More information

Parallel Metaheuristics on GPU

Parallel Metaheuristics on GPU Ph.D. Defense - Thé Van LUONG December 1st 2011 Parallel Metaheuristics on GPU Advisors: Nouredine MELAB and El-Ghazali TALBI Outline 2 I. Scientific Context 1. Parallel Metaheuristics 2. GPU Computing

More information

Two new variants of Christofides heuristic for the Static TSP and a computational study of a nearest neighbor approach for the Dynamic TSP

Two new variants of Christofides heuristic for the Static TSP and a computational study of a nearest neighbor approach for the Dynamic TSP Two new variants of Christofides heuristic for the Static TSP and a computational study of a nearest neighbor approach for the Dynamic TSP Orlis Christos Kartsiotis George Samaras Nikolaos Margaritis Konstantinos

More information

Optimization Techniques for Design Space Exploration

Optimization Techniques for Design Space Exploration 0-0-7 Optimization Techniques for Design Space Exploration Zebo Peng Embedded Systems Laboratory (ESLAB) Linköping University Outline Optimization problems in ERT system design Heuristic techniques Simulated

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem

Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem Project Report Sandeep Kumar Ragila Rochester Institute of Technology sr5626@rit.edu Santosh Vodela Rochester Institute of Technology pv8395@rit.edu

More information

A Parallel Architecture for the Generalized Traveling Salesman Problem

A Parallel Architecture for the Generalized Traveling Salesman Problem A Parallel Architecture for the Generalized Traveling Salesman Problem Max Scharrenbroich AMSC 663 Project Proposal Advisor: Dr. Bruce L. Golden R. H. Smith School of Business 1 Background and Introduction

More information

A COMPARATIVE STUDY OF FIVE PARALLEL GENETIC ALGORITHMS USING THE TRAVELING SALESMAN PROBLEM

A COMPARATIVE STUDY OF FIVE PARALLEL GENETIC ALGORITHMS USING THE TRAVELING SALESMAN PROBLEM A COMPARATIVE STUDY OF FIVE PARALLEL GENETIC ALGORITHMS USING THE TRAVELING SALESMAN PROBLEM Lee Wang, Anthony A. Maciejewski, Howard Jay Siegel, and Vwani P. Roychowdhury * Microsoft Corporation Parallel

More information

Hybrid Constraint Programming and Metaheuristic methods for Large Scale Optimization Problems

Hybrid Constraint Programming and Metaheuristic methods for Large Scale Optimization Problems Hybrid Constraint Programming and Metaheuristic methods for Large Scale Optimization Problems Fabio Parisini Tutor: Paola Mello Co-tutor: Michela Milano Final seminars of the XXIII cycle of the doctorate

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Complete Local Search with Memory

Complete Local Search with Memory Complete Local Search with Memory Diptesh Ghosh Gerard Sierksma SOM-theme A Primary Processes within Firms Abstract Neighborhood search heuristics like local search and its variants are some of the most

More information

Massively Parallel Approximation Algorithms for the Traveling Salesman Problem

Massively Parallel Approximation Algorithms for the Traveling Salesman Problem Massively Parallel Approximation Algorithms for the Traveling Salesman Problem Vaibhav Gandhi May 14, 2015 Abstract This paper introduces the reader to massively parallel approximation algorithms which

More information

Algorithm Design (4) Metaheuristics

Algorithm Design (4) Metaheuristics Algorithm Design (4) Metaheuristics Takashi Chikayama School of Engineering The University of Tokyo Formalization of Constraint Optimization Minimize (or maximize) the objective function f(x 0,, x n )

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

Massively Parallel Approximation Algorithms for the Knapsack Problem

Massively Parallel Approximation Algorithms for the Knapsack Problem Massively Parallel Approximation Algorithms for the Knapsack Problem Zhenkuang He Rochester Institute of Technology Department of Computer Science zxh3909@g.rit.edu Committee: Chair: Prof. Alan Kaminsky

More information

Optimal tour along pubs in the UK

Optimal tour along pubs in the UK 1 From Facebook Optimal tour along 24727 pubs in the UK Road distance (by google maps) see also http://www.math.uwaterloo.ca/tsp/pubs/index.html (part of TSP homepage http://www.math.uwaterloo.ca/tsp/

More information

The Traveling Salesman Problem: State of the Art

The Traveling Salesman Problem: State of the Art The Traveling Salesman Problem: State of the Art Thomas Stützle stuetzle@informatik.tu-darmstadt.de http://www.intellektik.informatik.tu-darmstadt.de/ tom. Darmstadt University of Technology Department

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Improvement heuristics for the Sparse Travelling Salesman Problem

Improvement heuristics for the Sparse Travelling Salesman Problem Improvement heuristics for the Sparse Travelling Salesman Problem FREDRICK MTENZI Computer Science Department Dublin Institute of Technology School of Computing, DIT Kevin Street, Dublin 8 IRELAND http://www.comp.dit.ie/fmtenzi

More information

Effective Tour Searching for Large TSP Instances. Gerold Jäger

Effective Tour Searching for Large TSP Instances. Gerold Jäger Effective Tour Searching for Large TSP Instances Gerold Jäger Martin-Luther-University Halle-Wittenberg (Germany) joint work with Changxing Dong, Paul Molitor, Dirk Richter German Research Foundation Grant

More information

Introduction to Combinatorial Algorithms

Introduction to Combinatorial Algorithms Fall 2009 Intro Introduction to the course What are : Combinatorial Structures? Combinatorial Algorithms? Combinatorial Problems? Combinatorial Structures Combinatorial Structures Combinatorial structures

More information

Travelling salesman problem using reduced algorithmic Branch and bound approach P. Ranjana Hindustan Institute of Technology and Science

Travelling salesman problem using reduced algorithmic Branch and bound approach P. Ranjana Hindustan Institute of Technology and Science Volume 118 No. 20 2018, 419-424 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Travelling salesman problem using reduced algorithmic Branch and bound approach P. Ranjana Hindustan

More information

GPU Implementation of a Multiobjective Search Algorithm

GPU Implementation of a Multiobjective Search Algorithm Department Informatik Technical Reports / ISSN 29-58 Steffen Limmer, Dietmar Fey, Johannes Jahn GPU Implementation of a Multiobjective Search Algorithm Technical Report CS-2-3 April 2 Please cite as: Steffen

More information

Comparison Study of Multiple Traveling Salesmen Problem using Genetic Algorithm

Comparison Study of Multiple Traveling Salesmen Problem using Genetic Algorithm IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-661, p- ISSN: 2278-8727Volume 13, Issue 3 (Jul. - Aug. 213), PP 17-22 Comparison Study of Multiple Traveling Salesmen Problem using Genetic

More information

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been

More information

Simultaneous Solving of Linear Programming Problems in GPU

Simultaneous Solving of Linear Programming Problems in GPU Simultaneous Solving of Linear Programming Problems in GPU Amit Gurung* amitgurung@nitm.ac.in Binayak Das* binayak89cse@gmail.com Rajarshi Ray* raj.ray84@gmail.com * National Institute of Technology Meghalaya

More information

Graphics Processor Acceleration and YOU

Graphics Processor Acceleration and YOU Graphics Processor Acceleration and YOU James Phillips Research/gpu/ Goals of Lecture After this talk the audience will: Understand how GPUs differ from CPUs Understand the limits of GPU acceleration Have

More information

Local Search Overview

Local Search Overview DM841 DISCRETE OPTIMIZATION Part 2 Heuristics Local Search Overview Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. 2. 3. Local Search 2 Outline

More information

Job Shop Scheduling Problem (JSSP) Genetic Algorithms Critical Block and DG distance Neighbourhood Search

Job Shop Scheduling Problem (JSSP) Genetic Algorithms Critical Block and DG distance Neighbourhood Search A JOB-SHOP SCHEDULING PROBLEM (JSSP) USING GENETIC ALGORITHM (GA) Mahanim Omar, Adam Baharum, Yahya Abu Hasan School of Mathematical Sciences, Universiti Sains Malaysia 11800 Penang, Malaysia Tel: (+)

More information

REPORT DOCUMENTATION PAGE

REPORT DOCUMENTATION PAGE REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

More information

Massively Parallel Computations of the LZ-complexity of Strings

Massively Parallel Computations of the LZ-complexity of Strings Massively Parallel Computations of the LZ-complexity of Strings Alexander Belousov Electrical and Electronics Engineering Department Ariel University Ariel, Israel alex.blsv@gmail.com Joel Ratsaby Electrical

More information

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J. Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering

More information

A Development of Hybrid Cross Entropy-Tabu Search Algorithm for Travelling Repairman Problem

A Development of Hybrid Cross Entropy-Tabu Search Algorithm for Travelling Repairman Problem Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 A Development of Hybrid Cross Entropy-Tabu Search Algorithm for Travelling

More information

CSE 417 Branch & Bound (pt 4) Branch & Bound

CSE 417 Branch & Bound (pt 4) Branch & Bound CSE 417 Branch & Bound (pt 4) Branch & Bound Reminders > HW8 due today > HW9 will be posted tomorrow start early program will be slow, so debugging will be slow... Review of previous lectures > Complexity

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Khushboo Arora, Samiksha Agarwal, Rohit Tanwar

Khushboo Arora, Samiksha Agarwal, Rohit Tanwar International Journal of Scientific & Engineering Research, Volume 7, Issue 1, January-2016 1014 Solving TSP using Genetic Algorithm and Nearest Neighbour Algorithm and their Comparison Khushboo Arora,

More information

TABU search and Iterated Local Search classical OR methods

TABU search and Iterated Local Search classical OR methods TABU search and Iterated Local Search classical OR methods tks@imm.dtu.dk Informatics and Mathematical Modeling Technical University of Denmark 1 Outline TSP optimization problem Tabu Search (TS) (most

More information

Outline. TABU search and Iterated Local Search classical OR methods. Traveling Salesman Problem (TSP) 2-opt

Outline. TABU search and Iterated Local Search classical OR methods. Traveling Salesman Problem (TSP) 2-opt TABU search and Iterated Local Search classical OR methods Outline TSP optimization problem Tabu Search (TS) (most important) Iterated Local Search (ILS) tks@imm.dtu.dk Informatics and Mathematical Modeling

More information

Hybrid Differential Evolution Algorithm for Traveling Salesman Problem

Hybrid Differential Evolution Algorithm for Traveling Salesman Problem Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2716 2720 Advanced in Control Engineeringand Information Science Hybrid Differential Evolution Algorithm for Traveling Salesman

More information

Methods and Models for Combinatorial Optimization Heuristis for Combinatorial Optimization

Methods and Models for Combinatorial Optimization Heuristis for Combinatorial Optimization Methods and Models for Combinatorial Optimization Heuristis for Combinatorial Optimization L. De Giovanni 1 Introduction Solution methods for Combinatorial Optimization Problems (COPs) fall into two classes:

More information

When Network Embedding meets Reinforcement Learning?

When Network Embedding meets Reinforcement Learning? When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE

More information

A Parallel Access Method for Spatial Data Using GPU

A Parallel Access Method for Spatial Data Using GPU A Parallel Access Method for Spatial Data Using GPU Byoung-Woo Oh Department of Computer Engineering Kumoh National Institute of Technology Gumi, Korea bwoh@kumoh.ac.kr Abstract Spatial access methods

More information

ACO and other (meta)heuristics for CO

ACO and other (meta)heuristics for CO ACO and other (meta)heuristics for CO 32 33 Outline Notes on combinatorial optimization and algorithmic complexity Construction and modification metaheuristics: two complementary ways of searching a solution

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12

More information

Origins of Operations Research: World War II

Origins of Operations Research: World War II ESD.83 Historical Roots Assignment METHODOLOGICAL LINKS BETWEEN OPERATIONS RESEARCH AND STOCHASTIC OPTIMIZATION Chaiwoo Lee Jennifer Morris 11/10/2010 Origins of Operations Research: World War II Need

More information

Non-deterministic Search techniques. Emma Hart

Non-deterministic Search techniques. Emma Hart Non-deterministic Search techniques Emma Hart Why do local search? Many real problems are too hard to solve with exact (deterministic) techniques Modern, non-deterministic techniques offer ways of getting

More information

Machine Learning for Software Engineering

Machine Learning for Software Engineering Machine Learning for Software Engineering Introduction and Motivation Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Organizational Stuff Lectures: Tuesday 11:00 12:30 in room SR015 Cover

More information

Accelerating K-Means Clustering with Parallel Implementations and GPU computing

Accelerating K-Means Clustering with Parallel Implementations and GPU computing Accelerating K-Means Clustering with Parallel Implementations and GPU computing Janki Bhimani Electrical and Computer Engineering Dept. Northeastern University Boston, MA Email: bhimani@ece.neu.edu Miriam

More information

A Parallel Simulated Annealing Algorithm for Weapon-Target Assignment Problem

A Parallel Simulated Annealing Algorithm for Weapon-Target Assignment Problem A Parallel Simulated Annealing Algorithm for Weapon-Target Assignment Problem Emrullah SONUC Department of Computer Engineering Karabuk University Karabuk, TURKEY Baha SEN Department of Computer Engineering

More information

ND-Tree: a Fast Online Algorithm for Updating the Pareto Archive

ND-Tree: a Fast Online Algorithm for Updating the Pareto Archive ND-Tree: a Fast Online Algorithm for Updating the Pareto Archive i.e. return to Algorithms and Data structures Andrzej Jaszkiewicz, Thibaut Lust Multiobjective optimization " minimize" z1 f... " minimize"

More information

Research Article A Novel Metaheuristic for Travelling Salesman Problem

Research Article A Novel Metaheuristic for Travelling Salesman Problem Industrial Volume 2013, Article ID 347825, 5 pages http://dx.doi.org/10.1155/2013/347825 Research Article A Novel Metaheuristic for Travelling Salesman Problem Vahid Zharfi and Abolfazl Mirzazadeh Industrial

More information

A memetic algorithm for symmetric traveling salesman problem

A memetic algorithm for symmetric traveling salesman problem ISSN 1750-9653, England, UK International Journal of Management Science and Engineering Management Vol. 3 (2008) No. 4, pp. 275-283 A memetic algorithm for symmetric traveling salesman problem Keivan Ghoseiri

More information

Measurement of real time information using GPU

Measurement of real time information using GPU Measurement of real time information using GPU Pooja Sharma M. Tech Scholar, Department of Electronics and Communication E-mail: poojachaturvedi1985@gmail.com Rajni Billa M. Tech Scholar, Department of

More information

Effective Tour Searching for Large TSP Instances. Gerold Jäger

Effective Tour Searching for Large TSP Instances. Gerold Jäger Effective Tour Searching for Large TSP Instances Gerold Jäger Martin-Luther-University Halle-Wittenberg joint work with Changxing Dong, Paul Molitor, Dirk Richter November 14, 2008 Overview 1 Introduction

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Effective Optimizer Development for Solving Combinatorial Optimization Problems *

Effective Optimizer Development for Solving Combinatorial Optimization Problems * Proceedings of the 11th WSEAS International Conference on SYSTEMS, Agios Nikolaos, Crete Island, Greece, July 23-25, 2007 311 Effective Optimizer Development for Solving Combinatorial Optimization s *

More information

XIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture

XIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture XIV International PhD Workshop OWD 2012, 20 23 October 2012 Optimal structure of face detection algorithm using GPU architecture Dmitry Pertsau, Belarusian State University of Informatics and Radioelectronics

More information

Travelling Salesman Problem: Tabu Search

Travelling Salesman Problem: Tabu Search Travelling Salesman Problem: Tabu Search (Anonymized) April 2017 Abstract The Tabu Search algorithm is a heuristic method to find optimal solutions to the Travelling Salesman Problem (TSP). It is a local

More information

Clustering Strategy to Euclidean TSP

Clustering Strategy to Euclidean TSP 2010 Second International Conference on Computer Modeling and Simulation Clustering Strategy to Euclidean TSP Hamilton Path Role in Tour Construction Abdulah Fajar, Nur Azman Abu, Nanna Suryana Herman

More information

GPU Implementation of Implicit Runge-Kutta Methods

GPU Implementation of Implicit Runge-Kutta Methods GPU Implementation of Implicit Runge-Kutta Methods Navchetan Awasthi, Abhijith J Supercomputer Education and Research Centre Indian Institute of Science, Bangalore, India navchetanawasthi@gmail.com, abhijith31792@gmail.com

More information

A Memetic Algorithm for the Generalized Traveling Salesman Problem

A Memetic Algorithm for the Generalized Traveling Salesman Problem A Memetic Algorithm for the Generalized Traveling Salesman Problem Gregory Gutin Daniel Karapetyan Abstract The generalized traveling salesman problem (GTSP) is an extension of the well-known traveling

More information

A Discrete Fireworks Algorithm for Solving Large-Scale Travel Salesman Problem

A Discrete Fireworks Algorithm for Solving Large-Scale Travel Salesman Problem A Discrete Fireworks Algorithm for Solving Large-Scale Travel Salesman Problem Haoran Luo Peking University Beijing, China Email: luohaoran@pku.edu.cn Weidi Xu Key Laboratory of Machine Perception (MOE)

More information

Parallel Systems. Project topics

Parallel Systems. Project topics Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a

More information

Algorithms and Experimental Study for the Traveling Salesman Problem of Second Order. Gerold Jäger

Algorithms and Experimental Study for the Traveling Salesman Problem of Second Order. Gerold Jäger Algorithms and Experimental Study for the Traveling Salesman Problem of Second Order Gerold Jäger joint work with Paul Molitor University Halle-Wittenberg, Germany August 22, 2008 Overview 1 Introduction

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

Modified Order Crossover (OX) Operator

Modified Order Crossover (OX) Operator Modified Order Crossover (OX) Operator Ms. Monica Sehrawat 1 N.C. College of Engineering, Israna Panipat, Haryana, INDIA. Mr. Sukhvir Singh 2 N.C. College of Engineering, Israna Panipat, Haryana, INDIA.

More information

Exploring GPU Architecture for N2P Image Processing Algorithms

Exploring GPU Architecture for N2P Image Processing Algorithms Exploring GPU Architecture for N2P Image Processing Algorithms Xuyuan Jin(0729183) x.jin@student.tue.nl 1. Introduction It is a trend that computer manufacturers provide multithreaded hardware that strongly

More information

SLS Methods: An Overview

SLS Methods: An Overview HEURSTC OPTMZATON SLS Methods: An Overview adapted from slides for SLS:FA, Chapter 2 Outline 1. Constructive Heuristics (Revisited) 2. terative mprovement (Revisited) 3. Simple SLS Methods 4. Hybrid SLS

More information

Solving the Large Scale Next Release Problem with a Backbone Based Multilevel Algorithm

Solving the Large Scale Next Release Problem with a Backbone Based Multilevel Algorithm IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Solving the Large Scale Next Release Problem with a Backbone Based Multilevel Algorithm Jifeng Xuan, He Jiang, Member, IEEE, Zhilei Ren, and Zhongxuan

More information

Parallelization of Graph Isomorphism using OpenMP

Parallelization of Graph Isomorphism using OpenMP Parallelization of Graph Isomorphism using OpenMP Vijaya Balpande Research Scholar GHRCE, Nagpur Priyadarshini J L College of Engineering, Nagpur ABSTRACT Advancement in computer architecture leads to

More information

Dense matching GPU implementation

Dense matching GPU implementation Dense matching GPU implementation Author: Hailong Fu. Supervisor: Prof. Dr.-Ing. Norbert Haala, Dipl. -Ing. Mathias Rothermel. Universität Stuttgart 1. Introduction Correspondence problem is an important

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

SCIENCE & TECHNOLOGY

SCIENCE & TECHNOLOGY Pertanika J. Sci. & Technol. 25 (S): 199-210 (2017) SCIENCE & TECHNOLOGY Journal homepage: http://www.pertanika.upm.edu.my/ Water Flow-Like Algorithm Improvement Using K-Opt Local Search Wu Diyi, Zulaiha

More information

RESEARCH ARTICLE. Accelerating Ant Colony Optimization for the Traveling Salesman Problem on the GPU

RESEARCH ARTICLE. Accelerating Ant Colony Optimization for the Traveling Salesman Problem on the GPU The International Journal of Parallel, Emergent and Distributed Systems Vol. 00, No. 00, Month 2011, 1 21 RESEARCH ARTICLE Accelerating Ant Colony Optimization for the Traveling Salesman Problem on the

More information

56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997

56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997 56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997 Answer #1 and any five of the remaining six problems! possible score 1. Multiple Choice 25 2. Traveling Salesman Problem 15 3.

More information