A Simulated Annealing algorithm for GPU clusters

Size: px

Start display at page:

Download "A Simulated Annealing algorithm for GPU clusters"

Sophie Franklin
5 years ago
Views:

1 A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011

2 1 Introduction 2 3 The lower level The upper level Results 4

3 Simulated Annealing (1) Metaheuristic for the global optimization problem Adaptation of Metropolis-Hastings Effective for computationally hard (NP-hard) problems

4 Simulated Annealing (2) Metropolis-Hastings algorithm On start: pick x min = x 0 R n, L < x 0 < U Find x i+1 based on x min If f(x i+1 ) < f(x min ) then x min = x i+1 Else x min = x i+1 with probability p

5 Simulated Annealing (2) Metropolis-Hastings algorithm On start: pick x min = x 0 R n, L < x 0 < U Find x i+1 based on x min If f(x i+1 ) < f(x min ) then x min = x i+1 Else x min = x i+1 with probability p Simulated Annealing The p value depends on the temperature The temperature is reduced whenever the suboptimal solution is accepted

6 Basic concepts of the processing platform (1) Two-level hierarchical architecture (master-slave)

7 Basic concepts of the processing platform (1) MASTER Upper level SLAVES Lower level CUDA Threads CUDA Threads CUDA Threads GPUs

8 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level

9 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level Implications: No direct cooperation between nodes (nor GPU threads)

10 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level Implications: No direct cooperation between nodes (nor GPU threads) But: the system may be heterogeneous

11 Basic concepts of the processing platform (2) Asynchronous Synchronous or sequential Fast GPU Slow GPU

12 The lower level Introduction The lower level The upper level Results Initial solution computed on CPU Each thread perform m steps of sequential SA modifying only the d-th dimension (d = ID mod n) All solutions are adaptively composed into the new one

13 The upper level Introduction The lower level The upper level Results Whenever a result is obtained: The result is adaptively combined to the previous one If the new result is better, it is accepted

14 Test environment Introduction The lower level The upper level Results Comparison with traditional parallel SA (MHCS) MHCS: Sun Fire X4600M2 (8x AMD Opteron 8218) TLSA: 16x NVIDIA GeForce 130 GT n-dimensional test functions (n {10, 100})

15 The lower level The upper level Results Time results, n = 10 (mean of 5 runs) Function GPU cluster [s] Sun Fire [s] Ackley > 4min De Jong Giewangk > 4min Michalewicz Rastrigin > 4min Rosenbrock Schwefel

16 The lower level The upper level Results Time results, n = 100 (mean of 5 runs) Function GPU cluster [s] Sun Fire [s] Ackley De Jong Giewangk Michalewicz > 4h Rastrigin Rosenbrock Schwefel

17 Scalability (1) Introduction The lower level The upper level Results Ideal and obtained speedups Ideal Experiment 12 Speedup Number of machines

18 Scalability (2) Introduction The lower level The upper level Results Execution times for various cluster sizes 1 machine 2 machines 8 machines 16 machines Normalized time Number of thread blocks per machine

19 Scalability (3) Introduction The lower level The upper level Results Execution times for various GPUs 1x GeForce 130 GT 1x GeForce GTX Normalized time Number of thread blocks

20 Introduction Good efficiency for complex problems Higher versatility (heterogeneous environment) Almost ideal scalability w.r.t. number of machines (up to some point)

21 Thank you!

22 Time results, n = 10 Function GPU cluster Sun Fire m [s] σ [s] m [s] σ [s] Ackley > 4min 0 De Jong Giewangk > 4min 0 Michalewicz Rastrigin > 4min 0 Rosenbrock Schwefel

23 Time results, n = 100 Function GPU cluster Sun Fire m [s] σ [s] m [s] σ [s] Ackley De Jong Giewangk Michalewicz > 4h 0 Rastrigin Rosenbrock Schwefel

24 Results, n = 100 Function GPU Cluster Sun Fire name minimum m σ m σ Ackley De Jong Giewangk Michalewicz? Rastrigin Rosenbrock Schwefel

25 Results, n = 10 Function GPU Cluster Sun Fire name minimum m σ m σ Ackley De Jong Giewangk Michalewicz? Rastrigin Rosenbrock Schwefel

26 Observations Introduction More threads more SA steps in parallel Best when the number of dimensions number of threads What with other cases?

Approaches to Parallel Computing

Approaches to Parallel Computing K. Cooper 1 1 Department of Mathematics Washington State University 2019 Paradigms Concept Many hands make light work... Set several processors to work on separate aspects