Parallel Traveling Salesman
PhD Student: Viet Anh Trinh
Advisor: Professor Feng Gu
Agenda
1. Traveling salesman introduction
2. Genetic Algorithm for TSP
3. Tree Search for TSP
Travelling Salesman
- Set of N cities
- Find the shortest closed tour that covers all the cities
- No city may be visited more than once
Travelling Salesman First Parallel Approach: Genetic Algorithm
Travelling Salesman - Sequential Genetic Algorithm
Initialization: 0123, 0231, 1320, 0321
Fitness Evaluation:
(0123) = 1/(1 + 2 + 10 + 7) = 0.050
(0231) = 1/(3 + 10 + 4 + 5) = 0.045
(1320) = 1/(6 + 12 + 1 + 1) = 0.050
(0321) = 1/(8 + 12 + 18 + 5) = 0.023
Selection, Cross-over, Mutation, Termination
Sequential GA Travelling Salesman
- Individuals - closed tours across all cities
- Initial population - set of randomly generated tours
- Evaluation - assess the fitness of each individual; fitness is 1 / total distance of the path
- Selection - select the fittest individuals (largest fitness, smallest distance)
- Offspring production - cross-over + mutation
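As a concrete sketch, the fitness evaluation step can be reproduced with a distance matrix consistent with the slide's numbers (the distances come out asymmetric, i.e. d(i,j) ≠ d(j,i) in general; the matrix itself is reconstructed, not given on the slides):

```python
# Distance matrix (row = from, column = to) reconstructed from the slide's sums.
DIST = [
    [0, 1, 3, 8],
    [5, 0, 2, 6],
    [1, 18, 0, 10],
    [7, 4, 12, 0],
]

def path_length(path):
    """Total length of the closed tour visiting the cities in `path`."""
    return sum(DIST[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def fitness(path):
    """Fitness = 1 / total distance, so shorter tours score higher."""
    return 1.0 / path_length(path)
```

With this matrix, `fitness([0, 1, 2, 3])` gives 1/(1 + 2 + 10 + 7) = 0.050, matching the slide.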
Selection
Roulette Wheel Selection:
P(choose 0123) = 0.05/(0.05 + 0.045 + 0.05 + 0.023) = 0.30
P(choose 0231) = 0.27
P(choose 1320) = 0.30
P(choose 0321) = 0.13
If the random number r falls in:
0 <= r < 0.30: choose 0123 (1st path)
0.30 <= r < 0.57: choose 0231 (2nd path)
0.57 <= r < 0.87: choose 1320 (3rd path)
0.87 <= r < 1: choose 0321 (4th path)
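A minimal roulette-wheel sketch (the `rng` parameter is an addition to make the choice reproducible; it is not part of the slides):

```python
import random

def roulette_select(population, fitnesses, rng=random.random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = rng() * total            # random point on the wheel, in [0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if r < cumulative:       # r landed in this individual's slice
            return individual
    return population[-1]        # guard against floating-point round-off
```

With the slide's fitness values (0.050, 0.045, 0.050, 0.023), the selection probabilities come out to roughly 0.30, 0.27, 0.30, 0.13.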
Specialized Crossover Operator
Normal crossover: invalid paths can appear
Order Crossover (OX): no invalid paths
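A sketch of one common OX variant (the crossover points `i`, `j` are passed in explicitly here; in a real GA they would be chosen at random):

```python
def order_crossover(p1, p2, i, j):
    """Order Crossover (OX): copy p1[i:j] into the child, then fill the
    remaining slots with the cities of p2 in their original order,
    skipping cities already copied. Every child is a valid permutation,
    so no invalid paths can appear."""
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]                     # keep a slice of parent 1
    used = set(p1[i:j])
    fill = (c for c in p2 if c not in used)  # parent 2's cities, in order
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child
```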
Mutation
- Select 2 random points and swap them
- Ensures the path stays valid
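The swap mutation can be sketched as:

```python
import random

def swap_mutation(path, rng=random):
    """Swap two distinct random positions; the result is still a valid tour,
    since swapping never duplicates or drops a city."""
    mutated = list(path)
    i, j = rng.sample(range(len(mutated)), 2)   # two distinct indices
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```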
Sequential Genetic Algorithm
Parallel Genetic Algorithm
Master: Initialization 0123, 0231, 1320, 0321
Slave 1: Fitness Evaluation
(0123) = 1/(1 + 2 + 10 + 7) = 0.050
(0231) = 1/(3 + 10 + 4 + 5) = 0.045
then Selection, Cross-over, Mutation, Termination
Slave 2: Fitness Evaluation
(1320) = 1/(6 + 12 + 1 + 1) = 0.050
(0321) = 1/(8 + 12 + 18 + 5) = 0.023
then Selection, Cross-over, Mutation, Termination
Parallel Travelling Salesman - Master/Slave
Master
- Initializes the population
- Sends paths to the slaves
- Examines the best paths from the slaves and returns the result
Slave
- Signals the master that it is ready for work
- Waits for paths from the master until a termination message is received
- Evaluates the paths' fitness
- Selection
- Crossover
- Mutation
- Sends its best c paths to nearby neighbors after every k generations
- When finished, sends its best paths and their lengths to the master
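The master/slave fitness split can be sketched in shared memory, with threads standing in for the MPI slave processes (the distance matrix is the hypothetical one matching the slide's numbers; real code would use MPI scatter/gather as described above):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical distance matrix reconstructed from the slide's fitness sums.
DIST = [[0, 1, 3, 8], [5, 0, 2, 6], [1, 18, 0, 10], [7, 4, 12, 0]]

def tour_length(path):
    return sum(DIST[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def evaluate_chunk(paths):
    """Work done by one slave: fitness of its share of the population."""
    return [1.0 / tour_length(p) for p in paths]

def parallel_fitness(population, workers=2):
    """Master splits the population, slaves evaluate it, master gathers."""
    chunk = (len(population) + workers - 1) // workers
    parts = [population[i:i + chunk] for i in range(0, len(population), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(evaluate_chunk, parts)   # preserves chunk order
    return [f for part in results for f in part]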
Time Complexity - Sequential
n: population size
l: length of a path (number of cities)
g: number of generations
Sequential Genetic Algorithm:
Initialization: O(n)
Evaluation: O(nl)
Selection: O(nl)
Crossover: c1 × O(nl)
Mutation: c2 × O(nl)
Time: O(n) + g × O(nl) = O(gnl)
Time Complexity - Parallel
Isolated subpopulations; stepping-stone model: only send the best individuals to a neighboring processor.
Communication time
- Master sends data to the slaves using scatter: t_comm1 = O(nl/p)
- Each slave sends its best c paths to a neighbor processor after every k generations: t_comm2 = (g/k) × O(cl) = (g/k) × O(l), since c is a constant
- Slaves send their c best paths and their lengths to the master: t_comm3 = O(cl)
Computation time
- Master initialization: t_comp1 = O(n)
- Slave evaluation, selection, crossover, mutation: t_comp2 = O(gnl/p)
- Master final evaluation: t_comp3 = O(pc)
Parallel time: t_p = O(gnl/p)
Speedup = t_s/t_p = p
Efficiency = t_s/(p × t_p) = 1
Travelling Salesman Second Parallel Approach: Tree Search
Travelling Salesman Tree Search
Travelling Salesman Sequential Algorithm
Travelling Salesman Sequential Algorithm
- City count: checks whether the partial tour already contains all n cities.
- Best tour: checks whether the complete tour has a lower cost than the best tour so far.
- Update best tour: replaces the current best tour with this tour.
- Feasible: checks whether the city (vertex) has already been visited.
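The four checks above fit together in a depth-first tree search; here is a minimal sketch with cost-based pruning (the function and variable names are illustrative, not from the slides):

```python
def tsp_tree_search(dist):
    """Depth-first tree search for TSP: extend partial tours city by city,
    pruning branches that already cost more than the best complete tour."""
    n = len(dist)
    best = {"cost": float("inf"), "tour": None}

    def search(tour, cost):
        if cost >= best["cost"]:              # prune: cannot beat best tour
            return
        if len(tour) == n:                    # city count: tour is complete
            total = cost + dist[tour[-1]][tour[0]]   # close the tour
            if total < best["cost"]:          # best tour check + update
                best["cost"], best["tour"] = total, tour[:]
            return
        for city in range(n):
            if city not in tour:              # feasible: not yet visited
                tour.append(city)
                search(tour, cost + dist[tour[-2]][tour[-1]])
                tour.pop()                    # backtrack

    search([0], 0)                            # fix city 0 as the start
    return best["cost"], best["tour"]
```

For a 4-city instance, `tsp_tree_search(dist)` returns the optimal tour cost and one optimal tour starting at city 0.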
Travelling Salesman Sequential
Travelling Salesman Parallel
Static load balancing (picture) → load imbalance
Solution → dynamic load balancing
Travelling Salesman Parallel - Terminologies
- Donor process: the process that sends work
- Recipient process: the process that requests/receives work
- Half-split: ideally, the stack is split into two equal pieces such that the search space of each stack is the same
- Cutoff depth: to avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away
Travelling Salesman Parallel - Some possible strategies
1. Send nodes near the bottom of the stack: works well with a uniform search space; has low splitting cost
2. Send nodes near the cutoff depth: performs better with a strong heuristic (tries to distribute the parts of the search space likely to contain a solution)
3. Send half the nodes between the bottom and the cutoff depth: works well with uniform and irregular search spaces
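One way to sketch strategy 3, splitting work between the bottom of the stack and the cutoff depth (the `(node, depth)` stack representation is an assumption made for illustration, not from the slides):

```python
def split_stack(stack, cutoff_depth):
    """Donor-side split sketch: give away alternate nodes that lie above
    the cutoff depth (roughly half of them), keeping the rest.
    Each stack entry is a (node, depth) pair; nodes at or beyond
    `cutoff_depth` are too small to be worth sending and are kept."""
    donated, kept = [], []
    give = True
    for node, depth in stack:
        if depth < cutoff_depth and give:
            donated.append((node, depth))
        else:
            kept.append((node, depth))
        if depth < cutoff_depth:
            give = not give   # alternate, so the search space splits roughly in half
    return kept, donated
```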
Travelling Salesman Parallel
- The entire search space is initially assigned to the master.
- When a slave runs out of work, it gets more work from another slave using work requests and responses.
- Unexplored states are conveniently stored as local stacks at the processors.
- Slaves terminate when the final state is reached.
Travelling Salesman Parallel
Load balancing scheme: Random Polling (RP)
When a processor becomes idle, it randomly selects a donor. Each processor is selected as a donor with equal probability, ensuring that work requests are evenly distributed.
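Donor selection under random polling can be sketched as follows (`my_rank` and `num_procs` follow MPI naming conventions but are assumptions here):

```python
import random

def pick_donor(my_rank, num_procs, rng=random):
    """Random polling: an idle processor picks a donor uniformly at random
    among the other processors, so requests spread evenly over time."""
    donor = rng.randrange(num_procs - 1)   # uniform over the p-1 other ranks
    if donor >= my_rank:                   # skip over our own rank
        donor += 1
    return donor
```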
Travelling Salesman Parallel
Let W be the serial work and p·w_p the total parallel work (w_p per processor).
The search overhead factor s is defined as s = p·w_p / W.
Quantify the total overhead T_o in terms of W to compute scalability: T_o = p·w_p − W.
The upper bound on speedup is p · (1/s).
Travelling Salesman Parallel
Assumptions:
- The search overhead factor is one.
- Work at any processor can be partitioned into independent pieces as long as its size exceeds a threshold ε.
- A reasonable work-splitting mechanism is available: if work w at a processor is split into two parts ψw and (1 − ψ)w, there exists an arbitrarily small constant α (0 < α ≤ 0.5) such that ψw > αw and (1 − ψ)w > αw.
The constant α sets a lower bound on the load imbalance from work splitting.
Travelling Salesman Parallel
If processor P_i initially had work w_i, then after a single request by processor P_j and a split, neither P_i nor P_j has more than (1 − α)·w_i work.
For each load-balancing strategy, define V(p) as the total number of work requests after which each processor has received at least one work request (note that V(p) ≥ p).
Assume that the largest piece of work at any point is W.
After V(p) requests, the maximum work remaining at any processor is less than (1 − α)·W; after 2·V(p) requests, it is less than (1 − α)²·W.
After (log_{1/(1−α)}(W/ε)) · V(p) requests, the maximum work remaining at any processor is below the threshold ε.
The total number of work requests is therefore O(V(p) log W).
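The step from the repeated shrinkage by a factor (1 − α) to the O(V(p) log W) bound can be written out explicitly:

```latex
% After each batch of V(p) requests, the largest remaining piece of work
% shrinks by at least a factor of (1-\alpha). We need k batches with
\[
(1-\alpha)^{k} \, W \le \epsilon
\;\Longrightarrow\;
k \ge \log_{1/(1-\alpha)} \frac{W}{\epsilon} = O(\log W),
\]
% so the total number of work requests is
\[
k \cdot V(p) = O\!\left( V(p) \, \log W \right).
\]
```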
Travelling Salesman Parallel
If t_comm is the time required to communicate a piece of work, then the communication overhead T_o is
T_o = t_comm · V(p) · log W
The corresponding efficiency E is given by
E = 1/(1 + T_o/W)
Travelling Salesman Parallel - Random Polling
In the worst case V(p) is unbounded, so we do an average-case analysis.
Let F(i,p) represent a state in which i of the processors have been requested and p − i have not.
Let f(i,p) denote the average number of trials needed to change from state F(i,p) to F(p,p); then V(p) = f(0,p).
Travelling Salesman Parallel
We have f(i,p) = p/(p − i) + f(i+1,p), since a trial reaches a new processor with probability (p − i)/p. Hence
V(p) = f(0,p) = Σ_{i=0}^{p−1} p/(p − i) = p · H_p,
where H_p is the p-th harmonic number. As p becomes large, H_p ≈ 1.69 ln p; thus V(p) = O(p log p).
T_o = O(p log p log W)
Therefore W = O(p log² p).
END OF PRESENTATION THANK YOU!