Parallel Traveling Salesman
PhD Student: Viet Anh Trinh
Advisor: Professor Feng Gu
Agenda
1. Traveling salesman introduction
2. Genetic Algorithm for TSP
3. Tree Search for TSP
Travelling Salesman
- Set of N cities
- Find the shortest closed tour that covers all the cities
- No city may be visited more than once
Travelling Salesman First Parallel Approach: Genetic Algorithm
Travelling Salesman - Sequential Genetic Algorithm
Initialization: 0123, 0231, 1320, 0321
Fitness Evaluation:
(0123) = 1/(1 + 2 + 10 + 7) = 0.050
(0231) = 1/(3 + 10 + 4 + 5) = 0.045
(1320) = 1/(6 + 12 + 1 + 1) = 0.050
(0321) = 1/(8 + 12 + 18 + 5) = 0.023
Selection, Cross-over, Mutation, Termination
Sequential GA Travelling Salesman
- Individuals - closed tours across all cities
- Initial population - set of randomly generated tours
- Evaluation - assess the fitness of each individual; fitness is 1 / total distance of the path
- Selection - select the fittest individuals (largest fitness, smallest distance)
- Offspring production - cross-over + mutation
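As a concrete sketch, the fitness evaluation step can be reproduced with a distance matrix consistent with the slide's numbers (the distances come out asymmetric, i.e. d(i,j) ≠ d(j,i) in general; the matrix itself is reconstructed, not given on the slides):

```python
# Distance matrix (row = from, column = to) reconstructed from the slide's sums.
DIST = [
    [0, 1, 3, 8],
    [5, 0, 2, 6],
    [1, 18, 0, 10],
    [7, 4, 12, 0],
]

def path_length(path):
    """Total length of the closed tour visiting the cities in `path`."""
    return sum(DIST[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def fitness(path):
    """Fitness = 1 / total distance, so shorter tours score higher."""
    return 1.0 / path_length(path)
```

With this matrix, `fitness([0, 1, 2, 3])` gives 1/(1 + 2 + 10 + 7) = 0.050, matching the slide.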
Selection
Roulette Wheel Selection:
P(choose 0123) = 0.05/(0.05 + 0.045 + 0.05 + 0.023) = 0.30
P(choose 0231) = 0.27
P(choose 1320) = 0.30
P(choose 0321) = 0.13
If the random number r falls in:
0 <= r < 0.30: choose 0123 (1st path)
0.30 <= r < 0.57: choose 0231 (2nd path)
0.57 <= r < 0.87: choose 1320 (3rd path)
0.87 <= r < 1: choose 0321 (4th path)
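A minimal roulette-wheel sketch (the `rng` parameter is an addition to make the choice reproducible; it is not part of the slides):

```python
import random

def roulette_select(population, fitnesses, rng=random.random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = rng() * total            # random point on the wheel, in [0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if r < cumulative:       # r landed in this individual's slice
            return individual
    return population[-1]        # guard against floating-point round-off
```

With the slide's fitness values (0.050, 0.045, 0.050, 0.023), the selection probabilities come out to roughly 0.30, 0.27, 0.30, 0.13.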
Specialized Crossover Operator
Normal crossover: invalid paths can appear
Order Crossover (OX): no invalid paths
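A sketch of one common OX variant (the crossover points `i`, `j` are passed in explicitly here; in a real GA they would be chosen at random):

```python
def order_crossover(p1, p2, i, j):
    """Order Crossover (OX): copy p1[i:j] into the child, then fill the
    remaining slots with the cities of p2 in their original order,
    skipping cities already copied. Every child is a valid permutation,
    so no invalid paths can appear."""
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]                     # keep a slice of parent 1
    used = set(p1[i:j])
    fill = (c for c in p2 if c not in used)  # parent 2's cities, in order
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child
```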
Mutation
- Select 2 random points and swap them
- Ensures the path stays valid
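The swap mutation can be sketched as:

```python
import random

def swap_mutation(path, rng=random):
    """Swap two distinct random positions; the result is still a valid tour,
    since swapping never duplicates or drops a city."""
    mutated = list(path)
    i, j = rng.sample(range(len(mutated)), 2)   # two distinct indices
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```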
Sequential Genetic Algorithm
Parallel Genetic Algorithm
Master: Initialization 0123, 0231, 1320, 0321
Slave 1: Fitness Evaluation
(0123) = 1/(1 + 2 + 10 + 7) = 0.050
(0231) = 1/(3 + 10 + 4 + 5) = 0.045
then Selection, Cross-over, Mutation, Termination
Slave 2: Fitness Evaluation
(1320) = 1/(6 + 12 + 1 + 1) = 0.050
(0321) = 1/(8 + 12 + 18 + 5) = 0.023
then Selection, Cross-over, Mutation, Termination
Parallel Travelling Salesman - Master/Slave
Master
- Initializes the population
- Sends paths to the slaves
- Examines the best paths from the slaves and returns the result
Slave
- Signals the master that it is ready for work
- Waits for paths from the master until a termination message is received
- Evaluates the paths' fitness
- Selection
- Crossover
- Mutation
- Sends its best c paths to nearby neighbors after every k generations
- When finished, sends its best paths and their lengths to the master
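The master/slave fitness split can be sketched in shared memory, with threads standing in for the MPI slave processes (the distance matrix is the hypothetical one matching the slide's numbers; real code would use MPI scatter/gather as described above):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical distance matrix reconstructed from the slide's fitness sums.
DIST = [[0, 1, 3, 8], [5, 0, 2, 6], [1, 18, 0, 10], [7, 4, 12, 0]]

def tour_length(path):
    return sum(DIST[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def evaluate_chunk(paths):
    """Work done by one slave: fitness of its share of the population."""
    return [1.0 / tour_length(p) for p in paths]

def parallel_fitness(population, workers=2):
    """Master splits the population, slaves evaluate it, master gathers."""
    chunk = (len(population) + workers - 1) // workers
    parts = [population[i:i + chunk] for i in range(0, len(population), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(evaluate_chunk, parts)   # preserves chunk order
    return [f for part in results for f in part]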
Time Complexity - Sequential
n: population size
l: length of a path (number of cities)
g: number of generations
Sequential Genetic Algorithm:
Initialization: O(n)
Evaluation: O(nl)
Selection: O(nl)
Crossover: c1 × O(nl)
Mutation: c2 × O(nl)
Time: O(n) + g × O(nl) = O(gnl)
Time Complexity - Parallel
Isolated subpopulations; stepping-stone model: only send the best individuals to a neighboring processor.
Communication time
- Master sends data to the slaves using scatter: t_comm1 = O(nl/p)
- Each slave sends its best c paths to a neighbor processor after every k generations: t_comm2 = (g/k) × O(cl) = (g/k) × O(l), since c is a constant
- Slaves send their c best paths and their lengths to the master: t_comm3 = O(cl)
Computation time
- Master initialization: t_comp1 = O(n)
- Slave evaluation, selection, crossover, mutation: t_comp2 = O(gnl/p)
- Master final evaluation: t_comp3 = O(pc)
Parallel time: t_p = O(gnl/p)
Speedup = t_s/t_p = p
Efficiency = t_s/(p × t_p) = 1
Travelling Salesman Second Parallel Approach: Tree Search
Travelling Salesman Tree Search
Travelling Salesman Sequential Algorithm
Travelling Salesman Sequential Algorithm
- City count: checks whether the partial tour already contains all n cities.
- Best tour: checks whether the complete tour has a lower cost than the best tour so far.
- Update best tour: replaces the current best tour with this tour.
- Feasible: checks whether the city (vertex) has already been visited.
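The four checks above fit together in a depth-first tree search; here is a minimal sketch with cost-based pruning (the function and variable names are illustrative, not from the slides):

```python
def tsp_tree_search(dist):
    """Depth-first tree search for TSP: extend partial tours city by city,
    pruning branches that already cost more than the best complete tour."""
    n = len(dist)
    best = {"cost": float("inf"), "tour": None}

    def search(tour, cost):
        if cost >= best["cost"]:              # prune: cannot beat best tour
            return
        if len(tour) == n:                    # city count: tour is complete
            total = cost + dist[tour[-1]][tour[0]]   # close the tour
            if total < best["cost"]:          # best tour check + update
                best["cost"], best["tour"] = total, tour[:]
            return
        for city in range(n):
            if city not in tour:              # feasible: not yet visited
                tour.append(city)
                search(tour, cost + dist[tour[-2]][tour[-1]])
                tour.pop()                    # backtrack

    search([0], 0)                            # fix city 0 as the start
    return best["cost"], best["tour"]
```

For a 4-city instance, `tsp_tree_search(dist)` returns the optimal tour cost and one optimal tour starting at city 0.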
Travelling Salesman Sequential
Travelling Salesman Parallel
Static load balancing (picture) → load imbalance
Solution → dynamic load balancing
Travelling Salesman Parallel - Terminologies
- Donor process: the process that sends work
- Recipient process: the process that requests/receives work
- Half-split: ideally, the stack is split into two equal pieces such that the search space of each stack is the same
- Cutoff depth: to avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away
Travelling Salesman Parallel - Some possible strategies
1. Send nodes near the bottom of the stack: works well with a uniform search space; has low splitting cost
2. Send nodes near the cutoff depth: performs better with a strong heuristic (tries to distribute the parts of the search space likely to contain a solution)
3. Send half the nodes between the bottom and the cutoff depth: works well with uniform and irregular search spaces
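One way to sketch strategy 3, splitting work between the bottom of the stack and the cutoff depth (the `(node, depth)` stack representation is an assumption made for illustration, not from the slides):

```python
def split_stack(stack, cutoff_depth):
    """Donor-side split sketch: give away alternate nodes that lie above
    the cutoff depth (roughly half of them), keeping the rest.
    Each stack entry is a (node, depth) pair; nodes at or beyond
    `cutoff_depth` are too small to be worth sending and are kept."""
    donated, kept = [], []
    give = True
    for node, depth in stack:
        if depth < cutoff_depth and give:
            donated.append((node, depth))
        else:
            kept.append((node, depth))
        if depth < cutoff_depth:
            give = not give   # alternate, so the search space splits roughly in half
    return kept, donated
```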
Travelling Salesman Parallel
- The entire search space is initially assigned to the master.
- When a slave runs out of work, it gets more work from another slave using work requests and responses.
- Unexplored states are conveniently stored as local stacks at the processors.
- Slaves terminate when the final state is reached.
Travelling Salesman Parallel
Load balancing scheme: Random Polling (RP)
When a processor becomes idle, it randomly selects a donor. Each processor is selected as a donor with equal probability, ensuring that work requests are evenly distributed.
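Donor selection under random polling can be sketched as follows (`my_rank` and `num_procs` follow MPI naming conventions but are assumptions here):

```python
import random

def pick_donor(my_rank, num_procs, rng=random):
    """Random polling: an idle processor picks a donor uniformly at random
    among the other processors, so requests spread evenly over time."""
    donor = rng.randrange(num_procs - 1)   # uniform over the p-1 other ranks
    if donor >= my_rank:                   # skip over our own rank
        donor += 1
    return donor
```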
Travelling Salesman Parallel
Let W be the serial work and p·w_p the total parallel work (w_p per processor).
The search overhead factor s is defined as s = p·w_p / W.
Quantify the total overhead T_o in terms of W to compute scalability: T_o = p·w_p − W.
The upper bound on speedup is p · (1/s).
Travelling Salesman Parallel
Assumptions:
- The search overhead factor is one.
- Work at any processor can be partitioned into independent pieces as long as its size exceeds a threshold ε.
- A reasonable work-splitting mechanism is available: if work w at a processor is split into two parts ψw and (1 − ψ)w, there exists an arbitrarily small constant α (0 < α ≤ 0.5) such that ψw > αw and (1 − ψ)w > αw.
The constant α sets a lower bound on the load imbalance from work splitting.
Travelling Salesman Parallel
If processor P_i initially had work w_i, then after a single request by processor P_j and a split, neither P_i nor P_j has more than (1 − α)·w_i work.
For each load-balancing strategy, define V(p) as the total number of work requests after which each processor has received at least one work request (note that V(p) ≥ p).
Assume that the largest piece of work at any point is W.
After V(p) requests, the maximum work remaining at any processor is less than (1 − α)·W; after 2·V(p) requests, it is less than (1 − α)²·W.
After (log_{1/(1−α)}(W/ε)) · V(p) requests, the maximum work remaining at any processor is below the threshold ε.
The total number of work requests is therefore O(V(p) log W).
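The step from the repeated shrinkage by a factor (1 − α) to the O(V(p) log W) bound can be written out explicitly:

```latex
% After each batch of V(p) requests, the largest remaining piece of work
% shrinks by at least a factor of (1-\alpha). We need k batches with
\[
(1-\alpha)^{k} \, W \le \epsilon
\;\Longrightarrow\;
k \ge \log_{1/(1-\alpha)} \frac{W}{\epsilon} = O(\log W),
\]
% so the total number of work requests is
\[
k \cdot V(p) = O\!\left( V(p) \, \log W \right).
\]
```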
Travelling Salesman Parallel
If t_comm is the time required to communicate a piece of work, then the communication overhead T_o is
T_o = t_comm · V(p) · log W
The corresponding efficiency E is given by
E = 1/(1 + T_o/W)
Travelling Salesman Parallel - Random Polling
In the worst case V(p) is unbounded, so we do an average-case analysis.
Let F(i,p) represent a state in which i of the processors have been requested and p − i have not.
Let f(i,p) denote the average number of trials needed to change from state F(i,p) to F(p,p); then V(p) = f(0,p).
Travelling Salesman Parallel
We have f(i,p) = p/(p − i) + f(i+1,p), since a trial reaches a new processor with probability (p − i)/p. Hence
V(p) = f(0,p) = Σ_{i=0}^{p−1} p/(p − i) = p · H_p,
where H_p is the p-th harmonic number. As p becomes large, H_p ≈ 1.69 ln p; thus V(p) = O(p log p).
T_o = O(p log p log W)
Therefore W = O(p log² p).
END OF PRESENTATION THANK YOU!