SOLVING LARGE CARPOOLING PROBLEMS USING GRAPH THEORETIC TOOLS
Irith Ben-Arroyo Hartman
Datasim project (joint work with Abed Abu dbai, Elad Cohen, Daniel Keren)
University of Haifa, Israel
Summer School-2014, July, 2014

Outline of talk
1. Background
2. Matching Problem
   Matching in bipartite graphs
   The greedy matching algorithm
   Why is greedy performing so well?
   The assignment problem
3. General Problem formulation
   Is there an efficient algorithm?
   What do we know about the general problem?
   F-factor in bipartite graphs
4. Heuristics for the general problem
   Theoretical upper bounds
   Approximation algorithms
5. Other related problems

BACKGROUND
Why carpool?
Defining the graph model

Why carpool?
Reduces the number of cars on the roads.
Saves time, petrol and parking spaces, and reduces traffic congestion, noise, air pollution, stress and accidents.
Encourages sociability.

Background
At IMOB (Transportation Research Institute, University of Hasselt) an automatic service for carpooling is being designed.
People register their periodic trip executions (PTE), i.e. a periodic trip such as: on Monday from A to B, leaving at about 9:00am.
Each PTE comes with trip information: origin and destination, earliest and latest departure and arrival times, maximal acceptable detour distance, capacity of the car (if available).
Information about the person: age, gender, educational level, special interests, etc.

Background
The system suggests to individuals that they share a car.
Individuals evaluate the suggestion, negotiate it, and possibly agree to carpool.
After the drive, individuals evaluate each other, and the system uses the feedback for future trips.

The problem: informal description
Given:
1. a set of trips (Periodic Trip Executions),
2. some owners of trips own a car, and some don't,
3. a compatibility measure ($w_{xy}$) of x riding in the car owned by y,
4. the capacity of every car.
Can we match people so as to minimize the number of cars, and maximize the total compatibility between passengers and drivers?

Defining the Graph Model
[Figure: passenger x, edge weight $w_{xy}$, capacity C(y), y = owner of vehicle (driver)]
$w_{xy}$ takes into account the origins and destinations of x and y, the times of departure and arrival, maximal detour distance, time flexibility, profiles of passenger and potential driver, and feedback of passengers and driver.
C(y) is the capacity of the car: how many people it can contain, including the driver.

How is $w_{xy}$ being computed?
Use Path Similarity: $0 < \mathrm{pathsim}(a,b) \le 1$
When is it close to 0? When is it 1?

How is $w_{xy}$ being computed?
Use Time Interval Similarity
Use Profile Similarity (age, gender, income category, job type, music preference, etc.)
Use Reputation (safety, timeliness)

Assume we allow one passenger in a car
How do we model the problem?
What do we optimize?

THE MATCHING PROBLEM
Matchings in bipartite graphs
The greedy matching algorithm
Why is greedy performing so well?
The assignment problem

Assume we allow one passenger in a car
Definitions:
A matching in a graph G=(V,E) is a collection of vertex-disjoint edges.
A matching is maximum (or maximum weight) if there is no other matching of larger cardinality (or larger weight).
A matching is maximal (or maximal weight) if there is no other matching containing it which is of larger cardinality (or larger weight).

Example: matching
Example: maximal matching
A matching is defined on undirected graphs. What do we do if the graph is directed?
Prove this is a maximum matching.

Matching in bipartite graphs
Definition: A bipartite graph is a graph G=(V,E) where $V = V_1 \cup V_2$ and all edges in the graph are between $V_1$ and $V_2$.
$V_1$ (passengers), $V_2$ (drivers)

Matching algorithms
If the graph is bipartite and unweighted: the Hungarian Algorithm, $O(|V||E|)$, or Hopcroft-Karp, $O(|E||V|^{1/2})$
If the graph is bipartite and weighted: the Kuhn-Munkres algorithm, $O(|V|^2|E|)$
If the graph is general and unweighted: Edmonds' (1965) algorithm, and the improvement by Micali-Vazirani, $O(|E||V|^{1/2})$
If the graph is general and weighted: Edmonds, and the improvement by Galil, $O(|V||E|\log|V|)$

The greedy algorithm for max weight matching
In the worst case greedy / optimal = 1/2

Worst-case scenario of greedy matching
[Figure: optimal matching compared with greedy matching]

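The worst-case ratio can be reproduced on a three-edge path. The following is a minimal sketch, not the implementation used in the talk; the function names, the path a-b-c-d and the value of `eps` are illustrative assumptions.

```python
def greedy_matching(edges):
    """Greedy max-weight matching: scan edges by decreasing weight and keep
    an edge whenever both of its endpoints are still unmatched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching


def matching_weight(matching):
    return sum(w for _, _, w in matching)


# Hypothetical path a - b - c - d: the middle edge is only slightly heavier.
eps = 0.01
path = [("a", "b", 1.0), ("b", "c", 1.0 + eps), ("c", "d", 1.0)]

greedy = greedy_matching(path)      # keeps only the middle edge
optimal = [path[0], path[2]]        # the two outer edges
ratio = matching_weight(greedy) / matching_weight(optimal)
```

As `eps` shrinks, `ratio` approaches 1/2: greedy blocks both outer edges by taking the middle one.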
Performance of the Greedy Algorithm
Good (and surprising!?) news!
[Figure: Accuracy Histogram. x-axis: Accuracy (Greedy/K-M), from about 0.958 to 0.978; y-axis: Number of graphs. 1000 graphs, each of size 500x500 and 10% edge density.]

Why is the greedy heuristic performing so well?
Theorem (P. Erdős 1961): A random graph in $G_{n,p}$ almost surely has stability number at most $2p^{-1}\log n$.
Idea of proof: look at the random variable $X$ = number of stable sets of cardinality $k+1$ in $G$, and compute
$E(X) = \binom{n}{k+1}(1-p)^{\binom{k+1}{2}}$
When $k \ge 2p^{-1}\log n$, $P(X = 0) \to 1$ as $n \to \infty$, implying that a random graph almost surely has stability number at most $k$.

Why is the greedy heuristic performing so well?
1. We use the fact that if G has a maximal (not maximum!) matching of size n-k then G has a stable set of size at least k: the unmatched vertices are pairwise non-adjacent, since an edge between two of them could be added to the matching.
2. We conclude that in $G_{n,p}$ the greedy matching algorithm will almost surely find a matching of size at least $n - 2p^{-1}\log n$.

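Fact 1 can be checked empirically. The sketch below assumes a random bipartite graph with n vertices on each side; the values of `n` and `p` and the seed are arbitrary choices, not parameters from the talk.

```python
import random


def random_bipartite(n, p, rng):
    """Edges (x, y) of a random bipartite graph with sides {0..n-1} and {0..n-1}."""
    return [(x, y) for x in range(n) for y in range(n) if rng.random() < p]


def greedy_maximal_matching(edges):
    """Unweighted greedy: keep an edge when both endpoints are still free."""
    used_left, used_right, matching = set(), set(), []
    for x, y in edges:
        if x not in used_left and y not in used_right:
            matching.append((x, y))
            used_left.add(x)
            used_right.add(y)
    return matching, used_left, used_right


rng = random.Random(0)               # fixed seed, chosen arbitrarily
n, p = 200, 0.1
edges = random_bipartite(n, p, rng)
matching, used_left, used_right = greedy_maximal_matching(edges)

free_left = set(range(n)) - used_left
free_right = set(range(n)) - used_right
# No edge joins two unmatched vertices, otherwise greedy would have taken it.
unmatched_is_stable = not any(x in free_left and y in free_right for x, y in edges)
k = n - len(matching)                # unmatched vertices on each side
```

The unmatched vertices always form a stable set, so a small stability number forces `k` to be small.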
Assignment Problem (bipartite weighted graphs)
Incremental Approach:
Given an optimal weighted matching M in G, quickly find an optimal matching M' in G' (where G' differs from G by a relatively small number of edges; G' is also called the perturbed graph).
We can estimate how far w(M) is from the optimal solution of G'.
We can use the optimal matching and covering in G in order to quickly find an optimal, or good enough, solution to the perturbed graph G'.

Comparing K-M to incremental K-M
[Figure, two panels. Left: "Average Run Time, Kuhn-Munkres incremental vs. Kuhn-Munkres", CPU-time (seconds) against generation, for KM and KM-inc. Right: "Run Time Ratio, Kuhn-Munkres incremental vs. Kuhn-Munkres", KM-inc total CPU-time / KM total CPU-time against generation.]

GENERAL PROBLEM FORMULATION
Is there an efficient algorithm?
What do we know about the general problem?
F-factor in bipartite graphs

What is different if we allow more than one passenger in a car?
General Problem: formal description
Definitions:
A Star Partition is a covering of V by disjoint directed stars.
A Feasible Star Partition is a star partition where each star with root r is of in-degree at most c(r), and r has a loop.
[Figure: a directed star, c(r)=5]
Problem formulation: Given G=(V,E), $c: V \to \mathbb{N}$, $w: E \to (0,1)$, find a feasible star partition of V(G) such that the sum of the weights of all the edges in the stars is maximized.

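The definition can be turned into a small feasibility check. This is a sketch under stated assumptions: a partition is stored as a hypothetical `root_of` map from each vertex to its root, and the loop at a root is counted in its in-degree, so c(r) includes the driver.

```python
def is_feasible_star_partition(vertices, weights, capacity, root_of):
    """root_of[v] is the root (driver) of the star containing v;
    a root r satisfies root_of[r] == r (the loop at r)."""
    if set(root_of) != set(vertices):
        return False                      # every vertex must be covered exactly once
    size = {}
    for v, r in root_of.items():
        if root_of.get(r) != r:
            return False                  # v points at a vertex that is not a root
        if v != r and (v, r) not in weights:
            return False                  # the passenger edge must exist in G
        size[r] = size.get(r, 0) + 1      # counts the root itself (its loop)
    return all(size[r] <= capacity[r] for r in size)


def partition_weight(weights, root_of):
    return sum(weights[(v, r)] for v, r in root_of.items() if v != r)


# Hypothetical instance: edge (x, y) means x may ride in y's car.
vertices = ["a", "b", "c", "d"]
weights = {("a", "d"): 0.9, ("b", "d"): 0.5, ("c", "d"): 0.4, ("b", "c"): 0.7}
capacity = {v: 3 for v in vertices}

ok = {"a": "d", "b": "d", "c": "c", "d": "d"}         # d drives a and b, c drives alone
too_full = {"a": "d", "b": "d", "c": "d", "d": "d"}   # four people in a car of capacity 3
```

Here `ok` is feasible with weight 0.9 + 0.5, while `too_full` violates the capacity of d.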
General Problem Formulation: LP
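The formulation shown on this slide is not preserved in the text. One plausible integer program consistent with the star partition definition is sketched below; it is an assumption, not necessarily the authors' formulation. Here $x_{ij}=1$ means that i rides with j, $x_{jj}=1$ means that j drives (the loop at j), and c(j) is assumed to count the driver.

```latex
\begin{align*}
\max \quad & \sum_{(i,j)\in E,\ i\neq j} w_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j} x_{ij} = 1 && \forall i\in V
    && \text{(every vertex lies in exactly one star)} \\
& \sum_{i} x_{ij} \le c(j)\,x_{jj} && \forall j\in V
    && \text{(only drivers take riders, within capacity)} \\
& x_{ij}\in\{0,1\} && \forall i,j
\end{align*}
```

Both sums include the loop term $i=j$. Relaxing the last constraint to $0 \le x_{ij} \le 1$ gives a linear program.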
Does this problem minimize the number of drivers?
Example: [Figure: a graph with four edges of weight 1 and three edges of weight 0.6]
Assume the capacity of each vertex is 6. Weights of the edges are denoted in the graph. All other edges in the graph have weight 0.

Example (continued)
What is $\max \sum w_{ij} x_{ij}$ subject to the LP constraints? It is 4, with 2 drivers.

Example (continued)
But the minimum number of drivers is 1, with total weight 3.8.

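The gap between the two objectives can be reproduced by brute force. The figure is not preserved, so the graph below is a hypothetical one with the same weights (four edges of weight 1, three of weight 0.6) and the same optimum values; it assumes that capacity 6 counts the driver.

```python
from itertools import product

# Hypothetical instance: edge (x, y) means x may ride with y.
weights = {
    ("a", "r"): 1.0, ("b", "r"): 1.0,
    ("c", "s"): 1.0, ("d", "s"): 1.0,
    ("c", "r"): 0.6, ("d", "r"): 0.6, ("s", "r"): 0.6,
}
vertices = ["a", "b", "c", "d", "r", "s"]
capacity = {v: 6 for v in vertices}        # assumption: capacity counts the driver


def feasible_partitions():
    """Enumerate every feasible star partition as (vertex -> root, set of roots)."""
    choices = [[v] + [y for (x, y) in weights if x == v] for v in vertices]
    for pick in product(*choices):
        root_of = dict(zip(vertices, pick))
        roots = {v for v in vertices if root_of[v] == v}
        if any(root_of[v] not in roots for v in vertices):
            continue                        # someone rides with a non-driver
        if any(sum(1 for v in vertices if root_of[v] == r) > capacity[r] for r in roots):
            continue                        # a car is over capacity
        yield root_of, roots


def weight(root_of):
    return sum(weights[(v, r)] for v, r in root_of.items() if v != r)


solutions = list(feasible_partitions())
best_weight = max(weight(p) for p, _ in solutions)
drivers_at_best_weight = min(len(roots) for p, roots in solutions
                             if abs(weight(p) - best_weight) < 1e-9)
fewest_drivers = min(len(roots) for _, roots in solutions)
weight_with_fewest = max(weight(p) for p, roots in solutions
                         if len(roots) == fewest_drivers)
```

On this instance the maximum weight needs two drivers, while a single driver can carry everyone at a lower total weight.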
When is optimal star partition = minimum number of drivers?
Problem 1: Given G=(V,E), $w: E \to \mathbb{R}$, $c: V \to \mathbb{N}$, find a feasible star partition that covers a set of edges of maximum weight.
Problem 2: Given G=(V,E), $c: V \to \mathbb{N}$, find a feasible star partition with a minimum number of stars.
Claim: If $w_{ij} = 1$ for every existing edge (except for loops, which have $w_{ii} = 0$), then Problems 1 and 2 are equivalent.

Proof of Claim
Every star with c legs covers c+1 vertices.
If we have a star partition with d stars, then the total number of edges covered by the stars is |V| - d.
Thus minimizing the number of stars is equivalent to maximizing the number of edges covered by the stars.

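Written out, the counting argument is a one-line identity. The symbols $c_1,\dots,c_d$ (the number of legs of each of the d stars) are introduced here only for illustration.

```latex
\begin{align*}
\sum_{i=1}^{d} (c_i + 1) = |V|
\quad\Longrightarrow\quad
\#\text{edges covered} = \sum_{i=1}^{d} c_i = |V| - d .
\end{align*}
```

With unit weights the objective of Problem 1 equals $|V| - d$, so it is largest exactly when d is smallest.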
IS THERE AN EFFICIENT ALGORITHM FOR THE GENERAL CARPOOLING PROBLEM?
No! The problem is NP-Complete

Proof of NP-completeness
Claim 1: When the drivers are unknown and the edge weights are 0/1, the problem is NP-hard.
Proof: Reduction from the Minimum Dominating Set Problem.
A dominating set in a graph is a subset of vertices $V' \subseteq V$ such that for every $u \notin V'$ there exists some $v \in V'$ adjacent to it.
Given a directed graph G = (V, E) and an integer k > 0, does there exist a dominating set of size at most k? This is an NP-complete problem.

Proof of NP-completeness (continued)
[Figure: the graph G, with capacity c(v) = max-degree]

Proof of NP-completeness
Claim 2: When the drivers are unknown, the edge weights are 0/1 and $c(v) \le 3$, the problem is NP-hard.
Proof: Reduction from the problem of partitioning into paths of length two.

What about the case when the drivers are known in advance?
What about the case when the drivers are known in advance?
[Figure: bipartite graph with $V_1$ (passengers) and $V_2$ (drivers)]
c = 3. Is this an optimal solution? If yes, prove it! If no, give a better solution.

An f-factor in a graph
Let $f: V \to \mathbb{N}$. An f-factor is a collection of edges $E' \subseteq E$ such that $E'$ meets every $v \in V$ in exactly $f(v)$ edges.
Q1: What is a 1-factor?
Q2: For the carpool problem, when the drivers are known and every driver can take at most 4 passengers, what are we looking for?

Example:
$f(v) = \begin{cases} 1 & \text{if } v \in V_1 \\ 2 & \text{if } v \in V_2 \end{cases}$
$V_1$ (passengers), $V_2$ (drivers)

How do we find an f-factor, or a maximum partial f-factor, in a bipartite graph?
Convert G to G', and look for a 1-factor in G'.
[Figure: G and G', each drawn with sides $V_1$ and $V_2$]

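The conversion itself is only shown in the figure. One common construction, assumed here, splits each driver y into f(y) seat copies and then runs ordinary bipartite matching. The sketch uses Kuhn's augmenting-path algorithm; the names `expand_drivers` and `seats` and the example data are hypothetical.

```python
def expand_drivers(edges, seats):
    """Build G': driver y becomes copies (y, 0), ..., (y, seats[y] - 1),
    and each passenger edge (x, y) is copied to every seat of y."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).extend((y, i) for i in range(seats[y]))
    return adj


def max_bipartite_matching(adj):
    """Kuhn's augmenting-path algorithm; returns {seat: passenger}."""
    owner = {}

    def try_assign(x, seen):
        for seat in adj.get(x, []):
            if seat in seen:
                continue
            seen.add(seat)
            # Take a free seat, or re-seat its current owner elsewhere.
            if seat not in owner or try_assign(owner[seat], seen):
                owner[seat] = x
                return True
        return False

    for x in adj:
        try_assign(x, set())
    return owner


edges = [("p1", "d1"), ("p2", "d1"), ("p3", "d1"), ("p3", "d2"), ("p4", "d2")]
seats = {"d1": 2, "d2": 2}        # passenger seats per driver, i.e. f on V2
owner = max_bipartite_matching(expand_drivers(edges, seats))

riders = {}
for (driver, _), passenger in owner.items():
    riders.setdefault(driver, []).append(passenger)
```

Since every passenger has f = 1, a matching in G' never uses two seats for the same passenger, so it maps back to a partial f-factor of G.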
What do we know about the general problem?
There are 8 possible scenarios, depending on these 3 questions:
Capacity 1/2 or general?
   Capacity 1/2 is a matching problem: there exist efficient algorithms.
   General capacity is a star partition problem: an intractable problem.
Edge weights 0/1 or (0,1)?
   0/1 edge weights correspond to an unweighted graph.
   (0,1) edge weights correspond to a weighted graph.
Drivers are a priori known/unknown?
   If the drivers are known, the graph is bipartite (edges among passengers, or among drivers, are irrelevant).
   If the drivers are not known, the graph is general: an intractable problem.

Summary of all possible scenarios
                                             1/2 capacity                                          General capacity
0/1 edge weights, known drivers              Max bipartite matching                                Can be reduced to max bipartite matching
0/1 edge weights, unknown drivers            Max matching in general graphs                        NP-hard (even for capacity 3)
General edge weights, known drivers          Max weighted bipartite matching (assignment problem)  Can be reduced to max weighted bipartite matching
General edge weights, unknown drivers        Max weight matching in general graphs                 NP-hard (even for capacity 3)

HEURISTICS FOR THE GENERAL PROBLEM
Challenges: NP-hard problem for c > 2
Theoretical upper bounds
Approximation algorithms

Basic greedy heuristic for the general carpooling problem
Given G=(V,E), $c: V \to \mathbb{N}$, $w: E \to (0,1)$ and a subset D of V consisting of potential drivers.
Repeatedly take the heaviest remaining edge, as long as adding it keeps the selected edges a feasible star family.

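A minimal sketch of this greedy rule follows. It assumes that c(y) counts the driver, so y can take at most c(y) - 1 passengers; the function name and the example data are hypothetical.

```python
def greedy_stars(weights, capacity, drivers):
    """weights[(x, y)] is the benefit of x riding with y.
    Returns {passenger: driver}."""
    rides_with = {}                    # passenger -> driver
    load = {}                          # driver -> number of passengers so far
    for (x, y), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if y not in drivers or x == y:
            continue
        if x in rides_with or x in load:        # x already rides, or already drives others
            continue
        if y in rides_with:                     # y is already somebody's passenger
            continue
        if load.get(y, 0) + 1 >= capacity[y]:   # assumption: capacity counts the driver
            continue
        rides_with[x] = y
        load[y] = load.get(y, 0) + 1
    return rides_with


weights = {("a", "b"): 0.9, ("b", "c"): 0.8, ("d", "c"): 0.7, ("a", "c"): 0.5}
capacity = {"a": 3, "b": 3, "c": 3, "d": 3}
assignment = greedy_stars(weights, capacity, drivers={"b", "c"})
```

In this example the edge (b, c) is rejected because b already drives a, which is the kind of early commitment that makes greedy suboptimal.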
Better (linear) heuristics for the general problem
We take into account the following considerations:
1. It is preferable to match non-drivers before potential drivers, since potential drivers, if unmatched, can always drive their own vehicle.
2. Since we would like to minimize the number of vehicles, it is preferable to assign passengers to existing vehicles (which already contain passengers) than to use 'new' vehicles.
3. If a new vehicle is used, it is preferable to use a vehicle with larger capacity than to use a small-capacity vehicle.

Greedy heuristics for the general problem
A different approach by giving potential drivers a weight function
Other greedy heuristics...
Other heuristics are also possible, such as:
Pick a vertex v (in D) with the highest sum of weights of in-edges to v. Add the heaviest c(v) edges into v, remove these vertices from the graph, and continue...

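The heuristic above might be sketched as follows. One deliberate deviation, made so that it agrees with capacities that include the driver: it adds the heaviest c(v) - 1 in-edges rather than c(v). That choice, the function name and the example data are assumptions.

```python
def heaviest_driver_heuristic(weights, capacity, drivers):
    """Repeatedly pick the driver with the largest total in-weight."""
    remaining = {x for edge in weights for x in edge} | set(drivers)
    rides_with = {}
    while True:
        # Total in-weight from still-unassigned vertices, per candidate driver.
        score = {}
        for (x, y), w in weights.items():
            if x in remaining and y in remaining and y in drivers and x != y:
                score[y] = score.get(y, 0.0) + w
        if not score:
            break
        v = max(score, key=score.get)
        incoming = sorted(((w, x) for (x, y), w in weights.items()
                           if y == v and x in remaining and x != v), reverse=True)
        seats = capacity[v] - 1        # assumption: capacity counts the driver
        for w, x in incoming[:seats]:
            rides_with[x] = v
            remaining.discard(x)
        remaining.discard(v)           # v and its passengers leave the graph
    return rides_with


weights = {("a", "d1"): 0.9, ("b", "d1"): 0.8, ("c", "d1"): 0.3,
           ("c", "d2"): 0.6, ("b", "d2"): 0.5}
capacity = {"d1": 3, "d2": 3}
assignment = heaviest_driver_heuristic(weights, capacity, drivers={"d1", "d2"})
```

Each round removes at least the chosen driver, so the loop runs at most |D| times.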
Heaviest driver heuristics - V1
Heaviest driver heuristics - V3
1. Guess a set of drivers (the "heaviest drivers").
2. Try to match all passengers to the drivers (a bipartite graph problem).
3. If you do not succeed, then add drivers, until you succeed.

Greedy with local adjustments of weights
THEORETICAL UPPER BOUNDS
Naïve upper bound to a star family
The number of stars is at least n/c.
The number of edges chosen is at most (c-1)n/c.

Better upper bound to a star family
Assume $c(v_1) \ge c(v_2) \ge \dots \ge c(v_n)$.
Take the smallest $k$ s.t. $\sum_{i=1}^{k} c(v_i) \ge n$.
Now sort E in descending order by w: $w(e_1) \ge w(e_2) \ge \dots \ge w(e_m)$.
Then $\max w(H) \le \sum_{i=1}^{n-k} w(e_i)$.

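The bound is cheap to compute. The sketch below assumes, as in the naïve bound, that capacities count the driver; the function name and the sample numbers are illustrative.

```python
def star_family_upper_bound(capacities, edge_weights):
    """Upper bound on the weight of any feasible star family.
    capacities: list of c(v), counting the driver; edge_weights: list of w(e)."""
    n = len(capacities)
    covered, k = 0, 0
    # Smallest k such that the k largest capacities can hold all n vertices.
    for c in sorted(capacities, reverse=True):
        if covered >= n:
            break
        covered += c
        k += 1
    # At least k stars are needed, so at most n - k edges can be chosen.
    heaviest = sorted(edge_weights, reverse=True)[: n - k]
    return k, sum(heaviest)


k, bound = star_family_upper_bound(
    capacities=[4, 3, 2, 2, 1, 1],
    edge_weights=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3],
)
```

With six vertices the two largest cars already hold everyone, so k = 2 and the bound sums the four heaviest edges.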
How do we evaluate the heuristics?
We can compare the weight of the selected star family, w(H), in different heuristics, as well as the number of unmatched vertices U.
We can compare running times between heuristics.
We can compare w(H) to a theoretical upper bound:

Comparison with optimal solution for c=2
i.e. finding a maximum weight matching

How do we compare to an optimal solution if we cannot compute the optimal solution?
1. IDEA: Assume c=5, w=1.
2. Plant an optimal solution.
3. Add edges to hide it.
4. Run the heuristic algorithms and see if they find it.

Plant an optimal solution
Add edges to hide it
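The planting procedure can be sketched as a small generator. It assumes unit weights and full cars (one driver plus c - 1 passengers); the function name, its parameters and the seed are hypothetical.

```python
import random


def planted_instance(num_cars, c, extra_edges, seed=0):
    """Plant num_cars disjoint full stars (1 driver + c - 1 passengers each),
    then add random unit-weight edges to hide them."""
    rng = random.Random(seed)
    n = num_cars * c
    order = list(range(n))
    rng.shuffle(order)
    planted = {}                       # passenger -> driver
    for i in range(num_cars):
        group = order[i * c:(i + 1) * c]
        driver = group[0]
        for passenger in group[1:]:
            planted[passenger] = driver
    edges = set(planted.items())       # directed edges (passenger, driver)
    target = len(edges) + extra_edges  # must not exceed n * (n - 1)
    while len(edges) < target:
        x, y = rng.randrange(n), rng.randrange(n)
        if x != y:
            edges.add((x, y))
    return n, edges, planted


n, edges, planted = planted_instance(num_cars=4, c=5, extra_edges=30)
optimum = n - 4                        # |V| - d edges of weight 1
```

Because every car is full, no partition can use fewer than n/c stars, so the planted solution stays optimal whatever edges are added.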
Comparison with a known optimal solution
Results on real data
Approximation algorithms
An algorithm is a p-approximation algorithm if it has a solution of value f(x) (for input x), where $p \cdot OPT \le f(x) \le OPT$.
E.g. the greedy matching algorithm is a ½-approximation algorithm.
What does the greedy star partition algorithm give?

Greedy star forest is a 1/c-approximation
E.g. c=5
[Figure: optimal solution compared with greedy solution]

Can we find a heuristic that is better than a 1/c-approximation?
If the graph is undirected, there is a ½-approximation algorithm for the star forest problem [Nguyen, Shen, Hou, Sheng, Miller and Zhang]:
1. Take a maximum weight spanning tree T.
2. Pick either the odd layers or the even layers of T, whichever is the heaviest.
3. Get a star forest of weight at least ½ the optimum star forest:
$w(SF_G) \ge \frac{1}{2} w(T_G) = \frac{1}{2} OPT_t(G) \ge \frac{1}{2} OPT_{sf}(G)$

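The three steps can be sketched for an undirected graph without capacities, as in the cited result. This is an illustrative implementation using Kruskal and breadth-first layering, not the code from the cited paper; the function names and the example graph are assumptions.

```python
from collections import defaultdict, deque


def max_spanning_forest(n, edges):
    """Kruskal on edges sorted by decreasing weight; vertices are 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


def star_forest_half_approx(n, edges):
    forest = max_spanning_forest(n, edges)
    adj = defaultdict(list)
    for u, v, w in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))
    depth = {}
    layers = ([], [])                  # tree edges grouped by parity of the child's depth
    for root in range(n):
        if root in depth:
            continue
        depth[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    layers[depth[v] % 2].append((u, v, w))   # (centre, leaf, weight)
                    queue.append(v)
    # Each parity class is a star forest; keep the heavier one.
    return max(layers, key=lambda es: sum(w for _, _, w in es))


edges = [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 1.5), (0, 3, 0.5)]
star_forest = star_forest_half_approx(4, edges)
tree_weight = sum(w for _, _, w in max_spanning_forest(4, edges))
```

The two parity classes split the tree's weight between them, so the heavier class carries at least half of it.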
Example
1. Taking a max weight spanning tree

Example
2. Pick either the odd or even layers of the tree

What about directed graphs?
1. Take a maximum weight reverse-arborescence. (Algorithm by Edmonds)

What are the problems with this algorithm?
1. Is a max weight reverse-arborescence heavier than a max weight star forest?
2. How can we guarantee that the in-degree of every vertex is not greater than its capacity c(v)?

Solution to problem 1
[Figure: graph G, W=0]

Additional Extensions
1. Carpooling with Aversion: Some passengers do not want to share a ride with some other specific passengers. We have shown this problem is NP-complete even when the drivers are known (reduction from the minimum vertex-colouring problem).
2. Carpooling with Attachment: Some passengers prefer to be with other passengers in the same ride. We have shown this problem is NP-complete even when the drivers are known (reduction from the knapsack problem).

Additional Extensions
3. Carpooling with VIP passengers: Some passengers do not want to be squeezed in a carpool; they want to share a ride with only a few others. NP-complete, same as above.

Conclusions
We have proved that the general carpooling problem is NP-hard.
Found quick algorithms and incremental algorithms for the case of bipartite graphs.
Devised and implemented 6 different heuristics for the general problem on real data.
Compared the heuristics in terms of running time and performance.
Compared to the optimal matching in general graphs (c=2), and to various upper bounds on the general problem.
Challenge: Find a good approximation algorithm and show it is close enough to the optimal solution.

THANK YOU!
Questions?