Data Knowledge and Optimization. Patrick De Causmaecker CODeS-imec Department of Computer Science KU Leuven/ KULAK

2 Wishes from West-Flanders, Belgium

3 10 rules for success in supply chain management & logistics systems (Ratliff, 2013)
1. Objectives - quantified, measurable
2. Models - faithful, essential
3. Variability - explicitly considered
4. Data - accurate, timely, and comprehensive
5. Integration - fully automated data transfer
6. Delivery - support execution, management and control
7. Algorithms - exploit individual problem structure
8. People - expertise in models, data, and optimization
9. Process - support optimization, continuously improve
10. ROI - provable, cost of technology, people and operations

4 10 rules for success in supply chain management & logistics systems
1. Objectives - quantified, measurable
2. Models - faithful, essential
3. Variability - explicitly considered
4. Data - accurate, timely, and comprehensive
5. Knowledge of (domain) experts
6. Integration - fully automated data transfer
7. Delivery - support execution, management and control
8. Algorithms - exploit individual problem structure
9. People - expertise in models, data, and optimization
10. Process - support optimization, continuously improve
11. ROI - provable, cost of technology, people and operations

5 Subjects
- Machine learning and optimization -> computing
- Configuration, tuning, construction
- Bayesian optimization
- Formal languages
- Large graphs, classification, pattern mining
- Matrix factorization
- Adaptive algorithms and hyperheuristics
- Algorithm selection (online learning) (*)
- Algorithm design (*)

6 Characterization of neighborhood behaviours in a multi-neighborhood local search algorithm
Nguyen Dang, Patrick De Causmaecker
KU Leuven, Belgium; CODeS, ITEC-iMinds

7 Context
A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Authors: Jan Christiaens, Tony Wauters, Túlio Toffolo, Sam Van Malderen
Winner of the VeRoLog Solver Challenge

8 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Neighborhood types (18):
- Cheapest insertion
- Swap
- Intra-route 2-opt
- Inter-route 2-opt
- Change swap location
- Merge routes
- Split to sub-routes
- Ruin recreate
- Remove route
- Remove sub-route
- Remove sub-route with cheapest insertion
- Remove chains
- EachSequenceCheapestInsert
- Convert to route
- Convert to sub-route
- Add sub-route
- Ejection chain

9 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Neighborhoods (42):
- Cheapest insertion (11)
- Swap (1)
- Intra-route 2-opt (1)
- Inter-route 2-opt (1)
- Change swap location (1)
- Merge routes (1)
- Split to sub-routes (1)
- Ruin recreate (2)
- Remove route (1)
- Remove sub-route (1)
- Remove sub-route with cheapest insertion (1)
- Remove chains (8)
- EachSequenceCheapestInsert (3)
- Convert to route (1)
- Convert to sub-route (1)
- Add sub-route (1)
- Ejection chain (6)

10 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
An Iterated Local Search algorithm:
    s* = Late Acceptance Hill Climbing(s0)
    while (time <= time-limit)
        s' = Perturbate(s*)
        s' = Late Acceptance Hill Climbing(s')
        if (f(s') < f(s*))
            s* = s'
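The iterated local search loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the time limit is replaced by a fixed number of rounds, and a plain hill climber on a hypothetical toy problem (minimize (x - 7)^2 over the integers) stands in for Late Acceptance Hill Climbing.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def iterated_local_search(
    s0: S,
    f: Callable[[S], float],
    local_search: Callable[[S], S],
    perturb: Callable[[S], S],
    max_rounds: int,
) -> S:
    """Iterated Local Search: s* is the best local optimum found so far."""
    s_star = local_search(s0)
    for _ in range(max_rounds):  # a round budget instead of "time <= time-limit"
        s_prime = local_search(perturb(s_star))
        if f(s_prime) < f(s_star):  # replace s* only on strict improvement
            s_star = s_prime
    return s_star


# Hypothetical toy problem: minimize (x - 7)^2 over the integers.
def f(x: int) -> float:
    return (x - 7) ** 2


def hill_climb(x: int) -> int:
    """Plain hill climber over the +/-1 neighbourhood (stand-in for LAHC)."""
    while True:
        best = min((x - 1, x, x + 1), key=f)
        if best == x:
            return x
        x = best


rng = random.Random(42)
result = iterated_local_search(0, f, hill_climb, lambda x: x + rng.randint(-3, 3), max_rounds=20)
print(result)  # 7
```

The local search and the perturbation are passed in as functions, so the same skeleton works with any neighbourhood structure.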

11 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Late Acceptance Hill Climbing:
    s = s* = s0
    while (#iterations-without-improvement <= itwi)
        s' = neighbor(s)
        if f(s') <= f(s) or f(s') <= f(solution from listsize steps before)
            s = s'
        update s*
neighbor(s) applies neighborhood $N_i$, selected with weight $w_i$ ($i = 1, \dots, n$), where $w_1 + w_2 + \dots + w_n = 1$.
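A minimal Python sketch of Late Acceptance Hill Climbing with weighted (roulette-wheel) neighbourhood selection. Assumptions: the history list stores the cost of the current solution at each iteration (the common LAHC formulation), "iterations without improvement" counts iterations since s* last improved, and the two neighbourhoods `step` and `jump` on the toy cost are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def lahc(s0, f: Callable, neighborhoods: Sequence[Callable], weights: Sequence[float],
         list_size: int, max_idle: int, rng: random.Random):
    s, fs = s0, f(s0)
    best, f_best = s0, fs
    history: List[float] = [fs] * list_size  # costs of the last `list_size` current solutions
    idle, it = 0, 0
    while idle <= max_idle:  # "#iterations-without-improvement <= itwi"
        # Roulette-wheel choice: neighbourhood N_i is picked with probability w_i.
        move = rng.choices(neighborhoods, weights=weights, k=1)[0]
        cand = move(s, rng)
        fc = f(cand)
        v = it % list_size
        # Accept if no worse than the current solution or than the one `list_size` steps back.
        if fc <= fs or fc <= history[v]:
            s, fs = cand, fc
        history[v] = fs
        if fs < f_best:  # update s*
            best, f_best, idle = s, fs, 0
        else:
            idle += 1
        it += 1
    return best


# Hypothetical neighbourhoods on a toy cost (x - 7)^2.
def step(x, r):
    return x + r.choice((-1, 1))


def jump(x, r):
    return x + r.choice((-5, 5))


def cost(x):
    return (x - 7) ** 2


best = lahc(0, cost, [step, jump], [0.7, 0.3], list_size=5, max_idle=200, rng=random.Random(1))
print(best, cost(best))
```

Because s* is only replaced on strict improvement, the returned solution is never worse than the start, even though LAHC itself may accept worsening moves.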

12 Research question: are there groups of similar neighborhoods?
1. Help algorithm designers to better understand the behaviour of these neighbourhoods
2. Characterize each neighborhood's behaviour as a feature vector (based on information collected from different algorithm runs)
3. Cluster the neighborhoods

13 Observables
For each neighborhood-instance combination:
- probability to improve, worsen or do nothing
- magnitudes of improvement and worsening
- running time (for tie-breaking)
Statistics are collected during a test run.
Mısır, Mustafa, Stephanus Daniel Handoko, and Hoong Chuin Lau. "OSCAR: Online Selection of Algorithm Portfolios with Case Study on Memetic Algorithms." Learning and Intelligent Optimization:

14 Characterize neighborhoods' behaviours
[Figure: r_nothing, r_worsen and r_improve for the neighborhoods Merge route, Cheapest insertion 2, Remove route and Cheapest insertion 25; x-axis: quality of the current solution.]

15 Solution quality regions and visits
[Figure: the solution quality range between a lower bound and an upper bound is split into 1000 intervals (i_1, i_2, and so on); regions are labelled "easy to reach, easy to escape", "easy to reach, hard to escape" and "hard to reach".]

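16 Features
Step 3: characterize each neighborhood as a feature vector.
Aggregate the collected information into regions. Probability of improvement (I), worsening (W) and doing nothing (SN) on region f, where k ranges over the intervals in region f and $ni_k$, $nw_k$ and $niters_k$ count the improving moves, worsening moves and iterations recorded in interval k:
$$r^{f}_{\text{improve}} = \frac{\sum_{k \in f} ni_k}{\sum_{k \in f} niters_k}, \qquad r^{f}_{\text{worsen}} = \frac{\sum_{k \in f} nw_k}{\sum_{k \in f} niters_k}, \qquad r^{f}_{\text{nothing}} = 1 - r^{f}_{\text{improve}} - r^{f}_{\text{worsen}}$$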
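The aggregation above can be sketched in Python. Assumptions not stated on the slides: the objective is minimized (a negative change counts as an improvement), intervals have equal width between the lower and upper bound, and regions are given as hypothetical sets of interval indices; how the authors actually grouped intervals into regions is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def interval_index(quality: float, lower: float, upper: float, n_intervals: int = 1000) -> int:
    """Map an objective value to one of n_intervals equal-width intervals of [lower, upper]."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    k = int((quality - lower) / (upper - lower) * n_intervals)
    return min(max(k, 0), n_intervals - 1)  # clamp values on or outside the bounds


def region_rates(moves: Iterable[Tuple[float, float]], lower: float, upper: float,
                 regions: Dict[str, Set[int]], n_intervals: int = 1000):
    """moves: (quality of the current solution, change in objective) for each application
    of one neighbourhood; minimization, so delta < 0 is an improvement."""
    ni: Dict[int, int] = defaultdict(int)
    nw: Dict[int, int] = defaultdict(int)
    niters: Dict[int, int] = defaultdict(int)
    for quality, delta in moves:
        k = interval_index(quality, lower, upper, n_intervals)
        niters[k] += 1
        if delta < 0:
            ni[k] += 1
        elif delta > 0:
            nw[k] += 1
    rates = {}
    for name, ks in regions.items():
        total = sum(niters[k] for k in ks)
        if total == 0:
            continue  # region never visited by this neighbourhood
        r_improve = sum(ni[k] for k in ks) / total
        r_worsen = sum(nw[k] for k in ks) / total
        rates[name] = (r_improve, r_worsen, 1.0 - r_improve - r_worsen)
    return rates


# Hypothetical example: 10 intervals on [0, 100], split into two regions.
regions = {"low": set(range(0, 5)), "high": set(range(5, 10))}
moves = [(10, -1), (20, 0), (30, 2), (80, -3), (90, -1)]
rates = region_rates(moves, 0, 100, regions, n_intervals=10)
print(rates)
```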

17 Feature vectors
Per region:
- probabilities by size
- amounts by a ranking procedure
Kolde, R., Laur, S., Adler, P., & Vilo, J. (2012). Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics, 28(4).
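The cited ranking procedure, robust rank aggregation, scores an item by its normalized ranks across several lists. A minimal Python sketch of the RRA score ρ, the minimum over k of the probability that the k-th smallest of n uniform draws is at most the k-th smallest observed rank. This shows only the score itself; the p-value correction of Kolde et al. is omitted, and exactly how the slides turn magnitudes into ranks is not detailed, so the ranks below are hypothetical.

```python
from math import comb
from typing import Sequence


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(U_(k) <= x): the k-th smallest of n i.i.d. Uniform(0, 1) draws is <= x."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks: Sequence[float]) -> float:
    """RRA score rho: the smallest beta_{k,n} over the sorted normalized ranks."""
    r = sorted(normalized_ranks)
    n = len(r)
    return min(order_stat_cdf(x, k, n) for k, x in enumerate(r, start=1))


# Hypothetical item ranked 1st of 10, 2nd of 10 and 9th of 10 in three lists.
rho = rra_rho([0.1, 0.2, 0.9])
print(round(rho, 3))  # 0.104
```

A small ρ means the item sits near the top of more lists than chance would predict.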

18 Clustering neighborhoods on similarity
Neighbourhood characteristics: #instances × #regions × #observables elements
High-dimensional, low-sample-size data (42 individuals, 150 dimensions)

19 Clustering neighborhoods
High-dimensional low-sample-size clustering method: HDDC (Bergé et al., 2012)
Apply an isometric log-ratio transformation (Aitchison, 1986) to r_improve, r_worsen, r_nothing (compositional data) before clustering
Bergé, L., Bouveyron, C., & Girard, S. (2012). HDclassif: An R package for model-based clustering and discriminant analysis of high-dimensional data.
Aitchison, J. (1986). The statistical analysis of compositional data.
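Because (r_improve, r_worsen, r_nothing) sum to 1, they are compositional data, and a log-ratio transform maps them to unconstrained coordinates before clustering. A minimal Python sketch using pivot coordinates, one common isometric log-ratio basis; which basis the authors used is not stated, and the zero-replacement step (small eps, then renormalize) is an assumption needed because log-ratios are undefined at zero.

```python
from math import log, sqrt
from typing import Sequence, Tuple


def close_composition(parts: Sequence[float], eps: float = 1e-6) -> list:
    """Replace zeros by eps and renormalize to sum 1 (log-ratios need strictly positive parts)."""
    adjusted = [max(p, eps) for p in parts]
    total = sum(adjusted)
    return [p / total for p in adjusted]


def ilr3(r_improve: float, r_worsen: float, r_nothing: float,
         eps: float = 1e-6) -> Tuple[float, float]:
    """Pivot-coordinate ilr of a 3-part composition (x1, x2, x3) -> (z1, z2)."""
    x1, x2, x3 = close_composition([r_improve, r_worsen, r_nothing], eps)
    z1 = sqrt(2 / 3) * log(x1 / sqrt(x2 * x3))
    z2 = sqrt(1 / 2) * log(x2 / x3)
    return z1, z2


print(ilr3(0.5, 0.25, 0.25))  # z2 is 0 because r_worsen == r_nothing
```

The two output coordinates per region can then be concatenated into the feature vector that is passed to the clustering method.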

20 Clustering neighborhoods
Clustering result: 9 clusters: Ejection-chain 3, 4, 5; Remove-chain 1, 2, 3, 6, 7, 8; Remove-sub-route-with-cheapest-insertion; Swap; Inter-route-two-opt Cheapest-insertion 10, 15, 20, 25, 35, 50; Each-sequence-cheapest-insertion (2,5), (4,4), (5,2); Remove-chain 4 Cheapest-insertion 1, 2, 3, 4, 5 Change-swap-location; Merge-route Add-sub-route; Convert-to-sub-route Ejection-chain 10, 15, 35; Remove-chain 5; Intra-route-two-opt Ruin-recreate 2, 3 Convert-to-route; Remove-sub-route; Remove-route; Split-to-sub-route

21
1. Run the algorithm a number of times on each problem instance
2. Collect log files (accumulable statistics on neighborhood behaviours)
3. Characterize and cluster the neighborhoods into k (<<42) groups
4. Automated parameter tuning with k (<<42) parameters

22 Apply to algorithm tuning
Three scenarios:
- Original (42 weight parameters)
- Basic (28 weight parameters)
- Clustered (9 weight parameters): all neighborhoods in a cluster have the same weight
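In the clustered scenario the tuner searches only one weight per cluster. A minimal Python sketch of expanding cluster weights back to per-neighbourhood weights. Assumption: each neighbourhood inherits its cluster's weight and the result is normalized so that the w_i sum to 1; the slides do not say whether a cluster's weight is instead divided among its members. The neighbourhood names N1 to N5 and clusters A to C are hypothetical.

```python
from typing import Dict, Hashable


def expand_cluster_weights(cluster_of: Dict[str, Hashable],
                           cluster_weights: Dict[Hashable, float]) -> Dict[str, float]:
    """Give every neighbourhood its cluster's weight, then normalize so the w_i sum to 1."""
    raw = {nb: cluster_weights[c] for nb, c in cluster_of.items()}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one cluster weight must be positive")
    return {nb: w / total for nb, w in raw.items()}


# Hypothetical: 5 neighbourhoods in 3 clusters; a tuner would only search the 3 cluster weights.
cluster_of = {"N1": "A", "N2": "A", "N3": "B", "N4": "B", "N5": "C"}
weights = expand_cluster_weights(cluster_of, {"A": 2.0, "B": 1.0, "C": 0.0})
print(weights)
```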

23 Apply to algorithm tuning
original vs clustered: p =
basic vs clustered: p =

24 Apply to algorithm tuning
Original vs original with identical weights: p =
Basic vs basic with identical weights: p =
Clustered vs clustered with identical weights: p =

25 Thank you
Dang, Nguyen Thi Thanh; De Causmaecker, Patrick. Characterization of neighborhood behaviours in a multi-neighborhood local search algorithm, Lecture Notes in Computer Science, Springer, Learning and Intelligent OptimizatioN Conference, Italy, 29/5-1/
Data Science meets Optimization: EURO Working Group (workshop)

26 Online Algorithm Selection: Learning when to use which Algorithm
Hans Degroote, KU Leuven campus KULAK (Belgium)
Supervised by Patrick De Causmaecker
Joint work with Bernd Bischl (LMU Munich) and Lars Kotthoff (University of British Columbia)

27 Overview
- Algorithm portfolios
- When are algorithm selection/portfolios useful (+ quantification)
- The algorithm selection problem
- The online algorithm selection problem
  - Exploration vs. exploitation trade-off
- Empirical validation

28 Algorithm selection is useful when
- a portfolio of complementary algorithms exists, AND
- computational resources are limited
Examples: competitions; operational problems (production schedule for the day, delivery schedule for the day); a sub-problem that needs to be solved often
If plenty of time or many cores are available: use a parallel portfolio

29 SATzilla
SAT competition:
- many algorithms competing and improving over time
Idea of SATzilla: online selection from a portfolio of solvers from previous competitions
- Result: dominated the competition (as long as )
- Reason: highly complementary set of solvers
Xu, Lin, et al. "SATzilla: portfolio-based algorithm selection for SAT." Journal of Artificial

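30 SAT competition 2013: potential (visualisation)
Single Best Solver (the one algorithm that solves the most instances): solves 231/300 instances
Virtual Best Solver (an oracle that picks the best algorithm for each instance): solves 288/300 instances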
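The gap between the two numbers (288 - 231 = 57 instances) is the room a perfect selector could gain. A minimal Python sketch of computing both from a solved/unsolved matrix; the algorithms A, B, C and their results are hypothetical.

```python
from typing import Dict, List, Tuple


def sbs_vbs(solved: Dict[str, List[bool]]) -> Tuple[str, int, int]:
    """solved[a][i] is True if algorithm a solves instance i (all lists have equal length)."""
    sbs = max(solved, key=lambda a: sum(solved[a]))  # single best solver
    n_instances = len(solved[sbs])
    # Virtual best solver: an instance counts if any algorithm solves it.
    vbs = sum(any(row[i] for row in solved.values()) for i in range(n_instances))
    return sbs, sum(solved[sbs]), vbs


solved = {
    "A": [True, True, True, False],
    "B": [False, True, False, False],
    "C": [False, False, False, True],
}
print(sbs_vbs(solved))  # ('A', 3, 4)
```

Here the weak solver C still matters: it is the only one that solves the last instance, which is exactly the complementarity a portfolio exploits.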

31 Algorithm selection: offline version
Learn a selector λ from training data:
- supervised learning (nearest neighbours, )
- regression
- ...
Use the selector as part of the new solver

32 Regression-based classification Select for each instance the algorithm predicted to be best
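A minimal Python sketch of regression-based selection: fit one regression model per algorithm and pick the algorithm with the best predicted performance. Assumptions for illustration: a single instance feature, ordinary least-squares lines, lower runtime is better, and the training data for algorithms a1 and a2 are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def train_selector(features: List[float],
                   runtimes: Dict[str, List[float]]) -> Callable[[float], str]:
    """One regression model per algorithm; select the smallest predicted runtime."""
    models = {a: fit_line(features, ys) for a, ys in runtimes.items()}

    def select(x: float) -> str:
        return min(models, key=lambda a: models[a][0] * x + models[a][1])

    return select


select = train_selector([1, 2, 3, 4], {"a1": [2, 4, 6, 8], "a2": [5, 5.5, 6, 6.5]})
print(select(1), select(10))  # a1 a2
```

The fitted lines cross at x = 3, so the selector switches from a1 to a2 on larger instances.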

33 Algorithm selection: online version
Core idea: update the selection mapping as new data comes in
- e.g. learn a better selection mapping over time

34 Online algorithm selection: methodology
How to define the strategy?
Main challenge: online data is incomplete
1. Supervised learning methods cannot deal well with incomplete data
2. Exploration vs. exploitation trade-off

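35 Online data is incomplete
Offline data (columns a1, a2, a3, Best): the performance of every algorithm is known for each training instance, so the best algorithm is known (a1, a1, a3, a2 in the rows shown).
Online data (only the selected algorithm is run):
i5: a1 = 50, a2 = ?, a3 = ?, Best = ?
i6: a1 = ?, a2 = 70, a3 = ?, Best = ?
i7: a1 = 80, a2 = ?, a3 = ?, Best = ?
i8: a1 = ?, a2 = ?, a3 = 120, Best = ?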

36 Online algorithm selection based on regression models

37 Exploration vs. exploitation trade-off
New data on an algorithm is collected only for instances on which it is predicted to be best
- If an algorithm is incorrectly predicted to be bad, this is hard to correct
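The failure mode above can be reproduced in a few lines of Python. This sketch drops instance features and noise for clarity (both simplifications): each algorithm's predicted runtime is a running mean seeded from hypothetical offline data, the greedy strategy always runs the predicted best, and only that algorithm's runtime is observed.

```python
from typing import Dict, Tuple


def greedy_online(initial: Dict[str, Tuple[float, int]],
                  true_runtime: Dict[str, float],
                  n_instances: int) -> Dict[str, int]:
    """initial[a] = (mean runtime, number of observations) from offline data; lower is better."""
    est = {a: [m, c] for a, (m, c) in initial.items()}
    picks = {a: 0 for a in est}
    for _ in range(n_instances):
        a = min(est, key=lambda k: est[k][0])  # exploit: run the predicted best only
        y = true_runtime[a]  # only this algorithm's runtime is observed
        m, c = est[a]
        est[a] = [m + (y - m) / (c + 1), c + 1]  # incremental update of the running mean
        picks[a] += 1
    return picks


# Hypothetical: a2 is truly faster, but the offline estimate wrongly says it is slow.
picks = greedy_online({"a1": (50.0, 5), "a2": (90.0, 1)}, {"a1": 60.0, "a2": 30.0}, 100)
print(picks)  # {'a1': 100, 'a2': 0}
```

The estimate for a1 climbs towards 60 but never exceeds a2's stale estimate of 90, so a2 is never tried and the mistake is never corrected.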

38 Exploration vs. exploitation: multi-armed bandits
[Figure: three bandit arms with observed outcomes GOOD: 5, BAD: 3; GOOD: 1, BAD: 2; GOOD: 2, BAD: 2]
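For the three arms in the figure, greedy and a classic bandit rule disagree. This Python sketch uses UCB1 (empirical success rate plus sqrt(2 ln N / n_i)), the textbook bandit formula, not the regression-based variant on the next slide; the arm indices 0, 1, 2 are labels chosen for illustration.

```python
from math import log, sqrt
from typing import List, Tuple


def greedy_choice(good: List[int], bad: List[int]) -> int:
    means = [g / (g + b) for g, b in zip(good, bad)]
    return max(range(len(means)), key=means.__getitem__)


def ucb1_choice(good: List[int], bad: List[int]) -> Tuple[int, List[float]]:
    """UCB1: empirical success rate plus sqrt(2 ln N / n_i) exploration bonus."""
    pulls = [g + b for g, b in zip(good, bad)]
    total = sum(pulls)
    scores = [g / n + sqrt(2 * log(total) / n) for g, n in zip(good, pulls)]
    return max(range(len(scores)), key=scores.__getitem__), scores


good, bad = [5, 1, 2], [3, 2, 2]  # the three arms in the figure
print(greedy_choice(good, bad), ucb1_choice(good, bad)[0])  # 0 1
```

Greedy picks the arm with the best observed rate (5/8); UCB1 picks the least-tried arm (3 pulls) because its large uncertainty bonus outweighs its lower observed rate.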

39 Exploration vs. exploitation: methods
ε-greedy
- select a random algorithm with probability ε
- select the predicted best otherwise
UCB: upper confidence bound
- calculate the variance of each prediction
- select the algorithm with the highest value of mean + λ · variance
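Both selection rules can be sketched directly in Python. Assumptions: the regression models already supply a predicted mean and variance per algorithm, higher predicted values are better (so "highest value" is the right direction), and the numbers below are hypothetical.

```python
import random
from typing import Sequence


def epsilon_greedy(pred_mean: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random algorithm, otherwise the predicted best."""
    if rng.random() < epsilon:
        return rng.randrange(len(pred_mean))
    return max(range(len(pred_mean)), key=lambda a: pred_mean[a])


def ucb_select(pred_mean: Sequence[float], pred_var: Sequence[float], lam: float) -> int:
    """Pick the algorithm maximizing mean + lam * variance of its prediction."""
    scores = [m + lam * v for m, v in zip(pred_mean, pred_var)]
    return max(range(len(scores)), key=lambda a: scores[a])


mean, var = [0.8, 0.6, 0.7], [0.01, 0.2, 0.05]  # hypothetical predictions (higher = better)
print(ucb_select(mean, var, 0.0), ucb_select(mean, var, 2.0))  # 0 1
```

With λ = 0 the UCB rule reduces to greedy; a larger λ rewards uncertain predictions, here switching the choice to the algorithm whose prediction has the highest variance.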

40 UCB

41 Experiments: the ASlib benchmark
21 algorithm selection scenarios
Each scenario contains:
- complete performance information of all algorithms on all instances
- feature values for all feature-instance combinations
Scenario sizes vary:
- 2-31 algorithms
- instances

42 Empirical results summary
RQ1: Does processing the online data result in better performance?
- Yes: online processing of the data leads to better performance on almost all scenarios
RQ2: Does explicitly taking the exploration vs. exploitation trade-off into account improve performance?
- No: exploration strategies do not perform well
- Exploration strategies do not even learn better models than the greedy strategy

43 Empirical results discussion
Why does exploration not help?
- Bad exploration strategies
- Large amount of initial training data available
- Greedy performs a kind of exploration of its own: it collects online data for any algorithm that is predicted best on some instances

44 Hans Degroote
Conclusions
- Before developing a new algorithm, check whether the existing algorithms are complementary
  - Many computational resources => parallel portfolio
  - Limited computational resources => algorithm selection
- The potential of a portfolio can be quantified by comparing the Single Best Solver and the Virtual Best Solver on benchmark data
- An algorithm selection method can be improved while in use, by processing the online generated data

45 Current/Future work
- Apply online algorithm selection to an OR problem (GAP), with unlimited instance generation instead of benchmark data
- Test smarter exploration methods from the bandit literature
- Test alternative regression models: updatable models, Gaussian processes


Data Science meets Optimization. ODS 2017 EURO Plenary Patrick De Causmaecker EWG/DSO EURO Working Group CODeS KU Leuven/ KULAK

ODS 2017, Sorrento. www.fourthparadigm.com Copyright 2009 Microsoft Corporation