Data Knowledge and Optimization. Patrick De Causmaecker CODeS-imec Department of Computer Science KU Leuven/ KULAK
2 Wishes from West-Flanders, Belgium
3 10 rules for success in supply chain management & logistics systems (Ratliff, 2013):
1. Objectives - quantified, measurable
2. Models - faithful, essential
3. Variability - explicitly considered
4. Data - accurate, timely, and comprehensive
5. Integration - fully automated data transfer
6. Delivery - support execution, management and control
7. Algorithms - exploit individual problem structure
8. People - expertise in models, data, and optimization
9. Process - support optimization, continuously improve
10. ROI - provable, cost of technology, people and operations
4 10 rules for success in supply chain management & logistics systems:
1. Objectives - quantified, measurable
2. Models - faithful, essential
3. Variability - explicitly considered
4. Data - accurate, timely, and comprehensive
5. Knowledge of (domain) experts
6. Integration - fully automated data transfer
7. Delivery - support execution, management and control
8. Algorithms - exploit individual problem structure
9. People - expertise in models, data, and optimization
10. Process - support optimization, continuously improve
11. ROI - provable, cost of technology, people and operations
5 Subjects:
- Machine learning and optimization -> computing
- Configuration, tuning, construction
- Bayesian optimization
- Formal languages
- Large graphs, classification, pattern mining
- Matrix factorization
- Adaptive algorithms and hyperheuristics
- Algorithm selection (online learning) (*)
- Algorithm design (*)
6 Characterization of neighborhood behaviours in a multi-neighborhood local search algorithm Nguyen Dang, Patrick De Causmaecker KU Leuven, Belgium CODeS, ITEC-iMinds
7 Context: A multi-neighborhood local search for the Swap-body Vehicle Routing problem. Authors: Jan Christiaens, Tony Wauters, Túlio Toffolo, Sam Van Malderen. Winner of the VeRoLog Solver Challenge.
8 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Neighborhood types (18): Cheapest insertion; Swap; Intra-route 2-opt; Inter-route 2-opt; Change swap location; Merge routes; Split to sub-routes; Ruin recreate; Remove route; Remove sub-route; Remove sub-route with cheapest insertion; Remove chains; EachSequenceCheapestInsert; Convert to route; Convert to sub-route; Add sub-route; Ejection chain
9 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Neighborhoods (42), with the number of variants of each type in parentheses: Cheapest insertion (11); Swap (1); Intra-route 2-opt (1); Inter-route 2-opt (1); Change swap location (1); Merge routes (1); Split to sub-routes (1); Ruin recreate (2); Remove route (1); Remove sub-route (1); Remove sub-route with cheapest insertion (1); Remove chains (8); EachSequenceCheapestInsert (3); Convert to route (1); Convert to sub-route (1); Add sub-route (1); Ejection chain (6)
10 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
An Iterated Local Search algorithm:
    s* = LateAcceptanceHillClimbing(s0)
    while (time <= time-limit)
        s = Perturbate(s*)
        s' = LateAcceptanceHillClimbing(s)
        if (f(s') < f(s*)) s* = s'
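The Iterated Local Search loop can be sketched in Python as below. This is a minimal sketch, not the authors' implementation: `local_search` and `perturb` are passed in as plain functions, an iteration budget replaces the wall-clock time limit so the example is deterministic, and the toy objective, `descend` and the perturbation are hypothetical stand-ins for Late Acceptance Hill Climbing and the real routing moves.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def iterated_local_search(
    s0: S,
    f: Callable[[S], float],
    local_search: Callable[[S], S],
    perturb: Callable[[S], S],
    max_iterations: int,
) -> S:
    """Iterated Local Search: perturb the incumbent, re-optimise, keep strict improvements."""
    best = local_search(s0)
    for _ in range(max_iterations):  # iteration budget stands in for 'time <= time-limit'
        candidate = local_search(perturb(best))
        if f(candidate) < f(best):  # only a strict improvement replaces s*
            best = candidate
    return best


# Toy usage (hypothetical problem): minimise (x - 7)^2 over the integers.
def f(x: int) -> float:
    return (x - 7) ** 2


def descend(x: int) -> int:
    """Stand-in local search: step by +/-1 while that improves f."""
    while True:
        step = min((x - 1, x + 1), key=f)
        if f(step) < f(x):
            x = step
        else:
            return x


rng = random.Random(0)
result = iterated_local_search(100, f, descend, lambda x: x + rng.randint(-5, 5), 20)
print(result)  # 7
```

Because the toy objective has a single minimum, every local search ends at 7 and the perturbation never finds anything better; on a real routing problem the perturbation is what lets the search leave one local optimum for another.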
11 A multi-neighborhood local search for the Swap-body Vehicle Routing problem
Late Acceptance Hill Climbing:
    s = s* = s0
    while (#iterations-without-improvement <= itwi)
        s' = neighbor(s)
        if f(s') <= f(s) or f(s') <= f(solution from list-size steps before)
            s = s'
        update s*
Each neighbor is generated by one of the neighborhoods, chosen according to its weight: neighborhood N_1 with weight w_1, N_2 with weight w_2, N_3 with weight w_3, ..., with $w_1 + w_2 + \dots + w_n = 1$.
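A minimal Python sketch of Late Acceptance Hill Climbing with weighted neighborhood selection follows. It assumes that the weights are used as selection probabilities (drawn with `random.choices`) and that "#iterations-without-improvement" counts iterations since the best solution s* last improved; the two toy neighborhoods and the objective are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def late_acceptance_hill_climbing(
    s0: S,
    f: Callable[[S], float],
    neighborhoods: Sequence[Callable[[S, random.Random], S]],
    weights: Sequence[float],
    list_size: int,
    itwi: int,
    rng: random.Random,
) -> S:
    """LAHC: accept s' if f(s') <= f(s) or f(s') <= the current cost list_size iterations ago."""
    s = best = s0
    history: List[float] = [f(s0)] * list_size  # costs of the last list_size current solutions
    iteration = 0
    without_improvement = 0
    while without_improvement <= itwi:
        # Draw neighborhood N_i with probability proportional to w_i.
        move = rng.choices(neighborhoods, weights=weights, k=1)[0]
        candidate = move(s, rng)
        v = iteration % list_size
        if f(candidate) <= f(s) or f(candidate) <= history[v]:
            s = candidate
        history[v] = f(s)  # remember the current cost for list_size iterations
        if f(s) < f(best):
            best = s
            without_improvement = 0
        else:
            without_improvement += 1
        iteration += 1
    return best


# Toy usage (hypothetical): minimise |x - 42| with two weighted neighborhoods.
def cost(x: int) -> float:
    return abs(x - 42)


def small_step(x: int, r: random.Random) -> int:
    return x + r.choice((-1, 1))


def big_jump(x: int, r: random.Random) -> int:
    return x + r.choice((-10, 10))


best = late_acceptance_hill_climbing(
    0, cost, [small_step, big_jump], [0.7, 0.3], list_size=5, itwi=500, rng=random.Random(1)
)
print(best, cost(best))
```

The history list is what distinguishes LAHC from plain hill climbing: a worsening move is still accepted if it is no worse than the solution the search held `list_size` iterations earlier.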
12 Research question: are there groups of similar neighborhoods?
1. Help algorithm designers understand the behaviour of these neighborhoods better
2. Characterize each neighborhood's behaviour as a feature vector (based on information collected from different algorithm runs)
3. Cluster the neighborhoods
13 Observables, for each neighborhood-instance combination:
- probability of improving, worsening, or doing nothing
- magnitudes of improvement and worsening
- running time (for tie-breaking)
Statistics are collected during test runs.
Mısır, M., Handoko, S. D., and Lau, H. C. "OSCAR: Online Selection of Algorithm Portfolios with Case Study on Memetic Algorithms." Learning and Intelligent Optimization:
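The observables can be gathered with simple counters. The sketch below is an assumption about how such logging could look, not the authors' log format: the hypothetical `NeighborhoodStats` class records, for one neighborhood-instance combination, whether each move improved, worsened or left the objective unchanged (minimisation), the sizes of those changes, and the time spent. All fields are sums, so statistics from several runs can be accumulated by adding them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NeighborhoodStats:
    """Accumulable statistics for one (neighborhood, instance) combination."""

    n_improve: int = 0
    n_worsen: int = 0
    n_nothing: int = 0
    sum_improvement: float = 0.0
    sum_worsening: float = 0.0
    total_time: float = 0.0

    def record(self, f_before: float, f_after: float, elapsed: float) -> None:
        """Log one application of the neighborhood (lower f is better)."""
        delta = f_after - f_before
        if delta < 0:
            self.n_improve += 1
            self.sum_improvement += -delta
        elif delta > 0:
            self.n_worsen += 1
            self.sum_worsening += delta
        else:
            self.n_nothing += 1
        self.total_time += elapsed  # kept for tie-breaking between neighborhoods

    @property
    def n_iters(self) -> int:
        return self.n_improve + self.n_worsen + self.n_nothing

    def probabilities(self) -> Tuple[float, float, float]:
        """(P(improve), P(worsen), P(nothing)) estimated from the counts."""
        n = self.n_iters
        if n == 0:
            raise ValueError("no moves recorded")
        return self.n_improve / n, self.n_worsen / n, self.n_nothing / n

    def mean_magnitudes(self) -> Tuple[float, float]:
        """Average size of improving and of worsening moves (0.0 if there were none)."""
        imp = self.sum_improvement / self.n_improve if self.n_improve else 0.0
        wor = self.sum_worsening / self.n_worsen if self.n_worsen else 0.0
        return imp, wor


stats = NeighborhoodStats()
for before, after in [(10, 8), (8, 9), (9, 9), (9, 5)]:
    stats.record(before, after, elapsed=0.1)
print(stats.probabilities())    # (0.5, 0.25, 0.25)
print(stats.mean_magnitudes())  # (3.0, 1.0)
```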
14 Characterize neighborhoods' behaviours
[Figure: r_improve, r_worsen and r_nothing plotted against the quality of the current solution (x-axis), for the neighborhoods Merge route, Cheapest insertion 2, Remove route and Cheapest insertion 25.]
15 Solution quality regions and visits
[Figure: the solution-quality axis between a lower bound and an upper bound is divided into intervals (..., i-1, i, ...); the regions are labelled "easy to reach, easy to escape", "easy to reach, hard to escape" and "hard to reach".]
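16 Feature
Step 3: characterize each neighborhood as a feature vector by aggregating the collected information into regions. The probabilities of improvement (I), worsening (W) and doing nothing (SN) on region f are
$$r^{f}_{improve} = \frac{\sum_{k \in f} ni_k}{\sum_{k \in f} niters_k}, \qquad r^{f}_{worsen} = \frac{\sum_{k \in f} nw_k}{\sum_{k \in f} niters_k}, \qquad r^{f}_{nothing} = 1 - r^{f}_{improve} - r^{f}_{worsen},$$
where k ranges over the intervals in region f, $ni_k$ and $nw_k$ count the improving and worsening moves made in interval k, and $niters_k$ counts all iterations in interval k.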
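A small Python sketch of the region aggregation follows. The per-interval counts are pooled over the intervals of a region before dividing, matching the definitions of r_improve, r_worsen and r_nothing. The helper `interval_index` assumes equal-width intervals between the lower and upper bound, which the slides do not specify; both function names and the counts are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def interval_index(value: float, lower: float, upper: float, n_intervals: int) -> int:
    """Map a solution quality to one of n equal-width intervals (assumed layout)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    k = int((value - lower) / (upper - lower) * n_intervals)
    return min(max(k, 0), n_intervals - 1)  # clamp values at or beyond the bounds


def region_probabilities(
    ni: Dict[int, int],
    nw: Dict[int, int],
    niters: Dict[int, int],
    region: Iterable[int],
) -> Tuple[float, float, float]:
    """Pool per-interval counts over the intervals k of region f, then divide."""
    ks = list(region)
    total = sum(niters.get(k, 0) for k in ks)
    if total == 0:
        raise ValueError("region was never visited")
    r_improve = sum(ni.get(k, 0) for k in ks) / total
    r_worsen = sum(nw.get(k, 0) for k in ks) / total
    return r_improve, r_worsen, 1.0 - r_improve - r_worsen


# Hypothetical counts for three intervals; region f = intervals {0, 1}.
ni = {0: 5, 1: 3, 2: 0}
nw = {0: 2, 1: 4, 2: 1}
niters = {0: 10, 1: 10, 2: 5}
print(region_probabilities(ni, nw, niters, {0, 1}))  # approximately (0.4, 0.3, 0.3)
print(interval_index(15.0, 10.0, 20.0, 5))  # 2
```

Pooling counts (rather than averaging per-interval ratios) gives rarely visited intervals proportionally less influence on the region's probabilities.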
17 Feature vectors
- Per region: probabilities by size
- Amounts: by a ranking procedure
Kolde, R., Laur, S., Adler, P., & Vilo, J. (2012). Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics, 28(4),
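The slides do not say exactly how the ranking procedure is applied to the amounts, so the following is only a sketch of the rho score from robust rank aggregation (Kolde et al.), under the assumption that each element's rank in each list is first normalised to (0, 1] by dividing by the list length. The score is the smallest probability, over k, that the k-th best normalised rank would be at least this good if ranks were uniformly random; it is computed here with the identity P(Beta(k, n-k+1) <= x) = P(Binomial(n, x) >= k). Function names are hypothetical.

```python
import math
from typing import Sequence


def order_stat_cdf(k: int, n: int, x: float) -> float:
    """P(U_(k) <= x) for the k-th smallest of n i.i.d. Uniform(0, 1) values.

    This is the Beta(k, n - k + 1) CDF, computed as P(Binomial(n, x) >= k).
    """
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks: Sequence[float]) -> float:
    """Robust rank aggregation rho score for one element (smaller = consistently ranked higher)."""
    r = sorted(normalized_ranks)
    n = len(r)
    return min(order_stat_cdf(k, n, r[k - 1]) for k in range(1, n + 1))


consistently_top = rho_score([0.1, 0.2, 0.1])
consistently_bottom = rho_score([0.9, 0.8, 1.0])
print(consistently_top < consistently_bottom)  # True
```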
18 Clustering neighborhoods on similarity
Each neighborhood is characterised by #instances × #regions × #observables elements: a high-dimensional, low-sample-size problem (42 individuals, 150 dimensions).
19 Clustering neighborhoods
- High-dimensional low-sample-size clustering method: HDDC (Bergé et al., 2012)
- Apply the isometric log-ratio transformation (Aitchison, 1986) to r_improve, r_worsen and r_nothing (compositional data) before clustering
Bergé, L., Bouveyron, C., & Girard, S. (2012). HDclassif: An R package for model-based clustering and discriminant analysis of high-dimensional data.
Aitchison, J. (1986). The statistical analysis of compositional data.
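The three probabilities of a region sum to 1, so they live on a simplex rather than in ordinary Euclidean space; the isometric log-ratio (ilr) transform maps them to unconstrained coordinates. Below is a minimal Python sketch using pivot (balance) coordinates, one standard choice of orthonormal basis; the slides do not state which basis was used. How zero probabilities were handled is also not stated, so the small pseudocount in the hypothetical `close_with_pseudocount` is an assumption. The usage checks the defining isometry: distances between ilr coordinates equal Aitchison distances (Euclidean distances between centred log-ratio vectors).

```python
import math
from typing import List, Sequence


def close_with_pseudocount(parts: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Add a small pseudocount (assumed zero handling), then rescale to sum 1."""
    shifted = [p + eps for p in parts]
    total = sum(shifted)
    return [p / total for p in shifted]


def ilr(parts: Sequence[float]) -> List[float]:
    """Isometric log-ratio transform with pivot coordinates.

    For D parts, coordinate i (0-based) is
    sqrt((D-i-1)/(D-i)) * ln(x_i / geometric_mean(x_{i+1}, ..., x_{D-1})).
    """
    logs = [math.log(p) for p in parts]
    d = len(logs)
    coords = []
    for i in range(d - 1):
        rest = logs[i + 1:]
        log_gmean_rest = sum(rest) / len(rest)  # log of the geometric mean of the remaining parts
        coords.append(math.sqrt((d - i - 1) / (d - i)) * (logs[i] - log_gmean_rest))
    return coords


def clr(parts: Sequence[float]) -> List[float]:
    """Centred log-ratio transform, used here only to check the isometry."""
    logs = [math.log(p) for p in parts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


# (r_improve, r_worsen, r_nothing) for two hypothetical regions.
a = close_with_pseudocount([0.5, 0.3, 0.2])
b = close_with_pseudocount([0.2, 0.2, 0.6])
print(math.isclose(math.dist(ilr(a), ilr(b)), math.dist(clr(a), clr(b))))  # True
```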
20 Clustering neighborhoods
Clustering result: 9 clusters: Ejection-chain 3, 4, 5; Remove-chain 1, 2, 3, 6, 7, 8; Remove-sub-route-with-cheapest-insertion; Swap; Inter-route-two-opt Cheapest-insertion 10, 15, 20, 25, 35, 50; Each-sequence-cheapest-insertion (2,5), (4,4), (5,2); Remove-chain 4 Cheapest-insertion 1, 2, 3, 4, 5 Change-swap-location; Merge-route Add-sub-route; Convert-to-sub-route Ejection-chain 10, 15, 35; Remove-chain 5; Intra-route-two-opt Ruin-recreate 2, 3 Convert-to-route; Remove-sub-route; Remove-route; Split-to-sub-route
21 Perform a number of algorithm runs on each problem instance -> collect log files (accumulable statistics on neighborhood behaviours) -> characterize and cluster the neighborhoods into k (<<42) groups -> automated parameter tuning with k (<<42) parameters
22 Apply to algorithm tuning. Three scenarios:
- Original: 42 weight parameters
- Basic: 28 weight parameters
- Clustered: 9 weight parameters; all neighborhoods in a cluster have the same weight
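In the clustered scenario the tuner only sees one weight per cluster, which must be expanded back to one weight per neighborhood before the local search runs. A minimal sketch follows, assuming each neighborhood receives its cluster's weight and the result is rescaled to sum to 1; the slides do not say whether the cluster weight is instead split among the cluster's members. The function name and the cluster assignment are hypothetical.

```python
from typing import Dict, Mapping


def expand_cluster_weights(
    cluster_of: Mapping[str, int],
    cluster_weight: Mapping[int, float],
) -> Dict[str, float]:
    """Give every neighborhood the tuned weight of its cluster, then rescale so weights sum to 1."""
    raw = {name: cluster_weight[c] for name, c in cluster_of.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Hypothetical assignment of three neighborhoods to two clusters.
cluster_of = {"swap": 0, "inter-route-two-opt": 0, "merge-route": 1}
weights = expand_cluster_weights(cluster_of, {0: 1.0, 1: 2.0})
print(weights)  # {'swap': 0.25, 'inter-route-two-opt': 0.25, 'merge-route': 0.5}
```

With this choice a large cluster receives more total probability mass than a small one at equal tuned weight; dividing each cluster weight by the cluster size would remove that effect.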
23 Apply to algorithm tuning
- original vs. clustered: p =
- basic vs. clustered: p =
24 Apply to algorithm tuning
- original vs. original with identical weights: p =
- basic vs. basic with identical weights: p =
- clustered vs. clustered with identical weights: p =
25 Thank you
Dang, Nguyen Thi Thanh; De Causmaecker, Patrick. Characterization of neighborhood behaviours in a multi-neighborhood local search algorithm. Lecture Notes in Computer Science, Springer, Learning and Intelligent OptimizatioN Conference, Italy, 29/5-1/
Data Science meets Optimization: EURO Working Group (workshop)
26 Online Algorithm Selection: Learning when to use which algorithm
Hans Degroote, KU Leuven campus KULAK (Belgium)
Supervised by Patrick De Causmaecker
Joint work with Bernd Bischl (LMU Munich) and Lars Kotthoff (University of British Columbia)
27 Overview
- Algorithm portfolios
- When are algorithm selection/portfolios useful (+ quantification)
- The algorithm selection problem
- The online algorithm selection problem
  - Exploration vs. exploitation trade-off
- Empirical validation
28 Algorithm selection is useful when a portfolio of complementary algorithms exists AND computational resources are limited, for example in:
- competitions
- operational problems (the production schedule for the day, the delivery schedule for the day, ...)
- a sub-problem that needs to be solved often
If plenty of time or many cores are available, use a parallel portfolio instead.
29 SATzilla
SAT competition: many algorithms competing and improving over time.
Idea of SATzilla: select online, per instance, a solver from a portfolio of solvers from previous competitions.
Result: dominated the competition (as long as )
Reason: a highly complementary set of solvers
Xu, Lin, et al. "SATzilla: portfolio-based algorithm selection for SAT." Journal of Artificial
30 SAT competition 2013: visualising the potential of a portfolio
Single Best Solver: solves 231/300 instances
Virtual Best Solver: solves 288/300 instances
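The two numbers compare the Single Best Solver (SBS), the one algorithm that solves the most instances on its own, with the Virtual Best Solver (VBS), an oracle that picks a successful algorithm for every instance whenever one exists. The gap between them bounds what a perfect selector could gain. A minimal Python sketch on hypothetical toy data (not the SAT 2013 results):

```python
from typing import Dict, List, Tuple


def sbs_and_vbs(solved: Dict[str, List[bool]]) -> Tuple[str, int, int]:
    """Return (single best solver, instances it solves, instances solved by the virtual best solver).

    solved[a][i] is True when algorithm a solves instance i within the time limit.
    """
    n_instances = len(next(iter(solved.values())))
    per_solver = {a: sum(flags) for a, flags in solved.items()}
    sbs = max(per_solver, key=per_solver.get)
    # The VBS solves an instance if at least one algorithm in the portfolio does.
    vbs = sum(any(flags[i] for flags in solved.values()) for i in range(n_instances))
    return sbs, per_solver[sbs], vbs


# Hypothetical toy data: 3 solvers, 5 instances.
solved = {
    "A": [True, True, False, False, True],
    "B": [False, True, True, False, False],
    "C": [False, False, False, True, False],
}
print(sbs_and_vbs(solved))  # ('A', 3, 5)
```

Here no single solver covers more than 3 of 5 instances, yet together they cover all 5: the solvers are complementary, which is the situation in which algorithm selection pays off.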
31 Algorithm selection: offline version
- Learn a selector λ from training data:
  - supervised learning (nearest neighbours, ...)
  - regression
  - ...
- Use the selector as part of the new solver
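A selector λ learned by nearest neighbours can be as simple as "use the algorithm that was best on the most similar training instance". The sketch below is a 1-nearest-neighbour version with Euclidean distance on raw instance features; the features, training data and function name are hypothetical, and in practice features would normally be scaled first.

```python
import math
from typing import Callable, List, Sequence


def nearest_neighbour_selector(
    train_features: List[Sequence[float]],
    train_best: List[str],
) -> Callable[[Sequence[float]], str]:
    """Learn a selector: pick the best algorithm of the closest training instance (1-NN)."""

    def select(features: Sequence[float]) -> str:
        distances = [math.dist(features, x) for x in train_features]
        return train_best[distances.index(min(distances))]

    return select


# Hypothetical training data: two instance features, best algorithm per instance.
selector = nearest_neighbour_selector([[0, 0], [10, 10], [0, 10]], ["a1", "a2", "a3"])
print(selector([1, 2]), selector([9, 8]))  # a1 a2
```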
32 Regression-based classification: select, for each instance, the algorithm predicted to be best
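Regression-based selection fits one performance model per algorithm and turns the predictions into a choice. The sketch below uses ordinary least squares on a single feature purely for brevity (real selectors use many features and stronger models) and assumes the predicted quantity is runtime, so the lowest prediction wins. The data and function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x (a single feature, for brevity)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def regression_selector(
    feature: Sequence[float], runtimes: Dict[str, Sequence[float]]
) -> Callable[[float], str]:
    """One runtime model per algorithm; select the algorithm with the lowest predicted runtime."""
    models = {a: fit_line(feature, ys) for a, ys in runtimes.items()}

    def select(x: float) -> str:
        predicted = {a: b0 + b1 * x for a, (b0, b1) in models.items()}
        return min(predicted, key=predicted.get)

    return select


# Hypothetical data: runtime as a function of instance size for two algorithms.
select = regression_selector([1, 2, 3, 4], {"a1": [2, 4, 6, 8], "a2": [5, 5.5, 6, 6.5]})
print(select(1), select(10))  # a1 a2
```

The fitted models are runtime = 2x for a1 and runtime = 4.5 + 0.5x for a2, so the selector switches from a1 to a2 once the instance size exceeds 3.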
33 Algorithm selection: online version
Core idea: update the selection mapping as new data comes in, e.g. learn a better selection mapping over time.
34 Online algorithm selection: methodology
How to define the strategy? Main challenge: online data is incomplete.
1. Supervised learning methods cannot deal well with incomplete data
2. Exploration vs. exploitation trade-off
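35 Online data is incomplete
Offline data: the performance of every algorithm (a1, a2, a3) is known on every training instance, so the best algorithm is known for each (a1, a1, a3 and a2 for the four instances shown).
Online data: only one value is observed per new instance; the other values, and therefore the best algorithm, are unknown (?):
- i5: a1 = 50, a2 = ?, a3 = ?, best = ?
- i6: a1 = ?, a2 = 70, a3 = ?, best = ?
- i7: a1 = 80, a2 = ?, a3 = ?, best = ?
- i8: a1 = ?, a2 = ?, a3 = 120, best = ?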
36 Online algorithm selection based on regression models
37 Exploration vs. exploitation trade-off
New data is collected for an algorithm only on instances on which it is predicted to be best.
- If an algorithm is incorrectly predicted to be bad, this is hard to correct.
38 Exploration vs. exploitation: multi-armed bandits
[Figure: three bandit arms with observed outcomes GOOD: 5 / BAD: 3, GOOD: 1 / BAD: 2, and GOOD: 2 / BAD: 2.]
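The bandit picture can be made concrete by reading each arm's GOOD/BAD counts as observed successes and failures (an assumed reading of the figure). A greedy rule exploits the best observed success rate, while the classic UCB1 rule of the bandit literature, named here as an illustration rather than as the method used in this work, adds a bonus that is larger for arms tried less often. A minimal Python sketch:

```python
import math
from typing import Sequence, Tuple


def greedy_arm(counts: Sequence[Tuple[int, int]]) -> int:
    """Pick the arm with the highest observed success rate GOOD / (GOOD + BAD)."""
    means = [good / (good + bad) for good, bad in counts]
    return max(range(len(means)), key=means.__getitem__)


def ucb1_arm(counts: Sequence[Tuple[int, int]]) -> int:
    """UCB1: success rate plus an exploration bonus that shrinks as an arm is tried more often."""
    total = sum(good + bad for good, bad in counts)
    scores = [
        good / (good + bad) + math.sqrt(2 * math.log(total) / (good + bad))
        for good, bad in counts
    ]
    return max(range(len(scores)), key=scores.__getitem__)


arms = [(5, 3), (1, 2), (2, 2)]  # (GOOD, BAD) counts read from the figure
print(greedy_arm(arms), ucb1_arm(arms))  # 0 1
```

Greedy keeps pulling the first arm (5/8 successes), whereas UCB1 tries the second arm: with only 3 pulls its estimate is the most uncertain, so its optimistic bound is the highest.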
39 Exploration vs. exploitation: methods
- ε-greedy:
  - select a random algorithm with probability ε
  - select the predicted best otherwise
- UCB (upper confidence bound):
  - calculate the variance of each prediction
  - select the algorithm with the highest value of mean + variance·λ
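Both strategies act on the regression models' predictions. The sketch below assumes the predictions are scores where higher is better (for runtimes, negate the predictions) and follows the rule as stated, mean + λ·variance; many UCB variants use the standard deviation instead. The function names and numbers are hypothetical.

```python
import random
from typing import Dict


def epsilon_greedy(predicted: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """With probability epsilon explore a random algorithm, otherwise exploit the predicted best."""
    if rng.random() < epsilon:
        return rng.choice(sorted(predicted))
    return max(predicted, key=predicted.get)


def ucb_select(mean: Dict[str, float], variance: Dict[str, float], lam: float) -> str:
    """Pick the algorithm maximising mean + lambda * variance of its prediction."""
    return max(mean, key=lambda a: mean[a] + lam * variance[a])


# Hypothetical predicted scores (higher is better) and their variances.
mean = {"a1": 0.8, "a2": 0.6}
variance = {"a1": 0.01, "a2": 0.2}
print(ucb_select(mean, variance, lam=0.0), ucb_select(mean, variance, lam=2.0))  # a1 a2
print(epsilon_greedy(mean, epsilon=0.0, rng=random.Random(0)))  # a1
```

With λ = 0 the UCB rule reduces to greedy selection; a larger λ favours the algorithm whose prediction is least certain, which is how it collects data on algorithms that greedy selection would ignore.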
40 UCB
41 Experiments: the ASlib benchmark
21 algorithm selection scenarios. Each scenario contains:
- complete performance information of all algorithms on all instances
- feature values for all feature-instance combinations
Scenario sizes vary:
- 2-31 algorithms
- … instances
42 Empirical results summary
RQ1: Does processing the online data result in better performance?
- Yes: online processing of the data leads to better performance on almost all scenarios.
RQ2: Does explicitly taking the exploration vs. exploitation trade-off into account improve performance?
- No: the exploration strategies do not perform well; they do not even learn better models than the greedy strategy.
43 Empirical results discussion
Why does exploration not help?
- Bad exploration strategies
- A large amount of initial training data is available
- Greedy performs a kind of exploration of its own: it collects online data for any algorithm that is predicted best on some instances
44 Conclusions
- Before developing a new algorithm, check whether the existing algorithms are complementary:
  - many computational resources => parallel portfolio
  - limited computational resources => algorithm selection
- The potential of a portfolio can be quantified by comparing the Single Best Solver and the Virtual Best Solver on benchmark data.
- An algorithm selection method can be improved while in use, by processing the online-generated data.
45 Current/Future work
- Apply online algorithm selection to an OR problem (GAP), with unlimited instance generation instead of benchmark data
- Test more clever exploration methods from the bandit literature
- Test alternative regression models: updatable models, Gaussian processes
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationCAMCOS Report Day. December 9 th, 2015 San Jose State University Project Theme: Classification
CAMCOS Report Day December 9 th, 2015 San Jose State University Project Theme: Classification On Classification: An Empirical Study of Existing Algorithms based on two Kaggle Competitions Team 1 Team 2
More informationK-Nearest Neighbour (Continued) Dr. Xiaowei Huang
K-Nearest Neighbour (Continued) Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ A few things: No lectures on Week 7 (i.e., the week starting from Monday 5 th November), and Week 11 (i.e., the week
More informationSLS Methods: An Overview
HEURSTC OPTMZATON SLS Methods: An Overview adapted from slides for SLS:FA, Chapter 2 Outline 1. Constructive Heuristics (Revisited) 2. terative mprovement (Revisited) 3. Simple SLS Methods 4. Hybrid SLS
More informationA study of classification algorithms using Rapidminer
Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationSimple algorithm portfolio for SAT
Artif Intell Rev DOI 10.1007/s10462-011-9290-2 Simple algorithm portfolio for SAT Mladen Nikolić Filip Marić PredragJaničić Springer Science+Business Media B.V. 2011 Abstract The importance of algorithm
More informationExploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center
Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating
More informationMTTTS17 Dimensionality Reduction and Visualization. Spring 2018 Jaakko Peltonen. Lecture 11: Neighbor Embedding Methods continued
MTTTS17 Dimensionality Reduction and Visualization Spring 2018 Jaakko Peltonen Lecture 11: Neighbor Embedding Methods continued This Lecture Neighbor embedding by generative modeling Some supervised neighbor
More informationBudgetedSVM: A Toolbox for Scalable SVM Approximations
Journal of Machine Learning Research 14 (2013) 3813-3817 Submitted 4/13; Revised 9/13; Published 12/13 BudgetedSVM: A Toolbox for Scalable SVM Approximations Nemanja Djuric Liang Lan Slobodan Vucetic 304
More informationModel learning for robot control: a survey
Model learning for robot control: a survey Duy Nguyen-Tuong, Jan Peters 2011 Presented by Evan Beachly 1 Motivation Robots that can learn how their motors move their body Complexity Unanticipated Environments
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationGPUML: Graphical processors for speeding up kernel machines
GPUML: Graphical processors for speeding up kernel machines http://www.umiacs.umd.edu/~balajiv/gpuml.htm Balaji Vasan Srinivasan, Qi Hu, Ramani Duraiswami Department of Computer Science, University of
More informationLecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017
Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last
More informationPackage llama. R topics documented: July 11, Type Package
Type Package Package llama July 11, 2018 Title Leveraging Learning to Automatically Manage Algorithms Version 0.9.2 Date 2018-07-11 Author Lars Kotthoff [aut,cre], Bernd Bischl [aut], Barry Hurley [ctb],
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationMetric learning approaches! for image annotation! and face recognition!
Metric learning approaches! for image annotation! and face recognition! Jakob Verbeek" LEAR Team, INRIA Grenoble, France! Joint work with :"!Matthieu Guillaumin"!!Thomas Mensink"!!!Cordelia Schmid! 1 2
More informationActive Appearance Models
Active Appearance Models Edwards, Taylor, and Cootes Presented by Bryan Russell Overview Overview of Appearance Models Combined Appearance Models Active Appearance Model Search Results Constrained Active
More informationSequential Model-based Optimization for General Algorithm Configuration
Sequential Model-based Optimization for General Algorithm Configuration Frank Hutter, Holger Hoos, Kevin Leyton-Brown University of British Columbia LION 5, Rome January 18, 2011 Motivation Most optimization
More informationHybridization EVOLUTIONARY COMPUTING. Reasons for Hybridization - 1. Naming. Reasons for Hybridization - 3. Reasons for Hybridization - 2
Hybridization EVOLUTIONARY COMPUTING Hybrid Evolutionary Algorithms hybridization of an EA with local search techniques (commonly called memetic algorithms) EA+LS=MA constructive heuristics exact methods
More informationSUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018
SUPERVISED LEARNING METHODS Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018 2 CHOICE OF ML You cannot know which algorithm will work
More informationMachine Learning in the Process Industry. Anders Hedlund Analytics Specialist
Machine Learning in the Process Industry Anders Hedlund Analytics Specialist anders@binordic.com Artificial Specific Intelligence Artificial General Intelligence Strong AI Consciousness MEDIA, NEWS, CELEBRITIES
More information