The ALAMO approach to machine learning: Best subset selection, adaptive sampling, and constrained regression

Similar documents
Surrogate models and the optimization of systems in the absence of algebraic models

ALAMO: Automatic Learning of Algebraic Models for Optimization

DERIVATIVE-FREE OPTIMIZATION ENHANCED-SURROGATE MODEL DEVELOPMENT FOR OPTIMIZATION. Alison Cozad, Nick Sahinidis, David Miller

ALAMO user manual and installation guide v

Linear Model Selection and Regularization. especially usefull in high dimensions p>>100.

Learning surrogate models for simulation-based optimization

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Stochastic Separable Mixed-Integer Nonlinear Programming via Nonconvex Generalized Benders Decomposition

Reduced Order Models for Oxycombustion Boiler Optimization

The ALAMO approach to machine learning

The Efficient Modelling of Steam Utility Systems

2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy

Nonparametric Methods Recap

The AIMMS Outer Approximation Algorithm for MINLP

The AIMMS Outer Approximation Algorithm for MINLP

Webinar. Machine Tool Optimization with ANSYS optislang

Comparison of Some High-Performance MINLP Solvers

New developments in LS-OPT

NUMERICAL INVESTIGATION OF THE FLOW BEHAVIOR INTO THE INLET GUIDE VANE SYSTEM (IGV)

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

ALAMO user manual and installation guide v

Embedding Formulations, Complexity and Representability for Unions of Convex Sets

Optimization and Probabilistic Analysis Using LS-DYNA

What is machine learning?

A Nonlinear Presolve Algorithm in AIMMS

Applying Supervised Learning

Metaheuristic Optimization with Evolver, Genocop and OptQuest

The Automation of the Feature Selection Process. Ronen Meiri & Jacob Zahavi

Introduction to ANSYS DesignXplorer

STA121: Applied Regression Analysis

I How does the formulation (5) serve the purpose of the composite parameterization

Variable Selection 6.783, Biomedical Decision Support

Machine Learning / Jan 27, 2010

Lecture 13: Model selection and regularization

Lecture 7: Support Vector Machine

MINLP applications, part II: Water Network Design and some applications of black-box optimization

Network Traffic Measurements and Analysis

Operations Research and Optimization: A Primer

Speed and Accuracy of CFD: Achieving Both Successfully ANSYS UK S.A.Silvester

Design of a control system model in SimulationX using calibration and optimization. Dynardo GmbH

Flow and Heat Transfer in a Mixing Elbow

Feature Subset Selection for Logistic Regression via Mixed Integer Optimization

ALGEBRAIC MODELS: Algorithms, software,

Compressible Flow in a Nozzle

Computer Experiments. Designs

Review of Mixed-Integer Nonlinear and Generalized Disjunctive Programming Methods

Optimization Methods for Machine Learning (OMML)

Parameter based 3D Optimization of the TU Berlin TurboLab Stator with ANSYS optislang

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

LaGO. Ivo Nowak and Stefan Vigerske. Humboldt-University Berlin, Department of Mathematics

Lasso Regression: Regularization for feature selection

Tutorial 1. Introduction to Using FLUENT: Fluid Flow and Heat Transfer in a Mixing Elbow

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer

Using Machine Learning to Optimize Storage Systems

Lecture on Modeling Tools for Clustering & Regression

Investigation of mixing chamber for experimental FGD reactor

DETERMINISTIC OPERATIONS RESEARCH

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

First Steps - Conjugate Heat Transfer

The exam is closed book, closed notes except your one-page cheat sheet.

Design optimization and design exploration using an open source framework on HPC facilities Presented by: Joel GUERRERO

Accurate Thermo-Fluid Simulation in Real Time Environments. Silvia Poles, Alberto Deponti, EnginSoft S.p.A. Frank Rhodes, Mentor Graphics

Machine Learning Techniques for Data Mining

Multivariate Analysis Multivariate Calibration part 2

CFD Simulation of a dry Scroll Vacuum Pump including Leakage Flows

Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools

Classification and Regression Trees

Index. ,a-possibility efficient solution, programming, 275

Reliability - Based Robust Design Optimization of Centrifugal Pump Impeller for Performance Improvement considering Uncertainties in Design Variable

Isotropic Porous Media Tutorial

Optimization with Multiple Objectives

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Coupled Analysis of FSI

Adjoint Solver Workshop

"optislang inside ANSYS Workbench" efficient, easy, and safe to use Robust Design Optimization (RDO) - Part I: Sensitivity and Optimization

Sensitivity and optimization methods applied to the dynamic fuel cycle

Supervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)

CS273 Midterm Exam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014

Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data

Simulation of Flow Development in a Pipe

Combinatorial optimization and its applications in image Processing. Filip Malmberg

Optimization Techniques for Design Space Exploration

Analysis and optimization methods of graph based meta-models for data flow simulation

ENERGY-224 Reservoir Simulation Project Report. Ala Alzayer

Dr.-Ing. Johannes Will CAD-FEM GmbH/DYNARDO GmbH dynamic software & engineering GmbH

5 Learning hypothesis classes (16 points)

SUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018

Driven Cavity Example

Estimating Data Center Thermal Correlation Indices from Historical Data

Kratos Multi-Physics 3D Fluid Analysis Tutorial. Pooyan Dadvand Jordi Cotela Kratos Team

1.2 Numerical Solutions of Flow Problems

Slides for Data Mining by I. H. Witten and E. Frank

COMPUTATIONAL FLUID DYNAMICS USED IN THE DESIGN OF WATERBLAST TOOLING

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Introduction to Multiobjective Optimization

NEW FEATURES IN CHEMCAD VERSION 6: VERSION RELEASE NOTES. CHEMCAD New Features and Enhancements. CHEMCAD Maintenance

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Linear Methods for Regression and Shrinkage Methods

Modeling Flow Through Porous Media

Transcription:

The ALAMO approach to machine learning: Best subset selection, adaptive sampling, and constrained regression Nick Sahinidis Acknowledgments: Alison Cozad, David Miller, Zach Wilson

MACHINE LEARNING PROBLEM Build a model of output variables as a function of input variables x over a specified interval Process simulation or Experiment Independent variables: Operating conditions, inlet flow properties, unit geometry Dependent variables: Efficiency, outlet flow conditions, conversions, heat flow, etc. 2

SIMULATION OPTIMIZATION Pulverized coal plant Aspen Plus simulation provided by the National Energy Technology Laboratory 3

CHALLENGES No algebraic model Complex process alternatives OPTIMIZER Cost $ 1 reactor 2 reactors 3 reactors Reactor size SIMULATOR Costly simulations minutes seconds hours Scarcity of fully robust simulations X Gradient based methods X Derivative free methods 4

PROCESS DISAGGREGATION Block 1: Simulator Model generation Block 2: Simulator Model generation Block 3: Simulator Model generation Process Simulation Disaggregate process into process blocks Surrogate Models Build simple and accurate models with a functional form tailored for an optimization framework Optimization Model Add algebraic constraints design specs, heat/mass balances, and logic constraints 5

DESIRED MODEL ATTRIBUTES 1. Accurate We want to reflect the true nature of the system 2. Simple Usable for algebraic optimization Interpretable 3. Generated from a minimal data set Reduce experimental and simulation requirements 4. Obeys physics and user insights Increase fidelity and validity in regions with no measurements 6

ALAMO Automated Learning of Algebraic MOdels Start Initial sampling Build surrogate model Update training data set false Adaptive sampling Model converged? true Stop Model error Black-box function New model Current model 7

MODEL COMPLEXITY TRADEOFF Kriging [Krige, 63] Neural nets [McCulloch-Pitts, 43] Radial basis functions [Buhman, 00] Model accuracy Preferred region Linear response surface Model complexity 8

MODEL IDENTIFICATION Identify the functional form and complexity of the surrogate models Seek models that are combinations of basis functions 1. Simple basis functions 2. Radial basis functions for non parametric regression 3. User specified basis functions for tailored regression 9

OVERFITTING AND TRUE ERROR Step 1: Define a large set of potential basis functions Step 2: Model reduction Select subset that optimizes an information criterion Error Ideal Model Complexity True error Empirical error Underfitting Overfitting 10

MODEL REDUCTION TECHNIQUES Qualitative tradeoffs of model reduction methods Stepwise regression [Efroymson, 60] Backward elimination [Oosterhof, 63] Forward selection [Hamaker, 62] Best subset methods Enumerate all possible subsets Regularized regression techniques Penalize the least squares objective using the magnitude of the regressors 11

MODEL SELECTION CRITERIA Balance fit (sum of square errors) with model complexity (number of terms in the model; denoted by p) Corrected Akaike information criterion log 1 2 Mallows Cp 2 Hannan Quinn information criterion log 1 Bayes information criterion Mean squared error 2 1 1 2 log log log 1 Mixed integer nonlinear programming models 12

BRANCH AND REDUCE Constraint propagation and duality based bounds tightening Ryoo and Sahinidis, 1995, 1996 Tawarmalani and Sahinidis, 2004 Finite branching rules Shectman and Sahinidis, 1998 Ahmed, Tawarmalani and Sahinidis, 2004 Convexification Tawarmalani and Sahinidis, 2001, 2002, 2004, 2005 Khajavirad and Sahinidis, 2012, 2013, 2014, 2016 Zorn and Sahinidis, 2013, 2013, 2014 Implemented in BARON First deterministic global optimization solver for NLP and MINLP 13

GLOBAL MINLP SOLVERS ON MINLPLIB2 CAN_SOLVE BARON SCIP ANTIGONE LINDOGLOBAL COUENNE Con: 1893 (1 164,321), Var: 1027 (3 107,223), Disc: 137 (1 31,824) 14

CPU TIME COMPARISON OF METRICS Eight benchmarks from the UCI and CMU data sets Seventy noisy data sets were generated with multicolinearity and increasing problem size (number of bases) Cp BIC AIC, MSE, HQC 100000 CPU time (s) 10000 1000 100 10 1 BIC solves more than two orders of magnitude faster than AIC, MSE and HQC 0.1 0.01 20 30 40 50 60 70 80 Problem Size 15

MODEL QUALITY COMPARISON BIC leads to smaller, more accurate models Larger penalty for model complexity Spurious Variables Included Cp BIC AIC HQC MSE 25 20 15 10 5 0 20 30 40 50 60 70 80 Problem Size % Deviation from True Model 10 9 8 7 6 5 4 3 2 1 0 20 30 40 50 60 70 80 Problem Size 16

ALAMO Automated Learning of Algebraic MOdels Start Initial sampling Build surrogate model Update training data set Adaptive sampling Model error false Model converged? true Stop Error maximization sampling 17

ERROR MAXIMIZATION SAMPLING Search the problem space for areas of model inconsistency or model mismatch Find points that maximize the model error with respect to the independent variables Surrogate model Optimized using derivative free solver SNOBFIT (Huyer and Neumaier, 2008) SNOBFIT outperforms most derivative free solvers (Rios and Sahinidis, 2013) 18

CONSTRAINED REGRESSION Fits the true function better Fits the data better Extrapolation zone Data space Safe extrapolation 19

CONSTRAINED REGRESSION Standard regression Constrained regression easy Surrogate model tough Challenging due to the semi infinite nature of the regression constraints Use intuitive restrictions among predictor and response variables to infer nonintuitive relationships between regression parameters 20

IMPLIED PARAMETER RESTRICTIONS Step 1: Formulate constraint in z and x space Step 2: Identify a sufficient set of β space constraints 1 parametric constraint 4 β constraints Global optimization problems solved with BARON 21

TYPES OF RESTRICTIONS Response bounds Individual responses Multiple responses pressure, temperature, compositions mass and energy balances, physical limitations mass balances, sum toone, state variables Response derivatives Alternative domains Boundary conditions Extrapolation zone velocity profile model monotonicity, numerical properties, convexity Problem Space safe extrapolation, boundary conditions Add no slip 22

CATEGORICAL CORRELATED 23

CATEGORICAL CORRELATED 24

GROUP CONSTRAINTS A collection of basis functions or groups of basis functions is referred to as a group A group is selected iff any of its components is selected Group constraints NMT: no more than a specified number of components is selected from a group ATL: at least a specified number of components is selected from a group REQ: if a group is selected, another group must also be selected XCL: if a group is selected, another group must not be selected 25

ALAMO TUTORIAL AND APPLICATIONS 26 of 76

DOWNLOAD From The Optimization Firm at http://minlp.com ZIP archives of Windows, Linux, and OSX versions Free licenses for academics and CAPD companies 27

WORKING WITH PRE EXISTING DATA SETS 28 of 76

EXAMPLE ALAMO INPUT FILE 128 alternative models 29

ALAMO OUTPUT 30

SIMPLE AND ACCURATE MODELS Modeling type, Median more complexity than required Number of terms in the surrogate model _ Number of terms in the true function Results over a test set of 45 known functions treated as black boxes with bases that are available to all modeling methods. 31

WORKING WITH A SIMULATOR 32 of 76

BUBBLING FLUIDIZED BED ADSORBER Outlet gas Solid feed Cooling water CO 2 rich gas CO 2 rich solid outlet Goal: Optimize a bubbling fluidized bed reactor by Minimizing the increased cost of electricity Maximizing CO 2 removal 33

BUBBLING FLUIDIZED BED ADSORBER Cooling water CO 2 rich solid outlet Generate model of % CO 2 removal: Over the Range: 34

POTENTIAL MODEL 66 basis functions Not tractable in an algebraic superstructure formulation! 35

BUILD MODEL Current model form, model surface, and data points 36

ADAPTIVE SAMPLING New data point(s) shown 37

REBUILD MODEL Previous model in mesh 38

REBUILD MODEL Model error(iterations) and AIC(T) shown 39

ADAPTIVE SAMPLING 40

REBUILD MODEL Sufficiently low error, model converges 41

OPTIMAL PARETO CURVE 42

OPTIMAL PARETO CURVE 43

OPTIMAL PARETO CURVE Surrogate results 44

OPTIMAL PARETO CURVE Surrogate results Simulation results 45

SAMPLING EFFICIENCY 70% of problems solved exactly Fraction of problems solved (0.005, 0.80) 80% of the problems had 0.5% error error maximization sampling ALAMO modeler the lasso Ordinary regression Normalized test error ALAMO modeler Ordinary regression the lasso single Latin hypercube Normalized test error Results with 45 known functions with bases that are available to all modeling methods 46

SUPERSTRUCTURE OPTIMIZATION 47 of 76

CARBON CAPTURE SYSTEM DESIGN Surrogate models for each reactor and technology used Discrete decisions: How many units? Parallel trains? What technology used for each reactor? Continuous decisions: Unit geometries Operating conditions: Vessel temperature and pressure, flow rates, compositions 48

BUBBLING FLUIDIZED BED Bubbling fluidized bed adsorber diagram Outlet gas Solid feed Cooling water CO 2 rich gas Model inputs (16 total) Geometry (3) Operating conditions (5) Gas mole fractions (2) Solid compositions (2) Flow rates (4) CO 2 rich solid outlet Model outputs (14 total) Geometry required (2) Operating condition required (1) Gas mole fractions (3) Solid compositions (3) Flow rates (2) Outlet temperatures (3) Model created by Andrew Lee at the National Energy Technology Laboratory 49

EXAMPLE MODELS ADSORBER Cooling water CO 2 rich gas Solid feed 50

SUPERSTRUCTURE OPTIMIZATION Mixed-integer nonlinear programming model Economic model Process model Material balances Hydrodynamic/Energy balances Reactor surrogate models Link between economic model and process model Binary variable constraints Bounds for variables MINLP solved with BARON 51

CONCLUSIONS ALAMO provides algebraic models that are Accurate Simple Generated from a minimal number of data points ALAMO s constrained regression facility allows modeling of Bounds on response variables Variable groups Forthcoming: constraints on gradient of response variables Built on top of state of the art optimization solvers Available from www.minlp.com 52