High performance computing and the simplex method

Julian Hall, Qi Huangfu and Edmund Smith, School of Mathematics, University of Edinburgh. 12th April 2011

The simplex method is for LP... not nonlinear programming, integer programming or stochastic programming... but methods for all three depend on it!

Overview
- LP problems and the simplex method
- Three approaches to exploiting HPC
- Conclusions

Linear programming (LP)

  minimize $f = c^T x$ subject to $Ax = b$, $x \ge 0$

- Fundamental model in optimal decision-making
- Solution techniques: the simplex method (1947) and interior point methods (1984-date)
- Large problems have $10^3$-$10^7$ variables and $10^3$-$10^7$ constraints
- The matrix $A$ is (usually) sparse: STAIR, for example, has 356 rows, 467 columns and 3856 nonzeros
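As a hedged aside (not from the slides), the standard form above maps directly onto SciPy's `linprog`; the data below are made-up toy values and the snippet assumes SciPy >= 1.6, where the default "highs" method is available.

```python
# Minimal sketch (toy data): an LP in the standard form
# min c^T x  s.t.  Ax = b, x >= 0, solved with SciPy.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])            # objective coefficients
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.5, 0.0]])          # constraint matrix (usually sparse)
b = np.array([4.0, 3.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)                    # optimal vertex and objective value
```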

Mathematics of LP

  minimize $f = c^T x$ subject to $Ax = b$, $x \ge 0$   (P)

Geometry: feasible points form a convex polyhedron.

Results:
- An optimal solution occurs at a vertex.
- At a vertex the variable set can be partitioned as $\mathcal{B} \cup \mathcal{N}$, and the constraints as $B x_B + N x_N = b$, so that $B$ is nonsingular and $x_N = 0$.

Dual LP problem:

  maximize $f = b^T y$ subject to $A^T y + s = c$, $s \ge 0$   (D)

Result: the optimal partition $\mathcal{B} \cup \mathcal{N}$ for (P) also solves (D).

The reduced LP problem

At a vertex, for a partition $\mathcal{B} \cup \mathcal{N}$ with $B$ nonsingular and $x_N = 0$, the original problem is

  minimize $f = c_N^T x_N + c_B^T x_B$ subject to $N x_N + B x_B = b$, $x_N \ge 0$, $x_B \ge 0$.

Eliminating $x_B$ from the objective gives the reduced LP problem

  minimize $f = s_N^T x_N + \hat{f}$ subject to $\hat{N} x_N + I x_B = \hat{b}$, $x_N \ge 0$, $x_B \ge 0$,

where $\hat{b} = B^{-1} b$, $\hat{N} = B^{-1} N$, $\hat{f} = c_B^T \hat{b}$ and $s_N$ is given by $s_N^T = c_N^T - c_B^T \hat{N}$.

The vertex is optimal $\iff$ $x_B \ge 0$ and $s_N \ge 0$.
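A minimal numerical sketch of these quantities (made-up data, not from the slides): it forms $\hat{b}$, $\hat{N}$ and $s_N$ exactly as defined above and tests the vertex optimality condition.

```python
# Given a basis partition B|N, form b_hat = B^{-1} b and the reduced costs
# s_N^T = c_N^T - c_B^T B^{-1} N, then test optimality (toy data assumed).
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 0.5, 0.0, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([1.0, 2.0, 0.0, 0.0])

basic, nonbasic = [2, 3], [0, 1]         # a candidate basis partition
B, N = A[:, basic], A[:, nonbasic]

b_hat = np.linalg.solve(B, b)            # x_B at the vertex (x_N = 0)
N_hat = np.linalg.solve(B, N)            # B^{-1} N
s_N = c[nonbasic] - c[basic] @ N_hat     # reduced costs

optimal = (b_hat >= 0).all() and (s_N >= 0).all()
print(b_hat, s_N, optimal)
```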

Primal vs dual simplex

Finding an optimal partition $\mathcal{B} \cup \mathcal{N}$ underpins the simplex method.

Primal simplex method:
- Maintains $x_B \ge 0$
- Moves along edges of the feasible region of (P)
- Terminates when $s_N \ge 0$

Dual simplex method:
- Maintains $s_N \ge 0$
- Moves along edges of the feasible region of (D)
- Terminates when $x_B \ge 0$

Adaptations of both are required to find an initial feasible point.

Summary: major computational components for simplex implementations

Standard simplex method (SSM):
- Update tableau: $\hat{N} := \hat{N} - (1/\hat{a}_{pq}) \hat{a}_q \hat{a}_p^T$

Revised simplex method (RSM) operations:
- Form $\pi_p^T = e_p^T B^{-1}$
- Form $\hat{a}_p^T = \pi_p^T N$
- Form $\hat{a}_q = B^{-1} a_q$
- Inversion of $B$

Distinctive features:
- Vectors $e_p$, $a_q$ are always sparse
- $B$ may be highly reducible
- $B^{-1}$ may be sparse
- Vectors $\pi_p$, $\hat{a}_p$ and $\hat{a}_q$ may be sparse

Efficient implementations must exploit these features (a sketch of the RSM operations follows below). H and McKinnon (1998-2005)
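A hedged sketch of the RSM operations listed above, with SciPy's sparse LU of $B$ standing in for a production-quality basis factorisation (toy matrices assumed; BTRAN/PRICE/FTRAN are the conventional names for these steps).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

B = sp.csc_matrix([[2.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 3.0]])
N = sp.csc_matrix([[1.0, 0.0],
                   [4.0, 1.0],
                   [0.0, 2.0]])
lu = splu(B)                              # "inversion" of B

p, q = 1, 0                               # pivotal row and entering column
e_p = np.zeros(3); e_p[p] = 1.0

pi_p = lu.solve(e_p, trans='T')           # BTRAN: pi_p^T = e_p^T B^{-1}
a_p_hat = N.T @ pi_p                      # PRICE: a_p_hat^T = pi_p^T N
a_q_hat = lu.solve(N[:, q].toarray().ravel())  # FTRAN: a_q_hat = B^{-1} a_q
print(pi_p, a_p_hat, a_q_hat)
```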

Why use the simplex method?
- Hot start makes it generally more efficient for families of LPs.
- It can be better than barrier for some individual LPs.

Why is the dual simplex method preferred?
- It is easier to find a feasible point of (D) from which to start.
- It has some efficient algorithmic tricks not available to the primal.
- Dual feasibility is retained when constraints are added (MIP).

Evidence?

CPLEX LP solvers applied to standard test problems:
- Dual simplex is better than primal.
- There is little to choose between dual simplex and barrier with crossover.

Parallel simplex: why?
- Moore's law drives core counts per processor, but clock speeds will stabilise.
- Serial performance of the simplex method is spectacularly good: the flop count per iteration is near optimal, and the number of iterations is near optimal.
- We can't wait for faster serial processors or algorithmic improvement.
- The simplex method must try to exploit parallelism.

Parallel simplex: immediate scope

Standard simplex method:
- Update tableau: $\hat{N} := \hat{N} - (1/\hat{a}_{pq}) \hat{a}_q \hat{a}_p^T$
- Level 2 BLAS with $\hat{N}$ dense, so massively data parallel (see the sketch below).

Revised simplex method:
- Operations $\pi_p^T = e_p^T B^{-1}$ and $\hat{a}_q = B^{-1} a_q$ are inherently serial.
- Operation $\hat{a}_p^T = \pi_p^T N$ is massively data parallel.
- Amdahl's law implies little immediate scope for exploiting data parallelism.
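A sketch of that rank-1 tableau update, with NumPy standing in for GPU BLAS (dimensions made up, pivot assumed safely nonzero): every entry of $\hat{N}$ updates independently, which is the data parallelism being referred to.

```python
import numpy as np

rng = np.random.default_rng(0)
N_hat = rng.random((1000, 5000))         # dense tableau
p, q = 3, 7                              # pivotal row and column
a_q = N_hat[:, q].copy()                 # pivotal column of the tableau
a_p = N_hat[p, :].copy()                 # pivotal row of the tableau

# Rank-1 update N_hat := N_hat - (1/a_pq) a_q a_p^T: every entry is
# updated independently of all the others.
N_hat -= np.outer(a_q / a_q[p], a_p)
```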

Parallel simplex: past work

Data parallel standard simplex method:
- Good parallel efficiency was achieved.
- Totally uncompetitive with the serial revised simplex method without prohibitive resources.

Data parallel revised simplex method:
- The only immediate parallelism is in forming $\pi_p^T N$.
- When $n \gg m$, the cost of $\pi_p^T N$ dominates: significant speed-up was achieved. Bixby and Martin (2000)

Task parallel revised simplex method:
- Overlap computational components for different iterations. Wunderling (1996), H and McKinnon (1995-2005)
- Modest speed-up was achieved on general sparse LP problems.

Review: H (2010)

Architectures: CPU or GPU or both?

Heterogeneous desk-top architectures.

CPU:
- Fewer, faster cores
- Relatively slow memory transfer
- Welcomes algorithmically complex code
- Full range of development tools

GPU:
- More, slower cores
- Relatively fast memory transfer
- Global communication is expensive/difficult
- Very limited development tools

CPU and GPU:
- Possibly combine CPU and GPU to harness full computing power
- Relatively slow memory transfer between CPU and GPU

Parallel simplex: three current approaches
- Data parallel standard simplex method: on a GPU (H and Smith)
- Data parallel revised simplex method: exploit block-angular structure on a CPU (H and Smith)
- Task parallel revised simplex method: a novel algorithmic variant of the dual revised simplex method (H and Huangfu)

Data parallel standard simplex on a GPU
- Implemented on a dual quad-core AMD Opteron 2378 system as "i6" and on an NVIDIA GTX285 GPU as "i8". Smith (2009-10)
- Best results are for dense LP problems:

Solver         Type  HPC       Time  Iterations  Speed (iter/s)
gurobi primal  RSM   serial    1357  16034       12
gurobi dual    RSM   serial    976   14518       15
i6 primal      SSM   parallel  4039  288419      79
i8 primal      SSM   GPU       800   221157      276

- May be of value for large dense LP problems (sparse reconstruction?)
- No hope of beating serial solvers on sparse LP problems
- Now running with steepest edge and double precision on a Tesla C2070

Data parallel revised simplex for block angular LP (BALP) problems

  minimize $f = c^T x$ subject to $Ax = b$, $x \ge 0$, where

$$A = \begin{bmatrix} A_{00} & A_{01} & A_{02} & \cdots & A_{0r} \\ & A_{11} \\ & & A_{22} \\ & & & \ddots \\ & & & & A_{rr} \end{bmatrix}$$

Structure:
- The linking rows are $[\,A_{00} \; A_{01} \; A_{02} \; \cdots \; A_{0r}\,]$.
- The master columns are $\begin{bmatrix} A_{00} \\ 0 \end{bmatrix}$.
- The diagonal blocks are $A_{11}, \ldots, A_{rr}$.

Origin:
- Occur naturally in (eg) decentralised planning and multicommodity flow.
- BALP structure can be identified in general sparse LPs.

Data parallel revised simplex for BALP problems: technique

Matrices $B$ and $N$ in the revised simplex method inherit the structure of $A$:

$$B = \begin{bmatrix} B_{00} & B_{01} & \cdots & B_{0r} \\ & B_{11} \\ & & \ddots \\ & & & B_{rr} \end{bmatrix} \qquad N = \begin{bmatrix} N_{00} & N_{01} & \cdots & N_{0r} \\ & N_{11} \\ & & \ddots \\ & & & N_{rr} \end{bmatrix}$$

Operations with $B$ and $N$ can exploit structure (eg) Lasdon (1970):
- Inverting $B = \begin{bmatrix} S & C \\ R & T \end{bmatrix}$ using the Schur complement $W = S - C T^{-1} R$: exploit the block-diagonal structure of $T$.
- Operating with $B^{-1}$: exploit the block-diagonal structure of $T$.
- Operating with $N$: exploit the block-angular structure of $N$.

A block-solve sketch follows below.
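A hedged sketch of the Schur-complement block solve with toy data: solving $Bz = r$ for $B = \begin{bmatrix} S & C \\ R & T \end{bmatrix}$ needs only solves with the block-diagonal $T$, which can proceed block by block in parallel, plus one small solve with $W$.

```python
import numpy as np
from scipy.linalg import block_diag

S = np.array([[4.0]])                    # linking-row block
C = np.array([[1.0, 0.0, 2.0, 0.0]])
R = np.array([[1.0], [0.0], [0.0], [3.0]])
T = block_diag(np.array([[2.0, 1.0], [0.0, 1.0]]),
               np.array([[3.0, 0.0], [1.0, 1.0]]))  # diagonal blocks
r1, r2 = np.array([5.0]), np.array([1.0, 2.0, 3.0, 4.0])

Tinv_R = np.linalg.solve(T, R)           # each diagonal block in parallel
Tinv_r2 = np.linalg.solve(T, r2)
W = S - C @ Tinv_R                       # small Schur complement
z1 = np.linalg.solve(W, r1 - C @ Tinv_r2)
z2 = Tinv_r2 - Tinv_R @ z1               # back-substitute for the block part
print(z1, z2)
```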

Data parallel revised simplex for BALP problems: results
- Implemented on a dual quad-core AMD Opteron 2378 system by Smith as "i7"
- Base code is a highly efficient LP solver
- Using 8 diagonal blocks and (up to) 8 cores

Problem      Rows    Columns  Best speedup
cre-b        9648    72447    1.1
stocfor3     16688   15708    1.1
pds-20       33874   105278   1.4
stormg2-125  66186   529317   1.0
deteq27      68672   186928   1.1
ken-18       105127  154699   1.1
pds-80       129181  426278   1.2

Performance is not great: the code is memory bound, and use of the Schur complement is costly.

Task parallel dual revised simplex: technique

Perform multiple pricing standard simplex suboptimization:
- Primal: Orchard-Hays (1968)
- Dual: Rosander (1975)

Algorithmically:
- Primal: identify an attractive column slice of the tableau.
- Dual: identify an attractive row slice of the tableau.
- Both perform standard simplex iterations to identify a set of basis changes.

Computationally (see the sketch below):
- Solve systems with multiple RHS
- Update tableaux
- Form matrix products with multiple vectors
- Attractive in the days when memory access was expensive...

Primal: parallel implementations by Wunderling (1996), H and McKinnon (1995-2005).
Dual: new, even in serial.
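The "multiple RHS" pattern is the computational heart of suboptimization; a toy sketch (assumed data, not the authors' code) of applying one factorised basis to a whole block of candidate columns:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 200
# Keep B comfortably nonsingular (diagonally dominant) for this toy example.
B = sp.csc_matrix(sp.eye(n) + 0.01 * sp.random(n, n, density=0.05,
                                               random_state=1))
A_Q = np.random.default_rng(1).random((n, 8))   # eight candidate columns

lu = splu(B)                     # factorise once
A_Q_hat = lu.solve(A_Q)          # B^{-1} A_Q: one pass over the factors
```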

Task parallel dual revised simplex: results
- Written by Huangfu (2010)
- Uses (generally) highly efficient core routines
- Tested on a dual quad-core AMD Opteron 2378 system
- One pivotal row per core used

Task parallel dual revised simplex: preliminary results

pds-06: 9882 rows, 28655 columns and 82269 nonzeros

Cores              1      2      4      8      clp dual
Major iterations   10266  5049   2543   1253   9808
Total iterations   10266  9625   8820   7616   9808
Solution time (s)  3.76   2.51   2.00   1.52   1.92
Speed (iter/s)     2730   3836   4419   5017   5111

A speed-up of 2.5 leads to it out-performing clp.

Task parallel dual revised simplex: preliminary results

pds-10: 16559 rows, 48763 columns and 140063 nonzeros

Cores              1      2      4      8      clp dual
Major iterations   17983  9158   4807   2557   17713
Total iterations   17983  17051  16263  15404  17713
Solution time (s)  12.58  10.01  6.86   5.72   6.61
Speed (iter/s)     1430   1704   2370   2695   2682

A speed-up of 2.2 leads to it out-performing clp. The underlying serial solver is now competitive with clp; further results for the parallel implementation are awaited.

Exploiting both CPU and GPU
- Heterogeneous computing offers many new challenges.
- The computational scheme must limit memory transfer between the CPU and GPU.
- Initial experiments with the GPU for $A \Theta A^T x$ in a CPU-based matrix-free IPM. Gondzio, H and Smith (2011)
- The outcome will inform the planned CPU+GPU implementation of dual simplex with suboptimization (a sketch of the matrix-free product follows below).
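For illustration only (toy data, not the experiments cited): the matrix-free product $A \Theta A^T x$ needs just two sparse matrix-vector products and a diagonal scaling, so $A \Theta A^T$ is never formed, which is exactly the kind of kernel that maps well onto a GPU.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(500, 2000, density=0.01, format='csr', random_state=0)
theta = rng.random(2000) + 0.1           # positive IPM scaling factors
x = rng.random(500)

y = A @ (theta * (A.T @ x))              # A Theta A^T x, matrix-free
```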

Conclusions
- Identified the need for the simplex method to exploit parallelism.
- Developed prototype high performance simplex solvers:
  - The standard simplex GPU solver may be valuable.
  - The BALP simplex solver is of little value.
  - The dual simplex solver promises to be valuable.
- Scope for a combined CPU+GPU solver should be explored.

References

[1] R. E. Bixby and A. Martin. Parallelizing the dual simplex method. INFORMS Journal on Computing, 12:45-56, 2000.

[2] J. A. J. Hall. Towards a practical parallelisation of the simplex method. Computational Management Science, 7(2):139-170, 2010.

[3] J. A. J. Hall and K. I. M. McKinnon. PARSMI, a parallel revised simplex algorithm incorporating minor iterations and Devex pricing. In J. Waśniewski, J. Dongarra, K. Madsen, and D. Olesen, editors, Applied Parallel Computing, volume 1184 of Lecture Notes in Computer Science, pages 67-76. Springer, 1996.

[4] J. A. J. Hall and K. I. M. McKinnon. ASYNPLEX, an asynchronous parallel revised simplex algorithm. Annals of Operations Research, 81:27-49, 1998.

[5] J. A. J. Hall and K. I. M. McKinnon. Hyper-sparsity in the revised simplex method and how to exploit it. Computational Optimization and Applications, 32(3):259-283, December 2005.

[6] W. Orchard-Hays. Advanced Linear Programming Computing Techniques. McGraw-Hill, New York, 1968.

[7] R. R. Rosander. Multiple pricing and suboptimization in dual linear programming algorithms. Mathematical Programming Study, 4:108-117, 1975.

[8] R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure. On the limits of GPU acceleration. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar), 2010.

[9] R. Wunderling. Paralleler und objektorientierter Simplex (Parallel and object-oriented simplex). Technical Report TR-96-09, Konrad-Zuse-Zentrum für Informationstechnik Berlin, 1996.