Parallel and Distributed Optimization with Gurobi Optimizer


Our Presenter: Dr. Tobias Achterberg, Developer, Gurobi Optimization

Parallel & Distributed Optimization

Terminology for this presentation
Parallel computation: one computer with multiple processor cores, on 1 or more processor sockets. Part of Gurobi throughout our history: MIP branch-and-cut, barrier for LP, QP and SOCP, and concurrent optimization.
Distributed computation: multiple computers, linked via a network. A relatively new feature. Each independent computer can do parallel computation!

Parallel algorithms and hardware
Parallel algorithms must be designed around the hardware: what work should be done in parallel, how much communication is required, and how long that communication will take. Goal: make the best use of available processor cores.

Multi-Core Hardware
[diagram: a multi-core CPU (four cores) attached to the computer's memory; the memory bus is the bottleneck]

Distributed Computing
[diagram: several multi-core computers linked via a network; memory is a bottleneck within each machine, and the network between machines is a huge bottleneck]

How Slow Is Communication?
Memory: 50 GB/s bandwidth, 60 ns latency. Network: 100 MB/s bandwidth, 100 us latency. The network is ~1000x slower than memory. Faster on a supercomputer, but still relatively slow.
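To get a feel for these numbers, a back-of-the-envelope sketch: the bandwidth and latency figures are taken from the slide, while the 100 MB model size (and the simple latency-plus-transfer cost model) is an assumption for illustration only.

```java
public class CommCost {
    // Simple transfer-time model: channel latency plus bytes / bandwidth.
    static double transferTime(double bytes, double bandwidth, double latency) {
        return latency + bytes / bandwidth;
    }

    public static void main(String[] args) {
        double bytes = 100e6; // hypothetical 100 MB model

        // Figures from the slide: memory 50 GB/s at 60 ns,
        // network 100 MB/s at 100 us.
        double mem = transferTime(bytes, 50e9, 60e-9);
        double net = transferTime(bytes, 100e6, 100e-6);

        System.out.printf("memory: %.4f s  network: %.2f s  ratio: %.0fx%n",
                          mem, net, net / mem);
    }
}
```

For a payload this large the bandwidth gap dominates (about 500x here); for small, frequent messages the latency gap (100 us vs 60 ns) is what hurts, which is why distributed algorithms try to communicate rarely and in bulk.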

Distributed Algorithms in Gurobi 6.0
Three distributed algorithms in version 6.0: distributed tuning; distributed concurrent LP (new in 6.0) and MIP; and distributed MIP (new in 6.0).

Distributed Tuning
MIP has lots of parameters, and tuning performs test runs to find better settings. The independent solves are an obvious candidate for parallelism, which makes distributed tuning a clear win: 10x faster on 10 machines. Hard to go back once you have tried it.

Concurrent Optimization

Concurrent Optimization
Run different algorithms/strategies on different machines/cores; the first one that finishes wins. Nearly ideal for distributed optimization: the only communication is sending the model to each machine, and the winner sending its solution back.
Concurrent LP uses different algorithms: primal simplex, dual simplex, and barrier.
Concurrent MIP uses different strategies: by default it varies the seed used to break ties, and it is easy to customize via concurrent environments.

MIPLIB 2010 Test Set
The MIPLIB 2010 test set is a set of 361 mixed-integer programming models, collected by an academic/industrial committee. The MIPLIB 2010 benchmark test set is a subset of the full set: the 87 of the 361 models that were solvable by 2010 codes. (The solvable set now includes 206 of the 361 models.)
Notes: it is definitely not intended as a high-performance computing test set. More than 2/3 of the models solve in less than 100s, 8 models solve at the root node, and ~1/3 solve in fewer than 1000 nodes.

Distributed Concurrent MIP
Results on the MIPLIB benchmark set (>1.00x means concurrent MIP is faster).
4 machines vs 1 machine:
  Runtime   Wins   Losses   Speedup
  >1s       38     20       1.26x
  >100s     17     3        1.50x
16 machines vs 1 machine:
  Runtime   Wins   Losses   Speedup
  >1s       54     19       1.40x
  >100s     26     1        2.00x

Customizing Concurrent
It is easy to choose your own settings. Example with 2 concurrent MIP solves: aggressive cuts on one machine, aggressive heuristics on the second machine. Java example:

GRBEnv env0 = model.getConcurrentEnv(0);
GRBEnv env1 = model.getConcurrentEnv(1);
env0.set(GRB.IntParam.Cuts, 2);
env1.set(GRB.DoubleParam.Heuristics, 0.2);
model.optimize();
model.discardConcurrentEnvs();

Also supported in C++, .NET, Python and C.

Distributed MIP

Distributed MIP Architecture
A manager-worker paradigm.
Manager: sends the model to all workers, tracks the dual bound and worker node counts, rebalances the search tree to put useful load on all workers, and distributes feasible solutions.
Workers: solve MIP nodes, and report status and feasible solutions.
Synchronized deterministically.

Distributed MIP Phases
Racing ramp-up phase: distributed concurrent MIP. Solve the same problem individually on each worker, using different parameter settings. Stop when the problem is solved or enough nodes have been explored, then choose a winner: the worker that made the most progress.
Main phase: discard all worker trees except the winner's, collect the active nodes from the winner and distribute them among the now-idle workers, then periodically synchronize to rebalance the load.
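As a toy illustration (not Gurobi's actual implementation), the two ramp-up decisions described above — pick the worker that made the most progress, then deal its active nodes out across all workers — can be sketched as follows; the method names and the "progress = nodes explored" metric are assumptions for the sketch.

```java
import java.util.*;

public class RampUp {
    // Winner = index of the worker with the most progress
    // (here measured simply as nodes explored).
    static int pickWinner(int[] progress) {
        int best = 0;
        for (int i = 1; i < progress.length; i++)
            if (progress[i] > progress[best]) best = i;
        return best;
    }

    // Deal the winner's active nodes round-robin across all workers,
    // so every worker starts the main phase with useful load.
    static List<List<Integer>> redistribute(List<Integer> activeNodes, int workers) {
        List<List<Integer>> load = new ArrayList<>();
        for (int w = 0; w < workers; w++) load.add(new ArrayList<>());
        for (int i = 0; i < activeNodes.size(); i++)
            load.get(i % workers).add(activeNodes.get(i));
        return load;
    }

    public static void main(String[] args) {
        int winner = pickWinner(new int[]{120, 340, 95, 210}); // worker 1 wins
        List<List<Integer>> load = redistribute(Arrays.asList(10, 11, 12, 13, 14), 4);
        System.out.println("winner=" + winner + " load=" + load);
    }
}
```

A real implementation would weigh nodes by estimated subtree size rather than dealing them round-robin, which is exactly why the periodic rebalancing of the main phase is needed.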

Bad Cases for Distributed MIP
Easy problems: why bother with heavy machinery? Small search trees: nothing to gain from parallelism. Unbalanced search trees: most nodes sent to workers will be solved immediately, and the workers will become idle again.
Example: "neos3" solved with SIP (predecessor of SCIP); Achterberg, Koch, Martin: "Branching Rules Revisited" (2004).

Good Cases for Distributed MIP
Large, well-balanced search trees, where many nodes in the frontier lead to large sub-trees.
Example: "vpm2" solved with SIP (predecessor of SCIP); Achterberg, Koch, Martin: "Branching Rules Revisited" (2004).

Performance

Three Views of 16 Cores
Consider three different tests, all using 16 cores:
1. On a 16-core machine, run the standard parallel code on all 16 cores.
2. On the same 16-core machine, run the distributed code on four 4-core subsets.
3. On four 4-way machines, run the distributed code.
Which gives the best results?

Parallel MIP on 1 Machine
[diagram: one 16-core machine, all cores sharing the machine's memory]

Distributed MIP on 1 Machine
[diagram: one 16-core machine treated as four 4-core machines]

Distributed MIP on 4 Machines
[diagram: four 4-core machines, each with its own memory, connected via a network]

Performance Results
Using one 16-core machine (MIPLIB 2010, baseline is a 4-core run on the same machine):
  Config        >1s     >100s
  One 16-core   1.57x   2.00x
  Four 4-core   1.26x   1.82x
It is better to run the one-machine algorithm on 16 cores than to treat the machine as four 4-core machines, although the degradation isn't large.

Performance Results
Comparing one 16-core machine against four 4-core machines (MIPLIB 2010, baseline is a single-machine, 4-core run):
  Config                 >1s     >100s
  One 16-core machine    1.57x   2.00x
  Four 4-core machines   1.43x   2.09x
Given a choice, the mean speedups are comparable, so other factors decide. Cost: four 4-core machines are much cheaper. Admin: it is more work to administer 4 machines.

Distributed Algorithms in 6.0
MIPLIB 2010 benchmark set, Intel Xeon E3-1240v3 (4-core) CPUs, compared against the 'standard' code on 1 machine:
            ---------- >1s ----------   --------- >100s ---------
  Machines  Wins   Losses   Speedup     Wins   Losses   Speedup
  2         40     16       1.14x       20     7        1.27x
  4         50     17       1.43x       25     2        2.09x
  8         53     19       1.53x       25     2        2.87x
  16        52     25       1.58x       25     3        3.15x

Some Big Wins
Model seymour: a hard set covering model from MIPLIB 2010. 4944 constraints, 1372 (binary) variables, 33K non-zeroes.
  Machines   Nodes       Time (s)   Speedup
  1          476,642     9,267      -
  16         1,314,062   1,015      9.1x
  32         1,321,048   633        14.6x

Some Big Wins
Model a1c1s1: a lot sizing model from MIPLIB 2010. 3312 constraints, 3648 variables (192 binary), 10k non-zeros.
  Machines   Nodes       Time (s)   Speedup
  1          3,510,833   17,299     -
  49         9,761,505   1,299      13.3x

Distributed Concurrent Versus Distributed MIP
MIPLIB 2010 benchmark set (versus a 1-machine run):
>1s:
  Machines   Concurrent   Distributed
  4          1.26x        1.43x
  16         1.40x        1.58x
>100s:
  Machines   Concurrent   Distributed
  4          1.50x        2.09x
  16         2.00x        3.15x

Gurobi Distributed MIP
It makes huge improvements in performance possible, but mean performance improvements are significant rather than huge: some models get big speedups, while many get none. It is much better than distributed concurrent, and as effective as adding more cores to one box. Effectively exploiting parallelism remains a difficult problem, and a focus at Gurobi.

How To Use Distributed Algorithms in Gurobi?

Gurobi Remote Services
Install Gurobi Remote Services on the worker machines. No Gurobi license is required on the workers; each machine simply listens for Distributed Worker requests. Then set a few parameters on the manager: ConcurrentJobs=4 and WorkerPool=machine1,machine2,machine3,machine4. No other code changes are required. The manager must be licensed to use the distributed algorithms; the Gurobi Distributed Add-On enables up to 100 workers.
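The two parameter settings named above can be collected in a Gurobi parameter (.prm) file, one "name value" pair per line, and loaded on the manager; a minimal sketch, where the machine names are placeholders for real worker hostnames:

```
# manager.prm -- hypothetical parameter file for the manager machine
ConcurrentJobs 4
WorkerPool machine1,machine2,machine3,machine4
```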

Integral Part of the Product
The distributed algorithms are built on top of Gurobi Compute Server, with only 1500 lines of C code specific to concurrent/distributed MIP. They are built into the product; no special binaries are involved. Bottom line: changes to the MIP solver automatically apply to the distributed code too, performance gains in regular MIP also benefit distributed MIP, and distributed MIP will evolve with regular MIP.

Licensing
Commercial: not included; you must purchase the distributed option. Ask your sales representative for benchmarks or pricing.
Academic: named-user licenses from the Gurobi website do not include it; site licenses include the distributed parallel algorithms.

Questions
Please use the Question box to ask the speaker questions.