Solving Hard Integer Programs with MW

Jeff Linderoth, ISE Department, COR@L Lab, Lehigh University
jtl3@lehigh.edu
2007 Condor Jamboree, Madison, WI, May 2, 2007
Thanks: NSF OCI-0330607, CMMI-0522796, DOE DE-FG02-05ER25694

Collaborators

- François Margot, Carnegie Mellon
- Greg Thain, UW-Madison

In Our Last Episode... The Design of a Gambling System

Predict the outcome of v soccer matches, each with α = 3 possible results:
  0: Team A wins, 1: Team B wins, 2: Draw.
You win if you miss at most d = 1 games.

The Football Pool Problem: what is the minimum number of tickets you must
buy to guarantee that you hold a winning ticket?

How Many Must I Buy?

Known optimal values:

  v    : 1  2  3  4  5
  C_v  : 1  3  5  9  27

What is C_6? Despite significant effort on this problem for more than
40 years, it is (was) only known that 65 <= C_6 <= 73.
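For context (my addition, a standard counting argument, not from the
slides): each ticket covers itself plus the 2v words that differ from it
in exactly one match, which gives the sphere-covering lower bound

    C_v \ge \frac{3^v}{1 + 2v}, \qquad
    C_6 \ge \left\lceil \frac{729}{13} \right\rceil = 57 .

The large gap between this easy bound of 57 and the hard-won lower bound
of 65 is one symptom of how difficult the problem is.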

But It's Trivial!

There is a simple formulation of the problem as a reasonably-sized
integer program (IP):

- For each word j in W, let x_j = 1 iff word j is in the code C.
- Let A in {0,1}^(|W| x |W|), with a_ij = 1 iff word i in W is within
  distance d = 1 of word j in W.

IP formulation:

  min   e^T x
  s.t.  Ax >= e
        x in {0,1}^|W|
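As a concrete illustration of the formulation (my own sketch, not the
authors' code), the covering matrix A can be built by enumerating all 3^v
words and checking pairwise Hamming distances:

    #include <iostream>
    #include <vector>

    int main() {
        const int v = 3;            // number of matches; |W| = 3^v words
        int W = 1;
        for (int k = 0; k < v; ++k) W *= 3;

        // digits[w] holds the base-3 expansion of word w
        std::vector<std::vector<int> > digits(W, std::vector<int>(v));
        for (int w = 0; w < W; ++w) {
            int t = w;
            for (int k = 0; k < v; ++k) { digits[w][k] = t % 3; t /= 3; }
        }

        // a[i][j] = 1 iff words i and j differ in at most one match
        std::vector<std::vector<int> > a(W, std::vector<int>(W, 0));
        for (int i = 0; i < W; ++i) {
            for (int j = 0; j < W; ++j) {
                int dist = 0;
                for (int k = 0; k < v; ++k)
                    if (digits[i][k] != digits[j][k]) ++dist;
                a[i][j] = (dist <= 1) ? 1 : 0;
            }
        }

        // sanity check: every row sums to 1 + 2v (self plus 2v neighbors)
        int rowsum = 0;
        for (int j = 0; j < W; ++j) rowsum += a[0][j];
        std::cout << "row sum = " << rowsum
                  << " (expect " << 1 + 2 * v << ")\n";
        return 0;
    }

Feeding this matrix to any IP solver reproduces the small C_v values in
the table above; the trouble at v = 6 is not the size of the IP but its
symmetry, as the next slides explain.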

Football Pool Covering Matrix

[Figure: the football pool covering matrix.]

Beautiful But Deadly

The workhorse algorithm is a tree-search procedure known as branch and
bound. CPLEX, a commercial IP package that is putting integer programmers
out of business, routinely solves 0-1 integer programs with (tens of)
thousands of variables and constraints.

Theorem: Pretty Matrices Make Hard IPs. Many branches must be done to
remove the symmetry: for the football pool IP, permuting the v match
positions or relabeling the three outcomes of any match maps covers to
covers, so the search tree is full of equivalent subproblems. Recognizing
symmetry, and designing algorithms to exploit it, are fundamentally
important.

Grid Programmers Do It In Parallel

Nodes in disjoint subtrees can be evaluated independently, but this is
not an embarrassingly ("pleasantly") parallel operation. We use the
master-worker parallelization scheme.

Use Master-Worker!

[Cartoon slide: a "Mission Accomplished" joke; the overlaid caption text
is garbled in this transcription.]

Master: sends a task (node) to each worker.
Worker: evaluates the node and sends the result back to the master.

MW

Master-Worker is a flexible, powerful framework for Grid computing:

- It's easy to be fault tolerant.
- It's easy to take advantage of machines whenever they are available.
- You can be flexible and adaptive in your approach to computing.
- Everyone can have access to a powerful computing platform.

MW: We're Here to Help! MW is a C++ software package that encapsulates
the abstractions of the master-worker paradigm. It allows users to easily
build master-worker type computations running on Condor-provided
computational grids. It's free, like free beer:
http://www.cs.wisc.edu/condor/mw

MW Classes

MWMaster:
  get_userinfo()
  setup_initial_tasks()
  pack_worker_init_data()
  act_on_completed_task()

MWTask:
  (un)pack_work()
  (un)pack_result()

MWWorker:
  unpack_worker_init_data()
  execute_task()

We're here to help! Please contact Greg or myself if you need help
getting set up with MW.
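A skeletal view of how an application plugs into these classes (a sketch
only: the class and method names follow the slide above, but the headers,
return types, and exact signatures are assumptions to be checked against
the MW User's Guide):

    // (headers omitted: the MW distribution defines the MWMaster,
    //  MWTask, and MWWorker base classes used below)

    // The task carries one branch-and-bound node across the wire.
    class NodeTask : public MWTask {
      public:
        void pack_work();      // serialize the node for a worker
        void unpack_work();
        void pack_result();    // serialize the bound/solution found
        void unpack_result();
    };

    // The master owns the tree: it hands out nodes and acts on results.
    class NodeMaster : public MWMaster {
      public:
        MWReturn get_userinfo(int argc, char *argv[]);    // parameters
        MWReturn setup_initial_tasks(int *n, MWTask ***t); // root node
        MWReturn pack_worker_init_data();  // ship static problem data
        MWReturn act_on_completed_task(MWTask *t); // fathom or branch
    };

    // The worker evaluates a node (e.g., solves its LP relaxation).
    class NodeWorker : public MWWorker {
      public:
        MWReturn unpack_worker_init_data(); // receive problem data
        void execute_task(MWTask *t);       // evaluate the node
    };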

Condor + MW: Cobbling Together Resources

1. Condor flocking: jobs submitted to the local pool run in remote pools.
2. Condor glide-in (or manual "hobble-in"): batch-scheduled resources
   join an existing Condor pool.
3. Remote submit: log in and submit worker executables remotely; can use
   port forwarding for hard-to-reach private networks (a minimal
   submit-file sketch follows below).
4. Schedd-on-the-side: a new Condor technology that takes idle jobs out
   of the local Condor queue, translates them into Grid jobs, and uses
   Condor-G to submit them to a remote Grid queue. Perfect for OSG!
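For the remote-submit route, workers enter the pool through an ordinary
Condor submit file. A minimal sketch (the executable name, arguments, and
requirements line are illustrative assumptions, not taken from the talk):

    # submit 100 MW worker executables into the pool
    universe     = vanilla
    executable   = mw_worker
    arguments    = -masterhost master.example.edu -port 12345
    output       = worker_$(Process).out
    error        = worker_$(Process).err
    log          = workers.log
    requirements = (OpSys == "LINUX")
    queue 100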

Déjà Vu

In 2006, we had almost established a lower bound of 70, i.e.,
70 <= C_6 <= 73.

Statistics so far:

  Wall time             47.15 days
  CPU time              57.31 years
  Avg workers           467.3
  Max workers           1253
  Total nodes           8.37 × 10^8
  Total LP pivots       6.53 × 10^11
  Parallel performance  94.9%

Mission Accomplished

We were able to establish a lower bound of 70: 70 <= C_6 <= 73.

Computational statistics:

  Wall time             72.3 days
  CPU time              110.1 years
  Avg workers           555.8
  Max workers           2038
  # different workers   4266
  Total nodes           2.85 × 10^9
  Total LP pivots       2.65 × 10^12
  Parallel performance  90.3%

Our Continuing Mission

I couldn't think of a good analogy, but it has been 312 days since we
declared "mission accomplished": 70 <= C_6 <= 73.

But we press on. Can we improve the lower bound even more? We've ramped
up the scale of resources available.

Resources Used in Computation

  Site                 Access Method        Arch/OS        Machines
  Wisconsin - CS       Flocking             x86_32/Linux   975
  Wisconsin - CS       Flocking             Windows        126
  Wisconsin - CAE      Remote submit        x86_32/Linux   89
  Wisconsin - CAE      Remote submit        Windows        936
  Lehigh - COR@L Lab   Flocking             x86_32/Linux   57
  Lehigh - Campus      Remote submit        Windows        803
  Lehigh - Beowulf     ssh + remote submit  x86_32         184
  Lehigh - Beowulf     ssh + remote submit  x86_64         120
  TG - NCSA            Flocking             x86_32/Linux   494
  TG - NCSA            Flocking             x86_64/Linux   406
  TG - NCSA            Hobble-in            ia64/Linux     1732
  TG - ANL/UC          Hobble-in            ia32/Linux     192
  TG - ANL/UC          Hobble-in            ia64/Linux     128
  TG - TACC            Hobble-in            x86_64/Linux   5100
  TG - SDSC            Hobble-in            ia64/Linux     524
  TG - Purdue          Remote submit        x86_32/Linux   1099
  TG - Purdue          Remote submit        x86_64/Linux   1529
  TG - Purdue          Remote submit        Windows        1460

OSG Resources Used in Computation

  Site             Access Method       Arch/OS        Machines
  OSG - Wisconsin  Schedd-on-the-side  x86_32/Linux   1000
  OSG - Nebraska   Schedd-on-the-side  x86_32/Linux   200
  OSG - Caltech    Schedd-on-the-side  x86_32/Linux   500
  OSG - Arkansas   Schedd-on-the-side  x86_32/Linux   8
  OSG - BNL        Schedd-on-the-side  x86_32/Linux   250
  OSG - MIT        Schedd-on-the-side  x86_32/Linux   200
  OSG - Purdue     Schedd-on-the-side  x86_32/Linux   500
  OSG - Florida    Schedd-on-the-side  x86_32/Linux   100

  OSG total: 2758. Grand total: 19,012 machines.

Mission Accomplished-er

We have been able to establish 71 <= C_6 <= 73.

Computational statistics:

  Avg workers          562.4
  Max workers          1775
  Worker time (years)  30.3
  Wall time (days)     19.7
  Nodes                1.89 × 10^8
  LP pivots            1.82 × 10^11

Mission Accomplished-est: working on 72 <= C_6 <= 73 brings the total to
more than 200 CPU-years!

[Figure: number of processors over time (one slice) during the M = 71
computation.]

Working Paper: Greatest Title Ever!

With thanks to Greg Thain for the title (among other things):
"The Tera-Gridiron: A Natural Turf for High-Throughput Computing."
Submitted to TeraGrid 07. Rejected. Perhaps the program committee wishes
to wait until the problem is finally solved. I have one thing to say to
them...

You Can't Legislate a Timeline for Completion!

Stupid Tax-o-crats! Having had my fragile ego crushed, we may really
declare "mission accomplished" and turn to the solution of other
important, pretty integer programs. Our approach this time makes use of
the new MWBlackBox framework.

Pretty IPs

- Steiner triples
- Covering designs
- Error-correcting codes

MWBlackBox = Dynamic DAGMan

Each of the nodal subproblems is in fact a smaller version of the
original problem (with some variables fixed or removed). Can we use the
exact same (blackbox) software that is used to solve the full-scale
problem, and have the user write no code for the Task or Worker classes?
Many people would like this functionality. Maybe you would too.

Condor DAGMan:
- Designed for static job dependencies (see the sketch below)
- Simple, robust, reliable
- Have to pay the Condor executable startup overhead for every job

MW BlackBox:
- Designed for dynamic job dependencies
- C++ coding must be done
- Less battle-tested
- Amortizes the Condor executable startup overhead
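For contrast, a static DAGMan workflow is declared up front in a .dag
file; every node and dependency must be known before submission (a
generic example, not tied to this application):

    # diamond.dag: the whole dependency graph is fixed at submit time
    JOB  root   root.sub
    JOB  left   left.sub
    JOB  right  right.sub
    JOB  merge  merge.sub
    PARENT root CHILD left right
    PARENT left right CHILD merge

MWBlackBox, by comparison, decides what the "children" of a completed
task are only after parsing its output.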

BlackBox Snippets

    MWTask_blackbox(const list<string> &args,
                    const list<string> &input_files,
                    const list<string> &output_files);

Creating blackbox tasks:

1. List of arguments to the executable
2. List of input files (shipped to the worker executing the task)
3. List of output files (automatically shipped back to the master)
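Creating one such task might look like this (a hypothetical example: the
argument strings and file names are made up, and how the solver consumes
them is not specified in the talk):

    #include <list>
    #include <string>

    MWTask_blackbox *make_subproblem_task() {
        std::list<std::string> args;         // passed to the executable
        args.push_back("subprob17.lp");

        std::list<std::string> input_files;  // shipped to the worker
        input_files.push_back("subprob17.lp");

        std::list<std::string> output_files; // shipped back to master
        output_files.push_back("subprob17.sol");

        return new MWTask_blackbox(args, input_files, output_files);
    }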

MWBlackBox Snippets

    BlackboxIPMaster::get_userinfo(int argc, char *argv[])
    {
        // Set target number of workers
        RMC->set_target_num_workers(512);

        // Set name of blackbox executable
        set_executable(string("cplex"));

        // Can also stage files on the workers
        add_staged_file(param_.getProblemName());
    }

More MWBlackBox Snippets

    BlackboxIPMaster::act_on_completed_task(MWBlackboxTask *bbt)
    {
        // Can get a stream for the task's standard output...
        ifstream &stdoutfile = bbt->getStdOutStream();

        // ...or open a stream on one of the output files
        string ofname = bbt->getOutputFiles()[0];
        ifstream ofile(ofname.c_str());

        // Then parse the output stream to either
        //  1) collect statistics, or
        //  2) create new tasks
    }
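The "create new tasks" branch might be fleshed out as follows (a sketch
under stated assumptions: parse_bound(), make_child_task(), and addTask()
are hypothetical stand-ins for the user's output parser and MW's
task-insertion interface; consult the MW User's Guide for the real
calls):

    // Hypothetical continuation of act_on_completed_task():
    double bound = parse_bound(ofile);      // user-written parser (assumed)
    if (bound < incumbent_value_) {
        // Subproblem not fathomed: branch, queuing two child tasks whose
        // input files fix a chosen variable up and down respectively.
        addTask(make_child_task(bbt, /*fix_up=*/true));   // assumed call
        addTask(make_child_task(bbt, /*fix_up=*/false));  // assumed call
    }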

Conclusions

- The Football Pool Problem is hard! But now 71 <= C_6 <= 73.
- The burning question: will Miron let me speak about 72 <= C_6 <= 73
  next year?
- Most important: real large-scale applications can take advantage of all
  the computing power that's out there: local grids, Tera-grids, Open
  Science Grids. MW (via Condor) can help pull all of this together.
- MWBlackBox: write no code for worker and task. Why not try it out?

Any Questions?

www.cs.wisc.edu/condor/mw

Please talk to Greg or me; we'd be happy to help you get started with MW.

jtl3@lehigh.edu
gthain@cs.wisc.edu
mw@cs.wisc.edu