Highlight: A ranking and selection problem with 1,016,127 systems, requiring 95% probability of correct selection, solved in less than 40 minutes on 600 parallel cores, with near-linear scaling and high utilization of the computing budget.
[Figure: wallclock time (sec) versus number of cores (60 to 960), comparing actual performance against perfect scaling.]

Ranking and Selection in a High Performance Computing Environment
Eric Cao Ni, Susan R. Hunter, Shane G. Henderson
School of Operations Research and Information Engineering, Cornell University; School of Industrial Engineering, Purdue University
Supported by NSF grant CMMI-1200315 and the Extreme Science and Engineering Discovery Environment (XSEDE), NSF grant OCI-1053575.
Winter Simulation Conference, Washington, DC, December 9, 2013

Outline: 1 Introduction; 2 Considerations for parallel procedures; 3 The Algorithm; 4 Summary

1 Introduction

Parallelism in simulation. Two main research areas relate to exploiting parallelism in discrete-event simulation:
- Many processors on a single replication (Fujimoto, 2000)
- Many processors on independent replications (Heidelberger, 1988; Glynn and Heidelberger, 1990, 1991)

Ranking and Selection (R&S): maximize y(i) = E[Y(i; ξ)] over i ∈ S.
- Optimize a function through a stochastic simulation; the function is evaluated with error.
- The feasible region is finite: K = |S| < ∞. No assumption is made on the topology of S.
- Want to find the best system i* ∈ S with a certain degree of statistical confidence:
  P[select system j : y(j) ≥ y(i) − δ for all i ∈ S] ≥ 1 − α.

Parallelism in simulation optimization. Many existing simulation-optimization (in particular, R&S) algorithms are sequential in nature (Paulson, 1964; Fabian, 1974; Kim and Nelson, 2001, 2006; Hong, 2006). Past studies on parallel ranking-and-selection procedures include Chen (2005) and Luo and Hong (2011), and Luo et al. (2013) proved an asymptotically valid parallel ranking-and-selection procedure. We propose a parallel algorithm for R&S that is:
- Valid: maintains a required probability of correct selection
- Efficient: speeds up as more cores are employed

Assumptions on simulation output. Let Y_ijk be the output of the kth replication of simulating system i on core j, where 1 ≤ i ≤ |S|, 1 ≤ j ≤ W, and k = 1, 2, ..., and let T_ijk be its (random) completion time.
- The cores produce i.i.d. replicates of (Y_i, T_i) for each i ∈ S.
- Y_i is marginally normally distributed with finite mean µ_i and finite (possibly unknown) variance σ_i².
- E[T_i] < ∞ for all i ∈ S.
- {Y_i : 1 ≤ i ≤ |S|} can be correlated, making it possible to use Common Random Numbers (CRN).

The computing environment:
(1) A pre-specified, fixed number of cores are always available and do not fail or suddenly become unavailable;
(2) the cores are identical and capable of message-passing;
(3) communication between cores is nearly instantaneous;
(4) messages join a queue for processing by the receiving core and are never lost.
We implemented our algorithm in C/C++ using the Message Passing Interface (MPI) and tested it on the Extreme Science and Engineering Discovery Environment (XSEDE)'s Lonestar cluster.

A simple master-worker framework (a sketch of the message loop follows below):
- One core is designated the master; the others are workers.
- The master core monitors the progress of the algorithm and distributes work to the workers.
- Worker cores produce simulation replications according to the master's instructions.
- Each worker exchanges information only with the master.
[Figure: master-worker diagram. Source: Hadoop Illuminated, M. Kerzner and S. Maniyam]
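As a concrete illustration, here is a minimal master-worker message loop in C++ with MPI. The tag convention (TAG_WORK/TAG_RESULT/TAG_STOP), the payload layout, and the simulate stand-in are assumptions of this sketch, not the authors' actual protocol.

```cpp
// Minimal master-worker loop: the master hands out job indices, workers
// simulate and return one output per job. A sketch under assumed conventions.
#include <mpi.h>

enum Tag { TAG_WORK = 1, TAG_RESULT = 2, TAG_STOP = 3 };

// Hypothetical stand-in: run one replication of system `sys`.
static double simulate(int sys) { return static_cast<double>(sys); }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int num_jobs = 1000;   // assumed workload for the sketch
    if (rank == 0) {             // master: hand out jobs, collect outputs
        int next = 0, stop = 0, active = 0;
        for (int w = 1; w < size; ++w) {   // seed every worker once
            if (next < num_jobs) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                ++next; ++active;
            } else {
                MPI_Send(&stop, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
            }
        }
        while (active > 0) {               // answer each result with more work
            double y;
            MPI_Status st;
            MPI_Recv(&y, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
                     MPI_COMM_WORLD, &st);
            // ... record y and update screening state here ...
            if (next < num_jobs) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++next;
            } else {
                MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
    } else {                     // worker: simulate whatever the master assigns
        for (;;) {
            int sys;
            MPI_Status st;
            MPI_Recv(&sys, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double y = simulate(sys);
            MPI_Send(&y, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```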

1 Introduction 2 Considerations for parallel procedures 3 The Algorithm 4 Summary Introduction Considerations for parallel procedures The Algorithm Summary 9/26

Naive parallelism does not work. Example: 1 system, 2 workers. Worker j produces i.i.d. replications ((Y_j1, T_j1), (Y_j2, T_j2), ...), where Y_jk is (marginally) Normal(0, 1) and the completion time is
  T_jk = 1 if Y_jk < 0, and T_jk = 2 if Y_jk ≥ 0.
Let (Y_1, T_1) be the outcome of the first replication completed. Each sign pattern of (Y_11, Y_21) has probability
  P(Y_11 < 0, Y_21 < 0) = P(Y_11 ≥ 0, Y_21 < 0) = P(Y_11 < 0, Y_21 ≥ 0) = P(Y_11 ≥ 0, Y_21 ≥ 0) = 1/4,
and the first three patterns all give Y_1 < 0 with T_1 = 1, while only the last gives Y_1 ≥ 0 with T_1 = 2. Hence P(Y_1 < 0) = 3/4: Y_1 is NOT normal, and E[Y_1] ≠ 0!
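A quick Monte Carlo check of this example (an illustrative sketch, not from the slides): sampling pairs of N(0,1) outputs and taking the first-completed one under the completion-time rule above reproduces P(Y_1 < 0) ≈ 3/4 and E[Y_1] ≈ −0.40, with ties broken in favor of worker 1.

```cpp
// Empirical check of the first-completed-replication bias.
#include <iostream>
#include <random>

int main() {
    std::mt19937 rng(12345);
    std::normal_distribution<double> normal(0.0, 1.0);
    const int n = 1000000;
    long neg = 0;
    double sum = 0.0;
    for (int k = 0; k < n; ++k) {
        double y1 = normal(rng), y2 = normal(rng);
        // Completion time is 1 if the output is negative, 2 otherwise, so the
        // first-completed output is y1 unless y1 finishes late and y2 early.
        double first = (y1 < 0) ? y1 : (y2 < 0 ? y2 : y1);
        if (first < 0) ++neg;
        sum += first;
    }
    std::cout << "P(Y_1 < 0) ~= " << static_cast<double>(neg) / n  // about 0.75
              << ", E[Y_1] ~= " << sum / n << "\n";                // about -0.40
    return 0;
}
```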

Naive parallelism does not work. In general, the set of replications that have completed by a fixed time may not be i.i.d. with the correct distribution (Heidelberger, 1988; Glynn and Heidelberger, 1990, 1991). Solution: use estimators based on a fixed number of replications, taken in a pre-determined order, even though they complete at a random time.

[Figure 1: An illustration of the simulation results collected on the master, with replications 1, 2, 3 completing at times T_1, T_2, T_3. Notice that at T_2, a valid estimator uses only the output from replication 1.]

Screening can be expensive. In sequential R&S procedures, screening is often pairwise and periodic: each system is compared with all other surviving systems after one (or a few) additional replications. This may prove problematic in a parallel setting: with multiple workers, replications are generated much faster, yet screening may become too much work for any single core.

Don't screen all pairs.
[Figure 2: Screening on the master: the full matrix of pairwise screens among systems 0-40, distinguishing within-core screening from screens against the best of the master core.]
Solution 1: Distribute screening among the workers.
Solution 2: Perform a subset of the pairwise screens.

Don't screen all pairs.
[Figure 3: Screening on the master. Figure 4: Screening on the workers: each of cores 1-5 performs within-core screening among its own systems and between-core screening against the best of each other core.]
Solution 1: Distribute screening to the workers.
Solution 2: Perform a subset of the pairwise screens.

Don't screen all pairs. Solution 2: Perform a subset of the pairwise screens.
Proposition 1. The statistical guarantee is preserved if some pairs are dropped from screening, provided the guarantee is based on the Bonferroni argument
  P(ICS) ≤ Σ_{i=1}^{K−1} P(A_iK),
where A_iK is the event that inferior system i incorrectly eliminates the best system K.
Thus, each worker may screen only among the systems assigned to it, plus against the other workers' best systems.
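For concreteness, here is the Bonferroni bound written out, under the standard (assumed) allocation in which each pairwise screen is designed so that P(A_iK) ≤ α/(K−1):

```latex
P(\mathrm{ICS}) = P\Bigl(\bigcup_{i=1}^{K-1} A_{iK}\Bigr)
  \le \sum_{i=1}^{K-1} P(A_{iK})
  \le (K-1)\cdot\frac{\alpha}{K-1} = \alpha.
```

Dropping a pair from screening either leaves the events A_iK untouched (when neither system in the pair is the best) or shrinks one of them to the empty set (system i can no longer eliminate system K), so each term in the bound can only decrease; only efficiency, not validity, is at stake.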

Screening can be expensive. Solution 3: Do not screen on every replication.
Proposition 2. Screening on a pre-determined subsequence of replications does not decrease the probability of correct selection.
The proof follows directly from Jennison et al. (1980, 1982). The subsequences must be pre-determined to avoid the bias induced by random completion times: between each pair of systems (i_1, i_2), screen at replication counts (b·n_1, b·n_2) for b = 1, 2, ..., with possibly unequal n_1 and n_2. The step count b need not be equal across pairs, so screening can be made asynchronous.
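To make the schedule concrete, here is a minimal sketch of the per-pair bookkeeping; the struct and its fields are illustrative assumptions, not the authors' data structures. The pair is screened at replication counts (b·n_1, b·n_2), counted in the pre-determined order in which replications were assigned, never in completion order:

```cpp
// Pre-determined screening schedule for one pair of systems (i1, i2).
#include <cstdint>

struct PairSchedule {
    std::int64_t n1, n2;  // fixed batch sizes for i1 and i2, chosen in advance
    std::int64_t b = 1;   // next step count at which this pair is screened

    // True once both systems have accumulated the replications needed for
    // step b. Completion times never enter, so the estimators stay unbiased.
    bool ready(std::int64_t reps1, std::int64_t reps2) const {
        return reps1 >= b * n1 && reps2 >= b * n2;
    }

    // After screening with exactly the first b*n1 and b*n2 replications,
    // move on to the next pre-determined checkpoint.
    void advance() { ++b; }
};
```

Because b is tracked per pair, one pair can be at step 7 while another is at step 3: exactly the asynchrony the slide mentions.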

Handling random number generation. Random numbers generated across the workers should be independent, yet the workers generate identical random numbers if no specific instruction is given! Solution: use the RngStream generator (the combined multiple recursive generator MRG32k3a) with streams and substreams, proposed in L'Ecuyer et al. (2002), in the master-worker framework:
- At initialization, the master generates one stream Z_j for each worker j.
- When worker j simulates system i, it uses a fixed amount of random numbers under the ith substream, Z_ji.
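A minimal sketch using L'Ecuyer's C++ RngStream package (RngStream.h from the 2002 paper). For simplicity the sketch has each worker derive its own stream from a common package seed by skipping ahead, rather than receiving it from the master; the rank and system indices are placeholders:

```cpp
#include "RngStream.h"   // L'Ecuyer, Simard, Chen, and Kelton (2002)
#include <vector>

int main() {
    // Every core sets the same package seed, so stream j is the same
    // everywhere and can be claimed deterministically by worker j.
    unsigned long seed[6] = {42, 42, 42, 42, 42, 42};
    RngStream::SetPackageSeed(seed);

    int worker_j = 3;                  // placeholder: would be the MPI rank
    // Each RngStream constructed advances to the next independent stream,
    // so skipping worker_j - 1 streams leaves z_j as worker j's stream Z_j.
    std::vector<RngStream> skipped(worker_j - 1);
    RngStream z_j;

    int system_i = 17;                 // placeholder system index
    // Jump to substream i within stream Z_j (a fresh stream starts at
    // substream 1), then draw a fixed number of uniforms per replication.
    for (int s = 1; s < system_i; ++s) z_j.ResetNextSubstream();
    for (int k = 0; k < 10; ++k) {
        double u = z_j.RandU01();      // one U(0,1) draw driving the model
        (void)u;                       // ... feed into the simulation of i
    }
    return 0;
}
```

Drawing a fixed amount of randomness per replication keeps substream usage predictable and aligned across workers, which is what the slide's "fixed amount of random numbers" prescribes.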

3 The Algorithm

A three-stage parallel R&S algorithm:
- Stage 0: Simulate all systems to estimate simulation completion times.
- Stage 1: (If variances need to be estimated) Independently of Stage 0, the workers simulate the systems to obtain variance estimates.
- Stage 2: The remaining systems are simulated and screened until one system remains.

[Figures 5 and 6: Screening on the master versus screening on the workers, repeated from Figures 3 and 4.]

A three-stage parallel R&S algorithm. Strategy 1: Dedicate both the simulation and the screening of each system to a worker.
[Figure 7: Utilization of workers using Strategy 1.]

A three-stage parallel R&S algorithm. Strategy 2: Only screening is partitioned and dedicated to a worker.
[Figure 8: Utilization of workers using Strategy 2.]

Numerical example. We apply our parallel algorithm to a throughput-maximization problem (SimOpt.org), a version of which Luo et al. (2013) solved with 3,249 systems. We solve the problem with 1,016,127 systems in consideration.

Performance on 1,016,127 systems.
[Figure: wallclock time (sec) versus number of cores (60, 96, 120, 240, 360, 480, 600, 960), comparing actual performance against perfect scaling.]

4 Summary

Summary. We proposed an R&S procedure for a high-performance parallel computing environment that is capable of solving large-scale R&S problems. The statistical guarantee is maintained through:
- Screening on subsequences
- Carefully managed random number generation
Parallelizing both simulation and screening leads to decent speed-up.

What's next?
- Consider other computing architectures: eliminate the master (a potential bottleneck); cloud platforms where cores are less reliable and switching costs are high.
- Compare with parallel versions of two-stage procedures.
- Test on different problems.

Thank you! Questions?

References

E. J. Chen. Using parallel and distributed computing to increase the capability of selection procedures. In Proceedings of the 37th Conference on Winter Simulation, WSC '05, pages 723-731. Winter Simulation Conference, 2005. ISBN 0-7803-9519-0. URL http://dl.acm.org/citation.cfm?id=1162708.1162832.

V. Fabian. Note on Anderson's sequential procedures with triangular boundary. Annals of Statistics, 2(1):170-176, 1974.

R. M. Fujimoto. Parallel and Distributed Simulation Systems. Wiley, New York, 2000.

P. W. Glynn and P. Heidelberger. Bias properties of budget constrained simulations. Operations Research, 38:801-814, 1990.

P. W. Glynn and P. Heidelberger. Analysis of parallel replicated simulations under a completion time constraint. ACM Transactions on Modeling and Computer Simulation, 1(1):3-23, 1991.

P. Heidelberger. Discrete event simulations and parallel processing: statistical properties. SIAM Journal on Scientific and Statistical Computing, 9(6):1114-1132, 1988.

L. J. Hong. Fully sequential indifference-zone selection procedures with variance-dependent sampling. Naval Research Logistics (NRL), 53(5):464-476, 2006.

C. Jennison, I. M. Johnstone, and B. W. Turnbull. Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. Technical Report 463, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1980.

C. Jennison, I. M. Johnstone, and B. W. Turnbull. Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. In S. S. Gupta and J. O. Berger, editors, Statistical Decision Theory and Related Topics III, vol. 2, pages 55-86. Academic Press, New York, 1982.

S.-H. Kim and B. L. Nelson. A fully sequential procedure for indifference-zone selection in simulation. ACM Transactions on Modeling and Computer Simulation (TOMACS), 11(3):251-273, 2001.

S.-H. Kim and B. L. Nelson. Selecting the best system. In S. G. Henderson and B. L. Nelson, editors, Simulation, Handbooks in Operations Research and Management Science, pages 501-534. Elsevier, Amsterdam, 2006.

P. L'Ecuyer, R. Simard, E. J. Chen, and W. D. Kelton. An object-oriented random-number package with many long streams and substreams. Operations Research, 50(6):1073-1075, 2002.

J. Luo and L. J. Hong. Large-scale ranking and selection using cloud computing. In Proceedings of the Winter Simulation Conference, WSC '11, pages 4051-4061. Winter Simulation Conference, 2011. URL http://dl.acm.org/citation.cfm?id=2431518.2432002.

J. Luo, L. J. Hong, B. L. Nelson, and Y. Wu. Fully sequential procedures for large-scale ranking-and-selection problems in parallel computing environments. Submitted, 2013.

E. Paulson. A sequential procedure for selecting the population with the largest mean from k normal populations. Annals of Mathematical Statistics, 35(1):174-180, 1964.