Raj Boppana, Ph.D. Professor and Interim Chair. University of Texas at San Antonio

Similar documents
REPORT DOCUMENTATION PAGE

Testing parallel random number generators

November , Universität Leipzig, CompPhys12

Pseudo-Random Number Generators for Massively Parallel Discrete-Event Simulation

The rsprng Package. July 24, 2006

Forrest B. Brown, Yasunobu Nagaya. American Nuclear Society 2002 Winter Meeting November 17-21, 2002 Washington, DC

A Survey and Empirical Comparison of Modern Pseudo-Random Number Generators for Distributed Stochastic Simulations

PSEUDORANDOM numbers are very important in practice

Proposed Pseudorandom Number Generator

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FPGA

Random Number Generation A Practitioner s Overview

Variants of Mersenne Twister Suitable for Graphic Processors

CSCE GPU PROJECT PRESENTATION SCALABLE PARALLEL RANDOM NUMBER GENERATION

REPRODUCIBILITY BEGINNER s

SOME NOTES ON MULTIPLICATIVE CONGRUENTIAL RANDOM NUMBER GENERATORS WITH MERSENNE PRIME MODULUS Dr. James Harris*

RESEARCH ARTICLE Software/Hardware Parallel Long Period Random Number Generation Framework Based on the Well Method

Monte Carlo Integration and Random Numbers

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

A Reconfigurable Supercomputing Library for Accelerated Parallel Lagged-Fibonacci Pseudorandom Number Generation

Analysis of Cryptography and Pseudorandom Numbers

SAC: G: 3-D Cellular Automata based PRNG

PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations

Reproducibility in Stochastic Simulation

arxiv: v1 [cs.dc] 2 Aug 2011

arxiv: v1 [physics.comp-ph] 22 May 2010

GENERATION OF PSEUDO-RANDOM NUMBER BY USING WELL AND RESEEDING METHOD. V.Divya Bharathi 1, Arivasanth.M 2

Linear Congruential Number Generators. A useful, if not important, ability of modern computers is random number

Comparative Analysis of SLA-LFSR with Traditional Pseudo Random Number Generators

Testing unrolling optimization technique for quasi random numbers

Using Quasigroups for Generating Pseudorandom Numbers

A PARALLEL RANDOM NUMBER GENERATOR FOR SHARED MEMORY ARCHITECTURE MACHINE USING OPENMP

A simple OMNeT++ queuing experiment using different random number generators

Random Number Generators for Parallel Computers

RNG: definition. Simulating spin models on GPU Lecture 3: Random number generators. The story of R250. The story of R250

A Secured Key Generation Scheme Using Enhanced Entropy

Random-Number Generation

SSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh

Introduction to GPU hardware and to CUDA

GPU Fundamentals Jeff Larkin November 14, 2016

Parallel Computing 35 (2009) Contents lists available at ScienceDirect. Parallel Computing. journal homepage:

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Introduction to HPC. Introduction to HPC, Random number generation and OpenMP. Jérôme Lelong, Christophe Picard, Laurence Viry

Statistical Evaluation of a Self-Tuning Vectorized Library for the Walsh Hadamard Transform

Random Number Generators

Performance potential for simulating spin models on GPU

Monte Carlo Simulation With The GATE Software Using Grid Computing

RNG EVALUATION - TESTING REPORT

Banking on Monte Carlo and beyond

Random Number Generators. Summer Internship Project Report submitted to Institute for Development and. Research in Banking Technology (IDRBT)

Hardware Accelerated Scalable Parallel Random Number Generation

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs

Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

A Random Number Generator Test Suite for the C++ Standard

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications

ENABLING SIMHEURISTICS THROUGH DESIGNS FOR TENS OF VARIABLES: COSTING MODELS AND ONLINE AVAILABILITY

Empirical Spectral Analysis of Random Number Generators

Cryptography and Network Security Chapter 7

A Multi-Tiered Optimization Framework for Heterogeneous Computing

NVIDIA COLLECTIVE COMMUNICATION LIBRARY (NCCL)

PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters

Cluster-based 3D Reconstruction of Aerial Video

Testing Primitive Polynomials for Generalized Feedback Shift Register Random Number Generators

Accelerated Machine Learning Algorithms in Python

Chapter 4: (0,1) Random Number Generation

Random Number Generator of STMC and simple Monte Carlo Integration

What We ll Do... Random

Martin Dubois, ing. Contents

CS 179: GPU Computing. Lecture 16: Simulations and Randomness

PRNG. Generating random numbers Version: Date: 12 March Otmar Lendl Josef Leydold

Intel Performance Libraries

Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation

Fundamental Optimizations in CUDA Peng Wang, Developer Technology, NVIDIA

Performance impact of dynamic parallelism on different clustering algorithms

Investigation and Design of the Efficient Hardwarebased RNG for Cryptographic Applications

Efficient Parallel Simulation of an Individual-Based Fish Schooling Model on a Graphics Processing Unit

NVIDIA COLLECTIVE COMMUNICATION LIBRARY (NCCL)

Further scramblings of Marsaglia s xorshift generators

Teaching with Technology. Retooling Lecture Capture

Sampling Using GPU Accelerated Sparse Hierarchical Models

PRNGs & DES. Luke Anderson. 16 th March University Of Sydney.

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

SDA: Software-Defined Accelerator for general-purpose big data analysis system

Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo

HPC Capabilities at Research Intensive Universities

Issuing Laboratory: Evaluating Laboratory: Jurisdiction: Technical Standard for Testing: Software Supplier: Submitting Party: Product Tested:

UNCLASSIFIED//FOR OFFICIAL USE ONLY INDUSTRIAL CONTROL SYSTEMS CYBER EMERGENCY RESPONSE TEAM

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs

UNIT 9A Randomness in Computa5on: Random Number Generators. Randomness in Compu5ng

Memory Bandwidth and Low Precision Computation. CS6787 Lecture 10 Fall 2018

A Comparison of CPUs, GPUs, FPGAs, and Massively Parallel Processor Arrays for Random Number Generation

Massively Parallel Approximation Algorithms for the Knapsack Problem

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

XIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

GPU LIBRARY ADVISOR. DA _v8.0 September Application Note

Transcription:

Raj Boppana, Ph.D. Professor and Interim Chair Computer Science Department University of Texas at San Antonio

Terminology RN: pseudorandom number RN stream: a sequence of RNs Cycle: the maximum number of RNs a stream gives before repeating the starting ti sequence of RNs RNG: pseudorandom d number generator PRNG: parallel pseudorandom number generator Provides multiple l distinct t RN streams Correlation: how predictable a set of RNs is given another set of RNs Interstream and intrastream correlations Open Cloud Laboratory Silicon Informatics 9.10.13 2

Mobile Ad Hoc Network Simulator Initialization Node 0 Simulation Node 1 Simulation Node n-1 Simulation RN Stream 0 RN Stream 1 RN Stream n-1 Open Cloud Laboratory Silicon Informatics 9.10.13 3

Node Simulation Pseudocode // node i, i varies 0 n-1 While (1) { Delta_X = RNstream(i)*MAX_DELTA; Delta_Y = RNstream(i)*MAX_DELTA; Move node to (Current_X+Delta_X, Current_Y+Delta_Y) // Received a broadcast packet. Retransmit it selectively Process the packet if (RNstream(i) < ReTX_Probability) { Jitter = RNstream(i)*MAX_JITTER; Broadcast Message at Current_Time + Jitter } } Open Cloud Laboratory Silicon Informatics 9.10.13 4

Results Obtained Context-aware parallel random number generator (CPRNG) [Patent pending] Virtually unlimited number of random number streams, each stream with a large cycle Provides distinct streams based on the location of calls for RNs (user selectable; no app. recoding) Reduces the impact of intrastream correlations Dynamically allocates additional RN streams beyond any upper limit specified Some large, complex simulations require unpredictable and very large number of RN streams for subtasks Open Cloud Laboratory Silicon Informatics 9.10.13 5

Results Obtained Interstream Correlation (ISC) Test [Pat. pending] Evaluates correlations among parallel streams and gives a single number as quality metric The first statistical test of its kind The current statistical tests are 15-150 single-stream tests for intrastream correlations No single quality metric; a vector of pass/fail data Interstream correlations: interleave streams & apply s-s tests Integrates with CPRNG to provide the quality metric continually as the application consumes RNs Not feasible with the current test batteries Open Cloud Laboratory Silicon Informatics 9.10.13 6

Outline Related work Context-aware parallel pseudorandom number generator Interstream correlation test Summary and further work Open Cloud Laboratory Silicon Informatics 9.10.13 7

Sequential RNGs Linear congruential generator (LCG) x (mod 2 64 n axn 1 b ), n 1 Specify a, b, and the initial value x 0 Parallel streams are obtained by choosing different values for a 48-bit versions in Unix systems: drand48() Recursion with carry (RWC): c n G. Marsaglia, Diehard package, FSU RWC is used to seed the parallel RNGs we implemented x n a x a c 1 n1 5 n5 n1 x (mod 2 32 ) upper lower Mersenne Twister (MT): x x x x ) A n m A is a 64x64 bit matrix k n k m ( k k 1, x k+n is further multiplied by a bit matrix T to improve randomness Very popular; GPU version, MTGP, by Nvidia Matsumoto & Nishimura [ACM TOMACS 1998] Open Cloud Laboratory Silicon Informatics 9.10.13 8

Parameterized Parallel RNGs ALFG xn xnk xnl (mod 2 64 ), 0 k l Additive lagged Fibonacci generator n Initialization requires specification of x 0,, x l-1 2 63*(l-1) distinct RN streams Each with a cycle of length 2 63 *(2 l -1) Lag l=1279 or larger is used to minimize intra- and inter-stream correlations A canonical form to initialize ALFG streams is known 64 MLFG xn xnk * xnl (mod 2 ), 0 k l n Multiplicative lagged Fibonacci generator 2 61*16 = 2 976 distinct RN streams with lag l=17 Each with a cycle of length 2 61 *(2 17-1) 2 78 Lag l=17 is sufficient to get high quality RN streams Open Cloud Laboratory Silicon Informatics 9.10.13 9

Newer RNGs Cryptography-based RNGs Use the scrambling techniques from cryptography to generate random numbers from simple sequences GPU friendly RNGs Nvidia adapted MT for GPU implementation and offered as part of CUDA library Parameterized RNGs Open Cloud Laboratory Silicon Informatics 9.10.13 10

SPRNG: Parallel RNG and Test Package A. Srinivasan et al. ACM TOMS 2000, Parallel Computing 2003 and 2004 Provides 6 PRNGs: ALFG, MLFG, two LCGs and two other generators Several single stream tests Interleaves multiple parallel RN streams to form a sequential stream and applies sequential tests for interstream correlation checks Ising model (in Physics) simulation codes using Metropolis and Wolff algorithms Exact solutions are known so that the simulation error can be estimated Used for initial implementation of CPRNG Open Cloud Laboratory Silicon Informatics 9.10.13 11

Test Packages Statistical tests to analyze intrastream correlations of a single stream Dieharder (Brown, Duke University) TU01 ( Ecuyer & Blouin, ACM TOMS, 2007) Application-based testing 2-D Ising model simulations Exact theoretical results known Simulations are used to evaluate the quality of RNGs Two algorithms for simulations: Metropolis and Wolff Open Cloud Laboratory Silicon Informatics 9.10.13 12

Based on the Fibonacci recurrence: x n x n 64 k * xnl (mod 2 ), 0 k l n Implementation: an include file + a small library module

CPRNG Initialization of RN Streams 64 bits Lag L-1 Lag L-2 Random Fill using RWC and userspecified seed Common to all RN streams Lag K+2 Lag K+1 Lag K Distinct ID RN Context RN Context Lag 0 Open Cloud Laboratory Silicon Informatics 9.10.13 14

CPRNG Initialization: RN Context Calls for RNs with the same stream ID, but from different Lag L-1 locations, result in the use of Lag L-2 distinct RN streams Random number context consists of Stream identifier: 0,, total_str str-1 Program location: return address of the function call to RNG, or program line number and file name Optional: process ID, thread ID, iteration number Lag K+2 Lag K+1 Lag K Lag 0 Open Cloud Laboratory Silicon Informatics 9.10.13 15

RN Stream Initialization Time MLFG is part of SPRNG CPRNG1 implemented in Phase I inside SPRNG package CPRNG2 implemented in Phase II Open Cloud Laboratory Silicon Informatics 9.10.13 16

Random Number Generation Time CPRNG1 takes 1ns more than MLFG because of API compatibility issues CPRNG2 does not have the same issues Open Cloud Laboratory Silicon Informatics 9.10.13 17

Single-stream Tests Collisions Coupon collector s Equidistribution Gap Maximum-of-t Permutations Poker Runs up Serial Random walk Matrix rank test RN Stream Statistical Test x 2 x 1 x 0 applied k times p-value 1 [0,1] p-value 2 p-value k Open Cloud Laboratory Kolmogorov-Smirnov (KS) Test Silicon Informatics Pass/Fail 9.10.13 19

Current Parallel RN-Stream Tests Single-stream tests evaluate intrastream correlations in a stream Interstream t correlations of parallel l RN streams are evaluated by Interleaving the RN streams into a single stream Applying single-stream tests Alternatively, simulations of system/process models with known exact solutions may be used 2-D Ising model in Physics: Metropolis and Wolff algorithms Open Cloud Laboratory Silicon Informatics 9.10.13 20

Interstream Correlation (ISC) Test RN Streams Mixer Mixer Mixer Aggregator (K-S or D-R test) Correlation Coefficient PRNG Quality Metric Open Cloud Laboratory Silicon Informatics 9.10.13 21

Application of ISC Test CPRNG and MLFG were tested Up to 1.5 billion RN streams used 1500 sets of RN streams, 10 to 1 million per set 100+ RNs from each stream Shuffle interleaving of streams in a set Even the largest test case completes in 6 hours on a desktop with i7-870 CPU, 12 GB mem Open Cloud Laboratory Silicon Informatics 9.10.13 22

ISC Test: Donner & Rosner Method If the absolute value of the DR-statistic is above 1.96, then the correlations among the RN streams are considered significant. The probability of false conclusion is 0.05. A set of 10 to 10 6 RN streams gives one cor. coeff., r 1500 r s are used to calculate the statistic Open Cloud Laboratory Silicon Informatics 9.10.13 23

ISC Test: Kolmogorov-Smirnov Method If the absolute value of the KS-statistic is above 0.0274, then the correlations among the RN streams are considered significant. The probability of false conclusion is 0.01. Open Cloud Laboratory Silicon Informatics 9.10.13 24

Adapting ISC Test for Online Testing PRNG Application Results ISC Test PRNG Quality Metric Test Specification Open Cloud Laboratory Silicon Informatics 9.10.13 25

Summary and Further Work CPRNG provides two new features that other generators do not provide Distinct RN streams based on program context Virtually unlimited number of RN streams without requiring the user to specify a maximum upper bound CPRNG is implemented with simple API An API definition file and a library module Wrappers can be built to adapt CPRNG for the applications that use SPRNG or default RNGs on various platforms Seamless integration of CPU and GPU modes The same streams can be used in both modes Open Cloud Laboratory Silicon Informatics 9.10.13 26

Summary and Further Work ISC test: a new statistical test to evaluate interstream t correlations of billions of RN streams Can be adapted to provide a quality metric continually while the application runs and after it completes Ongoing work Intel Xeon Phi implementation of CPRNG Refine GPU implementation of CPRNG Looking for HPC users Use cases for CPRNG and ISC test HPC application testing of ISC test and CPRNG context- awareness feature Visit getcprng.com and register to obtain additional information Open Cloud Laboratory Silicon Informatics 9.10.13 27

Acknowledgments This work was supported by STTR Phase I & II grants from ARO (Program manager: Dr. Joe Myers) The desktop computer used for development was acquired with funds from an NSF grant; additional computer resources were provided by UTSA s Institute for Cyber Security STTR project participants Prof. Ravi Sandhu and Prof. Ram Tripathi, UT San Antonio Mr. Robert Keller and Mr. Hemant Trivedi, Silicon Informatics Prof. Srinivasan, Florida State U. This presentation does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Raj Boppana boppana@cs.utsa.edu 210-458-4434 Open Cloud Laboratory Silicon Informatics 9.10.13 28