CS 179: GPU Computing. Lecture 16: Simulations and Randomness

Similar documents
PRNGs & DES. Luke Anderson. 16 th March University Of Sydney.

Cryptographic Algorithms - AES

Cryptography and Network Security Chapter 7

Goals of Modern Cryptography

RNG: definition. Simulating spin models on GPU Lecture 3: Random number generators. The story of R250. The story of R250

Information Security CS526

CS 161 Computer Security. Week of September 11, 2017: Cryptography I

Parallelizing Cryptography. Gordon Werner Samantha Kenyon

ECE596C: Handout #7. Analysis of DES and the AES Standard. Electrical and Computer Engineering, University of Arizona, Loukas Lazos

Secret Key Cryptography

page 1 Introduction to Cryptography Benny Pinkas Lecture 3 November 18, 2008 Introduction to Cryptography, Benny Pinkas

Cryptography BITS F463 S.K. Sahay

Computer Security. 08. Cryptography Part II. Paul Krzyzanowski. Rutgers University. Spring 2018

UNIT 9A Randomness in Computation: Random Number Generators

Computer Security 3/23/18

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley

U-II BLOCK CIPHER ALGORITHMS

CS 161 Computer Security

Dr. Jinyuan (Stella) Sun Dept. of Electrical Engineering and Computer Science University of Tennessee Fall 2010

1. Practice the use of the C ++ repetition constructs of for, while, and do-while. 2. Use computer-generated random numbers.

Computer Security CS 526

Secret Key Systems (block encoding) Encrypting a small block of text (say 64 bits) General Considerations:

Stochastic Algorithms

T Cryptography and Data Security

Dawn Song

Block Ciphers. Lucifer, DES, RC5, AES. CS 470 Introduction to Applied Cryptography. Ali Aydın Selçuk. CS470, A.A.Selçuk Block Ciphers 1

Monte Carlo Simulations

Chapter Four: Loops. Slides by Evan Gallagher. C++ for Everyone by Cay Horstmann Copyright 2012 by John Wiley & Sons. All rights reserved

ENEE 457: Computer Systems Security 09/12/16. Lecture 4 Symmetric Key Encryption II: Security Definitions and Practical Constructions

T Cryptography and Data Security

A New hybrid method in watermarking using DCT and AES

November , Universität Leipzig, CompPhys12

Security Applications

Cryptography and Network Security

A Secured Key Generation Scheme Using Enhanced Entropy

5.12 EXERCISES Exercises 263

Stream Ciphers. Koç ( ucsb ccs 130h explore crypto fall / 13

Cryptography (cont.)

CPSC 467b: Cryptography and Computer Security

Reproducibility in Stochastic Simulation

Lecturers: Mark D. Ryan and David Galindo. Cryptography Slide: 24

Monte Carlo Techniques. Professor Stephen Sekula Guest Lecture PHY 4321/7305 Sep. 3, 2014

Block ciphers. CS 161: Computer Security Prof. Raluca Ada Popa. February 26, 2016

I. INTRODUCTION. Manisha N. Kella * 1 and Sohil Gadhiya2.

Symmetric Cryptography

Computational Security, Stream and Block Cipher Functions

Scientific Computing: An Introductory Survey

UNIT 9A Randomness in Computation: Random Number Generators Principles of Computing, Carnegie Mellon University - CORTINA

Winter 2011 Josh Benaloh Brian LaMacchia

CS61B Lecture #32. Last modified: Sun Nov 5 19:32: CS61B: Lecture #32 1

Monte Carlo Integration and Random Numbers

Introduction to Modern Cryptography. Lecture 2. Symmetric Encryption: Stream & Block Ciphers

CPSC 467b: Cryptography and Computer Security

PGP: An Algorithmic Overview

CIS 3362 Final Exam 12/4/2013. Name:

PSEUDORANDOM numbers are very important in practice

Block Ciphers. Secure Software Systems

Side channel attack: Power Analysis. Chujiao Ma, Z. Jerry Shi CSE, University of Connecticut

Practical Aspects of Modern Cryptography

Random-Number Generation

A Class of Weak Keys in the RC4 Stream Cipher Preliminary Draft

Course Business. Midterm is on March 1. Final Exam is Monday, May 1 (7 PM) Allowed to bring one index card (double sided) Location: Right here

Vortex: A New Family of One-way Hash Functions Based on AES Rounds and Carry-less Multiplication

ISSN: Page 320

Encryption Details COMP620

Stream ciphers. Lecturers: Mark D. Ryan and David Galindo. Cryptography Slide: 91

Cryptography Symmetric Cryptography Asymmetric Cryptography Internet Communication. Telling Secrets. Secret Writing Through the Ages.

Block cipher modes. Lecturers: Mark D. Ryan and David Galindo. Cryptography Slide: 75

Advanced Encryption Standard and Modes of Operation. Foundations of Cryptography - AES pp. 1 / 50

Chapter 4: (0,1) Random Number Generation

Analysis of Cryptography and Pseudorandom Numbers

Uses of Cryptography

CPS2323. Symmetric Ciphers: Stream Ciphers

Stream Ciphers. Çetin Kaya Koç Winter / 13

CPSC 467: Cryptography and Computer Security

Cryptography and Network Security Chapter 7. Fourth Edition by William Stallings

Lecture 1: Perfect Security

Security: Cryptography

Implementation of Full -Parallelism AES Encryption and Decryption

You ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation

Induction and Recursion. CMPS/MATH 2170: Discrete Mathematics

Midterm Exam 2B Answer key

AES Core Specification. Author: Homer Hsing

Cryptography and Network Security. Sixth Edition by William Stallings

Cryptographic Primitives A brief introduction. Ragesh Jaiswal CSE, IIT Delhi

Data Encryption Standard (DES)

MERSENNE TWISTER AND FUBUKI STREAM/BLOCK CIPHER

Introduction to Cryptography. Lecture 3

in a 4 4 matrix of bytes. Every round except for the last consists of 4 transformations: 1. ByteSubstitution - a single non-linear transformation is a

Hiding of Random Permutated Encrypted Text using LSB Steganography with Random Pixels Generator

Random Numbers Random Walk

Block Ciphers and Data Encryption Standard. CSS Security and Cryptography

Fundamentals of Cryptography

CSC 580 Cryptography and Computer Security

Text Steganography Using Compression and Random Number Generators

PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations

ISA 562: Information Security, Theory and Practice. Lecture 1

Cryptography (DES+RSA) by Amit Konar Dept. of Math and CS, UMSL

Goals for Today. Substitution Permutation Ciphers. Substitution Permutation stages. Encryption Details 8/24/2010

Chapter Four: Loops II

Transcription:

CS 179: GPU Computing Lecture 16: Simulations and Randomness

Simulations South Bay Simulations, http://www.panix.com/~brosen/graphics/iacc.400.jpg Exa Corporation, http://www.exa.com/images/f16.png Flysurfer Kiteboarding, http://www.flysurfer.com/wpcontent/blogs.dir/3/files/gallery/research-and-development/zwischenablage07.jpg Max-Planck Institut, http://www.mpagarching.mpg.de/gadget/hydrosims/

Simulations But what if your problem is hard to solve? e.g. EM radiation attenuation Estimating complex probability distributions Complicated ODEs, PDEs (e.g. option pricing in last lecture) Geometric problems w/o closed-form solutions Volume of complicated shapes

Simulations Potential solution: Monte Carlo methods Run simulation with randomly chosen inputs (Possibly according to some distribution) Do it again and again and again Aggregate results

Monte Carlo example Estimating the value of π

Monte Carlo example Estimating the value of π Quarter-circle of radius r: Area = (πr 2 )/4 Enclosing square: Area = r 2 Fraction of area: π/4 "Pi 30K" by CaitlinJo - Own workthis mathematical image was created with Mathematica. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:pi_30k.gif#/media/file:pi_30k.gif

Monte Carlo example Estimating the value of π Quarter-circle of radius r: Area = (πr 2 )/4 Enclosing square: Area = r 2 Fraction of area: π/4 0.79 Solution : Randomly generate lots of points, calculate fraction within circle Answer should be pretty close! "Pi 30K" by CaitlinJo - Own workthis mathematical image was created with Mathematica. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:pi_30k.gif#/media/file:pi_30k.gif

Monte Carlo example Pseudocode: (simulate on N points) (assume r = 1) points_in_circle = 0 for i = 0,,N-1: randomly pick point (x,y) from uniform distribution in [0,1] 2 if (x,y) is in circle: points_in_circle++ return (points_in_circle / N) * 4 "Pi 30K" by CaitlinJo - Own workthis mathematical image was created with Mathematica. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:pi_30k.gif#/media/file:pi_30k.gif

Monte Carlo example Pseudocode: (simulate on N points) (assume r = 1) points_in_circle = 0 for i = 0,,N-1: randomly pick point (x,y) from uniform distribution in [0,1] 2 if x^2 + y^2 < 1: points_in_circle++ return (points_in_circle / N) * 4 "Pi 30K" by CaitlinJo - Own workthis mathematical image was created with Mathematica. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:pi_30k.gif#/media/file:pi_30k.gif

Monte Carlo simulations Planetary Materials Microanalysis Facility,, Northern Arizona University, http://www4.nau.edu/microanalysis/microprobesem/images/monte_carlo.jpg Center for Air Pollution Impact & Trend Analysis, Washington University in St. Louis, http://www4.nau.edu/microanalysis/microprobesem/images/monte_carlo.jpg http://www.cancernetwork.com/sites/default/files/cn_import/n0011bf1.jpg

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results)

General Monte Carlo method Why it works: Law of large numbers!

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Can we parallelize this?

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Trials are independent Can we parallelize this?

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Usually so (e.g. with reduction) Trials are independent Can we parallelize this?

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Usually so (e.g. with reduction) What about this? Trials are independent Can we parallelize this?

Parallelized Random Number Generation

Early Credits Algorithm and presentation based on: Parallel Random Numbers: As Easy as 1, 2, 3 (Salmon, Moraes, Dror, Shaw) at D. E. Shaw Research Developed for biomolecular simulations on Anton (massively parallel ASIC-based supercomputer) Also applicable to CPUs, GPUs

Random Number Generation Generating random data computationally is hard Computers are deterministic! https://cdn.tutsplus.com/vector/uploads/legacy/tuts/165_shiny_dice/27.jpg

Random Number Generation Two methods: Hardware random number generator aka TRNG ( True RNG) Uses data collected from environment (thermal, optical, etc) Very slow! Pseudorandom number generator (PRNG) Algorithm that produces random-looking numbers Faster limited by computational power

Demonstration

Random Number Generation PRNG algorithm should be: High-quality Fast Produce good random data (In its own right) Parallelizable! Can we do it? (Assume selection from uniform distribution)

A Very Basic PRNG Linear congruential generator (LCG) e.g. C s rand() //from glibc General formula: int32_t val = state[0]; val = ((state[0] * 1103515245) + 12345) & 0x7fffffff; state[0] = val; *result = val; X n+1 = ax n + c mod m X 0 is the seed (e.g. system time)

A Very Basic PRNG Linear congruential generator (LCG) e.g. C s rand() //from glibc General formula: int32_t val = state[0]; val = ((state[0] * 1103515245) + 12345) & 0x7fffffff; state[0] = val; *result = val; X n+1 = ax n + c mod m Non-parallelizable recurrence relation!

Linear congruential generators X n+1 = ax n + c mod m Not high quality! Clearly non-uniform Fast to compute Not parallelizable! "Lcg 3d". Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:lcg_3d.gif#/media/file:lcg_3d.gif

Measures of RNG quality Impossible to prove a sequence is random Possible tests: Frequency Periodicity - do the values repeat too early? Linear dependence

PRNG Parallelizability Many PRNGs (like the LCG) have a non-parallelizable appearance: X n+1 = f(x n ) (Better chance of good data when): All X i in some large state space Complicated function f

PRNG Parallelizability Possible approach to GPU parallelization: Assign a PRNG to each thread! Initialize with e.g. different X 0 Thread 0 produces sequence X n+1,0 = f(x n,0 ) Thread 1 produces sequence X n+1,1 = f(x n,1 )

PRNG Parallelizability Possible approach to GPU parallelization: Assign a PRNG to each thread! Initialize with e.g. different X 0 Thread 0 produces sequence X n+1,0 = f(x n,0 ) Thread 1 produces sequence X n+1,1 = f(x n,1 ) In practice, often cannot get high quality Repeated values, lack of good, enumerable parameters

PRNG Parallelizability Instead of: Suppose we had: X n+1 = f(x n ) X n+1 = b n This is parallelizable! (Without our previous trick ) Is this possible?

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U S: U: K: Internal (hidden) state space Output space Key space Can seed output behavior without relying on X 0 alone useful for scientific reproducibility!

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U S: U: K: Internal (hidden) state space Output space Key space If S has J times more bits than U, can produce J outputs per transition. Assume J = 1 in this lecture Can seed output behavior without relying on X 0 alone useful for scientific reproducibility!

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U Trivial example: LCG X n+1 = ax n + c mod m f X n = ax n + c g X n = X n S is (for example) the space of 32-bit integers U = S K is trivial (no keys used)

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U Trivial example: LCG X n+1 = ax n + c mod m f X n = ax n + c g X n = X n f is more complicated than g!

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U General theme: f is complicated, g is simple What if we flipped that?

More General PRNG Keyed PRNG given by: Transition function: f: S S Output function: g: K S U General theme: f is complicated, g is simple What if we flipped that? What if f were so simple that it could be evaluated explicity?

More General PRNG i.e. what if we had: Simple transition function (p-bit integer state space): f s = (s + 1) mod 2 p This is just a counter! Can expand into explicit formula f n = (n + n 0 ) mod 2 p These form counter-based PRNGs Complicated output function g Would this work?

More General PRNG i.e. what if we had: Simple transition function f Complicated output function g(k, n) Should be bijective w/r/to n Guarantees period of 2 p Shouldn t be too difficult to compute

Bijective Functions Cryptographic block ciphers! AES (Advanced Encryption Standard), Threefish, Must be bijective! (Otherwise messages can t be encrypted/decrypted)

AES-128 Algorithm 1) Key Expansion Determine all keys k from initial cipher key k B Used to strengthen weak keys Sohaib Majzoub and Hassan Diab, Reconfigurable Systems for Cryptography and Multimedia Applications, http://www.intechopen.com/source/html/38442/m edia/image19_w.jpg

AES-128 Algorithm 2) Add round key Bitwise XOR state s with key k 0 By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:aes- AddRoundKey.svg#/media/File:AES-AddRoundKey.svg

AES-128 Algorithm 3) For each round (10 rounds total) a) Substitute bytes Use lookup table to switch positions By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:aes- AddRoundKey.svg#/media/File:AES-AddRoundKey.svg

AES-128 Algorithm 3) For each round b) Shift rows By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:aes- AddRoundKey.svg#/media/File:AES-AddRoundKey.svg

AES-128 Algorithm 3) For each round c) Mix columns Multiply by constant matrix By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:aes- AddRoundKey.svg#/media/File:AES-AddRoundKey.svg

AES-128 Algorithm 3) For each round d) Add round key (as before) By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/file:aes- AddRoundKey.svg#/media/File:AES-AddRoundKey.svg

AES-128 Algorithm 4) Final round Do everything in normal round except mix columns

AES-128 Algorithm Summary: 1) Expand keys 2) Add round key 3) For each round (10 rounds total) Substitute bytes Shift rows Mix columns Add round key 4) Final round: (do everything except mix columns)

Algorithmic Improvements We have a good PRNG! Simple transition function f Counter Complicated output function g(k, n) AES-128

Algorithmic Improvements We have a good PRNG! Simple transition function f Counter Complicated output function g(k, n) AES-128 High quality! Passes Crush test suite (more on that later) Parallelizable! f and g only depend on k, n! Sort of slow to compute AES is sort of slow without special instructions (which GPUs don t have)

Algorithmic Improvements Can we make AES go faster? AES is a cryptographic algorithm, but we re using it for PRNG Can we change the algorithm for our purposes?

AES-128 Algorithm Summary: 1) Expand keys 2) Add round key 3) For each round (10 rounds total) Substitute bytes Shift rows Mix columns Add round key 4) Final round: (do everything except mix columns)

Summary: 1) Expand keys 2) Add round key AES-128 Algorithm 3) For each round (10 rounds total) Substitute bytes Shift rows Mix columns Add round key 4) Final round: Purpose of this step is to hide key from attacker using chosen plaintext. Not relevant here. (do everything except mix columns)

Summary: 1) Expand keys 2) Add round key AES-128 Algorithm 3) For each round (10 rounds total) Substitute bytes Shift rows Mix columns Add round key 4) Final round: Purpose of this step is to hide key from attacker using chosen plaintext. Not relevant here. Do we really need this many rounds? (do everything except mix columns) Other changes?

Key Schedule Change Old key schedule: New key schedule: The first n bytes of the expanded key are simply the encryption key. The rcon iteration value i is set to 1 Until we have b bytes of expanded key, we do the following to generate n more bytes of expanded key: We do the following to create 4 bytes of expanded key: We create a 4-byte temporary variable, t We assign the value of the previous four bytes in the expanded key to t We perform the key schedule core (see above) on t, with i as the rcon iteration value We increment i by 1 We exclusive-or t with the four-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key We then do the following three times to create the next twelve bytes of expanded key: We assign the value of the previous 4 bytes in the expanded key to t We exclusive-or t with the four-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key If we are processing a 256-bit key, we do the following to generate the next 4 bytes of expanded key: We assign the value of the previous 4 bytes in the expanded key to t We run each of the 4 bytes in t through Rijndael's S-box We exclusive-or t with the 4-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key. k 0 = k B k i+1 = k i + constant e.g. golden ratio Copied from Wikipedia (Rijndael Key Schedule)

AES-128 Algorithm Summary: 1) Expand keys using simplified algorithm 2) Add round key 3) For each round (10 5 rounds total) Substitute bytes Shift rows Mix columns Add round key 4) Final round: (do everything except mix columns) Other simplifications possible!

Algorithmic Improvements We have a good PRNG! Simple transition function f Counter Complicated output function g(k, n) Modified AES-128 (known as ARS-5) High quality! Passes Crush test suite (more on that later) Parallelizable! f and g only depend on k, n! Moderately faster to compute

Even faster parallel PRNGs Use a different g, e.g. Threefish cipher Optimized for PRNG known as Threefry Philox (see paper for details) 202 GB/s on GTX580! Fastest known PRNG in existence

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Usually so (e.g. with reduction) What about this? Trials are independent Can we parallelize this?

General Monte Carlo method Pseudocode: for (number of trials): randomly pick value from a probability distribution perform deterministic computation on inputs (aggregate results) Usually so (e.g. with reduction) Yes! Trials are independent Can we parallelize this? Yes! Part of curand

Summary Monte Carlo methods Very useful in scientific simulations Parallelizable because of Parallelized random number generation Another story of parallel algorithm analysis

Credits (again) Parallel RNG algorithm and presentation based on: Parallel Random Numbers: As Easy as 1, 2, 3 (Salmon, Moraes, Dror, Shaw) at D. E. Shaw Research