1 Introduction

In this module, we will be simulating the draft lottery used by the National Basketball Association (NBA). Each year, the worst 14 teams are entered into a drawing to determine who will get the first three picks in the upcoming draft. The lottery is designed so that the worse a team's record is, the better its chances of getting the first pick. You can find more information on how the lottery works here. For clarity we label the 14 teams in the lottery 1, 2, ..., 14, with 1 corresponding to the team with the best chance of landing the first pick in the draft. Table 1 shows the probability that each team wins the lottery and gets the first pick in the draft.

Team  Prob.    Team  Prob.
1     0.250    8     0.028
2     0.199    9     0.017
3     0.156    10    0.011
4     0.119    11    0.008
5     0.088    12    0.007
6     0.063    13    0.006
7     0.043    14    0.005

Table 1: Probability of getting the first pick in the draft

At this point, it is important to formalize what we mean when we say that team 1 wins the lottery with probability 0.250. We will adopt the frequency definition.[1] Briefly, suppose we repeatedly perform a random experiment, which can result in one of many outcomes, and record the number of times each outcome occurs. With these counts, we can compute the proportion of trials in which each outcome occurred by dividing its count by the number of trials. Now imagine that we were able to repeat the experiment for infinitely many trials. The probability of any particular outcome would be the proportion of trials which resulted in that outcome. So when we say that the probability that team 1 will win the lottery is 25%, we really mean that if we were to run the lottery infinitely many times, team 1 would win in a quarter of these trials.

Unfortunately, we cannot actually perform any experiment infinitely often and we are limited to only finitely many trials. Luckily, if we perform a random experiment for a sufficiently large number of trials, the resulting empirical proportion of an outcome will be very close to the true probability. In the context of the NBA draft lottery, this means that if we were to simulate the lottery, say, 1,000,000 times, the number of times team 1 won the lottery would be very close to 0.25 × 1,000,000 = 250,000.[2]

[1] Interpreting probabilities is a fundamental philosophical problem. For more information, check out the entry on interpreting probability in the Stanford Encyclopedia of Philosophy.
[2] This is essentially what the Law of Large Numbers guarantees.
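The claim that the empirical proportion settles near the true probability can be illustrated with a short sketch. It uses sample(), which is introduced in the next section, to simulate whether team 1 wins each of many lotteries. The seed, the trial counts, and the names team1.wins and results are assumptions made for illustration only.

```r
set.seed(2024)  # arbitrary seed, used only so the sketch is reproducible
p.true <- 0.25  # probability that team 1 wins, from Table 1
n.trials <- c(100, 10000, 1000000)

# One long run of simulated lotteries: TRUE means team 1 won that lottery
team1.wins <- sample(c(TRUE, FALSE), size = max(n.trials), replace = TRUE,
                     prob = c(p.true, 1 - p.true))

# Empirical proportion of wins after the first n trials, for each n
proportions <- sapply(n.trials, function(n) mean(team1.wins[1:n]))

results <- data.frame(trials = n.trials,
                      proportion = proportions,
                      abs.error = abs(proportions - p.true))
print(results)
```

The abs.error column will typically shrink as the number of trials grows, although any single run can fluctuate.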

2 Generating Random Numbers in R

One extremely useful feature of R is its ability to draw random numbers from a wide variety of probability distributions. For instance, let's say that we wish to pick a random number from the set {1, 2, 3, 4} uniformly (i.e. each number is equally likely to be picked). We can do this using the sample() function.

[1] 1
[1] 1
[1] 4
[1] 1

Looking at these examples, however, it is not immediately obvious that each number was equally likely to be picked (in fact, there is only about a 10% chance that 2 is never picked in 8 repeated uniform drawings from {1, 2, 3, 4}). To see whether sample() is actually picking the numbers uniformly at random, we can ask it to simulate this random drawing 1,000,000 times and make a histogram to measure the relative frequency with which each number is picked. The following code does just that and produces the image in Figure 1.

> x <- sample(1:4, size = 1000000, replace = TRUE)
> hist(x, breaks = seq(0, 4, by = 1), freq = FALSE, ylim = c(0, 0.3))
> abline(h = 0.25, col = 'red')

Figure 1: Histogram showing results of repeatedly sampling uniformly from {1, 2, 3, 4}.

There are a few things to observe in the code above. First, when we call the sample() function, we must specify replace = TRUE. If we had left replace = FALSE (which is the default setting), we would have gotten an error:

> sample(1:4, size = 1000000, replace = FALSE)
Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'

After defining the vector x, we create a histogram. We have manually set the breaks argument so that the bins in our histogram go from 0 to 1, 1 to 2, etc. We have also set the freq argument to FALSE, yielding a density histogram. Finally, we have added a red line at height 0.25. It is quite reassuring to see that each number was picked approximately 25% of the time!

Up to this point, we have only seen how sample() can be used to draw from a set uniformly at random. In order to properly simulate the draft lottery process, we need to draw numbers from a non-uniform distribution. This is where the prob argument in sample() comes in handy.

Exercise
1. Just like we did in Figure 1, verify that when we set the argument prob = c(0.5, 0.25, 0.15, 0.1) in sample(1:4, replace = TRUE), the number 1 is picked about 50% of the time, 2 is picked about 25% of the time, 3 is picked about 15% of the time, and 4 is picked about 10% of the time.
2. Create a vector named lottery.probs that contains the probability, listed in Table 1, that each team in the lottery wins. Pass this vector as the prob argument to sample() to simulate the winner of the lottery 1,000,000 times. Make a histogram similar to Figure 1 based on these results.
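The earlier remark that 2 goes unpicked in 8 uniform draws only about 10% of the time can be checked in two ways: exactly, since each draw misses 2 with probability 3/4, and by simulation. The sketch below does both. The seed, the number of simulated runs, and the names p.no.two and no.two are assumptions for illustration.

```r
# Exact value: each of the 8 independent draws avoids 2 with probability 3/4
p.no.two <- (3 / 4)^8
print(p.no.two)  # about 0.1001

# Simulation check: repeat the 8-draw experiment many times
set.seed(1)      # arbitrary seed for reproducibility
n.sims <- 100000
no.two <- replicate(n.sims, {
  draws <- sample(1:4, size = 8, replace = TRUE)
  !any(draws == 2)   # TRUE when 2 never appears among the 8 draws
})

# Proportion of simulated experiments in which 2 was never picked
print(mean(no.two))
```

The simulated proportion should land close to the exact value, in line with the frequency interpretation adopted in the introduction.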

3 Beyond the first pick

Admittedly, the last exercise was a bit anticlimactic, in light of our frequentist interpretation of probability. Using just the values in Table 1, however, it is not trivial to compute the probability that a team gets the 2nd pick in the draft. In order to simulate the full draft order we'll need to do a bit more work.

The actual lottery is performed as follows: teams are assigned a set of four-number combinations from {1, 2, ..., 14}, with the team with the worst record receiving 250 combinations, the team with the second-worst record receiving 199 combinations, and so on. To determine the first pick, four balls are selected uniformly at random, without replacement, from a bin of balls labelled 1, 2, ..., 14. The team that holds the resulting combination is awarded the first pick. Then, the process of drawing four balls from the bin is repeated and the team with the resulting combination is awarded the second pick. Since no team can be awarded multiple picks through the lottery, if the team with the first pick also owns the second four-number combination drawn, the process of drawing four balls is repeated until we get a combination which does not belong to the team with the first pick. A similar procedure is followed to award the third pick. After the first three picks have been awarded, the remainder of the draft order is set by team record.

Rather than simulate the actual process of drawing four balls and looking up which teams own which combinations, we will simulate an equivalent process. In particular, we will use sample() to pick a number from {1, 2, ..., 14} with the probabilities listed in Table 1 to determine which team gets the first pick. We'll then pick another number from {1, 2, ..., 14} with the same probabilities. If the two numbers are equal, we will keep drawing until we get a new number to determine who gets the second pick. In order to program the process of repeatedly drawing until we get a new number, we need to use a while() loop.
A while() loop consists of two parts: a logical condition and a block of code. The loop starts by checking the condition and, if it is TRUE, executes the block of code. It then re-checks the condition and executes the block again, repeating until the condition is no longer TRUE. Here's a really basic example of a while() loop:

> x <- 0
> while (x < 5) {
+   print(paste0("x = ", x))
+   x <- x + 1
+ }
[1] "x = 0"
[1] "x = 1"
[1] "x = 2"
[1] "x = 3"
[1] "x = 4"
> x
[1] 5

In this example, the loop checks to see whether or not x is less than 5. If x < 5, then we print the value of x, add one to x, and check the condition again. From the printed statements, we see that the block of code within the loop (i.e. between the curly braces) is executed until x = 5.

Note that a while() loop will continue executing the block of code until the condition is no longer TRUE, meaning that there is a potential for a while() loop to continue indefinitely. Such infinite loops are highly problematic and care must be taken to avoid them. If you find yourself stuck in an infinite loop, you should halt the execution with the ESC key. To avoid getting stuck in an infinite loop, you need to make sure that you haven't used a condition that is always TRUE. Additionally, you need to make sure that the block of code being executed updates the expression or quantities being checked by the condition. For instance, in the above example, if we had neglected to include x <- x + 1, then there would be no way for the condition x < 5 to fail.

To simulate awarding the first and second picks of the draft, we can do

> first.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
> second.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
> while (second.pick == first.pick) {
+   print("first.pick = second.pick! need to re-draw for 2nd pick")
+   second.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
+ }
> first.pick
> second.pick
[1] 4

In this example, we first drew the first and second picks and stored their values as first.pick and second.pick. We then wrote a while() loop to re-draw the second pick if necessary. Notice that in the while() loop we included a print statement. This lets us know that we have started executing the block of code contained in the while() loop.
The fact that nothing was printed when we executed the code in this example means that, in this case, we did not have to re-draw for the second pick (we confirm this at the end of the example).

Exercise
1. Execute the code in the example above several times. Keep track of the number of times that you had to re-draw the second pick. Hint: It'd be helpful to write this code in a script. Then you can select all of the lines and execute them at once with Command+Enter on a Mac or Control+R on Windows.
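One way to guard against the infinite loops discussed above, and to count re-draws automatically, is to keep a counter inside the loop and add a cap to the condition. The following is a minimal sketch under stated assumptions: it uses the four toy weights from the earlier sample(1:4, ...) exercise rather than the lottery table, and max.redraws and n.redraws are hypothetical names introduced here.

```r
# Toy weights from the earlier sample(1:4, ...) exercise, not the lottery table
toy.probs <- c(0.5, 0.25, 0.15, 0.1)
max.redraws <- 1000   # hypothetical safety cap, an assumption of this sketch

set.seed(1)           # arbitrary seed so the run is reproducible
first.pick <- sample(1:4, size = 1, prob = toy.probs)
second.pick <- sample(1:4, size = 1, prob = toy.probs)
n.redraws <- 0        # number of times the second pick had to be re-drawn

# Re-draw while the two picks clash, but never more than max.redraws times
while (second.pick == first.pick && n.redraws < max.redraws) {
  second.pick <- sample(1:4, size = 1, prob = toy.probs)
  n.redraws <- n.redraws + 1
}

# If the cap was hit with the picks still equal, fail loudly
if (second.pick == first.pick) {
  stop("gave up after ", max.redraws, " re-draws")
}

print(c(first = first.pick, second = second.pick, redraws = n.redraws))
```

Because the counter is updated inside the block and also appears in the condition, the loop is guaranteed to terminate even if the picks keep clashing.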

The previous exercise had you execute a block of code several times by hand. This process, however, is not scalable, especially if you wish to simulate drawing the first two picks 1,000,000 times. This brings us to for() loops. Basically, a for() loop allows us to execute a block of code a set number of times. Like a while() loop, a for() loop consists of two parts: a vector of iterators and a block of code. For each iterator in the vector, the loop will execute the block of code.

> for (i in 1:4) {
+   first.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
+   print(first.pick)
+ }
[1] 2
[1] 8

In this example, we have drawn the first pick of the draft 4 separate times, each time printing the result. Typically, when we use a for() loop to simulate a random process, we'd like to save the results in a matrix or data.frame. In this case, we'll use a matrix:

> picks.matrix <- matrix(nrow = 5, ncol = 2, dimnames = list(c(), c("First Pick", "Second Pick")))
> for (i in 1:5) {
+   first.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
+   second.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
+   while (second.pick == first.pick) {
+     second.pick <- sample(1:14, size = 1, replace = TRUE, prob = lottery.probs)
+   }
+   picks.matrix[i, "First Pick"] <- first.pick
+   picks.matrix[i, "Second Pick"] <- second.pick
+ }

In this example, we first created a matrix picks.matrix to store the results from our simulation. Then we ran the code we wrote earlier to simulate the first two picks (note that we removed the print() statement from the while() loop for ease of presentation). The resulting picks were:

     First Pick Second Pick
[1,]          2          14
[2,]          2           1
[3,]          6           1
[4,]          1           2
[5,]          4           1

We are now equipped to answer the following question: what is the probability that team 1 gets the second pick of the draft? Above, we see that team 1 received the second pick 3 times out of 5 (i.e. 60% of the time). Unfortunately, 5 simulations are nowhere near enough to determine this probability accurately. We need, instead, to do something like 1,000,000 simulations, and in that case we cannot rely on printed output to find the probability. Instead, we can do the following:

> mean(picks.matrix[, "Second Pick"] == 1)
[1] 0.6

This line of code is doing several things. First, the expression picks.matrix[, "Second Pick"] == 1 creates a logical vector. R treats TRUE as 1 and FALSE as 0 in arithmetic, so when we pass this vector to mean(), we are simply computing the proportion of TRUEs. This is precisely the proportion of simulations in which the second pick went to team 1.

Exercises
1. Modify the code in the example above to simulate drawing the first two draft picks 1,000,000 times.
2. For each team, compute the probability that it is awarded the second pick in the draft.

At this point, you may be wondering how we can modify our code to answer questions such as "What is the probability that team 1 does not get either the first pick or the second pick in the draft?" and "What is the probability that team 5 gets either the first pick or the second pick?" To answer these, we need logical operators like AND and OR. Indeed, the first probability is the proportion of times that picks.matrix[, "First Pick"] != 1 AND picks.matrix[, "Second Pick"] != 1, while the second probability is the proportion of times that picks.matrix[, "First Pick"] == 5 OR picks.matrix[, "Second Pick"] == 5. In R, the AND operator is denoted & and the OR operator is denoted |. We can use them as follows:

> mean(picks.matrix[, "Second Pick"] != 1 & picks.matrix[, "First Pick"] != 1)
> mean(picks.matrix[, "Second Pick"] == 5 | picks.matrix[, "First Pick"] == 5)
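To see & and | at work on a small case, the sketch below rebuilds the five simulated rows printed earlier in this section and evaluates two compound conditions element by element. The query about team 2 is a hypothetical example, chosen because team 2 appears in these five rows; it is not one of the questions posed in the text. The helper names first, second, neither.is.1 and either.is.2 are likewise assumptions.

```r
# The five simulated rows shown earlier, entered by hand
picks.matrix <- matrix(c(2, 14,
                         2,  1,
                         6,  1,
                         1,  2,
                         4,  1),
                       nrow = 5, ncol = 2, byrow = TRUE,
                       dimnames = list(NULL, c("First Pick", "Second Pick")))

first  <- picks.matrix[, "First Pick"]
second <- picks.matrix[, "Second Pick"]

# AND: TRUE only in rows where team 1 got neither pick
neither.is.1 <- first != 1 & second != 1
print(neither.is.1)        # TRUE FALSE FALSE FALSE FALSE
print(mean(neither.is.1))  # 0.2

# OR: TRUE in rows where team 2 got the first or the second pick
either.is.2 <- first == 2 | second == 2
print(either.is.2)         # TRUE TRUE FALSE TRUE FALSE
print(mean(either.is.2))   # 0.6
```

Both operators work element by element on the two logical vectors, so the result has one entry per simulated draft, and mean() then gives the proportion of drafts satisfying the compound condition.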

4 The first 3 picks

Up to this point, we've only focused on the first two picks of the draft. In order to simulate drawing the third pick, we can add to the code we've written before. First, we can create a new variable called third.pick and initialize it like we did with second.pick. We can then go ahead and re-draw second.pick as necessary until we know that first.pick != second.pick. We can then write another while() loop to re-draw the third pick as necessary. The condition in this loop, however, is a little more complex than in the previous loop. In particular, we need to re-draw third.pick if third.pick == first.pick OR third.pick == second.pick. We can add the OR operator to our condition with the | symbol:

> while (third.pick == first.pick | third.pick == second.pick) { ... }

Exercise
1. Fill in the ... in the above example with the code necessary to re-draw third.pick. At this point, you should have a single block of code that will simulate drawing the first three picks of the draft.
2. Wrap this block of code in a for() loop and simulate drawing the first three picks 100 times. Be sure to save the resulting picks in a matrix like the one we created above.
3. Using this matrix, answer the following questions:
(a) What is the probability that team 1 gets the second or third pick?
(b) What is the probability that team 14 gets one of the top three picks?
(c) What is the probability that team 1 does not get any of the top 3 picks?
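As a possible cross-check on the while()-loop simulation (not a substitute for the exercises), the sketch below relies on two facts. First, when sample() is called with replace = FALSE and a prob vector, R applies the weights sequentially to the items that remain, which matches the "re-draw until we get a new team" rule. Second, the exact chance that team j gets the second pick is the sum, over every other team i, of p_i * p_j / (1 - p_i). The names top3 and exact.second, the seed, and the 20,000 runs are assumptions made for illustration.

```r
# Probabilities from Table 1, teams 1 through 14
lottery.probs <- c(0.250, 0.199, 0.156, 0.119, 0.088, 0.063, 0.043,
                   0.028, 0.017, 0.011, 0.008, 0.007, 0.006, 0.005)

set.seed(42)      # arbitrary seed for reproducibility
n.sims <- 20000   # assumed number of simulated lotteries

# Each call draws three distinct teams, weighted by lottery.probs;
# replicate() returns a 3 x n.sims matrix, so transpose to one row per lottery
top3 <- t(replicate(n.sims,
                    sample(1:14, size = 3, replace = FALSE, prob = lottery.probs)))
colnames(top3) <- c("First Pick", "Second Pick", "Third Pick")

# Exact second-pick probability for each team j:
# sum over i != j of P(i is first) * P(j is drawn next, given i is excluded)
exact.second <- sapply(1:14, function(j) {
  sum(lottery.probs[-j] * lottery.probs[j] / (1 - lottery.probs[-j]))
})

# Compare simulated and exact values for team 1
print(mean(top3[, "Second Pick"] == 1))
print(exact.second[1])
```

If the loop-based simulation from the exercises is correct, its estimates should agree with both numbers up to simulation noise.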