Lecture 7. SchemeList, finish up; Universal Hashing introduction

CS 16, February 24, 2010

foldleft

    #!/usr/bin/python
    def foldleft(func, slist, init):
        """
        foldleft: (<a> * <b> -> <a>) * (<b> SchemeList) * <a> -> <a>
        Consumes: a function of two parameters, of types <a> and <b>,
                  returning type <a>; a SchemeList whose entries are all
                  of type <b>; and a value of type <a>
        Produces: a value of type <a>
        Purpose: apply func over slist left to right, at first with init
                 as the left arg, and subsequently with the previous
                 result as the left arg.
        """
        if slist.isempty():
            return init
        else:
            val = func(init, slist.first())
            return foldleft(func, slist.rest(), val)
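To try foldleft outside the course environment, the following is a minimal sketch. The `SchemeList` class here is a hypothetical stand-in written for this example (the course's real class may differ), and `from_python` is a helper invented for convenience. The second fold is chosen so that the result depends on the order of application, which makes the left-to-right behaviour visible.

```python
class SchemeList:
    """Hypothetical minimal stand-in for the course's SchemeList class."""

    def __init__(self, cell=None):
        self._cell = cell  # None for the empty list, else (first, rest)

    @staticmethod
    def empty():
        return SchemeList()

    @staticmethod
    def makeCons(item, slist):
        return SchemeList((item, slist))

    def isempty(self):
        return self._cell is None

    def first(self):
        return self._cell[0]

    def rest(self):
        return self._cell[1]


def from_python(items):
    """Hypothetical helper: build a SchemeList from a Python list."""
    result = SchemeList.empty()
    for item in reversed(items):
        result = SchemeList.makeCons(item, result)
    return result


def foldleft(func, slist, init):
    """Same recursion as on the slide: the accumulator is the left argument."""
    if slist.isempty():
        return init
    return foldleft(func, slist.rest(), func(init, slist.first()))


nums = from_python([3, 1, 4, 1, 5])
total = foldleft(lambda acc, x: acc + x, nums, 0)        # ((((0+3)+1)+4)+1)+5
digits = foldleft(lambda acc, x: acc * 10 + x, nums, 0)  # order-sensitive fold
print(total, digits)
```

Because the accumulator starts at init and absorbs the entries from the front of the list, the second fold reads the entries off as decimal digits in their original order.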

foldleft, applied

    #!/usr/bin/python
    def length(slist):
        # The running count is the left argument; each entry adds 1.
        return foldleft(lambda x, y: x + 1, slist, 0)

    def contains(slist, item):
        # x records whether item has been seen so far.
        return foldleft(lambda x, y: x or (y == item), slist, False)

    def list_max(slist):
        <fill in here to test your understanding>

    def odd_count(slist):
        <fill in here to test your understanding>

Analysis of foldleft

If the list has n items, and the function being folded takes time T, then the total time is $O(nT)$.

Reason: let $F(n)$ be the time taken by foldleft on a list of size n. Then $F(0) = C$ and $F(n) \le K + T + F(n-1)$ for $n \ge 1$, for some constants C and K.
Use induction to conclude $F(n) \le MnT$ for $n \ge 1$, for some constant M.

n.b.: it's possible that T may depend on n!
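One way to guess the constant M before running the induction is to unroll the recurrence. The sketch below assumes $T \ge 1$ (time measured in units where one call to the folded function costs at least one step) and $n \ge 1$; neither assumption is stated on the slide.

```latex
\begin{align*}
F(n) &\le (K + T) + F(n-1) \\
     &\le 2(K + T) + F(n-2) \\
     &\;\;\vdots \\
     &\le n(K + T) + F(0) = nK + nT + C \\
     &\le nKT + nT + CnT = (K + 1 + C)\,nT .
\end{align*}
```

Under those assumptions $M = K + C + 1$ works, and the induction then only has to confirm this bound.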

reverse

    def reverse(slist):
        """
        reverse: SchemeList -> SchemeList
        Consumes: a SchemeList
        Produces: a SchemeList
        Purpose: to reverse slist
        """
        if slist.isempty():
            return slist
        else:
            r = reverse(slist.rest())
            return append(r, slist.first())

    def append(slist, item):
        """
        append: SchemeList * any -> SchemeList
        Consumes: a SchemeList, a value
        Produces: a SchemeList
        Purpose: add item to the end of slist
        """
        if slist.isempty():
            return SchemeList.makeCons(item, slist)
        else:
            a = append(slist.rest(), item)
            return SchemeList.makeCons(slist.first(), a)

reverse

Let $A(n)$ be the running time of append on a list of length n.
By the usual analysis, $A(n) \le cn$, i.e., it's $O(n \mapsto n)$.

Let $R(n)$ be the running time of reverse on a list of length n.
Then $R(0) = C$ and $R(n) \le K + A(n-1) + R(n-1)$ for $n \ge 1$.

Plug-n-chug to see that $R(n) \le nK + c\,((n-1) + (n-2) + \dots + 2 + 1) + C$ for $n > 1$.
Then use induction to prove that (omitted).

Conclude that $R(n)$ is $O(n \mapsto n^2)$.
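The step from the unrolled sum to the quadratic bound can be written out as follows. This is a sketch using the same constants K, c, and C as the recurrence $R(0) = C$, $R(n) \le K + A(n-1) + R(n-1)$, with $A(n) \le cn$; it relies only on the arithmetic-series identity $1 + 2 + \dots + (n-1) = n(n-1)/2$.

```latex
\begin{align*}
R(n) &\le nK + c\bigl((n-1) + (n-2) + \dots + 1 + 0\bigr) + C \\
     &= nK + c\,\frac{n(n-1)}{2} + C \\
     &\le K n^2 + \frac{c}{2}\, n^2 + C n^2
      = \Bigl(K + \frac{c}{2} + C\Bigr) n^2 \qquad (n \ge 1).
\end{align*}
```

The quadratic term comes entirely from the repeated calls to append, each of which walks the whole partial result.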

fast reverse

    #!/usr/bin/python
    def fast_reverse(slist):
        """
        fast_reverse: SchemeList -> SchemeList
        Consumes: a SchemeList
        Produces: a SchemeList
        Purpose: to reverse slist
        """
        return fr_helper(slist, SchemeList.empty())

    def fr_helper(slist, partial):
        """
        Consumes: two SchemeLists
        Produces: a SchemeList
        Purpose: helper function to fast_reverse
        """
        if slist.isempty():
            return partial
        else:
            return fr_helper(slist.rest(),
                             SchemeList.makeCons(slist.first(), partial))

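Elegant, high-level fast reverse

    #!/usr/bin/python
    def fast_reverse(slist):
        """
        fast_reverse: SchemeList -> SchemeList
        Consumes: a SchemeList
        Produces: a SchemeList
        Purpose: to reverse slist
        """
        # foldleft calls func(accumulator, entry), while makeCons takes
        # (entry, list), so the two arguments are swapped here.
        return foldleft(lambda partial, item: SchemeList.makeCons(item, partial),
                        slist, SchemeList.empty())

Notice that all recursion is hidden in the folding!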
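A runnable sketch of the fold-based reverse follows. The `SchemeList` class is again a hypothetical stand-in for the course's class, and `to_python` is a helper invented here for printing. Since foldleft passes the accumulator as the left argument and makeCons expects the new entry first, this sketch wraps makeCons in a lambda that swaps the two.

```python
class SchemeList:
    """Hypothetical minimal stand-in for the course's SchemeList class."""

    def __init__(self, cell=None):
        self._cell = cell  # None for the empty list, else (first, rest)

    @staticmethod
    def empty():
        return SchemeList()

    @staticmethod
    def makeCons(item, slist):
        return SchemeList((item, slist))

    def isempty(self):
        return self._cell is None

    def first(self):
        return self._cell[0]

    def rest(self):
        return self._cell[1]


def foldleft(func, slist, init):
    if slist.isempty():
        return init
    return foldleft(func, slist.rest(), func(init, slist.first()))


def fast_reverse(slist):
    # Each step conses the next entry onto the front of the partial result.
    return foldleft(lambda partial, item: SchemeList.makeCons(item, partial),
                    slist, SchemeList.empty())


def to_python(slist):
    """Hypothetical helper: copy a SchemeList into a Python list."""
    out = []
    while not slist.isempty():
        out.append(slist.first())
        slist = slist.rest()
    return out


abc = SchemeList.makeCons(
    "a", SchemeList.makeCons("b", SchemeList.makeCons("c", SchemeList.empty())))
reversed_abc = to_python(fast_reverse(abc))
print(reversed_abc)
```

Each fold step does a constant amount of work, so this version does one makeCons per entry instead of one append per entry.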

Hashing

Goal: to implement the Dictionary ADT so that insert, find, and delete are all $O(n \mapsto 1)$.

Application: maintain a table of about 250 IP addresses (each of which consists of four bytes, as in 128.148.37.3; the individual numbers are between 0 and 255).

Could implement Dictionary as a SchemeList of (key, value) pairs:
    insert is $O(n \mapsto 1)$
    find and delete are $O(n \mapsto n)$
    this is OK for very small dictionaries

Hashing IP addresses

Idea 1: make an array where the index is a 32-bit integer corresponding to the four bytes of the IP address. VERY large, very empty. Not good.

Idea 2: make a table of 256 entries, and use the last byte of the IP address as the index.
    Problem: collisions.
    Solution: each table entry is a SchemeList!
    Problem: if all IP addresses fall in one bucket, then find is $O(n \mapsto n)$.

If we use any small table, this is bound to happen in some bad case.
We'll show a way to guarantee that such bad performance is extremely unlikely.

Read pages 33-36 of Dasgupta et al.
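Idea 2 can be sketched in a few lines of Python. This is an illustration only: it uses Python lists as buckets instead of SchemeLists, the function names and host labels are made up for the example, and `insert` scans the bucket so that a repeated key overwrites its old value (a simplification that costs time proportional to the bucket size).

```python
NUM_BUCKETS = 256


def last_byte(ip):
    """Hash function from Idea 2: the last of the four bytes."""
    return int(ip.split(".")[3])


def make_table():
    return [[] for _ in range(NUM_BUCKETS)]


def insert(table, ip, value):
    bucket = table[last_byte(ip)]
    for i, (key, _) in enumerate(bucket):
        if key == ip:               # key already present: overwrite
            bucket[i] = (ip, value)
            return
    bucket.append((ip, value))


def find(table, ip):
    # Only the one bucket selected by the hash is searched.
    for key, value in table[last_byte(ip)]:
        if key == ip:
            return value
    return None


table = make_table()
insert(table, "128.148.37.3", "host-a")
insert(table, "10.0.0.3", "host-b")      # collides: same last byte
insert(table, "128.148.37.4", "host-c")
bucket_sizes = (len(table[3]), len(table[4]))
print(bucket_sizes, find(table, "10.0.0.3"))
```

The two addresses ending in 3 share a bucket, which is exactly the collision case: if every address ended in the same byte, find would degrade to a linear scan of one long bucket.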

Introduction to Probability

Goal: to make a mathematical model of everyday reasoning about how likely things are.

Clever idea: replace the idea of a single coin-flip with a set containing two possible outcomes, and the probability of each. Reason about this set!

Probability Space

A probability space is a finite set S, together with a function $p : S \to \mathbb{R}$, satisfying:
    $0 \le p(s) \le 1$ for every $s \in S$
    $\sum_{s \in S} p(s) = 1$

Example: $S_1 = \{H, T\}$ with $p(H) = 0.5$ and $p(T) = 0.5$. This is a model of a fair coin toss.

Probability Space, II

Example 2: $S_2 = \{1, 2, 3, 4, 5, 6\}$ and $p(i) = 1/6$ for $i = 1, 2, \dots, 6$. This is a model of a fair roll of a die.
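The two conditions in the definition can be checked mechanically. The sketch below represents a probability space as a Python dict from outcomes to probabilities (a representation chosen for this example, not taken from the slides) and uses exact fractions so that the sum is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def is_probability_space(p):
    """Check both conditions: each p(s) lies in [0, 1], and they sum to 1."""
    in_range = all(0 <= value <= 1 for value in p.values())
    return in_range and sum(p.values()) == 1


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {i: Fraction(1, 6) for i in range(1, 7)}
not_a_space = {"H": Fraction(7, 10), "T": Fraction(7, 10)}  # sums to 7/5

print(is_probability_space(coin),
      is_probability_space(die),
      is_probability_space(not_a_space))
```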

Event

An event is a subset E of a probability space S.

For the coin-toss example, the possible events are {}, {H}, {T}, and {H, T}. These correspond to:
    The coin doesn't land on either face
    The coin comes up heads
    The coin comes up tails
    The coin comes up either heads or tails

The probability of an event E is $\Pr\{E\} = \sum_{s \in E} p(s)$.

In the examples, $\Pr\{\} = 0$, $\Pr\{H\} = 0.5$, $\Pr\{T\} = 0.5$, and $\Pr\{H, T\} = 1.0$. These correspond to the notions that the coin never lands on its edge, heads and tails are equally likely, and the coin always comes up heads or tails.
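The definition of the probability of an event translates directly into a sum over the outcomes in the subset. The sketch below enumerates all four events of the coin-toss space; the dict representation and the name `pr` are choices made for this example.

```python
from fractions import Fraction
from itertools import combinations

p = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def pr(event):
    """Probability of an event: the sum of p(s) over the outcomes s in it."""
    return sum((p[s] for s in event), Fraction(0))


# Every subset of the outcome set is an event.
outcomes = sorted(p)
events = [frozenset(combo)
          for size in range(len(outcomes) + 1)
          for combo in combinations(outcomes, size)]

for event in events:
    print(sorted(event), pr(event))
```

The empty event gets probability 0 because the sum has no terms, and the full set gets probability 1 because the probabilities of all outcomes sum to 1.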

Random Variable

A random variable (or RV) on a probability space S is a function $X : S \to \mathbb{R}$.

For the coin-toss example, if I agree to pay Ben $100 if the coin comes up heads, but $0 if it comes up tails, we can model Ben's winnings with a random variable: X(H) = 100; X(T) = 0.

Notice that X is just a function! It's not a variable, and it's not random!

Next class: expected value of a random variable.
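In code, the point that X is "just a function" is literal. The sketch below defines X as an ordinary Python function on the coin-toss outcomes, and, as an illustration that goes slightly beyond the slide, computes the probability of the event "Ben wins a given amount" by collecting the outcomes that X maps to that amount. The name `pr_X_equals` is made up for this example.

```python
from fractions import Fraction

p = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def X(outcome):
    """Ben's winnings in dollars: 100 for heads, 0 for tails."""
    return 100 if outcome == "H" else 0


def pr_X_equals(value):
    """Probability of the event {s in S : X(s) == value}."""
    return sum((p[s] for s in p if X(s) == value), Fraction(0))


winnings = {s: X(s) for s in p}
print(winnings, pr_X_equals(100), pr_X_equals(0))
```

Calling X on the same outcome always returns the same number; all of the randomness lives in the probability space, not in X.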