A Quantitative Measure of Relevance Based on Kelly Gambling Theory

Size: px

Start display at page:

Download "A Quantitative Measure of Relevance Based on Kelly Gambling Theory"

Gillian Caldwell
6 years ago
Views:

1 A Quantitative Measure of Relevance Based on Kelly Gambling Theory Mathias Winther Madsen Institute for Logic, Language, and Computation University of Amsterdam

2 PLAN Why How Examples

3 Why

4 Why

5 How

6 Why not use Shannon information H(X) == E log Pr(X == x) Claude Shannon (96 200)

7 Why not use Shannon information Information Content === Prior Uncertainty (cf. Klir 2008; Shannon 948) Posterior Uncertainty

8 Why not use Shannon information Pr(X == ) == 0.5 Pr(X == 2) == 0.9 What is the value of X Pr(X == 3) == 0.23 Pr(X == 4) == 0.2 H(X) == E log Pr(X == x) Pr(X == 5) == 0.22 == 2.3

9 Why not use Shannon information 0 Pr(X == ) == 0.5 Pr(X == 2) == 0.9 Is X == 2 0 Is X == 3 0 Pr(X == 3) == 0.23 Is X in {4,5} Is X == 5 Expected number of questions: == 2.34 Pr(X == 4) == Pr(X == 5) == 0.22

10 What color are my socks H(p) == p log p == 6.53 bits of entropy.

11 How

12 Why not use value-of-information $! Value-ofInformation! === $$ $ $$ Posterior Expectation Prior Expectation

13 Why not use value-of-information Rules: Your capital can be distributed freely Bets on the actual outcome are returned twofold Bets on all other outcomes are lost

14 Why not use value-of-information Expected payoff Optimal Strategy: (Everything on Heads) Degenerate Gambling (Everything on Tails)

15 Why not use value-of-information Capital Probability Rounds Rate of return (R)

16 Why not use value-of-information Rate of return: Ri Probability Capital at time i + == Capital at time i Long-run behavior: E[ R R2 R3 Rn ] Rate of return (R)

17 Why not use value-of-information Rate of return: Ri Probability Capital at time i + == Capital at time i Long-run behavior: E[ R R2 R3 Rn ] Converges to 0 in probability as n Rate of return (R)

18 Optimal reinvestment Daniel Bernoulli ( ) John Larry Kelly, Jr. ( )

19 Optimal reinvestment Doubling rate: Capital at time i + Wi == log Capital at time i (so R = 2W)

20 Optimal reinvestment Doubling rate: Capital at time i + Wi == log Capital at time i (so R = 2W) Long-run behavior: E[ R R2 R3 Rn ] == E[2W + W2 + W3 + + Wn] == 2E[W + W2 + W3 + + Wn] 2nE[W] for n

21 Optimal reinvestment Logarithmic expectation E[W] == p log bo is maximized by proportional gambling (b* == p). Arithmetic expectation E[R] == pbo is maximized by degenerate gambling

22 Measuring relevant information $! Amount of relevant information! === $$ $ $$ Posterior expected doubling rate Prior expected doubling rate

23 Measuring relevant information Definition (Relevant Information): For an agent with utility function u, the amount of relevant information contained in the message Y == y is K(y) == maxs Pr(x y) log u(s, x) Posterior optimal doubling rate maxs Pr(x) log u(s, x) Prior optimal doubling rate

24 Measuring relevant information K(y) == maxs Pr(x y) log u(s, x) maxs Pr(x) log u(s, x) Expected relevant information is non-negative. Relevant information equals the maximal fraction of future gains you can pay for a piece of information without loss. When u has the form u(s, x) == v(x) s(x) for some non-negative function v, relevant information equals Shannon information.

25 Example: Code-breaking

26 Example: Code-breaking Entropy: H=4 Accumulated information: I(X; Y) == 0

27 Example: Code-breaking bit! Entropy: H=3 Accumulated information: I(X; Y) ==

28 Example: Code-breaking 0 bit! Entropy: H=2 Accumulated information: I(X; Y) == 2

29 Example: Code-breaking 0 bit! Entropy: H= Accumulated information: I(X; Y) == 3

30 Example: Code-breaking 0 bit! Entropy: H=0 Accumulated information: I(X; Y) == 4

31 Example: Code-breaking 0 bit bit bit bit Entropy: H=0 Accumulated information: I(X; Y) == 4

32 Example: Code-breaking Rules: You can invest a fraction f of your capital in the guessing game If you guess the correct code, you get your investment back 6-fold: u == f + 6 f Otherwise, you lose it: u == f 5 W(f) == log( f ) + log( f + 6f ) 6 6

33 Example: Code-breaking Optimal strategy: f * == 0 Optimal doubling rate: W(f *) == W(f) == log( f ) + log( f + 6f ) 6 6

34 Example: Code-breaking 0.04 bits Optimal strategy: f * == /5 Optimal doubling rate: W(f *) == log( f ) + log( f + 6f ) W(f) == 8 8

35 Example: Code-breaking bits Optimal strategy: f * == 3/5 Optimal doubling rate: W(f *) == log( f ) + log( f + 6f ) W(f) == 4 4

36 Example: Code-breaking bits Optimal strategy: f * == 7/5 Optimal doubling rate: W(f *) ==.05 log( f ) + log( f + 6f ) W(f) == 2 2

37 Example: Code-breaking bits Optimal strategy: f * == Optimal doubling rate: W(f *) == log( f ) + log( f + 6f ) W(f) ==

38 Example: Code-breaking Raw information (drop in entropy) Relevant information (increase in doubling rate)

39 Example: Randomization

40 Example: Randomization def choose(): if flip(): if flip(): return ROCK /3, /3, /3 else: return PAPER else: return SCISSORS /2, /4, /4

41 Example: Randomization Rules: 2 You () and the adversary (2) both bet $ You move first The winner takes the whole pool W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

42 Example: Randomization Best accessible strategy: p* == (, 0, 0) Doubling rate: W(p*) == W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

43 Example: Randomization Best accessible strategy: p* == (/2, /2, 0) Doubling rate: W(p*) ==.00 W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

44 Example: Randomization Best accessible strategy: p* == (2/4, /4, /4) Doubling rate: W(p*) == 0.42 W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

45 Example: Randomization Best accessible strategy: p* == (3/8, 3/8, 2/8) Doubling rate: W(p*) == 0.9 W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

46 Example: Randomization Best accessible strategy: p* == (6/6, 5/6, 5/6) Doubling rate: W(p*) == 0.09 W(p) == log min { p + 2 p2, p2 + 2 p3, p3 + 2 p }

47 Example: Randomization Coin flips Distribution Doubling rate 0 (, 0, 0) (/2, /2, 0) (/2, /4, /4) (3/8, 3/8, 2/8) (6/6, 5/6, 5/6) (/3, /3, /3)

49 January: Project course in information theory ith w w No E MOR N! O N SHAN Day : Uncertainty and Inference Probability theory: Semantics and expressivity Random variables Generative Bayesian models stochastic processes Uncertain and information: Uncertainty as cost The Hartley measure Shannon information content and entropy Huffman coding Day 2: Counting Typical Sequences Day 3: Guessing and Gambling Evidence, likelihood ratios, competitive prediction Kullback-Leibler divergence Examples of diverging stochastic models Expressivity and the bias/variance tradeoffs. Doubling rates and proportional betting Card color prediction Day 4: Asking Questions and Engineering Answers Questions and answers (or experiments and observations) mutual information Coin weighing The maximum entropy principle The channel coding theorem Day 5: Informative Descriptions and Residual Randomness The law of large numbers Typical sequences and the source coding theorem. The practical problem of source coding Kraft s inequality and prefix codes Arithmetic coding Stochastic processes and entropy rates the source coding theorem for stochastic processes Examples Kolmogorov complexity Tests of randomness Asymptotic equivalence of complexity and entropy

Information Theory and Communication

Information Theory and Communication Shannon-Fano-Elias Code and Arithmetic Codes Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/12 Roadmap Examples