Cuckoo Linear Algebra


1 Cuckoo Linear Algebra
Li Zhou (CMU), Dave Andersen (CMU), Mu Li (CMU), Alexander Smola (CMU)

2 Select advertisement
$p(\text{click} \mid \text{user}, \text{query}) = \operatorname{logist}(\langle w, \phi(\text{user}, \text{query}) \rangle)$
- Find the weight vector $w$ (machine learning)
- The features $\phi(\text{user}, \text{query})$ are sparse and high dimensional
- Hard to store

4 Outline
- Sparse Linear Models: sparse vectors, hash kernels
- Cuckoo Linear Algebra: data structure & operations
- Optimization: batch, online, distributed algorithm

5 Linear Models
- Click through rate (CTR) prediction in computational advertising
- Logistic regression: $p(y \mid x) = \frac{1}{1 + \exp(-y \langle w, x \rangle)}$
- Example features:
  query = (datamining)
  user = (statistics, data mining, coffee, australia)
  ad = (kdd, sydney, 2015)
- Cross product: $x = \text{query} \times \text{user} \times \text{ad}$, with sizes 1M, 100M and 10k respectively
- Features may not fit into memory
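
To make the cross product features concrete, here is a minimal Python sketch using the example tokens from the slide. The function names `cross_features` and `p_click`, and the choice of one indicator feature per (query, user, ad) triple, are assumptions for illustration and are not taken from the talk.

```python
import math
from itertools import product

def cross_features(query, user, ad):
    """One indicator feature per (query token, user token, ad token) triple."""
    return {(q, u, a): 1.0 for q, u, a in product(query, user, ad)}

def p_click(w, x, y=+1):
    """p(y | x) = 1 / (1 + exp(-y <w, x>)) for a sparse weight dict w."""
    score = sum(v * w.get(k, 0.0) for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-y * score))

query = ["datamining"]
user = ["statistics", "data mining", "coffee", "australia"]
ad = ["kdd", "sydney", "2015"]

x = cross_features(query, user, ad)
print(len(x))          # 1 * 4 * 3 = 12 active features
print(p_click({}, x))  # 0.5 with an all-zero weight vector
```

Only 12 of the possible triples are active for this one instance, which is why a sparse representation is the natural fit even though the full feature space is far too large to enumerate.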

6 Sparse Logistic Regression
Optimization problem:
$\operatorname*{minimize}_w \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \langle w, x_i \rangle)\right) + \lambda \|w\|_1$
- The sum is the negative log-likelihood
- The $\ell_1$ term is a sparsity inducing prior
- Naive strategy: store $w$ as a sparse vector
- Slightly smarter approach: tokenize keys
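
A minimal sketch of the naive strategy: the weight vector is a Python dict keyed by feature name, and the objective is the sum of logistic losses plus an l1 penalty. The names `dot`, `objective`, `lam` and the toy data are hypothetical, and labels are assumed to be +1 or -1.

```python
import math

def dot(w, x):
    """Inner product <w, x> of two sparse vectors stored as dicts."""
    return sum(v * w.get(k, 0.0) for k, v in x.items())

def objective(w, data, lam):
    """Sum of logistic losses plus lam * ||w||_1; labels y are +1 or -1."""
    loss = sum(math.log1p(math.exp(-y * dot(w, x))) for x, y in data)
    return loss + lam * sum(abs(v) for v in w.values())

# Hypothetical toy data: keys are feature names.
data = [
    ({"query:datamining": 1.0, "user:coffee": 1.0}, +1),
    ({"query:datamining": 1.0, "ad:kdd": 1.0}, -1),
]
w = {"user:coffee": 0.5, "ad:kdd": -0.5}
print(objective(w, data, lam=0.1))
```

Tokenizing keys would replace the string keys with integer ids, which shrinks the keys but requires a dictionary built in a preprocessing pass.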

8 Hash Kernels (Weinberger et al. 09, ICML)
- Project sparse high-dimensional vectors down
- Instance $x \in \mathbb{R}^{N \times (U+1)}$, for example the email "Hey, please mention subtly during your talk that Marianas Labs is hiring. Thanks, Ash"
- Tokens such as "mention" and "mention barney" are mapped to bins h(mention) and h(mention barney)
- Prediction, with a hash $h$ into $m$ bins and a sign hash $\sigma(j) \in \{-1, +1\}$:
$f(x) = \sum_{i=1}^{m} w_i \sum_{j: h(j) = i} x_j \sigma(j) = \sum_j x_j \sigma(j) \, w[h(j)]$
- Dictionary not necessary
- Fixed memory footprint
- Possible collisions
- Fails to recover $\ell_1$ solutions (even after inverting the hash)
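
The hashed prediction can be sketched in a few lines of Python. This is an illustration under stated assumptions: `zlib.crc32` stands in for the hash functions, the seeds 0 and 1 separate the bin hash from the ±1 sign hash, and the names `hash_features` and `predict` are hypothetical.

```python
import zlib

def _hash(key, seed):
    """Deterministic 32-bit hash of a string; crc32 is a stand-in choice."""
    return zlib.crc32(f"{seed}:{key}".encode("utf-8"))

def hash_features(x, n_bins):
    """Project sparse x (a dict) onto n_bins slots: xbar[h(j)] += sign(j) * x[j]."""
    xbar = [0.0] * n_bins
    for key, value in x.items():
        i = _hash(key, 0) % n_bins                       # bin index h(j)
        sign = 1.0 if _hash(key, 1) % 2 == 0 else -1.0   # sign hash
        xbar[i] += sign * value
    return xbar

def predict(w, x):
    """f(x) = sum_i w[i] * xbar[i], with w a dense list of length n_bins."""
    return sum(wi * xi for wi, xi in zip(w, hash_features(x, len(w))))

x = {"mention": 1.0, "mention_barney": 1.0}
print(hash_features(x, n_bins=8))
```

The weight array has a fixed size and no dictionary is needed, but two keys that land in the same bin share one weight, so the original per-feature weights cannot be recovered from `w`.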

13 Wish list for a fast data structure
- Exact storage: solve the l1 problem exactly (inverting the hash doesn't work)
- O(1) random memory access: fast inner products between features and parameters
- Adaptive data structure: tolerates inserts and removals during optimization
- Easy merging: distributed optimization needs this

14 Hash table
- For x, find h(x) and add to the chain
- Long chains (log n)
- Pointer lookup

16 Cuckoo Hash table
Pagh & Rodler (Algorithms 01); Fan, Kaminsky & Andersen (login 13); Li, Andersen, Kaminsky & Freedman (EUROSYS 14)
- For x, find h1(x) and h2(x)
- Fill the first spot if it is available
- Example: the table already holds a, b and c; x goes into its first spot

18 Cuckoo Hash table
- For y, find h1(y) and h2(y)
- Fill the second spot otherwise
- Example: the table holds a, b, c and x; y goes into its second spot

20 Cuckoo Hash table
- For z, find h1(z) and h2(z)
- Both spots are full (one holds b, the other y)
- Evict b

22 Cuckoo Hash table
- z takes over the slot h1(b); b moves to its other slot h2(b)
- Keep on searching until the chain terminates

23 Cuckoo Hash table
- Fast lookup: only two locations per key (a query for b checks h1(b) and h2(b))
- Fast insertion: cuckoo chain ends quickly with high probability
- Trivial removal: delete entry, no bookkeeping
- Performance improvements:
  - Multiple slots per hash
  - Prefetch data in memory
  - Bias towards h1(b) to maximize hit rate (better if hash not too full)
  - Store only small hash of key
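
The insert, lookup and removal steps walked through above can be sketched as a toy Python class. This is not the libcuckoo implementation: it uses one slot per bucket, derives both hash functions from `zlib.crc32` (an assumption), and doubles the table when an eviction chain exceeds the hypothetical `max_kicks` limit.

```python
import zlib

class CuckooTable:
    """Toy cuckoo hash map: every key lives in one of two candidate slots."""

    def __init__(self, n_slots=16, max_kicks=32):
        self.slots = [None] * n_slots   # each slot is None or a (key, value) pair
        self.max_kicks = max_kicks

    def _h(self, key, which):
        # Two hash functions derived from crc32 with different prefixes.
        return zlib.crc32(f"{which}|{key}".encode("utf-8")) % len(self.slots)

    def get(self, key, default=0.0):
        # Lookup inspects at most two locations.
        for which in (1, 2):
            entry = self.slots[self._h(key, which)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def remove(self, key):
        # Removal just clears the slot; there is no chain to repair.
        for which in (1, 2):
            i = self._h(key, which)
            if self.slots[i] is not None and self.slots[i][0] == key:
                self.slots[i] = None
                return True
        return False

    def put(self, key, value):
        candidates = [self._h(key, 1), self._h(key, 2)]
        for i in candidates:            # overwrite if the key is already stored
            if self.slots[i] is not None and self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        for i in candidates:            # first spot if available, else second
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                return
        # Both spots full: evict the occupant and move it to its other slot.
        entry, i = (key, value), candidates[0]
        for _ in range(self.max_kicks):
            entry, self.slots[i] = self.slots[i], entry
            a, b = self._h(entry[0], 1), self._h(entry[0], 2)
            i = b if i == a else a      # the evicted entry's other location
            if self.slots[i] is None:
                self.slots[i] = entry
                return
        self._grow(entry)

    def _grow(self, pending):
        # Chain did not terminate: double the table and reinsert everything.
        old = [e for e in self.slots if e is not None] + [pending]
        self.slots = [None] * (2 * len(self.slots))
        for k, v in old:
            self.put(k, v)

table = CuckooTable(n_slots=4)
for name, value in [("a", 1), ("b", 4), ("c", -1), ("x", 3), ("y", 2), ("z", 10)]:
    table.put(name, value)
print(table.get("z"), table.get("missing"))
```

The default value of 0.0 for missing keys is a deliberate choice for linear algebra: an absent entry behaves like a zero coordinate.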

24 Linear Algebra - Dot product <w, x>
- Iterate over vector
- Find matching elements
- Update sum
- Prefetch avoids cache miss
- Linear time operation
Example:
- First vector: (z, 10), (a, 1), (y, 2), (c, -1), (x, 3), (b, 4)
- Second vector: (z, 3), (c, 1), (d, 4), (x, 2), (k, 5)
- Contributions while iterating over the first vector: z: +30, a: 0, y: 0, c: -1, x: 6, b: 0
- Sum: 35
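
The dot product example can be reproduced with a short Python sketch. A plain dict stands in for the cuckoo hash table here, and which of the two vectors is held in the table is an assumption; the name `sparse_dot` is hypothetical.

```python
def sparse_dot(table, vec):
    """Iterate over vec, look each key up in table, accumulate the products."""
    total = 0.0
    for key, value in vec:
        total += value * table.get(key, 0.0)   # missing keys contribute 0
    return total

# Values from the slide: one vector as a lookup table, the other as (key, value) pairs.
table = {"z": 3, "c": 1, "d": 4, "x": 2, "k": 5}
vec = [("z", 10), ("a", 1), ("y", 2), ("c", -1), ("x", 3), ("b", 4)]
print(sparse_dot(table, vec))   # 30 + 0 + 0 - 1 + 6 + 0 = 35.0
```

Each lookup costs O(1), so the whole product is linear in the number of entries of the vector being iterated.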

31 Linear Algebra - Addition ax + y
- AXPY: identical to inserts; traverse the elements of the sparser vector
- Scaling: invariant under order
Example:
- Stored vector: (z, 10), (a, 1), (y, 2), (c, -1), (x, 3), (b, 4)
- Vector to add: (z, 3), (c, 1), (d, 4), (x, 2), (k, 5)
- First step: (z, 10) becomes (z, 13)
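
A minimal sketch of AXPY and scaling on the same example, again with a dict standing in for the cuckoo table. The names `axpy` and `scale` are hypothetical, and a = 1 is assumed because it matches the (z, 13) result shown on the slide.

```python
def axpy(a, x, y):
    """In place y <- a*x + y; x is the sparser vector, y is the table."""
    for key, value in x.items():
        y[key] = y.get(key, 0.0) + a * value   # same access pattern as an insert
    return y

def scale(a, y):
    """In place y <- a*y; the result does not depend on traversal order."""
    for key in y:
        y[key] *= a
    return y

y = {"z": 10, "a": 1, "y": 2, "c": -1, "x": 3, "b": 4}
x = {"z": 3, "c": 1, "d": 4, "x": 2, "k": 5}
axpy(1, x, y)
print(y["z"], y["d"], y["k"])   # 13 4.0 5.0
```

Keys that were absent from `y` (here d and k) are simply inserted, which is why the operation tolerates a working set that changes during optimization.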

33 Properties
- Speed: O(1) steps for retrieval
- Linear algebra: dot products & multiplication
- Optimization friendly
- Online friendly
- Working set has to be sparse
- No preprocessing needed
- Consistent between machines

35 Batch Learning
- Sparse logistic regression
- Subsampled online advertising dataset (CTR): 8.3 million instances, 100 million features
- Baselines:
  - LibLinear (preprocessing + dense array)
  - Hash Kernel with 20 and 27 bit (1M and 128M bins)

36 Batch Learning
- LibLinear as gold standard
- Measure similarity via Jaccard similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
- Repeat randomized algorithms for better confidence

Method      Dense Array   Cuckoo   Hash Kernel (N=27)   Hash Kernel (N=20)
Accuracy    93.22%        93.23%   93.25%               90.96%

Further rows of the table: Preprocessing (sec), Data Transformation (sec), Training Time (sec), Total Time (sec), Memory Used (GB), Feature Reconstruction.
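
As an illustration of the similarity measure, the sketch below computes the Jaccard similarity between the sets of nonzero features of two weight vectors. Treating A and B as the supports of the learned weights is an assumption, as are the names `support` and `jaccard` and the toy weights.

```python
def support(w, tol=0.0):
    """Set of feature keys whose weight is nonzero (|w| > tol)."""
    return {k for k, v in w.items() if abs(v) > tol}

def jaccard(a, b):
    """J(A, B) = |A intersect B| / |A union B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical weights: a reference solution and a second solver's solution.
reference = {"a": 1.0, "b": -2.0, "c": 0.0}
other = {"a": 0.9, "b": 0.0, "d": 0.1}
print(jaccard(support(reference), support(other)))   # supports {a, b} and {a, d}
```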

37 Online Learning
- Pascal DNA dataset: 50 million strings, string length is 200
- Many substrings, usually ignored (suffix tree, spectrum kernel)
- Follow the regularized leader (FTRL) solver
- Sparse updates
- Results are reported as AUC, Time (sec) and Memory (GB) for Cuckoo, STL Hash, Hash Kernel (N=27) and Dense Array*, once for maximum substring length 16 and once for maximum substring length 14
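
The slide names an FTRL solver with sparse updates but not the exact variant. The sketch below assumes the per-coordinate FTRL-Proximal update of McMahan et al. with l1 and l2 regularization; the class name, the hyperparameter defaults and the use of dicts for the per-coordinate state are all assumptions.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic loss; state is kept per key."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)   # accumulated adjusted gradients
        self.n = defaultdict(float)   # accumulated squared gradients

    def weight(self, key):
        z = self.z.get(key, 0.0)
        if abs(z) <= self.l1:
            return 0.0                # the l1 threshold keeps this weight at zero
        n = self.n.get(key, 0.0)
        step = (self.beta + math.sqrt(n)) / self.alpha + self.l2
        return -(z - math.copysign(self.l1, z)) / step

    def predict(self, x):
        score = sum(v * self.weight(k) for k, v in x.items())
        score = max(min(score, 35.0), -35.0)   # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """One online step on instance x (dict) with label y in {+1, -1}."""
        p = self.predict(x)
        target = 1.0 if y > 0 else 0.0
        for k, v in x.items():        # only coordinates present in x are touched
            g = (p - target) * v
            sigma = (math.sqrt(self.n[k] + g * g) - math.sqrt(self.n[k])) / self.alpha
            self.z[k] += g - sigma * self.weight(k)
            self.n[k] += g * g

model = FTRLProximal()
for _ in range(10):
    model.update({"ACGT": 1.0}, +1)
print(model.weight("ACGT") > 0.0, model.weight("unseen"))
```

Each update reads and writes only the keys that occur in the current instance, so the state tables see the same insert-heavy, sparse access pattern as the weight vector itself.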

38 Online Learning
[Figure: number of features and memory (MB) versus maximum length of substring, for STL Hash, Cuckoo Hash and Dense Array. Annotation: needs much preprocessing.]

39 Distributed Solver
- Server nodes: cuckoo hash
- Worker nodes: data
- Scheduler
- Push and pull between worker nodes and server nodes
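
The push and pull pattern can be sketched in a single process. Everything here is illustrative: a dict stands in for each server's cuckoo hash, keys are assigned to servers by a `zlib.crc32` hash, and the plain gradient step applied on push is an assumption, since the slide does not specify the server-side update.

```python
import zlib

class Server:
    """Holds one shard of the weights; a dict stands in for the cuckoo hash."""

    def __init__(self):
        self.w = {}

    def pull(self, keys):
        return {k: self.w.get(k, 0.0) for k in keys}

    def push(self, grads, lr):
        for k, g in grads.items():
            self.w[k] = self.w.get(k, 0.0) - lr * g   # assumed update rule

def shard_of(key, n_servers):
    """Every machine computes the same owner for a key, so no lookup table is needed."""
    return zlib.crc32(key.encode("utf-8")) % n_servers

def pull(servers, keys):
    """Worker side: fetch the weights for keys from their owning servers."""
    out = {}
    for s, server in enumerate(servers):
        mine = [k for k in keys if shard_of(k, len(servers)) == s]
        out.update(server.pull(mine))
    return out

def push(servers, grads, lr=0.1):
    """Worker side: send each gradient entry to the server that owns its key."""
    for s, server in enumerate(servers):
        mine = {k: g for k, g in grads.items() if shard_of(k, len(servers)) == s}
        server.push(mine, lr)

servers = [Server() for _ in range(3)]
push(servers, {"a": 1.0, "b": -2.0}, lr=0.5)
print(pull(servers, ["a", "b", "c"]))
```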

40 Distributed Solver
- Sparse logistic regression
- Subsampled online advertising dataset (CTR): 20 million instances, 200 million features
- Baselines:
  - Hash Kernel with 24 bit (16M bins)
  - STL Hash Map
- Distributed computing setup: 1-12 servers, 1-12 clients

41 Distributed Solver
[Figure: time (seconds) and memory (GB) versus # servers, for STL Hash, Cuckoo and Hash Kernel.]

42 Distributed Solver
[Figure: AUC versus memory upper bound (GB) and AUC versus time upper bound (seconds), for Cuckoo, STL Hash and Hash Kernel.]

43 github.com/efficient/libcuckoo
- Sparse Linear Models: sparse vectors, hash kernels
- Cuckoo Linear Algebra: data structure & operations
- Optimization: batch, online, distributed algorithm

Method         Preprocessing   Accuracy   Memory   Speed   Incremental Data
Dense Array    required        exact      normal   fast    no
C++ STL Hash   no              exact      high     slow    support
Hash Kernel    no              inexact    low      fast    support
Cuckoo         no              exact      normal   fast    support
