Case Study 4: Collaborative Filtering

Graph-Parallel Problems
Synchronous v. Asynchronous Computation
Machine Learning for Big Data, CSE547/STAT548, University of Washington
Carlos Guestrin, guest lecturer
May 14th, 2015
Emily Fox

ML in the Context of Parallel Architectures
GPUs, Multicore, Clusters, Clouds, Supercomputers
But scalable ML in these systems is hard, especially in terms of:
1. Programmability
2. Data distribution
3. Failures

Move Towards Higher-Level Abstraction
Distributed computing challenges are hard and annoying!
1. Programmability
2. Data distribution
3. Failures
High-level abstractions try to simplify distributed programming by hiding these challenges:
- They provide different levels of robustness to failures, optimize data movement and communication, and protect against race conditions.
- Generally, you are still on your own with respect to designing parallel algorithms.
Some common parallel abstractions:
- Lower-level:
  - Pthreads: abstraction for threads on a single machine
  - MPI: abstraction for distributed communication in a cluster of computers
- Higher-level:
  - Map-Reduce (Hadoop: open-source version): mostly data-parallel problems
  - GraphLab: for graph-structured distributed problems

Simplest Type of Parallelism: Data-Parallel Problems
You have already learned a classifier. What's the test error?
You have 10B labeled documents and 1000 machines.
Problems that can be broken into independent subproblems are called data-parallel (or embarrassingly parallel).
Map-Reduce is a great tool for this.
Focus of today's lecture, but first a simple example.
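The test-error example is data-parallel: each machine can score its own shard of labeled documents without looking at any other shard, and only two counts per shard need to be combined at the end. A minimal sketch, assuming a hypothetical classify function as a stand-in for the already-learned classifier, and using a thread pool on one machine to stand in for the 1000 machines.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(doc):
    """Hypothetical stand-in for the already-learned classifier."""
    return 1 if "good" in doc.split() else 0


def shard_errors(shard):
    """Runs independently on one worker's shard: returns (#errors, #documents)."""
    errors = sum(1 for doc, label in shard if classify(doc) != label)
    return errors, len(shard)


def estimate_test_error(labeled_docs, num_workers=4):
    # Split the data across workers; no worker needs another worker's data.
    shards = [labeled_docs[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(shard_errors, shards))
    # The only communication: add up two counts per shard.
    total_errors = sum(errors for errors, _ in results)
    total_docs = sum(count for _, count in results)
    return total_errors / total_docs
```

Because the per-shard work is independent, the same structure scales to more workers by changing only how the shards are distributed.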

Data Parallelism (MapReduce)
[Figure: several CPUs, each processing its own chunk of the data independently.]
Solve a huge number of independent subproblems, e.g., extract features in images.

Map-Reduce Abstraction
Map:
- Data-parallel over elements, e.g., documents
- Generate (key, value) pairs; "value" can be any data type
Reduce:
- Aggregate values for each key
- Must be a commutative-associative operation
- Data-parallel over keys
- Generate (key, value) pairs
Map-Reduce has a long history in functional programming, but was popularized by Google, and subsequently by the open-source Hadoop implementation from Yahoo!
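A minimal, single-process sketch of the Map-Reduce abstraction, using word counting as an assumed example: map_fn emits (word, 1) pairs for each document, the pairs are grouped by key, and reduce_fn aggregates each key's values with addition. Addition is commutative and associative, so the order in which values arrive at a reducer does not matter. The function names are hypothetical; this is not Hadoop's API.

```python
import operator
from collections import defaultdict
from functools import reduce


def map_fn(document):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def reduce_fn(key, values):
    """Reduce: aggregate all values for one key with a commutative-associative op."""
    return key, reduce(operator.add, values)


def map_reduce(documents, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in documents:  # data-parallel over documents
        for key, value in map_fn(doc):
            groups[key].append(value)  # grouping by key stands in for the shuffle
    # Data-parallel over keys: each key is reduced independently.
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

For example, map_reduce(["the cat", "the hat"], map_fn, reduce_fn) counts "the" twice and "cat" and "hat" once each.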

Map-Reduce Execution Overview
[Figure: Map Phase, Shuffle Phase, Reduce Phase. Big data is split across machines M1 through M1000; each machine emits (k_i, v_i) tuples; the shuffle assigns each tuple (k_i, v_i) to machine h[k_i], so all tuples with the same key land on the same reducer.]

Issues with Map-Reduce Abstraction
- Often all data gets moved around the cluster: very bad for iterative settings
- Definition of Map & Reduce functions can be unintuitive in many apps
- Graphs are challenging
- Computation is synchronous
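In the execution overview, the shuffle assigns each tuple (k_i, v_i) to machine h[k_i]. A minimal sketch of that routing, assuming string keys and using CRC32 modulo the number of reducers as the hash h (an illustrative choice: Python's built-in hash() of strings is randomized per process, so it would not route consistently across machines).

```python
import zlib
from collections import defaultdict


def h(key, num_reducers):
    """Deterministic hash of a string key to a reducer index (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair emitted by any mapper to reducer h(key)."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:  # one list of pairs per mapper machine
        for key, value in pairs:
            partitions[h(key, num_reducers)][key].append(value)
    return partitions
```

Every pair is moved to the machine that owns its key, which is why Map-Reduce tends to move much of the data around the cluster on each pass.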

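SGD for Matrix Factorization in Map-Reduce
$$\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$$
$$\begin{bmatrix} L_u^{(t+1)} \\ R_v^{(t+1)} \end{bmatrix} \leftarrow \begin{bmatrix} (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)} \\ (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)} \end{bmatrix}$$
Map and Reduce functions?
Map-Reduce:
- Data-parallel over all mappers
- Data-parallel over reducers with the same key
Here, one update at a time!

Matrix Factorization as a Graph
[Figure: graph linking users to the movies they rated (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita), with ratings such as 4, 3, 2, 5 on the edges.]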
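One SGD step of the update above touches only the user factor L_u and the movie factor R_v of a single observed rating r_uv. A minimal sketch with factors as plain Python lists; eta is the step size and lam_u, lam_v are the per-vertex regularization weights, and the numbers in the check are made up for illustration.

```python
def sgd_step(L_u, R_v, r_uv, eta, lam_u, lam_v):
    """One SGD update for a single rating; both new factors use the old L_u and R_v."""
    eps = sum(a * b for a, b in zip(L_u, R_v)) - r_uv  # prediction error
    new_L_u = [(1 - eta * lam_u) * a - eta * eps * b for a, b in zip(L_u, R_v)]
    new_R_v = [(1 - eta * lam_v) * b - eta * eps * a for a, b in zip(L_u, R_v)]
    return new_L_u, new_R_v
```

With L_u = [1.0, 0.0], R_v = [0.5, 1.0], r_uv = 2.0, eta = 0.1 and both regularization weights 0.1, the step moves the factors to [1.065, 0.15] and [0.645, 0.99], and the prediction moves from 0.5 toward the rating of 2.0.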

Flashback to 1998
Why? First Google advantage: a graph algorithm and a system to support it!

[Figure: Social Media, Science, Advertising, Web]
Graphs encode the relationships between people, facts, products, interests, and ideas.
Big: hundreds of billions of vertices and edges, plus rich metadata
- Facebook (10/2012): 1B users, 144B friendships
- Twitter (2011): 15B follower edges

Facebook Graph
[Figure: slide from a Facebook Engineering presentation.]

Label a Face and Propagate

Pairwise similarity is not enough
[Figure: two faces that are not similar enough to be sure.]

Propagate Similarities & Co-occurrences for Accurate Predictions
[Figure: similarity edges, co-occurring faces, further evidence.]

Example: Estimate Political Bias
[Figure: Liberal v. Conservative]

Topic Modeling (e.g., LDA)
[Figure: words such as Cat, Apple, Growth, Hat, Plant]

ML Tasks Beyond Data-Parallelism
Data-Parallel (Map Reduce):
- Feature Extraction
- Cross Validation
- Computing Sufficient Statistics
Graph-Parallel:
- Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
- Collaborative Filtering: Tensor Factorization
- Semi-Supervised Learning: Label Propagation, CoEM
- Graph Analysis: PageRank, Triangle Counting

Example of a Graph-Parallel Algorithm

PageRank
What's the rank of this user? It depends on the rank of who follows her, which depends on the rank of who follows them, and so on.
Loops in graph: must iterate!

PageRank Iteration
$$R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} w_{ji} R[j]$$
- α is the random reset probability
- w_ji is the prob. of transitioning (similarity) from j to i
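The PageRank iteration can be run synchronously: every new R[i] is computed from the previous round's ranks, and rounds repeat until the ranks stop changing. A minimal sketch, assuming edges are given as a list of (j, i, w_ji) triples and a convergence tolerance tol (both assumptions, not from the slides); for standard PageRank one would set w_ji = 1/outdegree(j).

```python
def pagerank(num_vertices, edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Synchronous PageRank: R[i] = alpha + (1 - alpha) * sum of w_ji * R[j] over edges (j, i).

    edges: list of (j, i, w_ji) triples for directed edges j -> i.
    """
    ranks = [1.0] * num_vertices
    for _ in range(max_iters):
        # Every new rank is computed from the previous round's ranks only.
        new_ranks = [alpha] * num_vertices
        for j, i, w_ji in edges:
            new_ranks[i] += (1 - alpha) * w_ji * ranks[j]
        delta = max(abs(new - old) for new, old in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:  # loops in the graph: iterate until the ranks settle
            break
    return ranks
```

For example, with two vertices that each link only to a third, pagerank(3, [(0, 2, 1.0), (1, 2, 1.0)]) settles at ranks of about 0.15, 0.15 and 0.405.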

Properties of Graph-Parallel Algorithms
- Dependency graph
- Local updates
- Iterative computation
[Figure: my rank depends on my friends' ranks.]

Addressing Graph-Parallel ML
- Data-parallel tasks (Feature Extraction, Cross Validation, Computing Sufficient Statistics): Map Reduce
- Graph-parallel tasks (Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.; Collaborative Filtering: Tensor Factorization; Semi-Supervised Learning: Label Propagation, CoEM; Data-Mining: PageRank, Triangle Counting): a Graph-Parallel Abstraction

Graph Computation: Synchronous v. Asynchronous

Bulk Synchronous Parallel Model: Pregel (Giraph) [Valiant '90]
[Figure: repeated supersteps of Compute, Communicate, Barrier.]

BSP Execution Overview
[Figure: compared with the Map-Reduce execution overview, a BSP superstep has a Compute Phase and a Communicate Phase. A big graph is split across machines M1 through M1000, and each machine sends a message (vid, vid, val) for every edge.]
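The BSP model can be simulated in a few lines. This is a minimal, single-process sketch under assumptions: the graph fits in memory, vertices are integer ids 0..n-1, and the vertex program spreads the maximum value through the graph (a common introductory Pregel example, chosen here for illustration and not taken from these slides). Each loop iteration is one superstep: a compute phase, a communicate phase that sends messages along out-edges, and a barrier after which those messages become visible.

```python
from collections import defaultdict


def bsp_max_value(values, out_edges):
    """Simulate Pregel-style supersteps that spread the maximum value through a graph.

    values: initial value per vertex (vertex ids are 0..n-1).
    out_edges: dict mapping a vertex id to the list of vertices it messages.
    Returns the final values and the number of supersteps executed.
    """
    values = list(values)
    # Superstep 0: every vertex is active and sends its value along its out-edges.
    messages = defaultdict(list)
    for src in range(len(values)):
        for dst in out_edges.get(src, []):
            messages[dst].append(values[src])
    supersteps = 1
    while messages:  # halt once no messages are in flight
        next_messages = defaultdict(list)
        # Compute phase: each vertex reads only messages from the previous superstep.
        for vid, inbox in messages.items():
            best = max(inbox)
            if best > values[vid]:
                values[vid] = best
                # Communicate phase: a changed vertex messages all of its out-neighbors.
                for dst in out_edges.get(vid, []):
                    next_messages[dst].append(best)
        # Barrier: new messages become visible only in the next superstep.
        messages = next_messages
        supersteps += 1
    return values, supersteps
```

Because every vertex only reads messages produced before the barrier, the order in which vertices run within a superstep does not affect the result.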

The bulk synchronous parallel model is provably inefficient for some ML tasks
Analyzing Belief Propagation [Gonzalez, Low, G. '09]
[Figure: vertices A and B; a priority queue focuses computation where there is important influence (smart scheduling).]
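Smart scheduling with a priority queue can be illustrated on PageRank instead of belief propagation (a swapped-in, simpler example): each vertex accumulates a priority equal to how much its in-neighbors' ranks have changed, and the scheduler always updates the vertex with the largest pending change. This residual-style prioritization is a sketch under stated assumptions (edges as (j, i, w_ji) triples, a tolerance tol below which vertices are not rescheduled, so the result is approximate), not the algorithm analyzed in [Gonzalez, Low, G. '09].

```python
import heapq


def prioritized_pagerank(num_vertices, edges, alpha=0.15, tol=1e-10):
    """Residual-prioritized PageRank: always update the vertex whose inputs changed most."""
    in_edges = [[] for _ in range(num_vertices)]
    out_edges = [[] for _ in range(num_vertices)]
    for j, i, w_ji in edges:
        in_edges[i].append((j, w_ji))
        out_edges[j].append((i, w_ji))
    ranks = [1.0] * num_vertices
    # Every vertex starts with infinite priority so that each is updated at least once.
    priority = [float("inf")] * num_vertices
    heap = [(-priority[v], v) for v in range(num_vertices)]  # max-heap via negation
    heapq.heapify(heap)
    updates = 0
    while heap:
        neg_p, v = heapq.heappop(heap)
        if -neg_p != priority[v]:
            continue  # stale entry: v was re-pushed with a newer priority or already updated
        priority[v] = 0.0
        new_rank = alpha + (1 - alpha) * sum(w_ji * ranks[j] for j, w_ji in in_edges[v])
        change = abs(new_rank - ranks[v])
        ranks[v] = new_rank
        updates += 1
        # Raise each out-neighbor's priority in proportion to its influence from v.
        for k, w_vk in out_edges[v]:
            priority[k] += (1 - alpha) * w_vk * change
            if priority[k] > tol:
                heapq.heappush(heap, (-priority[k], k))
    return ranks, updates
```

Vertices whose inputs barely change are never revisited, which is the intuition behind focusing computation where influence is important.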

Asynchronous Belief Propagation
Challenge = Boundaries
[Figure: a synthetic noisy image, its graphical model, and the cumulative vertex updates: many updates near boundaries, few updates elsewhere.]
The algorithm identifies and focuses on hidden sequential structure.

BSP ML Problem: Synchronous Algorithms can be Inefficient
[Plot: runtime in seconds vs. number of CPUs for Bulk Synchronous (e.g., Pregel), Asynchronous, and Splash BP.]
Theorem: Bulk Synchronous BP is O(#vertices) slower than Asynchronous BP.
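The gap between synchronous and sequential-style updates shows up already on a linear system. Jacobi iteration computes every coordinate from the previous iterate (like a BSP superstep), while Gauss-Seidel lets each coordinate immediately use the latest values of the coordinates before it (closer to asynchronous execution). A minimal sketch on an assumed 3x3 diagonally dominant system; it illustrates the convergence-rate difference, not the BP theorem itself.

```python
def jacobi(A, b, tol=1e-10, max_iters=10_000):
    """Synchronous sweeps: every update reads only the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iters + 1):
        new_x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        delta = max(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if delta < tol:
            return x, it
    return x, max_iters


def gauss_seidel(A, b, tol=1e-10, max_iters=10_000):
    """In-place sweeps: x[j] for j < i already holds this sweep's new value."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iters + 1):
        delta = 0.0
        for i in range(n):
            new_xi = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            delta = max(delta, abs(new_xi - x[i]))
            x[i] = new_xi
        if delta < tol:
            return x, it
    return x, max_iters
```

On A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]] with b = [5, 6, 5] (solution [1, 1, 1]), both methods converge, and Gauss-Seidel needs fewer sweeps than Jacobi.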

Synchronous v. Asynchronous
Bulk synchronous processing:
- Computation in phases
  - All vertices participate in a phase (though it's OK to say no-op)
  - All messages are sent
- Simpler to build, like Map-Reduce
  - No worries about race conditions; the barrier guarantees data consistency
  - Simpler to make fault-tolerant: save data on the barrier
- Slower convergence for many ML problems
- In matrix-land, called Jacobi iteration
- Implemented by Google Pregel 2010
Asynchronous processing:
- Vertices see the latest information from neighbors
  - Most closely related to sequential execution
- Harder to build:
  - Race conditions can happen all the time; must protect against this issue
  - More complex fault tolerance
  - When are you done? Must implement a scheduler over vertices
- Faster convergence for many ML problems
- In matrix-land, called Gauss-Seidel iteration
- Implemented by GraphLab 2010, 2012

Case Study 4: Collaborative Filtering: GraphLab

Data Graph
Data is associated with vertices and edges.
Graph: social network
Vertex data: user profile text, current interests estimates
Edge data: similarity weights

How do we program graph computation?
"Think like a Vertex." (Malewicz et al. [SIGMOD '10])
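A data graph can be sketched as a small class that stores vertex data, edge data, and adjacency. The class and its scope method are hypothetical illustrations, assuming an undirected graph with comparable vertex ids. Here the scope of a vertex is taken to be its own data, its adjacent edges, and its neighbors' data, which is the region an update function would be allowed to touch.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    vertex_data: dict = field(default_factory=dict)  # vid -> user data
    edge_data: dict = field(default_factory=dict)    # frozenset({u, v}) -> user data
    neighbors: dict = field(default_factory=dict)    # vid -> set of adjacent vids

    def add_vertex(self, vid, data):
        self.vertex_data[vid] = data
        self.neighbors.setdefault(vid, set())

    def add_edge(self, u, v, data):
        # Undirected edge: both endpoints must already exist.
        self.edge_data[frozenset((u, v))] = data
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def scope(self, vid):
        """Data visible to an update on vid: its own data, adjacent edges, neighbors."""
        nbrs = sorted(self.neighbors[vid])
        return {
            "vertex": self.vertex_data[vid],
            "edges": {n: self.edge_data[frozenset((vid, n))] for n in nbrs},
            "neighbors": {n: self.vertex_data[n] for n in nbrs},
        }
```

In the social-network example, vertex data would hold a user's profile text and interest estimates, and edge data would hold a similarity weight.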

Update Functions
A user-defined program, applied to a vertex, that transforms the data in the scope of the vertex:
pagerank(i, scope){ }

Update Function Example: Connected Components
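A connected-components update function in this style: each vertex adopts the smallest label in its scope, and if its label changes, its neighbors are rescheduled because their labels may now change too. A minimal sketch, assuming integer vertex ids and an adjacency dict; the function names are hypothetical, and the dynamic scheduler is a simple FIFO queue in which a vertex appears at most once.

```python
from collections import deque


def cc_update(v, labels, adj):
    """Adopt the smallest label in v's scope; return the neighbors to reschedule."""
    smallest = min([labels[v]] + [labels[u] for u in adj[v]])
    if smallest < labels[v]:
        labels[v] = smallest
        return adj[v]  # neighbors may now be able to lower their labels
    return []


def run_fifo(adj):
    labels = {v: v for v in adj}  # every vertex starts in its own component
    queue = deque(adj)            # initially every vertex is scheduled
    scheduled = set(adj)
    updates = 0
    while queue:                  # done when nothing is scheduled
        v = queue.popleft()
        scheduled.discard(v)
        updates += 1
        for u in cc_update(v, labels, adj):
            if u not in scheduled:
                scheduled.add(u)
                queue.append(u)
    return labels, updates
```

When the queue empties, every vertex carries the smallest vertex id in its component.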

The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: a shared scheduler queue (b, a, h, i, ...) hands vertices of a graph with vertices a through k to CPU 1 and CPU 2.]

Example Schedulers
- Round-robin
- Selective scheduling (skipping): round robin, but jump over unscheduled vertices
- FIFO
- Prioritized scheduling
  - Hard to implement in a distributed fashion
  - Approximations used (each machine has its own priority queue)

Ensuring Race-Free Code
How much can computation overlap?

Need for Consistency
No consistency: higher throughput (#updates/sec), but potentially slower convergence of ML.

GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a parallel execution on CPU 1 and CPU 2 over time, and an equivalent sequential execution on a single CPU.]
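One simplified way to get this kind of guarantee is edge consistency through locking: an update on vertex v locks v and all of its neighbors, so two updates with overlapping scopes never run at the same time, and acquiring the locks in one global (sorted) order avoids deadlock. A minimal sketch with Python threads; the engine class, the share_update function, and the conservation check are hypothetical illustrations, not GraphLab's implementation.

```python
import threading


class EdgeConsistentEngine:
    """Runs update functions so that updates with overlapping scopes are serialized."""

    def __init__(self, adj, data):
        self.adj = adj
        self.data = dict(data)
        self.locks = {v: threading.Lock() for v in adj}

    def apply(self, v, update):
        scope = sorted({v, *self.adj[v]})  # one global lock order prevents deadlock
        for u in scope:
            self.locks[u].acquire()
        try:
            update(v, self.adj, self.data)
        finally:
            for u in reversed(scope):
                self.locks[u].release()


def share_update(v, adj, data):
    """Move one unit from v to each neighbor: a read-modify-write over the whole scope."""
    for u in adj[v]:
        data[v] -= 1
        data[u] += 1


def run_threads(engine, schedule, num_threads=4):
    def worker(vertices):
        for v in vertices:
            engine.apply(v, share_update)

    threads = [threading.Thread(target=worker, args=(schedule[t::num_threads],))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

On a chain 0-1-2-3 where every vertex is updated the same number of times, each vertex gives away exactly as many units as it receives, so with no lost updates every value returns to its starting point.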

Consistency in Collaborative Filtering
[Plot: train RMSE (log scale, 1 to 128) vs. updates (millions) for dynamic inconsistent updates and dynamic consistent updates; Netflix data, 8 cores.]

The GraphLab Framework
- Graph-based data representation
- Update functions (user computation)
- Scheduler
- Consistency model

Triangle Counting in Twitter Graph
40M users, 1.2B edges
Total: 34.8 billion triangles
- Hadoop: 1536 machines, 423 minutes
- GraphLab: 64 machines, 1024 cores, 1.5 minutes
Hadoop results from [Suri & Vassilvitskii '11]

CoEM (Jones et al., 2005)
Named Entity Recognition Task
Is "Dog" an animal? Is "Catalina" a place?
[Figure: a graph linking noun phrases (dog, Australia, Catalina Island) to the contexts they appear in ("<X> ran quickly", "travelled to <X>", "<X> is pleasant").]
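Triangle counting, as in the Twitter example, reduces to intersecting neighbor sets: for each edge (u, v), every common neighbor w closes a triangle. A minimal single-machine sketch, assuming an undirected edge list with comparable vertex ids; counting only triples with u < v < w ensures each triangle is counted exactly once.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets also drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbor w > v closes the triangle (u, v, w) exactly once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total
```

The per-edge intersections are independent of each other, which is what makes the computation a good fit for a graph-parallel system.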

Never Ending Learner Project (CoEM)
Vertices: 2 million; edges: 200 million
- Hadoop: 95 cores, 7.5 hrs
- Distributed GraphLab: EC2 machines, 80 secs

[Figure: a user's movie ratings: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita.]
What do I recommend?

Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)
X = L R

Matrix Completion as a Graph
X_ij known for black cells; X_ij unknown for white cells
Rows index users; columns index movies.
[Figure: the rating matrix drawn as a graph between users and movies.]

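Coordinate Descent for Matrix Factorization: Alternating Least-Squares
$$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$
Fix movie factors, optimize for user factors (independent least-squares over users):
$$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$$
Fix user factors, optimize for movie factors (independent least-squares over movies):
$$\min_{R_v} \sum_{u \in U_v} (L_u \cdot R_v - r_{uv})^2$$
Here V_u is the set of movies rated by user u, and U_v is the set of users who rated movie v.
System may be underdetermined.
Converges to

Alternating Least Squares Update Function
$$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2 \qquad \min_{R_v} \sum_{u \in U_v} (L_u \cdot R_v - r_{uv})^2$$
[Figure: a user vertex connected to the movies it rated; recommend.]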
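The two alternating steps are independent least-squares problems, one per user and one per movie. A minimal NumPy sketch; because the system may be underdetermined, it adds a small ridge term lam * I to each solve (an assumption not in the objective above), and the rating triples, rank k, and initialization range are likewise illustrative choices.

```python
import numpy as np


def als(ratings, n_users, n_items, k=1, lam=0.1, iters=20, seed=0):
    """Alternating least squares on (user, item, rating) triples; returns factors L, R."""
    rng = np.random.default_rng(seed)
    L = rng.uniform(0.5, 1.5, size=(n_users, k))
    R = rng.uniform(0.5, 1.5, size=(n_items, k))
    by_user = {u: [] for u in range(n_users)}  # V_u: (item, rating) pairs per user
    by_item = {v: [] for v in range(n_items)}  # U_v: (user, rating) pairs per item
    for u, v, r in ratings:
        by_user[u].append((v, r))
        by_item[v].append((u, r))
    eye = np.eye(k)
    for _ in range(iters):
        # Fix R; solve an independent ridge-regression problem for each user.
        for u, obs in by_user.items():
            if obs:
                Rv = R[[v for v, _ in obs]]
                y = np.array([r for _, r in obs], dtype=float)
                L[u] = np.linalg.solve(Rv.T @ Rv + lam * eye, Rv.T @ y)
        # Fix L; solve an independent ridge-regression problem for each item.
        for v, obs in by_item.items():
            if obs:
                Lu = L[[u for u, _ in obs]]
                y = np.array([r for _, r in obs], dtype=float)
                R[v] = np.linalg.solve(Lu.T @ Lu + lam * eye, Lu.T @ y)
    return L, R
```

On a rank-1 rating matrix with one missing entry, a rank-1 factorization fills in the missing rating; the predicted rating for user u and movie v is the dot product L[u] @ R[v].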

SGD for Matrix Factorization in GraphLab
$$\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$$
$$\begin{bmatrix} L_u^{(t+1)} \\ R_v^{(t+1)} \end{bmatrix} \leftarrow \begin{bmatrix} (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)} \\ (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)} \end{bmatrix}$$
[Figure: a user vertex and a movie vertex joined by a rating edge; recommend.]

Out-of-core computation
- Often data doesn't fit in memory: must use disk/SSDs
- Although random accesses to disk are very slow, high performance is possible with an optimized memory layout
- Implemented in: GraphChi, GraphLab Create
- Example performance of GraphLab Create: on the Common Crawl graph (3.5 billion nodes and 128 billion edges), a PageRank iteration takes 9 mins on a single, commodity machine
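Viewed as a graph computation, each observed rating is an edge between a user vertex (holding L_u) and a movie vertex (holding R_v), and one SGD step reads and writes only those two endpoints. A minimal single-process sketch of a training loop over the rating edges, assuming rank-k factors stored as lists, a fixed learning rate eta, one shared regularization weight lam, and a shuffled edge order each epoch; these are illustrative choices, not GraphLab's implementation.

```python
import math
import random


def sgd_matrix_factorization(ratings, n_users, n_items, k=1, eta=0.02, lam=0.0,
                             epochs=500, seed=0):
    """SGD over rating edges (u, v, r); each step touches only vertices u and v."""
    rng = random.Random(seed)
    L = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n_users)]
    R = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n_items)]
    edges = list(ratings)
    for _ in range(epochs):
        rng.shuffle(edges)
        for u, v, r in edges:
            eps = sum(a * b for a, b in zip(L[u], R[v])) - r
            # Both right-hand sides use the old L[u] and R[v], as in the update rule.
            L[u], R[v] = (
                [(1 - eta * lam) * a - eta * eps * b for a, b in zip(L[u], R[v])],
                [(1 - eta * lam) * b - eta * eps * a for a, b in zip(L[u], R[v])],
            )
    return L, R


def rmse(ratings, L, R):
    errors = [sum(a * b for a, b in zip(L[u], R[v])) - r for u, v, r in ratings]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Two rating edges that share neither a user nor a movie can be processed in parallel without conflict; edges sharing an endpoint are exactly where a consistency model matters.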

What you need to know
- Data-parallel versus graph-parallel computation
- Bulk synchronous processing versus asynchronous processing
- GraphLab system for graph-parallel computation
  - Data representation
  - Update functions
  - Scheduling
  - Consistency model
- ALS, SGD for matrix factorization in GraphLab

Reading
- Papers under Case Study IV: Parallel Learning with GraphLab
- Optional: Parallel Splash BP, http:// papers/dap-gonzalez.pdf
