Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis Züri Machine Learning Meetup #5 June 16, 2014
Outline: Machine Learning cases; Why Apache Giraph?; Walk-through example for Recommendation Systems; What's more?
Machine Learning cases: friend recommendation, fake account detection
Machine Learning cases: product recommendation, online advertisements
Machine Learning cases: route planning, delivery scheduling
Machine Learning cases: graphs are everywhere. Graphs need processing!
(Graphs ABC) Graph: a representation of a set of objects. V = vertices (nodes), E = edges (links). Graphs capture the relationships between objects. Graphs can be directed or undirected.
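As a minimal sketch of the definitions above (plain Python, not Giraph code; the function name and the vertex names are made up for illustration), a directed graph can be stored as an adjacency list, and an undirected graph as the same structure with every edge stored in both directions:

```python
from collections import defaultdict


def build_graph(edges, directed=True):
    """Return an adjacency list {vertex: set(neighbors)} from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so vertices without outgoing edges still appear in V
        if not directed:
            adjacency[target].add(source)  # undirected: store the reverse link too
    return dict(adjacency)


# Hypothetical example: E as a list of links between three objects.
edges = [("a", "b"), ("b", "c")]
directed_graph = build_graph(edges, directed=True)
undirected_graph = build_graph(edges, directed=False)
```

In the directed version, "c" has no neighbors; in the undirected version, "c" lists "b" as a neighbor.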
Graphs need processing! So what?
Challenge #1: Scale of graphs. [logo] indexes ~50B pages; [logo] has ~1.1B users; [logo] has ~570M users; [logo] has ~530M users
Challenge #2: Complexity of graphs. Example: compute the shortest distance from google.com → Multiple passes are needed to compute the result → Inherent dependencies make it hard to parallelize
MapReduce: well established; efficient for big data analytics; not efficient with iterative algorithms (it is stateless). Graph algorithms are iterative.
Why Apache Giraph? Explicitly designed for graph processing on top of the Hadoop ecosystem
The story: Google Pregel (2010) → Donated to ASF by Yahoo! (2011) → Apache Top Level Project (2012) → 1.0 release (2013) → 1.1 release (2014). Supported by: Facebook, Yahoo!, LinkedIn
Giraph follows the Pregel model, which is based on Bulk Synchronous Parallel (BSP)
Thinking like a vertex: I am a vertex! How would I coordinate with other vertices to solve the problem?
Shortest Paths: I only know my value and who my neighbors are
Receive messages → Update value → Send messages. Within an iteration, vertices compute independently of each other (asynchronously)
Global synchronization: a synchronization barrier
And again
And again
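The receive → update → send loop and the barrier between iterations can be sketched in a few lines. This is a single-process simulation in plain Python under stated assumptions, not Giraph's Java API: the function name, the inbox/outbox dictionaries, and the example graph are all hypothetical, and the "barrier" is simply the point where all outgoing messages are handed over at once.

```python
import math


def shortest_paths(graph, source):
    """Single-source shortest paths, computed one iteration (superstep) at a time.

    graph: {vertex: {neighbor: edge_weight}}; every vertex must appear as a key.
    Returns the final vertex values and the number of supersteps that ran.
    """
    value = {v: math.inf for v in graph}  # each vertex only knows its own value
    inbox = {v: [] for v in graph}
    inbox[source].append(0.0)             # seed message that starts the computation
    supersteps = 0

    while any(inbox.values()):            # stop when no messages are in flight
        outbox = {v: [] for v in graph}
        for vertex, messages in inbox.items():
            if not messages:
                continue                  # no messages: vertex is idle this superstep
            candidate = min(messages)     # receive messages
            if candidate < value[vertex]:
                value[vertex] = candidate # update value
                for neighbor, weight in graph[vertex].items():
                    outbox[neighbor].append(candidate + weight)  # send messages
        inbox = outbox                    # barrier: messages become visible next superstep
        supersteps += 1
    return value, supersteps


# Hypothetical weighted, directed example graph.
example = {"a": {"b": 1, "c": 4}, "b": {"c": 2}, "c": {"d": 1}, "d": {}}
distances, rounds = shortest_paths(example, "a")
```

Note that vertex "c" first learns the distance 4 (directly from "a") and only one superstep later the shorter distance 3 (via "b"), which is why the result needs multiple passes.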
Giraph super powers: message-passing communication; in-memory computation → stateful; global synchronization; iterations → iterations → iterations
Recommendation Systems
Collaborative Filtering: a recommendation systems technique
Giraph for Recommendation Systems: Stochastic Gradient Descent (SGD) algorithm
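To make the SGD step concrete, here is a minimal sketch of SGD-based matrix factorization for collaborative filtering in plain Python. It is not the Okapi or Giraph implementation: there is no message passing, and the function names, hyperparameters, and ratings are assumptions chosen for illustration. In a vertex-centric setting one would typically treat users and items as vertices and ratings as edges; the sketch only shows the per-rating update that such a computation applies.

```python
import math
import random


def predict(user_vector, item_vector):
    """Predicted rating: dot product of the two latent vectors."""
    return sum(a * b for a, b in zip(user_vector, item_vector))


def sgd_factorize(ratings, rank=2, epochs=200, learning_rate=0.05, reg=0.02, seed=0):
    """Learn latent vectors for users and items from (user, item, rating) edges."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for i in items}
    edges = list(ratings)
    for _ in range(epochs):
        rng.shuffle(edges)  # "stochastic": visit the ratings in random order
        for u, i, r in edges:
            error = r - predict(p[u], q[i])
            for k in range(rank):
                pu, qi = p[u][k], q[i][k]
                # Gradient step on the squared error with L2 regularization.
                p[u][k] += learning_rate * (error * qi - reg * pu)
                q[i][k] += learning_rate * (error * pu - reg * qi)
    return p, q


def rmse(ratings, p, q):
    """Root mean squared error of the predictions on the given ratings."""
    squared = [(r - predict(p[u], q[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ratings: (user, item, rating on a 1-5 scale).
ratings = [
    ("alice", "matrix", 5), ("alice", "titanic", 1),
    ("bob", "matrix", 4), ("bob", "inception", 5),
    ("carol", "titanic", 5), ("carol", "inception", 2),
]
untrained = sgd_factorize(ratings, epochs=0)
trained = sgd_factorize(ratings, epochs=200)
```

Comparing `rmse(ratings, *untrained)` with `rmse(ratings, *trained)` shows how much the training error drops; an unrated pair such as alice and "inception" can then be scored with `predict(trained[0]["alice"], trained[1]["inception"])`.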
What's more: Okapi ML. The 1st advanced ML toolkit for Giraph. Available as open source. Code available at: https://github.com/grafos-ml-okapi Documentation: http://grafos.ml/okapi.html
The Okapi library
Collaborative filtering: Alternating Least Squares, Stochastic Gradient Descent, Singular Value Decomposition, Collaborative Less-is-More (CLiMF), Context-aware recommendation (TFMAP), Bayesian Personalized Ranking, Popularity Ranking
Clustering: Affinity propagation, K-means
Graph analytics: Clustering coefficient, Graph partitioning, K-Core, PageRank, Semi-clustering, Shortest distances, SybilRank, Triangle counting, and more being added
What's more: Giraph in Action. The 1st book for Giraph: first steps with Giraph, build applications, integrate with other tools, and more! More details: http://manning.com/martella/