Matrix-Vector Multiplication by MapReduce. From Rajaraman/Ullman, Ch. 2, Part 1


The Google implementation of MapReduce was originally created to execute very large matrix-vector multiplications. In the ranking of Web pages that goes on at search engines, n is in the tens of billions (PageRank, an iterative algorithm). The same technique is also useful for simple (memory-based) recommender systems.

Problem Statement. Given an n × n matrix M, whose element in row i and column j is denoted m_ij, and a vector v of length n, the matrix-vector product is the vector x of length n whose ith component is

x_i = Σ_{j=1}^{n} m_ij v_j

Assume that the row-column coordinates of each matrix element are discoverable, either from its position in the file or because it is stored with explicit coordinates, as a triple (i, j, m_ij). The position of element v_j in the vector v is discoverable in the analogous way.
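A tiny worked instance of this formula, with numbers chosen only for illustration: for n = 2, M = [[1, 2], [3, 4]] and v = (1, 1),

    x_1 = m_11 v_1 + m_12 v_2 = 1·1 + 2·1 = 3
    x_2 = m_21 v_1 + m_22 v_2 = 3·1 + 4·1 = 7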

Case 1: n is large, but not so large that vector v cannot fit in main memory. The Map Function: the Map function is written to apply to one element of M. First, v is read in its entirety to the compute node executing a Map task and is available to all applications of the Map function at that node. Each Map task operates on a chunk of the matrix M. From each matrix element m_ij it produces the key-value pair (i, m_ij · v_j). Thus, all terms of the sum that make up the component x_i of the matrix-vector product get the same key, i.

Case 1 Continued. The Reduce Function: the Reduce function simply sums all the values associated with a given key i. The result is the pair (i, x_i).
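A minimal single-process sketch of this Map and Reduce logic in Python (the function names and the in-memory grouping are illustrative; a real MapReduce framework performs the shuffle/group-by-key itself):

    from collections import defaultdict

    def map_matrix_element(i, j, m_ij, v):
        # v is the full vector, assumed to fit in memory (Case 1).
        return (i, m_ij * v[j])            # key = row index i

    def reduce_row(i, values):
        # Sum all terms that share the key i.
        return (i, sum(values))

    # Tiny driver that simulates the shuffle/group-by-key step.
    M = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]  # triples (i, j, m_ij)
    v = [1.0, 1.0]
    groups = defaultdict(list)
    for i, j, m_ij in M:
        key, value = map_matrix_element(i, j, m_ij, v)
        groups[key].append(value)
    print([reduce_row(i, vals) for i, vals in sorted(groups.items())])
    # [(0, 3.0), (1, 7.0)]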

Case 2: n is too large for v to fit into main memory. The vector v must still be made available to the compute nodes that execute Map tasks. Divide the matrix into vertical stripes of equal width, and divide the vector into an equal number of horizontal stripes of the same height. The goal is to use enough stripes that the portion of the vector in one stripe can fit conveniently into main memory at a compute node.

[Figure 2.4: Division of a matrix M and vector v into five stripes.] The ith stripe of the matrix multiplies only components from the ith stripe of the vector. Divide the matrix into one file for each stripe, and do the same for the vector. Each Map task is assigned a chunk from one of the stripes of the matrix and gets the entire corresponding stripe of the vector. The Map and Reduce tasks then act exactly as described in Case 1.
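A sketch of the striping idea, using dense NumPy arrays purely for illustration (in practice each stripe would live in its own file, and each loop iteration would be a separate Map task):

    import numpy as np

    def multiply_by_stripes(M, v, k):
        # Split M into k vertical stripes and v into k matching horizontal stripes.
        n = len(v)
        x = np.zeros(n)
        for s in range(k):
            cols = slice(s * n // k, (s + 1) * n // k)
            # A Map task gets one stripe of M plus only v[cols]; its partial
            # products are summed into the result by the reducers.
            x += M[:, cols] @ v[cols]
        return x

    M = np.arange(25, dtype=float).reshape(5, 5)
    v = np.ones(5)
    assert np.allclose(multiply_by_stripes(M, v, 5), M @ v)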

Recommendation is one of the most popular large-scale machine learning techniques, used at Amazon, eBay, Facebook, and Netflix.

Two types of recommender techniques: Collaborative Filtering and Content-Based Recommendation. Collaborative Filtering is further divided into model-based and memory-based approaches; memory-based methods are either user-similarity based or item-similarity based.

Item-based collaborative filtering. Basic idea: use the similarity between items (and not users) to make predictions. Example: look for items that are similar to Item5, then take Alice's ratings for these items to predict her rating for Item5.

           Item1  Item2  Item3  Item4  Item5
    Alice    5      3      4      4      ?
    User1    3      1      2      3      3
    User2    4      3      4      3      5
    User3    3      3      1      5      4
    User4    1      5      5      2      1

The cosine similarity measure
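The formula itself did not survive this transcription; the standard cosine similarity between two item rating vectors a and b is sim(a, b) = (a · b) / (‖a‖ ‖b‖). A minimal sketch, applied to columns of the ratings table above:

    import math

    def cosine_similarity(a, b):
        # sim(a, b) = dot(a, b) / (|a| * |b|)
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Rating columns for Item1 and Item5 from User1..User4
    # (Alice is excluded because she has not rated Item5).
    item1 = [3, 4, 3, 1]
    item5 = [3, 5, 4, 1]
    print(round(cosine_similarity(item1, item5), 3))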

Making predictions. A common prediction function takes a similarity-weighted average of the user's ratings (u is the user; i and p are items; the user has not rated item p). The neighborhood is typically also limited to a specific size, so not all neighbors are taken into account for the prediction. An analysis of the MovieLens dataset indicates that "in most real-world situations, a neighborhood of 20 to 50 neighbors seems reasonable" (Herlocker et al. 2002).
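The formula on the slide was lost in transcription; the commonly used item-based prediction, summing over the items i that user u has rated, is pred(u, p) = Σ_i sim(i, p) · r_{u,i} / Σ_i sim(i, p). A sketch, with hypothetical similarity values:

    def predict_rating(similarities, ratings):
        # similarities: sim(i, p) for each neighbor item i the user has rated.
        # ratings: the user's rating r_{u,i} for each of those items.
        num = sum(s * r for s, r in zip(similarities, ratings))
        den = sum(similarities)
        return num / den

    # Hypothetical similarities of Item1..Item4 to Item5, with Alice's ratings.
    print(predict_rating([0.99, 0.70, 0.85, 0.80], [5, 3, 4, 4]))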

Pre-processing for item-based filtering. Item-based filtering does not by itself solve the scalability problem. Pre-processing approach by Amazon.com (in 2003): calculate all pairwise item similarities in advance. The neighborhood used at run time is typically rather small, because only items the user has actually rated are taken into account, and item similarities are supposed to be more stable than user similarities. Memory requirements: in theory, up to N² pairwise similarities must be stored (N = number of items); in practice this is significantly lower (items with no co-ratings can be omitted). Further reductions are possible: impose a minimum threshold for co-ratings, or limit the neighborhood size (which might affect recommendation accuracy).
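A sketch of that offline pre-computation step with the reductions just listed (the data layout and the co-rating threshold parameter are illustrative, not Amazon's actual pipeline):

    import math
    from itertools import combinations

    def precompute_similarities(item_vectors, min_co_ratings=1):
        # item_vectors: {item: {user: rating}}. Store only item pairs with
        # at least min_co_ratings users in common (co-rating threshold).
        sims = {}
        for a, b in combinations(item_vectors, 2):
            common = item_vectors[a].keys() & item_vectors[b].keys()
            if len(common) < min_co_ratings:
                continue
            dot = sum(item_vectors[a][u] * item_vectors[b][u] for u in common)
            na = math.sqrt(sum(r * r for r in item_vectors[a].values()))
            nb = math.sqrt(sum(r * r for r in item_vectors[b].values()))
            sims[(a, b)] = dot / (na * nb)
        return sims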

Mahout's Item-Based RS. Apache Mahout is an open-source Apache Foundation project for scalable machine learning. Mahout uses the MapReduce paradigm for scalable recommendation.

Mahout's RS: Matrix Multiplication for Preference Prediction

    Item-Item Similarity Matrix      Active User's      Predicted
                                     Preference Vector  Preferences
        1   2   3   4   5
    1 [ 16   9  16   5   6 ]   [ 0 ]   [ 135 ]   1
    2 [  9  30  19   3   2 ]   [ 5 ]   [ 251 ]   2
    3 [ 16  19  23   5   4 ] x [ 5 ] = [ 220 ]   3
    4 [  5   3   5  10  20 ]   [ 2 ]   [  60 ]   4
    5 [  6   2   4  20   9 ]   [ 0 ]   [  70 ]   5

As matrix math, again: the "inside-out" method. Instead of computing each component of the result as a dot product of one matrix row with the preference vector, accumulate the result as a weighted sum of the similarity-matrix columns that correspond to the user's nonzero preferences:

    5 x column 2 + 5 x column 3 + 2 x column 4 = (135, 251, 220, 60, 70)
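A sketch contrasting the two evaluation orders, using the numbers from the example above; both produce the same predicted-preference vector:

    S = [[16,  9, 16,  5,  6],
         [ 9, 30, 19,  3,  2],
         [16, 19, 23,  5,  4],
         [ 5,  3,  5, 10, 20],
         [ 6,  2,  4, 20,  9]]
    u = [0, 5, 5, 2, 0]                      # active user's preference vector
    n = len(u)

    # Row-wise: each result component is a dot product of one row with u.
    row_wise = [sum(S[i][j] * u[j] for j in range(n)) for i in range(n)]

    # Inside-out: accumulate u_j times column j, skipping zero preferences.
    inside_out = [0] * n
    for j in range(n):
        if u[j] != 0:
            for i in range(n):
                inside_out[i] += u[j] * S[i][j]

    assert row_wise == inside_out == [135, 251, 220, 60, 70]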

PageRank Algorithm. One iteration of the PageRank algorithm involves taking an estimated PageRank vector v and computing the next estimate v′ by

v′ = βMv + (1 − β)e/n

where β is a constant slightly less than 1, e is a vector of all 1's, and n is the number of nodes in the graph that the transition matrix M represents.
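A dense, in-memory sketch of this update (β = 0.85 is a conventional choice, not specified on the slide; the 3-node graph is illustrative):

    import numpy as np

    def pagerank_step(M, v, beta=0.85):
        # v' = beta * M v + (1 - beta) * e / n
        n = len(v)
        return beta * (M @ v) + (1 - beta) * np.ones(n) / n

    # Column-stochastic transition matrix for a tiny 3-node graph.
    M = np.array([[0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0]])
    v = np.ones(3) / 3
    for _ in range(30):                 # iterate until (approximate) convergence
        v = pagerank_step(M, v)
    print(v)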

Vectors v and v′ are too large for main memory. If we use the stripe method to partition a matrix and vector that do not fit in main memory, then a vertical stripe from the matrix M and a horizontal stripe from the vector v contribute to all components of the result vector v′. Since that vector is the same length as v, it will not fit in main memory either. If M is stored column-by-column for efficiency reasons, a column can affect any of the components of v′. As a result, when we need to add a term to some component v′_i, it is unlikely that that component will already be in main memory: the result is thrashing.

Square blocks, instead of stripes. An additional constraint is that the result vector should be partitioned in the same way as the input vector, so the output may become the input for another iteration of the matrix-vector multiplication. An alternative strategy is therefore based on partitioning the matrix into k² square blocks, while the vectors are still partitioned into k stripes.

[Figure from Rajaraman and Ullman: partition of the matrix into k² blocks and the vectors into k stripes.]

Iterative Computation: the Square-Blocks Method. In this method, we use k² Map tasks. Each task gets one square block of the matrix M, say M_ij, and one stripe of the vector v, which must be v_j. Notice that each stripe of the vector is sent to k different Map tasks: v_j is sent to the task handling M_ij for each of the k possible values of i. Thus, v is transmitted over the network k times; however, each piece of the matrix is sent only once. The advantage of this approach is that we can keep both the jth stripe of v and the ith stripe of v′ in main memory as we process M_ij. Note that all terms generated from M_ij and v_j contribute to v′_i and to no other stripe of v′.
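A sketch of the block decomposition, using dense NumPy arrays for illustration; each (i, j) iteration stands in for one of the k² Map tasks, which needs only block M_ij, stripe v_j, and the accumulating stripe of the result in memory:

    import numpy as np

    def multiply_by_blocks(M, v, k):
        n = len(v)
        assert n % k == 0
        b = n // k                      # stripe height / block side
        x = np.zeros(n)                 # result vector v'
        for i in range(k):              # result-stripe index
            for j in range(k):          # input-stripe index
                # One Map task: block M_ij times stripe v_j adds into stripe x_i.
                x[i*b:(i+1)*b] += M[i*b:(i+1)*b, j*b:(j+1)*b] @ v[j*b:(j+1)*b]
        return x

    M = np.random.rand(6, 6)
    v = np.random.rand(6)
    assert np.allclose(multiply_by_blocks(M, v, 3), M @ v)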

Matrix-Matrix Multiplication using MapReduce. Computing the product of two matrices resembles a natural join operation: elements m_ij of M and n_jk of N join on their shared index j, and the joined products are then grouped and summed by the key (i, k).
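A minimal sketch of that idea, simulating the pair of MapReduce jobs with dictionaries: the first phase joins on j, the second sums by (i, k). The sparse dict-of-triples layout is illustrative:

    from collections import defaultdict

    def matmul_mapreduce(M, N):
        # M, N: sparse matrices as dicts {(i, j): m_ij} and {(j, k): n_jk}.
        # Phase 1 map: group M-elements and N-elements by the join key j.
        by_j = defaultdict(lambda: ([], []))
        for (i, j), m in M.items():
            by_j[j][0].append((i, m))
        for (j, k), n in N.items():
            by_j[j][1].append((k, n))
        # Phase 1 reduce emits ((i, k), m_ij * n_jk); phase 2 reduce sums them.
        result = defaultdict(float)
        for ms, ns in by_j.values():
            for i, m in ms:
                for k, n in ns:
                    result[(i, k)] += m * n
        return dict(result)

    M = {(0, 0): 1.0, (0, 1): 2.0}
    N = {(0, 0): 3.0, (1, 0): 4.0}
    print(matmul_mapreduce(M, N))   # {(0, 0): 11.0}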

The Alternating Least Squares (ALS) Recommender Algorithm

Matrix A represents a typical user-product rating model. The algorithm tries to derive a set of latent factors (like tastes) from the user-product rating matrix A. Matrix X can then be viewed as a user-taste matrix, and matrix Y as a product-taste matrix.

Objective Function:

minimize (1/2) ‖A − XYᵀ‖²

Checking only the observed ratings (i, j) ∈ Ω:

minimize (1/2) Σ_{(i,j)∈Ω} (a_ij − x_iᵀ y_j)²

If we fix X, the objective function becomes convex in Y, and we can use the gradient to solve for Y. If we fix Y, we can likewise use the gradient to solve for X.

Iterative Training Algorithm. Given Y, you can solve for X:

X_i = A Y_{i−1} (Y_{i−1}ᵀ Y_{i−1})⁻¹

The algorithm alternates: Y_0 → X_1 → Y_1 → X_2 → Y_2 → …, until the squared difference between successive iterates is small enough.
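A dense NumPy sketch of this alternating scheme for the fully observed objective (the slide's update, ignoring the Ω restriction and regularization; the rank and iteration count are illustrative):

    import numpy as np

    def als(A, rank, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        Y = rng.normal(size=(A.shape[1], rank))     # product-taste matrix
        for _ in range(iters):
            # X = A Y (Y^T Y)^-1, then the symmetric update for Y.
            X = A @ Y @ np.linalg.inv(Y.T @ Y)
            Y = A.T @ X @ np.linalg.inv(X.T @ X)
        return X, Y

    A = np.random.rand(8, 5)                        # toy user-product ratings
    X, Y = als(A, rank=2)
    print(np.linalg.norm(A - X @ Y.T))              # reconstruction error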