Distributed CUR Decomposition for Bi-Clustering


Kevin Shaw, Stephen Kline
June 5, 2016
Stanford University, CME 323 Final Project

Abstract

We describe a distributed algorithm for calculating the CUR decomposition of a sparse matrix with m rows and n columns on p machines, and we analyze the theoretical run-time of both the serial and the distributed algorithms. Further, we compare the use of the CUR decomposition in bi-clustering algorithms that originally rely on the SVD. We find that using a CUR decomposition speeds up the run-time of a bi-clustering algorithm by roughly a factor of 2x on our sample data sets, with a consensus score of roughly 0.7 when compared to using the SVD. In certain settings, this may be an acceptable accuracy penalty for the speed increase.

1 Introduction

A matrix decomposition is a factorization of a matrix into a product of smaller matrices. There are many different matrix decompositions, and each has various applications. In our data analysis, the data matrix has m rows and n columns, where the rows represent a sample of item-entities (e.g., people) and the columns represent another sample of item-entities (e.g., movies), related to each other with specific weights (e.g., ratings). For various machine learning algorithms, matrix decomposition can be applied to such a data matrix in order to cluster, or group, the samples by a measure of similarity.

2 Background of CUR

The CUR matrix decomposition is a low-rank approximation of a given matrix A, made up of three matrices: C, U and R. When multiplied together, these three matrices approximate the original matrix A. CUR decompositions are modeled after the SVD and carry different benefits and costs when compared to it.

Formally, for a CUR decomposition of a matrix A, the C matrix is made up of actual columns of the original matrix. Likewise, the R matrix is made up of actual rows. This gives the added benefit that C and R represent actual observations from the data, unlike the factors of the SVD. (There are other CUR decompositions that use the right and left singular vectors to choose the C and R matrices; we do not analyze those in this paper.) Lastly, the U matrix is constructed in such a way that the product of C, U and R closely approximates the original matrix A.

Beyond interpretability, another benefit of the CUR decomposition is storage space. Since C and R consist of actual columns and rows of the original matrix A, these matrices will also be sparse (given that A is sparse). Further, though the U matrix is dense, it is quite small, with dimensions c and r that are typically small relative to the dimensions of the original matrix, and thus does not require much storage space. This provides an additional benefit over the SVD, where the U and V matrices are often large and very dense, while Σ is small and diagonal. Figure 1 and Figure 2 provide visual representations of the SVD and CUR decompositions, respectively.

Figure 1: Example of SVD Matrix Decomposition
Figure 2: Example of CUR Matrix Decomposition

3 CUR Algorithms

Over the past few years, various serial algorithms have been proposed for computing CUR matrix decompositions. For our analysis, we chose to focus on one that provides a fast run-time while still providing a good approximation to the SVD. Some CUR algorithms achieve better accuracy by constructing the C and R matrices with a different procedure, but pay for this in run-time. The algorithm we describe and analyze below was proposed in a paper by Petros Drineas, Ravi Kannan and Michael W. Mahoney in 2006 [1]. That paper presents both a linear-time CUR algorithm (its run-time scales linearly in the matrix dimensions) and a constant-time CUR algorithm. We chose the linear-time CUR algorithm for our analysis and implementation: it makes a better approximation than the constant-time variant but requires greater computation.

3.1 Serial CUR Algorithm/Implementation of CUR

Below is the linear-time algorithm described by Drineas, Kannan and Mahoney in their 2006 paper [1].

Data: A ∈ R^{m×n}; r, c, k ∈ Z+ such that 1 ≤ r ≤ m, 1 ≤ c ≤ n, and 1 ≤ k ≤ min(r, c); row probabilities {p_i}_{i=1}^{m} such that p_i ≥ 0 and Σ_{i=1}^{m} p_i = 1; and column probabilities {q_j}_{j=1}^{n} such that q_j ≥ 0 and Σ_{j=1}^{n} q_j = 1.
Result: C, U and R matrices

for t = 1 to c do
    Pick j_t ∈ {1, ..., n} with Pr[j_t = α] = q_α, α = 1, ..., n
    Set C^{(t)} = A^{(j_t)} / sqrt(c q_{j_t})
end
for t = 1 to r do
    Pick i_t ∈ {1, ..., m} with Pr[i_t = α] = p_α, α = 1, ..., m
    Set R_{(t)} = A_{(i_t)} / sqrt(r p_{i_t})
    Set Y_{(t)} = C_{(i_t)} / sqrt(r p_{i_t})
end
Compute C^T C and its SVD, C^T C = Σ_{t=1}^{c} σ_t^2(C) y^t (y^t)^T
X = Σ_{t=1}^{k} (1 / σ_t^2(C)) y^t (y^t)^T
U = X Y^T

Algorithm 1: Serial CUR Algorithm
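To make Algorithm 1 concrete, the following is a minimal NumPy sketch of the linear-time CUR. It is not the implementation we benchmarked; the function name and the use of NumPy's built-in weighted sampling are our own choices for brevity, and the probabilities are the squared-norm probabilities discussed in the run-time analysis below.

import numpy as np

def linear_time_cur(A, c, r, k, rng=None):
    """Sketch of the linear-time CUR of Drineas, Kannan and Mahoney [1]."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    m, n = A.shape

    # Column / row probabilities proportional to squared L2 norms (see below).
    fro2 = (A ** 2).sum()
    q = (A ** 2).sum(axis=0) / fro2                    # column probabilities
    p = (A ** 2).sum(axis=1) / fro2                    # row probabilities

    # Sample c columns and r rows with replacement, then rescale.
    cols = rng.choice(n, size=c, p=q)
    rows = rng.choice(m, size=r, p=p)
    C = A[:, cols] / np.sqrt(c * q[cols])              # m x c
    R = A[rows, :] / np.sqrt(r * p[rows])[:, None]     # r x n
    Y = C[rows, :] / np.sqrt(r * p[rows])[:, None]     # r x c, sampled rows of C

    # Eigendecomposition of C^T C gives sigma_t^2(C) and the vectors y^t.
    w, V = np.linalg.eigh(C.T @ C)                     # ascending eigenvalues
    top = np.argsort(w)[::-1][:k]
    sigma2, Vk = w[top], V[:, top]
    X = (Vk / sigma2) @ Vk.T                           # sum_t y^t (y^t)^T / sigma_t^2(C)
    U = X @ Y.T                                        # c x r, so C @ U @ R is m x n
    return C, U, R

Under this sketch, the product C @ U @ R gives the rank-k CUR approximation of A.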

Analysis of Run Time

For the analysis of this algorithm, we focus on the construction of the C and R matrices, since this is the part of the algorithm that provides an opportunity for distribution across a cluster of machines. Assume we have a sparse matrix A with m rows, n columns and b non-zero entries, and assume r, c, and k are constants as described in Algorithm 1.

Though the algorithm assumes the user has already been provided the probability measures for the columns and rows of A, this may not be the case, so it is important to also analyze the run-time of computing them. The probabilities are based on the L2 norm of each column and row, where the L2 norm is calculated as

    ‖x‖ = sqrt(Σ_{k=1}^{n} x_k^2).

For each column c_j, we calculate its L2 norm, and a similar procedure is done for each row r_i. Next, we loop through each c_j again to divide its squared L2 norm by the squared Frobenius norm, calculated as

    ‖A‖_F^2 = Σ_{i=1}^{m} Σ_{j=1}^{n} A_{ij}^2.

This provides the probability measure for each column and row of the matrix. The procedure takes O(mn), as calculating the Frobenius norm, the L2 norms of the columns and the L2 norms of the rows each takes O(mn).

Given the probability measures for the columns and rows of A, we must sample c columns and r rows. To do this, we convert the column probability measures into a cumulative sum, draw c random numbers between 0 and 1, and, using a binary search, determine the column of A associated with each random number. Lastly, we scale each entry of the selected columns. This process takes O(n + c log n + cn), or more simply O(cn), due to the scaling step. A similar analysis shows that sampling r rows from A takes O(rm). As discussed above, the remainder of the algorithm does not provide significant opportunity for distribution, so we have chosen not to analyze its run-time.
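The cumulative-sum and binary-search sampling step described above can be sketched in a few lines of NumPy; the same helper also covers the local sampling step of the distributed algorithm in Section 3.2. The function and variable names here are our own.

import numpy as np

def sample_indices(prob, num_samples, rng=None):
    """Draw `num_samples` indices according to the probability vector `prob`
    using a cumulative sum and binary search, as described above."""
    rng = np.random.default_rng() if rng is None else rng
    cum = np.cumsum(prob)                  # O(n) cumulative sum
    draws = rng.random(num_samples)        # uniform draws in [0, 1)
    # searchsorted performs the binary search: index i is returned with
    # probability cum[i] - cum[i-1] = prob[i], for O(c log n) total work.
    return np.searchsorted(cum, draws, side="right")

For example, cols = sample_indices(q, c) selects the c columns used to build C.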

3.2 Distributed CUR Algorithm/Implementation of CUR

From our analysis of the serial CUR algorithm above, it is clear there are parts of the algorithm whose run-time we can improve using distributed computing. The following is our proposed distributed algorithm for CUR. Unlike Algorithm 1, it does not assume the probability measures have been provided to the user.

Data: A ∈ R^{m×n}, provided in coordinate form (i.e., as (x, y, v) triples) with b non-zero entries; r, c, k ∈ Z+ such that 1 ≤ r ≤ m, 1 ≤ c ≤ n, and 1 ≤ k ≤ min(r, c); and p machines. A d subscript (e.g., x_d) indicates that x is an RDD.
Result: C, U and R matrices

R_d = partition of A by rows, with key-value pairs (x, (y, v))
C_d = partition of A by columns, with key-value pairs (y, (x, v))
Compute RN_d as a reduce among the keys of R_d
Compute CN_d as a reduce among the keys of C_d
Compute T as a reduce of RN_d
Compute RP_d, the probability measures of RN_d, with T as the scaling factor
Compute CP_d, the probability measures of CN_d, with T as the scaling factor
Sample r rows locally from RP_d and collect the data
Sample c columns locally from CP_d and collect the data
Follow the Serial CUR Algorithm to compute U

Algorithm 2: Proposed Distributed CUR Algorithm
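As a rough illustration of the first half of Algorithm 2, the PySpark sketch below computes the row and column norm RDDs (RN_d, CN_d), the scaling factor T, and the probability vectors on the driver. It assumes `coords` is an RDD of (row, col, value) triples; the function and variable names are ours, not the code we benchmarked.

import numpy as np

def distributed_cur_probabilities(coords, m, n):
    """Sketch of the norm and probability steps of Algorithm 2.
    `coords` is an RDD of (row, col, value) triples for the sparse matrix A."""
    row_part = coords.map(lambda t: (t[0], (t[1], t[2])))      # R_d, keyed by row
    col_part = coords.map(lambda t: (t[1], (t[0], t[2])))      # C_d, keyed by column

    # Squared L2 norm of every row / column: embarrassingly parallel reduces.
    row_norms2 = row_part.mapValues(lambda kv: kv[1] ** 2).reduceByKey(lambda a, b: a + b)
    col_norms2 = col_part.mapValues(lambda kv: kv[1] ** 2).reduceByKey(lambda a, b: a + b)

    # T: squared Frobenius norm, an all-to-one reduce of the row norms.
    fro2 = row_norms2.values().reduce(lambda a, b: a + b)

    # Probability measures collected on the driver (RP_d and CP_d).
    p = np.zeros(m)
    for i, v in row_norms2.collect():
        p[i] = v / fro2
    q = np.zeros(n)
    for j, v in col_norms2.collect():
        q[j] = v / fro2
    return p, q, row_part, col_part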

Analysis of Run Time

As with the serial CUR algorithm, our analysis focuses on the construction of the C and R matrices. We again assume a sparse matrix A with m rows, n columns and b non-zero entries, and that r, c, and k are constants as described in Algorithm 2.

First, the algorithm duplicates the coordinate data of the A matrix and partitions one copy by columns and the other by rows; this is not an uncommon practice and is used in other distributed algorithms. The partitioning requires two all-to-all communications, each with a potential cost of O(mn). However, as noted in the assumptions, there are only b non-zero entries, so we can be more precise and state the cost as O(b). By paying a potentially high up-front cost to move the data, we save ourselves multiple all-to-all communications in subsequent steps.

Next, we compute the column and row norms. This is similar to the L2 calculation in the serial CUR algorithm, but since we have two partitioned data sets, one by rows and the other by columns, the calculation becomes embarrassingly parallel. No communication is required, and the run-time is determined by the last reducer to finish, in this case the row or column with the most non-zero entries. Since the matrix is sparse, we assume this cost is small.

After calculating each column and row L2 norm, we must calculate the Frobenius norm of A. While calculating the L2 norm of each column and row, we also calculate the sum of the squared entries. Thus, for a given key r_i, our RN_d contains

    (r_i, (sqrt(Σ_{j=1}^{n} A_{ij}^2), Σ_{j=1}^{n} A_{ij}^2)).

Therefore, by reducing RN_d we can also calculate the Frobenius norm. This reduce requires an all-to-one communication. Here we take advantage of the tree pattern, in which at each step half of the active machines send their partial results to the other half, so the reduction completes in log p rounds. Since only a constant amount of data (in this case, a single number) is sent between each pair of machines, the computation cost is O(p) and the communication time is O(log p).

To calculate each column and row's probability measure, we must collect each column and row norm and divide it by the Frobenius norm, T. This requires an all-to-one communication, and again we take advantage of the tree pattern. Once all of the norms are located on the driver, the simple division is assumed to be O(1).

With the probability measures calculated, we can now sample the c columns and r rows to create the C and R matrices. As in the serial CUR algorithm, we compute the cumulative sum and draw c random numbers between 0 and 1. Then, using a binary search, we determine the column of A associated with each random number. This process takes O(n + c log n); since c is assumed to be small, we report this as O(n). A similar analysis shows the run-time for sampling r rows is O(m).

Finally, we must create our C and R matrices. We broadcast the c selected columns and r selected rows to each machine. Each column or row determines whether it has been selected, and if so, it is communicated back to the driver. The initial broadcast is a one-to-all communication, with a computation cost of O((c + r)p) and a communication time of O((c + r) log p). To collect the selected rows and columns, we need to grab data from at most r + c machines. We can still use a tree pattern of communication, but it will involve fewer machines. This resembles an all-to-one communication, with a computation cost of O(max(m, n)(r + c)) and a communication time of O(max(m, n) log(r + c)). Lastly, with C and R now stored locally, we follow the Serial CUR Algorithm to compute U.
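Continuing the PySpark sketch from Section 3.2, the broadcast-and-collect step for building C and R might look as follows; again, the names are our own and this is a sketch rather than our benchmarked code.

def collect_selected(sc, row_part, col_part, sel_rows, sel_cols):
    """Broadcast the sampled row/column indices and pull back only the
    entries belonging to the selected rows (for R) and columns (for C)."""
    rows_b = sc.broadcast(set(sel_rows))
    cols_b = sc.broadcast(set(sel_cols))

    # Each partition filters locally; only selected entries return to the driver.
    R_entries = row_part.filter(lambda kv: kv[0] in rows_b.value).collect()
    C_entries = col_part.filter(lambda kv: kv[0] in cols_b.value).collect()
    return R_entries, C_entries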

Benefits of Duplicating and Partitioning Data

The up-front cost paid for duplicating and partitioning the data is easily outweighed by the savings later in the algorithm. Without duplicating and partitioning, we would incur additional all-to-all communications with potentially large shuffle sizes. First, computing CN_d and RN_d would each potentially require an all-to-all communication: there is no guarantee that the entries of a row or column would be on the same machine, so calculating any norm would first require moving them to be local. Second, by duplicating the data, we can keep one copy partitioned by rows and another partitioned by columns, which lets us process calculations for each in an efficient manner without restructuring the data each time. Further, since we assume A is sparse, the increase in storage size is assumed to be small and thus not a large cost.

4 Applications of CUR: Bi-clustering

4.1 Introduction to Bi-Clustering

Bi-clustering was initially developed from an applications perspective for identifying structure in DNA microarray data. In this context, the search is for groups of genes ("marker genes") that identify certain sets of conditions (e.g., tumor types). The idea of bi-clustering is to simultaneously cluster the genes and the tumor samples, letting the data determine which genes are expressed most prominently with which tumors, and thereby grouping both genes and tumors at the same time. In the context of movie ratings, this might be seen as looking for movie groupings (e.g., "playlists") that target a particular taste profile (a group of users). The benefit of this approach (e.g., versus collaborative filtering) may be greater contextual awareness for the user: a playlist might be suggested to a group of users along with an explanation of their machine-generated taste profile.

4.2 Previous Algorithms Using SVD

There are a number of approaches to achieve a bi-clustered result. Our approach in this paper focuses on spectral bi-clustering in particular, namely the use of the SVD as a core component. Specifically, we followed the approach presented in Kluger et al. (2003) [3]. The main steps are:

1. Normalize the data
2. Compute the SVD
3. Select the best vectors and k-means cluster

1) Normalize the data. Given the sparsity of our data set (MovieLens data), we chose a bistochastic approach, which normalizes the rows and columns of the matrix simultaneously in an iterative manner so that all rows sum to one constant and all columns sum to a different constant. A sketch of this normalization is given below.
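The following is a minimal sketch of the alternating row/column scaling behind the bistochastic normalization. The sklearn implementation we actually used differs in its details; the function below is our own illustration of the idea.

import numpy as np

def bistochastic_normalize(A, n_iter=1000, tol=1e-6):
    """Alternately rescale rows and columns until the row sums are all
    (roughly) one constant and the column sums another, as described above."""
    X = np.asarray(A, dtype=float).copy()
    for _ in range(n_iter):
        row_sums = X.sum(axis=1, keepdims=True)
        X = X / np.where(row_sums == 0, 1, row_sums)        # rows now sum to 1
        col_sums = X.sum(axis=0, keepdims=True)
        X = X / np.where(col_sums == 0, 1, col_sums)        # columns now sum to 1
        row_sums = X.sum(axis=1)
        if np.abs(row_sums - row_sums.mean()).max() < tol:  # row sums near-constant
            break
    return X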

2) Compute the SVD. The algorithm only requires finding singular vectors (not singular values) and can use either a randomized approach or ARPACK. Given the randomization already present in the CUR algorithm, we chose ARPACK, via the underlying sklearn implementation at sklearn.utils.arpack.svds. That implementation uses the Lanczos algorithm (an adaptation of the power method) to find the most useful vectors.

3) Select the best vectors and k-means cluster. From the y total singular vectors, select the best subset of x vectors, i.e., those that are best approximated, according to Euclidean distance, by piecewise constant vectors representing the clusters. The piecewise constant vectors are found using k-means clustering. Finally, the rows and columns are projected onto these singular vectors and the result is clustered; a sketch of this step follows below.

For the k-means clustering there are options to use k-means++ (the default), random initialization, or initialization with a specific array; we kept the k-means++ default. The number of k-means initializations can also be specified; for our experiments we left it at the default of 10. Finally, there is also a mini-batch k-means option, which attempts several initializations to determine the best one and then runs the full algorithm only once.
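For illustration, steps 2 and 3 can be sketched roughly as follows, here calling scipy.sparse.linalg.svds directly rather than the sklearn wrapper mentioned above, and simply keeping the leading vectors instead of performing the full piecewise-constant selection; the names and defaults below are our own simplifications.

import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def project_and_cluster(A_norm, n_components=6, n_best=3,
                        n_row_clusters=3, n_col_clusters=3):
    """Compute a few singular vectors of the normalized matrix, project rows
    and columns onto them, and k-means cluster each side (a simplified sketch)."""
    U, s, Vt = svds(A_norm, k=n_components)        # ARPACK / Lanczos under the hood
    order = np.argsort(s)[::-1][:n_best]           # svds returns ascending order
    U_best, V_best = U[:, order], Vt[order, :].T

    row_proj = np.asarray(A_norm @ V_best)         # project rows onto right vectors
    col_proj = np.asarray(A_norm.T @ U_best)       # project columns onto left vectors
    row_labels = KMeans(n_clusters=n_row_clusters, n_init=10).fit_predict(row_proj)
    col_labels = KMeans(n_clusters=n_col_clusters, n_init=10).fit_predict(col_proj)
    return row_labels, col_labels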

4.3 Possible benefits of CUR

A CUR version of the algorithm above targets step 2 of Section 4.2, the calculation of the SVD, which is the most time-consuming step (based on our observation of repeated runs via Databricks). As described in Section 3, instead of singular vectors (in a rotated space), CUR uses actual rows and columns that attempt to approximate these singular vectors. These rows and columns are selected via a randomized algorithm that runs very quickly. We can easily substitute CUR for the SVD within the bi-clustering algorithm because we only need the singular vectors (not the singular values). As discussed in Section 3, we have also developed a distributed approach that delivers the benefits of the CUR algorithm in a highly scalable form, i.e., when the A matrix (e.g., the ratings matrix) does not fit in memory on a single machine.

4.4 Experiment and Results

In our implementation, we started with an existing implementation of the algorithm above in sklearn.cluster.bicluster (specifically, SpectralBiclustering). We then modified this algorithm to use CUR by breaking the existing code into sections and parameterizing them as separate function calls (i.e., inserting hooks) according to the three primary algorithm steps above, enabling us to effectively replace the SVD with CUR. For the baseline serial CUR algorithm, we started with an implementation from the Python Matrix Factorization library (PyMF), which follows Drineas et al. (2006) [1]. In particular, we implemented the linearly scaling CUR. (There is also a constant-time CUR, which further sacrifices accuracy for scalability.)

For our experiment, we used the MovieLens data. We ran the test on three different sizes: 100K, 500K and 1M. (MovieLens provides the 100K and 1M data sets; we created the 500K set by sub-sampling users from the 1M set.) We first ran the out-of-the-box algorithm in a serial fashion as a baseline. This was important for both the SVD and CUR versions in order to match the results of our distributed algorithm and be confident we were producing the same result, e.g., counts per cluster for SVD (exact) and CUR (approximate). Although these data sets were known to be sparse (Table 1 below), we also found that they skipped IDs, i.e., the IDs for users and movies were not consecutive, which affected the matrix build. We had to implement a data preparation stage that effectively re-indexed the data so as to avoid any empty rows or columns; a small sketch of this step is shown after Table 1.

Dataset | Users | Movies | Observations | Sparsity Measure
100K    |       |        |              |        %
500K    |       |        |              |        %
1M      |       |        |              |        %

Table 1: Dimensions of datasets (MovieLens)
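A minimal sketch of that data-preparation step, assuming a raw ratings file with userId, movieId and rating columns (the file and column names here are assumptions about the MovieLens layout, not our exact preprocessing code):

import pandas as pd

# Re-index non-consecutive MovieLens IDs to 0..n-1 so the matrix has no
# empty rows or columns.
ratings = pd.read_csv("ratings.csv")        # assumed columns: userId, movieId, rating
ratings["row"] = ratings["userId"].astype("category").cat.codes
ratings["col"] = ratings["movieId"].astype("category").cat.codes

# Coordinate-form (row, col, value) triples, ready for Algorithm 2's input RDD.
coords = list(zip(ratings["row"], ratings["col"], ratings["rating"]))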

All of our code was written in Python, tested locally, and then implemented in Spark via PySpark, where the serial version was tested first and sections were then re-implemented in a distributed fashion. For each experimental test, we ran with the following fixed parameters:

- Data cluster count: 3 x 3 (user clusters x movie clusters)
- SVD components: 6, best: 3 (defaults)
- CUR components: 24, best: 12
- Machine cluster count: 8 (AWS memory-optimized); Spark 8-node cluster, 270 GB on-demand, Spark (Hadoop 1), 23.2 GB per machine (1 driver, 8 workers). For this 8-node cluster established via Databricks, we observed that it spun up 5 instances in AWS (Amazon Web Services), so some nodes shared the same machine
- SVD runs before CUR, using shared memory
- The Spark cluster was restarted between runs to clear memory, and the Databricks notebook was detached and reattached (to clear any local memory)

Run | Dataset | SVD Time (s) | CUR Time (s) | Speed Up | Consensus Score
1   | 100K    |              |              |          |
2   | 100K    |              |              |          |
3   | 100K    |              |              |          |
4   | 500K    |              |              |          |
5   | 500K    |              |              |          |
6   | 500K    |              |              |          |
7   | 1M      |              |              |          |
8   | 1M      |              |              |          |
9   | 1M      |              |              |          |

Table 2: Experimental results on increasing dataset size (MovieLens)

The SVD and CUR times in Table 2 are in seconds. The consensus score was calculated via sklearn.metrics.consensus_score using the defaults, which employ the Jaccard similarity coefficient. The sklearn function takes the row (user) and column (movie) cluster memberships of both results, i.e., the SVD-based and the CUR-based biclusterings (four arrays in total). The similarity of a pair of biclusters is calculated as the size of the intersection of their row/column sets divided by the size of the union of their row/column sets. Considering the two sets of biclusters as a bipartite graph, the best matching between them is found using the Hungarian algorithm, a combinatorial optimization algorithm [2]. The final score is then the sum of the matched similarities divided by the size of the larger set.
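For reference, the comparison reduces to a single call once both models are fitted; the function and argument names below are ours, standing for fitted SpectralBiclustering-style estimators (ours being the CUR-modified variant).

from sklearn.metrics import consensus_score

def compare_biclusterings(svd_model, cur_model):
    """Consensus score between an SVD-based and a CUR-based biclustering.
    Both arguments are fitted SpectralBiclustering-style estimators exposing
    the boolean indicator arrays rows_ and columns_."""
    return consensus_score(
        (svd_model.rows_, svd_model.columns_),
        (cur_model.rows_, cur_model.columns_),
    )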

Figure 3 and Figure 4 below provide visual representations of the tabular data above. They were created in R with ggplot, using jitter to ensure that points do not obscure each other.

Figure 3: Distributed Run Times
Figure 4: Biclusters Consensus Score

5 Conclusions

As the results above show, we observed a decreased run-time in a distributed bi-clustering context. We gain the benefits of distribution (e.g., working with data that cannot fit on a single machine) while keeping the scalability and speed benefits of CUR. As discussed, CUR approximates the SVD with an actual sample of the data. In the course of our experimentation, we found this to be the case, especially for very tall and skinny matrices (e.g., obtained by limiting the total number of rated movies). However, the curse of dimensionality plays a role: as the dimension of each observation (e.g., the total number of movies rated) becomes high, the comparison to the SVD becomes more difficult because the singular vectors are harder to approximate.

6 Further Research Opportunities

We believe there are many opportunities for further research, both to improve the run-time of our proposed distributed CUR algorithm and to continue testing CUR as an alternative to the SVD for bi-clustering.

Though we believe our proposed algorithm provides a good outline for improving the run-time of calculating the CUR decomposition, there are possible bottlenecks. For instance, to compute the row and column norms, our algorithm performs a simple serial sum, which is linear in the number of non-zero items. Given a very dense row or column (for simplicity, call it r_l), this becomes a bottleneck. One potential solution is to distribute r_l among more than one machine. For instance, for a given matrix A, if row r_l is very dense while every other row r_i has only one entry, then partitioning by rows may not be effective. Thus, we would like to partition by rows while requiring each partition to contain a roughly equal number of observations. This may require some prior knowledge about the data to determine which rows to partition across multiple machines and which to keep intact.

We would also like to continue testing our distributed algorithm on larger data sets. Though we feel our results reflect the true speed increase of distributed CUR versus the SVD, in practical use today data sets are much larger. MovieLens offers sample data of larger sizes, including 10M, 20M and 22M ratings, which would provide a nice range of data sets to use.

References

[1] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. "Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition". In: SIAM Journal on Computing 36.1 (2006).

[2] Sepp Hochreiter et al. "FABIA: factor analysis for bicluster acquisition". In: Bioinformatics (2010). doi: 10.1093/bioinformatics/btq227.

[3] Yuval Kluger et al. "Spectral Biclustering of Microarray Cancer Data: Co-clustering Genes and Conditions". In: Genome Research 13 (2003).
