Explore Co-clustering on Job Applications
Qingyun Wan, SUNet ID: qywan

1 Introduction

In the job marketplace, the supply side consists of job postings published by job posters, and the demand side consists of job seekers looking for suitable jobs. For job marketplace platforms such as LinkedIn and Indeed, it is crucial to recommend desirable jobs to job seekers based on their career interests, so that posters and seekers become qualified matches and the platform ultimately sustains a reliable job-marketplace ecosystem. In a recommender system there are two general approaches to providing recommendations. Consider the job marketplace. One approach is content-based, where a job seeker's preferences are predicted from how their explicit skills, titles, industries, etc. match the jobs; it requires extensive data about seekers and jobs, so the features can be complicated. The other approach is to consider jobs preferred by similar job seekers via collaborative filtering, which needs less data. In collaborative filtering, one way to obtain similar job seekers is through clustering [1], by pre-training K-means clusters of seekers based on their applications to jobs. However, compared to one-way clustering, co-clustering job seekers and jobs simultaneously may provide better performance, and the constructed clusters can be used directly to improve the quality of job recommendations.

In the following sections, I first briefly discuss related work on co-clustering and the data set used in this project. Second, I introduce the two co-clustering methods I experimented with on job applications. Third, I discuss the validation process, compare the performance of the co-clustering methods with the baseline algorithm, K-means, and present a visual comparison. The conclusion is at the end.

2 Related Work

Many co-clustering approaches that simultaneously cluster the rows and columns of a given matrix have been evaluated on clustering documents and words. Some are representative, while the others more or less extend their ideas, using different methodologies to achieve the same goal. [2] minimizes the loss of mutual information between the original and the clustered random variables. [3] proposed a spectral co-clustering algorithm based on a bipartite graph, turning co-clustering into graph partitioning. [4] and [5] focused on low-rank matrix factorization and related it to simultaneously clustering rows and columns. The latter two involve matrix decompositions that are slow and computationally expensive, which is why [6], which uses co-clustering for collaborative filtering, does not pick them for static training. To explore co-clustering of jobs and job seekers via job applications in this project, [3] and [4] can intuitively provide insightful clusters. The former builds clusters with more intra-cluster applications and fewer inter-cluster applications, which can produce exclusive job clusters. The latter can discover latent clusters of job seekers and jobs during the matrix tri-factorization that approximates the original seeker-job matrix.

3 Data Set and Preprocessing

The data set is from CareerBuilder's competition (https://www.kaggle.com/c/job-recommendation/data). It contains job applications spanning 13 weeks. Each row of the original data set is one application record containing UserID, ApplicationDate and JobID.
ApplicationDate is not used when splitting the data set into training and testing sets. Although it might seem fairer to test job applications that happened later against clusters trained on earlier applications, job postings are usually live for only a limited period of time and receive the majority of their applications shortly after they are posted. As a result, jobs applied to in the training set would hardly appear in the testing set if the split were done by ApplicationDate, leaving insufficient and less representative testing data points.

Therefore, I extracted UserID and JobID for each application from the raw data, removed duplicate applications, shuffled them randomly, and transformed them into 0-1 valued user-job matrices, whose rows represent unique users, whose columns represent unique jobs, and whose entries indicate whether the user applied for the job, since both co-clustering methods take a user-job matrix as input.

One concern is that the original user-job matrix is very sparse, which introduces non-trivial noise for co-clustering. So I also filtered out jobs with fewer than a certain number of applications to reduce the sparsity.
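A minimal sketch of this preprocessing is given below. The file name and column names are assumptions based on the description of the raw data, and the threshold shown is one of the two values used later; this is not the exact project code.

```python
# Sketch: deduplicate applications, filter sparse jobs, build the 0-1 user-job matrix.
import numpy as np
import pandas as pd

MIN_APPS_PER_JOB = 100  # assumed threshold; 75 is the other setting used in this report

apps = pd.read_csv("apps.tsv", sep="\t", usecols=["UserID", "JobID"])  # assumed file/columns
apps = apps.drop_duplicates()  # remove duplicated applications

# Keep only jobs with at least MIN_APPS_PER_JOB applications to reduce sparsity.
job_counts = apps["JobID"].value_counts()
apps = apps[apps["JobID"].isin(job_counts[job_counts >= MIN_APPS_PER_JOB].index)]

# Shuffle the applications before splitting them into folds.
apps = apps.sample(frac=1.0, random_state=0).reset_index(drop=True)

# Build the 0-1 user-job matrix: rows are unique users, columns are unique jobs.
user_index = {u: i for i, u in enumerate(apps["UserID"].unique())}
job_index = {j: i for i, j in enumerate(apps["JobID"].unique())}
X = np.zeros((len(user_index), len(job_index)))
for u, j in zip(apps["UserID"], apps["JobID"]):
    X[user_index[u], job_index[j]] = 1.0
```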

Table 1 shows the statistics for the two filtering thresholds:

Table 1
Minimum # of Job Applications per Job | Density | # of Job Applications | # of Jobs | # of Users
75  | 0.32% | 79666 | 823 | 29957
100 | 0.73% | 32873 | 271 | 16449

The general preprocessing steps are:
1. Create two data sets of job applications from the original data set by filtering on different numbers (75 and 100) of job applications per job.
2. Split each data set into 5 partitions and create 5 pairs of training and testing sets for 5-fold cross-validation.
3. Convert each training set into a 0-1 user-job matrix for clustering.

The two data sets yield user-job matrices with different densities. The sizes of the training and testing sets for each density are given in Table 2.

Table 2
Minimum # of Job Applications per Job | # of Training Applications | # of Testing Applications
75  | 63732 | 15934
100 | 26298 | 6575

4 Methods

4.1 One-way clustering: K-means

The baseline clustering method is one-way clustering of users with K-means. Each user is represented by a vector u where

u_i = 1 if the user applies to the i-th job, and u_i = 0 otherwise.

K-means clusters these user vectors based on Euclidean distance.

4.2 Co-clustering: Nonnegative Matrix Tri-factorization

This method was proposed in [4]. Given the nonnegative user-job matrix X, Nonnegative Matrix Tri-factorization is derived from NMF, which factorizes the nonnegative X into two nonnegative matrices, X ≈ F G^T. If orthogonality is imposed on both F and G, the objective is

min_{F≥0, G≥0} ‖X − F G^T‖²,  s.t. F^T F = I, G^T G = I.

As shown in [7], this is equivalent to simultaneously running K-means clusterings of the rows and the columns of X. Since the orthogonality requirement is very strict, we can instead consider the 3-factor decomposition X ≈ F S G^T, and the objective then becomes

min_{F≥0, S≥0, G≥0} ‖X − F S G^T‖²,  s.t. F^T F = I, G^T G = I,

where F is the cluster indicator matrix of the row clusters and G is the cluster indicator matrix of the column clusters: as described in [4], G is the K-means clustering result using the kernel matrix X^T F F^T X to compute distances and, similarly, F is the K-means clustering result using the kernel matrix X G G^T X^T.

Since our user-job matrices contain 0-valued entries wherever a user (row) does not apply to a job (column), I replaced each 0 with a small fractional value 0.01, which stays distinguishable from the 1-valued entries, before running this method. I then implemented the following algorithm to minimize the objective and obtain F and G.

Algorithm NMTF

Initialization:
1. Run K-means on the columns to obtain the column cluster indicators as G.
2. Run K-means on the rows to obtain the row cluster indicators as F.
3. Let S = F^T X G.

Update rules (multiplication, division and square root taken element-wise):
G := G · sqrt( (X^T F S) / (G G^T X^T F S) )
F := F · sqrt( (X G S^T) / (F F^T X G S^T) )
S := S · sqrt( (F^T X G) / (F^T F S G^T G) )

The cluster memberships of the rows and columns are then extracted from F and G.
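A compact NumPy sketch of these updates follows. The K-means-indicator initialization, the +0.2 offset that keeps the factors strictly positive, the iteration count and eps are illustrative assumptions, not values taken from the report.

```python
# Sketch: orthogonal NMTF via element-wise multiplicative updates.
import numpy as np
from sklearn.cluster import KMeans

def nmtf(X, k_row, k_col, n_iter=200, eps=1e-9):
    # X is the smoothed user-job matrix, e.g. with 0 entries replaced by 0.01.
    row_labels = KMeans(n_clusters=k_row, n_init=10).fit_predict(X)
    col_labels = KMeans(n_clusters=k_col, n_init=10).fit_predict(X.T)
    F = np.eye(k_row)[row_labels] + 0.2   # n_users x k_row cluster indicators
    G = np.eye(k_col)[col_labels] + 0.2   # n_jobs  x k_col cluster indicators
    S = F.T @ X @ G
    for _ in range(n_iter):
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        FtXG = F.T @ X @ G
        S *= np.sqrt(FtXG / ((F.T @ F) @ S @ (G.T @ G) + eps))
    # Hard cluster memberships for users (rows) and jobs (columns).
    return F.argmax(axis=1), G.argmax(axis=1)
```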

4.3 Co-clustering: Spectral Co-clustering

This method is from [3], which achieves co-clustering by partitioning a bipartite graph. Given the 0-1 user-job matrix X, a bipartite graph is constructed with two vertex sets representing users and jobs respectively, with edge weights of 1 or 0 as illustrated in Figure 1.

Figure 1: the user-job bipartite graph.

The adjacency matrix is then

M = [ 0     X
      X^T   0 ]

Let

D = [ D_row   0
      0       D_col ]

where D_row(i, i) = Σ_j X_{i,j} and D_col(j, j) = Σ_i X_{i,j}. We have the Laplacian matrix L = D − M. It is proved in [3] that the second eigenvector of the generalized eigenvalue problem L z = λ D z gives a relaxed solution to finding the minimum normalized cut of this bipartite graph (a cut is the sum of the weights of the edges connecting two partitions), while balancing the partition sizes. [3] also shows that to obtain the second eigenvector of L, we first compute the singular vectors of D_row^{-1/2} X D_col^{-1/2}. To get the clustering result, we can directly run K-means on the second eigenvector of L, which serves as a reduced-dimensional data set.
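The following is a sketch of this procedure in NumPy/SciPy, using the multi-singular-vector generalization from [3]. The number of singular vectors kept and the guard against zero-degree vertices are my own assumptions.

```python
# Sketch: bipartite spectral co-clustering of the user-job matrix.
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def spectral_cocluster(X, n_clusters):
    n_vecs = max(1, int(np.ceil(np.log2(n_clusters))))  # heuristic from [3]
    d_row = np.maximum(X.sum(axis=1), 1e-12)             # user degrees
    d_col = np.maximum(X.sum(axis=0), 1e-12)             # job degrees
    Dr, Dc = 1.0 / np.sqrt(d_row), 1.0 / np.sqrt(d_col)
    Xn = Dr[:, None] * X * Dc[None, :]                   # D_row^{-1/2} X D_col^{-1/2}
    U, s, Vt = svds(Xn, k=n_vecs + 1)
    order = np.argsort(-s)                                # sort singular values descending
    U, Vt = U[:, order][:, 1:], Vt[order, :][1:, :]       # drop the trivial leading pair
    # Stack the scaled user and job embeddings and cluster them together.
    Z = np.vstack([Dr[:, None] * U, Dc[:, None] * Vt.T])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    return labels[:X.shape[0]], labels[X.shape[0]:]       # user labels, job labels
```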

5 Experiments

5.1 Preprocessing the testing set

Since there are no labels for either jobs or users, the way to validate the trained clusters and generate useful metrics is to verify the job applications in the testing set against the trained clusters. Because we should not verify the same job applications that appear in the training set, and we cannot verify jobs and users in the testing set that are unseen in the training set, I preprocessed the testing set to extract the unseen job applications whose corresponding jobs and users appeared in the training set.

5.2 Metrics

Due to the lack of labels, the metrics used to verify the effectiveness of the clustering algorithms are recall, accuracy and F1-score. To compute them, we count the following:

True Positive: a job applied to by a user in the testing set is in the same cluster as that user in the trained clusters.
False Positive: a job not applied to by a user in the testing set is in the same cluster as that user in the trained clusters.
True Negative: a job not applied to by a user in the testing set is not in the same cluster as that user in the trained clusters.
False Negative: a job applied to by a user in the testing set is not in the same cluster as that user in the trained clusters.

In practice, the way to determine which cluster a job belongs to differs across the clustering methods:

K-means: a job belongs to any cluster in which some user applied to it in the training set.
NMTF: although it co-clusters jobs and users, the cluster labels of jobs and users in the same co-cluster are different, so jobs are assigned the same way as for K-means.
Spectral Co-clustering: jobs and users in the same co-cluster share the same cluster label, so the job's own cluster label is used directly.

The metric used to determine the optimal number of clusters is the silhouette score. The silhouette score of a user i is

s(i) = (b(i) − a(i)) / max(a(i), b(i))

where a(i) is the dissimilarity of user i to their own cluster and b(i) is the minimum dissimilarity to the other clusters. The dissimilarity of a user to a cluster is defined as

1 − (# of jobs in this cluster applied to by this user) / (total # of jobs applied to by this user).

The average silhouette score of each cluster (i.e., the average of the silhouette scores of the users in that cluster) is used to determine the optimal number of clusters.
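A small sketch of this evaluation loop is shown below. The data structures are assumptions: user_cluster maps each user to its trained cluster label, job_clusters maps each job to the set of cluster labels it falls into under the per-method rules above, and test_applied is the set of (user, job) pairs applied in the preprocessed testing set.

```python
# Sketch: count TP/FP/TN/FN over (user, job) pairs from the testing set,
# then derive recall, accuracy and F1-score.
def evaluate(user_cluster, job_clusters, test_applied, test_users, test_jobs):
    tp = fp = tn = fn = 0
    for u in test_users:
        for j in test_jobs:
            same = user_cluster[u] in job_clusters[j]   # is the job in the user's cluster?
            applied = (u, j) in test_applied
            if applied and same:
                tp += 1
            elif not applied and same:
                fp += 1
            elif applied and not same:
                fn += 1
            else:
                tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, accuracy, f1
```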

5.3 Experiments and Results

5.3.1 Performance

The recall, accuracy and F1-score are computed for different numbers of clusters on the two data sets with different densities. The results are shown in Figures 2, 3 and 4 (density 1 is the larger, sparser data set and density 2 the smaller, denser one).

Figure 2: K-means
Figure 3: NMTF (Nonnegative Matrix Tri-factorization)
Figure 4: Spectral Co-clustering

We can see that the recall of K-means is not bad: because jobs overlap across different clusters, it produces many true positives. When it comes to accuracy, however, the co-clustering methods NMTF and Spectral Co-clustering score much higher, since their job clusterings are more exclusive and therefore yield far fewer false positives.

5.3.2 Silhouette Analysis for the Number of Clusters

By running 5-fold cross-validation and computing silhouette scores for different numbers of clusters on the two data sets with different densities, and observing where the silhouette scores start to drop, the optimal number of clusters is 80 for the first data set (larger and sparser) and 50 for the second (smaller and denser).

5.3.3 Visual Comparison

Panels (a), (b) and (c) visualize example clustering results of K-means, Spectral Co-clustering and Nonnegative Matrix Tri-factorization on the user-job matrix.

(a) K-means  (b) Spectral Co-clustering  (c) NMTF

We can observe some straightforward facts:

1. Spectral Co-clustering produces more balanced co-clusters of users and jobs, as its algorithm is designed to do, while Nonnegative Matrix Tri-factorization may produce one big cluster with the remaining clusters relatively small.
2. Nonnegative Matrix Tri-factorization allows different numbers of row and column clusters, although they are normally set to the same value. It then requires extra steps to match a user cluster with the job cluster that represents the same co-cluster, since their cluster labels differ, which is less convenient than Spectral Co-clustering.
3. Like NMTF, one-way clustering with K-means produces imbalanced clusters. More importantly, unlike the co-clustering methods, the visualization shows that in the big cluster the jobs applied to by its users overlap heavily with those in other clusters, which introduces much noise for job recommendation, while the users in the remaining small clusters are matched so narrowly that potentially suitable jobs might be missed.

6 Conclusion

It has been shown that simultaneously clustering users and jobs based solely on job applications, using different co-clustering methods, produces exclusive clusterings of jobs, which improves accuracy compared to one-way clustering because jobs with a potentially higher apply rate can be recommended. In addition, Spectral Co-clustering is designed to construct more balanced clusters, which is better for recommendation since it supplies more accurate pools of jobs to recommend. In the future, since both co-clustering methods are slow because they rely on matrix decomposition, it would be beneficial to explore more scalable co-clustering methods so that we can co-cluster sparser data sets efficiently.

7 Contributions

All work was done by myself.

8 Github

The source code is at https://github.com/qingyunwan/cs229-project.

References

[1] Ungar, L. H., & Foster, D. P. (1998, July). Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems (Vol. 1, pp. 114-129).
[2] Dhillon, I. S., Mallela, S., & Modha, D. S. (2003, August). Information-theoretic co-clustering. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 89-98). ACM.
[3] Dhillon, I. S. (2001, August). Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 269-274). ACM.
[4] Ding, C., Li, T., Peng, W., & Park, H. (2006, August). Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 126-135). ACM.
[5] Long, B., Zhang, Z. M., & Yu, P. S. (2005, August). Co-clustering by block value decomposition. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 635-640). ACM.
[6] George, T., & Merugu, S. (2005, November). A scalable collaborative filtering framework based on co-clustering. In Fifth IEEE International Conference on Data Mining (4 pp.). IEEE.
[7] Ding, C., He, X., & Simon, H. D. (2005, April). On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining (pp. 606-610). Society for Industrial and Applied Mathematics.