Supervised Clustering of Label Ranking Data

1 Supervised Clustering of Label Ranking Data
Mihajlo Grbovic, Nemanja Djuric, Slobodan Vucetic
{mihajlo.grbovic, nemanja.djuric, slobodan.vucetic}@temple.
SIAM SDM 2012, Anaheim, California, USA
Temple University, Department of Computer and Information Sciences, Center for Data Analytics and Biomedical Informatics, Philadelphia, USA

2 Outline
- Introduction
  - Label Ranking
  - Performance Measures
  - Related Work
- Supervised clustering in context of Label Ranking
  - Motivation
  - Performance Measures
- Approaches
  - Baseline Approaches
  - Plackett-Luce Mixture Model
- Empirical Evaluation
  - Experiments on Synthetic Data
  - Experiments on Real-world Data
Grbovic M., Djuric N., Vucetic S., Supervised Clustering of Label Ranking Data, SIAM SDM 2012

3 Introduction Label Ranking
Setup: L = 5 labels
[Figure: table pairing customer features (x) with products (y)]
Customer features: age, gender, how often they buy from us, how much they spend on average, etc.

4 Introduction Label Ranking
Setup: L = 5 labels
[Figure: table pairing customer features (x) with a product ranking (π)]
Label ranking (π), as pairwise label preferences: 1 > 4 > 2 > 5 > 3
Goal: learn a model that maps instances x to a total label order π
D = {(x_n, π_n), n = 1, ..., N}
h : x_n → π_n
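To make the link between a total order and its pairwise label preferences concrete, here is a minimal Python sketch. The function name `pairwise_preferences` is hypothetical, and the order 1 > 4 > 2 > 5 > 3 is used purely as an illustrative input.

```python
from itertools import combinations

def pairwise_preferences(order):
    """Expand a total order (most preferred label first) into all
    pairwise preferences (a, b), meaning label a is preferred to label b."""
    # combinations() keeps the input order, so a always precedes b in `order`.
    return list(combinations(order, 2))

order = [1, 4, 2, 5, 3]          # illustrative ranking over L = 5 labels
prefs = pairwise_preferences(order)
print(len(prefs))                 # L(L-1)/2 pairs
print(prefs[:4])
```

A total order over L labels always yields L(L-1)/2 pairwise preferences; a partial ranking would yield only the pairs among the observed labels.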

5 Introduction Label Ranking: Missing Information
[Figure: table of features (x) and label rankings (π), some of them only partially observed]
Partial ranking: π = 3 5 6 ? ? 2

6 Introduction Label Ranking: Performance Measure
Notation:
- π(i): the class label at the i-th position in the order π
- π^{-1}(y_j): the position of class label y_j in the order π
Given data set: D = {(x_n, π_n), n = 1, ..., N}
Distance between two rankings, the true ranking (π) and the predicted ranking (ρ):
$d(\pi, \rho) = |\{(y_i, y_j) : \pi^{-1}(y_i) > \pi^{-1}(y_j) \wedge \rho^{-1}(y_j) > \rho^{-1}(y_i)\}|$
Kendall tau distance: counts the number of discordant label pairs
Label Ranking Loss:
$\mathrm{loss}_{LR} = \frac{1}{N} \sum_{n=1}^{N} \frac{2\, d(\pi_n, \hat{\pi}_n)}{L(L-1)}$
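The Kendall tau distance and the label ranking loss can be sketched in a few lines of Python. This is an illustrative implementation under the assumption that every ranking is a complete order given as a list of labels, most preferred first; the function names are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(pi, rho):
    """Number of label pairs that the two complete rankings order differently."""
    pos_pi = {label: i for i, label in enumerate(pi)}
    pos_rho = {label: i for i, label in enumerate(rho)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        # A pair is discordant when the position differences have opposite signs.
        if (pos_pi[a] - pos_pi[b]) * (pos_rho[a] - pos_rho[b]) < 0
    )

def label_ranking_loss(true_rankings, predicted_rankings):
    """Average Kendall tau distance, normalized to [0, 1] by L(L-1)/2."""
    num_labels = len(true_rankings[0])
    total = sum(
        kendall_tau_distance(p, q)
        for p, q in zip(true_rankings, predicted_rankings)
    )
    return 2.0 * total / (len(true_rankings) * num_labels * (num_labels - 1))

truth = [[1, 4, 2, 5, 3], [1, 2, 3, 4, 5]]
preds = [[1, 4, 2, 5, 3], [5, 4, 3, 2, 1]]   # one perfect, one fully reversed
print(label_ranking_loss(truth, preds))
```

A loss of 0 means every predicted ranking matches the true one; a loss of 1 means every prediction is the exact reverse.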

7 Introduction Label Ranking: Related Work
1. Map into classification
   - L(L-1)/2 classifiers
   - (d x L) dimensional problem
2. kNN-based algorithms
3. Utility functions
   - Learn mappings f_k : x → R, k = 1, ..., L
   - Prediction: rank the utility scores
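The utility-function approach predicts a ranking by sorting labels by their scores. The sketch below assumes linear utilities f_k(x) = w_k · x, which is only one possible choice; the weights and the function name `predict_ranking` are hypothetical.

```python
def predict_ranking(x, weights):
    """Rank labels 1..L by decreasing utility score f_k(x) = w_k . x.

    `weights` holds one weight vector per label (a linear form is assumed
    here for illustration). Ties keep the lower label first.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    labels = range(1, len(weights) + 1)
    return sorted(labels, key=lambda k: -scores[k - 1])

# Hypothetical weight vectors w_1, w_2, w_3 for L = 3 labels and d = 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(predict_ranking([2.0, 0.5], weights))   # scores: 2.0, 0.5, 2.5
```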

8 Introduction Label Ranking: Supervised Clustering
[Figure: scatter plot of synthetic data over two attributes; colors correspond to assigned labels]
Synthetic data: 2 features, 5 labels
Each permutation is represented with a color (similar color means similar rank)
5 natural clusters in feature space
3 natural clusters in label space

9 Introduction Label Ranking: Supervised Clustering
Goal: cluster data instances (customers) in the feature space, taking into consideration the assigned, potentially incomplete label rankings (product preferences), such that the rankings of instances within a cluster are more similar to each other than to the rankings of instances in the other clusters.
Extract cluster centroid rankings (preferences that represent each cluster uniquely).

10 Introduction Label Ranking: Supervised Clustering
[Figure: two scatter plots over the two attributes, Traditional Clustering vs. Supervised Clustering; colors correspond to assigned labels. Central rankings shown in the panels: ρ = {4,3,1,5,2} and ρ = {1,2,3,4,5}]

11 Introduction Label Ranking: Supervised Clustering
Example: target marketing
A company with several products would like to cluster its customers (in feature space).
Purpose: designing cluster-specific promotional material.
For each cluster, the company can make a different catalog, promoting products in the order that best reflects the taste of its target customers.

12 Introduction Label Ranking: Supervised Clustering
Performance Measures
- Tightness of clusters in label ranking space
  - How similar are the rankings of instances within the clusters?
  - How far are the cluster central rankings from the cluster member rankings?
- Happiness of a new customer when they receive the catalog by mail
  - How close is the cluster central ranking to the true customer ranking?
$\mathrm{loss}_{LR} = \frac{1}{N} \sum_{n=1}^{N} \frac{2\, d(\pi_n, \hat{\pi}_n)}{L(L-1)}$

13 Approaches
Heuristic Baselines
1. Cluster in Feature Space, then Find Central Cluster Rankings: Kmeans, then Mallows
2. Cluster in Label Ranking Space, then Multi-Class Classification: Naïve, then SVM; EBMS*, then SVM
3. Add Label Rankings to Features, then Unsupervised Clustering: Naïve Kmeans
4. 1-Rank (represent all data using one ranking)
* M. Meila and L. Bao, An exponential model for infinite rankings, Journal of Machine Learning Research (2010)
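The second step of the first baseline, finding a central ranking per cluster, can be illustrated as follows. Note the substitution: this sketch uses a simple Borda count (average position) instead of fitting a Mallows model, it assumes complete rankings, and it takes cluster assignments as given (for example, from a prior k-means run). All function names are hypothetical.

```python
from collections import defaultdict

def borda_central_ranking(rankings):
    """Aggregate complete orders (best first) by summed position.

    Borda count is a stand-in here, not the Mallows-based estimate
    named on the slide. Ties are broken by label value.
    """
    totals = defaultdict(int)
    for order in rankings:
        for position, label in enumerate(order):
            totals[label] += position
    return sorted(totals, key=lambda label: (totals[label], label))

def central_rankings_by_cluster(assignments, rankings):
    """Group rankings by their (precomputed) cluster id and aggregate each group."""
    grouped = defaultdict(list)
    for cluster, order in zip(assignments, rankings):
        grouped[cluster].append(order)
    return {c: borda_central_ranking(members) for c, members in grouped.items()}

assignments = [0, 0, 1, 1]                       # assumed output of feature-space clustering
rankings = [[1, 2, 3], [1, 3, 2], [3, 2, 1], [3, 1, 2]]
print(central_rankings_by_cluster(assignments, rankings))
```

Because clustering here ignores the rankings entirely, nothing guarantees that instances in one feature-space cluster share similar preferences, which is the weakness the supervised approach targets.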

14 Approaches Plackett-Luce Mixture Model

15 Approaches Plackett-Luce Mixture Model (K clusters)
K clusters: [equation not preserved]
Likelihood: [equation not preserved]
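As background, the standard Plackett-Luce model assigns a ranking π over L labels with positive scores v the probability $P(\pi \mid v) = \prod_{i=1}^{L} \frac{v_{\pi(i)}}{\sum_{j=i}^{L} v_{\pi(j)}}$. The sketch below computes this probability and a plain mixture over K score vectors. It assumes fixed mixing weights, which is a simplification and not necessarily the parameterization used in the slides; all names are hypothetical.

```python
def plackett_luce_probability(order, scores):
    """Probability of a complete order (best first) under Plackett-Luce.

    `scores` maps each label to a positive score. Labels are picked one at
    a time, each with probability proportional to its score among those left.
    """
    prob = 1.0
    remaining = sum(scores[label] for label in order)
    for label in order:
        prob *= scores[label] / remaining
        remaining -= scores[label]
    return prob

def mixture_probability(order, mixing_weights, cluster_scores):
    """Mixture of K Plackett-Luce components with fixed weights (an assumption)."""
    return sum(
        w * plackett_luce_probability(order, s)
        for w, s in zip(mixing_weights, cluster_scores)
    )

# Hypothetical scores for K = 2 clusters over L = 3 labels.
cluster_scores = [{1: 3.0, 2: 2.0, 3: 1.0}, {1: 1.0, 2: 2.0, 3: 3.0}]
print(plackett_luce_probability([1, 2, 3], cluster_scores[0]))
print(mixture_probability([1, 2, 3], [0.5, 0.5], cluster_scores))
```

In the first component the order 1, 2, 3 has probability (3/6)(2/3)(1/1); probabilities over all L! orders sum to one.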

16 Empirical Evaluation
[Figure: cluster central rankings shown in the panels: ρ = {3,1,6,2,5,4}, ρ = {1,2,3,4,5,6}, ρ = {1,2,3,4,5,6}, ρ = {3,1,6,2,5,4}, ρ = {6,5,4,3,2,1}, ρ = {6,5,4,3,2,1}]

17 Empirical Evaluation

18 Empirical Evaluation
Sushi Data Set (L = 10)

19 Empirical Evaluation

20 Empirical Evaluation

21 Empirical Evaluation

22 Empirical Evaluation

23 Conclusion
- This paper presents the first attempt at supervised clustering of complex label rank data.
- We established several baselines for supervised clustering of label ranking data and proposed a Plackett-Luce (PL) mixture model specifically tailored for this application.
- We empirically showed the strength of the PL model by experiments on real-world and synthetic data.
- In addition to the supervised clustering scenario, we compared the PL model to the previously proposed label ranking algorithms in terms of predictive accuracy.

24 THANK YOU

Supervised Clustering of Label Ranking Data
Mihajlo Grbovic, Nemanja Djuric, Slobodan Vucetic
Abstract: In this paper we study supervised clustering in the context of label ranking data. Segmentation of such