Categorizing Social Multimedia by Neighborhood Decision using Local Pairwise Label Correlation


Jun Huang 1, Guorong Li 1, Shuhui Wang 2, Qingming Huang 1,2
1 University of Chinese Academy of Sciences, Beijing, China
2 Key Lab of Intelligent Information Processing (CAS), ICT, CAS, Beijing, China
{jun.huang, guorong.li}@vipl.ict.ac.cn, {wangshuhui, qmhuang}@ict.ac.cn

Abstract: On social media, user generated content, e.g., articles and images, can be assigned multiple labels. In this paper, we focus on performing multi-label classification on social media data, where each piece of user generated content is associated with multiple labels. Multi-label learning studies the problem where each object is represented by a single instance and associated with a set of labels. Current multi-label learning algorithms mainly exploit label correlations globally, by assuming that the label correlations are shared by all the examples. In real applications, however, different examples may share different label correlations. In this paper, we propose a Local Pairwise Label Correlation (LPLC) method for social media content categorization. We exploit the strongest local pairwise label correlations between the ground truth labels of each training example by computing maximum conditional probabilities: if two labels are strongly correlated, the conditional probability of one given the other will be large. In the training stage, we find the most correlated label for each ground truth label of each training example. In the test stage, we make predictions by maximizing the posterior probability, which is estimated from the distribution of each label in the k nearest neighbors and their most correlated local pairwise label correlations. We compare our method with six well-established multi-label learning algorithms over nine data sets from different social media domains and scales. Comparison with state-of-the-art approaches demonstrates the competitive performance of our method.

Keywords: Social Multimedia; Local Pairwise Label Correlation; Multi-Label Classification; k Nearest Neighbors

I. INTRODUCTION

Social network research has advanced significantly in recent years. With the prevalence of all kinds of online social network services, user generated content covers a variety of media types, large scales, and complicated connections [1]-[3]. Such content can be assigned multiple labels. For example, news articles may discuss several subjects simultaneously, and images shared in social networks can be annotated with multiple semantic concepts. In community detection [4], users may have different social roles and interests, and they can be assigned to several communities simultaneously. Multi-label learning has therefore attracted significant attention from researchers. Unlike traditional single-instance single-label learning, multi-label learning deals with examples that carry multiple class labels simultaneously, while each example is represented by a single instance. The task is to learn a model which can predict a set of possible labels for an unseen example. Multi-label learning has been applied to a variety of domains, such as text classification [5]-[9], image annotation [10]-[12], video annotation [13], [14], bioinformatics [9], [15], [16], social networks [3], and categorization of music into emotions [17], [18].
A common approach is problem transformation [19], [20], where a multi-label problem is transformed into one or more single-label learning problems. There are several types of problem transformation methods in the multi-label learning literature. Binary Relevance (BR) [10], one of the representative problem transformation algorithms, treats each label as an independent binary classification problem. BR is theoretically simple and intuitive, but it is criticized for not taking label correlations into account; the independently learned binary classifiers can therefore produce incorrect predictions. A number of remedies have been proposed, and most current multi-label learning algorithms exploit label correlations globally, by assuming that the label correlations are shared by all the examples. In real applications, however, different examples may share different label correlations. One label may correlate with many other labels globally, but the most correlated label may differ across examples. For example, in community detection, consider three community groups named social network, data mining and machine learning. Suppose most researchers interested in social network are also interested in data mining, so the groups social network and data mining have a strong global correlation: if someone is interested in social network, there is a high probability that he is also interested in data mining. However, there may be a small group of researchers interested in social network who are strongly interested in machine learning but not in data mining; this is a local pairwise label correlation. In this paper, we propose a simple and efficient Bayesian model for online social media content categorization called Local Pairwise Label Correlation (LPLC). We assume that the most correlated local pairwise label correlation for each training example exists only among the ground truth labels of that example and its similar examples. We exploit the most correlated local pairwise label correlations between the ground truth labels of each training example by computing maximum conditional probabilities. LPLC exploits the most correlated local pairwise label correlations in two stages.

First, we find the k nearest neighbors N(x_i) of a training example x_i; then we calculate the conditional probability p(y_l | y_j, N(x_i)) of one label y_l given another ground truth label y_j of x_i, and the label y_j that yields the maximum conditional probability p(y_l | y_j, N(x_i)) is chosen as the most correlated label of y_l for x_i. These most correlated labels of each ground truth label of each training example are stored for the test stage. In the test stage, we first find the k nearest neighbors of a test example and their corresponding most correlated pairwise label correlations, then make predictions by maximizing the posterior probability, which is estimated from the distribution of each label in the k nearest neighbors and their strongest local pairwise label correlations.

The rest of this paper is organized as follows. Section II reviews previous work on multi-label learning. Section III presents the details of the proposed method LPLC. Experimental results on nine social media data sets are shown in Section IV. Finally, we conclude and indicate several issues for future work in Section V.

II. RELATED WORK

In the past decades, many well-established methods have been proposed to solve multi-label learning problems in various domains. These methods can be divided into two categories [19], [20]: Problem Transformation Methods (fitting data to algorithm) and Algorithm Adaptation Methods (fitting algorithm to data).

A. Problem Transformation Methods

Problem Transformation Methods transform the multi-label classification problem into one or more single-label classification problems (binary or multi-class). Label Powerset (LP) [19] is a simple but effective problem transformation method. LP considers each unique set of labels that exists in a multi-label training set as one of the classes of a new single-label classification task. LP captures label correlations by combining unique label sets, but it generates a huge number of classes, so it is usually infeasible in practice. Binary Relevance (BR) [10] is a representative problem transformation algorithm. The basic idea of BR is to decompose the multi-label learning problem into Q independent binary (one-vs-rest) classification problems, where each binary classification problem corresponds to one label in the label space. BR is a simple and straightforward approach to multi-label learning, but it does not consider label correlations, whereas in multi-label learning labels often correlate with each other. Moreover, the binary classifier for each label may suffer from class imbalance when the number of labels is large and the label density is low [20]. Classifier Chains (CC) [21] is a chaining method that can model label correlations. The Classifier Chain model transforms the multi-label learning problem into a chain of binary classification problems. It involves Q binary classifiers trained one by one: classifier h_i is trained using y_1, ..., y_{i-1} as additional input information. Ensemble Classifier Chains (ECC) [21] is the ensemble framework of Classifier Chains. Probabilistic Classifier Chains (PCC) [22] extends CC with a probabilistic interpretation; its analysis explains how Bayes-optimal probabilistic classifier chains can be formed based on probability theory.
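To make the chaining idea concrete, here is a minimal sketch (not the authors' code), assuming scikit-learn's LogisticRegression as the base binary learner; the class and variable names are ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ClassifierChain:
    """Minimal classifier chain: classifier h_j predicts label j from the
    original features augmented with labels y_1, ..., y_{j-1}."""

    def fit(self, X, Y):
        self.models = []
        Z = X
        for j in range(Y.shape[1]):
            self.models.append(LogisticRegression().fit(Z, Y[:, j]))
            Z = np.hstack([Z, Y[:, j:j + 1]])     # training uses the true labels
        return self

    def predict(self, X):
        Z, preds = X, []
        for m in self.models:
            p = m.predict(Z)
            preds.append(p)
            Z = np.hstack([Z, p.reshape(-1, 1)])  # testing feeds predictions forward
        return np.stack(preds, axis=1)
```

The error propagation along the chain discussed in Section IV stems from the predict loop feeding earlier (possibly wrong) predictions into later classifiers.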
LIFT [23] is a multi-label learning algorithm with label-specific features. It assumes that one object belongs to multiple labels simultaneously because each label is correlated with its own corresponding features. LIFT constructs a different feature set for the discrimination of each label by conducting clustering analysis on the positive and negative instances, and then performs training and testing by querying the clustering results. ML-LOC [24] exploits label correlations locally for multi-label learning. It assumes that the instances can be separated into different groups, each sharing a subset of label correlations. To encode the local influence of label correlations, it constructs a LOC (LOcal Correlation) code for each instance and uses this code as additional features for the instance. The classifier is trained with the original features plus the LOC codes. For test examples, the LOC codes are unknown, so regression models are trained to predict them.

B. Algorithm Adaptation Methods

Algorithm Adaptation Methods modify traditional single-label learning algorithms so that they can handle multi-label data directly. A number of multi-label algorithms [18], [25]-[29] are designed based on the k Nearest Neighbors algorithm. For example, ML-kNN [29] (Multi-Label k Nearest Neighbor) extends the standard kNN algorithm. Its basic idea is to adapt k-nearest neighbor techniques to multi-label data, where the maximum a posteriori (MAP) rule is utilized to make predictions by reasoning with the labeling information embodied in the neighbors. A case-based multi-label ranking algorithm is proposed in [25], which introduces a conceptually novel framework: it views multi-label ranking as a special case of aggregating rankings that are supplemented with an additional virtual label and in which ties are permitted. It computes optimal aggregations with respect to the Spearman rank correlation as the underlying loss function, while the framework is amenable to a variety of aggregation procedures. CLAC (correlated lazy associative classifier) [28] is a multi-label lazy associative classifier which progressively exploits dependencies among labels. kNN is used to get relevant features and examples from the training data for a test instance; Multi-label Class Association Rules (MCARs) are generated from this relevant data by considering the label correlations, and then used to predict a set of possible labels for the test instance. An empirical study of lazy multi-label classification algorithms is given in [27], which provides two useful extensions of BRkNN based on the calculation of confidence scores for each label. The first extension assigns the label with the highest confidence to a test object when an empty label set would be output, and the second makes the BRkNN classifier output the set of labels with the highest confidence, where the number of labels equals the average size of the label sets in the k nearest neighbors.

Mr.kNN [26] is a voting Margin-Ratio kNN based multi-label learning algorithm, designed to address the outlier problem that arises when using the binary relevance method. Mr.kNN first uses a modified fuzzy c-means (FCM) based [30] approach to produce a relevance score of an instance with respect to each label; this soft relevance is then employed in a voting function used in a k nearest neighbor classifier. There are also a variety of multi-label learning algorithms adapted from other well-known traditional single-label learning algorithms. For example, an algorithm adapted from the decision tree C4.5 [31], obtained by modifying the definition of entropy for multi-label learning, is proposed in [32]. RankSVM [15] adapts the maximum margin strategy to multi-label data, where a set of linear classifiers is optimized to minimize the empirical ranking loss, with kernel tricks enabling nonlinear cases. AdaBoost.MH and AdaBoost.MR [6] are two extensions of AdaBoost for multi-label data: AdaBoost.MH is designed to minimize Hamming loss, while AdaBoost.MR is designed to find a hypothesis that places the correct labels at the top of the ranking. BP-MLL [9] is derived from the popular back propagation algorithm by employing a novel error function, extended from ranking loss, to capture the characteristics of multi-label learning.

III. PROPOSED METHOD

In this section, the details of the proposed method LPLC are presented. First, we show how to model the most correlated local pairwise label correlation between labels, and then give the details of the proposed LPLC model and the probability estimation for it.

A. Preliminaries

In multi-label learning, let X = R^d be the d-dimensional input space and Y = {y_1, y_2, ..., y_Q} the finite set of Q possible labels. D = {(x_i, Y_i) | 1 <= i <= N} is the training data set with N examples. The i-th object is denoted by a vector with d attribute values x_i = [x_{i1}, x_{i2}, ..., x_{id}], x_i in X, and Y_i = [y_{i1}, y_{i2}, ..., y_{iQ}] is the label set of x_i, where y_{ij} = 1 if label y_j is associated with x_i and y_{ij} = 0 otherwise. Let N(x) be the k nearest neighbors of example x, where k is the number of nearest neighbors. The similarity between two examples is calculated by Euclidean distance; one could also adopt the kernelized locality sensitive hashing method [33] to calculate similarities, or learn a distance metric. y_j denotes the distribution of label y_j in N(x): a column vector with k elements (0 or 1) indicating whether the examples in N(x) carry label y_j. For example, Table I shows the five nearest neighbors of a test example x_t, and y_1 = [ ]^T.

TABLE I. FIVE NEAREST NEIGHBORS OF TEST EXAMPLE x_t (rows: the five neighbors of x_t; columns: their values for labels y_1, ..., y_5)
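As a concrete illustration of this notation, the snippet below builds a label distribution vector from a hypothetical 5-neighbor label matrix (the values are illustrative assumptions, not the entries of the paper's Table I):

```python
import numpy as np

# Hypothetical labels of the k = 5 nearest neighbors of a test example x_t
# (rows: neighbors, columns: labels y_1 ... y_5); values are assumptions.
N_labels = np.array([[1, 0, 1, 0, 0],
                     [1, 1, 0, 0, 0],
                     [0, 1, 0, 1, 0],
                     [1, 0, 0, 0, 1],
                     [1, 1, 1, 0, 0]])

y1 = N_labels[:, 0]   # distribution vector of label y_1 in N(x_t)
print(y1)             # -> [1 1 0 1 1]
print(y1.sum())       # ||y_1||_1 = 4: four neighbors carry label y_1
```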
B. Local Pairwise Label Correlation

We define an N x Q matrix M that stores, for each training example x_i, the most correlated label of each ground truth label of x_i. If y_{ij} = 1, M_{ij} stores the most correlated pairwise label of label y_j for x_i; otherwise, M_{ij} = 0. The most correlated pairwise label of y_j for x_i is determined by calculating the conditional probability of label y_j given another ground truth label of x_i and the k nearest neighbors N(x_i). If two labels are strongly correlated with each other, this conditional probability will be larger. The value of M_{ij} is calculated by Eq.(1):

M_{ij} = \arg\max_{l: l \neq j} \; p(y_j = 1 \mid y_l = 1, N(x_i))    (1)

We assume that strong pairwise label correlation exists only among the ground truth labels of each training example, because if two labels are strongly correlated, they must often co-occur, and thus the value in Eq.(1) will be larger. When calculating M_{ij} by Eq.(1), both y_l and y_j must be associated with x_i, that is, y_{il} = 1 and y_{ij} = 1. This constraint allows our algorithm to exploit local pairwise label correlations efficiently. It should be noted that M_{ij} may not equal M_{ji}. For example, in image annotation, if we know an image can be annotated as whale, we can confidently assign the label seawater to it, but not vice versa. If label y_l is chosen as the strongest pairwise correlated label of label y_j, it maximizes the conditional probability p(y_j = 1 | y_l = 1, N(x_i)), which is calculated by Eq.(2):

p(y_j = 1 \mid y_l = 1, N(x_i)) = \frac{\mathbf{y}_j^{T} \mathbf{y}_l}{\|\mathbf{y}_l\|_1}    (2)

The complementary probability is p(y_j = 0 | y_l = 1, N(x_i)) = 1 - p(y_j = 1 | y_l = 1, N(x_i)). The whole procedure for finding the most correlated local pairwise label correlations is summarized in Algorithm 1.

Algorithm 1 Construct Local Pairwise Label Correlation
Input:
D: the multi-label training data set, D = {(x_i, Y_i) | 1 <= i <= N}, Y_i in {0,1}^Q;
k: the number of nearest neighbors;
Output:
M: local pairwise label correlation matrix;
1: for i = 1 to N do
2:   D_train = D \ {x_i};
3:   find the k nearest neighbors N(x_i) of x_i from D_train;
4:   for all y_j in Y_i do
5:     Pr_max = 0;
6:     for all y_l in Y_i do
7:       if j != l and p(y_j = 1 | y_l = 1, N(x_i)) > Pr_max then
8:         M_{ij} = l;
9:         Pr_max = p(y_j = 1 | y_l = 1, N(x_i));
10:      end if
11:    end for
12:  end for
13: end for
14: return M
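A compact Python sketch of Algorithm 1 under the definitions above, with Eq.(2) estimated by counting label co-occurrences among the neighbors (the function and variable names are ours, and M[i, j] = -1 is used instead of 0 to mark "no correlated label", since 0 is a valid label index in Python):

```python
import numpy as np

def build_correlation_matrix(X, Y, k):
    """Algorithm 1: for each training example x_i and each ground truth
    label y_j of x_i, store the other ground truth label y_l maximizing
    p(y_j = 1 | y_l = 1, N(x_i)) as estimated by Eq.(2)."""
    N, Q = Y.shape
    M = -np.ones((N, Q), dtype=int)          # -1: no correlated label stored
    for i in range(N):
        # k nearest neighbors of x_i by Euclidean distance, excluding x_i
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        nn = np.argsort(d)[:k]
        Yn = Y[nn]                           # k x Q neighbor label matrix
        truth = np.flatnonzero(Y[i])         # ground truth labels of x_i
        for j in truth:
            pr_max = 0.0
            for l in truth:
                if l == j or Yn[:, l].sum() == 0:
                    continue
                # Eq.(2): p(y_j = 1 | y_l = 1, N(x_i)) = y_j^T y_l / ||y_l||_1
                p = Yn[:, j] @ Yn[:, l] / Yn[:, l].sum()
                if p > pr_max:
                    M[i, j], pr_max = l, p
    return M
```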

C. LPLC Model

After the training stage, we have obtained the local pairwise label correlations of each training example. We assume that similar examples share the same label correlations. In the test stage, we first find the k nearest neighbors of the test example and the local pairwise label correlations of these neighbors; the test example then shares these local pairwise label correlations with its k nearest neighbors. Given a test example x and its k nearest neighbors N(x), let S_i be the set of correlated labels of label y_i in the local pairwise label correlations NC(x) of N(x) obtained from M. The posterior probabilities Pr(y_i = b | x), b in {0, 1}, are computed for each label y_i, and the prediction of each label y_i for example x is determined by the maximum a posteriori (MAP) rule on these posterior probabilities. According to the Bayesian rule, the probability Pr(y_i = b | x) can be written as Eq.(3):

\Pr(y_i = b \mid x) = \frac{1}{Z} \sum_{y_c \in S_i} p(y_i = b, y_c = 1 \mid N(x))
                    = \frac{1}{Z} \sum_{y_c \in S_i} p(y_i = b \mid y_c = 1, N(x)) \, p(y_c = 1 \mid N(x))    (3)

where Z is the normalizing constant ensuring that the probabilities sum to one, Z = Pr(y_i = 1 | x) + Pr(y_i = 0 | x). The probability p(y_c = b | N(x)) is calculated by Eq.(4):

p(y_c = b \mid N(x)) = \frac{k(1-b) + (-1)^{(1-b)} \|\mathbf{y}_c\|_1}{k}    (4)

In multi-label learning, labels often have correlations with each other, but not all of them do: some labels are independent of the other labels. When label y_i is independent, Eq.(3) reduces to Eq.(5), in which case the LPLC model coincides with traditional BR-kNN:

\Pr(y_i = b \mid x) = p(y_i = b \mid N(x))    (5)

After computing the probabilities Pr(y_i = b | x), 1 <= i <= Q, the LPLC classifier can be written as Eq.(6):

h(x) = \{h_1(x), h_2(x), \ldots, h_Q(x)\}    (6)

where each h_i(x) is defined as Eq.(7):

h_i(x) = \arg\max_{b \in \{0,1\}} \Pr(y_i = b \mid x)    (7)

The prediction procedure of LPLC is summarized in Algorithm 2.

Algorithm 2 LPLC Prediction
Input:
D: the multi-label training data set, D = {(x_i, Y_i) | 1 <= i <= N}, Y_i in {0,1}^Q;
x_t: a test example;
k: the number of nearest neighbors;
M: local pairwise label correlation matrix;
Output:
Y_t: the set of predicted labels for x_t;
1: find the k nearest neighbors N(x_t) of x_t;
2: get the local pairwise label correlations NC(x_t) of N(x_t) from M;
3: for i = 1 to Q do
4:   get the correlated labels S_i of y_i from NC(x_t);
5:   calculate Pr(y_i = b | x_t) by Eq.(3);
6:   h_i(x_t) = argmax_{b in {0,1}} Pr(y_i = b | x_t);
7: end for
8: return Y_t = {h_1(x_t), h_2(x_t), ..., h_Q(x_t)};
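A matching sketch of Algorithm 2 and Eqs.(3)-(5), continuing the naming of the previous snippet; when S_i is empty the label is treated as independent and Eq.(5) applies, which reduces to BR-kNN as noted above:

```python
import numpy as np

def lplc_predict(X, Y, M, x_t, k):
    """Algorithm 2: MAP prediction of the label set of a test example x_t."""
    d = np.linalg.norm(X - x_t, axis=1)
    nn = np.argsort(d)[:k]
    Yn = Y[nn]                                   # labels of the k nearest neighbors
    Q = Y.shape[1]

    def p_label(c, b):                           # Eq.(4): p(y_c = b | N(x_t))
        return (k * (1 - b) + (-1) ** (1 - b) * Yn[:, c].sum()) / k

    def p_cond(j, c):                            # Eq.(2) on the test neighborhood
        s = Yn[:, c].sum()
        return Yn[:, j] @ Yn[:, c] / s if s > 0 else 0.0

    y_pred = np.zeros(Q, dtype=int)
    for i in range(Q):
        # correlated labels of y_i stored for the neighbors (step 4)
        S = {M[n, i] for n in nn if M[n, i] >= 0}
        if not S:                                # independent label: Eq.(5)
            post = {b: p_label(i, b) for b in (0, 1)}
        else:                                    # Eq.(3), up to the constant Z
            post = {b: sum((p_cond(i, c) if b == 1 else 1.0 - p_cond(i, c))
                           * p_label(c, 1) for c in S) for b in (0, 1)}
        y_pred[i] = max(post, key=post.get)      # MAP rule, Eq.(7)
    return y_pred
```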
IV. EXPERIMENTS

In this section, we empirically evaluate the effectiveness of our method over nine data sets covering various types and scales of social media data. We further evaluate the influence of the parameter k on the performance of our method and of the other kNN based multi-label learning methods compared in this paper.

A. Data Sets

Nine social media data sets are studied in this paper. Their detailed characteristics are summarized in Table II. The data sets are ordered by the type of social media data, and all of them can be downloaded from mulan and lamda.

TABLE II. DESCRIPTION OF DATA SETS
ID | Data set  | Instances | Features | Labels | Cardinality | Domain
1  | flags     |           |          |        |             | image
2  | scene     |           |          |        |             | image
3  | corel5k   |           |          |        |             | image
4  | mediamill |           |          |        |             | video
5  | emotions  |           |          |        |             | music
6  | medical   |           |          |        |             | text
7  | social    |           |          |        |             | text
8  | education |           |          |        |             | text
9  | tmc       |           |          |        |             | text

These data sets are composed of both training and testing parts. The multi-label classifiers studied in our experiments are trained on the training parts and tested on the testing parts, and some of the data sets are normalized by max-min normalization. In Table II, the column Cardinality gives the label cardinality of the data set: the average number of labels per instance, calculated by Eq.(8):

\text{Cardinality} = \frac{1}{N} \sum_{i=1}^{N} |Y_i|    (8)

where N is the total number of instances in the data set, including both training and testing parts, Y_i is the label set of the i-th instance, and |.| denotes the number of elements in a set.
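As a quick sanity check, Eq.(8) is a one-liner on a binary label matrix (the toy values are our own):

```python
import numpy as np

# Eq.(8): average number of labels per instance, on a toy label matrix
Y = np.array([[1, 0, 1],     # |Y_1| = 2
              [0, 1, 0],     # |Y_2| = 1
              [1, 1, 1]])    # |Y_3| = 3
print(Y.sum(axis=1).mean())  # -> 2.0
```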

B. Evaluation Metrics

To evaluate the performance of different multi-label classification algorithms, five common evaluation metrics are used in our experiments: Hamming loss, Accuracy, Precision, Recall, and F-Measure. Given a set of multi-label test instances S = {(x_i, Y_i) | 1 <= i <= m}, where Y_i in {0,1}^Q, let h(x_i) in {0,1}^Q denote the predicted label sets.

Hamming loss evaluates how many times an instance-label pair is misclassified, i.e., a label not belonging to the instance is predicted or a label belonging to the instance is not predicted. It is defined as Eq.(9):

\text{Hamming loss} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{Q} |Y_i \,\Delta\, h(x_i)|    (9)

where Delta represents the symmetric difference of two sets. The smaller the value of Hamming loss, the better the performance of the classifier. Accuracy is defined as Eq.(10), and Precision and Recall as Eq.(11) and Eq.(12):

\text{Accuracy} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap h(x_i)|}{|Y_i \cup h(x_i)|}    (10)

\text{Precision} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap h(x_i)|}{|h(x_i)|}    (11)

\text{Recall} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap h(x_i)|}{|Y_i|}    (12)

where the cap symbol denotes the intersection of two sets and the cup symbol the union. The last evaluation metric is the F-Measure, the integrated version of Precision and Recall with beta > 0; it is called the F1-Measure when beta = 1. It is defined as Eq.(13):

F\text{-}Measure = \frac{(1+\beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}    (13)

For the latter four metrics, a larger value indicates better performance of the classifier. These five evaluation metrics are widely used in the multi-label learning literature, but their importance differs in multi-label prediction evaluation. The evaluation of prediction performance should be an integrative consideration of all five metrics according to the importance order F1-Measure, Accuracy > Precision, Recall > Hamming loss [34]. According to this criterion, our method is better than the compared methods in our experiments.
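A sketch of the five example-based metrics in Eqs.(9)-(13) on 0/1 label matrices (our own helper; conventions for empty predicted or true label sets vary between implementations, here guarded with a maximum of 1):

```python
import numpy as np

def multilabel_metrics(Y, H, beta=1.0):
    """Y, H: m x Q binary matrices of true and predicted labels."""
    m, Q = Y.shape
    inter = (Y & H).sum(axis=1)                    # |Y_i intersect h(x_i)|
    union = (Y | H).sum(axis=1)                    # |Y_i union h(x_i)|
    hamming = (Y ^ H).sum(axis=1).mean() / Q       # Eq.(9): symmetric difference
    accuracy = (inter / np.maximum(union, 1)).mean()            # Eq.(10)
    precision = (inter / np.maximum(H.sum(axis=1), 1)).mean()   # Eq.(11)
    recall = (inter / np.maximum(Y.sum(axis=1), 1)).mean()      # Eq.(12)
    f = ((1 + beta ** 2) * precision * recall
         / max(beta ** 2 * precision + recall, 1e-12))          # Eq.(13)
    return hamming, accuracy, precision, recall, f
```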
C. Compared Methods

We compare LPLC with five well-established multi-label learning algorithms, ECC [21], LIFT [23], ML-LOC [24], ML-kNN [29] and EFP [35], as well as with BR [10]. The traditional kNN algorithm is used as the binary classifier for the BR and ECC methods, which are therefore called BR-kNN and ECC-kNN in this paper. All compared algorithms are summarized in Table III, where the column Correlation indicates whether the corresponding method considers label correlations. BR-kNN, ECC-kNN and LPLC are implemented in Matlab.

TABLE III. COMPARED METHODS
ID | Method | Correlation | Publication
1  | ML-LOC | yes         | [24]
2  | ECC    | yes         | [21]
3  | LIFT   | no          | [23]
4  | EFP    | no          | [35]
5  | ML-kNN | no          | [29]
6  | BR     | no          | [10]
7  | LPLC   | yes         | this paper

In our experiments, the parameters of each algorithm are set according to the published literature. For LIFT, the parameter r is set to 0.1, as suggested in [23]. For ML-LOC, lambda_1 = 1, lambda_2 = 100, and m = 15, as suggested in [24]. The parameter k (number of nearest neighbors) for LPLC, ML-kNN, BR-kNN and ECC-kNN is set from 5 to 45 in steps of 5 for all data sets. The ensemble size of ECC-kNN is set to 10, and the learning orders of the classifier chains in ECC-kNN are generated randomly.

D. Comparison Results

We evaluate the effectiveness of LPLC on four types of media: image, video, music and text. Experimental results are shown in Tables IV to VIII. To be fair, we use the best results of each compared algorithm for comparison. In multi-label learning, it is difficult for one algorithm to perform best on all evaluation criteria. We conducted the Friedman test based on the average ranks in order to verify whether the differences between algorithms are statistically significant. The second term after each value in Tables IV to VIII is the rank of that value within its row; algorithms achieving the same value receive the average rank. The best performance of each algorithm on each data set is selected. From Tables IV to VIII, we can see that our method LPLC achieves the best performance on Accuracy, F1-Measure and Precision among the compared methods, ranks fourth on Hamming loss, and ranks second on Recall. Although the EFP algorithm is designed to optimize the F-Measure and sometimes outperforms LPLC on F1-Measure, LPLC achieves statistically superior F1-Measure performance against EFP. It is noteworthy that our LPLC method achieves superior performance against BR on every evaluation metric; this clearly verifies the effectiveness of exploiting label correlations. LPLC also obtains better results than ECC and ML-LOC, demonstrating that considering local pairwise label correlations is more effective. From all the results, we further note that ECC achieves better performance than BR on the whole, but sometimes BR outperforms ECC. The main reason is that ECC suffers from error propagation [36], and as the number of labels in a data set increases, the learning order of the classifier chains has a significant influence on performance. Our method does not suffer from this problem.
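The significance testing mentioned above could be reproduced along these lines, assuming an algorithms x data sets score matrix (the values below are random placeholders, not the paper's results):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Placeholder accuracies of 3 algorithms on the 9 data sets
rng = np.random.default_rng(0)
scores = rng.uniform(0.4, 0.8, size=(3, 9))

# Friedman test: do the algorithms rank consistently across data sets?
stat, p_value = friedmanchisquare(*scores)   # one sample per algorithm
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")

# Average rank of each algorithm (rank 1 = best score on a data set)
ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
print(ranks.mean(axis=1))
```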

TABLE IV. EXPERIMENTAL RESULTS OF EACH COMPARING ALGORITHM ON HAMMING LOSS (rows, by total order: ML-LOC, LIFT, ML-kNN, LPLC, BR, ECC, EFP; columns: flags, scene, corel5k, mediamill, emotions, medical, social, education, tmc, Avg.Rank, Total Order)

TABLE V. EXPERIMENTAL RESULTS OF EACH COMPARING ALGORITHM ON ACCURACY (rows, by total order: LPLC, EFP, ML-LOC, ECC, LIFT, BR, ML-kNN; same columns)

TABLE VI. EXPERIMENTAL RESULTS OF EACH COMPARING ALGORITHM ON PRECISION (rows, by total order: LPLC, ML-LOC, ECC, EFP, LIFT, ML-kNN, BR; same columns)

TABLE VII. EXPERIMENTAL RESULTS OF EACH COMPARING ALGORITHM ON RECALL (rows, by total order: EFP, LPLC, ECC, ML-LOC, LIFT, ML-kNN, BR; same columns)

TABLE VIII. EXPERIMENTAL RESULTS OF EACH COMPARING ALGORITHM ON F1-MEASURE (rows, by total order: LPLC, EFP, ML-LOC, ECC, LIFT, ML-kNN, BR; same columns)

E. Evaluation of the Influence of Parameter k

For kNN-based approaches, it is critical to choose an appropriate value of the parameter k (the number of nearest neighbors). We perform experiments with different values of k on three data sets: scene (image), emotions (music) and social (text). We also evaluate the influence of k on the other three kNN based multi-label learning algorithms: BR, ECC and ML-kNN. Due to space limitations, we only illustrate the experimental results on these data sets under two important evaluation criteria, Accuracy and F1-Measure. The results are shown in Fig. 1; the horizontal axis of each panel indicates the value of k, and the vertical axis the value of the evaluation metric.

Fig. 1. Evaluation results of LPLC at different values of k: (a) Accuracy on scene; (b) F1-Measure on scene; (c) Accuracy on emotions; (d) F1-Measure on emotions; (e) Accuracy on social; (f) F1-Measure on social.

Figure 1(a) presents the Accuracy results of the four kNN based methods on scene, with k set from 5 to 45 in steps of 5. Traditional BR is very sensitive to the value of k. ECC and ML-kNN are also sensitive to k, though less so than BR. The performance of BR, ECC and ML-kNN declines as k increases. In contrast, the Accuracy of our method displays a very smooth curve, and different values of k have little influence on our method on scene. The F1-Measure results of these kNN based methods on scene show the same behavior as Accuracy. Figures 1(c) and 1(d) show the Accuracy and F1-Measure results of the four kNN based methods on emotions. BR, ECC and ML-kNN are very sensitive to the value of k, while different values of k have little influence on our method on emotions: its Accuracy and F1-Measure fluctuate only slightly. The Accuracy and F1-Measure results on social are shown in Figures 1(e) and 1(f). Again, BR, ECC and ML-kNN are very sensitive to the value of k, while different values of k have little influence on our method on social. From Fig. 1, it can be seen that our proposed method is not sensitive to the parameter k, while the other three kNN-based approaches are very sensitive to it, with different variation trends on different data sets.

V. CONCLUSION

In this paper, we proposed a simple but efficient kNN based Bayesian model, LPLC, for multi-label social multimedia content categorization that exploits local pairwise label correlations. LPLC finds the most correlated local pairwise label for each ground truth label of each training example. It assumes that similar examples share the same local label correlations and that strong pairwise label correlations exist only between the ground truth labels of each training example. LPLC makes predictions by maximizing the posterior probability, which is estimated from the distribution of each label in the k nearest neighbors and their most correlated local pairwise label correlations.
LPLC achieves statistically superior or comparable performance against state-of-the-art methods on each evaluation metric. The comparison with BR demonstrates that considering label correlations can improve the performance of a classifier, while the comparison with ECC and ML-LOC, which also take label correlations into consideration, shows that our way of modeling label correlations is simpler and more effective. Future work includes improving the performance of the proposed method, especially on large-scale (instance space and label space) data sets, and applying the method to other learning tasks on social networks.

ACKNOWLEDGMENT

This work was supported in part by National Basic Research Program of China (973 Program): 2012CB316400, in part by National Natural Science Foundation of China: , , , and , in part by 863 program of China: 2014AA015202, in part by China Postdoctoral Science Foundation: 2014T70111 and 2014T70126, and in part by President Foundation of UCAS.

REFERENCES

[1] S. Liu, Y. Liu, L. Ni, J. Fan, and M. Li, "Towards mobility-based clustering," in Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010, pp.
[2] S. Liu, S. Wang, F. Zhu, J. Zhang, and R. Krishnan, "Hydra: Large-scale social identity linkage via heterogeneous behavior modeling," in Proceedings of the 41st ACM SIGMOD International Conference on Management of Data, 2014, pp.
[3] X. Wang and G. Sukthankar, "Multi-label relational neighbor classification using social context features," in Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2013, pp.
[4] S. Liu, S. Wang, K. Jayarajah, A. Misra, and R. Krishnan, "Todmis: Mining communities from trajectories," in Proceedings of the 22nd ACM CIKM International Conference on Information and Knowledge Management, 2013, pp.
[5] H. Kazawa, T. Izumitani, H. Taira, and E. Maeda, "Maximal margin labeling for multi-topic text categorization," in Advances in Neural Information Processing Systems 17. MIT Press, 2005, pp.
[6] R. E. Schapire and Y. Singer, "Boostexter: A boosting-based system for text categorization," Machine Learning, vol. 39, no. 2/3, pp.
[7] N. Ueda and K. Saito, "Parametric mixture models for multi-labeled text," in Advances in Neural Information Processing Systems 15. MIT Press, 2003, pp.
[8] K. Yu, S. Yu, and V. Tresp, "Multi-label informed latent semantic indexing," in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005, pp.
[9] M. Zhang and Z. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp.
[10] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown, "Learning multi-label scene classification," Pattern Recognition, vol. 37, no. 9, pp.
[11] Y. Luo, D. Tao, C. Xu, D. Li, and C. Xu, "Vector-valued multi-view semi-supervised learning for multi-label image classification," in Proceedings of the 27th AAAI Conference on Artificial Intelligence, 2013, pp.
[12] F. Sun, J. Tang, H. Li, G. Qi, and T. S. Huang, "Multi-label image categorization with sparse factor representation," IEEE Transactions on Image Processing, vol. 23, no. 3, pp. , March
[13] F. Kang, R. Jin, and R. Sukthankar, "Correlated label propagation with application to multi-label learning," in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp.
[14] G. Qi, H. Sheng, Y. Rui, J. Tang, T. Mei, and H. Zhang, "Correlative multi-label video annotation," in Proceedings of the 15th International Conference on Multimedia. ACM, 2007, pp.
[15] A. Elisseeff and J. Weston, "A kernel method for multi-labelled classification," in Advances in Neural Information Processing Systems, 2001, pp.
[16] X. Wang and G. Li, "Multilabel learning via random label selection for protein subcellular multilocations prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp.
[17] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in International Society for Music Information Retrieval, 2008, pp.
[18] A. Wieczorkowska, P. Synak, and Z. W. Raś, "Multi-label classification of emotions in music," in Intelligent Information Processing and Web Mining, 2006, pp.
[19] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Mining multi-label data," in Data Mining and Knowledge Discovery Handbook, pp.
[20] M. Zhang and Z. Zhou, "A review on multi-label learning algorithms," IEEE Transactions on Knowledge and Data Engineering, in press.
[21] J. Read, B. Pfahringer, G. Holmes, and E. Frank, "Classifier chains for multi-label classification," in Proceedings of the 20th European Conference on Machine Learning. Berlin, Heidelberg: Springer-Verlag, 2009, pp.
[22] K. Dembczyński, W. Cheng, and E. Hüllermeier, "Bayes optimal multilabel classification via probabilistic classifier chains," in International Conference on Machine Learning, Haifa, Israel, 2010, pp.
[23] M. Zhang, "Lift: Multi-label learning with label-specific features," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain, July 2011, pp.
[24] S. J. Huang and Z. H. Zhou, "Multi-label learning by exploiting label correlations locally," in Proceedings of the 26th AAAI Conference on Artificial Intelligence.
[25] K. Brinker and E. Hüllermeier, "Case-based multilabel ranking," in Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007, pp.
[26] X. Lin and X. Chen, "Mr.kNN: Soft relevance for multi-label classification," in Proceedings of the 19th ACM Conference on Information and Knowledge Management, Toronto, Ontario, Canada, October 2010, pp.
[27] E. Spyromitros, G. Tsoumakas, and I. Vlahavas, "An empirical study of lazy multilabel classification algorithms," in Proceedings of the 5th Hellenic Conference on Artificial Intelligence, 2008, pp.
[28] A. Veloso, W. Meira, M. Gonçalves, and M. Zaki, "Multi-label lazy associative classification," in Knowledge Discovery in Databases: PKDD 2007. Springer Berlin Heidelberg, 2007, pp.
[29] M. Zhang and Z. Zhou, "ML-kNN: A lazy learning approach to multi-label learning," Pattern Recognition, vol. 40, no. 7, pp.
[30] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Norwell, MA, USA: Kluwer Academic Publishers.
[31] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
[32] A. Clare and R. D. King, "Knowledge discovery in multi-label phenotype data," in Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, 2001, pp.
[33] S. Wang, Q. Huang, S. Jiang, Q. Tian, and L. Qin, "Nearest-neighbor method using multiple neighborhood similarities for social media data mining," Neurocomputing, vol. 95, no. 15, pp.
[34] T. Zhou, D. Tao, and X. Wu, "Compressed labeling on distilled labelsets for multi-label learning," Machine Learning, vol. 88, no. 1-2, pp.
[35] K. Dembczyński, A. Jachnik, W. Kotłowski, W. Waegeman, and E. Hüllermeier, "Optimizing the F-measure in multi-label classification: Plug-in rule approach versus structured loss minimization," in The 30th International Conference on Machine Learning, 2013, pp.
[36] R. Senge, J. J. del Coz, and E. Hüllermeier, "On the problem of error propagation in classifier chains for multi-label classification," in Conference of the German Classification Society, 2012.


More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Efficient Voting Prediction for Pairwise Multilabel Classification

Efficient Voting Prediction for Pairwise Multilabel Classification Efficient Voting Prediction for Pairwise Multilabel Classification Eneldo Loza Mencía, Sang-Hyeun Park and Johannes Fürnkranz TU-Darmstadt - Knowledge Engineering Group Hochschulstr. 10 - Darmstadt - Germany

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization

Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization Min-Ling Zhang and Zhi-Hua Zhou, Senior Member, IEEE Abstract

More information

Story Unit Segmentation with Friendly Acoustic Perception *

Story Unit Segmentation with Friendly Acoustic Perception * Story Unit Segmentation with Friendly Acoustic Perception * Longchuan Yan 1,3, Jun Du 2, Qingming Huang 3, and Shuqiang Jiang 1 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing,

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence 2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

An Ensemble-based Approach to Fast Classification of Multi-label Data Streams

An Ensemble-based Approach to Fast Classification of Multi-label Data Streams An Ensemble-based Approach to Fast Classification of Multi-label Data Streams Xiangnan Kong Department of Computer Science University of Illinois at Chicago Chicago, IL, USA Email: xkong4@uic.edu Philip

More information

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Ann. Data. Sci. (2015) 2(3):293 300 DOI 10.1007/s40745-015-0060-x Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Li-min Du 1,2 Yang Xu 1 Hua Zhu 1 Received: 30 November

More information

Classification with Class Overlapping: A Systematic Study

Classification with Class Overlapping: A Systematic Study Classification with Class Overlapping: A Systematic Study Haitao Xiong 1 Junjie Wu 1 Lu Liu 1 1 School of Economics and Management, Beihang University, Beijing 100191, China Abstract Class overlapping has

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

Motion analysis for broadcast tennis video considering mutual interaction of players

Motion analysis for broadcast tennis video considering mutual interaction of players 14-10 MVA2011 IAPR Conference on Machine Vision Applications, June 13-15, 2011, Nara, JAPAN analysis for broadcast tennis video considering mutual interaction of players Naoto Maruyama, Kazuhiro Fukui

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

Lazy multi-label learning algorithms based on mutuality strategies

Lazy multi-label learning algorithms based on mutuality strategies Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Artigos e Materiais de Revistas Científicas - ICMC/SCC 2015-12 Lazy multi-label

More information

Dynamic Ensemble Construction via Heuristic Optimization

Dynamic Ensemble Construction via Heuristic Optimization Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Graph Matching Iris Image Blocks with Local Binary Pattern

Graph Matching Iris Image Blocks with Local Binary Pattern Graph Matching Iris Image Blocs with Local Binary Pattern Zhenan Sun, Tieniu Tan, and Xianchao Qiu Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of

More information

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning Timothy Glennan, Christopher Leckie, Sarah M. Erfani Department of Computing and Information Systems,

More information

Tag Based Image Search by Social Re-ranking

Tag Based Image Search by Social Re-ranking Tag Based Image Search by Social Re-ranking Vilas Dilip Mane, Prof.Nilesh P. Sable Student, Department of Computer Engineering, Imperial College of Engineering & Research, Wagholi, Pune, Savitribai Phule

More information

A Bagging Method using Decision Trees in the Role of Base Classifiers

A Bagging Method using Decision Trees in the Role of Base Classifiers A Bagging Method using Decision Trees in the Role of Base Classifiers Kristína Machová 1, František Barčák 2, Peter Bednár 3 1 Department of Cybernetics and Artificial Intelligence, Technical University,

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

CIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task

CIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task CIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task Xiaolong Wang, Xiangying Jiang, Abhishek Kolagunda, Hagit Shatkay and Chandra Kambhamettu Department of Computer and Information

More information

Efficient Pairwise Classification

Efficient Pairwise Classification Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization

More information

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,

More information

AS DIGITAL cameras become more affordable and widespread,

AS DIGITAL cameras become more affordable and widespread, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008 407 Integrating Concept Ontology and Multitask Learning to Achieve More Effective Classifier Training for Multilevel Image Annotation Jianping

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

More information

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB Department of Computer Science & Engineering The Graduate School, Chung-Ang University CAU Artificial Intelligence LAB 1 / 17 Text data is exploding on internet because of the appearance of SNS, such as

More information