Multimedia Social Event Detection in Microblog

Yue Gao 1, Sicheng Zhao 2, Yang Yang 1, and Tat-Seng Chua 1
1 School of Computing, National University of Singapore, Singapore
2 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
{dcsgaoy,dcsyangy,dcscts}@nus.edu.sg, zsc@hit.edu.cn

Abstract. Event detection in social media platforms has become an important task. It facilitates the exploration and browsing of events and enables early planning of preventive measures. The main challenges in event detection lie in the characteristics of social media data, which are short and conversational, heterogeneous, and live. Most existing methods rely only on the textual information while ignoring the visual content as well as the intrinsic correlation among the heterogeneous social media data. In this paper, we propose an event detection method that generates an intermediate semantic entity, named microblog clique (MC), to explore the highly correlated information among the noisy and short microblogs. The heterogeneous social media data are formulated as a hypergraph, and the highly correlated microblogs are grouped to generate the MCs. Based on these MCs, a bipartite graph is constructed and partitioned to detect social events. The proposed method has been evaluated on the Brand-Social-Net dataset. Experimental results and comparison with state-of-the-art methods demonstrate the effectiveness of the proposed approach. Further evaluation shows that the use of the visual content can significantly improve the event detection performance.

Keywords: Event detection, Microblog clique, Live data, Multimedia.

1 Introduction

Social media platforms [11], such as Twitter, Facebook, and Sina Weibo, have become important real-time information resources and host huge amounts of user-contributed content (UGC). The rapid development of social media platforms has led to continuously increasing data, which plays an important role in information sharing and diffusion.
An example is the Super Bowl 2013, which attracted up to 24 million tweets in total, with tweets about the blackout alone exceeding 231k per minute. Users can employ these platforms to report real-life events, which may spread quickly and widely across the entire social network. This live information in social media streams requires effective techniques for organization and management. Events in social media refer to observable occurrences of people, places, times and activities [8]. As introduced in [16], an event can be regarded as a single episode of a large story, and event detection can benefit social media content analysis and enable powerful event browsing.

X. He et al. (Eds.): MMM 2015, Part I, LNCS 8935, pp , © Springer International Publishing Switzerland 2015

In recent years, event detection in social media platforms has attracted extensive research attention [14,15]. Most existing event detection methods consider only the textual context and the social connection while ignoring the visual context, which has been growing in importance in social media.

Detecting events in social media is challenging due to the following three characteristics. First, social media posts tend to be short and conversational in nature, so the contents and vocabularies used in these posts tend to change rapidly. Under this circumstance, a single post may not be adequate to reflect meaningful content, and the exploration of highly correlated posts on the same topic becomes an urgent requirement. Second, the content of social media posts has become increasingly heterogeneous and multimedia. A social post may contain not only text and images, but also a time-stamp, location, social connections, user preferences, and other metadata. Our recent investigation shows that about 30% of microblog posts now contain images, and this number is still increasing. Visual content is therefore becoming more important. Third, social media content comes in the form of social media streams. The amount of social media data is not only enormous but also growing every minute. These live data make it hard to detect new events and to handle increasingly large-scale data.

In this paper, we aim to detect events from social media posts. To address the problem of short and conversational posts, we propose to generate microblog cliques (MCs), each of which is a group of highly correlated microblogs. MCs help to enrich the content of a single post and tackle the data sparseness issue. To address the heterogeneity of microblog data, we propose to jointly employ the textual and visual content of microblogs for analysis, which can substantially explore the intrinsic correlation among the heterogeneous data.
Figure 1 presents the framework of the proposed event detection method.

Fig. 1. The framework of the proposed event detection method in microblog

The proposed event detection method can be briefly described as follows. Given a set of microblogs, we first construct a heterogeneous microblog hypergraph, in which the distance between two microblogs is measured by multiple facets, including textual content, visual content, location, time-stamp and user connection. The heterogeneous hypergraph is then partitioned into small sub-graphs, which are denoted as the MCs. Each MC comprises a group of highly correlated microblogs, such as near-duplicates or reposted microblogs. We summarize each MC by selecting several representative microblogs. These MCs are then used to construct an MC graph, and a K-way segmentation of the MC graph is conducted using the transfer cut to generate the K events for the given microblogs. In our method, the Bayesian information criterion is employed to select the optimal event number. The proposed event detection method has been evaluated on the Brand-Social-Net dataset [6], which includes 20 saga events (saga stories) of different types. Experimental results and comparison with state-of-the-art methods demonstrate the effectiveness of the proposed method.

The remainder of this paper is organized as follows. Section 2 reviews related work on event detection in social media platforms. The detailed algorithms, including MC generation and event detection, are elaborated in Section 3 and Section 4, respectively. Section 5 presents the experimental results, followed by conclusions and discussions of further work in Section 6.

2 Related Work

In this section, we briefly review the related work on event detection in social media platforms. Given new incoming data, the similarity between the new data and each existing event is computed first, and the event with the maximal similarity is selected. When all the similarities are below a predefined threshold, the new data is considered a new event. A modified TF/IDF and a time-based threshold are employed in [2] to measure the relevance between events and documents, in which an auxiliary dataset is used to estimate the IDF because future documents are unknown. An incremental IDF is introduced in [20], which considers a time window and a decay factor to measure the similarity between documents and events. Fung et al. [5] modeled word appearance as a binomial distribution, and word bursts are identified by a heuristic with thresholds. The frequency domain of textual content has also been investigated. Wavelet-based signal processing has been introduced in [18] to detect events, in which the cross correlation between word appearances is measured by using a wavelet-based feature. Reuter and Cimiano [12] proposed an event classification method to deal with incremental data in social media streams. In their method, a candidate retrieval step is first performed to gather related events by using the capture time, upload time, geographic location, tags, titles and the description.
Then the similarities between the document and each of the top retrieved events are measured based on nine features, including temporal, geographical and textual information. The probabilities of the document belonging to each event or to a new event can then be computed by a trained Support Vector Machine. A threshold is empirically selected using a gradient descent method on a split of the training data. Becker et al. [4] learn similarity metrics to identify events in social media streams, in which the event identification task is formulated as a clustering problem. In this method, each event is denoted by a document cluster, and the scalable clustering is evaluated using normalized mutual information and B-cubed [3]. Considering the different information in social media documents, such as the textual feature and the location data, different similarities are combined in an ensemble-clustering procedure. To classify new data into existing events, a group of training samples is first selected from labeled data, and logistic regression and SVM are employed as classifiers; the resulting methods, CLASS-LR and CLASS-SVM, show the best performance in the experimental results. Most of the existing methods are based on the textual content associated with the time-stamp. With the increasing amount of multimedia content in social media streams, such as images and videos, it is important to further explore the role of visual context in microblogs for event analysis.
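The threshold-based assignment scheme described at the start of this section (pick the most similar existing event, or open a new event when every similarity falls below a threshold) can be sketched as follows. This is a minimal illustration rather than any of the cited methods: the cosine similarity, the use of a running sum as the event representation, and the function names are assumptions made for the example.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_incrementally(docs, threshold):
    """Assign each incoming document to an existing event or to a new one.

    docs: iterable of feature vectors (e.g. TF-IDF), processed in arrival order.
    Returns one event index per document.
    """
    event_sums = []   # assumed event representation: sum of member vectors
    labels = []
    for vec in docs:
        vec = np.asarray(vec, dtype=float)
        sims = [cosine(vec, s) for s in event_sums]
        if sims and max(sims) >= threshold:
            best = int(np.argmax(sims))       # most similar existing event
            event_sums[best] = event_sums[best] + vec
            labels.append(best)
        else:
            event_sums.append(vec.copy())     # all similarities too low: new event
            labels.append(len(event_sums) - 1)
    return labels
```

For instance, `assign_incrementally([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], threshold=0.8)` groups the first two and the last two vectors into separate events.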

3 Microblog Clique Generation

Most social media posts are short and conversational. It is therefore difficult to extract useful information from the limited and noisy content of a single microblog. On the other hand, most microblogs do not stand alone, owing to the conversational nature of social media. For instance, the highly correlated reposts and/or comments can be exploited as a valuable resource for enriching the original microblog post. Under this circumstance, we propose to generate a middle-level object, termed microblog clique (MC), to represent the grouped microblogs. Here an MC is a set of highly correlated microblogs, which are all related to the same topic within a short time window. Each MC combines several relevant microblogs and is therefore more informative than any one of them. In this way, an MC, rather than a single microblog, serves as the basic unit of analysis. Figure 2 illustrates the workflow of the proposed MC generation method. To formulate the relationship among microblogs, a heterogeneous microblog hypergraph is constructed, and a hierarchical bi-partition of the hypergraph is conducted under the constraint of the Bayesian information criterion.

Fig. 2. The framework for microblog clique generation

3.1 Microblog Hypergraph Construction

Considering the multi-modal data in microblogs, such as the textual information, visual information, social connection, and location, a hypergraph is a good structure to formulate the microblog relationship. A hypergraph [22] is able to handle heterogeneous data and has been extensively employed in many data mining and information retrieval tasks [9,7] due to its superiority in high-order relationship modeling. Given a group of microblogs $M = \{m_1, m_2, \ldots, m_n\}$, a microblog hypergraph $G_H = \{V, E, W\}$ is constructed. In $G_H$, each vertex denotes one microblog, and there are $n$ vertices in total.
To generate the edges $E$ linking different vertices, the heterogeneous data of the microblogs are employed to measure the distance between each pair of microblogs.

Textual information: The textual information of each microblog is described by TF-IDF, and the cosine similarity is employed to measure the pairwise textual distance between microblogs.

Visual content: Given two microblogs with images, the visual content distance can be measured. Here a spatial pyramid image feature [19] is extracted for each image, which is highly discriminative on spatial layout and local information. The dense SIFT feature is extracted for each image and a visual dictionary of size 1,024 is learnt. The spatial pyramid structure includes three levels, i.e., $1 \times 1$, $2 \times 2$ and $4 \times 4$, which gives 21 cells, and a 21,504-D feature ($21 \times 1{,}024$) is generated for each image. A 200-D feature is further extracted by using PCA as the visual feature for each image.

Location: The geographical similarity between two microblogs (if available) is measured by using the haversine formula [13].

Social connection: When two microblogs share the same owner, or the two corresponding owners are connected in the social media platform, for example by a follower/followee relation, these microblogs are close in social space. We measure the social similarity between two microblogs by:

\[
s_s(m_i, m_j) =
\begin{cases}
1, & u_i = u_j; \\
0.5, & u_i \sim u_j; \\
0, & \text{otherwise,}
\end{cases}
\tag{1}
\]

where $u_i$ and $u_j$ are the owners of $m_i$ and $m_j$, and $u_i \sim u_j$ indicates that $u_i$ and $u_j$ are connected in the social media platform.

Temporal information: When two microblogs are posted within a short time gap, there is a high probability that the two microblogs are related. Here the temporal similarity between two microblogs $m_i$ and $m_j$ is measured by:

\[
s_t(m_i, m_j) = 1 - \frac{|t_i - t_j|}{\tau},
\tag{2}
\]

where $t_i$ and $t_j$ are the time-stamps of $m_i$ and $m_j$ respectively, and $\tau$ is a normalization factor.

Each microblog $m_i$ is regarded as a centroid, and its top $N$ nearest microblogs are selected to generate an edge, based on the textual information and on the visual content, respectively. For the location information and the temporal information, each microblog $m_i$ is connected with its neighbors within a geo-distance threshold and a time-distance threshold, respectively. For the social connection, each user generates one edge, which connects all the microblogs of this user and of the related users. Figure 3 illustrates the hypergraph construction procedure.

The incidence matrix $H$ of the microblog hypergraph $G_H$ is generated by:

\[
H(v, e) =
\begin{cases}
1 & \text{if } v \in e \\
0 & \text{if } v \notin e.
\end{cases}
\tag{3}
\]

The vertex degree of a vertex $v \in V$ is defined by:

\[
d(v) = \sum_{e \in E} w(e) H(v, e),
\tag{4}
\]

and the edge degree of an edge $e \in E$ is defined by:

\[
\delta(e) = \sum_{v \in V} H(v, e).
\tag{5}
\]
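The construction above can be illustrated with a small NumPy sketch covering the social and temporal similarities of Eqs. (1) and (2), nearest-neighbour hyperedges, and the incidence matrix and degrees of Eqs. (3) to (5). It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, each nearest-neighbour hyperedge is assumed to contain the centroid itself, and the user connections are passed in as a plain set of pairs.

```python
import numpy as np

def social_similarity(u_i, u_j, connected):
    """Eq. (1): 1 for the same owner, 0.5 for connected owners, 0 otherwise."""
    if u_i == u_j:
        return 1.0
    if (u_i, u_j) in connected or (u_j, u_i) in connected:
        return 0.5
    return 0.0

def temporal_similarity(t_i, t_j, tau):
    """Eq. (2): similarity decays linearly with the time gap."""
    return 1.0 - abs(t_i - t_j) / tau

def knn_hyperedges(dist, n_neighbors):
    """One hyperedge per vertex: the centroid plus its N nearest vertices."""
    edges = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i], kind="stable")
        nearest = [int(j) for j in order if j != i][:n_neighbors]
        edges.append(sorted([i] + nearest))
    return edges

def incidence_matrix(n_vertices, edges):
    """Eq. (3): H[v, e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = np.zeros((n_vertices, len(edges)))
    for e, members in enumerate(edges):
        H[members, e] = 1.0
    return H

def degrees(H, w):
    """Eqs. (4) and (5): weighted vertex degrees and edge degrees."""
    return H @ w, H.sum(axis=0)
```

In practice one such edge set would be built per modality (text, visual content, location, time, social connection) and the resulting incidence matrices concatenated column-wise.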

Fig. 3. Illustration for microblog hypergraph construction

3.2 Microblog Hypergraph Segmentation

Given the microblog hypergraph, we aim to generate the MCs, which are groups of microblogs on the same topic. We partition the hypergraph using the hypergraph cut approach, which has been widely investigated in recent years. Let $S$ and $\bar{S}$ denote a two-way partition of $G_H$. The hypergraph cut can be defined as

\[
\mathrm{Cut}_H(S, \bar{S}) := \sum_{e \in \partial S} w(e) \frac{|e \cap S| \, |e \cap \bar{S}|}{\delta(e)},
\tag{6}
\]

where $\delta(e)$ is the edge degree of Eq. (5) and $\partial S$ is the hyperedge boundary, defined by $\partial S := \{ e \in E \mid e \cap S \neq \emptyset, \; e \cap \bar{S} \neq \emptyset \}$. The two-way normalized hypergraph partition can be defined as:

\[
\mathrm{NCut}_H(S, \bar{S}) := \mathrm{Cut}_H(S, \bar{S}) \left( \frac{1}{\mathrm{vol}(S)} + \frac{1}{\mathrm{vol}(\bar{S})} \right),
\tag{7}
\]

where $\mathrm{vol}(S) = \sum_{v \in S} d(v)$ and $\mathrm{vol}(\bar{S}) = \sum_{v \in \bar{S}} d(v)$ are the volumes of $S$ and $\bar{S}$ respectively. Following [22], the normalized cut can be relaxed to a real-valued optimization task, which can be solved by using the eigenvector associated with the smallest non-zero eigenvalue of the hypergraph Laplacian, i.e., $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$, where $D_v$ and $D_e$ are the diagonal matrices of vertex degrees and edge degrees.

In this way, the input microblogs $M = \{m_1, m_2, \ldots, m_n\}$ are partitioned into two parts, and the two-way partition is then conducted again within each new part. This procedure continues until the optimal results are achieved. Here we employ the Bayesian information criterion (BIC) [17] to evaluate the partition results, which is used to determine whether to accept a two-way partition or not.
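A single two-way split from the relaxed normalized cut can be sketched as below. The sketch builds the hypergraph Laplacian from the incidence matrix and edge weights, takes the eigenvector of the smallest non-zero eigenvalue, and splits the vertices by its sign. Thresholding at zero and the tolerance used to decide which eigenvalues count as zero are assumptions of this example, as are the function names; the sketch also assumes a connected hypergraph in which every vertex belongs to at least one edge.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    d_v = H @ w                      # vertex degrees, Eq. (4)
    delta_e = H.sum(axis=0)          # edge degrees, Eq. (5)
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    # Scale the columns of H by w(e) / delta(e), then normalise both sides.
    theta = (H * (w / delta_e)) @ H.T
    theta = dv_inv_sqrt[:, None] * theta * dv_inv_sqrt[None, :]
    return np.eye(H.shape[0]) - theta

def bipartition(H, w, zero_tol=1e-9):
    """Boolean mask giving one side of the relaxed normalized cut."""
    eigvals, eigvecs = np.linalg.eigh(hypergraph_laplacian(H, w))
    # eigh returns eigenvalues in ascending order; skip the (near-)zero ones.
    idx = next(i for i, val in enumerate(eigvals) if val > zero_tol)
    return eigvecs[:, idx] >= 0.0
```

Applying `bipartition` recursively to each side, and keeping a split only when it raises the BIC, gives the hierarchical bi-partition described in the text.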

3.3 Bayesian Information Criterion

To identify an optimal partition, we need to measure how well different partition results represent the data. Here the Bayesian information criterion (BIC) [17] is employed to evaluate the representation ability of a model, i.e., the selected representative microblogs from each partition. Given a group of partitions $P = \{P_1, P_2, \ldots, P_m\}$ for the data $M = \{m_1, m_2, \ldots, m_n\}$, the BIC value is calculated by:

\[
\mathrm{BIC} = \delta(M) - \frac{N_p}{2} \log n,
\tag{8}
\]

where $N_p$ is the number of parameters, which can be regarded as the feature dimension of the microblog description, $\delta(M)$ is the log-likelihood of the microblogs for the partition $P$ at the maximum likelihood point, and $n$ is the number of microblogs being processed. In our experiments, the maximum likelihood estimate of the variance is calculated by:

\[
\theta^2 = \frac{1}{n - m} \sum_i d(m_i, c_{m_i})^2,
\tag{9}
\]

where $d(m_i, c_{m_i})$ is the distance between $m_i$ and the corresponding representative microblog $c_{m_i}$. The log-likelihood of the microblog data can be measured by:

\[
\delta(M) = \sum_i \left( \log \frac{1}{\sqrt{2\pi}\, \theta^{N_p}} - \frac{1}{2\theta^2} d(m_i, c_{m_i})^2 + \log \frac{n_i}{n} \right),
\tag{10}
\]

where $n_i$ is the number of microblogs in the partition that contains $m_i$.

Here we take the bi-partition as an example. The partitions to be compared are $\{P_0\}$ and $\{P_1, P_2\}$, where $P_0 = P_1 + P_2$ indicates that $P_1$ and $P_2$ are the partition results of $P_0$. The BIC values for these two partition results are calculated, and the partition result with the higher BIC value is employed as the final result.

3.4 The MC Representation

With the microblog hypergraph partition results, all the input microblogs are divided into a group of clusters. These sets of microblogs are regarded as the MCs, which are used in the subsequent event detection procedure. For each MC, we use the combination of the textual content and visual content of all its microblogs to represent the MC.
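The BIC test of Section 3.3 can be made concrete with the following sketch, which evaluates Eqs. (8) to (10) for a labeled partition and compares a candidate split against the unsplit set. Several choices here are assumptions rather than details given in the text: the representative $c_{m_i}$ of each partition is taken to be its mean vector, $d(\cdot,\cdot)$ is the Euclidean distance, the variance estimate divides by $n - m$, and the function names are hypothetical. The sketch also assumes the points are not all identical, so that the variance is non-zero.

```python
import numpy as np

def bic_score(points, labels):
    """BIC of a partition: log-likelihood minus (N_p / 2) * log n."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n, n_p = points.shape            # n microblogs, N_p feature dimensions
    parts = np.unique(labels)
    m = len(parts)                   # number of partitions

    sq_dist = np.zeros(n)            # d(m_i, c_{m_i})^2 for every microblog
    part_size = np.zeros(n)          # n_i: size of the partition holding m_i
    for part in parts:
        mask = labels == part
        centre = points[mask].mean(axis=0)   # assumed representative
        sq_dist[mask] = ((points[mask] - centre) ** 2).sum(axis=1)
        part_size[mask] = mask.sum()

    theta_sq = sq_dist.sum() / (n - m)       # variance estimate, Eq. (9)
    log_lik = np.sum(                        # log-likelihood, Eq. (10)
        -0.5 * np.log(2.0 * np.pi)
        - 0.5 * n_p * np.log(theta_sq)
        - sq_dist / (2.0 * theta_sq)
        + np.log(part_size / n)
    )
    return log_lik - 0.5 * n_p * np.log(n)   # Eq. (8)

def accept_split(points, split_labels):
    """Accept a bi-partition only if it has a higher BIC than no split."""
    unsplit = np.zeros(len(points), dtype=int)
    return bic_score(points, split_labels) > bic_score(points, unsplit)
```

With two well-separated groups of points the split is accepted, whereas cutting a single compact group in half lowers the BIC and the split is rejected.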
After duplicate textual and visual content is removed, the enriched textual content and the combined images are employed for MC description. In comparison with the use of a single microblog, an MC provides enriched information from a set of highly relevant microblogs, which can generate more meaningful content for microblog analysis.

4 Event Detection with MC

The generated MCs can be regarded as intermediate-level semantic entities for the event detection task. The task is formulated to explore the relations and infer events between different MCs, $MC = \{MC_1, MC_2, \ldots, MC_p\}$, and the corresponding microblogs $M = \{m_1, m_2, \ldots, m_n\}$. The objective here is to further partition the MCs and the microblogs into event clusters. We formulate the MCs and the corresponding microblogs in a bipartite graph $G_B = \{X, Y, B\}$, where the vertex set $X = MC \cup M$, the vertex set $Y = MC$, and $B$ is the across-affinity matrix linking $X$ and $Y$. $B$ is defined as follows:

\[
B_{ij} =
\begin{cases}
\eta & \text{if } x_i \in M \text{ and } x_i \in y_j \\
e^{-\gamma d_{ij}} & \text{if } x_i \in MC \text{ and } y_j \in MC \\
0 & \text{otherwise,}
\end{cases}
\tag{11}
\]

where $\eta$ and $\gamma$ are two parameters to balance the inner-MC correlation and the between-MC smoothness, and $d_{ij}$ is the distance between two MCs, which can be calculated by using the combination of the textual distance and, if images are available, the visual distance.

To partition the MCs, the transfer cut method [10] is employed, which can be summarized as follows. Given the bipartite graph $G_B$ and the required number of partitions $K$, we first generate $D_X = \mathrm{diag}(B\mathbf{1})$ and $D_Y = \mathrm{diag}(B^T\mathbf{1})$. As $X$ is much larger than $Y$, we first focus on the smaller graph $G_{BY} = \{Y, W_Y\}$, which contains only the MC vertices, with $W_Y = B^T D_X^{-1} B$. The graph Laplacian of $G_{BY}$ can be calculated by $L_Y = D_Y - W_Y$, and the $K$ bottom eigenpairs $\{\lambda_i, \mathbf{v}_i\}_{i=1}^{K}$ of $G_{BY}$ can be obtained. As proved in [10], the bottom $K$ eigenpairs $\{\xi_i, \mathbf{f}_i\}_{i=1}^{K}$ of $G_B$ can be calculated by:

\[
\begin{aligned}
& 0 \le \xi_i < 1, \qquad \xi_i (2 - \xi_i) = \lambda_i, \\
& \mathbf{u}_i = \frac{1}{1 - \xi_i} D_X^{-1} B \mathbf{v}_i, \\
& \mathbf{f}_i = \left( \mathbf{u}_i^T, \mathbf{v}_i^T \right)^T.
\end{aligned}
\tag{12}
\]

Then $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_K\}$ can be used for spectral clustering [21] on the bipartite graph $G_B$, and $K$ microblog clusters can be obtained. Due to the noise in microblogs, some small clusters are formed; these are disregarded as noise, and only those clusters with more than 2% of the microblogs are selected as the detected events. It is nontrivial to select an optimal $K$ value. Here we further employ BIC to evaluate the selection of $K$.
We assume that the number of existing events is $K_0$, which is initialized as 0 at the beginning, and that the largest number of events with new incoming data is no more than $K_0 + n_{new}/t_m$, where $n_{new}$ is the number of new microblogs and $t_m$ is a threshold on the minimal number of microblogs required for an event, which is set to 50 in our experiments. The bipartite graph is partitioned $n_{new}/t_m + 1$ times, and the partition result with the highest BIC value is selected as the event detection output.

Here we assume that $\{\Gamma_1, \Gamma_2, \ldots, \Gamma_K\}$ are the $K$ events detected in the last procedure. The description of each $\Gamma_i$ is based on its MCs, where an MC selection is conducted to find the key MCs for the event. Here the weight of each MC is measured by its importance, such as the number of microblogs, reposts, and comments. The top $n_s$ MCs are then selected, where $n_s$ is set to 3 in our experiments.
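The bipartite formulation of this section can be sketched in NumPy as follows: build the across-affinity matrix of Eq. (11), solve the small eigenproblem on the MC side, lift the eigenvectors to the full vertex set with Eq. (12), and discard clusters holding no more than 2% of the microblogs. The sketch rests on several assumptions that are not stated in the text: the rows of $B$ list the MCs first and the microblogs second, the eigenproblem is taken to be the generalized one $L_Y \mathbf{v} = \lambda D_Y \mathbf{v}$ and is solved through a symmetric normalization, and all function names are hypothetical. The final clustering of the embedding rows (for example with k-means) is left out.

```python
import numpy as np

def build_affinity(membership, mc_dist, eta=1.0, gamma=1.0):
    """Eq. (11). membership[i] is the index of the MC containing microblog i;
    mc_dist is the p x p distance matrix between MCs."""
    p, n = mc_dist.shape[0], len(membership)
    B = np.zeros((p + n, p))                 # rows: p MCs, then n microblogs
    B[:p, :] = np.exp(-gamma * mc_dist)      # MC-to-MC smoothness
    for i, j in enumerate(membership):
        B[p + i, j] = eta                    # microblog-to-its-MC link
    return B

def transfer_cut_embedding(B, K):
    """Bottom-K eigenvectors f_i of the bipartite graph via Eq. (12)."""
    d_x, d_y = B.sum(axis=1), B.sum(axis=0)  # D_X = diag(B1), D_Y = diag(B^T 1)
    P = B / d_x[:, None]                     # D_X^{-1} B
    L_y = np.diag(d_y) - B.T @ P             # L_Y = D_Y - W_Y
    s = 1.0 / np.sqrt(d_y)
    lam, vecs = np.linalg.eigh(s[:, None] * L_y * s[None, :])
    lam = np.clip(lam[:K], 0.0, 1.0 - 1e-12) # guard against round-off
    V = s[:, None] * vecs[:, :K]             # generalized eigenvectors v_i
    xi = 1.0 - np.sqrt(1.0 - lam)            # solves xi * (2 - xi) = lambda
    U = (P @ V) / (1.0 - xi)                 # u_i = D_X^{-1} B v_i / (1 - xi_i)
    return np.vstack([U, V]), xi             # rows of f_i: X first, then Y

def keep_large_clusters(labels, min_fraction=0.02):
    """Cluster ids that hold more than min_fraction of all microblogs."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    return [int(c) for c, k in zip(ids, counts) if k > min_fraction * labels.size]
```

Working on the $p \times p$ matrix $L_Y$ instead of the full bipartite graph is the point of the transfer cut: the eigen-decomposition cost depends on the number of MCs rather than on the number of microblogs.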

5 Experiments

5.1 Experimental Settings

The Testing Dataset. In the experiments, we employ the Brand-Social-Net dataset [6]. This dataset consists of 3 million microblogs with 1.3 million images collected from Sina Weibo in June and July 2012. Each microblog contains the text description, the image if available, the owner information, the posting time, the geo-location and the user connections on Sina Weibo. The number of users is 1 million. This dataset contains 20 saga events, such as Windows 8 Preview, Chongqing Auto Expo, and Honda Elysion. These saga events happened during June and July 2012, and the number of relevant microblogs for each saga event ranges from hundreds to thousands. Given the microblogs of each saga event, event detection is conducted to explore the sub-events within the saga event.

Compared Methods. To evaluate the proposed event detection method, the following methods are employed for comparison.

The Candidate-Ranking method [12] (CR). The Candidate-Ranking method first retrieves several promising events, and the probability of the incoming document belonging to each of these events or to a new event is measured by an SVM classifier.

The Candidate-Ranking method [12] with visual content (CR+V). We further implement the Candidate-Ranking method by incorporating visual content analysis.

The CLASS-SVM method [4] (CS). CS is an incremental clustering method which employs SVM as the classifier to identify whether a new document belongs to an existing event or a new event.

The CLASS-SVM method [4] with visual content (CS+V). We further implement the CLASS-SVM method with visual content analysis.

The proposed method, denoted by Proposed.

The proposed method without visual content (Proposed-V). In this method, the visual content of microblogs is not taken into consideration.

The proposed method without MC (Proposed-MC). In this method, the MC generation process is removed.

Evaluation Criteria.
Event detection is conducted on all related data. To evaluate the event detection performance, two types of ground truth are manually annotated: the summarized ground truth, consisting of tens of microblogs that reflect most of the main content in the data, and the top-ranking content, consisting of the 10 microblogs with the most important content in the data. Three students manually selected the summarized ground truth and the top-ranking content from all related microblogs.

We adopt the following performance evaluation measures. Recall measures the data coverage of the generated events, Precision evaluates the event detection accuracy, and F-Measure is a joint measure of Precision and Recall. Average normalized modified retrieval rank (ANMRR) [1] is a rank-based measure, which considers the ranking information of microblogs. In our experiments, the selected top-ranking microblogs are regarded as the positive samples, and ANMRR is used to evaluate the ranking results. A lower ANMRR value indicates better performance, i.e., the important microblogs are ranked at the top positions.

5.2 Comparison with the State-of-the-Art

We first compare the proposed method with the state-of-the-art methods, i.e., CR [12] and CS [4]. The average performance of the different methods on event detection is presented in Figure 4.

Fig. 4. The performance comparison on static event detection among different methods: (a) Recall, (b) Precision, (c) F, (d) ANMRR

From Figure 4, we observe that the proposed method achieves better results than CR and CS in the event detection task. The proposed method achieves an improvement of 40.9%, 46.1%, 43.4%, and 19.4% in terms of Recall, Precision, F, and ANMRR, respectively, as compared to CR, and an improvement of 50.7%, 55.5%, 53.0%, and 20.9% as compared to CS. These results demonstrate the effectiveness of the proposed method on event detection. The better results benefit from the proposed intermediate semantic level, i.e., the MCs, which jointly explore the highly related microblogs to address the inadequate information issue. In our method, the visual content is exploited in both the MC generation and the event detection procedures, which also contributes substantially to the event detection performance. In the next two subsections, we elaborate on the effects of visual content and MC on event detection.

5.3 On Visual Content

We evaluate the influence of visual content on event detection by comparing the performance of CR, CS and the Proposed method with and without visual content. Figure 5 presents the experimental results, which show that the use of visual content can significantly improve the event detection performance.
In comparison with the textual content, the visual content has shown its superiority for information spreading in social media platforms. The visual content in microblogs has been found to enrich the short and conversational textual data.
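The Recall, Precision and F-Measure values reported in these comparisons can be computed as in the following minimal sketch. It assumes the standard set-based definitions against the annotated ground truth and the balanced (harmonic mean) form of the F-Measure, neither of which is spelled out in the text; the function name is hypothetical and microblogs are represented by their identifiers.

```python
def precision_recall_f(detected, ground_truth):
    """Set-based Precision, Recall and balanced F-Measure.

    detected: ids of microblogs returned for the detected events.
    ground_truth: ids of the manually annotated microblogs.
    """
    detected, ground_truth = set(detected), set(ground_truth)
    hits = len(detected & ground_truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    if hits == 0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, if four microblogs are returned and two of them appear among six annotated ones, Precision is 0.5, Recall is 1/3 and the F-Measure is 0.4.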

Fig. 5. The performance comparison with and without visual content on static event detection: (a) Recall, (b) Precision, (c) F, (d) ANMRR

5.4 On MC Performance

As an intermediate concept-level representation, MC aims to enrich the microblog information from a small group of highly correlated microblogs. Here we compare the performance of the proposed method with Proposed-MC, which removes the MC generation step from event detection. Experimental results are shown in Figure 6, which indicates that the event detection performance degrades without MC. The use of MC achieves an improvement of 18.2%, 23.2%, 20.6%, and 14.8% in terms of Recall, Precision, F, and ANMRR, respectively. These results indicate that the proposed MC is effective and essential for the event detection task.

The advantage of MC comes from its intermediate concept-level representation, which goes beyond what can be conveyed in a single microblog. An MC is composed of a group of highly correlated microblogs, which may share similar textual content, similar visual content, close geographical locations and connected owners, such as reposted microblogs and the corresponding comments. These microblogs reinforce each other with continuing data, such as the new content from the reposts and/or the comments, which addresses the information sparseness issue in microblogs.

Fig. 6. The performance comparison with and without MC on static event detection

6 Conclusion

In this paper, we proposed an event detection method for microblogs. An intermediate concept level, i.e., the microblog clique, is introduced to explore the highly correlated microblogs and enrich the event representation. To tackle the heterogeneous data in microblogs, the microblogs are formulated in a hypergraph structure and hierarchical bi-partition is conducted to generate MCs.
A bipartite graph is then constructed using the MCs and the corresponding microblogs, and the bipartite graph partition is performed to detect events. The proposed method has been evaluated on the Brand-Social-Net dataset.

From the experimental results and comparisons with the state-of-the-art methods, we can draw the following conclusions.

The proposed event detection method outperforms the existing state-of-the-art methods on all evaluation criteria, which demonstrates the superiority of the proposed method.

The evaluation of the proposed intermediate concept level, i.e., MC, confirms that MC is able to explore richer information from highly correlated microblogs, which leads to better event detection performance.

The evaluation of the visual content shows that it also improves the event detection performance.

Several difficult problems remain for event detection in social media platforms. First, most existing methods directly combine the multi-modal data in social media posts, such as the textual and visual content. These heterogeneous data may be highly correlated, and how to jointly investigate the multi-modal data in microblogs requires further attention. Second, the social network can reveal important latent information about social events behind the microblogs and the users, which is another future research topic.

Acknowledgements. This research is supported by the Singapore National Research Foundation under its International Research Singapore Funding Initiative and administered by the IDM Programme Office.

References

1. Description of core experiments for MPEG-7 color/texture descriptors. In: Standard ISO/MPEGJTC1/SC29/WG11 MPEG98/M2819 (1999)
2. Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: ACM SIGIR (1998)
3. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 12(4)
4. Becker, H., Naaman, M., Gravano, L.: Learning similarity metrics for event identification in social media. In: WSDM, pp (2010)
5. Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp (2005)
6. Gao, Y., Wang, F., Luan, H., Chua, T.-S.: Brand data gathering from social media streams. In: Proceedings of ACM Conference on Multimedia Retrieval (2014)
7. Gao, Y., Wang, M., Zha, Z.-J., Shen, J., Li, X., Wu, X.: Visual-textual joint relevance learning for tag-based social image search. IEEE Transactions on Image Processing 22(1), (2013)
8. Hearst, M.: Search user interfaces. Cambridge University Press (2009)
9. Huang, Y., Liu, Q., Zhang, S., Metaxas, D.: Image retrieval via probabilistic hypergraph ranking. In: CVPR (2010)
10. Li, Z., Wu, X.M., Chang, S.F.: Segmentation using superpixels: A bipartite graph partitioning approach. In: CVPR, pp (2012)
11. Naveed, N., Gottron, T., Kunegis, J., Alhadi, A.C.: Searching microblogs: coping with sparsity and document quality. In: Proceedings of CIKM, pp (2011)
12. Reuter, T., Cimiano, P.: Event-based classification of social media streams. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval (2012)
13. Reuter, T., Cimiano, P., Drumond, L., Buza, K., Schmidt-Thieme, L.: Scalable event-based clustering of social media via record linkage techniques. In: ICWSM (2011)
14. Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from twitter. In: KDD, pp ACM (2012)
15. Rozenshtein, P., Anagnostopoulos, A., Gionis, A., Tatti, N.: Event detection in activity networks. In: KDD, pp ACM (2014)
16. Sayyadi, H., Hurst, M., Maykov, A.: Event detection and tracking in social streams. In: WSDM (2009)
17. Schwarz, G.: Estimating the dimension of a model. Ann. Statist. 6, (1978)
18. Weng, J.S., Lee, B.S.: Event detection in twitter. In: ICWSM (2011)
19. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR, pp (2009)
20. Yang, Y., Pierce, T., Carbonell, J.G.: A study on retrospective and on-line event detection. In: ACM SIGIR (1998)
21. Yang, Y., Yang, Y., Shen, H.T., Zhang, Y., Du, X., Zhou, X.: Discriminative nonnegative spectral clustering with out-of-sample extension. IEEE Transactions on Knowledge and Data Engineering 25(8), (2013)
22. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: Clustering, classification, and embedding. In: NIPS (2007)


More information

STREAMING RANKING BASED RECOMMENDER SYSTEMS

STREAMING RANKING BASED RECOMMENDER SYSTEMS STREAMING RANKING BASED RECOMMENDER SYSTEMS Weiqing Wang, Hongzhi Yin, Zi Huang, Qinyong Wang, Xingzhong Du, Quoc Viet Hung Nguyen University of Queensland, Australia & Griffith University, Australia July

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

Behavioral Data Mining. Lecture 18 Clustering

Behavioral Data Mining. Lecture 18 Clustering Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i

More information

Hierarchical Document Clustering

Hierarchical Document Clustering Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters

More information