HN-Sim: A Structural Similarity Measure over Object-Behavior Networks

Jiazhen Nian, Shanshan Wang, and Yan Zhang
Department of Machine Intelligence, Peking University
Key Laboratory on Machine Perception, Ministry of Education
Beijing, P.R. China
{nian,shanshanwang}@pku.edu.cn, zhy@cis.pku.edu.cn

Abstract. Similarity measurement is a critical task for many applications such as text analysis, link prediction and recommendation. However, existing work focuses on content and rarely exploits structural features, and even fewer methods are applicable to heterogeneous networks, which are prevalent in the real world (bibliographic information networks, for example). To address this problem, we propose a new similarity measure from the perspective of heterogeneous structure. Heterogeneous neighborhoods are used to capture topological features and to categorize the related nodes in the graph model. We compare our measure with several traditional measures on real data from DBLP and Flickr. Manual evaluation shows that our method outperforms the traditional ones.

Keywords: Structural similarity measurement, Heterogeneous network, Object-behavior network.

1 Introduction

Similarity measurement is a fundamental problem underlying many practical information retrieval tasks, such as relevance search [2], clustering, and even ontology generation and integration [9]. It evaluates how similar objects in a relational network are. For relevance search, content-based models such as bag-of-words are commonly used: semantic information is learned from the words, and textual relations are extracted by comparing contents [3]. Other evidence can also characterize similarity. For instance, when we measure the similarity between two authors in a bibliographic information network, their co-authorship and conference-participation behaviors can be extracted to characterize how similar they are. This goes beyond linguistic analysis, yet it still works effectively.
Though these state-of-the-art methods perform well, other features can be borrowed to further improve similarity measurement. Take the popularity of social networks as an example: behavioral characteristics and social relationships often reflect the similarity between users. To study the behavior of objects, a link-based graph model, or object-behavior network, is usually constructed, in which behavioral objects are the nodes and particular behaviors or relationships become the links between them. The analysis of object behavior is thus translated into the analysis of this object-behavior network, and new similarity measures can be built on structural analysis of such networks. In graph models, homogeneous structural features such as common neighbors have been studied extensively, for example for link prediction [7]. However, in many real networks, objects and relations are of various types; such networks are called heterogeneous information networks. Conventional methods do not work well in this setting, so researchers have proposed new similarity measures such as PathSim [11] and HeteSim [10].

In this paper, we tackle the structural similarity measurement problem at a general level. Every object in the network is categorized according to its type and attributes. An object-behavior network is then built in which nodes are linked by heterogeneous edges, even when some of them are not directly connected. We call nodes connected in this way heterogeneous neighbors. For a particular object, all of its heterogeneous neighbors that are connected through the same relationship constitute a heterogeneous neighborhood, and the similarity measure is built on these heterogeneous neighborhoods. The method is also efficient and cost-saving because it takes only a very small part of the data into account. We conduct experiments on the DBLP dataset, calculating the similarity between authors. In addition, a manual evaluation on Flickr compares our method with conventional measures.

* Corresponding author. H. Motoda et al. (Eds.): ADMA 2013, Part I, LNAI 8346. © Springer-Verlag Berlin Heidelberg 2013
Our contributions can be summarized as follows. First, we study structural similarity beyond semantic approaches, and our method can measure similarity efficiently in object-behavior networks. Second, we propose heterogeneous neighborhoods to categorize different neighbor objects and to extract the semantic information of their relationships.

The remainder of the paper is organized as follows. We review related work in Section 2. The details of the similarity measure are described in Section 3, and the experimental results are presented in Section 4. Finally, we conclude in Section 5.

2 Related Work

The problem of similarity search between structured objects has been studied in structural pattern recognition and pattern analysis. The most common approach in previous work is based on comparing structure: two objects are considered structurally equivalent if they share many common neighbors. Unlike in homogeneous networks, neighbors in heterogeneous information networks cannot be treated uniformly, because the relations are of multiple types.

To exploit structural information in heterogeneous information networks, Sun et al. proposed PathSim [11], which measures the similarity of same-typed objects based on symmetric meta-paths, taking into account the semantics of a meta-path, i.e., a path composed of different-typed objects. Shi et al. proposed another measure, HeteSim [10], which defines the relatedness of an object pair according to a search path that connects the two objects through a sequence of nodes.

Similarity measures based on link structure, such as SimRank [5] and P-Rank (Penetrating Rank) [13], are also effective. These methods consider two vertices similar if their immediate neighbors in the network are similar. The difference is that SimRank computes similarity only between same-typed vertices and considers only partial structural information from the in-link direction, whereas P-Rank enriches SimRank by jointly encoding both in-link and out-link relationships into the structural similarity computation. Although P-Rank considers heterogeneous relations between vertices, only directly linked neighbors are used in the computation, so some semantic information is lost. Furthermore, both SimRank and P-Rank are iterative algorithms whose cost grows heavy as the network gets larger.

3 Methodology

3.1 Notation and Definition

To formalize our method, we first introduce some concepts.

Definition 1. Network. A network is defined as a directed graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges. In an object-behavior network, every object is abstracted as a vertex, and two objects related by a particular behavior or relationship are linked by an edge.

Definition 2. Node. A node stands for an object in an object-behavior network, denoted $v \in V$.
It is represented as a triple $\langle info, type, \Phi \rangle$, where info and type describe the information and the type of the object, and $\Phi$ is the set of heterogeneous neighborhoods of node $v$. We refer to the info, type and $\Phi$ of $v$ as $v.info$, $v.type$ and $v.\Phi$.

Definition 3. Heterogeneous Neighborhood. A heterogeneous neighborhood is a set of nodes that describes one kind of topological structural feature of an object in an object-behavior network. The neighborhoods of a node form a (potentially infinite) set $\Phi$. Each $\varphi \in \Phi$ is defined as a triple $\langle relation, distance, \nu \rangle$, where relation describes the relationship and distance is the length of the path from the central node to each node in the node set $\nu$. Each type of link represents a kind of relation $R \in \mathcal{R}$, and the function $R(v_p) = \{v \mid v \in V,\ v \text{ and } v_p \text{ are connected by relation } R\}$.

Suppose a heterogeneous neighborhood $\varphi \in v_p.\Phi$ corresponds to relation $R_m$. For simplicity, we subscript a neighborhood with its relation name, e.g., $\varphi_R$ denotes the neighborhood whose relation is $R$; it then follows that $\varphi_{R_m}.\nu = R_m(v_p)$.
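To make Definitions 1 to 3 concrete, the sketch below models an object-behavior network as a typed graph and implements the relation function $R(v_p)$ and the neighborhood triple $\langle relation, distance, \nu \rangle$. It is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, types are single-letter codes (P, A, C, T as in Example 1), each one-step relation is named by its endpoint types and stored in both directions, and a relation of distance $k$ is written as a type sequence of $k+1$ letters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node <info, type, Phi>; Phi is derived on demand, so only info and type are stored."""
    info: str  # object identifier, e.g. "Paper_1"
    type: str  # single-letter type code, e.g. "P", "A", "C" or "T"


@dataclass(frozen=True)
class Neighborhood:
    """A heterogeneous neighborhood phi = <relation, distance, nu>."""
    relation: str  # type sequence such as "PA" or "PAP"
    distance: int  # number of related-object pairs in the sequence
    nu: frozenset  # nodes reached through that relation


class ObjectBehaviorNetwork:
    def __init__(self):
        # (node, one-step relation) -> nodes linked to that node by the relation
        self.adjacency = defaultdict(set)

    def add_edge(self, u: Node, v: Node) -> None:
        # A one-step relation is named by its endpoint types; store both directions ("PA" and "AP").
        self.adjacency[(u, u.type + v.type)].add(v)
        self.adjacency[(v, v.type + u.type)].add(u)

    def relation_neighbors(self, v_p: Node, relation: str) -> frozenset:
        """R(v_p): nodes reached from v_p by following the type sequence `relation`.

        v_p itself is not removed, so e.g. "PAP" from a paper also returns that paper.
        """
        if relation[0] != v_p.type:
            return frozenset()
        frontier = {v_p}
        for a, b in zip(relation, relation[1:]):
            frontier = {w for u in frontier for w in self.adjacency[(u, a + b)]}
        return frozenset(frontier)

    def neighborhood(self, v_p: Node, relation: str) -> Neighborhood:
        return Neighborhood(relation, len(relation) - 1, self.relation_neighbors(v_p, relation))


# Hypothetical fragment in the spirit of Fig. 1(a): Author_2 wrote both papers.
net = ObjectBehaviorNetwork()
p1, p2 = Node("Paper_1", "P"), Node("Paper_2", "P")
a1, a2 = Node("Author_1", "A"), Node("Author_2", "A")
net.add_edge(p1, a1)
net.add_edge(p1, a2)
net.add_edge(p2, a2)
print(sorted(n.info for n in net.neighborhood(p1, "PA").nu))     # ['Author_1', 'Author_2']
print(sorted(n.info for n in net.relation_neighbors(p1, "PAP")))  # ['Paper_1', 'Paper_2']
```

Here Paper_2 shows up only through the two-step relation PAP, which is exactly the sense in which it is a heterogeneous neighbor of Paper_1.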

Fig. 1. (a) A heterogeneous information network; (b) a schematic diagram of the relations in a heterogeneous network.

Distance plays a very important role in this paper, so we define one more notation. For a particular node $v$, every $\varphi \in v.\Phi$ is labeled by $\varphi.distance$. The neighborhood set $\Phi_k$ is defined as $\Phi_k = \{\varphi_i \mid \varphi_i \in \Phi,\ \varphi_i.distance = k\}$, where $\Phi = \bigcup_{k=1}^{n} \Phi_k$.

A heterogeneous neighborhood contains nodes of the same type that are connected by the same relation. In homogeneous networks, a relation only indicates whether two nodes are linked; here, a relation is expressed by a sequence of related-object pairs. For instance, in Fig. 1(a), Author_2 is a homogeneous neighbor of Paper_1 because there is an edge between them, while Paper_2 is a heterogeneous neighbor of Paper_1 because they are connected by (Paper_1, Author_2) and (Author_2, Paper_2). Fig. 1(b) shows all the feasible relations in that heterogeneous information network.

Example 1. Heterogeneous neighborhoods of Paper_1 (in Fig. 1(a)). Node Paper_1 is $\langle Paper_1, paper, \Phi \rangle$ with $\Phi = \bigcup_{k=1}^{3} \Phi_k$ and $\Phi_1 = \{\varphi_0, \varphi_1, \varphi_2, \varphi_3\}$, where $\varphi_0.relation$ is "be written", $\varphi_1.relation$ is "be published in", and so on. We use P, A, C and T to represent paper, author, conference and term respectively, so the relation $R_{bewritten}$ can be denoted PA. For simplicity and clarity, we write $\varphi_{PA}$ instead of $\varphi_0$. Hence $\varphi_{PA}.distance = 1$ and $\varphi_{PA}.\nu = \{Author_1, Author_2\}$.

3.2 Similarity Measure Based on Heterogeneous Neighborhood

Homogeneous Similarity Measure Review. The principle of building an information network is to connect related objects. For instance, in a bibliographic network two authors are related because they have collaborated on a paper, and Hawaii is connected to CIKM because CIKM was once held in Hawaii. Consequently, if two authors share many common neighbors such as papers, they are likely to have common research interests. We therefore believe that structural features reveal the similarity between nodes in an information network, and that similar nodes are strongly connected to each other. A variety of similarity measures based on structural features in homogeneous networks have been proposed; here we review some classical ones.

Personalized PageRank (PR): Personalized PageRank [1] extends PageRank for personalization by introducing a preference set P. The Personalized PageRank equation is defined as
$$v = (1 - c)\,A v + c\,u,$$
where $A$ is the transition matrix of the graph, $u$ is the preference vector derived from P, and $c$ is the restart probability.

Common neighbors (CN): In a graph model, common neighbors [1] is defined as the number of neighbors shared by two vertices $v_i$ and $v_j$, namely $|\Gamma(v_i) \cap \Gamma(v_j)|$. A larger value means the two objects are more similar.

Jaccard's coefficient (JC): Jaccard's coefficient is another measure of the similarity between two vertices $v_i$ and $v_j$; it normalizes the number of common neighbors:
$$JC(v_i, v_j) = \frac{|\Gamma(v_i) \cap \Gamma(v_j)|}{|\Gamma(v_i) \cup \Gamma(v_j)|}.$$

HN-Sim Formula. However, neighbors in different neighborhoods or of different types carry different meanings. Taking DBLP as an example, an institute and a conference can be neighbors of the same papers, but it is unclear whether they are similar. By analyzing the composition of neighbors, we can measure the similarity between two nodes in an information network, but the diversity of neighbors provides different semantic information about the nodes, and neighbors at different levels represent different properties. In this paper, heterogeneous neighborhoods are proposed to categorize the neighbors. Accordingly, the similarity of two objects is measured in each heterogeneous neighborhood separately and finally merged by an influence-based function. As discussed in Section 3.1, a node's heterogeneous neighbors can be denoted by sequences of object pairs.
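The three homogeneous baselines reviewed above can be sketched in plain Python. This is an illustration under stated assumptions rather than the code used in the experiments: the graph is an undirected adjacency dictionary, $A$ in the Personalized PageRank equation is taken as the random-walk transition matrix that spreads each node's score evenly over its neighbors, $u$ is uniform over the preference set P, and the restart probability c = 0.15, the iteration count and the toy graph are arbitrary choices.

```python
def common_neighbors(graph, vi, vj):
    """CN: |Gamma(vi) ∩ Gamma(vj)|."""
    return len(graph[vi] & graph[vj])


def jaccard(graph, vi, vj):
    """JC: |Gamma(vi) ∩ Gamma(vj)| / |Gamma(vi) ∪ Gamma(vj)|."""
    union = graph[vi] | graph[vj]
    return len(graph[vi] & graph[vj]) / len(union) if union else 0.0


def personalized_pagerank(graph, preference, c=0.15, iterations=100):
    """Power iteration for v = (1 - c) A v + c u.

    A spreads each node's score evenly over its neighbors (column-normalized adjacency),
    and u is uniform over the preference set. Nodes without neighbors simply leak mass.
    """
    u = {n: (1.0 / len(preference) if n in preference else 0.0) for n in graph}
    v = dict(u)
    for _ in range(iterations):
        nxt = {n: c * u[n] for n in graph}
        for n, neighbors in graph.items():
            for m in neighbors:
                nxt[m] += (1.0 - c) * v[n] / len(neighbors)
        v = nxt
    return v


# Hypothetical undirected toy graph: the path x - a - y - b - z.
graph = {"x": {"a"}, "a": {"x", "y"}, "y": {"a", "b"}, "b": {"y", "z"}, "z": {"b"}}
print(common_neighbors(graph, "a", "b"), jaccard(graph, "a", "b"))  # prints: 1 0.3333333333333333
ppr = personalized_pagerank(graph, preference={"a"})
```

All three treat every neighbor alike, which is the limitation the HN-Sim formula addresses by splitting neighbors into typed neighborhoods.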
Since the sequence length can be unlimited, $\Phi$ can be infinite, which leads to two problems. On one hand, computing too many neighbors may cause overfitting, so a strategy is needed to decide which neighborhoods should be considered. On the other hand, as the neighborhood distance grows, the objects at farther levels tend to repeat those already reached at nearer levels. In response, experiments and intuition suggest that the path length used should be positively correlated with the diversity of the system and negatively correlated with the scale of the node set. Meanwhile, since content is not used, neighborhoods at the same distance have the same influence on the similarity measure. Although these neighborhoods have different semantic meanings, we can average their similarity contributions according to their influence. The problem of synthesizing the similarities of individual heterogeneous neighborhoods is therefore reduced to synthesizing the similarity of each neighborhood set.
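To see why the neighborhood sets must be bounded, the sketch below enumerates relation type-sequences starting from a node type over a network schema and groups them by distance $k$ into $\Phi_k$. The schema (papers linked to authors, conferences and terms, using Example 1's P, A, C, T notation) is a hypothetical stand-in for Fig. 1(b), and single-letter type codes are assumed; sequences that return to an earlier type, such as PAP, are kept because they are exactly the heterogeneous neighbors described above.

```python
# Hypothetical DBLP-style schema: for each object type, the types it is directly related to.
SCHEMA = {"P": ["A", "C", "T"], "A": ["P"], "C": ["P"], "T": ["P"]}


def neighborhood_sets(schema, start_type, max_distance):
    """Return {k: [relation type-sequences of distance k]} for k = 1..max_distance."""
    sets = {}
    frontier = [start_type]
    for k in range(1, max_distance + 1):
        # Extend every sequence by one related-object pair allowed by the schema.
        frontier = [seq + nxt for seq in frontier for nxt in schema[seq[-1]]]
        sets[k] = frontier
    return sets


for k, relations in neighborhood_sets(SCHEMA, "A", 3).items():
    print(k, relations)
# 1 ['AP']
# 2 ['APA', 'APC', 'APT']
# 3 ['APAP', 'APCP', 'APTP']
```

Even with this small schema, the number of paper-rooted relations grows as 3, 3, 9, 9, 27 for k = 1 to 5, so a cap on the distance is what keeps $\Phi$ finite.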

The similarity between nodes $v_1$ and $v_2$ is defined as Formula 1:
$$HeteroSim(v_1, v_2) = \frac{1}{1+\varepsilon} \prod_{k:\, \Phi_k \subseteq v_1.\Phi} \left\{ \varepsilon + L\left[ \frac{1}{|\Phi_k|} \sum_{R_i:\, \varphi_i \in \Phi_k} \theta_{R_i}(v_1, v_2) \right] \right\} \qquad (1)$$
where the function $L[\cdot]$ controls the influence of each level of neighbors. In a path-driven model, $L[\cdot]$ can be defined as a power function of the distance:
$$L[x] = [x]^{\delta} \qquad (2)$$
Here $\delta$ is determined by the length and node types of the heterogeneous neighborhood. Intuitively, farther nodes have less influence on an object, so $\delta$ should be positive and positively related to $\varphi_{R_i}.distance$. The smoothing factor $\varepsilon$ is used in case there are no common neighbors in some heterogeneous neighborhoods. The function $\theta_R(\cdot)$ measures the similarity between two nodes based on the neighborhood connected by relation $R$. In the experiments, Jaccard's coefficient is used:
$$\theta_{R_i}(v_1, v_2) = \frac{|\Gamma_{R_i}(v_1) \cap \Gamma_{R_i}(v_2)|}{|\Gamma_{R_i}(v_1) \cup \Gamma_{R_i}(v_2)|} \qquad (3)$$
where $\Gamma_{R_i}(v)$ is the node set of the neighborhood $\varphi_{R_i}$ of $v$, namely $\varphi_{R_i}.\nu$.

Homo-Info Adjustment. In some homogeneous networks, node quality and influence play important roles in measuring the similarity between two entities. In a social network, for example, a person with a good reputation tends to be similar to other people with a similarly good reputation. In this step, we therefore extract homogeneous information to adjust the heterogeneous similarity. Wang et al. have proposed methods that use influence to enhance similarity measurement [12]. We extend the idea of influence ranking in homogeneous networks as HomoSim to adjust the HN-Sim model. The similarity between nodes $v_1$ and $v_2$ is defined as Formula 4:
$$Sim(v_1, v_2) = HomoSim(v_1, v_2)^{\lambda} \cdot HeteroSim(v_1, v_2)^{1-\lambda} \qquad (4)$$
Here HomoSim is the homogeneous ranking similarity between the two nodes, and HeteroSim is the basic HN-Sim.
By adjusting the weighting parameter $\lambda$, we obtain the following cases:
(1) When $\lambda = 0$, HN-Sim computes only the heterogeneous-neighborhood-based similarity;
(2) When $\lambda = 1$, HN-Sim reduces to the homogeneous ranking similarity;
(3) Setting $\lambda$ between 0 and 1 balances the homogeneous ranking bias against the basic HN-Sim.
This is discussed further in Section 4.4. HomoSim is an adjustment function that enhances HN-Sim; it can be estimated as Formula 5:
$$HomoSim(v_1, v_2) = 1 - [Rank(v_1) - Rank(v_2)]^2 \qquad (5)$$
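The full scoring pipeline of Formulas 1 to 5 can be sketched as below. This is a minimal reading of the formulas, not the authors' code, and it rests on several assumptions: the per-level terms of Formula 1 are combined by a product (the operator is not legible in this transcription; a product is assumed because $\varepsilon$ is described as guarding against levels with no common neighbors), $\varepsilon$ = 0.01, $\delta = k + 1$ following the DBLP setting in Section 4.1 ($\theta_{R_{AP}} \to 2$, $\theta_{R_{APC}} \to 3$), Formula 5 is read as $1 - (Rank(v_1) - Rank(v_2))^2$ with ranks already scaled to [0, 1], and the neighbor sets are hypothetical toy data.

```python
def theta(a: set, b: set) -> float:
    """Formula 3: Jaccard's coefficient of two relation-specific neighbor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hetero_sim(gamma1, gamma2, levels, delta, eps=0.01):
    """Formula 1, with the levels combined by a product (an assumed reading).

    gamma1, gamma2: dict relation name -> set of neighbors (Gamma_R(v1), Gamma_R(v2)).
    levels: dict distance k -> list of relation names in Phi_k.
    delta: function k -> exponent of L[x] = x ** delta (Formula 2).
    """
    score = 1.0
    for k, relations in levels.items():
        # Average the per-relation similarities inside one neighborhood set Phi_k.
        avg = sum(theta(gamma1.get(r, set()), gamma2.get(r, set())) for r in relations) / len(relations)
        # eps keeps a level without common neighbors from zeroing the whole product.
        score *= eps + avg ** delta(k)
    return score / (1.0 + eps)


def homo_sim(rank1: float, rank2: float) -> float:
    """Formula 5, read as 1 - (Rank(v1) - Rank(v2))^2 with ranks assumed scaled to [0, 1]."""
    return 1.0 - (rank1 - rank2) ** 2


def hn_sim(homo: float, hetero: float, lam: float) -> float:
    """Formula 4: HomoSim^lambda * HeteroSim^(1 - lambda)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return homo ** lam * hetero ** (1.0 - lam)


# Hypothetical toy neighborhoods for two authors, using the AP / APC relation notation.
gamma_v1 = {"AP": {"p1", "p2"}, "APC": {"c1"}}
gamma_v2 = {"AP": {"p2", "p3"}, "APC": {"c1", "c2"}}
levels = {1: ["AP"], 2: ["APC"]}
hetero = hetero_sim(gamma_v1, gamma_v2, levels, delta=lambda k: k + 1)
print(hn_sim(homo_sim(0.8, 0.6), hetero, lam=0.4))
```

Under this reading, a node compared with itself scores $(1+\varepsilon)^{K-1}$ over $K$ levels, slightly above 1 for small $\varepsilon$; only the relative order of candidates matters for the top-k rankings reported in Section 4. Setting `lam` to 0 or 1 reproduces cases (1) and (2) above.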

The ranking function in Formula 5 needs to satisfy three properties: (i) global calculation; (ii) rapid convergence; (iii) little content involved. Since HN-Sim is based on local structure, it is seriously limited in its global view; at the same time, involving global calculation should not add much cost. Different networks call for different quality-ranking strategies. In a bibliographic network, papers, authors and conferences are connected to each other, and PageRank [8] might be used to find papers of similar quality. In Wikipedia, there are few links between entities, but entities are linked with many editors. We believe that high-quality authors edit high-quality entities, and high-quality entities are edited by high-quality authors; therefore, in Wikipedia networks, HITS [6] is used as the Rank() function. Content-based ranking (CRank) is also suitable. For instance, in the IMDB dataset, the review score can be taken as the parameter of the ranking function to measure the similarity of two movies.

4 Experiments

In this section, we present our experiments on a bibliographic information network and a social picture-sharing network.

4.1 Datasets

A bibliographic network is a typical heterogeneous information network and a kind of object-behavior network. We use our method to analyze the similarity between authors, using the DBLP dataset downloaded in January 2013, which contains 869,113 papers, 689,177 authors and 1,304 conferences. Fig. 2 shows the number of neighbors in different $\Phi_i$: the majority of nodes have fewer than 10 neighbors in $\Phi_1$, and more than 40% of nodes have a heterogeneous neighborhood with more than 100 nodes. The diversity of heterogeneous neighborhoods is shown in Table 1; for instance, an author-type node has 18 heterogeneous neighborhoods at distance 5.
The other dataset, downloaded from Flickr, is the one used in [11]. Flickr is an image hosting website and online community that provides free and paid digital photo storage, sharing, and other online social services. This information network contains images, users, tags and groups; the dataset covers 10,000 images from 20 groups, with 10,284 tags and 664 users. In this experiment we limit the length of the related-pair sequence to 4, which means that only $\Phi_1$, $\Phi_2$ and $\Phi_3$ are calculated. In the DBLP dataset, the size of the neighborhood set $\Phi$ is 9. In Formula 2, $\delta$ is the exact length of the neighborhood; for example, the $\delta$ values of $\theta_{R_{AP}}$ and $\theta_{R_{APC}}$ are 2 and 3, respectively.

Fig. 2. Cumulative probability distribution of the scale of heterogeneous neighbors at different distances: $\Phi_1$, $\Phi_2$ and $\Phi_3$

Table 1. The number of heterogeneous neighborhood types at different distances
Distance  Author  Conference  Paper

4.2 Case Study

In the DBLP dataset, we calculate the similarity between the author Christos Faloutsos and other authors. The top 10 most similar authors are listed in Table 2. HN-Sim satisfies self-similarity: Faloutsos himself is ranked first. The other authors found by HN-Sim either publish similar papers or have strong connections with him and hold similar research interests. For instance, Philip S. Yu and Christos Faloutsos both work in data mining and have similar reputations in this area, and Faloutsos and Kleinberg have collaborated on high-quality papers.

Table 2. Similar authors to Christos Faloutsos
Rank  Author
1     Christos Faloutsos
2     Philip S. Yu
3     Jure Leskovec
4     Joseph M. Hellerstein
5     Yufei Tao
6     Divesh Srivastava
7     Jon M. Kleinberg
8     Petros Drineas
9     Yannis Manolopoulos
10    Gerhard Weikum

4.3 Result and Evaluation

To evaluate the effectiveness of our method, we conduct another experiment on the Flickr dataset, in which user, tag and category information and a brief description are given for each image. We extract the relations among images, tags and groups to construct a heterogeneous information network, using I, T and G to represent image, tag and group respectively.

Fig. 3. Top 5 similar images in Flickr found by different methods

The relations in this object-behavior network can be composed from three initial relations: $R_{IT}$: a tag is assigned to an image; $R_{IG}$: an image is categorized into a group; $R_{TG}$: a tag is assigned to a group. To measure the similarity between two images, we limit the neighborhood distance to at most 3. For example, $\Phi_2 = \{\varphi_{ITI}, \varphi_{IGI}, \varphi_{ITG}, \varphi_{IGT}\}$. $\delta$ in Formula 2 is set to the distance of each $\varphi$. We use our method to find similar pictures, with common neighbors (CN), Jaccard's coefficient (JC) and Personalized PageRank (PR) as baselines; the results are shown in Fig. 3. To evaluate the performance of the measure based on heterogeneous neighborhoods, we calculate the normalized Discounted Cumulative Gain (nDCG) [4] for each method. nDCG is a common way to evaluate the quality of search results; DCG is defined as
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)} \qquad (6)$$
where $p$ is the cutoff position in the ranking and $rel_i$ is the relevance value of the image at position $i$. We invited 6 volunteers to rank 144 images by their similarity to the first image in Fig. 3, and this ranking is used as the ground truth.
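The evaluation metric can be sketched as follows. This assumes positions are numbered from 1, so the first discount is $\log_2 2 = 1$, and that the ideal DCG is computed from the ground-truth ordering of relevance values, as is standard for nDCG [4]; the function names and the toy relevance lists are hypothetical.

```python
import math


def dcg(relevances, p):
    """Formula 6: sum over i = 1..p of (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(relevances[:p], start=1))


def ndcg(relevances, ideal_relevances, p):
    """Normalize DCG_p of a method's ranking by the DCG_p of the ideal (ground-truth) ranking."""
    ideal = dcg(ideal_relevances, p)
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0


# Hypothetical relevance values of the top 5 results returned by one method,
# and the same values sorted in the ground-truth (ideal) order.
method = [3, 2, 3, 0, 1]
ideal = sorted(method, reverse=True)
print(round(ndcg(method, ideal, 5), 3))
```

A score of 1 means the method orders the top $p$ images exactly as well as the ground truth; misplacing highly relevant images near the top costs the most because of the logarithmic discount.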

The relevance value $rel_i$ used in Formula 6 is defined as
$$rel_i = relevance(|rank_i - goldrank_i|) \qquad (7)$$
where $rank_i$ is the position of image $i$ in the ranking produced by the evaluated method and $goldrank_i$ is its position in the ground truth. The rank differences are mapped to discrete relevance values as shown in Table 3.

Table 3. Relevance mapping from the rank difference $|rank_i - goldrank_i|$, over the intervals [0,1], [2,5), [5,10) and [10,144]

For comparison, we normalize the $DCG_p$ of each result by dividing it by $IDCG_p$, which is calculated from the gold-standard ranking. Fig. 4 shows the performance of all the methods. Our method achieves the best performance, whereas the homogeneous methods based on structural features are unstable and do not perform well on the heterogeneous information network. This confirms that structural similarity is significant and should be applied in a heterogeneous way.

Fig. 4. nDCG of each similarity measure on the Flickr dataset

4.4 Homo-Info Adjustment Discussion

Basic HN-Sim is based on local structure, and it outperforms conventional methods in the experiments on the DBLP and Flickr datasets. As introduced in Section 3.2, basic HN-Sim can be enhanced by involving global computation. In this section, we integrate PageRank, CRank, and HeteroPageRank (PageRank on heterogeneous networks) into HN-Sim. Fig. 5 shows the average results for different values of $\lambda$ over 65 similar images on the Flickr dataset. There is little difference among the other methods at nDCG@15, while basic HN-Sim performs outstandingly; beyond nDCG@15, however, the integration of global information improves the results.

Fig. 5. The experimental results in terms of nDCG: (a) λ = 0.2; (b) λ = 0.4; (c) λ = 0.6; (d) λ = 0.8

To examine the effect of combining heterogeneous and homogeneous information, we sampled some movies from IMDB for a further experiment, with $\lambda$ set to 0.4 and PageRank used as the global homogeneous ranking. We computed the movies most similar to Avatar, and five volunteers were paid to rank similar movies from a candidate set as the ground truth. nDCG@20 is shown in Fig. 6. In this experiment, the homogeneous information does not improve the top 10 results, although it does help the overall effect.

Fig. 6. nDCG on the IMDB dataset

5 Conclusions

In this paper, we propose a novel method to measure the similarity between objects from the perspective of structure. Heterogeneous neighborhoods are used to categorize heterogeneous neighbors, which carry more structural semantic information than the initial neighbors. Compared with conventional methods, our HN-Sim measure puts more focus on topological features, and the experimental results show that it performs better. In future work, we will further study the extraction of heterogeneous neighborhoods and give a formalized level-selection pattern. The aggregation of the distance measures of the neighbor sets may also be diversified to account for different datasets and the semantic information of the data.

Acknowledgements. This work was supported by NSFC (Grant Nos. ... and ...) and the 973 Program (Grant No. 2014CB...).

References

1. Chakrabarti, S.: Dynamic personalized PageRank in entity-relation graphs. In: Proceedings of the 16th International Conference on World Wide Web. ACM (2007)
2. Hatzivassiloglou, V., Klavans, J.L., Eskin, E.: Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. In: Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Citeseer (1999)
3. Islam, A., Inkpen, D.: Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data (TKDD) 2(2), 10 (2008)
4. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20(4) (2002)
5. Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2002)
6. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46(5) (1999)
7. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58(7) (2007)
8. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web (1999)
9. Ruotsalo, T., Hyvönen, E.: A method for determining ontology-based semantic relevance. In: Wagner, R., Revell, N., Pernul, G. (eds.) DEXA. LNCS, vol. 4653. Springer, Heidelberg (2007)
10. Shi, C., Kong, X., Yu, P.S., Xie, S., Wu, B.: Relevance search in heterogeneous networks. In: Proceedings of the 15th International Conference on Extending Database Technology. ACM (2012)
11. Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: PathSim: Meta path-based top-k similarity search in heterogeneous information networks. In: VLDB 2011 (2011)
12. Wang, G., Hu, Q., Yu, P.S.: Influence and similarity on heterogeneous networks. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM (2012)
13. Zhao, P., Han, J., Sun, Y.: P-Rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM (2009)


More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Web Structure Mining using Link Analysis Algorithms

Web Structure Mining using Link Analysis Algorithms Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.

More information

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Honglei Zhuang, Jing Zhang 2, George Brova, Jie Tang 2, Hasan Cam 3, Xifeng Yan 4, Jiawei Han University of Illinois at Urbana-Champaign

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub HIPRank: Ranking Nodes by Influence Propagation based on authority and hub Wen Zhang, Song Wang, GuangLe Han, Ye Yang, Qing Wang Laboratory for Internet Software Technologies Institute of Software, Chinese

More information

An Improved PageRank Method based on Genetic Algorithm for Web Search

An Improved PageRank Method based on Genetic Algorithm for Web Search Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2983 2987 Advanced in Control Engineeringand Information Science An Improved PageRank Method based on Genetic Algorithm for Web

More information

Efficiently Mining Positive Correlation Rules

Efficiently Mining Positive Correlation Rules Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 39S-44S Efficiently Mining Positive Correlation Rules Zhongmei Zhou Department of Computer Science & Engineering,

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Norberto Fernández, José M.Blázquez, Luis Sánchez, and Vicente Luque Telematic Engineering Department. Carlos III University of Madrid

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction Abstract In this chapter, we introduce some basic concepts and definitions in heterogeneous information network and compare the heterogeneous information network with other related

More information

Citation Prediction in Heterogeneous Bibliographic Networks

Citation Prediction in Heterogeneous Bibliographic Networks Citation Prediction in Heterogeneous Bibliographic Networks Xiao Yu Quanquan Gu Mianwei Zhou Jiawei Han University of Illinois at Urbana-Champaign {xiaoyu1, qgu3, zhou18, hanj}@illinois.edu Abstract To

More information

Research Article Link Prediction in Directed Network and Its Application in Microblog

Research Article Link Prediction in Directed Network and Its Application in Microblog Mathematical Problems in Engineering, Article ID 509282, 8 pages http://dx.doi.org/10.1155/2014/509282 Research Article Link Prediction in Directed Network and Its Application in Microblog Yan Yu 1,2 and

More information

Personalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures

Personalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures Personalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures Claudia Hess, Klaus Stein Laboratory for Semantic Information Technology Bamberg University

More information

Minimal Test Cost Feature Selection with Positive Region Constraint

Minimal Test Cost Feature Selection with Positive Region Constraint Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding

More information

Performance Measures for Multi-Graded Relevance

Performance Measures for Multi-Graded Relevance Performance Measures for Multi-Graded Relevance Christian Scheel, Andreas Lommatzsch, and Sahin Albayrak Technische Universität Berlin, DAI-Labor, Germany {christian.scheel,andreas.lommatzsch,sahin.albayrak}@dai-labor.de

More information

CADIAL Search Engine at INEX

CADIAL Search Engine at INEX CADIAL Search Engine at INEX Jure Mijić 1, Marie-Francine Moens 2, and Bojana Dalbelo Bašić 1 1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia {jure.mijic,bojana.dalbelo}@fer.hr

More information

WE know that most real systems usually consist of a

WE know that most real systems usually consist of a IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 29, NO. 1, JANUARY 2017 17 A Survey of Heterogeneous Information Network Analysis Chuan Shi, Member, IEEE, Yitong Li, Jiawei Zhang, Yizhou Sun,

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Xiaotang Chen, Kaiqi Huang, and Tieniu Tan National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

NUS-I2R: Learning a Combined System for Entity Linking

NUS-I2R: Learning a Combined System for Entity Linking NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm

More information

Trust-Based Recommendation Based on Graph Similarity

Trust-Based Recommendation Based on Graph Similarity Trust-Based Recommendation Based on Graph Similarity Chung-Wei Hang and Munindar P. Singh Department of Computer Science North Carolina State University Raleigh, NC 27695-8206, USA {chang,singh}@ncsu.edu

More information

The application of Randomized HITS algorithm in the fund trading network

The application of Randomized HITS algorithm in the fund trading network The application of Randomized HITS algorithm in the fund trading network Xingyu Xu 1, Zhen Wang 1,Chunhe Tao 1,Haifeng He 1 1 The Third Research Institute of Ministry of Public Security,China Abstract.

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Link Prediction and Anomoly Detection

Link Prediction and Anomoly Detection Graphs and Networks Lecture 23 Link Prediction and Anomoly Detection Daniel A. Spielman November 19, 2013 23.1 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information

A Proximity-Based Fallback Model for Hybrid Web Recommender Systems

A Proximity-Based Fallback Model for Hybrid Web Recommender Systems A Proximity-Based Fallback Model for Hybrid Web Recommender Systems Jaeseok Myung «supervised by Sang-goo Lee» Intelligent Data Systems Lab. School of Computer Science and Engineering, Seoul National University,

More information

COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, Sep. 15 www.ijcea.com ISSN 2321-3469 COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

More information

HetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks

HetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks HetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks Chen Luo 1,RenchuGuan 1, Zhe Wang 1,, and Chenghua Lin 2 1 College of Computer Science and Technology, Jilin

More information

Combining Text Embedding and Knowledge Graph Embedding Techniques for Academic Search Engines

Combining Text Embedding and Knowledge Graph Embedding Techniques for Academic Search Engines Combining Text Embedding and Knowledge Graph Embedding Techniques for Academic Search Engines SemDeep-4, Oct. 2018 Gengchen Mai Krzysztof Janowicz Bo Yan STKO Lab, University of California, Santa Barbara

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research

More information

SimRank : A Measure of Structural-Context Similarity

SimRank : A Measure of Structural-Context Similarity SimRank : A Measure of Structural-Context Similarity Glen Jeh and Jennifer Widom 1.Co-Founder at FriendDash 2.Stanford, compter Science, department chair SIGKDD 2002, Citation : 506 (Google Scholar) 1

More information

Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology

Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology , pp.49-54 http://dx.doi.org/10.14257/astl.2014.45.10 Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology Ying Xia, Shiyan Luo, Xu Zhang, Hae Yong Bae Research

More information

Tag Based Image Search by Social Re-ranking

Tag Based Image Search by Social Re-ranking Tag Based Image Search by Social Re-ranking Vilas Dilip Mane, Prof.Nilesh P. Sable Student, Department of Computer Engineering, Imperial College of Engineering & Research, Wagholi, Pune, Savitribai Phule

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Re-ranking Search Results Using Semantic Similarity

Re-ranking Search Results Using Semantic Similarity Re-ranking Search Results Using Semantic Ruofan Wang wrfan8@gmail.com Shan Jiang jiangshan.pku@gmail.com Yan Zhang zhyzhy00@gmail.com Abstract In this paper, we propose a re-ranking method which employs

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

More information

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

Ranking Web Pages by Associating Keywords with Locations

Ranking Web Pages by Associating Keywords with Locations Ranking Web Pages by Associating Keywords with Locations Peiquan Jin, Xiaoxiang Zhang, Qingqing Zhang, Sheng Lin, and Lihua Yue University of Science and Technology of China, 230027, Hefei, China jpq@ustc.edu.cn

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

Fast Personalized PageRank On MapReduce Authors: Bahman Bahmani, Kaushik Chakrabart, Dong Xin In SIGMOD 2011 Presenter: Adams Wei Yu

Fast Personalized PageRank On MapReduce Authors: Bahman Bahmani, Kaushik Chakrabart, Dong Xin In SIGMOD 2011 Presenter: Adams Wei Yu Fast Personalized PageRank On MapReduce Authors: Bahman Bahmani, Kaushik Chakrabart, Dong Xin In SIGMOD 2011 Presenter: Adams Wei Yu March 2015, CMU Graph data is Ubiquitous Basic Problem in Graphs: How

More information

LET:Towards More Precise Clustering of Search Results

LET:Towards More Precise Clustering of Search Results LET:Towards More Precise Clustering of Search Results Yi Zhang, Lidong Bing,Yexin Wang, Yan Zhang State Key Laboratory on Machine Perception Peking University,100871 Beijing, China {zhangyi, bingld,wangyx,zhy}@cis.pku.edu.cn

More information

Capturing Missing Links in Social Networks Using Vertex Similarity

Capturing Missing Links in Social Networks Using Vertex Similarity Capturing Missing Links in Social Networks Using Vertex Similarity Hung-Hsuan Chen, Liang Gou, Xiaolong (Luke) Zhang, C. Lee Giles Computer Science and Engineering Information Sciences and Technology The

More information

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery Ninh D. Pham, Quang Loc Le, Tran Khanh Dang Faculty of Computer Science and Engineering, HCM University of Technology,

More information

RiMOM Results for OAEI 2009

RiMOM Results for OAEI 2009 RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn

More information

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 2 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0032 Privacy-Preserving of Check-in

More information

Exploratory Recommendations Using Wikipedia s Linking Structure

Exploratory Recommendations Using Wikipedia s Linking Structure Adrian M. Kentsch, Walter A. Kosters, Peter van der Putten and Frank W. Takes {akentsch,kosters,putten,ftakes}@liacs.nl Leiden Institute of Advanced Computer Science (LIACS), Leiden University, The Netherlands

More information

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,

More information

Ranking web pages using machine learning approaches

Ranking web pages using machine learning approaches University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2008 Ranking web pages using machine learning approaches Sweah Liang Yong

More information

CPSC 532L Project Development and Axiomatization of a Ranking System

CPSC 532L Project Development and Axiomatization of a Ranking System CPSC 532L Project Development and Axiomatization of a Ranking System Catherine Gamroth cgamroth@cs.ubc.ca Hammad Ali hammada@cs.ubc.ca April 22, 2009 Abstract Ranking systems are central to many internet

More information

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

TREC 2017 Dynamic Domain Track Overview

TREC 2017 Dynamic Domain Track Overview TREC 2017 Dynamic Domain Track Overview Grace Hui Yang Zhiwen Tang Ian Soboroff Georgetown University Georgetown University NIST huiyang@cs.georgetown.edu zt79@georgetown.edu ian.soboroff@nist.gov 1. Introduction

More information

A P2P-based Incremental Web Ranking Algorithm

A P2P-based Incremental Web Ranking Algorithm A P2P-based Incremental Web Ranking Algorithm Sumalee Sangamuang Pruet Boonma Juggapong Natwichai Computer Engineering Department Faculty of Engineering, Chiang Mai University, Thailand sangamuang.s@gmail.com,

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

INFORMATION MANAGEMENT FOR SEMANTIC REPRESENTATION IN RANDOM FOREST

INFORMATION MANAGEMENT FOR SEMANTIC REPRESENTATION IN RANDOM FOREST International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, August 2015 www.ijcea.com ISSN 2321-3469 INFORMATION MANAGEMENT FOR SEMANTIC REPRESENTATION IN RANDOM FOREST Miss.Priyadarshani

More information

Clustering-Based Distributed Precomputation for Quality-of-Service Routing*

Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Yong Cui and Jianping Wu Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 cy@csnet1.cs.tsinghua.edu.cn,

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data Vito Claudio Ostuni, Tommaso Di Noia, Roberto Mirizzi, Eugenio Di Sciascio Polytechnic University of Bari, Italy {ostuni,mirizzi}@deemail.poliba.it,

More information

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,

More information

Supplementary Information

Supplementary Information 1 2 3 4 5 6 7 8 9 10 11 12 Supplementary Information Competition-Based Benchmarking of Influence Ranking Methods in Social Networks Alexandru Topîrceanu Contents 1 Node overlapping correlation change as

More information

An Empirical Analysis of Communities in Real-World Networks

An Empirical Analysis of Communities in Real-World Networks An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization

More information

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION Evgeny Kharitonov *, ***, Anton Slesarev *, ***, Ilya Muchnik **, ***, Fedor Romanenko ***, Dmitry Belyaev ***, Dmitry Kotlyarov *** * Moscow Institute

More information

Context based Re-ranking of Web Documents (CReWD)

Context based Re-ranking of Web Documents (CReWD) Context based Re-ranking of Web Documents (CReWD) Arijit Banerjee, Jagadish Venkatraman Graduate Students, Department of Computer Science, Stanford University arijitb@stanford.edu, jagadish@stanford.edu}

More information

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

TriRank: Review-aware Explainable Recommendation by Modeling Aspects TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia

More information

Improving Difficult Queries by Leveraging Clusters in Term Graph

Improving Difficult Queries by Leveraging Clusters in Term Graph Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging 007 International Conference on Convergence Information Technology HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging Lixin Han,, Guihai Chen Department of Computer Science and Engineering,

More information

An Improved Computation of the PageRank Algorithm 1

An Improved Computation of the PageRank Algorithm 1 An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.

More information

The link prediction problem for social networks

The link prediction problem for social networks The link prediction problem for social networks Alexandra Chouldechova STATS 319, February 1, 2011 Motivation Recommending new friends in in online social networks. Suggesting interactions between the

More information

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING Irina Bernst, Patrick Bouillon, Jörg Frochte *, Christof Kaufmann Dept. of Electrical Engineering

More information

Towards Rule Learning Approaches to Instance-based Ontology Matching

Towards Rule Learning Approaches to Instance-based Ontology Matching Towards Rule Learning Approaches to Instance-based Ontology Matching Frederik Janssen 1, Faraz Fallahi 2 Jan Noessner 3, and Heiko Paulheim 1 1 Knowledge Engineering Group, TU Darmstadt, Hochschulstrasse

More information

Meta-path based Multi-Network Collective Link Prediction

Meta-path based Multi-Network Collective Link Prediction Meta-path based Multi-Network Collective Link Prediction Jiawei Zhang Big Data and Social Computing (BDSC) Lab University of Illinois at Chicago Chicago, IL, USA jzhan9@uic.edu Philip S. Yu Big Data and

More information

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm K.Parimala, Assistant Professor, MCA Department, NMS.S.Vellaichamy Nadar College, Madurai, Dr.V.Palanisamy,

More information