Detecting Communities in K-Partite K-Uniform (Hyper)Networks


Liu X, Murata T. Detecting communities in K-partite K-uniform (hyper)networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 6(5): Sept. 0. DOI 0.007/s

Xin Liu and Tsuyoshi Murata, Member, ACM, IEEE

Department of Computer Science, Tokyo Institute of Technology, Tokyo 5-855, Japan

E-mail: tsinllew@ai.cs.titech.ac.jp; murata@cs.titech.ac.jp

Received October, 00; revised July 4, 0.

Abstract In social tagging systems such as Delicious and Flickr, users collaboratively manage tags to annotate resources. Naturally, a social tagging system can be modeled as a (user, tag, resource) hypernetwork with three types of nodes, namely users, resources and tags, in which each hyperedge has three end nodes: a user, a resource, and a tag that the user employs to annotate the resource. How, then, can we automatically cluster related users, resources and tags, respectively? This is a problem of community detection in a 3-partite 3-uniform hypernetwork. More generally, given a K-partite K-uniform (hyper)network, where each (hyper)edge is a K-tuple composed of nodes of K different types, how can we automatically detect communities for nodes of different types? In this paper, by turning this problem into a problem of finding an efficient compression of the (hyper)network's structure, we propose a quality function for measuring the goodness of partitions of a K-partite K-uniform (hyper)network into communities, and develop a fast community detection method based on optimization. Our method overcomes the limitations of state-of-the-art techniques and has several desirable properties: it is comprehensive, parameter-free, and scalable. We compare our method with existing methods on both synthetic and real-world datasets.
Keywords: community detection, bipartite graph, tripartite hypergraph, clustering, social tagging

Regular Paper. This work was supported in part by JSPS Grant-in-Aid under Grant No and IBM Ph.D. Fellowship.

0 Springer Science + Business Media, LLC & Science Press, China

1 Introduction

Networks are appropriate models for studying the structures and dynamics of real-world systems, where nodes represent the fundamental entities of a system and edges represent the relationships between entities. One crucial step in such a study is to detect communities: groups of related nodes that correspond to functional subunits of the underlying system, such as protein complexes and social spheres. Community detection in unipartite networks has been extensively investigated [-5].

Footnote: In unipartite networks, there is no restriction on edges, i.e., an edge can connect any two nodes. Unipartite networks are considered suited to modeling homogeneous systems composed of entities of a single type, e.g., a social network where nodes represent individuals and edges represent the friendship between two individuals (see Fig.1(a)).

Many heterogeneous systems can be modeled as K-partite K-uniform (hyper)networks, or simply ⟨K, K⟩-(hyper)networks, where the nodes can be divided into K disjoint sets and each (hyper)edge has K end nodes, one in each set. For example, the DBLP computer science bibliography can be modeled as an author-paper network, where edges between authors and papers represent the authoring relationship. This is a case of ⟨2, 2⟩-networks, often referred to as bipartite networks (see Fig.1(b)). In social tagging systems, users collaboratively manage tags to annotate resources. The tagging relationship involves three entities of different types: a user, a resource, and a tag that the user employs to annotate the resource. A social tagging system can be naturally modeled as a (user, tag, resource) hypernetwork. This is a case of ⟨3, 3⟩-hypernetworks (see Fig.1(c)).

Footnote: Since edges have more than two end nodes when K > 2, they are actually hyperedges. Correspondingly, the networks are hypernetworks.

Fig.1. (a) A unipartite network. (b) ⟨2, 2⟩-network. (c) ⟨3, 3⟩-hypernetwork, where the 3-way hyperedges are represented as curved lines.

As for community detection in ⟨K, K⟩-(hyper)networks, a common strategy is to reduce a ⟨K, K⟩-(hyper)network to simpler unipartite networks [6-7], ⟨2, 2⟩-networks [8], or ⟨K, 2⟩-networks [9]. One major drawback of this strategy is that some valuable information of the original (hyper)network is lost during reduction [0], so the subsequently detected communities are less accurate. Researchers have also proposed extended modularity optimization [-5] and tensor decomposition [6] methods. But these two methods are designed to detect communities of one-to-one correspondence, as shown in Fig.2(a). Real-world heterogeneous systems are often more complex than that. For example, 1) in the DBLP computer science bibliography, a research group may publish papers on data mining and also on natural language processing; 2) in social tagging systems, a group of users may be interested in resources about programming and also enjoy resources about sports news; a collection of resources about flower photos may be annotated by some users with tags like flower, beautiful, nature, and by other users with semantically different tags like canon, macro, 00 mm lens (because they are photography enthusiasts). Hence, communities of many-to-many correspondence, as shown in Fig.2(b), are practically more significant.
To the best of our knowledge, no existing method is able to detect communities of many-to-many correspondence. Besides, another disadvantage of some existing methods [9,6-0] is that they require experimenters to specify certain parameters such as the numbers of communities. In practice, such a priori knowledge is difficult to obtain.

Fig.2. (a) Communities of one-to-one correspondence. (b) Communities of many-to-many correspondence in a ⟨3, 3⟩-hypernetwork.

Rosvall and Bergstrom recently proposed an information compression method for community detection in unipartite networks []. The main insight is to convert the community detection problem into a problem of finding an efficient compression of the network's structure. In this paper, we extend their idea and propose a framework to address the problem of community detection in ⟨K, K⟩-(hyper)networks. Specifically, we propose a quality function for measuring the goodness of partitions of a ⟨K, K⟩-(hyper)network into communities, and develop a fast algorithm for optimizing the quality function. Our method overcomes the limitations of existing methods and has the following key properties.

Comprehensive: it is able to handle broad families of ⟨K, K⟩-(hyper)networks, and is competent for both communities of one-to-one correspondence and communities of many-to-many correspondence.

Parameter-free: it can automatically detect communities, without any a priori knowledge such as the numbers of communities.

Accurate: it is more sensitive than previous methods.

Scalable: it is fast and scalable to large-scale data.

The rest of the paper is organized as follows. Section 2 reviews related research. Section 3 formulates the problem of community detection in ⟨K, K⟩-(hyper)networks. We introduce our method, including the quality function and the optimization algorithm, in Section 4, with attendant notes on implementation in the Appendix. Section 5 presents experimental results, followed by a conclusion in Section 6.
Footnote: ⟨K, 2⟩-networks are networks whose nodes can be divided into K disjoint sets and in which each edge can only connect two nodes in different sets. When K = 2, ⟨K, K⟩-network and ⟨K, 2⟩-network indicate the same thing, namely the bipartite network. When K > 2, a ⟨K, K⟩-hypernetwork differs fundamentally from a ⟨K, 2⟩-network, since the building block of the former is the hyperedge connecting K end nodes, while the building block of the latter is the pairwise edge connecting two end nodes.

Footnote: A community is said to have correspondence with another community if there are dense connections between them.

Footnote: In a community detection problem, the numbers of communities are not specified by experimenters but should be found by the method itself [6,]. Thus, the classic clustering and graph partitioning approaches, which require experimenters to specify the numbers of communities, are not qualified.

2 Related Work

Here we discuss related work from four areas: unipartite network, ⟨2, 2⟩-network, ⟨K, 2⟩-network (K > 2), and ⟨K, K⟩-hypernetwork (K > 2).

2.1 Unipartite Network

The study of community detection in unipartite networks has a long history. It is closely related to graph partitioning [3] in graph theory and computer science, and hierarchical clustering [4] in sociology. In recent years, this study has attracted a great deal of interest, especially in the realm of statistical physics [-5]. Particular attention should be paid to modularity [5-6], a quality function for measuring the goodness of partitions of a unipartite network into communities. It is based on the idea that a random network is not expected to have community structure, so the possible existence of communities in a given network is revealed by comparing certain quantities in this network with those in an appropriate random network. Although modularity suffers from the resolution limit [7] attributed to the existence of multiscale community structure [8], it has been widely used. Modularity optimization [9-37] is perhaps the most popular method for community detection in unipartite networks, partly by virtue of its parameter-free nature.

2.2 ⟨2, 2⟩-Network

Due to the success of modularity optimization, some researchers extended it to ⟨2, 2⟩-networks [-3]. In their extended modularity formulas, a community seeks only one community of another node type, with dense connections between them. Hence, one major drawback of these methods is that they are biased towards communities of one-to-one correspondence, and cannot handle communities of many-to-many correspondence. In parallel, Guimerà et al. proposed another extended modularity optimization method [6], which focuses on community detection in one node set at a time. Their method essentially amounts to carrying out modularity optimization in a unipartite network reduced from the original ⟨2, 2⟩-network.
Thus, as mentioned before, it suffers from the accuracy problem. There are many studies on co-clustering objects of two types. The information-theoretic co-clustering algorithm [7] is among the first to address this problem. Follow-up work includes [8-0]. However, all of them require a priori knowledge such as the numbers of clusters.

2.3 ⟨K, 2⟩-Network (K > 2)

Community detection in ⟨K, 2⟩-networks is related to the problem of co-clustering objects of multiple types. Successful methods include consistent bipartite graph co-partitioning [38], high order co-clustering [39], spectral relational clustering [40], graph approximation [4], and collective matrix factorization [4-43]. All of these methods require experimenters to specify certain parameters.

2.4 ⟨K, K⟩-Hypernetwork (K > 2)

A common strategy for processing a ⟨K, K⟩-hypernetwork is to reduce it to simpler networks. Take a (user, tag, resource) hypernetwork as an example. Zlatić's method [7] reduces the hypernetwork to three unipartite networks, one each for users, tags and resources, based on a node similarity measure, and then uses a bottom-up optimization algorithm in each of the unipartite networks. Neubauer's method [8] reduces the hypernetwork to a user-tag ⟨2, 2⟩-network, a user-resource ⟨2, 2⟩-network and a tag-resource ⟨2, 2⟩-network to formulate an extended modularity, and then applies a modularity optimization algorithm. Lu's method [9] decomposes each 3-way (user, tag, resource) hyperedge into pairwise user-tag, user-resource, and tag-resource edges to build a user-tag-resource ⟨3, 2⟩-network, and then employs the K-means algorithm. One major drawback of this class of methods is that some valuable information of the original hypernetwork is lost during reduction [0], so the subsequently detected communities are less accurate. Murata extended modularity optimization to ⟨K, K⟩-hypernetworks [4-5].
In his extended modularity formula, a community seeks only one community of another node type, with dense connections between them. Hence, this method is biased towards communities of one-to-one correspondence. Lin et al. presented an approach for detecting communities in rich media social networks by consistent decomposition of multiple tensors [6], which also applies to the community detection problem in ⟨K, K⟩-hypernetworks. However, they only consider the same numbers of communities in different node sets (due to the restriction on dimensions of the decomposed factor matrices), implying one-to-one correspondence between communities. Another weakness is that these numbers have to be specified by experimenters in advance.

There is also research towards understanding the properties and structures of the ⟨3, 3⟩-hypernetwork model for social tagging systems. Zlatić et al. discussed hyperedge distributions, node similarity, and correlations [7]. Cattuto et al. studied path length, clustering coefficient, and node connectivity [44]. Halpin et al. focused on dynamics of tag distributions [45].

3 Problem Formulation

One fundamental issue is the definition of community in a ⟨K, K⟩-(hyper)network. Generally, a community should be a group of related nodes that correspond to a functional subunit of the underlying real-world system. In unipartite networks, a community is often understood as a group of nodes with dense connections between them. But this notion is not suitable for ⟨K, K⟩-(hyper)networks, since nodes of the same type are not connected. Instead, we consider a ⟨K, K⟩-(hyper)network community to be a group of "parallel" nodes (of the same type), i.e., nodes that are similar to one another as regards their relations to nodes of other types. In other words, parallel nodes connect to other nodes in similar ways [46-47]. This is a natural assumption, because a group of parallel nodes is very likely to form a functional subunit. For example, in a social tagging system, users with similar tagging actions are very likely to share the same interest; resources that are annotated with many common tags are very likely to be in the same category. In the following, we formulate the problem of community detection in ⟨K, K⟩-(hyper)networks.

Assume an undirected and unweighted ⟨K, K⟩-(hyper)network $H = (V^{(1)} \cup V^{(2)} \cup \cdots \cup V^{(K)}, E)$, where $V^{(k)}$ is the $k$-th node set, and

$$E \subseteq \{(v^{(1)}_{i_1}, v^{(2)}_{i_2}, \ldots, v^{(K)}_{i_K}) \mid v^{(k)}_{i_k} \in V^{(k)}\}$$

is the set of $K$-way (hyper)edges. Suppose $n^{(k)} = |V^{(k)}|$ is the number of nodes in the $k$-th node set, and $m = |E|$ is the number of (hyper)edges. The structure of $H$ can be represented by a $K$-dimensional array $A$ of size $n^{(1)} \times n^{(2)} \times \cdots \times n^{(K)}$, with elements

$$A_{i_1 i_2 \cdots i_K} = \begin{cases} 1, & \text{if } (v^{(1)}_{i_1}, v^{(2)}_{i_2}, \ldots, v^{(K)}_{i_K}) \in E; \\ 0, & \text{otherwise.} \end{cases}$$

The problem of community detection in $H$ is, given $A$, how to find a "good" partition

$$C = \{V^{(1)}_{\alpha_1}\}_{\alpha_1=1}^{c^{(1)}} \cup \{V^{(2)}_{\alpha_2}\}_{\alpha_2=1}^{c^{(2)}} \cup \cdots \cup \{V^{(K)}_{\alpha_K}\}_{\alpha_K=1}^{c^{(K)}}$$

that divides $V^{(1)}, V^{(2)}, \ldots, V^{(K)}$ into disjoint communities, respectively:

$$\bigcup_{\alpha_1=1}^{c^{(1)}} V^{(1)}_{\alpha_1} = V^{(1)}, \quad \bigcup_{\alpha_2=1}^{c^{(2)}} V^{(2)}_{\alpha_2} = V^{(2)}, \quad \ldots, \quad \bigcup_{\alpha_K=1}^{c^{(K)}} V^{(K)}_{\alpha_K} = V^{(K)}.$$
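The array representation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `build_connectivity` and `entry`, the 0-based node indices, the toy data, and the choice to store $A$ sparsely as a set of index tuples are all assumptions made here.

```python
from itertools import product


def build_connectivity(hyperedges, sizes):
    """Return A as a sparse set: A[i_1, ..., i_K] = 1 iff the tuple is in the set.

    `sizes` holds n^(1), ..., n^(K); node indices are 0-based (an assumption).
    """
    K = len(sizes)
    A = set()
    for edge in hyperedges:
        if len(edge) != K:
            raise ValueError("each (hyper)edge needs exactly one node per node set")
        for k, i in enumerate(edge):
            if not 0 <= i < sizes[k]:
                raise ValueError(f"node index {i} is out of range for node set {k}")
        A.add(tuple(edge))  # unweighted network: repeated tuples collapse
    return A


def entry(A, index):
    """Dense-style lookup of one element of A."""
    return 1 if tuple(index) in A else 0


# Hypothetical toy (user, tag, resource) hypernetwork.
sizes = (3, 2, 2)
hyperedges = [(0, 0, 0), (1, 0, 0), (2, 1, 1), (2, 1, 0)]
A = build_connectivity(hyperedges, sizes)
m = len(A)  # number of (hyper)edges

# Summing over every cell of the dense array recovers m.
dense_sum = sum(entry(A, idx) for idx in product(*(range(n) for n in sizes)))
```

A sparse set is used because the dense array has $n^{(1)} n^{(2)} \cdots n^{(K)}$ cells, most of which are zero in practice.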
Note that the numbers of communities $c^{(k)}$ ($k = 1, 2, \ldots, K$) are not known a priori. The meaning of "good" is twofold: 1) nodes in the same community are "parallel" (in the sense described in the first paragraph of Section 3; the same hereinafter); 2) (hyper)edges between communities are either dense or sparse, so that the correspondence between communities is clear. It is easy to see that these criteria of "good" apply to both communities of one-to-one correspondence and communities of many-to-many correspondence. Whether we should make a partition into the former or the latter is determined by the intrinsic structure of $H$ itself. Table 1 summarizes the notations frequently used in this paper.

Table 1. Notations for a ⟨K, K⟩-(Hyper)Network $H$

Symbol | Meaning
------ | -------
$V^{(k)}$ | The $k$-th node set
$E$ | The (hyper)edge set
$n$ | The total number of nodes
$n^{(k)}$ | The number of nodes in the $k$-th node set
$c^{(k)}$ | The number of communities in the $k$-th node set
$m$ | The total number of (hyper)edges
$v^{(k)}_{i_k}$ | The $i_k$-th node of the $k$-th node set
$V^{(k)}_{\alpha_k}$ | The $\alpha_k$-th community of the $k$-th node set
$n^{(k)}_{\alpha_k}$ | The number of nodes in the $\alpha_k$-th community of the $k$-th node set
$S^{(k)}$ | The vector whose element $S^{(k)}_{i_k}$ indicates community membership of the $i_k$-th node of the $k$-th node set
$A$ | The $K$-dimensional array whose element $A_{i_1 i_2 \cdots i_K}$ indicates the number of (hyper)edges between $v^{(1)}_{i_1}, v^{(2)}_{i_2}, \ldots, v^{(K)}_{i_K}$
$M$ | The $K$-dimensional array whose element $M_{\alpha_1 \alpha_2 \cdots \alpha_K}$ indicates the number of (hyper)edges between $V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, \ldots, V^{(K)}_{\alpha_K}$

4 The Proposed Method

In this section we present our method for solving the problem formulated in Section 3. We first define a quality function for measuring the goodness of partitions of a ⟨K, K⟩-(hyper)network into communities, and then propose an algorithm for optimizing the quality function.

4.1 Quality Function

The main insight of the information compression method proposed by Rosvall and Bergstrom [] is to convert the community detection problem into a problem of finding an efficient compression of the network's structure. Their study focuses on information compression of a unipartite network's structure. In the following, we extend their idea to show how to compress the structural information of a ⟨K, K⟩-(hyper)network, in order to formulate our quality function.

Let us envision a communication process of transmitting structural information of a ⟨K, K⟩-(hyper)network $H$. A signaler knows the structure of $H$ and aims to transmit much of the information in a reduced fashion to a receiver over a noiseless channel. To do so, the signaler makes a partition of $H$ into communities and encodes the structural information $X = \{A\}$ as compressed information summarizing the community partition:

$$Y = \{S^{(1)}, S^{(2)}, \ldots, S^{(K)}, M\},$$

where $S^{(k)}$ is the community membership vector of the $k$-th node set, and $M$ is the community connectivity array. For a partition dividing the $k$-th node set into $c^{(k)}$ communities, we have $S^{(k)} = [S^{(k)}_1, S^{(k)}_2, \ldots, S^{(k)}_{n^{(k)}}]$, where $S^{(k)}_{i_k} \in \{1, 2, \ldots, c^{(k)}\}$ indicates the community membership of node $v^{(k)}_{i_k}$. The community connectivity array $M$ is a $K$-dimensional array of size $c^{(1)} \times c^{(2)} \times \cdots \times c^{(K)}$, with element $M_{\alpha_1 \alpha_2 \cdots \alpha_K} \in \{0, 1, \ldots, m\}$ indicating the number of (hyper)edges between communities $V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, \ldots, V^{(K)}_{\alpha_K}$. That is,

$$M_{\alpha_1 \alpha_2 \cdots \alpha_K} = \sum_{v^{(1)}_{i_1} \in V^{(1)}_{\alpha_1}} \sum_{v^{(2)}_{i_2} \in V^{(2)}_{\alpha_2}} \cdots \sum_{v^{(K)}_{i_K} \in V^{(K)}_{\alpha_K}} A_{i_1 i_2 \cdots i_K}.$$

It is easy to derive that the description length (in bits) of the compressed information $Y$ is

$$L(Y) = \sum_{k=1}^{K} \left(n^{(k)} \log c^{(k)}\right) + \prod_{k=1}^{K} c^{(k)} \log(m + 1),$$

where the logarithm is taken in base 2. After receiving $Y$, the receiver knows the community membership of each node and the number of (hyper)edges between each community $K$-tuple $(V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, \ldots, V^{(K)}_{\alpha_K})$. The receiver then tries to recover the original structural information $X$ by constructing possible candidates. The number of different candidates is given by

$$\prod_{\alpha_1=1}^{c^{(1)}} \prod_{\alpha_2=1}^{c^{(2)}} \cdots \prod_{\alpha_K=1}^{c^{(K)}} \binom{n^{(1)}_{\alpha_1} n^{(2)}_{\alpha_2} \cdots n^{(K)}_{\alpha_K}}{M_{\alpha_1 \alpha_2 \cdots \alpha_K}} \qquad (1)$$

where $n^{(k)}_{\alpha_k} = |V^{(k)}_{\alpha_k}|$ is the number of nodes in community $V^{(k)}_{\alpha_k}$, the parentheses in (1) denote the binomial coefficient, and each binomial coefficient gives the number of different candidates for recovering the original $M_{\alpha_1 \alpha_2 \cdots \alpha_K}$ (hyper)edges between $V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, \ldots, V^{(K)}_{\alpha_K}$.
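As a worked illustration of $M$, $L(Y)$, and the candidate count in (1), the following Python sketch evaluates them for a small bipartite example. It is a sketch under stated assumptions, not the authors' code: the function names, the 0-based community labels, and the toy network are hypothetical.

```python
import math
from collections import Counter


def community_connectivity(edges, memberships):
    """M[alpha_1, ..., alpha_K]: number of (hyper)edges between each community K-tuple."""
    return Counter(tuple(S[i] for S, i in zip(memberships, edge)) for edge in edges)


def length_of_Y(memberships, m):
    """L(Y) = sum_k n^(k) log2 c^(k) + (prod_k c^(k)) log2(m + 1), in bits."""
    counts = [len(set(S)) for S in memberships]  # c^(k) for each node set
    node_bits = sum(len(S) * math.log2(c) for S, c in zip(memberships, counts))
    return node_bits + math.prod(counts) * math.log2(m + 1)


def number_of_candidates(edges, memberships):
    """Product of the binomial coefficients in (1)."""
    sizes = [Counter(S) for S in memberships]  # n^(k)_alpha for each community
    M = community_connectivity(edges, memberships)
    total = 1
    for alphas, count in M.items():  # K-tuples with M = 0 contribute a factor of 1
        cells = math.prod(sizes[k][a] for k, a in enumerate(alphas))
        total *= math.comb(cells, count)
    return total


# Hypothetical <2, 2>-network: two complete 2-by-2 blocks, 8 edges in total.
edges = [(i, j) for i in (0, 1) for j in (0, 1)] + [(i, j) for i in (2, 3) for j in (2, 3)]
m = len(edges)
split = ([0, 0, 1, 1], [0, 0, 1, 1])   # two communities per node set
merged = ([0, 0, 0, 0], [0, 0, 0, 0])  # one community per node set

bits_split = length_of_Y(split, m)
candidates_split = number_of_candidates(edges, split)
candidates_merged = number_of_candidates(edges, merged)
```

With the `split` partition every block is either completely filled or empty, so only one candidate remains and the receiver needs no additional information; with the `merged` partition the receiver must choose among all ways of placing 8 edges into 16 cells.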
Hence, the description length (in bits) of the additional information for the receiver to recover $X$ (i.e., the conditional information between $X$ and $Y$) is

$$L(X|Y) = \log\left[\prod_{\alpha_1=1}^{c^{(1)}} \prod_{\alpha_2=1}^{c^{(2)}} \cdots \prod_{\alpha_K=1}^{c^{(K)}} \binom{n^{(1)}_{\alpha_1} n^{(2)}_{\alpha_2} \cdots n^{(K)}_{\alpha_K}}{M_{\alpha_1 \alpha_2 \cdots \alpha_K}}\right].$$

The objective is that the signaler transmits the least while the receiver receives the most. This is apparently a dilemma. If the signaler makes a partition of $V^{(k)}$ into $n^{(k)}$ communities ($k = 1, 2, \ldots, K$), meaning one community for each node, there would be no compression in $Y$. Thus, the receiver can recover $X$ completely, without any additional information ($L(X|Y) = 0$), while the signaler has to transmit the most ($L(Y)$ is the largest). Conversely, if the signaler makes a partition of $V^{(k)}$ into only one community ($k = 1, 2, \ldots, K$), $Y$ would be in the most compressed form. Thus, the signaler transmits the least information ($L(Y)$ is the smallest), while the receiver needs the most additional information to recover $X$ ($L(X|Y)$ is the largest). If the signaler makes a "good" partition (as described in Section 3; the same hereinafter), he can highlight certain regularities (e.g., the similarities of nodes in the same community) and filter out relatively unimportant details (e.g., the dissimilarities of nodes in the same community). Intuitively, the compression based on it would achieve the optimal trade-off between $L(Y)$ and $L(X|Y)$ (see Fig.3 for illustration).

Fig.3. Information compression on a ⟨3, 3⟩-hypernetwork.

According to the minimum description length (MDL) principle [48],

$$Q_H(C) = L(Y) + L(X|Y) = \sum_{k=1}^{K} \left(n^{(k)} \log c^{(k)}\right) + \prod_{k=1}^{K} c^{(k)} \log(m + 1) + \log\left[\prod_{\alpha_1=1}^{c^{(1)}} \prod_{\alpha_2=1}^{c^{(2)}} \cdots \prod_{\alpha_K=1}^{c^{(K)}} \binom{n^{(1)}_{\alpha_1} n^{(2)}_{\alpha_2} \cdots n^{(K)}_{\alpha_K}}{M_{\alpha_1 \alpha_2 \cdots \alpha_K}}\right]$$

would then attain its minimum value. This is the quality function for $C$. Clearly, the lower the value of $Q_H$, the better the partition $C$. It should be emphasized that $Q_H$ highly values a "good" partition, which applies to both communities of one-to-one correspondence and communities of many-to-many correspondence.

4.2 Algorithm for Minimizing Quality Function

Now we can evaluate a partition based on the quality function $Q_H$, where a low value of $Q_H$ indicates a good partition. The task is then to search over all possible partitions for one that has a minimum $Q_H$. However, as with modularity optimization [9-37], finding the globally optimal solution is NP-hard [49]. Thus an approximate algorithm is required. We develop an algorithm modified from the label propagation algorithm (LPA) [37,50-5]. LPA is fast for minimizing $Q_H$ (it was originally designed for maximizing the modularity [5]), but it is prone to getting stuck in a poor local minimum [37]. Our algorithm achieves a proper balance between accuracy and speed. It can be divided into two iterative phases. In Phase 1, we run LPA in the original (hyper)network $H$, to reach a local minimum of $Q_H$. In Phase 2, we run LPA in a reduced (hyper)network $H'$, to escape the local minimum previously reached. Specific procedures and detailed explanations of our algorithm are as follows.

Initially we assign each node a unique label, indicating its community membership.
Therefore, there are as many initial communities as there are nodes. Then we enter Phase 1 (running LPA in $H$).

(a) In each step, in a random sequential order, update each node's label to one of the existing labels (the existing labels in the node set of the considered node) that generates the greatest decrease of $Q_H$; if no new label generates a decrease of $Q_H$, keep the node's label unchanged (the label updating rule).

(b) Repeat (a), each step in a new random sequential order, until no decrease of $Q_H$ can be achieved.

(c) Identify communities as nodes bearing the same labels.

Algorithm 1. Detecting Communities in a ⟨K, K⟩-(Hyper)Network $H$ by Minimizing $Q_H$
Input: Connectivity array $A$ of $H$
Output: A community partition $C$
Begin
    Assign each node in $H$ a unique label;
    repeat
        // Phase 1
        repeat
            Update each node's label in $H$;
        until a local minimum of $Q_H$
        // Phase 2
        Build a reduced ⟨K, K⟩-(hyper)network $H'$;
        Assign each node in $H'$ a unique label;
        repeat
            Update each node's label in $H'$;
        until a local minimum of $Q_H$
        Retrieve node labels in $H$ from the corresponding labels in $H'$;
    until no change in $Q_H$
    Identify communities as groups of nodes bearing the same labels;
End

In the above procedures, note that 1) according to the label updating rule, $Q_H$ never increases; 2) some initial labels vanish while others become popular. Thus, Phase 1 in effect seeks to minimize $Q_H$ by combining communities. At the initial stage, when each community is composed of one node, communities are easily combined. Later, as communities grow larger, combining them becomes harder: combining community $V^{(k)}_{\alpha}$ with community $V^{(k)}_{\beta}$ means that the labels of all nodes in $V^{(k)}_{\alpha}$ are updated to the label of $V^{(k)}_{\beta}$, but it is difficult to reach this consensus through the updating rule of each individual node. Consequently, Phase 1 eventually stops and converges to a local minimum of $Q_H$.

In Phase 2 we try to escape the local minimum.
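The Phase 1 label updating rule, steps (a) to (c) above, can be sketched as follows. This is a deliberately naive sketch, not the authors' implementation: it recomputes $Q_H$ from scratch for every trial label, whereas the paper defers efficient evaluation to its Appendix. The names `quality` and `phase_one`, the toy hypernetwork, and the tolerance `1e-12` are assumptions made here.

```python
import math
import random
from collections import Counter
from itertools import product


def quality(edges, labels):
    """Q_H = L(Y) + L(X|Y) in bits; `labels` holds one label list per node set."""
    m = len(edges)
    sizes = [Counter(S) for S in labels]  # community sizes per node set
    bits = sum(len(S) * math.log2(len(sz)) for S, sz in zip(labels, sizes))
    bits += math.prod(len(sz) for sz in sizes) * math.log2(m + 1)
    M = Counter(tuple(S[i] for S, i in zip(labels, e)) for e in edges)
    for alphas, count in M.items():  # empty community tuples add log2(1) = 0
        cells = math.prod(sizes[k][a] for k, a in enumerate(alphas))
        bits += math.log2(math.comb(cells, count))
    return bits


def phase_one(edges, sizes, rng):
    """Run the label updating rule until no single update lowers Q_H."""
    labels = [list(range(n)) for n in sizes]  # every node starts with a unique label
    best = quality(edges, labels)
    improved = True
    while improved:
        improved = False
        order = [(k, i) for k, n in enumerate(sizes) for i in range(n)]
        rng.shuffle(order)  # a new random sequential order in each step
        for k, i in order:
            keep = labels[k][i]
            choice = keep
            for candidate in set(labels[k]) - {keep}:  # labels of the same node set
                labels[k][i] = candidate
                q = quality(edges, labels)
                if q < best - 1e-12:  # accept strict decreases only
                    best, choice = q, candidate
            labels[k][i] = choice
            improved = improved or choice != keep
    return labels, best


# Hypothetical <3, 3>-hypernetwork with two planted blocks of distinct hyperedges.
sizes = (4, 4, 4)
edges = list(product((0, 1), repeat=3)) + list(product((2, 3), repeat=3))
q_start = quality(edges, [list(range(n)) for n in sizes])
labels, q_end = phase_one(edges, sizes, random.Random(0))
```

Because a label is changed only when $Q_H$ strictly decreases, the loop terminates and `q_end` can never exceed `q_start`; the local minimum it reaches may still depend on the random order.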
Following the method presented in [35], we build a reduced (hyper)network $H'$ whose nodes are now the communities identified in Phase 1. Then, we assign each node in $H'$ a unique label, and sequentially and repeatedly update their labels (running LPA in $H'$). Note that the nodes are now the communities identified in Phase 1. By updating a node's label to another, we actually combine these two communities and escape the local minimum. After Phase 2 finishes, we come back to the original (hyper)network $H$ and enter Phase 1 again. That is, we first retrieve each node's label from the corresponding label in $H'$, and then run LPA to reach another local minimum. With these two phases repeated iteratively, a fairly good solution can finally be obtained.

Footnote: In order to calculate $Q_H$ in $H'$, we introduce the node weight (equal to the number of nodes in the corresponding communities in $H$) and the (hyper)edge weight (equal to the number of (hyper)edges between the corresponding communities in $H$) [35,5].

Footnote: For a given node $v_x$ in $H$, there is a node $v'_x$ in $H'$ corresponding to the community of $v_x$, so we retrieve the label of $v_x$ from the label of $v'_x$.

Footnote: In Phase 2, the labels of all nodes in a community are forced to be updated jointly. By running LPA in $H$, we can update the individual labels separately and further refine the solution.

It should be noted that the above algorithm, though described as minimizing $Q_H$, can be used to optimize other quality functions (for maximization, just make some minor modifications, e.g., replace "decrease" with "increase" and "minimum" with "maximum"). In the Appendix, we discuss implementation issues and show how this algorithm can be sped up to run in near linear time.

5 Experiments

In this section we present experiments for evaluating the performance of our method in a ⟨2, 2⟩-network and in ⟨3, 3⟩-hypernetworks.

5.1 Southern Women ⟨2, 2⟩-Network

First, we use the famous Southern women ⟨2, 2⟩-network as a touchstone. The original data underlying this network were collected by Davis et al. during the 1930s as part of an extensive study of class and race in black and white society in the Deep South [53]. The data describe the participation of 18 women in 14 social events. From them we can derive a ⟨2, 2⟩-network whose nodes represent women and social events, and whose edges represent the participation of the women in the events. These data and the corresponding ⟨2, 2⟩-network have been intensively studied by social scientists. 1) Based on ethnographic knowledge, Davis made a partition of the 18 women into two groups: women 1-9 in the first group and women 9-18 in the second (woman 9 is a secondary member of both groups) [53]. 2) Freeman reviewed different studies and identified a consensus partition of the 18 women into two groups: women 1-9 in the first group and women 10-18 in the second [54]. Taken together, we consider the partition that divides the women into two communities {1-9} and {10-18} as the ground truth (since it is in agreement with the findings of both Davis and Freeman).

Fig.4. Partitions of the Southern women ⟨2, 2⟩-network obtained by (a) our method, (b) the extended modularity optimization advanced by Guimerà [6], (c) the extended modularity optimization proposed by Murata [], (d) the extended modularity optimization presented by Barber [], and (e) the extended modularity optimization brought forward by Suzuki [3]. Women are indicated as circle symbols located at the top side, while events are indicated as square symbols located at the bottom side. Nodes in the same community are painted in the same color.

Apart from our method, existing approaches for community detection in ⟨2, 2⟩-networks include the extended modularity optimization methods [6,-3] and the co-clustering algorithms [7-0]. Here we only consider the extended modularity optimization methods, since the co-clustering algorithms require a priori knowledge such as the numbers of clusters and are not qualified in a strict sense [6,].

Fig.4 shows the partitions obtained by different methods. (The partitions obtained by the extended modularity optimization methods are reported in [6,, 3].) It is satisfying to find that only our method's partition for women (the top side nodes in Fig.4(a)) is consistent with the ground truth proposed by the social scientists. Our partition of events into three communities is also reasonable, as it conforms to the criteria of "good": 1) nodes within the same event community are "parallel"; 2) the correspondence between communities is clear: event communities {1-6} and {10-14} correspond with woman communities {1-9} and {10-18}, respectively, while event community {7-9} corresponds with both woman communities. In addition, our method detects communities of many-to-many correspondence, while the others detect communities of one-to-one correspondence. (Although there is many-to-many correspondence between the communities described in Fig.4(e), this partition seems somewhat strange, since several communities consist of only one node.)

5.2 Synthetic ⟨3, 3⟩-Hypernetwork

Now, let us concentrate on comparing our method with existing methods through a standard benchmark test [,8,4-5,5] in synthetic ⟨3, 3⟩-hypernetworks. The basic scheme is as follows. 1) We generate a set of random ⟨3, 3⟩-hypernetworks with known community structure (the true partition). 2) Applying various community detection methods to these hypernetworks (the true partition is hidden at this time), we compare the similarities between the partitions obtained by these methods and the true partition. 3) The more similar an obtained partition is to the true partition, the better the corresponding method.
To quantify the similarity between two partitions, we adopt the widely used normalized mutual information (NMI) []. NMI is an information theoretic measure that calculates the amount of common information between two partitions. Specifically, NMI(P, P ) c P cp α= β= = N αβ log(n αβ N/N α N β ) cp α= N α log(n α /N) + c P β= N β log(n β /N) where P and P are two partitions on the same set of nodes, c P is number of communities in P, c P is number of communities in P, N is the total number of nodes, N α is the number of nodes in the α-th community of P, N β is the number of nodes in the β-th community of P, and N αβ is the number of nodes that are both in the α-th community of P and the β-th community of P. If P and P match completely, we have a maximum NMI value of, whereas if P and P are totally independent of one another, we have a minimum value of 0. For comparison, we consider several methods that cover state-of-the-art techniques. They are, in order, the extended modularity optimization method (ExModularity) [4], the tensor decomposition method (MetaFac) [6], and the method that involves reduction of a 3, 3 -hypernetwork to, -networks (BiReduction) [8] (a brief description of these three methods is included in Subsection.4). In addition, we consider another method (UniReduction) modified from Zlatić s approach [7], which involves reduction of a 3, 3 -hypernetwork to unipartite networks. Detailed description of UniReduction is as follows. To make it clear, we color the three node sets V (), V (), and V (3) in a 3, 3 -hypernetwork red, green, and blue, respectively. Suppose we are to detect red communities. We first reduce the original hypernetwork to a weighted unipartite network of red nodes, and then employ modularity optimization in this unipartite network. The edge weight w xy () between v x () and v y () in the unipartite network is equal to their similarity in the original hypernetwork as measured by Jaccard index [55]. 
In formula form,

$$w^{(1)}_{xy} = \frac{|\Gamma(v^{(1)}_x) \cap \Gamma(v^{(1)}_y)|}{|\Gamma(v^{(1)}_x) \cup \Gamma(v^{(1)}_y)|} = \frac{\sum_{i_2=1}^{n^{(2)}} \sum_{i_3=1}^{n^{(3)}} A_{x i_2 i_3} A_{y i_2 i_3}}{\sum_{i_2=1}^{n^{(2)}} \sum_{i_3=1}^{n^{(3)}} \left(A_{x i_2 i_3} + A_{y i_2 i_3} - A_{x i_2 i_3} A_{y i_2 i_3}\right)},$$

where $\Gamma(v^{(1)}_x)$ and $\Gamma(v^{(1)}_y)$ denote the neighbor sets of $v^{(1)}_x$ and $v^{(1)}_y$. The justification for this method is that the more similar two nodes are to each other, the larger the edge weight between them, and thus the more likely they are to be grouped into the same community by modularity optimization.

(The major difference between Zlatić's approach and UniReduction lies in the similarity measure used to calculate the edge weight. In Zlatić's approach, the similarity measure only considers neighbors in one node set, i.e., either the green or the blue neighbors of $v^{(1)}_x$ and $v^{(1)}_y$, so some information in the original hyperedges is lost. In UniReduction, the similarity measure considers both green and blue neighbors.)

Note that ExModularity, UniReduction, and BiReduction all rely on heuristics to optimize their respective quality functions. The original ExModularity and BiReduction use the greedy bottom-up algorithm [29]. To be fair, we consistently use Algorithm […] (described in Subsection 4.[…], and the same hereinafter) as the optimization algorithm for our method, ExModularity, UniReduction, and BiReduction. (In fact, Algorithm […] has been shown empirically to be superior to the greedy bottom-up algorithm in terms of the optimization result.)

5.2.1 Communities of One-to-One Correspondence

For the first case, we compare these methods on a set of synthetic ⟨3, 3⟩-hypernetworks with built-in communities of one-to-one correspondence. The hypernetwork generation procedure is as follows.

1) Set the numbers of red, green and blue nodes as $n^{(1)} = n^{(2)} = n^{(3)} = 160$. Set the numbers of red, green and blue communities as $c^{(1)} = c^{(2)} = c^{(3)} = 8$. Equally divide the red, green and blue nodes into $c^{(1)}$, $c^{(2)}$ and $c^{(3)}$ communities. That is, $n^{(1)}_{\alpha_1} = n^{(2)}_{\alpha_2} = n^{(3)}_{\alpha_3} = 20$ ($\alpha_1 = 1, \ldots, c^{(1)}$; $\alpha_2 = 1, \ldots, c^{(2)}$; $\alpha_3 = 1, \ldots, c^{(3)}$).

2) For each community triple $(V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, V^{(3)}_{\alpha_3})$, first set the hyperedge density as

$$p_{\alpha_1 \alpha_2 \alpha_3} = \begin{cases} p_{\text{dense}} \pm \text{rand}, & \text{if } \alpha_1 = \alpha_2 = \alpha_3; \\ \text{rand}, & \text{otherwise.} \end{cases} \qquad (2)$$

Here rand is a random number from 0.0 to […].0, which acts as a noise factor. Then, with probability $p_{\alpha_1 \alpha_2 \alpha_3}$, randomly place hyperedges connecting nodes $v^{(1)}_{i_1} \in V^{(1)}_{\alpha_1}$, $v^{(2)}_{i_2} \in V^{(2)}_{\alpha_2}$ and $v^{(3)}_{i_3} \in V^{(3)}_{\alpha_3}$.

From (2), we can see that dense hyperedges are placed between the $\alpha$-th red, $\alpha$-th green and $\alpha$-th blue communities ($\alpha = 1, \ldots, 8$), constituting their one-to-one correspondence. By gradually decreasing $p_{\text{dense}}$ from 0.08 to 0.00, we generate different hypernetworks whose community structure becomes weaker and weaker, and thus pose greater and greater challenges to the different methods. Fig.5 shows the performances of the various methods on this set of hypernetworks (values are averaged over […]0 runs).
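The generation procedure above can be sketched in a few lines. This is a minimal illustration, not the authors' generator: the upper bound of the noise term (`noise_max`), the random choice of sign for the "±", the clipping of densities to [0, 1], and all function and parameter names are assumptions made for this example. One noise value is drawn per community triple, and each candidate hyperedge is then placed independently with the resulting probability.

```python
import itertools
import random


def generate_one_to_one(n_per_comm=20, n_comm=8, p_dense=0.08,
                        noise_max=0.001, seed=0):
    """Sketch of a one-to-one benchmark generator for <3,3>-hypernetworks.

    noise_max (upper bound of `rand`) is a hypothetical value.
    Returns (hyperedges, labels); labels[i] is the community of node i, and
    the same labelling is used for the red, green and blue node sets.
    """
    rng = random.Random(seed)
    n = n_per_comm * n_comm
    labels = [i // n_per_comm for i in range(n)]
    members = [[i for i in range(n) if labels[i] == a] for a in range(n_comm)]

    hyperedges = []
    for a1, a2, a3 in itertools.product(range(n_comm), repeat=3):
        noise = rng.uniform(0.0, noise_max)
        if a1 == a2 == a3:
            # Corresponding communities: dense, perturbed by the noise term.
            p = p_dense + rng.choice((-1, 1)) * noise
        else:
            # Non-corresponding communities: background noise only.
            p = noise
        p = min(max(p, 0.0), 1.0)
        for triple in itertools.product(members[a1], members[a2], members[a3]):
            if rng.random() < p:
                hyperedges.append(triple)
    return hyperedges, labels


# Small noiseless instance: only the 3 corresponding triples receive hyperedges.
edges, labels = generate_one_to_one(n_per_comm=4, n_comm=3,
                                    p_dense=1.0, noise_max=0.0)
```

With the default sizes (160 nodes per set) the loop visits about four million candidate hyperedges, so a vectorized implementation would be preferable for repeated runs.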
On the whole, the performance of each method varies in a similar way across the red, green and blue node sets (since red, green and blue nodes play symmetric roles in the hypernetwork generation procedure). Specifically, our method, ExModularity, and BiReduction perform excellently, correctly detecting not only the numbers of communities but also the community membership of each node almost all the way to the point $p_{\text{dense}}$ = […]. At the turning stage, i.e., when $p_{\text{dense}}$ falls from […] to 0.03, our method slightly outperforms ExModularity and BiReduction, as shown in the embedded figures. MetaFac, though given a priori knowledge of the true numbers of communities (the numbers of red/green/blue communities are set to 8), does not provide remarkable results: it cannot detect every node's community membership, even at $p_{\text{dense}}$ = […]. UniReduction fares even worse; its performance begins to deteriorate as early as $p_{\text{dense}}$ = […].

All of these methods miss the true community structure when $p_{\text{dense}}$ < 0.05. To see why, we compare the partitions obtained by our method, ExModularity, UniReduction and BiReduction with the true partition. We are surprised to find that the obtained partitions are consistently much better than the true partition as measured by their respective quality functions (objectively, the true partition should have been better than any other partition). That is, these quality functions fail to tell right from wrong and become invalid. Consequently, all of the methods fail, since one can never expect an optimization algorithm to find the true partition under an invalid quality function. The only difference is that, under different quality functions, our method favors a partition into only one community (in each node set), while ExModularity and BiReduction prefer a partition into over 40 communities (in each node set). According to information theory, a partition into one community is totally independent of the true partition, leading to an NMI value of 0 (this is why our method's value dives towards 0). On the other hand, a randomly generated partition into 40 communities is expected to have an NMI value of 0.30±0.000 (based on tests), which is comparable to the values achieved by ExModularity and BiReduction (this is why these two methods can still have nice-looking records even when no community structure exists, i.e., at $p_{\text{dense}}$ = 0.00). Therefore, this is just deceptive information given by NMI, and ExModularity and BiReduction are actually not superior to our method when $p_{\text{dense}}$ < […].

Fig.5. Performances of various methods in the ⟨3, 3⟩-hypernetworks with built-in communities of one-to-one correspondence. (a) Red node set. (b) Green node set. (c) Blue node set.

(For different $p_{\text{dense}}$, the number of hyperedges generally ranges from […] to […].)

5.2.2 Communities of Many-to-Many Correspondence

For the second case, we generate a set of random ⟨3, 3⟩-hypernetworks with built-in communities of many-to-many correspondence. The procedure is as follows.

1) Set the numbers of red, green and blue nodes as $n^{(1)} = 120$, $n^{(2)} = 160$, $n^{(3)} = 200$. Set the numbers of red, green and blue communities as $c^{(1)} = 6$, $c^{(2)} = 8$, $c^{(3)} = 10$. Equally divide the red, green and blue nodes into $c^{(1)}$, $c^{(2)}$ and $c^{(3)}$ communities. That is, $n^{(1)}_{\alpha_1} = n^{(2)}_{\alpha_2} = n^{(3)}_{\alpha_3} = 20$ ($\alpha_1 = 1, \ldots, c^{(1)}$; $\alpha_2 = 1, \ldots, c^{(2)}$; $\alpha_3 = 1, \ldots, c^{(3)}$).

2) For each community triple $(V^{(1)}_{\alpha_1}, V^{(2)}_{\alpha_2}, V^{(3)}_{\alpha_3})$, first set the hyperedge density as

$$p_{\alpha_1 \alpha_2 \alpha_3} = \begin{cases} p_{\text{dense}} \pm \text{rand}, & \text{with prob. } 0.1; \\ \text{rand}, & \text{with prob. } 0.9. \end{cases} \qquad (3)$$

Then, with probability $p_{\alpha_1 \alpha_2 \alpha_3}$, randomly place hyperedges connecting nodes $v^{(1)}_{i_1} \in V^{(1)}_{\alpha_1}$, $v^{(2)}_{i_2} \in V^{(2)}_{\alpha_2}$ and $v^{(3)}_{i_3} \in V^{(3)}_{\alpha_3}$.
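The density assignment in (3) can be illustrated with a short sketch. It assumes the dense branch is taken with probability 0.1, which matches the figure of about 48 dense triples out of 480 quoted for this procedure; the noise bound `noise_max`, the random sign for the "±", and all names are hypothetical choices for this example.

```python
import itertools
import random


def assign_densities(c_red=6, c_green=8, c_blue=10, p_dense=0.08,
                     prob_dense=0.1, noise_max=0.001, seed=0):
    """Draw a hyperedge density for every community triple, following (3)."""
    rng = random.Random(seed)
    density, dense_triples = {}, []
    for triple in itertools.product(range(c_red), range(c_green), range(c_blue)):
        noise = rng.uniform(0.0, noise_max)
        if rng.random() < prob_dense:
            # Dense branch: this triple of communities corresponds.
            p = p_dense + rng.choice((-1, 1)) * noise
            density[triple] = min(max(p, 0.0), 1.0)
            dense_triples.append(triple)
        else:
            # Sparse branch: background noise only.
            density[triple] = noise
    return density, dense_triples


density, dense_triples = assign_densities()
n_triples = len(density)            # 6 * 8 * 10 = 480 community triples
expected_dense = 0.1 * n_triples    # 48 dense triples on average
# Number of dense triples each red community takes part in.
partners = {a: sum(1 for t in dense_triples if t[0] == a) for a in range(6)}
```

Because the expected number of dense triples far exceeds the number of communities in any single node set, most communities end up densely connected to several communities of the other colors, which is what produces the many-to-many correspondence.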
From (3), we can see that around 48 out of the $c^{(1)} c^{(2)} c^{(3)} = 480$ community triples are randomly selected and placed with dense hyperedges. Since 48 is much larger than $c^{(1)}$, $c^{(2)}$ and $c^{(3)}$, there must be many-to-many correspondence between communities. As before, we gradually decrease $p_{\text{dense}}$ from 0.08 to 0.00 and generate a set of hypernetworks whose community structure becomes more and more difficult to detect. (For different $p_{\text{dense}}$, the number of hyperedges generally ranges from […] to […].) The performances of the various methods on this set of hypernetworks are shown in Fig.6 (values are averaged over […]0 runs).

As Fig.6 shows, our method outperforms the others by a large margin. It works almost perfectly all the way until $p_{\text{dense}}$ = 0.05, with a sudden degradation thereafter. (The reason that our method's NMI value is well below the others when $p_{\text{dense}}$ < 0.05 is the same as discussed in the test for communities of one-to-one correspondence.) For the other methods, we observe two common features. 1) None of them can detect community memberships with 100% accuracy, even when $p_{\text{dense}}$ = […]. 2) Their performances deteriorate much earlier than that of our method, often with records fluctuating wildly before the turning points. Therefore, as expected, these methods cannot handle communities of many-to-many correspondence. Specifically, UniReduction is the second best most of the time, followed by MetaFac. Note that MetaFac is given at least an estimate of the true numbers of communities (the numbers of red/green/blue communities are set to 8), so its performance is not appealing. In contrast to their excellent performances on the previous set of hypernetworks, BiReduction and ExModularity do not show satisfactory results this time.

Remark. One may wonder why the performance of our method in this test is better than that in the previous test for communities of one-to-one correspondence. This is because more community triples are placed with dense connections this time (recall that around 48 community triples are placed with dense hyperedges in the generation procedure for communities of many-to-many correspondence, while only 8 community triples are placed with dense hyperedges in the generation procedure for communities of one-to-one correspondence). Thus, our method can capitalize on more information, resulting in better performance.

Fig.6. Performances of various methods in the ⟨3, 3⟩-hypernetworks with built-in communities of many-to-many correspondence. (a) Red node set. (b) Green node set. (c) Blue node set.

These two tests show that our method is competent at detecting both communities of one-to-one correspondence and communities of many-to-many correspondence. In the former case, our method slightly outperforms the state-of-the-art techniques. In the latter case, which is more significant in practice, our method outperforms the others by a large margin. Whether a partition falls into the former case or the latter case is determined by the intrinsic structure of the (hyper)network itself.

6 Conclusion

Based on the information compression idea [22], we define a quality function for measuring the goodness of partitions of a ⟨K, K⟩-(hyper)network into communities, and develop an algorithm for optimizing this quality function. Our method provides a framework for solving the community detection problem in ⟨K, K⟩-(hyper)networks. Compared with existing methods, our method is competent for both communities of one-to-one correspondence and communities of many-to-many correspondence. It should be emphasized that our method is automatic and independent of any a priori knowledge such as the numbers of communities. By carefully designing the algorithm (see Appendix), our method also has the desired property of scalability. The framework proposed in this paper is the first step for analyzing complex heterogeneous systems.
A real-world heterogeneous system often contains several different kinds of relationships: some are formed between entities of the same type, and some between entities of different types. Generally, the relationship between entities of the same type can be modeled as a unipartite network, while the relationship between entities of different types can be modeled as a ⟨K, K⟩-(hyper)network. Thus, such a heterogeneous system can be modeled as multiple related (hyper)networks, one for each kind of relationship. Community detection from multiple related (hyper)networks is left for our future work.

References

[1] Fortunato S. Community detection in graphs. Physics Reports, 2010, 486.
[2] Danon L, Duch L, Guilera A D, Arenas A. Comparing community structure identification. J. Stat. Mech., 2005, 9: P09008.
[3] Lancichinetti A, Fortunato S. Community detection algorithms: A comparative analysis. Phys. Rev. E, 2009, 80(5).
[4] Leskovec J, Lang K J, Mahoney M W. Empirical comparison of algorithms for network community detection. In Proc. the 19th International Conference on World Wide Web, Raleigh, USA, Apr. 26-30, 2010.
[5] Shen H, Cheng X. Spectral methods for the detection of network community structure: A comparative analysis. J. Stat. Mech., 2010, 10: P10020.
[6] Guimerà R, Pardo M S, Amaral L A N. Module identification in bipartite and directed networks. Phys. Rev. E, 2007, 76(3).
[7] Zlatić V, Ghoshal G, Caldarelli G. Hypergraph topological quantities for tagged social networks. Phys. Rev. E, 2009, 80(3).
[8] Neubauer N, Obermayer K. Towards community detection in k-partite k-uniform hypergraphs. In Workshop on Analyzing Networks and Learning with Graphs, Whistler, BC, Canada, Dec. 2009.
[9] Lu C, Chen X, Park E K. Exploit the tripartite network of social tagging for web clustering. In Proc. the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, Nov. 2-6, 2009.
[10] Zhou T, Ren J, Medo M, Zhang Y C. Bipartite network projection and personal recommendation. Phys. Rev. E, 2007, 76(4).
[11] Barber M J. Modularity and community detection in bipartite network. Phys. Rev. E, 2007, 76(6).
[12] Murata T, Ikeya T. A new modularity for detecting one-to-many correspondence of communities in bipartite networks. Advances in Complex Systems, 2010, 13(1): 19-31.
[13] Suzuki K, Wakita K. Extracting multi-facet community structure from bipartite networks. In Proc. International Conference on Computational Science and Engineering, Vancouver, BC, Canada, Aug. 29-31, 2009.
[14] Murata T. Detecting communities from tripartite networks. In Proc. the 19th International Conference on World Wide Web, Raleigh, USA, Apr. 26-30, 2010.
[15] Murata T. Modularity for heterogeneous networks. In Proc. the 21st ACM Conference on Hypertext and Hypermedia, Toronto, Canada, Jun. 13-16, 2010.
[16] Lin Y R, Sun J, Castro P, Konuru R, Sundaram H, Kelliher A. Metafac: Community discovery via relational hypergraph factorization. In Proc. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, Jun. 28-Jul. 1, 2009.
[17] Dhillon I S, Mallela S, Modha D S. Information-theoretic co-clustering. In Proc. the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, USA, Aug. 24-27, 2003.
[18] Li T. A general model for clustering binary data. In Proc. the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, USA, Aug. 21-24, 2005.
[19] Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha D S. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. Journal of Machine Learning Research, 2007, 8.

[20] Long B, Zhang Z, Yu P S. A probabilistic framework for relational clustering. In Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, USA, Aug. 12-15, 2007.
[21] Newman M E J. Networks: An Introduction. New York: Oxford University Press, 2010.
[22] Rosvall M, Bergstrom C T. An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. USA, 2007, 104(18).
[23] Kernighan B, Lin S. An efficient heuristic procedure to partition graphs. Bell Syst. Tech. J., 1970, 49(2).
[24] Scott J. Social Network Analysis: A Handbook. Second Edition, Sage Publications, Newberry Park, CA, 2000.
[25] Newman M E J, Girvan M. Finding and evaluating community structure in networks. Phys. Rev. E, 2004, 69(2): 026113.
[26] Newman M E J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA, 2006, 103(23).
[27] Fortunato S, Barthélemy M. Resolution limit in community detection. Proc. Natl. Acad. Sci. USA, 2007, 104(1).
[28] Shen H, Cheng X, Fang B. Covariance, correlation matrix, and the multiscale community structure of networks. Phys. Rev. E, 2010, 82(1): 016114.
[29] Newman M E J. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 2004, 69(6).
[30] Clauset A, Newman M E J, Moore C. Finding community structure in very large networks. Phys. Rev. E, 2004, 70(6): 066111.
[31] Duch L, Arenas A. Community detection in complex networks using extremal optimization. Phys. Rev. E, 2005, 72(2).
[32] Medus A, Acuna G, Dorso C O. Detection of community structures in networks via global optimization. Physica A, 2005, 358(2-4).
[33] Newman M E J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 2006, 74(3).
[34] Schuetz P, Caflisch A. Efficient modularity optimization by multistep greedy algorithm and vertex refinement. Phys. Rev. E, 2008, 77(4): 046112.
[35] Blondel V D, Guillaume J L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J. Stat. Mech., 2008, 10: P10008.
[36] Zhang X S, Wang R S, Wang Y, Wang J, Qiu Y, Wang L, Chen L. Modularity optimization in community detection of complex networks. Europhys. Lett., 2009, 87(3).
[37] Liu X, Murata T. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A, 2010, 389(7).
[38] Gao B, Liu T Y, Zheng X, Cheng Q S, Ma W Y. Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In Proc. the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, USA, Aug. 21-24, 2005.
[39] Greco G, Guzzo A, Pontieri L. An information-theoretic framework for high-order co-clustering of heterogeneous objects. In Proc. the 15th Italian Symposium on Advanced Database Systems, Torre Canne, Italy, Jun. 17-20, 2007.
[40] Long B, Zhang Z F, Wu X Y, Yu P S. Spectral clustering for multi-type relational data. In Proc. the 23rd International Conference on Machine Learning, Pittsburgh, USA, Jun. 25-29, 2006.
[41] Long B, Wu X, Zhang Z, Yu P S. Unsupervised learning on k-partite graphs. In Proc. the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, USA, Aug. 20-23, 2006.
[42] Singh A P, Gordon G J. Relational learning via collective matrix factorization. In Proc. the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, Aug. 24-27, 2008.
[43] Singh A P, Gordon G J. A unified view of matrix factorization models. In Proc. the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Antwerp, Belgium, Sept. 15-19, 2008.
[44] Cattuto C, Schmitz C, Baldassarri A, Servedio V D P, Loreto V, Hotho A, Grahl M, Stumme G. Network properties of folksonomies. AI Communications, 2007, 20(4).
[45] Halpin H, Robu V, Shepherd H. The complex dynamics of collaborative tagging. In Proc. the 16th International Conference on World Wide Web, Banff, Canada, May 8-12, 2007, pp.211-220.
[46] Long B, Wu X, Zhang Z, Yu P S. Community learning by graph approximation. In Proc. the 7th IEEE International Conference on Data Mining, Omaha, USA, Oct. 28-31, 2007, pp.232-241.
[47] Long B, Zhang Z, Yu P S, Xu T. Clustering on complex graphs. In Proc. the 23rd National Conference on Artificial Intelligence, Chicago, USA, Jul. 13-17, 2008.
[48] Rissanen J. Modelling by shortest data description. Automatica, 1978, 14(5).
[49] Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikolski Z, Wagner D. On modularity np-completeness and beyond. Technical Report 2006-19, ITI Wagner, Faculty of Informatics, Universität Karlsruhe, 2006.
[50] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E, 2007, 76(3).
[51] Barber M J, Clark J W. Detecting network communities by propagating labels under constraints. Phys. Rev. E, 2009, 80(2): 026129.
[52] Arenas A, Duch J, Fernández A, Gómez S. Size reduction of complex networks preserving modularity. New Journal of Physics, 2007, 9: 176.
[53] Davis A, Gardner B B, Gardner M R. Deep South. Chicago: University of Chicago Press, IL, 1941.
[54] Breiger R, Carley K, Pattison P (eds.) Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Washington, DC: The National Academies Press, USA, 2003.
[55] Jaccard P. Étude comparative de la distribution florale dans une portion des alpes et des jura. Bull. Soc. Vaudoise Sci. Nat., 1901, 37.
[56] Lü L, Zhou T. Link prediction in complex networks: A survey. Physica A, 2011, 390.

Xin Liu is a Ph.D. candidate in the Department of Computer Science, Tokyo Institute of Technology. He received the B.S. degree in computing and information science from Wuhan University of Technology in 2004, and the M.S. degree in computer science from Wuhan University in 2007. His research interests include Web mining and social network analysis.

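Tsuyoshi Murata is an associate professor in the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology. He obtained his doctor's degree in computer science at Tokyo Institute of Technology in 1997, on the topic of machine discovery of geometrical theorems. At Tokyo Institute of Technology, he conducts research on Web mining, artificial intelligence, and social network analysis. He is a member of IEEE, AAAI, ACM, JSAI, IPSJ and JSSST.

Appendix Implementation Issues

In this Appendix, we focus on the implementation issues of Algorithm […]. The core of the algorithm is LPA, where we sequentially and repeatedly update each node's label to reach a local minimum of $Q_H$. Without loss of generality, suppose we are now updating the label of node $v^{(1)}_x$. The updating rule is

$$S^{(1)}_x = \arg\min_{l} \, Q_H\left(S^{(1)}_x = l\right),$$

where $Q_H(S^{(1)}_x = l)$ denotes the quality function for $v^{(1)}_x$ taking the label $l$. This rule can be rewritten as

$$S^{(1)}_x = \arg\min_{l} \left(\Phi(l) + \Psi(l)\right),$$

with

$$\begin{aligned}
\Phi(l) = \sum_{\alpha_2=1}^{c^{(2)}} \cdots \sum_{\alpha_K=1}^{c^{(K)}} \Bigg[ &\log \binom{\left(\sum_{i_1=1,\, i_1 \neq x}^{n^{(1)}} \delta(S^{(1)}_{i_1}, l) + 1\right) \left(\sum_{i_2=1}^{n^{(2)}} \delta(S^{(2)}_{i_2}, \alpha_2)\right) \cdots \left(\sum_{i_K=1}^{n^{(K)}} \delta(S^{(K)}_{i_K}, \alpha_K)\right)}{\sum_{i_2=1}^{n^{(2)}} \cdots \sum_{i_K=1}^{n^{(K)}} A_{x i_2 \cdots i_K}\, \delta(S^{(2)}_{i_2}, \alpha_2) \cdots \delta(S^{(K)}_{i_K}, \alpha_K) + \sum_{i_1=1,\, i_1 \neq x}^{n^{(1)}} \sum_{i_2=1}^{n^{(2)}} \cdots \sum_{i_K=1}^{n^{(K)}} A_{i_1 i_2 \cdots i_K}\, \delta(S^{(1)}_{i_1}, l)\, \delta(S^{(2)}_{i_2}, \alpha_2) \cdots \delta(S^{(K)}_{i_K}, \alpha_K)} \\
- &\log \binom{\left(\sum_{i_1=1,\, i_1 \neq x}^{n^{(1)}} \delta(S^{(1)}_{i_1}, l)\right) \left(\sum_{i_2=1}^{n^{(2)}} \delta(S^{(2)}_{i_2}, \alpha_2)\right) \cdots \left(\sum_{i_K=1}^{n^{(K)}} \delta(S^{(K)}_{i_K}, \alpha_K)\right)}{\sum_{i_1=1,\, i_1 \neq x}^{n^{(1)}} \sum_{i_2=1}^{n^{(2)}} \cdots \sum_{i_K=1}^{n^{(K)}} A_{i_1 i_2 \cdots i_K}\, \delta(S^{(1)}_{i_1}, l)\, \delta(S^{(2)}_{i_2}, \alpha_2) \cdots \delta(S^{(K)}_{i_K}, \alpha_K)} \Bigg],
\end{aligned}$$

$$\Psi(l) = \begin{cases} n^{(1)} \log\left(c^{(1)} + 1\right) - n^{(1)} \log c^{(1)} + \prod_{k=2}^{K} c^{(k)} \log(m + 1), & \text{if } \sum_{i_1=1,\, i_1 \neq x}^{n^{(1)}} \delta(S^{(1)}_{i_1}, l) = 0; \\ 0, & \text{otherwise,} \end{cases}$$

where $\delta(\cdot, \cdot)$ is the Kronecker delta (the proof is omitted here due to space constraints).

In implementation, the numbers of (hyper)edges between communities, namely $M_{\alpha_1 \cdots \alpha_K}$, are kept up to date in real time. Other stored data include the (hyper)network's connectivity array $A$, the community membership vectors $S^{(k)}$, and the number of nodes in each community $n^{(k)}_{\alpha_k}$. To check a candidate label $l$ as the new label of $v^{(1)}_x$, we need to calculate $\Phi(l)$ and $\Psi(l)$.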
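A minimal sketch of scoring candidate labels in the bipartite case (K = 2) is given below. It rests on stated assumptions rather than on the authors' implementation: Φ(l) is read as the change in the log-binomial terms over community pairs that involve label l when node x joins it, Ψ(l) as the extra model cost paid only when l opens a new community, logarithms are taken base 2, and all function names and the dictionary-based bookkeeping are hypothetical.

```python
import math


def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)


def phi(size_l, other_sizes, links_l, links_x):
    """Change in the log-binomial terms if node x joins community l.

    size_l      : number of nodes already labelled l (x excluded)
    other_sizes : {beta: n_beta}, community sizes in the other node set
    links_l     : {beta: edges between community l and beta} (x excluded)
    links_x     : {beta: edges between x and beta}
    """
    total = 0.0
    for beta, n_beta in other_sizes.items():
        m_l = links_l.get(beta, 0)
        m_x = links_x.get(beta, 0)
        total += log2_binom((size_l + 1) * n_beta, m_l + m_x)
        total -= log2_binom(size_l * n_beta, m_l)
    return total


def psi(size_l, n_own, c_own, c_others, m):
    """Extra model cost; nonzero only when l is a brand-new label."""
    if size_l > 0:
        return 0.0
    return (n_own * math.log2(c_own + 1) - n_own * math.log2(c_own)
            + math.prod(c_others) * math.log2(m + 1))


# Toy case: x has 2 edges to community 0 of the other node set.
other_sizes = {0: 2, 1: 3}
links_x = {0: 2}
sizes = {"a": 1, "b": 2}                  # candidate label -> size without x
links = {"a": {0: 1}, "b": {1: 2}}        # candidate label -> edge counts
scores = {l: phi(sizes[l], other_sizes, links[l], links_x)
             + psi(sizes[l], n_own=4, c_own=2, c_others=[2], m=5)
          for l in sizes}
best = min(scores, key=scores.get)
```

Community pairs with no edges to either x or l contribute zero to the sum, so in practice only the communities connected to x or to l need to be visited.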
As for $\Phi(l)$, we traverse the communities that have connections with $v^{(1)}_x$ or with the community labeled $l$. This operation requires $O(\bar{d}^{(1)})$ time, where $\bar{d}^{(1)}$ is the average degree of nodes in $V^{(1)}$. $\Psi(l)$ can be calculated in $O(1)$ time. Suppose we consider all existing labels as candidate labels. In the worst case (i.e., the initial stage, when each node forms its own community), the number of candidate labels is as large as $n^{(1)}$. Then the running time of updating the label of $v^{(1)}_x$ would be $O(\bar{d}^{(1)} n^{(1)}) = O(m)$. Further, one step of label updating (i.e., sequentially updating each node's label once) requires $O(m \sum_{k=1}^{K} n^{(k)})$ time. Relatively few steps are needed for LPA to converge [37,50-5]. The number of passes of Phase 1 and Phase 2 in practice is also small (see the running details shown below). As a result, the overall time complexity of our algorithm is $O(m \sum_{k=1}^{K} n^{(k)})$. The speed of the algorithm can be further improved. Possible techniques are as follows.


More information

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li Physics Procedia 22 (2011) 634 640 2011 International Conference on Physics Science and Technology (ICPST 2011) A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li Collage

More information

Hierarchical Overlapping Community Discovery Algorithm Based on Node purity

Hierarchical Overlapping Community Discovery Algorithm Based on Node purity Hierarchical Overlapping ommunity Discovery Algorithm Based on Node purity Guoyong ai, Ruili Wang, and Guobin Liu Guilin University of Electronic Technology, Guilin, Guangxi, hina ccgycai@guet.edu.cn,

More information

Nearly-optimal associative memories based on distributed constant weight codes

Nearly-optimal associative memories based on distributed constant weight codes Nearly-optimal associative memories based on distributed constant weight codes Vincent Gripon Electronics and Computer Enginering McGill University Montréal, Canada Email: vincent.gripon@ens-cachan.org

More information

Generalized Louvain method for community detection in large networks

Generalized Louvain method for community detection in large networks Generalized Louvain method for community detection in large networks Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti, Dept. of Physics, Informatics Section. Dept. of Mathematics.

More information

SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS

SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS Ren Wang, Andong Wang, Talat Iqbal Syed and Osmar R. Zaïane Department of Computing Science, University of Alberta, Canada ABSTRACT

More information

PERFORMANCE OF THE DISTRIBUTED KLT AND ITS APPROXIMATE IMPLEMENTATION

PERFORMANCE OF THE DISTRIBUTED KLT AND ITS APPROXIMATE IMPLEMENTATION 20th European Signal Processing Conference EUSIPCO 2012) Bucharest, Romania, August 27-31, 2012 PERFORMANCE OF THE DISTRIBUTED KLT AND ITS APPROXIMATE IMPLEMENTATION Mauricio Lara 1 and Bernard Mulgrew

More information

Maximizing edge-ratio is NP-complete

Maximizing edge-ratio is NP-complete Maximizing edge-ratio is NP-complete Steven D Noble, Pierre Hansen and Nenad Mladenović February 7, 01 Abstract Given a graph G and a bipartition of its vertices, the edge-ratio is the minimum for both

More information

arxiv: v1 [physics.data-an] 27 Sep 2007

arxiv: v1 [physics.data-an] 27 Sep 2007 Community structure in directed networks E. A. Leicht 1 and M. E. J. Newman 1, 2 1 Department of Physics, University of Michigan, Ann Arbor, MI 48109, U.S.A. 2 Center for the Study of Complex Systems,

More information

An Edge-Swap Heuristic for Finding Dense Spanning Trees

An Edge-Swap Heuristic for Finding Dense Spanning Trees Theory and Applications of Graphs Volume 3 Issue 1 Article 1 2016 An Edge-Swap Heuristic for Finding Dense Spanning Trees Mustafa Ozen Bogazici University, mustafa.ozen@boun.edu.tr Hua Wang Georgia Southern

More information

Limitations of Matrix Completion via Trace Norm Minimization

Limitations of Matrix Completion via Trace Norm Minimization Limitations of Matrix Completion via Trace Norm Minimization ABSTRACT Xiaoxiao Shi Computer Science Department University of Illinois at Chicago xiaoxiao@cs.uic.edu In recent years, compressive sensing

More information

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying

More information

Statistical Physics of Community Detection

Statistical Physics of Community Detection Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined

More information

V4 Matrix algorithms and graph partitioning

V4 Matrix algorithms and graph partitioning V4 Matrix algorithms and graph partitioning - Community detection - Simple modularity maximization - Spectral modularity maximization - Division into more than two groups - Other algorithms for community

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr

More information

Progress Report: Collaborative Filtering Using Bregman Co-clustering

Progress Report: Collaborative Filtering Using Bregman Co-clustering Progress Report: Collaborative Filtering Using Bregman Co-clustering Wei Tang, Srivatsan Ramanujam, and Andrew Dreher April 4, 2008 1 Introduction Analytics are becoming increasingly important for business

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

Modularity CMSC 858L

Modularity CMSC 858L Modularity CMSC 858L Module-detection for Function Prediction Biological networks generally modular (Hartwell+, 1999) We can try to find the modules within a network. Once we find modules, we can look

More information

Using the Kolmogorov-Smirnov Test for Image Segmentation

Using the Kolmogorov-Smirnov Test for Image Segmentation Using the Kolmogorov-Smirnov Test for Image Segmentation Yong Jae Lee CS395T Computational Statistics Final Project Report May 6th, 2009 I. INTRODUCTION Image segmentation is a fundamental task in computer

More information

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Kouhei Sugiyama, Hiroyuki Ohsaki and Makoto Imase Graduate School of Information Science and Technology,

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data D.Radha Rani 1, A.Vini Bharati 2, P.Lakshmi Durga Madhuri 3, M.Phaneendra Babu 4, A.Sravani 5 Department

More information

Constrained Clustering with Interactive Similarity Learning

Constrained Clustering with Interactive Similarity Learning SCIS & ISIS 2010, Dec. 8-12, 2010, Okayama Convention Center, Okayama, Japan Constrained Clustering with Interactive Similarity Learning Masayuki Okabe Toyohashi University of Technology Tenpaku 1-1, Toyohashi,

More information

An Improved KNN Classification Algorithm based on Sampling

An Improved KNN Classification Algorithm based on Sampling International Conference on Advances in Materials, Machinery, Electrical Engineering (AMMEE 017) An Improved KNN Classification Algorithm based on Sampling Zhiwei Cheng1, a, Caisen Chen1, b, Xuehuan Qiu1,

More information

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan

More information

arxiv: v2 [physics.soc-ph] 16 Sep 2010

arxiv: v2 [physics.soc-ph] 16 Sep 2010 Community detection algorithms: a comparative analysis Andrea Lancichinetti, 2 and Santo Fortunato Complex Networks and Systems, Institute for Scientific Interchange (ISI), Viale S. Severo 65, 33, Torino,

More information

CUT: Community Update and Tracking in Dynamic Social Networks

CUT: Community Update and Tracking in Dynamic Social Networks CUT: Community Update and Tracking in Dynamic Social Networks Hao-Shang Ma National Cheng Kung University No.1, University Rd., East Dist., Tainan City, Taiwan ablove904@gmail.com ABSTRACT Social network

More information

Tag Based Image Search by Social Re-ranking

Tag Based Image Search by Social Re-ranking Tag Based Image Search by Social Re-ranking Vilas Dilip Mane, Prof.Nilesh P. Sable Student, Department of Computer Engineering, Imperial College of Engineering & Research, Wagholi, Pune, Savitribai Phule

More information

Weighted compactness function based label propagation algorithm for community detection

Weighted compactness function based label propagation algorithm for community detection Accepted Manuscript Weighted compactness function based label propagation algorithm for community detection Weitong Zhang, Rui Zhang, Ronghua Shang, Licheng Jiao PII: S0378-4371(17)31078-6 DOI: https://doi.org/10.1016/j.physa.2017.11.006

More information

Complex networks: A mixture of power-law and Weibull distributions

Complex networks: A mixture of power-law and Weibull distributions Complex networks: A mixture of power-law and Weibull distributions Ke Xu, Liandong Liu, Xiao Liang State Key Laboratory of Software Development Environment Beihang University, Beijing 100191, China Abstract:

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

A Two-phase Distributed Training Algorithm for Linear SVM in WSN

A Two-phase Distributed Training Algorithm for Linear SVM in WSN Proceedings of the World Congress on Electrical Engineering and Computer Systems and Science (EECSS 015) Barcelona, Spain July 13-14, 015 Paper o. 30 A wo-phase Distributed raining Algorithm for Linear

More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

Hierarchical Document Clustering

Hierarchical Document Clustering Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Collaborative Filtering using Euclidean Distance in Recommendation Engine Indian Journal of Science and Technology, Vol 9(37), DOI: 10.17485/ijst/2016/v9i37/102074, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Collaborative Filtering using Euclidean Distance

More information

A New Pool Control Method for Boolean Compressed Sensing Based Adaptive Group Testing

A New Pool Control Method for Boolean Compressed Sensing Based Adaptive Group Testing Proceedings of APSIPA Annual Summit and Conference 27 2-5 December 27, Malaysia A New Pool Control Method for Boolean Compressed Sensing Based Adaptive roup Testing Yujia Lu and Kazunori Hayashi raduate

More information

Basics of Network Analysis

Basics of Network Analysis Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,

More information

Dynamic network generative model

Dynamic network generative model Dynamic network generative model Habiba, Chayant Tantipathanananandh, Tanya Berger-Wolf University of Illinois at Chicago. In this work we present a statistical model for generating realistic dynamic networks

More information

A New Heuristic Layout Algorithm for Directed Acyclic Graphs *

A New Heuristic Layout Algorithm for Directed Acyclic Graphs * A New Heuristic Layout Algorithm for Directed Acyclic Graphs * by Stefan Dresbach Lehrstuhl für Wirtschaftsinformatik und Operations Research Universität zu Köln Pohligstr. 1, 50969 Köln revised August

More information

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness Visible and Long-Wave Infrared Image Fusion Schemes for Situational Awareness Multi-Dimensional Digital Signal Processing Literature Survey Nathaniel Walker The University of Texas at Austin nathaniel.walker@baesystems.com

More information

A Memetic Heuristic for the Co-clustering Problem

A Memetic Heuristic for the Co-clustering Problem A Memetic Heuristic for the Co-clustering Problem Mohammad Khoshneshin 1, Mahtab Ghazizadeh 2, W. Nick Street 1, and Jeffrey W. Ohlmann 1 1 The University of Iowa, Iowa City IA 52242, USA {mohammad-khoshneshin,nick-street,jeffrey-ohlmann}@uiowa.edu

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

SOME stereo image-matching methods require a user-selected

SOME stereo image-matching methods require a user-selected IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 3, NO. 2, APRIL 2006 207 Seed Point Selection Method for Triangle Constrained Image Matching Propagation Qing Zhu, Bo Wu, and Zhi-Xiang Xu Abstract In order

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

Rectangular Matrix Multiplication Revisited

Rectangular Matrix Multiplication Revisited JOURNAL OF COMPLEXITY, 13, 42 49 (1997) ARTICLE NO. CM970438 Rectangular Matrix Multiplication Revisited Don Coppersmith IBM Research, T. J. Watson Research Center, Yorktown Heights, New York 10598 Received

More information

Response Network Emerging from Simple Perturbation

Response Network Emerging from Simple Perturbation Journal of the Korean Physical Society, Vol 44, No 3, March 2004, pp 628 632 Response Network Emerging from Simple Perturbation S-W Son, D-H Kim, Y-Y Ahn and H Jeong Department of Physics, Korea Advanced

More information

An Improved k-shell Decomposition for Complex Networks Based on Potential Edge Weights

An Improved k-shell Decomposition for Complex Networks Based on Potential Edge Weights International Journal of Applied Mathematical Sciences ISSN 0973-0176 Volume 9, Number 2 (2016), pp. 163-168 Research India Publications http://www.ripublication.com An Improved k-shell Decomposition for

More information

CONTENT ADAPTIVE SCREEN IMAGE SCALING

CONTENT ADAPTIVE SCREEN IMAGE SCALING CONTENT ADAPTIVE SCREEN IMAGE SCALING Yao Zhai (*), Qifei Wang, Yan Lu, Shipeng Li University of Science and Technology of China, Hefei, Anhui, 37, China Microsoft Research, Beijing, 8, China ABSTRACT

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Motivation. Technical Background

Motivation. Technical Background Handling Outliers through Agglomerative Clustering with Full Model Maximum Likelihood Estimation, with Application to Flow Cytometry Mark Gordon, Justin Li, Kevin Matzen, Bryce Wiedenbeck Motivation Clustering

More information

A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network

A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network Sumuya Borjigin 1. School of Economics and Management, Inner Mongolia University, No.235 West College Road, Hohhot,

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

Clustering Documents in Large Text Corpora

Clustering Documents in Large Text Corpora Clustering Documents in Large Text Corpora Bin He Faculty of Computer Science Dalhousie University Halifax, Canada B3H 1W5 bhe@cs.dal.ca http://www.cs.dal.ca/ bhe Yongzheng Zhang Faculty of Computer Science

More information

Efficient Mining Algorithms for Large-scale Graphs

Efficient Mining Algorithms for Large-scale Graphs Efficient Mining Algorithms for Large-scale Graphs Yasunari Kishimoto, Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka Abstract This article describes efficient graph mining algorithms designed

More information

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

On the Approximability of Modularity Clustering

On the Approximability of Modularity Clustering On the Approximability of Modularity Clustering Newman s Community Finding Approach for Social Nets Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607,

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks

From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks Dan Braha 1, 2 and Yaneer Bar-Yam 2 1 University of Massachusetts Dartmouth, MA 02747, USA 2 New England Complex Systems Institute

More information

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.

More information

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging 007 International Conference on Convergence Information Technology HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging Lixin Han,, Guihai Chen Department of Computer Science and Engineering,

More information

A new method for community detection in social networks based on message distribution

A new method for community detection in social networks based on message distribution 298 A new method for community detection in social networks based on message distribution Seyed Saeid Hoseini 1*, Seyed Hamid Abbasi 2 1 Student, Department of Computer Engineering, Islamic Azad University,

More information

Community Detection based on Structural and Attribute Similarities

Community Detection based on Structural and Attribute Similarities Community Detection based on Structural and Attribute Similarities The Anh Dang, Emmanuel Viennet L2TI - Institut Galilée - Université Paris-Nord 99, avenue Jean-Baptiste Clément - 93430 Villetaneuse -

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information