CONE: Community Oriented Network Embedding

Size: px
Start display at page:

Download "CONE: Community Oriented Network Embedding"

Transcription

1 : Community Oriented Network Embedding Carl Yang University of Illinois, Urbana Champaign 201 N. Goodwin Ave Urbana, IL 61801, USA Hanqing Lu Zhejiang University 38 Zheda Rd Hangzhou, , China Kevin Chen-Chuan Chang University of Illinois, Urbana Champaign 201 N. Goodwin Ave Urbana, IL 61801, USA ABSTRACT Detecting underlying communities has long been a popular topic in the research on networks. It is usually modeled as an unsupervised clustering problem on graphs, based on some heuristic assumptions about the characteristics of communities, such as edge density and node homogeneity. In this work, we doubt the universality of these widely adopted assumptions and compare human labeled communities with machine predicted ones obtained via various mainstream algorithms. Based on supportive results, we argue that communities are defined by various social patterns and unsupervised learning based on heuristics is incapable of capturing all such patterns. Therefore, we propose to inject supervision into community detection through Community Oriented Network Embedding (), which leverages limited community examples to learn the latent representations of nodes that are aware of the underlying social patterns.specifically, a deep architecture is developed based on Recurrent Neural Networks (RNN) and improved towards learning an embedding that captures important link structures and node attributes that lead to the formation of ground-truth communities. Generic clustering algorithms in the embedded space then effectively reveals more communities that share similar social patterns with the ground-truth ones. Categories and Subject Descriptors H.4.0 [Information Systems]: General Keywords Community detection, supervised learning, network embedding, deep architecture 1. INTRODUCTION One of the most popular topics in the research on networks is identifying network communities. On the one hand, Equal contribution Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX...$ Student PhD Secretory Professor Dean Figure 1: Toy example of a CS research group as networks are growing larger than ever before, it is efficient and necessary to look into smaller sub-networks, which consist of specific groups of interacting objects (i.e., nodes) and their links (i.e., edges). On the other hand, the knowledge of community structures allows us to better understand the status of an object within a group and the relationships between it and its peers, so as to enable multiple benefits including the discovery of functionally related objects [23], the study of interactions between modules [2], the inference of missing attribute values [17], the prediction of unobserved connections [8] and so on. Many algorithms aim to solve this problem [3, 7, 9, 18, 31, 33, 35]. However, they are all formulated based on the heuristic assumptions of edge density and node homogeneity, i.e., communities are constructed by densely connected nodes and nodes within one community are all homogeneous in some ways. Most surprisingly, even the ground-truth community labels used for evaluation in many standard datasets are generated by machines based on these two assumptions, rather than humans [32]. In this work, we doubt the universality of the two widely trusted assumptions. To see how these two assumptions do not necessarily hold true, consider the scenario in Figure 1, where the community is a computer science (CS) research group including students, staff and faculties. This community is naturally a valid community as everyone is affiliated to the latent group. However, it is not densely connected as it is highly likely that there are people who have different schedules and thus have never met each other. The community is not homogeneous in a specific way either. Students may have similar age range, salary range and come from the same hometown, but these are quite different from those of the staff and faculties. Therefore, unsupervised community detection algorithms based on edge density and node homogeneity can hardly detect communities like this. Although neither of the density and homogeneity charac-

2 teristics is prominent in the community of Figure 1, there are indeed some interesting social patterns. E.g., a pattern of star is clearly formed among the PhD and the students around, and the secretory obviously acts as a connecting bridge. Those patterns are characterized by both link structures and node attributes. E.g., the PhD as a star center is different in ages from the students around but may share similar other attributes such as major and research field. The PhD is also densely connected with the students around, but the students may not know each other. On the other hand, the secretory as a bridge is different in attributes like job from any other members in the community, but connects to many of them. These sub-structures are prominent and valuable for defining and detecting communities of specific patterns, e.g., research groups in CS or any other discipline may well share the similar patterns. How can we automatically capture the important social patterns underlying network communities? As we see in Figure 1, the patterns can be complex and different across networks. It is impractical to enumerate all possible ones and design an unsupervised algorithm that can detect communities of all patterns. In this work, we propose to inject weak supervision into community detection. Instead of taking pre-defined assumptions about community characteristics, it is more reliable to let the data tell us what the communities look like. Specifically, given a network, we propose to leverage a few labeled communities as examples, and automatically learn their social patterns as described by some latent features. Those features can then be leveraged for the detection of similar other communities. While it is intuitive to learn the latent features via embedding, a supervised embedding directed by complex social patterns on networks is non-trivial and has never been done. In this work, we develop a deep architecture of Community Oriented Network Embedding (). To capture high dimensional noisy node attributes, we translate them into binary sequences of fixed lengths and leverage the successful recurrent neural networks (RNN) from natural language processing (NLP). To account for link structures, we design a random-walk layer, which fundamentally reconstructs the attribute embeddings according to neighborhood structures. To leverage supervision, we connect a softmax supervision layer and evaluate the empirical loss w.r.t. ground-truth community labels in an end-to-end framework. Unlike other network embedding techniques, decomposes the embedding process into multiple nonlinear layers of a deep architecture. The key advantages of are: Expressive: the deep architecture of makes it powerful in exploring various combinations of attribute values in a high-dimensional space and the global consistency among different ground-truth communities. Linkage aware: the random-walk layer of makes it aware of neighborhood structures as expressed by a high-order stationary distribution, so that link structures are leveraged to capture important sub-networks. Community oriented: is an end-to-end embedding framework, so the supervision of ground-truth communities can be properly propagated to the attribute embedding layer via the random-walk layer, which explores their mutual reinforcement and reliably captures social patterns. Out-of-sample: The output of is an embedding model, rather than the embeddings of a specific set of nodes. Therefore, it is able to handle the out-of-sample problem, which addresses the challenge of limited supervision and dynamic network. 2. MOTIVATING STUDY In this section, we show the deficiency of traditional unsupervised community detection algorithms by quantitatively examining the density and homogeneity assumptions they adopt and their consequences. We conduct a set of analysis using the Facebook dataset described in [18]. This dataset includes 10 ego-networks, consisting of 193 social circles and 4,039 nodes. 10 ego users have manually identified all the communities to which their friends belong. The average number of social circles in each ego-network is 19, with an average community size of 22 users. These social circles were used as the ground-truth communities. Let a graph G = {V, E} represent a network with a vertex set V and an edge set E. A community can be represented as a sub-graph C = {V C, E C} with V C V and E C E. Attr(v) represents the node attribute vector of a vertex v V. We evaluate the density and homogeneity of both ground-truth communities and communities detected by the state-of-the-art algorithms. Metrics. We use a density metric D( ) and a homogeneity metric H( ) defined as following. D(C) = 2 E C V C ( V C 1). (1) The value of D(C) ranges from 0 (a graph without edges) to 1 (a complete clique). It is a standard measure of the density of a network widely used in related literature [32]. Attr(x) Attr(y) σ( H(C) = x,y V C,x<y avg(attr(x) Attr(y)) ) V C ( V C 1). (2) avg(attr(x) Attr(y)) is the average of attribute similarity among all pairs of nodes in the whole network G, and σ(t) = 1 e t 1+e t is the adjusted half sigmoid function in the range of [0, 1]. The value of H(C) also ranges from 0 (none of the nodes share the same attributes) to 1 (all nodes share the same attributes), while the value increases rapidly as the similarity of Attr( ) in C is small compared with the average attribute similarity in the whole network G. Compared algorithms. We use three mainstream community detection algorithms within the state-of-the-art: Min- Cut [9] (based on network modularity), CESNA [33] (based on probabilistic generative model) and InfoMaps [21] (based on information theory). Density and homogeneity analysis. Assumption 1 : nodes within the same community are densely connected. Conflicting evidence: In Figure 2 (a), we present D(C) computed on the ground-truth communities as well as the communities detected by the state-of-the-art algorithms. It can be observed that ground-truth communities do not always have a high density. In fact, in some ego-networks like the one we show, there are more low-density communities

3 than high-density ones. The evidence directly contradicts with Assumption 1. Based on this inaccurate assumption, traditional unsupervised community detection algorithms tend to detect communities with high density. As an example, the distributions of D(C) computed on the detected communities in ego-network 686 are presented in Figure 2 (b-d). It can be observed that most of the detected communities have a high density (D(C) > 0.5), which is inconsistent with the density distribution of the ground-truth communities. It suggests that the inaccurate assumption of edge density indeed leads to unreliable community detection results. (a)ground-truth (c)cesna (b)mincut (d)infomap Figure 2: Community density in ego-net 686. We conduct the same homogeneity analysis on communities detected by the same group of state-of-the-art unsupervised algorithms. As can be seen in Figure 3 (c-d), since they assume node homogeneity in communities, CESNA and InfoMaps do detect communities of slightly higher homogeneity, which diverge from the ground-truth ones. 3. The analysis in Sec 2 clearly demonstrates the deficiency of existing unsupervised community detection algorithms, which is a consequence of their falsifiable assumptions of edge density and node homogeneity. To deal with this deficiency, we propose to inject supervision into community detection, so as to leverage the various social patterns underlying communities in different networks. 3.1 Challenges While it is intuitive to automatically learn the latent features that effectively describe and detect communities via an embedding model that is aware of social patterns, the task is non-trivial and posts some unique challenges: Exploring various combinations of high-dimensional noisy node attributes. Utilizing complex link structures. Leveraging the supervision of community labels. Dealing with limited supervision and dynamic networks. To deal with all challenges above, we develop Community Oriented Network Embedding (), which coherently combines node attributes, link structures and community labels to effectively capture important social patterns. 3.2 Framework (a)ground-truth (b)mincut Problem formulation: Our input is a network modeled by G = {V, E, A, F}, where V is the set of n nodes, E is the set of observed links among V, A is the set of observed node attributes, and F is a set of example community memberships. In order to detect more communities on G, we firstly learn the latent features that describe communities in G. Specifically, we learn an embedding model that transforms each node v V into a p-dimensional vector. In the p-dimensional embedded space, nodes forming each community that shares the similar social patterns with the ones indicated by F are close. Therefore, new communities can then be detected through generic clustering algorithms such as k-means on the p-dimensional features. Overlapping communities can also be discovered by algorithms like MOC [4]. (c)cesna (d)infomaps Figure 3: Community homogeneity in ego-net 698. Assumption 2 : nodes within the same community are homogeneous in some ways. Conflicting evidence: As an example, the distribution of H(C) computed on the ground-truth communities in egonetwork 698 is contradictory to Assumption 2. As shown in Figure 3 (a), few communities are indeed homogeneous (e.g., the one with homogeneity higher than 0.8), while the majority are much less homogeneous. Given σ(1) 0.46, the results indicate that most communities have a homogeneity similar to that of the whole network. Modeling node attributes with recurrent neural network: In real world networks, node attributes A can be noisy, high-dimensional and of various forms. E.g., in textrich networks like Twitter, A can be the set of recent tweets; in tag-rich networks like Flickr, A can be the set of frequently used hashtags; in networks with explicit attributes like Facebook and LinkedIn, A can be a set of categorical variables like Birthday, School, Occupation and etc. An intelligent model should be expressive enough to explore various combinations of such attributes so as to identify the important social patterns. The task of modeling such complex information reminds us of the challenge in natural language processing (NLP),

4 and specifically, paragraph embedding, where the frequency, ordering and co-occurrence of words all matter when interpreting the whole paragraph. While there is no previous work that models network vertexes (such as users in social networks) as paragraphs, we find it intuitive and rewarding to do so, because a i A is basically a description of node v i, and it is usually easy to translate a i into a paragraph. Specifically, tweets, with topics hidden in the words, are naturally appropriate to be modeled as sentences; hashtags and explicit categorical attributes, of which the frequency, ordering and co-occurrence of certain values carries rich semantics, can also be properly modeled as sentences (e.g., a Facebook attribute like birthday: July naturally equals to a sentence like the user s birthday is in July ). Just like the task of NLP, the length of the description A is naturally different for each node (paragraph), while the total number of possible values of words in tweets, popular hashtags and informative attributes is limited (vocabulary). To deeply understand the semantics of paragraphs, recurrent neural network (RNN) has been proven advantageous over various other methods, basically due to its encoding of sentences as sequences of words and supreme expressive power within the neural networks, both of which are intriguing considering our task of modeling high-dimensional noisy node attributes. Moreover, the deep learning framework naturally allows us to inject supervision into the model learning process and provides us with the flexibility of modifying the network architecture in order to leverage other information like link structures. Given those beneficial properties, we adopt RNN to model node attributes. Given node v i s d-dimensional attribute vector a i, we first transform it into a sequence {j x i,j = 1}. E.g., a 5-dimensional user attribute vector (0, 1, 0, 0, 1) T will be transformed into a sequence {1, 4}. Then we use this sequence as the input of RNN. As mentioned in [24], simple RNN would be difficult to train due to the resulting long term dependencies. Therefore, we choose long-short term memory (LSTM) [15] instead to model node attributes by learning a proper embedding through the following equations: i t = δ(w ia t + G ih t 1 + b i), Ĉ t = tanh(w ca t + G f h t 1 + b f ), f t = δ(w f a t + G f h t 1 + b f ), C t = i t Ĉt + ft Ct, o t = δ(w oa t + G oh t 1 + V oc t + b o) h t = o t tanh(c t). (3) where δ represents the sigmoid activation function; W i, W c, W f, W o, G i, G f, G o, and V o are weight matrices. b i, b f, and b o are bias vectors. The gates in LSTM cell can modulate the interactions between the memory cell itself and its environment. The input gates allow incoming signal to alter the state of the memory cell or block it and the output gates allow the state of the memory cell to have an effect on other neurons or prevent it. Moreover, the forget gates allow the cell to remember or forget its previous state. The architecture and implementation of LSTM can be found in the public website. After we input one element, there will be one output of the LSTM cell. As a result, there will be d output embedding vector {h 1, h 2,..., h d }. We take the average of them, i.e., ĥi = (h h d)/d, as the semantic embedding of the input sequence {x 1, x 2,..., x d }. This process is named as mean pooling. Supposing that the embedding size of user is m, we use H R n m to represent the node embeddings, where the i-th row of H equals to ĥi. Leveraging link structures with a random-walk layer: Another important characteristic of network data is the abundance of interactions among nodes, which is also significant for understanding social patterns. E.g., if some nodes are connected in a specific way, they are more likely to be in the same community. In order to leverage the interactions among nodes in networks, we introduce a novel randomwalk layer into RNN, which extends the modeling of node attributes to link structures. The random-walk layer is implemented by computing the one-step random transition matrix S based on network interactions. E.g., for networks with edge weights w ij (the weights are binary if we consider links such as friendships in social networks), S is computed as s ij = w ij/d i, where d i = j wij. A random-walk layer performing one step of the random walk process can be implemented by multiplying S to the attribute embedding H of RNN, i.e., T = SH. In order to simulate t steps of random walks, we can simply perform t random transitions, which can be implemented as T = S t H. In Sec 4, we experiment on the impact of different numbers of random transitions. To leverage link structures in the feature learning process, we connect the random-walk layer to the LSTM layer. Intuitively, the random-walk layer reconstructs the embedding of each node w.r.t. its neighbors, according to the distance of each neighbor measured by random walks of certain steps. This is done before the softmax prediction of community memberships and the evaluation towards ground-truth labels, so that the prediction is made on the reconstructed embeddings and the supervision is done with the consideration of link structures. Specifically, as supervision requires the embeddings of nodes within the same communities to be close, the random-walk layer allows the attribute embeddings to be not so close. As a consequence, only nodes within neighborhoods of both specific attributes and structures indicated by supervision will be embedded as close. Exploiting community supervision through end-toend embedding: To learn the essential features that describe social patterns as embeddings guided by labeled communities, we incorporate a softmax supervision layer into an end-to-end deep learning framework. By incorporating, is able Figure 4 illustrates the architecture of. A random-walk layer of several random transitions is connected to the attribute embedding layer and a softmax supervision layer is connected to exploit community labels. Suppose that the predicted labels are in matrix P, and the ground-truth labels are in matrix G. We adopt the following cross-entropy loss as our objective function: L = n G ilog(p i). (4) i=1 To optimization Eq. 4, we employ the stochastic gradient descent (SGD) with the diagonal variant of AdaGrad in [20].

5 Node Attrib utes A LSTM h1 LSTM h2 LSTM h... Attribute Embedding Matrix H Random Transition Matrix S Random Transition Matrix S Reconstructed Embedding Matrix T Softmax Prediction P Ground Truth Labels G (1) Attribute Embedding Layer (2) Random-walk Layer (3) Softmax Supervision Layer Figure 4: The deep architecture of Community Oriented Network Embedding (). At the t-th step, the parameters Θ is updated by: ρ Θ t Θ t 1 t g t, (5) i=1 g2 i where ρ is the initial learning rate and g t is the sub-gradient at time t. Since our proposed random-walk layer is implemented as a matrix multiplication, the gradients can be efficiently back-propagated to the attribute embedding layer. Thus the overall embedding framework is end-to-end and the mutual reinforcement between node attributes and link structures is leveraged. Out-of-sample community detection with : Our framework is designed to deal with the out-of-sample problem, i.e., an embedding model is learned once on the labeled data and can be used multiple times on unlabeled data. Unlike many network embedding algorithms that learn a specific embedding for nodes in the network by enforcing small distance among certain nodes [12, 19, 26], uses community labels to learn the model and store the parameters in a deep architecture. In this way, it memorizes the specific social patterns indicated by labeled communities and is able to apply them on new nodes. Such out-of-sample property is plausible in at least three folds. 1. Saving supervision. While supervision is usually expensive and limited, can be learned on a relatively very small training set and utilized on the rest of the large network. 2. Crossing networks. Learned on a specific network, can be used to compute embeddings on any other networks, assuming similar social patterns (such as from one ego-network to another), as long as the attribute set remains the same, or a mapping between attribute sets can be provided. 3. Excelling on dynamic networks. does not need to be re-learned for new nodes or links added to the network over time. When detecting communities in a network, embeddings of nodes computed by the model learned on labeled data are fed into generic clustering algorithms like k-means. We apply cross-validation to automatically choose the optimal number of communities as done similarly in [33]. The efficiency of : Since is an end-to-end deep learning framework implemented using TensorFlow, it is easy to run efficiently on GPU EXPERIMENTS In this section, we evaluate on the task of community detection with extensive experiments on real-world network datasets. 4.1 Experimental Settings Datasets. We use three real-world network datasets. The first is the Facebook dataset we used in Sec 2 for data analysis, which consists of 10 ego-networks with community labels explicitly provided by the ego users [18]. The attributes in this dataset are well-defined user profiles such as education, work and location, and links are undirected friendships among users. The second is a Flickr dataset collected by [29]. The community labels are generated from the groups joined by users. The attributes are the tags aggregated on users posted photos and the links are undirected friendships. The third is a Twitter dataset also collected by [18], which consists of 973 ego-networks. The community labels are generated from friend circles (or lists), i.e., friends put into the same list by the ego user are regarded as within one community. The attributes are the hashtags and popular account mentions within users recent tweets and the links are directed followings. Detailed statistics of the three datasets we use are shown in Table 1. In the model selection experiments, we vary the size of training and testing sets to show the ability of to leverage limited supervision. Dataset #nodes #comms #links #attrs Facebook 4, , Flickr 10, ,814 77,260 Twitter 5, ,556 12,274 Table 1: Statistics of 3 real-world network datasets. Compared algorithms. We compare with two groups of algorithms from the state-of-the-art to comprehensively evaluate the performance of. Traditional unsupervised community detection algorithms. Some of these algorithms are based on the edge density assumption alone, while others also assume node homogeneity. We compare with this group of algorithms to show the advantage of abandoning these inaccurate assumptions and leveraging the supervision of example communities. MinCut [9]: a classic community detection algorithm based on modularity.

6 BigClam [31]: an advanced community detection algorithm solely based on network structure. Circles [18]: a generative model of edges w.r.t. attribute similarity to detect communities. CESNA [33]: a generative model of edges and attributes to detect communities. Network embedding algorithms adapted to leverage community labels. While we find it intuitive to leverage community labels by learning embeddings, we compare with the stateof-the-art network embedding algorithms to show that the task is non-trivial and is advantageous. Inspired by [25], we adapt those network embedding algorithms to accommodate extra information by building a heterogeneous network augmented by node attributes and community labels. The embeddings learned by these algorithms are fed into the same generic clustering algorithm as to yield community detection results. [19]: we generate truncated random walks on the augmented heterogeneous network. node2vec [12]: we conduct the 2nd order random walks on the augmented heterogeneous network. [26]: we compute the model on the augmented heterogeneous network as what is done in [25]. Metrics. Two widely used metrics for evaluating community detection results are used in our experiments. For a detected community c i and a ground-truth community c i, the F1 similarity and similarity are defined as F 1 = 2 precision recall precision + recall, = ci c i c i c i, where precision = c i c i c i, recall = c i c i c i. For a set of ground-truth communities {c i } M i=1 and a set of detected communities {c i} N i=1, we compute the score as 1 2 C c i C max eval(ci, c c i ) + 1 i C 2 C c i C max c i C eval(ci, c i ), where eval(c i, c i ) can be either replaced by F 1 or. 4.2 Performance Comparison with Baselines We quantitively evaluate against all baselines on community detection. We randomly split the labeled communities into training sets and testing sets. For, we use 10% of the labeled communities as training set to learn the embedding model. For the adaptions of other embedding algorithms, the 10% community labels are used to build the heterogeneous networks for generating vertex sequences and computing neighborhood similarities. All compared algorithms including are then evaluated on the rest 90% labeled communities. To observe significant difference in performance, we split the training and testing sets 10 times and conduct statistical t-tests with p-value Table 2 and 3 show the average F1 and scores evaluated on all compared algorithms over the same 10 random splits of training and testing sets. The results all passed Algorithm Facebook Flickr Twitter MinCut BigClam CESNA Circles node2vec Table 2: Performance comparison using F1. Algorithm Facebook Flickr Twitter MinCut BigClam CESNA Circles node2vec Table 3: Performance comparison using. our t-tests. The parameters of baselines are all set to the default values as suggested by the original works. For, we intuitively use two random transitions in the randomwalk layer to stress network locality and leverage the link structures in close neighborhoods. reaches around 30% relative improvements on the Facebook dataset and more than 10% improvements on the other two datasets, compared with the second-runner among the 7 compared algorithms on both F1 and scores, while they have varying performances. This indicates the robustness and general advantages of our proposed approach. Taking a closer look at the scores, we observe that the scores on the Facebook dataset look much better than those on the other two datasets. This is due to the good quality of both attributes and links in the dataset. Since the Facebook dataset is the only one that has explicit human labels of communities, the large improvements on it indicate the advantage of in leveraging user provided community examples to understand specific social patterns. The scores of network embedding algorithms are lower than traditional community detection algorithms on the Facebook dataset, probably because the attributes are few and clean and traditional models work well on such settings. However, as the attributes become high-dimensional and noisy like in the Flickr and Twitter datasets, network embedding algorithms originally designed to deal with NLP problems excel. This indicates the advantage of leveraging embeddings for traditional social network tasks and further proves the efficacy of, especially in exploring highdimensional noisy attributes. 4.3 Model Selection We comprehensively evaluate the performance of when different amounts of supervision are available and study the impact of different numbers of random-walk transitions and embedding dimensions. Impact of supervision. We study the impact of supervision on the performance of by varying the training set portion. Figure 6 shows the results compared with all

7 baselines. s advantage of leveraging labeled data is significant on all three datasets. As can be observed, efficiently leverages supervision by learning from very small amounts of labeled data, and the performance converges rapidly as the training set portion reaches around 11% on the Facabook and Flickr datasets. On datasets with large amounts of labeled data such as Twitter, around 6% of labeled data are enough to learn a good model. Note that the other three network embedding algorithms do not leverage supervision effectively, basically because they do not deal with the out-of-sample problem and the social patterns they learn on labeled data do not generalize trivially to unlabeled ones. Impact of parameters. We study the impact of parameters inside by varying the number of random-walk transitions and embedding dimensions. Figure 7 shows the results compared with all baseline algorithms. The number of random-walk transitions has a large impact on the performance of, especially when the number is small. This demonstrates the utility of our novelly designed randomwalk layer. As the number of transitions grows larger and the stationary distribution of random walk is approximated, the performance becomes stable. Note that large numbers of transitions do not necessarily lead to optimal performance, because community structures are often well described within small neighborhoods. The number of embedding dimensions does not have a significant impact, indicating the robustness of inherited from RNN. 4.4 Case Study To understand the advantage of, we conduct the same density and homogeneity analysis on our detected communities and present the results on the same ego-networks we showed in Sec 2. Comparing Figure 5(a) and 5(b) with Figure 2(a) and 3(a), we observe that indeed finds communities that are similar in distributions of density and homogeneity as the ground-truth, which indicates its effectiveness in coherently leveraging community labels, node attributes and link structures. (a)density in ego 686 (b)homogeneity in ego 698 Figure 5: Communities detected by. 5. RELATED WORK We stress the novelty of this work and differentiate it with existing literature in three folds by comparing with community detection, network embedding and node classification. 5.1 Community detection Traditional community detection algorithms are mostly unsupervised, based on either of the edge density or node homogeneity assumptions, or both. Therefore, the communities they detect are only constructed by densely connected or homogeneous nodes. Among them, many are based on probabilistic generative models. For instance, COCOMP [35] extends LDA [6] by taking community assignments as latent variables. It leverages node homogeneity by assuming that each community is a group of people that use a specific distribution of words. BigClam [31] does not make use of node content at all. It leverages edge density by modeling the probabilities of edges solely based on the existing connections in the network. A variation of BigClam called CESNA [33] is later proposed, with additional consideration of binaryvalued node attributes. It leverages both edge density and node homogeneity by assuming communities to be formed by densely connected homogeneous nodes. Another method within the state-of-the-art, Circles [18], also leverages the same assumptions with a slightly different model. Many non-generative algorithms have also been proposed. Among them, some are based on graph metrics such as modularity [9] and maximal clique [11]. They are fundamentally leveraging the edge density assumption. As for node homogeneity, some algorithms firstly augment the graph with content links and then do clustering on the augmented graph [22, 34], while some others attempt to find partition that yields the minimum encoding cost based on information theory [21, 3]. They do not scale trivially to networks with high-dimensional noisy attributes. 5.2 Network embedding As we discuss in Sec 1, we aim to find node features that are suitable for the task of community detection. However, unlike the the techniques that generate features from graphs based on network structures [14, 27, 28], we frame the feature creation procedure as learning a community oriented representation of each node. Moreover, we aim to inject supervision into the learning procedure, instead of the unsupervised ways based on relational data in the networks [1, 30] or structural relationships between concepts [10, 16]. We note that while it is not a trivial task, some recent network embedding algorithms like [12, 19, 26] are flexible enough to consider node attribute and the supervision of community memberships. E.g., we can build a heterogeneous network with attributes and labels as augmented nodes and then apply the truncated random walks of [19, 12] on it, or learn the bipartite network embeddings on it as done in [25]. However, those algorithms can not handle the out-of-sample situations, so the label information is not properly leveraged for the embedding of unlabeled nodes. Since the supervision available is usually limited and only on a small part of the data, the performance is expected to be inferior. Finally we argue that the extension of their framework into to our situation leads to networks that are too large for the algorithms to be efficient, especially when the number of attributes are large. 5.3 Node classification Traditional supervised learning on networks is essentially node classification [5, 13, 27, 36], which leverages node attributes (labels) and network structures for the tasks like attribute prediction. However, although we utilize supervision, since our final goal is to perform unsupervised detection of communities based on the community oriented features we learn under supervision, our problem is still quite different from node classification. To be more specific, our predictions are new clusters of nodes, for which no predefined categories

8 % 7% 8% 9% 10% 11% 12% 13% 14% 15% (a)f1 on Facebook 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% (d) on Flickr % 7% 8% 9% 10% 11% 12% 13% 14% 15% (b) on Facebook 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% (e)f1 on Twitter Figure 6: Varying the training set portion % 7% 8% 9% 10% 11% 12% 13% 14% 15% 5 (c)f1 on Flickr 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% (f) on Twitter (a)f1 on Facebook 0.02 (d) on Flickr (b) on Facebook (e)f1 on Twitter F1-score (c)f1 on Flickr (f) on Twitter Figure 7: Varying the number of random-walk transitions and the embedding dimensions. or labeled data are available at all. As a consequence, node classification algorithms are not applicable to our problem. 6. CONCLUSION In this paper, we doubt the generality of the two widely trusted assumptions for all existing community detection methods, i.e., the edge density assumption and node homogeneity assumption. We further prove that those assumptions have led to inaccurate community detection results for various mainstream algorithms within the state-ofthe-art through data analysis on a real-world social network dataset with explicit manual community labels. To deal with this deficiency, we propose to leverage limited amount of labeled communities as examples. For this purpose, we design to inject supervision into the problem of community detection by learning an embedding model for the whole network, which captures community shapes in terms of both node attributes and link structures, as indicated by some community examples. Generic clustering algorithms performed on the embeddings then yield reliable community detection results on the whole network. Recently, there has been a trend in applying successful NLP algorithms, especially those involving deep learning architectures, to solve social network tasks. To the best of our knowledge, we are the first to successfully apply RNN to social networks. Our key leverage is the expressiveness of RNN in modeling high-dimensional noisy node attributes and the coherent combination with random-walk layers to incorporate link structures. The out-of-sample embedding model is learned in an end-to-end deep framework to effectively leverage community supervision and the mutual reinforcement between attributes and links. It is implemented using TensorFlow and runs efficiently on GPU. While is originally designed for leveraging supervision for robust community detection free from falsifiable assumptions, it can be easily applied to many network learning tasks to coherently leverage node attributes, link structures and various kinds of supervision. As we observe from the experiments, is especially advantageous in dealing with high-dimensional noisy attributes, such as bags of hashtags and even raw texts.

9 7. REFERENCES [1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In WWW, pages ACM, [2] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 9(Sep): , [3] L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. Pics: Parameter-free identification of cohesive subgroups in large attributed graphs. In SDM, pages SIAM, [4] A. Banerjee, C. Krumpelman, J. Ghosh, S. Basu, and R. J. Mooney. Model-based overlapping clustering. In SIGKDD, pages ACM, [5] S. Bhagat, G. Cormode, and S. Muthukrishnan. Node classification in social networks. In Social network data analytics, pages Springer, [6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3: , [7] T. Chakraborty, S. Patranabis, P. Goyal, and A. Mukherjee. On the formation of circles in co-authorship networks. In SIGKDD, pages ACM, [8] J. Chang and D. M. Blei. Relational topic models for document networks. In AIStats, volume 9, pages 81 88, [9] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, [10] G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30 42, [11] N. Du, B. Wu, X. Pei, B. Wang, and L. Xu. Community detection in large-scale social networks. In WebKDD, pages ACM, [12] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. [13] J. He, M. Li, H.-J. Zhang, H. Tong, and C. Zhang. Manifold-ranking based image retrieval. In Proceedings of the 12th annual ACM international conference on Multimedia, pages ACM, [14] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It s who you know: graph mining using recursive structural features. In SIGKDD, pages ACM, [15] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8): , [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages , [17] R. Li, C. Wang, and K. C.-C. Chang. User profiling in an ego network: co-profiling attributes and relationships. In WWW, pages ACM, [18] J. J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In NIPS, volume 2012, pages , [19] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages ACM, [20] X. Qiu and X. Huang. Convolutional neural tensor network architecture for community-based question answering. In IJCAI, pages , [21] M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4): , [22] Y. Ruan, D. Fuhry, and S. Parthasarathy. Efficient community detection in large networks using content and links. In WWW, pages , [23] A. P. Streich, M. Frank, D. Basin, and J. M. Buhmann. Multi-assignment clustering for boolean data. In ICML, pages ACM, [24] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages , [25] J. Tang, M. Qu, and Q. Mei. Pte: Predictive text embedding through large-scale heterogeneous text networks. In SIGKDD, pages ACM, [26] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In WWW, pages ACM, [27] L. Tang and H. Liu. Relational learning via latent social dimensions. In SIGKDD, pages ACM, [28] L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3): , [29] X. Wang, L. Tang, H. Liu, and L. Wang. Learning with multi-resolution overlapping communities. Knowledge and Information Systems (KAIS), [30] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE transactions on pattern analysis and machine intelligence, 29(1):40 51, [31] J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In WSDM, pages ACM, [32] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1): , [33] J. Yang, J. McAuley, and J. Leskovec. Community detection in networks with node attributes. In ICDM, pages IEEE, [34] T. Yang, R. Jin, Y. Chi, and S. Zhu. Combining link and content for community detection: a discriminative approach. In SIGKDD, pages ACM, [35] W. Zhou, H. Jin, and Y. Liu. Community discovery and profiling with social messages. In SIGKDD, pages ACM, [36] X. Zhu, Z. Ghahramani, J. Lafferty, et al. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, volume 3, pages , 2003.

arxiv: v1 [cs.mm] 12 Jan 2016

arxiv: v1 [cs.mm] 12 Jan 2016 Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology

More information

Network embedding. Cheng Zheng

Network embedding. Cheng Zheng Network embedding Cheng Zheng Outline Problem definition Factorization based algorithms --- Laplacian Eigenmaps(NIPS, 2001) Random walk based algorithms ---DeepWalk(KDD, 2014), node2vec(kdd, 2016) Deep

More information

GraphGAN: Graph Representation Learning with Generative Adversarial Nets

GraphGAN: Graph Representation Learning with Generative Adversarial Nets The 32 nd AAAI Conference on Artificial Intelligence (AAAI 2018) New Orleans, Louisiana, USA GraphGAN: Graph Representation Learning with Generative Adversarial Nets Hongwei Wang 1,2, Jia Wang 3, Jialin

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

An Empirical Analysis of Communities in Real-World Networks

An Empirical Analysis of Communities in Real-World Networks An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization

More information

GraphNet: Recommendation system based on language and network structure

GraphNet: Recommendation system based on language and network structure GraphNet: Recommendation system based on language and network structure Rex Ying Stanford University rexying@stanford.edu Yuanfang Li Stanford University yli03@stanford.edu Xin Li Stanford University xinli16@stanford.edu

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

27: Hybrid Graphical Models and Neural Networks

27: Hybrid Graphical Models and Neural Networks 10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look

More information

DeepWalk: Online Learning of Social Representations

DeepWalk: Online Learning of Social Representations DeepWalk: Online Learning of Social Representations ACM SIG-KDD August 26, 2014, Rami Al-Rfou, Steven Skiena Stony Brook University Outline Introduction: Graphs as Features Language Modeling DeepWalk Evaluation:

More information

node2vec: Scalable Feature Learning for Networks

node2vec: Scalable Feature Learning for Networks node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database

More information

arxiv: v1 [cs.si] 12 Jan 2019

arxiv: v1 [cs.si] 12 Jan 2019 Predicting Diffusion Reach Probabilities via Representation Learning on Social Networks Furkan Gursoy furkan.gursoy@boun.edu.tr Ahmet Onur Durahim onur.durahim@boun.edu.tr arxiv:1901.03829v1 [cs.si] 12

More information

arxiv: v1 [cs.si] 3 Oct 2017

arxiv: v1 [cs.si] 3 Oct 2017 Relationship Profiling over Social Networks: Reverse Smoothness from Similarity to Closeness arxiv:1710.01363v1 [cs.si] 3 Oct 2017 ABSTRACT Carl Yang University of Illinois, Urbana Champaign 201 N. Goodwin

More information

Empirical Evaluation of RNN Architectures on Sentence Classification Task

Empirical Evaluation of RNN Architectures on Sentence Classification Task Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1 Introduction Traffic congesting is wasteful of time,

More information

Deep Learning Applications

Deep Learning Applications October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important

More information

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

Learning to Discover Social Circles in Ego Networks

Learning to Discover Social Circles in Ego Networks Learning to Discover Social Circles in Ego Networks Julian McAuley Stanford jmcauley@cs.stanford.edu Jure Leskovec Stanford jure@cs.stanford.edu Abstract Our personal social networks are big and cluttered,

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan Explore Co-clustering on Job Applications Qingyun Wan SUNet ID:qywan 1 Introduction In the job marketplace, the supply side represents the job postings posted by job posters and the demand side presents

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

A Unified Framework to Integrate Supervision and Metric Learning into Clustering

A Unified Framework to Integrate Supervision and Metric Learning into Clustering A Unified Framework to Integrate Supervision and Metric Learning into Clustering Xin Li and Dan Roth Department of Computer Science University of Illinois, Urbana, IL 61801 (xli1,danr)@uiuc.edu December

More information

Deep Learning on Graphs

Deep Learning on Graphs Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 6 April 2017 joint work with Max Welling (University of Amsterdam) The success story of deep

More information

Jure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Jure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research Jure Leskovec, Cornell/Stanford University Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research Network: an interaction graph: Nodes represent entities Edges represent interaction

More information

Deep Learning on Graphs

Deep Learning on Graphs Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 22 March 2017 joint work with Max Welling (University of Amsterdam) BDL Workshop @ NIPS 2016

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in

More information

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio Presenter: Yu-Wei Lin Background: Recurrent Neural

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

arxiv: v1 [cs.si] 7 Sep 2018

arxiv: v1 [cs.si] 7 Sep 2018 dyngraph2vec: Capturing Network Dynamics using Dynamic Graph Representation Learning Palash Goyal USC Information Sciences Institute palashgo@usc.edu Sujit Rokka Chhetri * University of California-Irvine

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Pixel-level Generative Model

Pixel-level Generative Model Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,

More information

Unsupervised User Identity Linkage via Factoid Embedding

Unsupervised User Identity Linkage via Factoid Embedding Unsupervised User Identity Linkage via Factoid Embedding Wei Xie, Xin Mu, Roy Ka-Wei Lee, Feida Zhu and Ee-Peng Lim Living Analytics Research Centre Singapore Management University, Singapore Email: {weixie,roylee.2013,fdzhu,eplim}@smu.edu.sg

More information

Classify My Social Contacts into Circles Stanford University CS224W Fall 2014

Classify My Social Contacts into Circles Stanford University CS224W Fall 2014 Classify My Social Contacts into Circles Stanford University CS224W Fall 2014 Amer Hammudi (SUNet ID: ahammudi) ahammudi@stanford.edu Darren Koh (SUNet: dtkoh) dtkoh@stanford.edu Jia Li (SUNet: jli14)

More information

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

TriRank: Review-aware Explainable Recommendation by Modeling Aspects TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia

More information

Image Captioning with Object Detection and Localization

Image Captioning with Object Detection and Localization Image Captioning with Object Detection and Localization Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

More information

arxiv: v1 [cs.si] 14 Mar 2017

arxiv: v1 [cs.si] 14 Mar 2017 SNE: Signed Network Embedding Shuhan Yuan 1, Xintao Wu 2, and Yang Xiang 1 1 Tongji University, Shanghai, China, Email:{4e66,shxiangyang}@tongji.edu.cn 2 University of Arkansas, Fayetteville, AR, USA,

More information

Crowd Scene Understanding with Coherent Recurrent Neural Networks

Crowd Scene Understanding with Coherent Recurrent Neural Networks Crowd Scene Understanding with Coherent Recurrent Neural Networks Hang Su, Yinpeng Dong, Jun Zhu May 22, 2016 Hang Su, Yinpeng Dong, Jun Zhu IJCAI 2016 May 22, 2016 1 / 26 Outline 1 Introduction 2 LSTM

More information

Plagiarism Detection Using FP-Growth Algorithm

Plagiarism Detection Using FP-Growth Algorithm Northeastern University NLP Project Report Plagiarism Detection Using FP-Growth Algorithm Varun Nandu (nandu.v@husky.neu.edu) Suraj Nair (nair.sur@husky.neu.edu) Supervised by Dr. Lu Wang December 10,

More information

Web Structure Mining Community Detection and Evaluation

Web Structure Mining Community Detection and Evaluation Web Structure Mining Community Detection and Evaluation 1 Community Community. It is formed by individuals such that those within a group interact with each other more frequently than with those outside

More information

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function:

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function: 1 Query-adaptive Image Retrieval by Deep Weighted Hashing Jian Zhang and Yuxin Peng arxiv:1612.2541v2 [cs.cv] 9 May 217 Abstract Hashing methods have attracted much attention for large scale image retrieval.

More information

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan

More information

A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network

A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network A Comparison of Pattern-Based Spectral Clustering Algorithms in Directed Weighted Network Sumuya Borjigin 1. School of Economics and Management, Inner Mongolia University, No.235 West College Road, Hohhot,

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR P.SHENBAGAVALLI M.E., Research Scholar, Assistant professor/cse MPNMJ Engineering college Sspshenba2@gmail.com J.SARAVANAKUMAR B.Tech(IT)., PG

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

A Recommender System Based on Improvised K- Means Clustering Algorithm

A Recommender System Based on Improvised K- Means Clustering Algorithm A Recommender System Based on Improvised K- Means Clustering Algorithm Shivani Sharma Department of Computer Science and Applications, Kurukshetra University, Kurukshetra Shivanigaur83@yahoo.com Abstract:

More information

Community Detection Using Node Attributes and Structural Patterns in Online Social Networks

Community Detection Using Node Attributes and Structural Patterns in Online Social Networks Computer and Information Science; Vol. 10, No. 4; 2017 ISSN 1913-8989 E-ISSN 1913-8997 Published by Canadian Center of Science and Education Community Detection Using Node Attributes and Structural Patterns

More information

Hierarchical Mixed Neural Network for Joint Representation Learning of Social-Attribute Network

Hierarchical Mixed Neural Network for Joint Representation Learning of Social-Attribute Network Hierarchical Mixed Neural Network for Joint Representation Learning of Social-Attribute Network Weizheng Chen 1(B), Jinpeng Wang 2, Zhuoxuan Jiang 1, Yan Zhang 1, and Xiaoming Li 1 1 School of Electronics

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA)

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) 1 S. ADAEKALAVAN, 2 DR. C. CHANDRASEKAR 1 Assistant Professor, Department of Information Technology, J.J. College of Arts and Science, Pudukkottai,

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia, Yan Pan, Cong Liu (Sun Yat-Sen University) Hanjiang Lai, Shuicheng Yan (National University of Singapore) Finding Similar

More information

Community-Based Recommendations: a Solution to the Cold Start Problem

Community-Based Recommendations: a Solution to the Cold Start Problem Community-Based Recommendations: a Solution to the Cold Start Problem Shaghayegh Sahebi Intelligent Systems Program University of Pittsburgh sahebi@cs.pitt.edu William W. Cohen Machine Learning Department

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

jldadmm: A Java package for the LDA and DMM topic models

jldadmm: A Java package for the LDA and DMM topic models jldadmm: A Java package for the LDA and DMM topic models Dat Quoc Nguyen School of Computing and Information Systems The University of Melbourne, Australia dqnguyen@unimelb.edu.au Abstract: In this technical

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Unsupervised Learning

Unsupervised Learning Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

A Study of MatchPyramid Models on Ad hoc Retrieval

A Study of MatchPyramid Models on Ad hoc Retrieval A Study of MatchPyramid Models on Ad hoc Retrieval Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences Text Matching Many text based

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN: Semi Automatic Annotation Exploitation Similarity of Pics in i Personal Photo Albums P. Subashree Kasi Thangam 1 and R. Rosy Angel 2 1 Assistant Professor, Department of Computer Science Engineering College,

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

Domain-specific user preference prediction based on multiple user activities

Domain-specific user preference prediction based on multiple user activities 7 December 2016 Domain-specific user preference prediction based on multiple user activities Author: YUNFEI LONG, Qin Lu,Yue Xiao, MingLei Li, Chu-Ren Huang. www.comp.polyu.edu.hk/ Dept. of Computing,

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things Koji Maruhashi An important problem in information and communications technology (ICT) is classifying

More information

Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation

Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation Yun He, Haochen Chen, Ziwei Zhu, James Caverlee Department of Computer Science and Engineering, Texas A&M University Department

More information

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

Deep MIML Network. Abstract

Deep MIML Network. Abstract Deep MIML Network Ji Feng and Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China Collaborative Innovation Center of Novel Software Technology

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

1 Introduction. 3 Data Preprocessing. 2 Literature Review

1 Introduction. 3 Data Preprocessing. 2 Literature Review Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Community Detection: Comparison of State of the Art Algorithms

Community Detection: Comparison of State of the Art Algorithms Community Detection: Comparison of State of the Art Algorithms Josiane Mothe IRIT, UMR5505 CNRS & ESPE, Univ. de Toulouse Toulouse, France e-mail: josiane.mothe@irit.fr Karen Mkhitaryan Institute for Informatics

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Lab meeting (Paper review session) Stacked Generative Adversarial Networks

Lab meeting (Paper review session) Stacked Generative Adversarial Networks Lab meeting (Paper review session) Stacked Generative Adversarial Networks 2017. 02. 01. Saehoon Kim (Ph. D. candidate) Machine Learning Group Papers to be covered Stacked Generative Adversarial Networks

More information