On Application-Aware Data Extraction for Big Data in Social Networks
1 On Application-Aware Data Extraction for Big Data in Social Networks. Ming-Syan Chen, Research Center for Information Tech. Innovation, Academia Sinica; EE Department, National Taiwan Univ.
2 Rapid Growth of Social Network Activities: Example social networks include Twitter, Facebook, Flickr, MSN, Wikipedia, and Amazon.com. Such a network is huge in size and cannot easily be analyzed.
3 The Amount of Information Is Huge! Twitter: 150+ million members, 50 million tweets per day. Facebook: 800+ million users. Amazon co-purchasing network: half a million product nodes, several million recommendation links. Web pages: Yahoo! indexes over one billion Web pages. (Sources: twitter.com; Amazon data from SNSP.)
4 Example of Big Data and Social Network: Volume: thousands of people! Velocity: accumulating fast!! Variety: eating different food!!!
5 Example of Big Data and Social Network: For gossip on such an occasion, veracity is an issue and the value of the information could be low: "Mrs. Chang just had a face-lift!" "Mr. Lin won the lottery!"
6 Information Extraction for Big Data in Social Networks: Extracting important information from large social network graphs, to allow data analysts to mine the information in large social networks, to enable scalable storage and querying, and to facilitate the development of real-world applications.
7 Outline: Graph reduction (summarization, sampling, and extraction); Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).
8 Graph Reduction: Graph summarization (going through all the data), e.g., NTU has 32K students; 20% are sushi lovers, 25% prefer steak, 15% are artists, 20% are engineers, etc. Graph sampling (going through a subset): getting a small representative set of NTU students (which preferably fits the statistics). Graph extraction: application/goal-oriented data extraction, e.g., picking only the good eaters for a feast contest.
9 Graph Extraction: "To handle complicated things with simple skills" (執簡御繁). Application/goal-oriented data extraction, with three levels of information extraction from SN graphs: parameter extraction (e.g., company statistics), via fast calculation of closeness centrality; information extraction (e.g., company business), via the guide query; and structure extraction (e.g., company organization), via decomposing SN graphs.
10 [Figure: panels labeled "Parameter extraction", "Structure extraction", and "Information extraction (regarding capability)", with the additional label "weapon".]
11 Outline: Graph reduction; Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).
12 Closeness Centrality: There are several interesting quantities in SN graphs, including closeness centrality, network diameter, and degree distribution. The closeness centrality of node v, $C_c(v)$, is the inverse of the average shortest-path distance from v to all other nodes in the network. If $C_c(v)$ is large, v is near the center, since it needs only a few hops to reach the others.
13 Response to Dynamic Changes: Edge insertions and deletions occur frequently in a social network, so it is desirable to quickly update the closeness centrality of every node in response to each insertion or deletion. Example use: pick a number of people (the nodes with high CC) who can maximize advertising effectiveness.
14 Example of Closeness Centrality: $C_c(v)$ is the inverse of the average shortest-path distance from v to the other nodes, $C_c(v) = \frac{|V| - 1}{\sum_{u \in V} |p(v,u)|}$, where $|p(v,u)|$ is the length of the shortest path between v and u. In the example graph G (unweighted and undirected, with 14 nodes and 18 edges), $C_c(w) > C_c(v)$; thus node w is closer to all other nodes than node v is.
15 Calculating Closeness Centrality: One can calculate the closeness centralities of all vertices by solving the All Pairs Shortest Paths (APSP) problem, which takes O(n(m+n)) time with breadth-first search (BFS) on an unweighted, undirected graph, where n and m are the numbers of nodes and edges. In a dynamic graph, re-solving APSP after each edge insertion or deletion is inefficient. Note that only some pairs of shortest paths are affected by a given edge change; identifying them (the unstable node pairs) enables fast calculation of CC.
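A minimal Python sketch of the BFS-based baseline, assuming a connected, unweighted, undirected graph stored as an adjacency dict (node to set of neighbors); `bfs_distances`, `closeness_all`, and `average_path_length` are illustrative names, not taken from the talk. One BFS per node yields all shortest-path lengths, which is where the O(n(m+n)) cost comes from.

```python
from collections import deque


def bfs_distances(adj, src):
    """Return hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_all(adj):
    """C_c(v) = (n - 1) / sum of shortest-path lengths from v; one BFS per node."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, v).values()) for v in adj)
    return total / (n * (n - 1))


# Path graph 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(closeness_all(path))  # {0: 0.5, 1: 0.75, 2: 0.75, 3: 0.5}
print(average_path_length(path))
```

Re-running `closeness_all` after every edge change repeats all n BFS traversals, which is exactly the cost the unstable-pair idea avoids.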
16 Example: With the addition of edge (a,b), some shortest paths are unchanged, e.g., p(b,v), p(c,t), and p(r,h), while others change. Before the insertion, p(a,b) = {a,d,w,b}, p(a,c) = {a,d,w,r,c}, and p(u,v) = {u,l,o,d,w,r,s,v}; after the insertion, p(a,b) = {a,b}, p(a,c) = {a,b,c}, and p(u,v) = {u,x,a,b,c,v}. We call the endpoints of such changed paths unstable node pairs. [Figure: (a) the original unweighted, undirected graph G; (b) $G' = G \cup e(a,b)$.]
17 Illustration of Unstable Node Pairs: To find $V_u$, the u-unstable node set (the nodes whose shortest paths to u change after the edge addition), we perform BFS at node u in G and G' to obtain the BFS trees $G_u$ and $G'_u$. We find that only the shortest paths p(u,b), p(u,c), p(u,h), p(u,v), and p(u,t) change, so the unstable node pairs are (u,b), (u,c), (u,h), (u,v), and (u,t), and $V_u = \{b,c,h,v,t\}$. [Figure: BFS trees $G_u$ and $G'_u$.]
18 (Main Theorem): After the addition of edge (a,b), every unstable node pair {v,u} (whose shortest path changed) has $v \in V_a$ and $u \in V_b$. Only these shortest paths can change after the edge addition, and only they need to be recalculated. [Figure: the node sets $V_a$ and $V_b$.]
19 Concurrent Calculation of CC in SN: Perform BFS at nodes a and b in parallel, in both G and G', to obtain $G_a$, $G'_a$, $V_a = \{b,c,h,v,t\}$ and $G_b$, $G'_b$, $V_b = \{a,x,l,u\}$ simultaneously. Then perform BFS starting at each node of $V_b$ (a, x, l, and u), and inform the nodes in these unstable pairs to recalculate their shortest paths to others and their CC. [Figure: timeline of these steps.]
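The theorem and the concurrent procedure suggest an incremental update. Below is a minimal Python sketch, not the authors' CENDY implementation, assuming a connected, unweighted, undirected graph given as an adjacency dict, no existing edge (a, b), and a precomputed per-node distance sum `dist_sum` (so that C_c(v) = (n - 1) / dist_sum[v]). The names `insert_edge` and `unstable` are hypothetical. For clarity it snapshots G and runs BFS in both G and G' from each source instead of reusing stored distances.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def insert_edge(adj, dist_sum, a, b):
    """Insert undirected edge (a, b) and update dist_sum in place.

    dist_sum[v] must hold sum_u d(v, u) for the graph before insertion.
    Returns (V_a, V_b), the a- and b-unstable node sets.
    """
    old_adj = {v: set(ns) for v, ns in adj.items()}  # snapshot of G
    adj[a].add(b)
    adj[b].add(a)  # adj is now G'

    def unstable(x):
        # Nodes whose shortest-path distance to x differs between G and G'.
        before, after = bfs_distances(old_adj, x), bfs_distances(adj, x)
        return {u for u in adj if before[u] != after[u]}

    v_a, v_b = unstable(a), unstable(b)
    # Main theorem: only pairs with one end in V_a and the other in V_b can
    # change, so BFS from the smaller of the two sets is enough.
    sources, targets = (v_b, v_a) if len(v_b) <= len(v_a) else (v_a, v_b)
    for v in sources:
        before, after = bfs_distances(old_adj, v), bfs_distances(adj, v)
        for u in targets:
            gain = before[u] - after[u]  # 0 if this pair is actually stable
            dist_sum[v] -= gain
            dist_sum[u] -= gain
    return v_a, v_b


# Path 0 - 1 - 2 - 3 - 4; inserting (0, 4) turns it into a 5-cycle.
adj = {i: set() for i in range(5)}
for i in range(4):
    adj[i].add(i + 1)
    adj[i + 1].add(i)
dist_sum = {v: sum(bfs_distances(adj, v).values()) for v in adj}
v_a, v_b = insert_edge(adj, dist_sum, 0, 4)
print(v_a, v_b)   # {3, 4} {0, 1}
print(dist_sum)   # {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}
```

Node 2 lies in neither unstable set, so its distance sum is never touched, mirroring how CENDY avoids recomputing stable nodes.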
20 Experiments: To evaluate CENDY, we conducted experiments on six real unit-weighted graph datasets of different types. Edge deletion can be handled similarly (by a companion theorem).
21 Experiments: Evaluation on Edge Insertion. The table shows that the closeness centralities of all vertices and the APL (average path length) can be updated with only a few BFS processes. E.g., DBLP contains roughly 460K nodes: the naive way requires 460K BFS processes to update closeness centrality and APL, whereas CENDY requires only 4K BFS processes to finish the task.
22 Remark: In response to fast changes in SNs, CENDY is devised to efficiently update the closeness centrality of each node in the social network. New algorithms are called for to efficiently calculate other key parameters in fast-changing social networks.
23 Outline: Graph reduction; Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).
24 Motivation of Guide Query: There are several works on information finding in social networks. Expert finding [Deng 08][Lappas 09]: find experts based on given requirements. Gateway finding [Koren 06][Wang 10]: find the gateways between a source group and a target group. Active friending [Wu]: explore social networks to improve friend finding. Guide query [Lin]: explore social networks to find the querier's informative friends. [Deng 08] ICDM; [Lappas 09] KDD; [Koren 06] KDD; [Wang 10] KDD; [Wu] KDD 20; [Lin] WAIM 20.
25 Motivation of Guide Query (Cont'd): With expert finding, the answer is a list of experts ranked by their expertise. With the guide query, the answer is a list of the querier's informative friends, ranked by their ability to gather information from experts; it exploits social relationships and takes the probability of getting help into consideration.
26 Guide Query: Graph Extraction Based on Your Friends. [Figure: the querier says "I want to know information about Company A or B." Annotations: "These two friends are who I should ask for information." "This friend is also who I should ask, since she can collect information from her friends." Nodes are labeled A, B, C, D, and E.]
27 Guide Query: In the guide query [Lin], for a user initiating the query, the answer is the user's neighbors that are informative about user-assigned attributes. An informative neighbor should either have the attributes itself or know other friends who have them. [Lin] Y.-C. Lin, P. S. Yu, M.-S. Chen, Guide Query in Social Networks, WAIM.
28 Problem Definition: Given a query node q and a set of keywords $W = \{w_1, w_2, \dots, w_{|W|}\}$, the guide query finds the top-k informative neighbors of q with respect to W. [Figure: example with q = N0 and W = {A, B}; the legend distinguishes target nodes from candidate nodes, and nodes are labeled with attributes such as {A}, {A, B}, {C}, and {D}.]
29 Problem (Cont'd): In the model, each edge is labeled with the probability that a node successfully spreads the request to the linked node. We rank the candidates by how informative they are, evaluated by the proposed InfScore and DivScore. [Figure: the example graph with edge probabilities such as P = 0.6, 0.7, 0.3, 0.2, and 0.8.]
30 InfScore: The InfScore is the informative level of a candidate node (i.e., its ability to spread the request to targets), modeled by the expected number of targets to which the candidate can spread the request. [Figure: the example graph around N0.]
31 InfScore: The InfRatio is defined as the probability that a specific candidate successfully spreads the request to a certain target; e.g., the InfRatio from N1 to N is 0.25. [Figure: the example graph with edges labeled P = 0.25.]
32 InfScore (Intensity): The InfScore is the weighted sum of InfRatio values. In the example, InfScore(N1) = 1.5, computed over N11, N12, and N (one term multiplied by 2), and InfScore(N4) = 1.5, computed over N4 and N41. [Figure: the example graph with edges labeled P = 0.25.]
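To make InfRatio and InfScore concrete, here is a hedged Python sketch on a hypothetical toy graph. It assumes (i) the region reachable from the candidate is a tree, so InfRatio is the product of edge probabilities along the single path, (ii) the candidate itself counts with InfRatio 1, and (iii) each target is weighted by how many query keywords it holds. The slide only says "weighted sum", so (iii) in particular is an assumption, and all names are illustrative.

```python
def inf_contributions(children, prob, attrs, keywords, candidate):
    """Return {target: weight * InfRatio} for one candidate.

    children: node -> list of child nodes below the candidate (assumed a tree).
    prob: (parent, child) -> probability the request is passed along that edge.
    attrs: node -> set of attributes; keywords: the query keyword set W.
    """
    contrib = {}
    stack = [(candidate, 1.0)]  # (node, InfRatio from the candidate to node)
    while stack:
        node, ratio = stack.pop()
        # Assumed weighting: number of query keywords the node holds.
        weight = len(attrs.get(node, set()) & keywords)
        if weight:
            contrib[node] = weight * ratio
        for child in children.get(node, []):
            stack.append((child, ratio * prob[(node, child)]))
    return contrib


def inf_score(contrib):
    """InfScore: weighted sum of the InfRatio values."""
    return sum(contrib.values())


# Hypothetical toy example: candidate c -> x -> y, each edge with P = 0.5.
children = {"c": ["x"], "x": ["y"]}
prob = {("c", "x"): 0.5, ("x", "y"): 0.5}
attrs = {"c": {"A"}, "x": {"C"}, "y": {"A", "B"}}
contrib = inf_contributions(children, prob, attrs, {"A", "B"}, "c")
print(contrib, inf_score(contrib))  # {'c': 1.0, 'y': 0.5} 1.5
```

In a general graph with several paths to a target, the reach probability would no longer be a single product; the tree assumption keeps the sketch exact for this toy case only.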
33 DivScore (Diversity): The DivScore is an entropy-like measure of the diversity of the possibly accessible target nodes. For each node, the target vector $X_T = [x_1, \dots, x_{|T|}]$ consists of normalized InfScore values, $x_i = s_i / \sum_j s_j$, where $s_i$ is the InfScore contribution of target $t_i$; it describes the probability distribution over the different targets. With the target vector, the DivScore is defined as $\mathrm{DivScore} = -\sum_{i=1}^{|T|} x_i \log_2 x_i$.
34 DivScore: We design the DivScore over the probability distribution of reaching each possibly accessible target. Example: the distribution of N3 is [0.5/1.5, 0.5/1.5, 0.25/1.5, 0.25/1.5] = [1/3, 1/3, 1/6, 1/6], so DivScore(N3) = 2 × [-(1/3) log2(1/3)] + 2 × [-(1/6) log2(1/6)] ≈ 1.918. [Figure: the example graph with edges labeled P = 0.25.]
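The DivScore example can be checked with a few lines of Python. `div_score` is a hypothetical helper that takes a candidate's per-target InfScore contributions, normalizes them into the target vector, and returns the base-2 entropy, as in the N3 computation.

```python
import math


def div_score(contributions):
    """Base-2 entropy of the normalized per-target InfScore contributions."""
    total = sum(contributions)
    probs = [c / total for c in contributions if c > 0]
    return -sum(p * math.log2(p) for p in probs)


# Per-target contributions of N3 from the example: 0.5, 0.5, 0.25, 0.25
print(round(div_score([0.5, 0.5, 0.25, 0.25]), 4))  # 1.9183
```

A candidate that reaches only one target scores 0, and the score grows as the reachable targets become more numerous and more evenly weighted.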
35 Experimental Setup: Dataset: the DBLP co-authorship network [DBLP]. Edge probability: based on the WC (weighted cascade) model [Chen 10], $p(n_i \to n_j) = 1 / d(n_j)$, where $d(n_j)$ is the in-degree of $n_j$. Node attribute: the conference names of an author's publications. [Chen 10] W. Chen, et al., Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks, KDD.
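A small Python sketch of assigning weighted cascade (WC) probabilities, assuming the network is given as a list of directed edges and that each undirected co-authorship link is listed in both directions; `weighted_cascade_probs` and the author names are illustrative.

```python
from collections import Counter


def weighted_cascade_probs(edges):
    """WC model: p(n_i -> n_j) = 1 / d(n_j), with d(n_j) the in-degree of n_j."""
    edges = list(edges)
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


# Hypothetical co-authorship links alice-bob and alice-carol, in both directions.
coauthors = [("alice", "bob"), ("alice", "carol")]
edges = coauthors + [(v, u) for u, v in coauthors]
probs = weighted_cascade_probs(edges)
print(probs[("bob", "alice")], probs[("alice", "bob")])  # 0.5 1.0
```

Under WC the probabilities on a node's incoming edges sum to 1, so an author with many co-authors is less likely to be reached through any single one of them.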
36 Experimental Results: Suppose Ming-Syan Chen wants to discuss with people who have published papers in KDD, SDM, CIKM, ICDM, and PKDD. Which coauthors should he contact first? (I.e., coauthors who have papers in these conferences, or coauthors who have coauthored with people who have such papers.) Query input: q = Ming-Syan Chen, k = 10, W = [KDD, SDM, CIKM, ICDM, PKDD].
37 Remark: The key notion is to guide the query to the right candidates in the social network. For each candidate, both the expertise and the social relationship with the person initiating the query are considered. As with group formation (KDD-12) and this expert-finding problem (WAIM-), more applications and tools can be enhanced when social relationships (SR) are considered.
38 Outline: Graph reduction; Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).
39 Diffusion Analysis in Social Networks: The diffusion of information can be used to model the interactions among nodes in a network, e.g., viruses spreading over the Internet, diseases spreading in a community, and rumors or news spreading among people.
40 Example Diffusion: Information diffusion happens in social networks such as Facebook and Twitter. [Figure: the underlying network over nodes n1 to n9 and the path of infection.]
41 The Network Is Hidden: In some situations, the underlying network is unknown (due to cost or privacy issues). The network inference problem (NIP) is studied to discover the underlying network, i.e., to infer the network from what happened. [Figure: nodes n1 to n9, some annotated with observed infection times.]
42 Network Inference Problem: Assume there is an underlying information network. NIP is to infer this network given a set of cascades. A cascade $t^s = [t^s_1, \dots, t^s_N]$ records the times at which information s spreads over the network (N is the number of nodes); i.e., node $n_i$ gets s (is infected) at time $t^s_i$. If node $n_i$ is never infected with s, set $t^s_i = \infty$. Ex.: $t^s = [\infty, \infty, 2, \infty, 0, 1]$. [Figure: nodes n1 to n6, with n5 infected at time 0, n6 at time 1, and n3 at time 2.]
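To illustrate how a cascade vector $t^s$ can arise, here is a minimal Python sketch that generates one under a discrete-time independent cascade model. The talk does not specify a generative model, so the model, the function name `simulate_cascade`, and the 0-based node indices (index i stands for node n(i+1)) are all assumptions.

```python
import math
import random


def simulate_cascade(adj, prob, source, seed=None):
    """Return t^s = [t_1, ..., t_N] for one cascade started at `source`.

    adj: list where adj[u] holds the neighbors u can pass information to.
    prob: (u, v) -> probability that u passes the information to v.
    Nodes that are never infected get math.inf.
    """
    rng = random.Random(seed)
    times = [math.inf] * len(adj)
    times[source] = 0
    frontier, step = [source], 0
    while frontier:
        step += 1
        newly_infected = []
        for u in frontier:
            for v in adj[u]:
                # Each newly infected u gets one chance to infect each uninfected v.
                if times[v] == math.inf and rng.random() < prob[(u, v)]:
                    times[v] = step
                    newly_infected.append(v)
        frontier = newly_infected
    return times


# n5 -> n6 -> n3 with certain transmission reproduces the example vector.
adj = [[], [], [], [], [5], [2]]  # indices 0..5 stand for n1..n6
prob = {(4, 5): 1.0, (5, 2): 1.0}
t = simulate_cascade(adj, prob, source=4, seed=0)
print(t)  # [inf, inf, 2, inf, 0, 1]
```

NIP works in the opposite direction: only vectors like `t` are observed, and `adj` and `prob` must be inferred.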
43 Clustering Cascades: Traditionally, NIP assumes a single underlying network, which may not hold in reality; e.g., sports news, political news, and entertainment news are likely to spread in different ways. Hence, we cluster cascades so that the cascades in each cluster spread in the same pattern. An SN graph is thus decomposed into application-specific graphs.
44 Example Cascades: [Figure: six cascades over nodes n1 to n6, each annotated with infection times: cascade a (Lakers news), cascade b (49ers news), cascade c (Redskins news), cascade d (Heat news), cascade e (Jets news), and cascade f (Celtics news).]
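45 To Model the Inference Network: If two nodes are always infected within a short time of each other, the weight between them should be large: $w_{ij} = \frac{1}{|\{s : t^s_i < t^s_j\}|} \sum_{s : t^s_i < t^s_j} \frac{1}{t^s_j - t^s_i}$. Consider $w_{12}$ as an example: $\{s : t^s_1 < t^s_2\} = \{b, c, e\}$, so $w_{12} = \frac{1}{3}\left(\frac{1}{t^b_2 - t^b_1} + \frac{1}{t^c_2 - t^c_1} + \frac{1}{t^e_2 - t^e_1}\right)$.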
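A direct Python transcription of the weight formula, as a sketch. Cascades are lists of infection times with `math.inf` for never-infected nodes. Taking the formula literally, a cascade in which $n_i$ is infected but $n_j$ is not satisfies $t^s_i < t^s_j$ and contributes $1/\infty = 0$; whether the original work counts such cascades is an assumption. `edge_weight` and `infer_network` are hypothetical names, and the example uses only the cascades $t_a$, $t_b$, $t_c$ whose times are listed on the K-means slide.

```python
import math


def edge_weight(cascades, i, j):
    """w_ij: average of 1 / (t_j^s - t_i^s) over cascades s with t_i^s < t_j^s."""
    # If t_j^s is inf, 1.0 / inf evaluates to 0.0.
    terms = [1.0 / (t[j] - t[i]) for t in cascades if t[i] < t[j]]
    return sum(terms) / len(terms) if terms else 0.0


def infer_network(cascades):
    """All nonzero directed weights w_ij between distinct nodes."""
    n = len(cascades[0])
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                w = edge_weight(cascades, i, j)
                if w > 0:
                    weights[(i, j)] = w
    return weights


inf = math.inf
t_a = [inf, inf, inf, inf, 0, 1]
t_b = [0, inf, inf, 1, inf, inf]
t_c = [0, 1, 2, inf, inf, inf]
# Index 0 is n1 and index 1 is n2: t_b gives 1/inf = 0, t_c gives 1/1.
print(edge_weight([t_a, t_b, t_c], 0, 1))  # 0.5
```

Running `infer_network` on each cluster of cascades separately, rather than on all of them at once, yields one weighted graph per cluster.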
46 Example Inference Network: [Figure: the inferred weighted network over nodes n1 to n6.]
47 To Cluster Cascades by K-Means: Transform each cascade t into an N-dimensional indicator vector based on whether each node is infected. Ex.: $t_a = [\infty, \infty, \infty, \infty, 0, 1] \to [0,0,0,0,1,1]$; $t_b = [0, \infty, \infty, 1, \infty, \infty] \to [1,0,0,1,0,0]$; $t_c = [0, 1, 2, \infty, \infty, \infty] \to [1,1,1,0,0,0]$. Run K-means to get the clustering result: (a, d, f) and (b, c, e).
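The indicator transform and clustering step can be sketched in plain Python. This is standard Lloyd's K-means seeded with the first k vectors (a simplification; a library routine such as scikit-learn's `KMeans` would normally be used), and `indicator` and `kmeans` are hypothetical names. Only $t_a$, $t_b$, $t_c$ are used, since the other cascades' times are not listed.

```python
import math


def indicator(cascade):
    """N-dim 0/1 vector: 1 if the node was ever infected (finite time)."""
    return [0 if t == math.inf else 1 for t in cascade]


def kmeans(points, k, iters=100):
    """Lloyd's K-means with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


inf = math.inf
cascades = {
    "a": [inf, inf, inf, inf, 0, 1],
    "b": [0, inf, inf, 1, inf, inf],
    "c": [0, 1, 2, inf, inf, inf],
}
points = [indicator(t) for t in cascades.values()]
labels = kmeans(points, k=2)
print(dict(zip(cascades, labels)))  # {'a': 0, 'b': 1, 'c': 1}
```

Because the indicator vectors ignore timing, cascades are grouped by which nodes they reach, which matches the "based on which nodes are infected" criterion used for decomposition.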
48 Graph Decomposition: By considering cascades {a, d, f} and cascades {b, c, e} separately (based on which nodes are infected), the original SN graph is decomposed according to the information carried. [Figure: two inferred networks over nodes n1 to n6, one for cascades {a, d, f} (NBA) and one for cascades {b, c, e} (NFL), with edge weights such as 0.25.]
49 Remark: Traditionally, NIP results in a dense and complex network from which it is difficult to extract knowledge. By properly clustering cascades, we obtain a few concise networks that carry clearer information, and these networks match their corresponding cascades better than a single dense network does.
50 Conclusion: Information extraction is an application/goal-oriented process that captures the key ingredients (parameters, information, structure, etc.) of a huge SN. The information extraction procedure can be integrated into related processes for better efficiency in practice.
51 Thank you!
52 Graph Summarization: Condense the original graph into a more compact form, with lossless and lossy methods; requires examining the entire network. [Figure: a graph G and its summary Gs over supernodes a, b, c, d, where Sa = {2,3}, Sb = {1,9}, Sc = {7,8,10}, and Sd = {4,5,6}, with edges {5, 10} and {6, 10} listed separately; a revised example from S. Navlakha et al., Graph Summarization with Bounded Error, SIGMOD 08.]
53 Graph Sampling: Select a subset of the original data such that the characteristics of the original graph are preserved; only a proportion of the nodes in the network are visited. [Figure: a sampling illustration, plotted by NodeXL, an Excel template created by the NodeXL team at Microsoft Research.]
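54 A Running Example of CENDY: Originally, we have the closeness centralities of all nodes, e.g., $C_c(x) = \frac{14-1}{\sum_{u} |p(x,u)|}$, and the average path length of the graph, $L_G = \frac{1}{14(14-1)} \sum_{v} \sum_{u \neq v} |p(v,u)|$, computed from the all-pairs distance matrix A over nodes a, b, c, d, h, l, o, r, s, t, u, v, w, x. [Figure: an unweighted, undirected graph G with 14 nodes and 18 edges.]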
55 Example (Cont'd): For the insertion of edge e(a,b), we perform BFS at node a in G and G' to obtain $G_a$ and $G'_a$, and then have $V_a = \{b,c,h,v,t\}$. [Figure: BFS trees $G_a$ and $G'_a$.]
56 Example (Cont'd): Also, we perform BFS at node b in G and G' to obtain $G_b$ and $G'_b$, and then have $V_b = \{a,x,l,u\}$. [Figure: BFS trees $G_b$ and $G'_b$.]
57 Example (Cont'd): Then, in light of the main theorem, we recalculate the paths between $V_a$ and $V_b$. For example, for node $x \in V_b$, we calculate (1) $|p(x,t)| - |p'(x,t)| = 7 - (1+1+3) = 2$; (2) $|p(x,h)| - |p'(x,h)| = 6 - 4 = 2$; (3) $|p(x,v)| - |p'(x,v)| = 6 - 4 = 2$; (4) $|p(x,c)| - |p'(x,c)| = 5 - 3 = 2$; (5) $|p(x,b)| - |p'(x,b)| = 4 - 2 = 2$; and then update its closeness centrality: $C'_c(x) = \frac{14-1}{47 - [(1)+(2)+(3)+(4)+(5)]} = \frac{13}{37}$. [Figure: BFS trees $G_x$ and $G'_x$.]
58 Example (Cont'd): Finally, we update the closeness centralities of the affected nodes and recalculate the APL $L_G$ from the updated distance matrix A over nodes a, b, c, d, h, l, o, r, s, t, u, v, w, x, normalized by 14(14 - 1).
59 Example Scenario: N0 initiates a query to find a job in company A or company B. Which friend should N0 ask for information? [Figure: the example graph around N0, with nodes labeled by attributes {A}, {A, B}, {C}, and {D}.]
60 New Contributions: Compared with M. Gomez-Rodriguez, J. Leskovec, and A. Krause, Inferring Networks of Diffusion and Influence, KDD 10, our work is unique in that: 1. We assume there could be many underlying networks (rather than only one). 2. We model and learn a weighted graph (rather than an unweighted one).
More informationMaximizing the Spread of Influence through a Social Network
Maximizing the Spread of Influence through a Social Network By David Kempe, Jon Kleinberg, Eva Tardos Report by Joe Abrams Social Networks Infectious disease networks Viral Marketing Viral Marketing Example:
More informationElection Analysis and Prediction Using Big Data Analytics
Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India
More informationHow to organize the Web?
How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper
More informationDatabases 2 (VU) ( / )
Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:
More informationMCL. (and other clustering algorithms) 858L
MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted
More informationAMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK
AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK PRESENTED BY: DEEVEN PAUL ADITHELA 2708705 OUTLINE INTRODUCTION DIFFERENT TYPES OF FILTERING
More informationUser-centric Cross-network Social Multimedia Computing
Part III User-centric Cross-network Social Multimedia Computing Jitao Sang Multimedia Computing Group National Lab of Pattern Recognition, Institute of Automation Chinese Academy of Sciences Big Data &
More informationAdvanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions
Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions XIAOCHEN HUANG Computer Science Dept. Worcester Polytechnic
More informationPositive and Negative Links
Positive and Negative Links Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz May 4, 2015 Elisabeth Lex (KTI, TU Graz) Networks May 4, 2015 1 / 66 Outline 1 Repetition 2 Motivation 3 Structural Balance
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationMath 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search
Math 443/543 Graph Theory Notes 0: Small world phenomenon and decentralized search David Glickenstein November 0, 008 Small world phenomenon The small world phenomenon is the principle that all people
More informationGraph Symmetry and Social Network Anonymization
Graph Symmetry and Social Network Anonymization Yanghua XIAO ( 肖仰华 ) School of computer science Fudan University For more information, please visit http://gdm.fudan.edu.cn Graph isomorphism determination
More informationREVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK
REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK 1 Dr.R.Kousalya, 2 T.Sindhupriya 1 Research Supervisor, Professor & Head, Department of Computer Applications, Dr.N.G.P Arts and Science College, Coimbatore
More informationProbabilistic Graph Summarization
Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationMaximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos
Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg and Eva Tardos Group 9 Lauren Thomas, Ryan Lieblein, Joshua Hammock and Mary Hanvey Introduction In a social network,
More informationNew Challenges in Big Data: Technical Perspectives. Hwanjo Yu POSTECH
New Challenges in Big Data: Technical Perspectives Hwanjo Yu POSTECH http:/hwanjoyu.org Over 1 Billion SNS users!! Viral Marketing Word-of-Mouth Effect > TV advertising......... Influence Maximization
More informationModeling Dynamic Behavior in Large Evolving Graphs
Modeling Dynamic Behavior in Large Evolving Graphs R. Rossi, J. Neville, B. Gallagher, and K. Henderson Presented by: Doaa Altarawy 1 Outline - Motivation - Proposed Model - Definitions - Modeling dynamic
More informationDepartment of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB
Department of Computer Science & Engineering The Graduate School, Chung-Ang University CAU Artificial Intelligence LAB 1 / 17 Text data is exploding on internet because of the appearance of SNS, such as
More informationChapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33
Chapter 10 Fundamental Network Algorithms M. E. J. Newman May 6, 2015 M. E. J. Newman Chapter 10 May 6, 2015 1 / 33 Table of Contents 1 Algorithms for Degrees and Degree Distributions Degree-Degree Correlation
More informationOverview of the Stateof-the-Art. Networks. Evolution of social network studies
Overview of the Stateof-the-Art in Social Networks INF5370 spring 2014 Evolution of social network studies 1950-1970: mathematical studies of networks formed by the actual human interactions Pandemics,
More informationUniversity of Maryland. Tuesday, March 2, 2010
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/6/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine
More informationMeasuring and Evaluating Dissimilarity in Data and Pattern Spaces
Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Irene Ntoutsi, Yannis Theodoridis Database Group, Information Systems Laboratory Department of Informatics, University of Piraeus, Greece
More informationRecommender Systems 6CCS3WSN-7CCSMWAL
Recommender Systems 6CCS3WSN-7CCSMWAL http://insidebigdata.com/wp-content/uploads/2014/06/humorrecommender.jpg Some basic methods of recommendation Recommend popular items Collaborative Filtering Item-to-Item:
More informationInfluence Maximization in the Independent Cascade Model
Influence Maximization in the Independent Cascade Model Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj Gran Sasso Science Institute (GSSI), Viale F. Crispi, 7, 67100, L Aquila, Italy. {gianlorenzo.dangelo,lorenzo.severini,yllka.velaj}@gssi.infn.it
More informationMining Trusted Information in Medical Science: An Information Network Approach
Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationLarge Scale Graph Algorithms
Large Scale Graph Algorithms A Guide to Web Research: Lecture 2 Yury Lifshits Steklov Institute of Mathematics at St.Petersburg Stuttgart, Spring 2007 1 / 34 Talk Objective To pose an abstract computational
More informationFunctionality, Challenges and Architecture of Social Networks
Functionality, Challenges and Architecture of Social Networks INF 5370 Outline Social Network Services Functionality Business Model Current Architecture and Scalability Challenges Conclusion 1 Social Network
More informationInferring the Underlying Structure of Information Cascades
212 IEEE 12th International Conference on Data Mining Inferring the Underlying Structure of Information Cascades Bo Zong, Yinghui Wu, Ambuj K. Singh, and Xifeng Yan University of California at Santa Barbara
More informationCombining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,
More informationBig Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety
Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationarxiv: v1 [cs.si] 12 Jan 2019
Predicting Diffusion Reach Probabilities via Representation Learning on Social Networks Furkan Gursoy furkan.gursoy@boun.edu.tr Ahmet Onur Durahim onur.durahim@boun.edu.tr arxiv:1901.03829v1 [cs.si] 12
More informationDistance-based Outlier Detection: Consolidation and Renewed Bearing
Distance-based Outlier Detection: Consolidation and Renewed Bearing Gustavo. H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang, Srinivasan Parthasarathy September 15, 2010 Table of contents Introduction
More informationExamination examples
Examination examples Databasteknik (5 hours) 1. Relational Algebra & SQL (4 pts total; 2 pts each). Part A Consider the relations R(A, B), and S(C, D). Of the following three equivalences between expressions
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]
More informationReduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs
Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation
More informationUsing Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships
Using Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships Mohammad Jaber 1, Panagiotis Papapetrou 2, Sven Helmer 3, Peter T. Wood 1 1 Department of Comp. Sci. and Info. Systems,
More informationInternational Journal of Advance Engineering and Research Development. Performance Enhancement of Search System
Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade
More informationFPGP: Graph Processing Framework on FPGA
FPGP: Graph Processing Framework on FPGA Guohao DAI, Yuze CHI, Yu WANG, Huazhong YANG E.E. Dept., TNLIST, Tsinghua University dgh14@mails.tsinghua.edu.cn 1 Big graph is widely used Big graph is widely
More informationIntroduction to Data Science
UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics
More informationValue Added Association Rules
Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency
More information