On Application-Aware Data Extraction for Big Data in Social Networks

1 On Application-Aware Data Extraction for Big Data in Social Networks
Ming-Syan Chen
Research Center for Information Tech. Innovation, Academia Sinica
EE Department, National Taiwan Univ.

2 Fast Increasing of Social Network Activities
Example social networks: Twitter, Facebook, Flickr, MSN, Wikipedia, Amazon.com.
Such a network is huge and cannot easily be analyzed.

3 The Amount of Information is Huge!
Twitter: 150+ million members, 50 million tweets per day.
Facebook: 800+ million users.
Amazon co-purchasing network: half a million product nodes, several million recommendation links.
Web pages (Yahoo!): over one billion Web pages.
Image sources: twitter.com, Amazon, SNSP.

4 Example of Big Data and Social Network
Volume: thousands of people! Velocity: fast accumulated!! Variety: eating different food!!!

5 Example of Big Data and Social Network
For some gossip on this occasion, Veracity is an issue and the information Value could be low, e.g., "Mrs. Chang just did a face lift!" or "Mr. Lin won the lottery!"

6 Information Extraction for Big Data in Social Networks
Extracting important information from large social network graphs allows data analysts to mine the information in large social networks, enables scalable storage and querying, and facilitates the development of real-world applications.

7 Outline
Graph reduction: summarization, sampling, and extraction.
Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).

8 Graph Reduction
Graph summarization (going through all data): e.g., NTU has 32K students; 20% are sushi lovers, 25% prefer steak, 15% are artists, 20% are engineers, etc.
Graph sampling (going through a subset): getting a small representative set of NTU students (which preferably fits the statistics).
Graph extraction: application/goal-oriented data extraction, e.g., only picking good eaters for a feast contest.

9 Graph Extraction
To handle complicated things with simple skills (執簡御繁).
Application/goal-oriented data extraction.
Three levels of information extraction from SN graphs:
Parameter extraction (e.g., company stat.): fast calculation of closeness centrality.
Information extraction (e.g., company biz.): guide query.
Structure extraction (e.g., company org.): decomposing SN graphs.

10 [Figure: illustration with the labels "Parameter extraction", "Structure extraction", "weapon", and "Information extraction (regarding capability)".]

11 Outline
Graph reduction.
Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).

12 Closeness centrality
There are several interesting quantities in SN graphs, including closeness centrality, network diameter, and degree distribution.
Closeness centrality of node v, $C_c(v)$: the inverse of the average shortest path distance from v to any other node in a network.
If $C_c(v)$ is large, v is near the center, as it requires only a few hops to reach others.

13 Response to Dynamic Changes
Edge insertions and deletions are frequent in a social network.
It is desirable to quickly update the closeness centrality of every node in response to an edge insertion or deletion.
Example use: pick a number of people (the nodes with high closeness centrality) who can maximize advertisement effectiveness.

14 Example of Closeness Centrality
$C_c(v)$: the inverse of the average shortest path distance from v to other nodes,
$$C_c(v) = \frac{|V| - 1}{\sum_{u \in V} p(v, u)},$$
where $p(v, u)$ denotes the shortest path distance from v to u.
[Figure: an unweighted and undirected graph G with 14 nodes and 18 edges, comparing $C_c(v)$ and $C_c(w)$, each of the form $(14 - 1)/\sum_{u} p(\cdot, u)$.]
Thus, node w is closer to all other nodes than node v.

15 Calculating Closeness Centrality
One can calculate the closeness centralities of all vertices by solving the All Pairs Shortest Paths (APSP) problem.
This takes O(n(m+n)) time with the breadth-first search (BFS) method on an unweighted, undirected graph, where n and m are the numbers of nodes and edges in the graph.
In a dynamic graph, re-solving the APSP problem after each edge insertion or deletion is not efficient.
Note that only some pairs of shortest paths are affected by a given edge change. Identifying them (the unstable node pairs) enables fast calculation of closeness centrality.
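
The BFS-based baseline described above can be sketched in Python as follows. This is a minimal illustration, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary; the function names and the three-node toy graph are hypothetical and do not come from the talk.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adj):
    """C_c(v) = (|V| - 1) / (sum of distances from v), using one BFS per node."""
    n = len(adj)
    centrality = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        centrality[v] = (n - 1) / total if total else 0.0
    return centrality


# Hypothetical toy graph: the path a - b - c.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cc = closeness_centrality(toy_graph)
print(cc)
```

Running one BFS per node is what gives the O(n(m+n)) cost, which is why repeating it after every edge change is expensive.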

16 Example
For example, with the addition of edge (a,b):
Unchanged shortest paths: p(b,v), p(c,t), p(r,h), etc.
Changed shortest paths:
Before edge insertion: p(a,b)={a,d,w,b}, p(a,c)={a,d,w,r,c}, p(u,v)={u,l,o,d,w,r,s,v}, etc.
After edge insertion (we then call these node pairs unstable): p'(a,b)={a,b}, p'(a,c)={a,b,c}, p'(u,v)={u,x,a,b,c,v}, etc.
[Figure: (a) the original unweighted and undirected graph G; (b) $G' = G \cup e(a,b)$.]

17 Illustration of Unstable Node Pairs
Goal: find $V_u$, the u-unstable node set, i.e., the nodes whose shortest paths to u changed after the edge addition.
If we perform BFS at node u in G and G' to obtain $G_u$ and $G'_u$, we find that only the shortest paths p(u,b), p(u,c), p(u,h), p(u,v), and p(u,t) changed.
Unstable node pairs: (u,b), (u,c), (u,h), (u,v), and (u,t). Hence $V_u$ = {b,c,h,v,t}.
[Figure: $G_u$ and $G'_u$.]

18 (Main Theorem)
After the addition of edge (a,b), every unstable node pair {v,u} (whose shortest path changed) has $v \in V_a$ and $u \in V_b$.
Only the shortest paths between $V_a$ and $V_b$ change after the edge addition (and need to be re-calculated).
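
The theorem can be checked by brute force on a small graph. The sketch below is illustrative only: it treats a node pair as unstable when its shortest path distance shrinks (a simplification of "shortest path changed"), and the helper names and the five-node path graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def with_edge(adj, a, b):
    """Return a copy of the graph with the undirected edge (a, b) added."""
    new_adj = {v: set(neighbors) for v, neighbors in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj


def unstable_set(adj, new_adj, root):
    """Nodes whose distance to root shrinks once the edge is added."""
    before = bfs_distances(adj, root)
    after = bfs_distances(new_adj, root)
    return {v for v in adj if after[v] < before[v]}


# Hypothetical toy graph: the path 1 - 2 - 3 - 4 - 5, then insert edge (1, 5).
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
new_graph = with_edge(graph, 1, 5)
v_a = unstable_set(graph, new_graph, 1)  # unstable set of a = 1
v_b = unstable_set(graph, new_graph, 5)  # unstable set of b = 5

# Brute force: list every pair whose distance changed, then check that
# each such pair has one endpoint in V_a and the other in V_b.
changed_pairs = [
    (u, v)
    for u, v in combinations(graph, 2)
    if bfs_distances(new_graph, u)[v] < bfs_distances(graph, u)[v]
]
theorem_holds = all(
    (u in v_a and v in v_b) or (u in v_b and v in v_a) for u, v in changed_pairs
)
print(v_a, v_b, changed_pairs, theorem_holds)
```

Only two pairs of BFS runs (at a and at b) are needed to bound where changes can occur; the all-pairs scan here exists purely to verify the claim.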

19 Concurrent Calculation of CC in SN
Perform BFS at nodes a and b in G and G' in parallel to obtain $V_a$ = {b,c,h,v,t} and $V_b$ = {a,x,l,u} simultaneously.
[Figure: timeline. First, $G_a$, $G'_a$, and $V_a$ are calculated in parallel with $G_b$, $G'_b$, and $V_b$. Then BFS is performed in parallel starting at each node of $V_b$ (a, x, l, and u), and the nodes in these unstable pairs are informed to re-calculate their shortest paths to others and their CC.]
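
A sequential Python sketch of this update idea follows. It is not the CENDY implementation: it simply re-runs BFS from every node in $V_a \cup V_b$ and keeps the stored distance sums of all other nodes, which the main theorem permits. The function names and the toy path graph are hypothetical, and the per-node BFS runs in the last loop are the part that could be executed in parallel.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance_sums(adj):
    """Sum of shortest path distances from each node (full recomputation)."""
    return {v: sum(bfs_distances(adj, v).values()) for v in adj}


def insert_edge_and_update(adj, sums, a, b):
    """Add edge (a, b) and refresh distance sums only for affected nodes."""
    new_adj = {v: set(neighbors) for v, neighbors in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    affected = set()
    for root in (a, b):  # two BFS pairs yield V_a and V_b
        before = bfs_distances(adj, root)
        after = bfs_distances(new_adj, root)
        affected |= {v for v in adj if after[v] != before[v]}
    new_sums = dict(sums)
    for v in affected:  # independent BFS runs, one per affected node
        new_sums[v] = sum(bfs_distances(new_adj, v).values())
    return new_adj, new_sums, affected


# Hypothetical toy graph: the path 1 - 2 - 3 - 4 - 5, then insert edge (1, 5).
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
old_sums = distance_sums(graph)
new_graph, new_sums, affected = insert_edge_and_update(graph, old_sums, 1, 5)
closeness = {v: (len(new_graph) - 1) / s for v, s in new_sums.items()}
print(affected, new_sums, closeness)
```

In this toy case node 3 is untouched, so four BFS runs replace five; on large graphs the slides report a much larger saving.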

20 Experiments
To evaluate CENDY, we conducted experiments on six real unit-weighted graph datasets of different types.
The case of edge deletion can be handled similarly (in light of a companion theorem).

21 Experiments: Evaluation on Edge Insertion
From this table, we can see that the closeness centralities of all vertices and the average path length (APL) can be updated with only a few BFS processes.
E.g., DBLP contains 460,4 nodes. The naive way requires performing 460K BFS processes to update the closeness centrality and APL, whereas CENDY requires only 4K BFS processes to finish the task.

22 Remark
In response to the fast changes in SN, CENDY is devised to efficiently update the closeness centrality of each node in the social network.
New algorithms are called for to efficiently calculate other key parameters in fast-changing social networks.

23 Outline
Graph reduction.
Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).

24 Motivation of Guide Query
Several works address information finding in social networks:
Expert finding [Deng 08][Lappas 09]: to find the experts based on some given requirement.
Gateway finding [Koren 06][Wang 10]: to find the gateways between the source group and the target group.
Active friending [Wu ]: to explore social networks to improve friend finding.
Guide query [Lin ]: to find the informative friends of the querier who can gather information from experts.
References: [Deng 08] ICDM; [Lappas 09] KDD; [Koren 06] KDD; [Wang 10] KDD; [Wu ] KDD 20; [Lin ] WAIM 20

25 Motivation of Guide Query (Cont'd)
With expert finding, the answer is a list of experts ranked by their expertise.
With the guide query, the answer is a list of informative friends of the querier, ranked by their ability to gather information from experts. This explores social relationships and takes the probabilities of getting help into consideration.

26 Guide Query: Graph Extraction based on Your Friends
Query: "I want to know information about Company A or B."
"These two friends are who I should ask for information."
"This friend is also who I should ask since she can collect information from her friends."
[Figure: the querier's social network, with nodes labeled by companies A to E.]

27 Guide Query
Guide query [Lin ]: for a user initiating the query, the answer is the user's neighbors that are informative about user-assigned attributes.
An informative neighbor should either have the attributes itself or know some other friends that have the attributes.
[Lin ] Y.-C. Lin, P. S. Yu, M.-S. Chen, Guide Query in Social Networks, WAIM

28 Problem Definition
Given a query node q and a set of keywords $W = \{w_1, w_2, \ldots, w_{|W|}\}$, the guide query is to find the top-k informative neighbors of q considering W.
[Figure: example network with q = $N_0$ and W = {A, B}; nodes are labeled with attribute sets such as {A}, {C}, {D}, and {A, B}, and a legend distinguishes target nodes from candidate nodes.]

29 Problem (Cont'd)
In the model, an edge is labeled with the probability that a node successfully spreads the request to the linked node.
We rank the candidates based on how informative they are, which is evaluated by the proposed InfScore and DivScore.
[Figure: the example network with edge probabilities such as P=0.6, P=0.7, P=0.3, P=0.2, and P=0.8.]

30 InfScore
InfScore: the informative level of a candidate node (i.e., its ability to spread the request to targets), modeled by the expected number of targets the candidate is able to spread the request to.
[Figure: the example network.]

31 InfScore
InfRatio is defined as the probability that a specific candidate successfully spreads the request to a certain target, e.g., the InfRatio from $N_1$ to N is 0.25.
[Figure: the example network with InfRatio labels P=0.25.]

32 InfScore (intensity)
The InfScore is the weighted sum of InfRatio.
InfScore($N_1$) = 1.5, with one term each for the targets $N_{11}$, $N_{12}$, and N (the last multiplied by 2).
InfScore($N_4$) = 1.5, with one term each for $N_4$ and $N_{41}$.
[Figure: the example network with a table of InfScore values for the candidates.]

33 DivScore (Diversity)
The DivScore is an entropy-like measure to evaluate the diversity of possibly accessible target nodes.
For each candidate node N, the target vector is $X_T = [x_1, \ldots, x_k]$, where each item $x_i$ is a normalized InfScore value (the contribution of target i divided by InfScore(N)), describing the probability distribution over the targets.
With the target vector, the DivScore is $\mathrm{DivScore}(N) = -\sum_{i} x_i \log_2 x_i$.

34 DivScore
We design the DivScore over the probability distribution of each possibly accessible target.
Example: distribution of $N_3$: [0.5/1.5, 0.5/1.5, 0.25/1.5, 0.25/1.5] = [1/3, 1/3, 1/6, 1/6].
DivScore($N_3$) = $[-(1/3)\log_2(1/3)] \times 2 + [-(1/6)\log_2(1/6)] \times 2$.
[Figure: the example network with a table of DivScore values for the candidates.]
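
The $N_3$ example can be evaluated with a few lines of Python. This sketch assumes the DivScore is the base-2 entropy of the normalized per-target values, as the example expression suggests; the function name `div_score` is illustrative.

```python
import math


def div_score(per_target_values):
    """Base-2 entropy of the per-target values after normalizing them to sum to 1."""
    total = sum(per_target_values)
    probabilities = [value / total for value in per_target_values if value > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Per-target values of N_3 from the slide: they sum to 1.5 and normalize
# to the distribution [1/3, 1/3, 1/6, 1/6].
score_n3 = div_score([0.5, 0.5, 0.25, 0.25])
print(round(score_n3, 4))  # about 1.9183
```

A candidate that reaches four targets evenly would score 2.0, so the uneven distribution of $N_3$ scores slightly lower.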

35 Experimental Setup
DBLP dataset [DBLP]: co-authorship network.
Edge probability: based on the WC (weighted cascade) model, $p(N_i \to N_j) = 1 / d(N_j)$, where $d(N_j)$ is the in-degree of $N_j$.
Node attribute: conference names of an author's publications.
References: [DBLP]; [Chen 10] W. Chen, et al., Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks, KDD
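
As a small illustration of the WC rule, the sketch below assigns each directed edge the probability 1 / in-degree of its target. It assumes each undirected co-authorship link is expanded into two directed edges; the function name and the three-author example are hypothetical.

```python
def weighted_cascade_probabilities(directed_edges):
    """Map each directed edge (source, target) to 1 / in-degree(target)."""
    in_degree = {}
    for _, target in directed_edges:
        in_degree[target] = in_degree.get(target, 0) + 1
    return {(source, target): 1.0 / in_degree[target]
            for source, target in directed_edges}


# Hypothetical co-authorship links: C has written with both A and B.
coauthor_pairs = [("A", "C"), ("B", "C")]
# Assumption: every undirected link becomes two directed edges.
directed = coauthor_pairs + [(v, u) for u, v in coauthor_pairs]
probs = weighted_cascade_probabilities(directed)
print(probs)
```

Under this rule the incoming probabilities of every node sum to 1, so well-connected authors are individually harder to reach through any single coauthor.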

36 Experimental Results
Suppose Ming-Syan Chen wants to talk with people who have published papers in KDD, SDM, CIKM, ICDM, or PKDD. Which coauthors should he connect to first? (I.e., either coauthors who have papers in these conferences or coauthors who coauthored with people who do.)
Query input: q = Ming-Syan Chen, k = 10, W = [KDD, SDM, CIKM, ICDM, PKDD].

37 Remark
The key notion is to guide the query to the right candidates in the social network.
For each candidate, a combination of the expertise and the social relationship (SR) with the person initiating the query is considered.
As with group formation (KDD-12) and this expert finding problem (WAIM-), more applications/tools can be enhanced by taking SR into account.

38 Outline
Graph reduction.
Information extraction on social network graphs: capturing key parameters (parameter extraction), guide query (information extraction), and decomposing SN graphs (structure extraction).

39 Diffusion Analysis in Social Networks
Diffusion of information can be used to model the interaction among nodes in a network, e.g., viruses spreading over the Internet, diseases spreading in a community, and rumors/news spreading among people.

40 Example Diffusion
Information diffusion can happen in social networks, such as Facebook and Twitter.
[Figure: an underlying network of nodes $n_1$ to $n_9$ and a path of infection.]

41 The Network is Hidden
In some situations, the underlying network is not known (due to cost or privacy issues).
The network inference problem (NIP) is studied to discover the underlying network, i.e., to infer the network from what happened.
[Figure: nodes annotated with infection times; the edges between them are hidden.]

42 Network Inference Problem
Assume there is an underlying information network. NIP is to infer the information network given a set of cascades.
A cascade $t^s = [t_1^s, \ldots, t_N^s]$ is the time record of information s spreading over the network (N is the number of nodes), i.e., node $n_i$ gets s (is infected) at time $t_i^s$.
If a node i is never infected with s, set $t_i^s = \infty$.
Ex: $t^s = [\infty, \infty, 2, \infty, 0, 1]$.
[Figure: six nodes $n_1$ to $n_6$, with $n_5$, $n_6$, and $n_3$ infected at times 0, 1, and 2.]

43 Clustering Cascades
Traditionally, NIP assumes there is one underlying network, which may not always be true in reality; e.g., sports news, political news, and entertainment news are likely to spread in different ways.
Hence, we would like to cluster cascades so that the cascades in each cluster spread in the same pattern.
An SN graph is hence decomposed into application-specific ones.

44 Example Cascades
[Figure: six example cascades over nodes $n_1$ to $n_6$, annotated with infection times: cascade a (Lakers news), cascade b (49ers news), cascade c (Redskins news), cascade d (Heat news), cascade e (Jets news), and cascade f (Celtics news).]

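45 To Model Inference Network
Modeling method: if two nodes are always infected within a short time of each other, the weight is large.
$$w_{ij} = \frac{1}{|\{s : t_i^s < t_j^s\}|} \sum_{s : t_i^s < t_j^s} \frac{1}{t_j^s - t_i^s}$$
Consider $w_{12}$ as an example: $\{s : t_1^s < t_2^s\} = \{b, c, e\}$, so $w_{12}$ is one third of the sum of $1/(t_2^s - t_1^s)$ over $s \in \{b, c, e\}$.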
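
The weight formula can be sketched in Python as below. The cascades here are hypothetical toy data (0-indexed nodes), not the cascades a to f from the slides, and the sketch assumes that a never-infected target ($t_j^s = \infty$) still counts in the set $\{s : t_i^s < t_j^s\}$ while contributing 0 to the sum.

```python
import math

INF = math.inf


def edge_weight(cascades, i, j):
    """Average of 1 / (t_j - t_i) over the cascades in which i precedes j."""
    gaps = [times[j] - times[i] for times in cascades if times[i] < times[j]]
    if not gaps:
        return 0.0  # i never precedes j in any cascade
    return sum(1.0 / gap for gap in gaps) / len(gaps)


# Hypothetical cascades over three nodes (index 0, 1, 2).
toy_cascades = [
    [0, 1, INF],  # node 0 at time 0, node 1 at time 1, node 2 never infected
    [0, 2, INF],
    [INF, 0, 1],
]
w_01 = edge_weight(toy_cascades, 0, 1)  # (1/1 + 1/2) / 2
w_12 = edge_weight(toy_cascades, 1, 2)  # (0 + 0 + 1/1) / 3
print(w_01, w_12)
```

Short gaps between the two infections push the weight toward 1, while long gaps or missing infections pull it toward 0.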

46 Example Inference Network
[Figure: the inferred weighted network over nodes $n_1$ to $n_6$.]

47 To Cluster Cascades by K-Means
Transform each cascade t into an N-dimensional indicator vector based on whether each node is infected or not.
Ex: $t_a = [\infty, \infty, \infty, \infty, 0, 1] \to [0,0,0,0,1,1]$; $t_b = [0, \infty, \infty, 1, \infty, \infty] \to [1,0,0,1,0,0]$; $t_c = [0, 1, 2, \infty, \infty, \infty] \to [1,1,1,0,0,0]$.
Run K-means to get the clustering result: (a, d, f) and (b, c, e).
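
The transformation and clustering step can be sketched as follows, using only the three cascades whose time vectors are given above (a, b, and c). The small K-means routine is a plain Lloyd iteration written for illustration, with the initial centers fixed by hand to keep the run deterministic; it is not the setup used in the talk.

```python
import math

INF = math.inf


def to_indicator(cascade):
    """1 if the node was infected (finite time), 0 otherwise."""
    return [0 if math.isinf(t) else 1 for t in cascade]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, initial_centers, iterations=10):
    """Plain Lloyd iterations with caller-supplied initial centers."""
    centers = [list(map(float, center)) for center in initial_centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


cascades = {
    "a": [INF, INF, INF, INF, 0, 1],
    "b": [0, INF, INF, 1, INF, INF],
    "c": [0, 1, 2, INF, INF, INF],
}
indicators = {name: to_indicator(times) for name, times in cascades.items()}
points = list(indicators.values())
# Assumption: start from the indicator vectors of cascades a and b.
labels = kmeans(points, initial_centers=[points[0], points[1]])
print(dict(zip(cascades, labels)))
```

With these inputs, cascade a ends up alone in one cluster while b and c share the other, in line with the (a, d, f) and (b, c, e) grouping on the slide.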

48 Graph Decomposition
By considering cascades {a, d, f} and cascades {b, c, e} independently (based on which nodes are infected), the original SN graph is decomposed in accordance with the information carried.
[Figure: two weighted networks over nodes $n_1$ to $n_6$, one inferred from cascades {a, d, f} (NBA) and one from cascades {b, c, e} (NFL).]

49 Remark
Traditionally, NIP results in a dense and complex network, from which it is difficult to capture knowledge.
By properly clustering cascades, we obtain a few concise networks that carry clearer information.
These resulting networks match the corresponding cascades better than a single dense network does.

50 Conclusion
Information extraction is an application/goal-oriented process to capture the key ingredients (parameters, information, structure, etc.) in the huge SN.
The procedure of information extraction can be integrated into related processes for better efficiency in practice.

51 Thank you!

52 Graph Summarization
Condense the original graph into a more compact form.
Lossless and lossy methods.
Requires examining the entire network.
[Figure: a graph G and its summary $G_s$ with supernodes $S_a$={2,3}, $S_b$={1,9}, $S_c$={7,8,10}, and $S_d$={4,5,6}, plus the pairs {5, 10} and {6, 10}. A revised example from S. Navlakha et al., Graph Summarization with Bounded Error, SIGMOD 08.]

53 Graph Sampling
Selecting a subset of the original data.
Characteristics of the original graph are preserved.
Only a proportion of the nodes in the network are visited.
[Figure: a sampled graph, plotted by NodeXL, an Excel template created by the NodeXL team at Microsoft Research.]

54 A Running Example of CENDY
Originally, we have the closeness centralities of all nodes and the average path length of the graph.
[Figure: an unweighted and undirected graph G with 14 nodes and 18 edges, the matrix A indexed by nodes a, b, c, d, h, l, o, r, s, t, u, v, w, x, the closeness centrality $C_c(x)$, and the average path length $L_G$.]

55 Example (Cont'd)
For the insertion of the edge e(a,b), we perform BFS at node a in G and G' to obtain $G_a$ and $G'_a$, and then have $V_a$ = {b,c,h,v,t}.
[Figure: $G_a$ and $G'_a$.]

56 Example (Cont'd)
Also, we perform BFS at node b in G and G' to obtain $G_b$ and $G'_b$, and then have $V_b$ = {a,x,l,u}.
[Figure: $G_b$ and $G'_b$.]

57 Example (Cont'd)
Then, in light of the main theorem, we re-calculate the paths between $V_a$ and $V_b$.
For example, for node $x \in V_b$, we calculate
(1): p(x,t) - p'(x,t) = 7 - (1+1+3) = 2
(2): p(x,h) - p'(x,h) = 6 - 4 = 2
(3): p(x,v) - p'(x,v) = 6 - 4 = 2
(4): p(x,c) - p'(x,c) = 5 - 3 = 2
(5): p(x,b) - p'(x,b) = 4 - 2 = 2
and then update its new closeness centrality:
$$C'_c(x) = \frac{14 - 1}{47 - [(1) + (2) + (3) + (4) + (5)]}$$
[Figure: $G_x$ and $G'_x$.]

58 Example (Cont'd)
Finally, we update the closeness centralities of the referenced nodes and recalculate the APL.
[Figure: the updated matrix A indexed by nodes a, b, c, d, h, l, o, r, s, t, u, v, w, x, and the updated average path length.]

59 Example Scenario
$N_0$ is initiating a query to find a job in company A or company B. Which friend should $N_0$ ask for information?
[Figure: the example network around $N_0$, with nodes labeled by attribute sets such as {A}, {C}, {D}, and {A, B}.]

60 New Contributions
Given M. Gomez-Rodriguez, J. Leskovec, and A. Krause, Inferring Networks of Diffusion and Influence, in KDD 10, our work is unique in that:
1. We assume there could be many underlying networks (rather than only one).
2. We model and learn a weighted graph (rather than an unweighted one).


More information

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network. Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network. Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu ACM SAC 2010 outline Social network Definition and properties Social

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams

More information

Non Overlapping Communities

Non Overlapping Communities Non Overlapping Communities Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides

More information

Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach

Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach Zhuozhao Li and Haiying Shen Dept. of Electrical and Computer Engineering Clemson University, SC, USA Kang Chen

More information

Analysis of Large Graphs: TrustRank and WebSpam

Analysis of Large Graphs: TrustRank and WebSpam Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #6: Mining Data Streams Seoul National University 1 Outline Overview Sampling From Data Stream Queries Over Sliding Window 2 Data Streams In many data mining situations,

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Jianyong Wang Department of Computer Science and Technology Tsinghua University Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity

More information

Using! to Teach Graph Theory

Using! to Teach Graph Theory !! Using! to Teach Graph Theory Todd Abel Mary Elizabeth Searcy Appalachian State University Why Graph Theory? Mathematical Thinking (Habits of Mind, Mathematical Practices) Accessible to students at a

More information

Maximizing the Spread of Influence through a Social Network

Maximizing the Spread of Influence through a Social Network Maximizing the Spread of Influence through a Social Network By David Kempe, Jon Kleinberg, Eva Tardos Report by Joe Abrams Social Networks Infectious disease networks Viral Marketing Viral Marketing Example:

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

How to organize the Web?

How to organize the Web? How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

MCL. (and other clustering algorithms) 858L

MCL. (and other clustering algorithms) 858L MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted

More information

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK PRESENTED BY: DEEVEN PAUL ADITHELA 2708705 OUTLINE INTRODUCTION DIFFERENT TYPES OF FILTERING

More information

User-centric Cross-network Social Multimedia Computing

User-centric Cross-network Social Multimedia Computing Part III User-centric Cross-network Social Multimedia Computing Jitao Sang Multimedia Computing Group National Lab of Pattern Recognition, Institute of Automation Chinese Academy of Sciences Big Data &

More information

Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions

Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions XIAOCHEN HUANG Computer Science Dept. Worcester Polytechnic

More information

Positive and Negative Links

Positive and Negative Links Positive and Negative Links Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz May 4, 2015 Elisabeth Lex (KTI, TU Graz) Networks May 4, 2015 1 / 66 Outline 1 Repetition 2 Motivation 3 Structural Balance

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

Math 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search

Math 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search Math 443/543 Graph Theory Notes 0: Small world phenomenon and decentralized search David Glickenstein November 0, 008 Small world phenomenon The small world phenomenon is the principle that all people

More information

Graph Symmetry and Social Network Anonymization

Graph Symmetry and Social Network Anonymization Graph Symmetry and Social Network Anonymization Yanghua XIAO ( 肖仰华 ) School of computer science Fudan University For more information, please visit http://gdm.fudan.edu.cn Graph isomorphism determination

More information

REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK

REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK 1 Dr.R.Kousalya, 2 T.Sindhupriya 1 Research Supervisor, Professor & Head, Department of Computer Applications, Dr.N.G.P Arts and Science College, Coimbatore

More information

Probabilistic Graph Summarization

Probabilistic Graph Summarization Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining

More information

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg and Eva Tardos Group 9 Lauren Thomas, Ryan Lieblein, Joshua Hammock and Mary Hanvey Introduction In a social network,

More information

New Challenges in Big Data: Technical Perspectives. Hwanjo Yu POSTECH

New Challenges in Big Data: Technical Perspectives. Hwanjo Yu POSTECH New Challenges in Big Data: Technical Perspectives Hwanjo Yu POSTECH http:/hwanjoyu.org Over 1 Billion SNS users!! Viral Marketing Word-of-Mouth Effect > TV advertising......... Influence Maximization

More information

Modeling Dynamic Behavior in Large Evolving Graphs

Modeling Dynamic Behavior in Large Evolving Graphs Modeling Dynamic Behavior in Large Evolving Graphs R. Rossi, J. Neville, B. Gallagher, and K. Henderson Presented by: Doaa Altarawy 1 Outline - Motivation - Proposed Model - Definitions - Modeling dynamic

More information

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB Department of Computer Science & Engineering The Graduate School, Chung-Ang University CAU Artificial Intelligence LAB 1 / 17 Text data is exploding on internet because of the appearance of SNS, such as

More information

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33 Chapter 10 Fundamental Network Algorithms M. E. J. Newman May 6, 2015 M. E. J. Newman Chapter 10 May 6, 2015 1 / 33 Table of Contents 1 Algorithms for Degrees and Degree Distributions Degree-Degree Correlation

More information

Overview of the Stateof-the-Art. Networks. Evolution of social network studies

Overview of the Stateof-the-Art. Networks. Evolution of social network studies Overview of the Stateof-the-Art in Social Networks INF5370 spring 2014 Evolution of social network studies 1950-1970: mathematical studies of networks formed by the actual human interactions Pandemics,

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/6/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine

More information

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Irene Ntoutsi, Yannis Theodoridis Database Group, Information Systems Laboratory Department of Informatics, University of Piraeus, Greece

More information

Recommender Systems 6CCS3WSN-7CCSMWAL

Recommender Systems 6CCS3WSN-7CCSMWAL Recommender Systems 6CCS3WSN-7CCSMWAL http://insidebigdata.com/wp-content/uploads/2014/06/humorrecommender.jpg Some basic methods of recommendation Recommend popular items Collaborative Filtering Item-to-Item:

More information

Influence Maximization in the Independent Cascade Model

Influence Maximization in the Independent Cascade Model Influence Maximization in the Independent Cascade Model Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj Gran Sasso Science Institute (GSSI), Viale F. Crispi, 7, 67100, L Aquila, Italy. {gianlorenzo.dangelo,lorenzo.severini,yllka.velaj}@gssi.infn.it

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Large Scale Graph Algorithms

Large Scale Graph Algorithms Large Scale Graph Algorithms A Guide to Web Research: Lecture 2 Yury Lifshits Steklov Institute of Mathematics at St.Petersburg Stuttgart, Spring 2007 1 / 34 Talk Objective To pose an abstract computational

More information

Functionality, Challenges and Architecture of Social Networks

Functionality, Challenges and Architecture of Social Networks Functionality, Challenges and Architecture of Social Networks INF 5370 Outline Social Network Services Functionality Business Model Current Architecture and Scalability Challenges Conclusion 1 Social Network

More information

Inferring the Underlying Structure of Information Cascades

Inferring the Underlying Structure of Information Cascades 212 IEEE 12th International Conference on Data Mining Inferring the Underlying Structure of Information Cascades Bo Zong, Yinghui Wu, Ambuj K. Singh, and Xifeng Yan University of California at Santa Barbara

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

arxiv: v1 [cs.si] 12 Jan 2019

arxiv: v1 [cs.si] 12 Jan 2019 Predicting Diffusion Reach Probabilities via Representation Learning on Social Networks Furkan Gursoy furkan.gursoy@boun.edu.tr Ahmet Onur Durahim onur.durahim@boun.edu.tr arxiv:1901.03829v1 [cs.si] 12

More information

Distance-based Outlier Detection: Consolidation and Renewed Bearing

Distance-based Outlier Detection: Consolidation and Renewed Bearing Distance-based Outlier Detection: Consolidation and Renewed Bearing Gustavo. H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang, Srinivasan Parthasarathy September 15, 2010 Table of contents Introduction

More information

Examination examples

Examination examples Examination examples Databasteknik (5 hours) 1. Relational Algebra & SQL (4 pts total; 2 pts each). Part A Consider the relations R(A, B), and S(C, D). Of the following three equivalences between expressions

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation

More information

Using Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships

Using Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships Using Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships Mohammad Jaber 1, Panagiotis Papapetrou 2, Sven Helmer 3, Peter T. Wood 1 1 Department of Comp. Sci. and Info. Systems,

More information

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade

More information

FPGP: Graph Processing Framework on FPGA

FPGP: Graph Processing Framework on FPGA FPGP: Graph Processing Framework on FPGA Guohao DAI, Yuze CHI, Yu WANG, Huazhong YANG E.E. Dept., TNLIST, Tsinghua University dgh14@mails.tsinghua.edu.cn 1 Big graph is widely used Big graph is widely

More information

Introduction to Data Science

Introduction to Data Science UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics

More information

Value Added Association Rules

Value Added Association Rules Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency

More information