Case Study: Social Network Analysis. Part II

1 Case Study: Social Network Analysis Part II

2 Outline
- IoT Fundamentals and IoT Stream Mining Algorithms
  - Predictive Learning
  - Descriptive Learning
  - Frequent Pattern Mining
  - Evolving Analytics including Novel Patterns
- Case Study: Social Network Analysis
  - Challenges in mining networked data, online sampling
  - Evolving centralities and communities
  - Tracking the dynamics of evolving communities
- Case Study: Predictive Maintenance
  - Problem Definition
  - Change, Anomaly and Novelty Detection
  - Failure Prediction and Detection
- Case Study: Secure IoT Stream Mining
  - Types of attacks (e.g., Controlled Channel and Timing)
  - Securing data and system logs
  - Defense against side-channel attacks
  - Data-Obliviousness
  - Randomization

3 Challenges in Mining Networked Data
- Unbounded, evolving, high-speed, massive
- Continuous interactions between social entities
- Represent real-world social structures
- Mining social structures to make powerful decisions
- Unique challenges in mining streams

4 Complex Evolutionary Network: Call Network Topology
- Continuous interactions between users (edges and nodes)
- Multiple interactions between two users (multigraph)
- Long or short interactions between two users (weighted graph)
- Who initiates the call (bi-directional graph)
- Many users make few calls (obeys a power-law distribution)
- Number of connected components > 1 (disconnected graph)
- Vertices with different degrees (irregular graph)
- Density of the graph close to 0 (sparse graph)
- Interactions based on place and time (spatio-temporal)
- Nodes and edges get added and deleted (evolutionary)

5 Practical Applications of Call Network Analysis
- Behavioral prediction
- Churn prediction
- Load prediction
- Word-of-mouth viral marketing
- Influence analysis
- Customer profiling
- Event detection, etc.

6 Problems
- The entire data set cannot be held on disk
- Batch processing is costly and yields outdated results
- Continuous queries over large data streams are difficult to compute
- Measures of interest such as centrality, path length and eccentricity are difficult to calculate
- Useful insights are difficult to gain in real time

7 Online Sampling

8 Sampling Massive Streaming Call Graphs
Sampling: selecting a subset of individuals from within a stream of data to represent the characteristics of the whole stream at a given point in time.

9 Applications
- Approximate answers to real-time queries
- Real-time computation of measures such as centrality, clustering coefficient and path length for identifying graph properties
- Finding frequent items in real time
- Real-time detection of communities, events, ego networks, key players, etc.

10 Scenario
- Anonymous CDRs from telecommunication networks
  - Data stream over 31 days
  - Approx. 8 million to 16 million calls per day
  - Spread across 24 hours per day
- Call-graph semantics
  - Nodes as callers and callees
  - Edges as calls
  - Multigraph with repeated edges (calls between the same nodes)
  - Multigraph mapped to a weighted network, with edge frequency as the weight
  - Incoming and outgoing calls as bi-directional edges
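
The mapping from a multigraph of calls to a weighted directed network can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(caller, callee)`, the function name `to_weighted_edges` and the sample records are all assumptions made for the example.

```python
from collections import Counter

def to_weighted_edges(calls):
    """Collapse repeated (caller, callee) calls into weighted directed edges."""
    weights = Counter()
    for caller, callee in calls:
        # Direction is kept: a call a -> b and a call b -> a are different edges.
        weights[(caller, callee)] += 1
    return dict(weights)

# Hypothetical anonymised call records, one (caller, callee) pair per call.
calls = [("a", "b"), ("a", "b"), ("b", "a"), ("c", "a")]
weighted = to_weighted_edges(calls)
```

Here the two calls from `a` to `b` become one edge of weight 2, while the reverse call stays a separate edge of weight 1.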

11 Evaluating Sampling Methods and Algorithms
- Which structures are well preserved by the samples over the evolution of a weighted directed graph stream?
- Which samples maintain the properties of the dynamic stream?
- Which methods and techniques have the lowest time complexity?
- Which samples are biased towards some of the metrics?
- Which samples exhibit a degree distribution similar to that of a snapshot of the stream?

12 Methods
- Node-based methods: sample a set of nodes from the original graph. The samples possess only nodes and no structure.
- Edge-based methods: samples are generated by selecting a subset of edges from the original graph. The resulting graph is a subgraph of the original graph, with nodes and edges.
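
The difference between the two families can be sketched on a small static edge list. This is an illustrative assumption: the slides deal with streams, and the function names, the fixed seed and the toy graph below are hypothetical.

```python
import random

def sample_nodes(edges, k, seed=0):
    """Node-based: pick k nodes uniformly; the sample carries no edges."""
    nodes = sorted({n for edge in edges for n in edge})
    return set(random.Random(seed).sample(nodes, k))

def sample_edges(edges, k, seed=0):
    """Edge-based: pick k edges uniformly; their endpoints come along."""
    chosen = random.Random(seed).sample(list(edges), k)
    nodes = {n for edge in chosen for n in edge}
    return nodes, chosen

# Hypothetical toy graph given as a list of undirected edges.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
node_sample = sample_nodes(edges, 2)
sub_nodes, sub_edges = sample_edges(edges, 2)
```

The node sample is just a set of labels, whereas the edge sample is itself a small graph whose structure can be measured.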

13 Algorithms
- Reservoir Sampling
- Space Saving
- Biased Random Sampling

14 Reservoir Sampling
- Randomly chooses a sample of k items from a list S containing n items
- Replaces elements with gradually decreasing probability
- Every element ends up in the sample with the same probability
Algorithm (Vitter 1985):
  Fill the reservoir of size k with the first k elements
  For each element i after the first k:
    Generate a random integer r between 0 and i - 1
    If r < k:
      Let j be the element at position r in the reservoir
      Replace element j with i
    Else:
      Skip i
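
A minimal Python sketch of the reservoir steps, assuming a 0-based position counter (so the random draw is over `0..i` inclusive, which matches a draw over `0..i-1` for a 1-based counter). The function name and the `rng` parameter are illustrative, not taken from the slides.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):  # i is the 0-based position in the stream
        if i < k:
            reservoir.append(item)     # fill phase: keep the first k items
        else:
            r = rng.randint(0, i)      # uniform over i + 1 values, both ends included
            if r < k:                  # happens with probability k / (i + 1)
                reservoir[r] = item    # replace the item stored at position r
    return reservoir

sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
short = reservoir_sample(range(5), k=10)
```

The replacement probability shrinks as the stream grows, which is what keeps every item equally likely to be in the final sample. A stream shorter than k is returned whole.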

15 Space Saving Algorithm
- Approximate approach for finding the most frequent items
- Maintains partial information of interest; monitors only a subset of m elements
Algorithm (Metwally et al. 2005):
  For each element e in the stream:
    If e is monitored:
      Increment its count
    Else:
      Let e_min be the monitored element with the fewest hits, min
      Replace e_min with e and set its count to min + 1
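
The Space Saving steps can be sketched with a plain dictionary. This is a simplified illustration: it finds the minimum by a linear scan, whereas Metwally et al. describe a dedicated Stream-Summary structure for that purpose. The sketch also assumes that the first m distinct elements are admitted with count 1, and the function name is hypothetical.

```python
def space_saving(stream, m):
    """Approximate counts for frequent items while monitoring at most m elements."""
    counts = {}
    for e in stream:
        if e in counts:
            counts[e] += 1                        # monitored: increment its count
        elif len(counts) < m:
            counts[e] = 1                         # free slot: start monitoring e
        else:
            victim = min(counts, key=counts.get)  # monitored element with fewest hits
            min_count = counts.pop(victim)
            counts[e] = min_count + 1             # e inherits min + 1
    return counts

counts = space_saving("aaabbcaa", m=2)
```

In this run `c` evicts `b` and inherits its count plus one, so `counts` is `{"a": 5, "c": 3}`. The count for `c` overestimates its true frequency of 1, which is the usual behaviour of this scheme: counts never underestimate.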

16 Biased Random Sampling
- Inserts every element from the stream into the sample with equal probability
- Replaces elements of the sample at random
- Biased towards recent elements in the stream
Algorithm (Aggarwal 2006):
  Fill the list of size k with the first k elements
  For each element i after the first k:
    Generate a random integer r between 0 and k - 1
    Let j be the element at position r in the list
    Replace j with i
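
A minimal sketch of the biased sampling steps as listed on the slide, where every arriving element is inserted and evicts a randomly chosen slot. It follows the slide's simplified description rather than the full method of the cited paper; the function name and the `rng` parameter are assumptions.

```python
import random

def biased_random_sample(stream, k, rng=None):
    """Always insert the new item, evicting a uniformly chosen slot once full."""
    rng = rng or random.Random()
    sample = []
    for item in stream:
        if len(sample) < k:
            sample.append(item)                # fill phase: keep the first k items
        else:
            sample[rng.randrange(k)] = item    # position r drawn from 0..k-1
    return sample

recent = biased_random_sample(range(100), k=5, rng=random.Random(1))
```

Because old items are exposed to eviction at every step, the sample skews towards recent arrivals. The last element of the stream is always present.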

17 SSE Sample of size 10⁴ edges over 31 days using Fruchterman Reingold Layout

18 RS: the RS sample displays the least community structure

19 BRS Sample of nodes with different modularities

20 Evolving centralities and communities

21 Temporal Networks
[Figure: graph on nodes A to F at time t1]
Examples: person-to-person communication, disease spreading, social networks
Source: Temporal Networks, Holme, Petter; Saramäki, Jari (Eds.)

22 Temporal Networks
[Figure: graph on nodes A to F at time t2]

23 Temporal Networks
[Figure: graph on nodes A to F at time t3]

24 Temporal Networks
[Figure: the three snapshots of the graph on nodes A to F at times t1, t2 and t3]

25 Temporal Networks
[Figure: snapshots at t1, t2 and t3 next to the aggregate graph]
Is A the source of F's contamination?

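26 Temporal Networks
[Figure: graph on nodes A to F with an accompanying table of node labels]

27 Temporal Networks
[Figure: graph on nodes A to F]
Observation window W =

28 Temporal Networks
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]]

29 Temporal Networks
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]]
Examples: person-to-person communication, disease spreading, social networks

30 Temporal Networks
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]]
Examples: person-to-person communication, disease spreading, social networks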

31 Temporal Metrics
In a temporal network, geodesic distance cannot be limited to the number of hops separating two nodes (the topology of the network); it must also take into account the temporal ordering of links.

32 Temporal Path
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]]
W = [1,20], R = 1 day, T = 0

Temporal path                     Duration   Is fastest path?
P(B,E) = <(B,C,1), (C,E,2)>       1          yes
P(B,E) = <(B,D,7), (D,E,9)>       2          no
P(B,E) = <(B,D,7), (D,E,8)>       1          yes
P(B,E) = <(B,C,1), (C,E,1)>       0
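
The durations listed for the four candidate paths from B to E are consistent with reading a path's duration as the time of its last contact minus the time of its first. The sketch below rests on that reading, which is an inference from the slide's numbers. The transcript does not define the parameters R and T, so they are ignored here. Function names are illustrative.

```python
def is_time_respecting(path):
    """Contacts must chain node to node, and times must never go backwards."""
    return all(
        a[1] == b[0] and a[2] <= b[2]
        for a, b in zip(path, path[1:])
    )

def duration(path):
    """Time elapsed between the first and the last contact of the path."""
    return path[-1][2] - path[0][2]

# The four B-to-E paths from the slide, as (from, to, time) contacts.
paths = [
    [("B", "C", 1), ("C", "E", 2)],
    [("B", "D", 7), ("D", "E", 9)],
    [("B", "D", 7), ("D", "E", 8)],
    [("B", "C", 1), ("C", "E", 1)],
]
durations = [duration(p) for p in paths]
valid = [is_time_respecting(p) for p in paths]
```

Under this reading `durations` is `[1, 2, 1, 0]`, matching the values on the slide.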

33 Temporal Metrics
Revisiting centrality metrics: closeness
- Based on the fastest path duration
- How close a node is to the other nodes in the graph
- High closeness = best visibility into what is happening

34 Temporal Metrics
Revisiting centrality metrics: betweenness
- Ratio: (number of fastest paths between j and k that pass through v) / (number of fastest paths between j and k)
- High betweenness = great influence over what flows
- High betweenness = control over the flow of information (gatekeeper)
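
The formulas on these two slides did not survive transcription. One common way to write the two revisited metrics is given below as an assumption, not as the slides' exact notation. Here d_{ij} stands for the duration of the fastest temporal path from i to j, N for the number of nodes, \sigma_{jk} for the number of fastest paths between j and k, and \sigma_{jk}(v) for the number of those that pass through v.

```latex
C_i = \frac{N - 1}{\sum_{j \neq i} d_{ij}},
\qquad
B_v = \sum_{j \neq v \neq k} \frac{\sigma_{jk}(v)}{\sigma_{jk}}
```

The only change from the static definitions is that hop counts and shortest paths are replaced by durations and fastest paths.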

35 Betweenness centrality
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]]
W = [1,20], R = 1 day, T = 0

Node   Fastest path (temporal)   Shortest path (static)
C
D
E      3                         4
B      0                         4

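36 Betweenness centrality
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [7,7] and [2,4]; timeline t with windows W = [1,7], W = [1,14], W = [1,21]]
R = 1 day, T = 0

W = [1,7]
Ranking   Node
1         E, C
3         A, B, D, F

37 Betweenness centrality
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [7,9], [8,9] and [2,4], [11,14]; timeline t with windows W = [1,7], W = [1,14], W = [1,21]]
R = 1 day, T = 0

W = [1,7]               W = [1,14]
Ranking   Node          Ranking   Node
1         E, C          1         E
3         A, B, D, F    2         C
                        3         D
                        4         A, B, F

38 Betweenness centrality
[Figure: graph on nodes A to F; edges labeled with contact intervals [1,2], [1,2], [17,19], [7,9], [8,9] and [2,4], [11,20]; timeline t with windows W = [1,7], W = [1,14], W = [1,21]]
R = 1 day, T = 0

W = [1,7]               W = [1,14]              W = [1,21]
Ranking   Node          Ranking   Node          Ranking   Node
1         E, C          1         E             1         E
3         A, B, D, F    2         C             2         C
                        3         D             3         D
                        4         A, B, F       4         A, B, F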

39 Centrality Change
C_i(t): centrality of node i at time t
pos_i(t): ranking position of node i at time t
$C_i(t) = \frac{pos_i(t-1) - pos_i(t)}{\max(pos_i(t-1),\, pos_i(t))}$

Rankings along the timeline t:
Ranking   Node          Ranking   Node          Ranking   Node
1         E, C          1         E             1         E
3         A, B, D, F    2         C             2         C
                        3         D             3         D
                        4         A, B, F       4         A, B, F
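
The centrality change can be sketched by reading the formula as the signed difference between a node's previous and current ranking positions, divided by the larger of the two. The sign convention is an assumption, and so is the function name. The positions are taken from the first two rankings on the slide.

```python
def centrality_change(prev_pos, curr_pos):
    """Relative change in ranking position; negative means the node dropped."""
    return {
        node: (prev_pos[node] - curr_pos[node]) / max(prev_pos[node], curr_pos[node])
        for node in curr_pos
    }

# Ranking positions from the first two rankings on the slide.
pos_before = {"E": 1, "C": 1, "A": 3, "B": 3, "D": 3, "F": 3}
pos_after = {"E": 1, "C": 2, "D": 3, "A": 4, "B": 4, "F": 4}
change = centrality_change(pos_before, pos_after)
```

With this reading, E and D do not move (change 0), C drops from position 1 to 2 (change -0.5), and A, B and F drop from 3 to 4 (change -0.25).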

40 Tracking the dynamics of evolving communities

41 Social Networks as Dynamic Structures
- Social networks are a hot topic and a focus of considerable attention in recent research
- Growing availability of large volumes of relational data, boosted by the proliferation of social media web sites
- The number of social entities and the interactions among them change over time: social networks have a dynamic nature
- One way to uncover the evolution patterns of social networks is by monitoring the evolution of their communities

42 Event-based Dynamic Community Mining
- Dynamic communities are unstable patterns that can evolve in both membership and content
- Dynamic communities undergo a succession of events during their life-cycle in the network
[Figure: community evolution events according to Palla et al. (2007)]

43 Event-based Dynamic Community Mining
- How can we perform the mapping of communities between different snapshots of the dynamic network?
- Proposed solution: compute conditional probabilities for each pair of communities found at consecutive time points
MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data
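
One way to sketch the pairwise conditional probabilities is to use the share of a community's members that reappear in a community of the next snapshot. The unweighted member-overlap estimate, the 0.5 value used for the survival threshold, the function names and the toy communities are all assumptions made for illustration.

```python
def transition_probabilities(prev_comms, next_comms):
    """P(next | prev), estimated as the share of prev's members found in next."""
    probs = {}
    for p_name, p_members in prev_comms.items():
        for n_name, n_members in next_comms.items():
            probs[(p_name, n_name)] = len(p_members & n_members) / len(p_members)
    return probs

def survivors(probs, threshold=0.5):
    """Keep the community pairs whose probability reaches the survival threshold."""
    return [pair for pair, p in probs.items() if p >= threshold]

# Hypothetical communities at two consecutive time points.
prev_comms = {"C1": {"a", "b", "c", "d"}, "C2": {"e", "f"}}
next_comms = {"D1": {"a", "b", "c"}, "D2": {"d", "e", "f", "g"}}
probs = transition_probabilities(prev_comms, next_comms)
matches = survivors(probs)
```

In this toy case C1 maps to D1 with probability 0.75 and C2 maps to D2 with probability 1.0, so both communities are treated as surviving.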

44 Tracking Clusters Survival threshold

45 Event-based Dynamic Community Mining

46 Communities Life-cycle and Temporal Trajectory
- How can we represent the evolution of the dynamic communities?
- Community life-cycle: the temporal sequence of events (birth, split, merge, death) undergone by a given dynamic community, from the moment the community first appears until the moment it fades away
[Figure from Greene et al. (2010)]

47 Telco Data: Community Life-cycle
[Figure: community life-cycles over July, August, September, October, November and December]
Communities detected with the Louvain algorithm

48 Interpretation of Communities Dynamics What is happening in the structure of the underlying network that explains these community dynamics? Tucker3 model

49 Node-level Measures
- Eigenvector centrality: measures how well a given node is connected to other well-connected nodes in the network
- Closeness centrality: a measure of reachability that quantifies how fast a given node can reach everyone in the network
- Betweenness centrality: measures the extent to which a node lies between other nodes in the network
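
As an illustration of one of these measures, closeness centrality on a static, unweighted graph can be sketched with a breadth-first search. The normalisation over reachable nodes, the adjacency-dictionary input and the toy path graph are assumptions; the slides do not say which implementation was used.

```python
from collections import deque

def closeness(adj, source):
    """(reachable nodes - 1) divided by the sum of hop distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical path graph a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = {node: closeness(adj, node) for node in adj}
```

The middle node `b` reaches both neighbours in one hop and scores 1.0, while the end nodes score 2/3.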

50 Assigning a meaning to the axes of the 2D space
Axes: Social Activity; Sociability vs. Accessibility

51 Interpretation of a Community's Temporal Trajectory


More information

Semantic-Based Surveillance Video Retrieval

Semantic-Based Surveillance Video Retrieval Semantic-Based Surveillance Video Retrieval Weiming Hu, Dan Xie, Zhouyu Fu, Wenrong Zeng, and Steve Maybank, Senior Member, IEEE IEEE Transactions on Image Processing, Vol. 16, No. 4, April 2007 Present

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

Social Network Analysis With igraph & R. Ofrit Lesser December 11 th, 2014

Social Network Analysis With igraph & R. Ofrit Lesser December 11 th, 2014 Social Network Analysis With igraph & R Ofrit Lesser ofrit.lesser@gmail.com December 11 th, 2014 Outline The igraph R package Basic graph concepts What can you do with igraph? Construction Attributes Centrality

More information

Role of big data in classification and novel class detection in data streams

Role of big data in classification and novel class detection in data streams DOI 10.1186/s40537-016-0040-9 METHODOLOGY Open Access Role of big data in classification and novel class detection in data streams M. B. Chandak * *Correspondence: hodcs@rknec.edu; chandakmb@gmail.com

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Access Network Virtualization

Access Network Virtualization Access Network Virtualization Globecom December 10 th, 2014 George Ginis ASSIA, Inc. gginis@assia-inc.com 1 Outline Industry Trends Architectures Benefits New Services 2 Broadband Access Technologies 800

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

IEEE Project Titles

IEEE Project Titles www.chennaisunday.com PH:9566137117 IEEE Project Titles -2018 S.No Project Title Year CLOUD COMPUTING 1 A Collaborative Key Management Protocol in Ciphertext Policy Attribute-Based Encryption for Cloud

More information

Overlay and P2P Networks. Introduction and unstructured networks. Prof. Sasu Tarkoma

Overlay and P2P Networks. Introduction and unstructured networks. Prof. Sasu Tarkoma Overlay and P2P Networks Introduction and unstructured networks Prof. Sasu Tarkoma 14.1.2013 Contents Overlay networks and intro to networking Unstructured networks Overlay Networks An overlay network

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Privacy preserving data mining Li Xiong Slides credits: Chris Clifton Agrawal and Srikant 4/3/2011 1 Privacy Preserving Data Mining Privacy concerns about personal data AOL

More information

Data Warehousing 11g Essentials

Data Warehousing 11g Essentials Oracle 1z0-515 Data Warehousing 11g Essentials Version: 6.0 QUESTION NO: 1 Indentify the true statement about REF partitions. A. REF partitions have no impact on partition-wise joins. B. Changes to partitioning

More information

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018 Spatiotemporal Access to Moving Objects Hao LIU, Xu GENG 17/04/2018 Contents Overview & applications Spatiotemporal queries Movingobjects modeling Sampled locations Linear function of time Indexing structure

More information

Chris Moffatt Director of Technology, Ed-Fi Alliance

Chris Moffatt Director of Technology, Ed-Fi Alliance Chris Moffatt Director of Technology, Ed-Fi Alliance Review Background and Context Temporal ODS Project Project Overview Design and Architecture Demo Temporal Snapshot & Query Proof of Concept Discussion

More information

Behavioral Data Mining. Lecture 9 Modeling People

Behavioral Data Mining. Lecture 9 Modeling People Behavioral Data Mining Lecture 9 Modeling People Outline Power Laws Big-5 Personality Factors Social Network Structure Power Laws Y-axis = frequency of word, X-axis = rank in decreasing order Power Laws

More information

Clustering Large Dynamic Datasets Using Exemplar Points

Clustering Large Dynamic Datasets Using Exemplar Points Clustering Large Dynamic Datasets Using Exemplar Points William Sia, Mihai M. Lazarescu Department of Computer Science, Curtin University, GPO Box U1987, Perth 61, W.A. Email: {siaw, lazaresc}@cs.curtin.edu.au

More information

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets Philippe Cudré-Mauroux Eugene Wu Samuel Madden Computer Science and Artificial Intelligence Laboratory Massachusetts Institute

More information

Compressed representations for web and social graphs. Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6.

Compressed representations for web and social graphs. Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6. Compressed representations for web and social graphs Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6.886 April 6, 2018 Web graphs and social networks Web graphs represent the link structure

More information

arxiv: v1 [cs.db] 20 Oct 2014

arxiv: v1 [cs.db] 20 Oct 2014 Università della Svizzera italiana USI Technical Report Series in Informatics Optimized Disk Layouts for Adaptive Storage of Interaction Graphs Robert Soulé 1, Buğra Gedik 2 arxiv:1410.5290v1 [cs.db] 20

More information

Efficient solutions for the monitoring of the Internet

Efficient solutions for the monitoring of the Internet Efficient solutions for the monitoring of the Internet Chadi BARAKAT INRIA Sophia Antipolis, France Planète research group HDR defense January 22, 2009 Email: Chadi.Barakat@sophia.inria.fr WEB: http://www.inria.fr/planete/chadi

More information

Constructing a G(N, p) Network

Constructing a G(N, p) Network Random Graph Theory Dr. Natarajan Meghanathan Associate Professor Department of Computer Science Jackson State University, Jackson, MS E-mail: natarajan.meghanathan@jsums.edu Introduction At first inspection,

More information

Graph Exploration: Taking the User into the Loop

Graph Exploration: Taking the User into the Loop Graph Exploration: Taking the User into the Loop Davide Mottin, Anja Jentzsch, Emmanuel Müller Hasso Plattner Institute, Potsdam, Germany 2016/10/24 CIKM2016, Indianapolis, US Where we are Background (5

More information

Social Media Intelligence Text and Network Mining combined. Dr. Rosaria Silipo

Social Media Intelligence Text and Network Mining combined. Dr. Rosaria Silipo Social Media Intelligence Text and Network Mining combined Dr. Rosaria Silipo rosariasilipo@yahoo.com Previously on PAW... PAW San Francisco 2012 2 Social Media Analysis Water Water Everywhere, and not

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

Operations Dashboard for ArcGIS Monitoring GIS Operations. Michele Lundeen Esri

Operations Dashboard for ArcGIS Monitoring GIS Operations. Michele Lundeen Esri Operations Dashboard for ArcGIS Monitoring GIS Operations Michele Lundeen Esri mlundeen@esri.com What is a dashboard? Conceptual term, can mean different things to different audiences Dashboards provide

More information

Towards Energy Proportionality for Large-Scale Latency-Critical Workloads

Towards Energy Proportionality for Large-Scale Latency-Critical Workloads Towards Energy Proportionality for Large-Scale Latency-Critical Workloads David Lo *, Liqun Cheng *, Rama Govindaraju *, Luiz André Barroso *, Christos Kozyrakis Stanford University * Google Inc. 2012

More information

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008 Lesson 4 Random graphs Sergio Barbarossa Graph models 1. Uncorrelated random graph (Erdős, Rényi) N nodes are connected through n edges which are chosen randomly from the possible configurations 2. Binomial

More information

Properties of Biological Networks

Properties of Biological Networks Properties of Biological Networks presented by: Ola Hamud June 12, 2013 Supervisor: Prof. Ron Pinter Based on: NETWORK BIOLOGY: UNDERSTANDING THE CELL S FUNCTIONAL ORGANIZATION By Albert-László Barabási

More information

Handling and processing, high-velocity networked data

Handling and processing, high-velocity networked data 10 Feature Article: Knowledge Discovery from Temporal Social Networks Knowledge Discovery from Temporal Social Networks Shazia Tabassum, Fabíola S. F. Pereira, and João Gama Abstract Extracting knowledge

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information