Data Mining and. in Dynamic Networks

Size: px
Start display at page:

Download "Data Mining and. in Dynamic Networks"

Transcription

1 Data Mining and Knowledge Discovery in Dynamic Networks Panos M. Pardalos Center for Applied Optimization Dept. of Industrial & Systems Engineering Affiliated Faculty of: Computer & Information Science & Engineering Department Biomedical Engineering Program, McKnight Brain Institute

2 Massive Datasets The proliferation of massive datasets brings with it a series of special computational challenges. This data avalanche arises in a wide range of scientific and commercial applications. With advances in computer and information technologies, many of these challenges are beginning to be addressed. (Abello, Pardalos & Resende, 2002, Handbook of Massive Datasets)

3 Knowledge Discovery in Databases (KDD) KDD is the process of identifying valid, novel, potentially useful, and ultimately understandable structure (models and patterns) in the data Understand the application domain Create a target dataset Remove (or correct) corrupted data Apply data-reduction algorithms Apply data mining algorithms Interpret the mined patterns

4 Graph Representation of Massive Datasets In many cases, it is convenient to represent a dataset as a graph (network) with certain attributes associated with its vertices and edges Studying the properties of these graphs often provides useful information about the internal structure of the datasets they represent

5 Important Concepts A graph G = (V, E), V = set of vertices, E = set of edges Degrees of the vertices, degree distribution Size of connected components Edge density Cliques and independent sets

6 Example of a graph

7 Examples of Real-Life Massive Graphs Web graph (links between websites) Call graph (telephone traffic data) Market graph (stock prices data) Brain networks (neurons and connections between them)

8 Degree Distribution: Power Law Degree distribution of a graph characterizes global statistical patterns underlying the dataset this graph represents Interestingly, the degree distribution of all considered real-life graphs has a well-defined power-law structure: The probability that a vertex has a degree k (i.e., k neighbors) is or ( Self-organized networks)

9 Cliques and Independent Sets A clique is a subgraph of G that has all possible edges Cliques represent dense clusters of similar objects An independent set is a subgraph of G with no edges. Independent sets represent groups of different objects

10 Maximum Clique and Independent Set Problems The subject of a special interest is to find the maximum clique and independent set in the graph Maximum clique and maximum independent set problems can be transformed to each other, using the notion of complementary graph These problems are NP-hard

11 Finding cliques and independent sets Heuristic algorithms (no guarantee to find an optimal solution) Exact algorithms (finding maximum clique or independent set)

12 Clique Partitioning Minimum clique partition: dividing the graph into a minimum number of distinct cliques This provides a natural way of partitioning a dataset represented by a graph into a number of clusters of similar objects (clustering problem), where the number of clusters is the minimum number of cliques in the graph

13 Graph Coloring Coloring essentially represents the partitioning of the graph into a minimum number of independent sets Partitioning a dataset represented by a graph into a number of clusters of different objects

14 Call Graph The call graph comes from telecommunications traffic. The vertices of this graph are telephone numbers, and the edges are calls made from one number to another (including additional billing data, such as, the time of the call and its duration). The challenge in studying call graphs is that they are massive. Every day AT & T handles approximately 300 million long-distance calls. (American Scientist Online, Jan- Feb 2000) Careful analysis of the call graph could help with infrastructure planning, customer classification and marketing. How can we visualize such massive graphs? To flash a terabyte of data on a 1000x1000 screen, you need to cram a megabyte of data into each pixel!

15 Call Graph In our experiments with data from telecommunication traffic, in an instance of the corresponding multigraph has 53,767,087 vertices and over 170 million of edges. It is a not a connected graph, but has 3.7 million separate components, yet a giant connected component with 44,989,297 vertices was computed. The maximum (quasi)-clique problem is considered in this giant component. We found cliques of size 30 and there were more than of these 30- member cliques (Abello, Pardalos & Resende)

16 Call graph

17 Call Graph In a battlefield situation, just counting the messages or identifying the source and the intended recipient of each message, constructing a call graph, yields valuable information like the organization of a military force. The records in the call database are collected for commercial purposes. In order to send an itemized bill, a phone company needs to keep track of every call completed, with the originating and receiving phone numbers and the starting and ending times. The largest companies handle roughly 250 million toll calls a day, and so a month's worth of data amounts to several billion call records. AT&T reports that its database of retained records is approaching two trillion calls and more than 300 terabytes of data.

18 Call Graph Historical calling patterns can be used to detect fraud, and some patterns are also of interest in marketing. For example, a company that offers a discounted rate within a "calling circle" can use information from the call graph to estimate the costs and benefits of the program. This kind of traffic data could be compiled for other communications channels. For instance, Federal Express and other courier services keep digitized records of their deliveries, which could readily be transformed into a database of senders and receivers. With a packet sniffer installed in the network, we compile this data for the traffic. (American Scientist Online, Sep-Oct 2006)

19 Degree Distribution of the Call Graph (data by AT&T)

20 Market Graph Vertices are stocks, and an edge connects two stocks if the correlation between their price fluctuations over a certain period is greater than a specified threshold ~6000 vertices (stocks)

21 Market Graph Market graph (all the considered instances for different correlation thresholds) follows the power-law model Using the combination of heuristic and exact algorithms, the exact solution of the maximum clique problem was found (Boginski, Butenko & Pardalos)

22 Degree distribution of the Market graph

23 Finding Cliques in the Market Graph Applying a heuristic algorithm to find a large clique: let N(i) be the set of neighbors of the vertex i

24 Finding Cliques in the Market graph Preprocessing procedure: C is the clique found by the heuristic algorithm: recursively remove from the graph all of the vertices which are not in C and whose degree is less than C Denote the resulting (reduced) graph as G = (V, E )

25 Finding Cliques in the Market graph Using the IP formulation of the maximum clique problem to find the exact solution:

26 Maximum Clique size for different correlation thresholds Large cliques despite very low edge density confirms the idea about the globalization of the market

27 Classification of Stocks Using Clique Partitioning A clique in the market graph represents a dense cluster of stocks whose prices exhibit a similar behavior over time Therefore, dividing the market graph into a set of distinct cliques (clique partitioning) is a natural approach to classifying stocks (dividing the set of stocks into clusters of similar objects an approach to solving the clustering problem)

28 Independent sets in the Market graph Maximum independent set represents the largest perfectly diversified portfolio Solving the maximum clique problem in the complementary graph The preprocessing procedure could not reduce the size of the initial graph, the exact solution could not be found Large diversified portfolios are hard to find

29 Independent set sizes for different correlation thresholds Relatively small independent sets found by the heuristic algorithm

30 Independent Sets in the Market Graph Finding a perfectly diversified portfolio containing any given stock For every vertex in the market graph, an independent set that contains this vertex was detected, and the sizes of these independent sets were almost the same, which means that it is possible to find a diversified portfolio containing any given stock using the market graph methodology

31 Maximum Clique size for different correlation thresholds Maximum clique size for various thresholds in Food Market Graph

32 Independent Sets in the Market Graph Maximum independent set size for various thresholds in Food Market Graph

33 Connected Components Intuition in Market Graph Two nodes are correlated if their correspondent nodes are connected by edge (correlated) Power-law graphs generally have very high clustering coefficient i.e., the tendency for association of two nodes which are associated with a common node is high

34 Connected Components in Market Graph Largest Group size by Time Period Group Size by Time Period - (0.7) Group Size by Time period - (0.6) Largest Group Size Largest group size Time Period Time Period Group Size by Time Period - (0.5) 1,400 1,200 Largest group size 1, Time Period

35 Connected Components in Market Graph Observations The increase in the giant component size from oldest to newest time period indicates the globalization tendency, just as in maximum clique size and edge density The giant components includes semiconductor industries and the increase in the size of the giant components corroborates the observation that the number of these industries has been increasing with time

36 Additional Applications Social Networks Biological Networks Transportation Networks (place of living and place of work)

37 References J. Abello, P.M. Pardalos, and M.G.C. Resende (eds.), Handbook of Massive Data Sets, Kluwer Academic Publishers. V. Boginski, S. Butenko, and P.M. Pardalos, Modeling and Optimization in Massive Graphs. In: P. M. Pardalos and H. Wolkowicz, eds. Novel Approaches to Hard Discrete Optimization, American Mathematical Society, V. Boginski, S. Butenko, and P.M. Pardalos, On Structural Properties of the Market Graph. In: A. Nagurney (editor), Innovations in Financial and Economic Networks, Edward Elgar Publishers,

38 References American Scientist Online, (Jan-Feb 2000), Computing Science Graph Theory in Practice: Part I by Brian Hayes, Volume 88, No. 1 American Scientist Online, (Sep-Oct 2006), Connecting the Dots: Can the tools of graph theory and social-network studies unravel the next big plot?, Volume 94, No. 5

39 Modeling Epileptic Brain EEG recordings received from the electrodes located in different functional units of the brain (time series) The values of T-index between all pairs of electrodes are calculated Two electrodes are considered to be entrained in the seizure if the corresponding value of T-index is less than T critical.

40 Modeling Epileptic Brain One can represent all the electrode locations (functional units of the brain) as the vertices of the graph. An edge connects two vertices if the corresponding value of T-index is less than T critical, i.e. these electrode sites are entrained at a certain time moment. The evolution of the properties of this graph is investigated

41 Modeling Epileptic Brain Edge density of the considered graph (dashed lines represent the moments of epileptic seizures)

42 Modeling Epileptic Brain Size of the largest connected component (dashed lines represent the moments of epileptic seizures)

43 Modeling Epileptic Brain: Related Technique Let A be the matrix containing the values of T-index T ij for all pairs of electrodes Solve the quadratic 0-1 problem to find k electrode sites producing the minimal sum of T-indices (so-called critical sites), which means that these sites are entrained during the seizure

44 Summary There are many mathematical programming techniques for addressing data mining problems in dynamic networks Graph-based techniques for this type of problems is a promising research area Performance of any approach depends on a specific dataset there is no universal technique

45 References Data Mining in Biomedicine, P.M. Pardalos, V. Boginski, and A. Vazacopoulos (eds.), Springer, forthcoming. P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, V.A. Yatsenko, Seizure Warning Algorithm Based on Optimization and Nonlinear Dynamics, Mathematical Programming, 101(2): O.A. Prokopyev, V. Boginski, W. Chaovalitwongse, P. M. Pardalos, J. C. Sackellares, and P. R. Carney, Network-Based Techniques in EEG Data Analysis and Epileptic Brain Modeling. To appear in: Data mining in Biomedicine, P.M. Pardalos, V. Boginski and A. Vazacopoulos (eds.), Springer.

46 This cosmos was not made by Gods or men, but always was, and is, and ever shall be ever-living fire. Heraclitus - The Fire Priest (540 BC BC)

Statistical analysis of nancial networks

Statistical analysis of nancial networks Computational Statistics & Data Analysis 48 (25) 431 443 www.elsevier.com/locate/csda Statistical analysis of nancial networks VladimirBoginski a, Sergiy Butenko b, Panos M. Pardalos a; a Department of

More information

Interactive Visualization of the Stock Market Graph

Interactive Visualization of the Stock Market Graph Interactive Visualization of the Stock Market Graph Presented by Camilo Rostoker rostokec@cs.ubc.ca Department of Computer Science University of British Columbia Overview 1. Introduction 2. The Market

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

1 Homophily and assortative mixing

1 Homophily and assortative mixing 1 Homophily and assortative mixing Networks, and particularly social networks, often exhibit a property called homophily or assortative mixing, which simply means that the attributes of vertices correlate

More information

Lecture Note: Computation problems in social. network analysis

Lecture Note: Computation problems in social. network analysis Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including

More information

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

Basics of Network Analysis

Basics of Network Analysis Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,

More information

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung POLYTECHNIC UNIVERSITY Department of Computer and Information Science COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung Abstract: Computer simulation of the dynamics of complex

More information

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science

More information

LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS

LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS TIMOTHY L. VIS Abstract. A significant problem in finite optimization is the assignment problem. In essence, the assignment

More information

Graph Sampling Approach for Reducing. Computational Complexity of. Large-Scale Social Network

Graph Sampling Approach for Reducing. Computational Complexity of. Large-Scale Social Network Journal of Innovative Technology and Education, Vol. 3, 216, no. 1, 131-137 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/jite.216.6828 Graph Sampling Approach for Reducing Computational Complexity

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Research Interests Optimization:

Research Interests Optimization: Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

4/4/16 Comp 555 Spring

4/4/16 Comp 555 Spring 4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

CE4031 and CZ4031 Database System Principles

CE4031 and CZ4031 Database System Principles CE4031 and CZ4031 Database System Principles Academic AY1819 Semester 1 CE/CZ4031 Database System Principles s CE/CZ2001 Algorithms; CZ2007 Introduction to Databases CZ4033 Advanced Data Management (not

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/12/2013 Comp 465 Fall 2013 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other vertex A clique

More information

Effect of age and dementia on topology of brain functional networks. Paul McCarthy, Luba Benuskova, Liz Franz University of Otago, New Zealand

Effect of age and dementia on topology of brain functional networks. Paul McCarthy, Luba Benuskova, Liz Franz University of Otago, New Zealand Effect of age and dementia on topology of brain functional networks Paul McCarthy, Luba Benuskova, Liz Franz University of Otago, New Zealand 1 Structural changes in aging brain Age-related changes in

More information

PROBLEM FORMULATION AND RESEARCH METHODOLOGY

PROBLEM FORMULATION AND RESEARCH METHODOLOGY PROBLEM FORMULATION AND RESEARCH METHODOLOGY ON THE SOFT COMPUTING BASED APPROACHES FOR OBJECT DETECTION AND TRACKING IN VIDEOS CHAPTER 3 PROBLEM FORMULATION AND RESEARCH METHODOLOGY The foregoing chapter

More information

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/11/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

Epilog: Further Topics

Epilog: Further Topics Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Epilog: Further Topics Lecture: Prof. Dr. Thomas

More information

ON THE COMPLEXITY OF THE BROADCAST SCHEDULING PROBLEM

ON THE COMPLEXITY OF THE BROADCAST SCHEDULING PROBLEM ON THE COMPLEXITY OF THE BROADCAST SCHEDULING PROBLEM SERGIY I. BUTENKO, CLAYTON W. COMMANDER, AND PANOS M. PARDALOS Abstract. In this paper, a Broadcast Scheduling Problem (bsp) in a time division multiple

More information

Graph Coloring via Constraint Programming-based Column Generation

Graph Coloring via Constraint Programming-based Column Generation Graph Coloring via Constraint Programming-based Column Generation Stefano Gualandi Federico Malucelli Dipartimento di Elettronica e Informatica, Politecnico di Milano Viale Ponzio 24/A, 20133, Milan, Italy

More information

Fall 2017 ECEN Special Topics in Data Mining and Analysis

Fall 2017 ECEN Special Topics in Data Mining and Analysis Fall 2017 ECEN 689-600 Special Topics in Data Mining and Analysis Nick Duffield Department of Electrical & Computer Engineering Teas A&M University Organization Organization Instructor: Nick Duffield,

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours

More information

COMP 465 Special Topics: Data Mining

COMP 465 Special Topics: Data Mining COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,

More information

Preface MOTIVATION ORGANIZATION OF THE BOOK. Section 1: Basic Concepts of Graph Theory

Preface MOTIVATION ORGANIZATION OF THE BOOK. Section 1: Basic Concepts of Graph Theory xv Preface MOTIVATION Graph Theory as a well-known topic in discrete mathematics, has become increasingly under interest within recent decades. This is principally due to its applicability in a wide range

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 11, November 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Diagnosis of Grape Leaf Diseases Using K-Means Clustering and Neural Network

Diagnosis of Grape Leaf Diseases Using K-Means Clustering and Neural Network International Conference on Emerging Trends in Applications of Computing ( ICETAC 2K7 ) Diagnosis of Grape Leaf Diseases Using K-Means Clustering and Neural Network S.Sankareswari, Dept of Computer Science

More information

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Algorithmic and Economic Aspects of Networks. Nicole Immorlica Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th

More information

Community detection. Leonid E. Zhukov

Community detection. Leonid E. Zhukov Community detection Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Network Science Leonid E.

More information

OPTIMIZATION. joint course with. Ottimizzazione Discreta and Complementi di R.O. Edoardo Amaldi. DEIB Politecnico di Milano

OPTIMIZATION. joint course with. Ottimizzazione Discreta and Complementi di R.O. Edoardo Amaldi. DEIB Politecnico di Milano OPTIMIZATION joint course with Ottimizzazione Discreta and Complementi di R.O. Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-16-17.shtml

More information

An Empirical Analysis of Communities in Real-World Networks

An Empirical Analysis of Communities in Real-World Networks An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization

More information

CLASSIFICATION OF BOUNDARY AND REGION SHAPES USING HU-MOMENT INVARIANTS

CLASSIFICATION OF BOUNDARY AND REGION SHAPES USING HU-MOMENT INVARIANTS CLASSIFICATION OF BOUNDARY AND REGION SHAPES USING HU-MOMENT INVARIANTS B.Vanajakshi Department of Electronics & Communications Engg. Assoc.prof. Sri Viveka Institute of Technology Vijayawada, India E-mail:

More information

An Automated System for Data Attribute Anomaly Detection

An Automated System for Data Attribute Anomaly Detection Proceedings of Machine Learning Research 77:95 101, 2017 KDD 2017: Workshop on Anomaly Detection in Finance An Automated System for Data Attribute Anomaly Detection David Love Nalin Aggarwal Alexander

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

1 More configuration model

1 More configuration model 1 More configuration model In the last lecture, we explored the definition of the configuration model, a simple method for drawing networks from the ensemble, and derived some of its mathematical properties.

More information

CE4031 and CZ4031 Database System Principles

CE4031 and CZ4031 Database System Principles CE431 and CZ431 Database System Principles Course CE/CZ431 Course Database System Principles CE/CZ21 Algorithms; CZ27 Introduction to Databases CZ433 Advanced Data Management (not offered currently) Lectures

More information

9.1. K-means Clustering

9.1. K-means Clustering 424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific

More information

11/17/2009 Comp 590/Comp Fall

11/17/2009 Comp 590/Comp Fall Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 Problem Set #5 will be available tonight 11/17/2009 Comp 590/Comp 790-90 Fall 2009 1 Clique Graphs A clique is a graph with every vertex connected

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

On competition numbers of complete multipartite graphs with partite sets of equal size. Boram PARK, Suh-Ryung KIM, and Yoshio SANO.

On competition numbers of complete multipartite graphs with partite sets of equal size. Boram PARK, Suh-Ryung KIM, and Yoshio SANO. RIMS-1644 On competition numbers of complete multipartite graphs with partite sets of equal size By Boram PARK, Suh-Ryung KIM, and Yoshio SANO October 2008 RESEARCH INSTITUTE FOR MATHEMATICAL SCIENCES

More information

Digital Humanities Digital Economics. Digital Arts. 05. One elective course from the courses offered at the university faculties

Digital Humanities Digital Economics. Digital Arts. 05. One elective course from the courses offered at the university faculties Table 1. First study cycle 1 Year 1 semester 01. 10.9.AITMIR.1101.C е-democracy 02. 04.0.AITMIR.1102.C е-management 03. 08.9.AITMIR.1103.C е-ethics 04. One elective course from the following list 1. 08.0.AITMIR.1104.MaE

More information

Interactive 3D Representation as a Method of Investigating Information Graph Features

Interactive 3D Representation as a Method of Investigating Information Graph Features Interactive 3D Representation as a Method of Investigating Information Graph Features Alexander Antonov and Nikita Volkov Lomonosov Moscow State University, Moscow, Russia asa@parallel.ru, volkovnikita94@gmail.com

More information

Critical Node Detection Problem. Panos Pardalos Distinguished Professor CAO, Dept. of Industrial and Systems Engineering, University of Florida

Critical Node Detection Problem. Panos Pardalos Distinguished Professor CAO, Dept. of Industrial and Systems Engineering, University of Florida Critical Node Detection Problem ITALY May, 2008 Panos Pardalos Distinguished Professor CAO, Dept. of Industrial and Systems Engineering, University of Florida Outline of Talk Introduction Problem Definition

More information

A Generating Function Approach to Analyze Random Graphs

A Generating Function Approach to Analyze Random Graphs A Generating Function Approach to Analyze Random Graphs Presented by - Vilas Veeraraghavan Advisor - Dr. Steven Weber Department of Electrical and Computer Engineering Drexel University April 8, 2005 Presentation

More information

Study and Implementation of CHAMELEON algorithm for Gene Clustering

Study and Implementation of CHAMELEON algorithm for Gene Clustering [1] Study and Implementation of CHAMELEON algorithm for Gene Clustering 1. Motivation Saurav Sahay The vast amount of gathered genomic data from Microarray and other experiments makes it extremely difficult

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

On Universal Cycles of Labeled Graphs

On Universal Cycles of Labeled Graphs On Universal Cycles of Labeled Graphs Greg Brockman Harvard University Cambridge, MA 02138 United States brockman@hcs.harvard.edu Bill Kay University of South Carolina Columbia, SC 29208 United States

More information

Response Network Emerging from Simple Perturbation

Response Network Emerging from Simple Perturbation Journal of the Korean Physical Society, Vol 44, No 3, March 2004, pp 628 632 Response Network Emerging from Simple Perturbation S-W Son, D-H Kim, Y-Y Ahn and H Jeong Department of Physics, Korea Advanced

More information

Economic Issues and Market Dynamics. Ike Elliott Senior Vice President, Global Softswitch Services Level 3 Communications

Economic Issues and Market Dynamics. Ike Elliott Senior Vice President, Global Softswitch Services Level 3 Communications Economic Issues and Market Dynamics Ike Elliott Senior Vice President, Global Softswitch Services Level 3 Communications Ike.elliott@Level3.com Internet infrastructure costs much less per bit than circuit

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Data Matrices and Vector Space Model Denis Helic KTI, TU Graz Nov 6, 2014 Denis Helic (KTI, TU Graz) KDDM1 Nov 6, 2014 1 / 55 Big picture: KDDM Probability

More information

11/14/2010 Intelligent Systems and Soft Computing 1

11/14/2010 Intelligent Systems and Soft Computing 1 Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

IDENTIFYING GEOMETRICAL OBJECTS USING IMAGE ANALYSIS

IDENTIFYING GEOMETRICAL OBJECTS USING IMAGE ANALYSIS IDENTIFYING GEOMETRICAL OBJECTS USING IMAGE ANALYSIS Fathi M. O. Hamed and Salma F. Elkofhaifee Department of Statistics Faculty of Science University of Benghazi Benghazi Libya felramly@gmail.com and

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

The Maximum Clique Problem

The Maximum Clique Problem November, 2012 Motivation How to put as much left-over stuff as possible in a tasty meal before everything will go off? Motivation Find the largest collection of food where everything goes together! Here,

More information

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set Renu Vashist School of Computer Science and Engineering Shri Mata Vaishno Devi University, Katra,

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths Analysis of Biological Networks 1. Clustering 2. Random Walks 3. Finding paths Problem 1: Graph Clustering Finding dense subgraphs Applications Identification of novel pathways, complexes, other modules?

More information

On the Complexity of Broadcast Scheduling. Problem

On the Complexity of Broadcast Scheduling. Problem On the Complexity of Broadcast Scheduling Problem Sergiy Butenko, Clayton Commander and Panos Pardalos Abstract In this paper, a broadcast scheduling problem (BSP) in a time division multiple access (TDMA)

More information

On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval

On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval Charu C. Aggarwal 1, Haixun Wang # IBM T. J. Watson Research Center Hawthorne, NY 153, USA 1 charu@us.ibm.com # Microsoft Research

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao Network Theory: Social, Mythological and Fictional Networks Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao Abstract: Comparative mythology is a largely qualitative and

More information

CSE 701: LARGE-SCALE GRAPH MINING. A. Erdem Sariyuce

CSE 701: LARGE-SCALE GRAPH MINING. A. Erdem Sariyuce CSE 701: LARGE-SCALE GRAPH MINING A. Erdem Sariyuce WHO AM I? My name is Erdem Office: 323 Davis Hall Office hours: Wednesday 2-4 pm Research on graph (network) mining & management Practical algorithms

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

Rapid growth of massive datasets

Rapid growth of massive datasets Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 Figures are taken from: M.E.J. Newman, Networks: An Introduction 2

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

For students entering Part 1 in September 2018 UFCSWIYB

For students entering Part 1 in September 2018 UFCSWIYB Programme Specification BSc Computer Science For students entering Part 1 in September 2018 UCAS Code: G400 UFCOMPB UFCSWIYB This document sets out key information about your Programme and forms part of

More information

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries Chris Calvert, CISSP, CISM Director of Solutions Innovation Copyright 2013 Hewlett-Packard Development

More information

For students entering Part 1 in September 2019 UFCSWIYB

For students entering Part 1 in September 2019 UFCSWIYB Programme Specification BSc Computer Science For students entering Part 1 in September 2019 UCAS Code: G400 UFCOMPB UFCSWIYB This document sets out key information about your Programme and forms part of

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

6. Tabu Search. 6.3 Minimum k-tree Problem. Fall 2010 Instructor: Dr. Masoud Yaghini

6. Tabu Search. 6.3 Minimum k-tree Problem. Fall 2010 Instructor: Dr. Masoud Yaghini 6. Tabu Search 6.3 Minimum k-tree Problem Fall 2010 Instructor: Dr. Masoud Yaghini Outline Definition Initial Solution Neighborhood Structure and Move Mechanism Tabu Structure Illustrative Tabu Structure

More information

CS 591: Introduction to Computer Security. Lecture 13: Evaluation

CS 591: Introduction to Computer Security. Lecture 13: Evaluation CS 591: Introduction to Computer Security Lecture 13: Evaluation James Hook Evaluation lo hi assure-o-meter 1 Evaluation Context: DoD identifies computer security as important in 70s (Anderson 1972) Recognizes

More information

CS 591: Introduction to Computer Security. Lecture 13: Evaluation

CS 591: Introduction to Computer Security. Lecture 13: Evaluation CS 591: Introduction to Computer Security Lecture 13: Evaluation James Hook Evaluation lo hi assure-o-meter Evaluation Context: DoD identifies computer security as important in 70s (Anderson 1972) Recognizes

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN... INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data

More information

More Efficient Classification of Web Content Using Graph Sampling

More Efficient Classification of Web Content Using Graph Sampling More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information

More information

Machine Learning (CSMML16) (Autumn term, ) Xia Hong

Machine Learning (CSMML16) (Autumn term, ) Xia Hong Machine Learning (CSMML16) (Autumn term, 28-29) Xia Hong 1 Useful books: 1. C. M. Bishop: Pattern Recognition and Machine Learning (2007) Springer. 2. S. Haykin: Neural Networks (1999) Prentice Hall. 3.

More information

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that

More information

A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis

A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis Meshal Shutaywi and Nezamoddin N. Kachouie Department of Mathematical Sciences, Florida Institute of Technology Abstract

More information

Case Study: Social Network Analysis. Part II

Case Study: Social Network Analysis. Part II Case Study: Social Network Analysis Part II https://sites.google.com/site/kdd2017iot/ Outline IoT Fundamentals and IoT Stream Mining Algorithms Predictive Learning Descriptive Learning Frequent Pattern

More information

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please) Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in

More information

GRASP WITH PATH-RELINKING FOR THE QUADRATIC ASSIGNMENT PROBLEM

GRASP WITH PATH-RELINKING FOR THE QUADRATIC ASSIGNMENT PROBLEM WITH PATH-RELINKING FOR THE QUADRATIC ASSIGNMENT PROBLEM CARLOS A.S. OLIVEIRA, PANOS M. PARDALOS, AND M.G.C. RESENDE Abstract. This paper describes a with path-relinking heuristic for the quadratic assignment

More information

Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications

Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Daniel Mican, Nicolae Tomai Babes-Bolyai University, Dept. of Business Information Systems, Str. Theodor

More information

Notes for Lecture 24

Notes for Lecture 24 U.C. Berkeley CS170: Intro to CS Theory Handout N24 Professor Luca Trevisan December 4, 2001 Notes for Lecture 24 1 Some NP-complete Numerical Problems 1.1 Subset Sum The Subset Sum problem is defined

More information

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Big Data Analytics The Data Mining process. Roger Bohn March. 2016 1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material

More information

Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs

Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs Kaan Ataman and W. Nick Street Department of Management Sciences The University of Iowa Abstract Many real-world

More information