Data Mining and Knowledge Discovery in Dynamic Networks
1 Data Mining and Knowledge Discovery in Dynamic Networks Panos M. Pardalos Center for Applied Optimization Dept. of Industrial & Systems Engineering Affiliated Faculty of: Computer & Information Science & Engineering Department Biomedical Engineering Program, McKnight Brain Institute
2 Massive Datasets The proliferation of massive datasets brings with it a series of special computational challenges. This data avalanche arises in a wide range of scientific and commercial applications. With advances in computer and information technologies, many of these challenges are beginning to be addressed. (Abello, Pardalos & Resende, 2002, Handbook of Massive Datasets)
3 Knowledge Discovery in Databases (KDD) KDD is the process of identifying valid, novel, potentially useful, and ultimately understandable structure (models and patterns) in the data. Its steps: understand the application domain; create a target dataset; remove (or correct) corrupted data; apply data-reduction algorithms; apply data mining algorithms; interpret the mined patterns.
4 Graph Representation of Massive Datasets In many cases, it is convenient to represent a dataset as a graph (network) with certain attributes associated with its vertices and edges. Studying the properties of these graphs often provides useful information about the internal structure of the datasets they represent.
5 Important Concepts A graph G = (V, E), where V is the set of vertices and E is the set of edges; degrees of the vertices and the degree distribution; size of connected components; edge density; cliques and independent sets.
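The concepts listed on this slide can be computed directly from an adjacency structure. The following is a minimal sketch, assuming a simple undirected graph stored as a dictionary that maps each vertex to the set of its neighbors; the function names and the toy graph are illustrative and do not come from the slides.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each degree k to the number of vertices that have degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def edge_density(adj):
    """Fraction of all possible edges that are present (simple undirected graph)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is counted twice
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def component_sizes(adj):
    """Sizes of the connected components, largest first, via breadth-first search."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy graph (hypothetical): a triangle {1, 2, 3}, an edge {4, 5}, an isolated vertex 6.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}
```

For massive graphs such as the call graph, the same quantities would normally be computed with external-memory or streaming algorithms rather than an in-memory dictionary.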
6 Example of a graph
7 Examples of Real-Life Massive Graphs Web graph (links between websites); call graph (telephone traffic data); market graph (stock price data); brain networks (neurons and the connections between them).
8 Degree Distribution: Power Law The degree distribution of a graph characterizes global statistical patterns underlying the dataset this graph represents. Interestingly, the degree distribution of all the real-life graphs considered here has a well-defined power-law structure: the probability that a vertex has degree k (i.e., k neighbors) is P(k) ∝ k^(-γ), or equivalently log P(k) ∝ -γ log k (self-organized networks).
9 Cliques and Independent Sets A clique is a subgraph of G that has all possible edges. Cliques represent dense clusters of similar objects. An independent set is a subgraph of G with no edges. Independent sets represent groups of different objects.
10 Maximum Clique and Independent Set Problems Of special interest is finding the maximum clique and the maximum independent set in a graph. The maximum clique and maximum independent set problems can be transformed into each other using the notion of the complementary graph. These problems are NP-hard.
11 Finding cliques and independent sets Heuristic algorithms (no guarantee of finding an optimal solution); exact algorithms (finding a maximum clique or independent set).
12 Clique Partitioning Minimum clique partition: dividing the graph into a minimum number of distinct cliques. This provides a natural way of partitioning a dataset represented by a graph into a number of clusters of similar objects (the clustering problem), where the number of clusters is the minimum number of cliques in the graph.
13 Graph Coloring Coloring essentially represents the partitioning of the graph into a minimum number of independent sets, that is, partitioning a dataset represented by a graph into a number of clusters of different objects.
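A power-law degree distribution appears as a straight line on a log-log plot, so the exponent can be roughly estimated from the slope of that line. The sketch below assumes the usual form P(k) ∝ k^(-γ) and fits the slope by ordinary least squares. The function name is hypothetical, and log-log regression is only a crude estimator chosen for brevity; it is not necessarily the method used by the authors.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate gamma in P(k) ~ k^(-gamma) from a {degree: count} mapping.

    Fits a least-squares line to the points (log k, log P(k)) and returns
    the negated slope. Degrees with k = 0 or zero count are ignored.
    Requires at least two distinct positive degrees.
    """
    total = sum(degree_counts.values())
    points = [
        (math.log(k), math.log(count / total))
        for k, count in degree_counts.items()
        if k > 0 and count > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx


# Synthetic counts (made up for illustration) that follow k^(-2) exactly.
synthetic = {k: 3600 // (k * k) for k in (1, 2, 3, 4, 5, 6)}
gamma = fit_power_law_exponent(synthetic)
```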
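The transformation through the complementary graph can be illustrated on a tiny example: a set of vertices is independent in G exactly when it is a clique in the complement of G. The sketch below uses an exhaustive search that is exponential in the number of vertices, so it is only suitable for toy graphs; all names are illustrative.

```python
from itertools import combinations


def complement(adj):
    """Complement of a simple undirected graph given as {vertex: set_of_neighbors}."""
    vertices = set(adj)
    return {v: vertices - adj[v] - {v} for v in adj}


def is_clique(adj, subset):
    """True if every pair of vertices in `subset` is joined by an edge."""
    return all(b in adj[a] for a, b in combinations(subset, 2))


def max_clique_bruteforce(adj):
    """Largest clique by exhaustive search (exponential time; toy graphs only)."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_clique(adj, subset):
                return set(subset)
    return set()


def max_independent_set_bruteforce(adj):
    """A maximum independent set of G is a maximum clique of the complement of G."""
    return max_clique_bruteforce(complement(adj))


# Star graph (hypothetical example): center 0 joined to leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```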
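As a rough illustration of partitioning a graph into independent sets, the sketch below applies a greedy coloring that visits vertices in order of decreasing degree. This is a heuristic: it always produces a valid partition into independent sets, but it does not guarantee the minimum number of colors. The function names and the example graph are assumptions made for this sketch.

```python
def greedy_coloring(adj):
    """Assign each vertex the smallest color not used by its colored neighbors."""
    color = {}
    # Visit high-degree vertices first (a common heuristic ordering).
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_classes(color):
    """Group vertices by color; each group is an independent set."""
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, set()).add(v)
    return list(classes.values())


# Cycle on four vertices (hypothetical example): 0-1-2-3-0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
classes = color_classes(greedy_coloring(cycle))
```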
14 Call Graph The call graph comes from telecommunications traffic. The vertices of this graph are telephone numbers, and the edges are calls made from one number to another (including additional billing data, such as the time of the call and its duration). The challenge in studying call graphs is that they are massive. Every day AT&T handles approximately 300 million long-distance calls. (American Scientist Online, Jan-Feb 2000) Careful analysis of the call graph could help with infrastructure planning, customer classification and marketing. How can we visualize such massive graphs? To flash a terabyte of data on a 1000x1000 screen, you need to cram a megabyte of data into each pixel!
15 Call Graph In our experiments with data from telecommunication traffic, an instance of the corresponding multigraph has 53,767,087 vertices and over 170 million edges. It is not a connected graph: it has 3.7 million separate components, yet a giant connected component with 44,989,297 vertices was computed. The maximum (quasi)-clique problem is considered in this giant component. We found cliques of size 30, and there were more than […] of these 30-member cliques. (Abello, Pardalos & Resende)
16 Call graph
17 Call Graph In a battlefield situation, just counting the messages or identifying the source and the intended recipient of each message (that is, constructing a call graph) yields valuable information, such as the organization of a military force. The records in the call database are collected for commercial purposes. In order to send an itemized bill, a phone company needs to keep track of every call completed, with the originating and receiving phone numbers and the starting and ending times. The largest companies handle roughly 250 million toll calls a day, and so a month's worth of data amounts to several billion call records. AT&T reports that its database of retained records is approaching two trillion calls and more than 300 terabytes of data.
18 Call Graph Historical calling patterns can be used to detect fraud, and some patterns are also of interest in marketing. For example, a company that offers a discounted rate within a "calling circle" can use information from the call graph to estimate the costs and benefits of the program. This kind of traffic data could be compiled for other communications channels. For instance, Federal Express and other courier services keep digitized records of their deliveries, which could readily be transformed into a database of senders and receivers. With a packet sniffer installed in a network, the same kind of data can be compiled for network traffic. (American Scientist Online, Sep-Oct 2006)
19 Degree Distribution of the Call Graph (data by AT&T)
20 Market Graph Vertices are stocks, and an edge connects two stocks if the correlation between their price fluctuations over a certain period is greater than a specified threshold. The graph has about 6,000 vertices (stocks).
21 Market Graph The market graph (all the considered instances for different correlation thresholds) follows the power-law model. Using a combination of heuristic and exact algorithms, the exact solution of the maximum clique problem was found. (Boginski, Butenko & Pardalos)
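The construction of the market graph can be sketched as follows. This is a minimal sketch under stated assumptions: price fluctuations are measured by log returns, correlation is the Pearson coefficient, and the ticker names and numbers in the example are made up. The slides do not specify these details.

```python
import math


def log_returns(prices):
    """Log returns ln(p_t / p_{t-1}) of a price series (an assumed fluctuation measure)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def market_graph(returns, threshold):
    """Connect two stocks when the correlation of their returns exceeds `threshold`."""
    tickers = sorted(returns)
    adj = {t: set() for t in tickers}
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            if pearson(returns[a], returns[b]) > threshold:
                adj[a].add(b)
                adj[b].add(a)
    return adj


# Hypothetical return series: B moves with A, C moves against A.
example_returns = {
    "A": [0.01, 0.02, -0.01, 0.03],
    "B": [0.02, 0.04, -0.02, 0.06],
    "C": [-0.01, -0.02, 0.01, -0.03],
}
graph = market_graph(example_returns, threshold=0.5)
```

Raising the threshold removes edges, which is why the slides report clique and independent set sizes separately for each correlation threshold.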
22 Degree distribution of the Market graph
23 Finding Cliques in the Market Graph Applying a heuristic algorithm to find a large clique: let N(i) be the set of neighbors of the vertex i
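The details of the heuristic did not survive in this transcript, so the following is only a generic greedy sketch in the same spirit, not the authors' algorithm. It keeps a candidate set, repeatedly picks the candidate with the most neighbors inside that set, and then restricts the candidates to N(i) of the chosen vertex i. Vertex labels are assumed to be sortable so that ties are broken deterministically.

```python
def greedy_clique(adj):
    """Grow a clique greedily; returns a maximal (not necessarily maximum) clique."""
    clique = set()
    candidates = set(adj)
    while candidates:
        # Choose the candidate with the most neighbors among the remaining candidates.
        v = max(sorted(candidates), key=lambda u: len(adj[u] & candidates))
        clique.add(v)
        # Only common neighbors of all chosen vertices can still join the clique.
        candidates &= adj[v]
    return clique


# Hypothetical example: a 4-clique {1, 2, 3, 4} with a pendant vertex 5 attached to 1.
example = {1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1}}
found = greedy_clique(example)
```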
24 Finding Cliques in the Market graph Preprocessing procedure: let C be the clique found by the heuristic algorithm. Recursively remove from the graph all of the vertices which are not in C and whose degree is less than |C|. Denote the resulting (reduced) graph as G' = (V', E').
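The preprocessing step can be sketched as a peeling procedure. Any vertex of a clique larger than C must have at least |C| neighbors, so vertices outside C with fewer neighbors cannot belong to a larger clique, and removing them may push further vertices below the bound. The function name and the example graph below are illustrative assumptions.

```python
def reduce_graph(adj, clique):
    """Repeatedly delete vertices outside `clique` whose degree is below len(clique).

    Returns a reduced copy of the graph; the input is left unchanged.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    k = len(clique)
    stack = [v for v in g if v not in clique and len(g[v]) < k]
    while stack:
        v = stack.pop()
        if v not in g:  # already removed through an earlier stack entry
            continue
        for w in g.pop(v):
            g[w].discard(v)
            # Removing v may drop w below the degree bound as well.
            if w not in clique and len(g[w]) < k:
                stack.append(w)
    return g


# Hypothetical example: a 4-clique {1, 2, 3, 4} plus a path 1-5-6 hanging off vertex 1.
example = {1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 6}, 6: {5}}
reduced = reduce_graph(example, clique={1, 2, 3})
```

In this example vertex 4 survives because its degree equals the size of the heuristic clique, which leaves room for the larger clique {1, 2, 3, 4} to be found by an exact method.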
25 Finding Cliques in the Market graph Using the IP formulation of the maximum clique problem to find the exact solution:
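The formulation itself is missing from this transcript. One standard integer programming formulation of the maximum clique problem, given here as an assumption about what the slide showed, is the edge formulation on the reduced graph, written as G' = (V', E') (the primes are assumed notation), with a binary variable x_i that equals 1 when vertex i is in the clique:

```latex
\begin{align*}
\max \quad & \sum_{i \in V'} x_i \\
\text{s.t.} \quad & x_i + x_j \le 1 && \forall\, (i,j) \notin E',\ i \ne j, \\
& x_i \in \{0,1\} && \forall\, i \in V'.
\end{align*}
```

Each constraint forbids selecting two vertices that are not joined by an edge, so any feasible solution is a clique and the objective counts its vertices.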
26 Maximum Clique size for different correlation thresholds Large cliques despite very low edge density confirm the idea of the globalization of the market.
27 Classification of Stocks Using Clique Partitioning A clique in the market graph represents a dense cluster of stocks whose prices exhibit similar behavior over time. Therefore, dividing the market graph into a set of distinct cliques (clique partitioning) is a natural approach to classifying stocks, that is, dividing the set of stocks into clusters of similar objects, which is one approach to solving the clustering problem.
28 Independent sets in the Market graph A maximum independent set represents the largest perfectly diversified portfolio. It can be found by solving the maximum clique problem in the complementary graph. The preprocessing procedure could not reduce the size of the initial graph, so the exact solution could not be found. Large diversified portfolios are hard to find.
29 Independent set sizes for different correlation thresholds Relatively small independent sets found by the heuristic algorithm
30 Independent Sets in the Market Graph Finding a perfectly diversified portfolio containing any given stock: for every vertex in the market graph, an independent set that contains this vertex was detected, and the sizes of these independent sets were almost the same, which means that it is possible to find a diversified portfolio containing any given stock using the market graph methodology.
31 Maximum Clique size for different correlation thresholds Maximum clique size for various thresholds in Food Market Graph
32 Independent Sets in the Market Graph Maximum independent set size for various thresholds in Food Market Graph
33 Connected Components Intuition in Market Graph Two stocks are correlated if their corresponding nodes are connected by an edge. Power-law graphs generally have a very high clustering coefficient, i.e., two nodes that are both associated with a common node tend to be associated with each other.
34 Connected Components in Market Graph [Figure: largest group size by time period, one panel for each correlation threshold (0.7, 0.6, 0.5); horizontal axis: time period, vertical axis: largest group size]
35 Connected Components in Market Graph Observations: the increase in the giant component size from the oldest to the newest time period indicates the globalization tendency, just as the maximum clique size and edge density do. The giant component includes semiconductor industries, and the increase in its size corroborates the observation that the number of these industries has been increasing with time.
36 Additional Applications Social networks; biological networks; transportation networks (place of living and place of work).
37 References J. Abello, P.M. Pardalos, and M.G.C. Resende (eds.), Handbook of Massive Data Sets, Kluwer Academic Publishers. V. Boginski, S. Butenko, and P.M. Pardalos, Modeling and Optimization in Massive Graphs. In: P. M. Pardalos and H. Wolkowicz, eds. Novel Approaches to Hard Discrete Optimization, American Mathematical Society, V. Boginski, S. Butenko, and P.M. Pardalos, On Structural Properties of the Market Graph. In: A. Nagurney (editor), Innovations in Financial and Economic Networks, Edward Elgar Publishers,
38 References American Scientist Online, (Jan-Feb 2000), Computing Science Graph Theory in Practice: Part I by Brian Hayes, Volume 88, No. 1 American Scientist Online, (Sep-Oct 2006), Connecting the Dots: Can the tools of graph theory and social-network studies unravel the next big plot?, Volume 94, No. 5
39 Modeling Epileptic Brain EEG recordings (time series) are received from electrodes located in different functional units of the brain. The values of the T-index between all pairs of electrodes are calculated. Two electrodes are considered to be entrained in the seizure if the corresponding value of the T-index is less than T critical.
40 Modeling Epileptic Brain One can represent all the electrode locations (functional units of the brain) as the vertices of a graph. An edge connects two vertices if the corresponding value of the T-index is less than T critical, i.e., these electrode sites are entrained at a given moment in time. The evolution of the properties of this graph is investigated.
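The graph construction described here can be sketched as a thresholding step on the T-index matrix, repeated at each time window to track how properties such as edge density evolve. The matrix values and the critical value in the example are made up for illustration; they are not data from the study.

```python
def entrainment_graph(t_index, t_critical):
    """Build a graph on electrode sites from a symmetric T-index matrix.

    Sites i and j are joined when t_index[i][j] < t_critical,
    i.e., when the two sites are considered entrained.
    """
    n = len(t_index)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if t_index[i][j] < t_critical:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def edge_density(adj):
    """Fraction of electrode pairs that are entrained."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


# Hypothetical T-index matrices for three electrode sites at two time windows.
window_before = [[0.0, 5.1, 6.3], [5.1, 0.0, 2.0], [6.3, 2.0, 0.0]]
window_near = [[0.0, 1.9, 2.4], [1.9, 0.0, 2.0], [2.4, 2.0, 0.0]]
densities = [
    edge_density(entrainment_graph(t, t_critical=2.5))
    for t in (window_before, window_near)
]
```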
41 Modeling Epileptic Brain Edge density of the considered graph (dashed lines represent the moments of epileptic seizures)
42 Modeling Epileptic Brain Size of the largest connected component (dashed lines represent the moments of epileptic seizures)
43 Modeling Epileptic Brain: Related Technique Let A be the matrix containing the values of the T-index T_ij for all pairs of electrodes. Solve the quadratic 0-1 problem to find the k electrode sites producing the minimal sum of T-indices (so-called critical sites), which means that these sites are entrained during the seizure.
44 Summary There are many mathematical programming techniques for addressing data mining problems in dynamic networks. Graph-based techniques for this type of problem are a promising research area. The performance of any approach depends on the specific dataset: there is no universal technique.
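The selection problem on this slide can be written as: minimize x^T A x subject to the sum of x_i being k, with each x_i in {0, 1}. For a symmetric A with zero diagonal, this objective is twice the sum of T_ij over the selected pairs, so both have the same minimizer. The sketch below solves it by exhaustive enumeration, which is only feasible for a handful of electrodes; the authors' actual solution method is not shown in this transcript, and the matrix in the example is made up.

```python
from itertools import combinations


def critical_sites(t_index, k):
    """Pick k sites minimizing the sum of pairwise T-indices (brute force).

    Equivalent to the quadratic 0-1 problem min x^T A x with sum(x) = k when
    A is symmetric with zero diagonal. Exponential time; toy sizes only.
    """
    n = len(t_index)

    def pair_sum(sites):
        return sum(t_index[i][j] for i, j in combinations(sites, 2))

    return min(combinations(range(n), k), key=pair_sum)


# Hypothetical symmetric T-index matrix for four electrode sites.
T = [
    [0, 5, 4, 6],
    [5, 0, 3, 1],
    [4, 3, 0, 2],
    [6, 1, 2, 0],
]
```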
45 References Data Mining in Biomedicine, P.M. Pardalos, V. Boginski, and A. Vazacopoulos (eds.), Springer, forthcoming. P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, V.A. Yatsenko, Seizure Warning Algorithm Based on Optimization and Nonlinear Dynamics, Mathematical Programming, 101(2): O.A. Prokopyev, V. Boginski, W. Chaovalitwongse, P. M. Pardalos, J. C. Sackellares, and P. R. Carney, Network-Based Techniques in EEG Data Analysis and Epileptic Brain Modeling. To appear in: Data mining in Biomedicine, P.M. Pardalos, V. Boginski and A. Vazacopoulos (eds.), Springer.
46 This cosmos was not made by Gods or men, but always was, and is, and ever shall be ever-living fire. Heraclitus - The Fire Priest (540 BC BC)
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationNetwork Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao
Network Theory: Social, Mythological and Fictional Networks Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao Abstract: Comparative mythology is a largely qualitative and
More informationCSE 701: LARGE-SCALE GRAPH MINING. A. Erdem Sariyuce
CSE 701: LARGE-SCALE GRAPH MINING A. Erdem Sariyuce WHO AM I? My name is Erdem Office: 323 Davis Hall Office hours: Wednesday 2-4 pm Research on graph (network) mining & management Practical algorithms
More informationCommunity Detection. Community
Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,
More informationRapid growth of massive datasets
Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,
More informationTELCOM2125: Network Science and Analysis
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 Figures are taken from: M.E.J. Newman, Networks: An Introduction 2
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationFor students entering Part 1 in September 2018 UFCSWIYB
Programme Specification BSc Computer Science For students entering Part 1 in September 2018 UCAS Code: G400 UFCOMPB UFCSWIYB This document sets out key information about your Programme and forms part of
More informationSecurity analytics: From data to action Visual and analytical approaches to detecting modern adversaries
Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries Chris Calvert, CISSP, CISM Director of Solutions Innovation Copyright 2013 Hewlett-Packard Development
More informationFor students entering Part 1 in September 2019 UFCSWIYB
Programme Specification BSc Computer Science For students entering Part 1 in September 2019 UCAS Code: G400 UFCOMPB UFCSWIYB This document sets out key information about your Programme and forms part of
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More information6. Tabu Search. 6.3 Minimum k-tree Problem. Fall 2010 Instructor: Dr. Masoud Yaghini
6. Tabu Search 6.3 Minimum k-tree Problem Fall 2010 Instructor: Dr. Masoud Yaghini Outline Definition Initial Solution Neighborhood Structure and Move Mechanism Tabu Structure Illustrative Tabu Structure
More informationCS 591: Introduction to Computer Security. Lecture 13: Evaluation
CS 591: Introduction to Computer Security Lecture 13: Evaluation James Hook Evaluation lo hi assure-o-meter 1 Evaluation Context: DoD identifies computer security as important in 70s (Anderson 1972) Recognizes
More informationCS 591: Introduction to Computer Security. Lecture 13: Evaluation
CS 591: Introduction to Computer Security Lecture 13: Evaluation James Hook Evaluation lo hi assure-o-meter Evaluation Context: DoD identifies computer security as important in 70s (Anderson 1972) Recognizes
More informationAlgorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov
Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties
More informationINTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...
INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data
More informationMore Efficient Classification of Web Content Using Graph Sampling
More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information
More informationMachine Learning (CSMML16) (Autumn term, ) Xia Hong
Machine Learning (CSMML16) (Autumn term, 28-29) Xia Hong 1 Useful books: 1. C. M. Bishop: Pattern Recognition and Machine Learning (2007) Springer. 2. S. Haykin: Neural Networks (1999) Prentice Hall. 3.
More informationCS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University
CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that
More informationA Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis
A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis Meshal Shutaywi and Nezamoddin N. Kachouie Department of Mathematical Sciences, Florida Institute of Technology Abstract
More informationCase Study: Social Network Analysis. Part II
Case Study: Social Network Analysis Part II https://sites.google.com/site/kdd2017iot/ Outline IoT Fundamentals and IoT Stream Mining Algorithms Predictive Learning Descriptive Learning Frequent Pattern
More informationHomework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)
Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in
More informationGRASP WITH PATH-RELINKING FOR THE QUADRATIC ASSIGNMENT PROBLEM
WITH PATH-RELINKING FOR THE QUADRATIC ASSIGNMENT PROBLEM CARLOS A.S. OLIVEIRA, PANOS M. PARDALOS, AND M.G.C. RESENDE Abstract. This paper describes a with path-relinking heuristic for the quadratic assignment
More informationAssociation-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications
Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Daniel Mican, Nicolae Tomai Babes-Bolyai University, Dept. of Business Information Systems, Str. Theodor
More informationNotes for Lecture 24
U.C. Berkeley CS170: Intro to CS Theory Handout N24 Professor Luca Trevisan December 4, 2001 Notes for Lecture 24 1 Some NP-complete Numerical Problems 1.1 Subset Sum The Subset Sum problem is defined
More informationBig Data Analytics The Data Mining process. Roger Bohn March. 2016
1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material
More informationHierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs
Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs Kaan Ataman and W. Nick Street Department of Management Sciences The University of Iowa Abstract Many real-world
More information