Data Analysis with Intersection Graphs
Valter Martins Vairinhos (1), Victor Lobo (1,2), Purificación Galindo Villardón (3)


(1) Escola Naval, (2) ISEGI-UNL, (3) Universidad de Salamanca, Departamento de Estadística

1- Introduction
2- Formalization of DAIG
3- Some useful results and techniques
   Constructing DAIG from tables using intersection graphs
   3.1 Data objects and cliques
   3.2 Multivalued characteristics
   3.3 Representing contingency tables
   3.4 Sparse datasets
4- Conclusions and comparisons with other approaches
5- Bibliography

Abstract

This paper presents a formalization for multivariate data analysis, based on graph theory, that we have named DAIG (Data Analysis with Intersection Graphs). The basic principles of this approach are presented, followed by a brief example and a comparison with the standard table-based approach. A few advantages of this approach are presented, and the paper concludes with a brief comparison with other approaches.

1- Introduction

This paper presents a formalization for multivariate data analysis, based on graph theory, that we have named DAIG (Data Analysis with Intersection Graphs), first proposed in (Vairinhos 2003). This approach to data analysis breaks away from the traditional table-based representation of data and uses intersection graphs as the basic representation. In this approach, the vertices are neither observations nor variables, but sets of observations identified by variable/value pairs (atoms). We will try to show that this approach can, in some cases, be more flexible and powerful than tables, and that it can make good use of the extensive recent work in graph theory.

To illustrate what DAIG is, we shall use an example taken from the well-known zoo dataset from the UC Irvine Machine Learning Repository (UCI 2008), which is composed of 17 characteristics of 101 different animals. Most of these characteristics are binary: hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, tail, domestic, catsize.
The two remaining characteristics, while numerical, can be treated as categorical since they take only 6 or 7 values: number of legs, and type (mammal, bird, fish, etc.).

2- Formalization of DAIG

The main building blocks, or atoms, of our representation of data are sets of observations identified by variable/value pairs. An observation $o_i$ ($i = 1, \dots, n$) is formed by the values of p characteristics or variables $X_j$ ($j = 1, \dots, p$), each with a particular value $x_{ij}$. We can thus represent the observation $o_i$ by p pairs of the form $(X_j = x_{ij})$, called atoms. We then take these atoms as vertices of a graph and, since these atoms are connected in the sense that together they form an observation, we connect them with edges, as can be seen in Figure 1.

[Figure 1 - Tabular and data graph representation of an observation (left: the data pattern $x_i$ in tabular form; right: the same observation as a data graph whose vertices are the atoms $X_j = x_{ij}$).]

On this data graph, an observation or data pattern is a clique, i.e., a set of atoms that are all connected to one another. Let us see a practical example. Suppose we observe 3 characteristics of a chicken and present them in the traditional tabular form. Its representation on a data graph will be a 3-clique, as seen in Figure 2.

[Figure 2 - Tabular and data graph representation of a chicken (left: data pattern in tabular form with the variables Is, fins and legs; right: data graph with the atoms Is Chicken, Has fins, No fins, No legs, 2 legs and 4 legs).]

As more observations are made, their representations can be added to the graph. There are various ways to do this and, depending on the kind of data analysis desired, we may prefer one or another. When we have several observations, these will probably share some atoms, i.e., some observations will have the same values for the same variables. This information may be recorded explicitly in each atom (vertex) by associating with it a list of the observations that contain it, as seen in Figure 3. This graph is an intersection graph (McKee and McMorris 1999), because the edges correspond to non-empty intersections of the sets of observations that form the vertices.

We may or may not associate weights with the edges of this intersection graph, depending on the objective. When necessary, those weights can be defined as a function of the sets corresponding to adjacent vertices; for example, the weight may be the cardinality of the intersection of those sets. Another way of defining weights is to use the function

$$f(a,b) = \frac{|a \cap b|^2}{|a| \cdot |b|}$$

where a and b are the sets of adjacent vertices. Since $P(a \mid b) = |a \cap b| / |b|$ and $P(b \mid a) = |a \cap b| / |a|$, this function corresponds to the proximity measure $P(a \mid b) \cdot P(b \mid a)$, which can be useful in many analyses (Vairinhos 2003).

Thus, depending on the objective, after adding several observations we will have:

a) An intersection graph, where each vertex has an associated set of observations. This is the preferred representation, because it retains all the information about individual observations, and thus allows all sorts of analyses to be performed.

b) A weighted intersection graph, where each edge has an associated weight. For example, the weight may be the number of co-occurrences of the atoms in the observations, $|a \cap b|$, or any other function convenient for the objective of the study.

[Figure 3 - Data graph of a set of observations. Vertices and their observation sets: No fins = {Antelope, Bear, Chicken, Clam, Crow, Duck, Goat, Worm}; Fins = {Carp, Catfish, Sealion}; No legs = {Carp, Catfish, Clam, Worm}; 2 legs = {Chicken, Crow, Duck, Sealion}; 4 legs = {Antelope, Bear, Goat}.]

Vertices that correspond to a given variable (which we will call a variable set) will normally, but not necessarily, form an independent set, since each observation usually has a single value for each variable.
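To make the representation concrete, the following is a minimal Python sketch, not the authors' implementation, that builds the intersection graph of Figure 3 restricted to the variables fins and legs. Each atom (variable, value) becomes a vertex holding its set of observations; two vertices are joined when their sets intersect, and each edge carries both the cardinality of the intersection and the proximity f(a, b) defined above. The value encodings ("yes"/"no", integer leg counts) and the function names `build_vertices`, `proximity` and `intersection_edges` are hypothetical.

```python
from itertools import combinations

# Observations from Figure 3, restricted to the variables "fins" and "legs".
# The value encodings ("yes"/"no", integer leg counts) are illustrative choices.
OBSERVATIONS = {
    "Antelope": {"fins": "no", "legs": 4},
    "Bear": {"fins": "no", "legs": 4},
    "Carp": {"fins": "yes", "legs": 0},
    "Catfish": {"fins": "yes", "legs": 0},
    "Chicken": {"fins": "no", "legs": 2},
    "Clam": {"fins": "no", "legs": 0},
    "Crow": {"fins": "no", "legs": 2},
    "Duck": {"fins": "no", "legs": 2},
    "Goat": {"fins": "no", "legs": 4},
    "Sealion": {"fins": "yes", "legs": 2},
    "Worm": {"fins": "no", "legs": 0},
}


def build_vertices(observations):
    """Map each atom (variable, value) to the set of observations that contain it."""
    vertices = {}
    for obs_id, record in observations.items():
        for variable, value in record.items():
            vertices.setdefault((variable, value), set()).add(obs_id)
    return vertices


def proximity(a, b):
    """f(a, b) = |a & b|^2 / (|a| * |b|), which equals P(a | b) * P(b | a)."""
    return len(a & b) ** 2 / (len(a) * len(b))


def intersection_edges(vertices):
    """Edges of the intersection graph: every pair of atoms sharing an observation.

    Keys are (atom_u, atom_v) pairs in a fixed (repr-sorted) order; values are
    (cardinality of the intersection, proximity f).
    """
    edges = {}
    for u, v in combinations(sorted(vertices, key=repr), 2):
        common = vertices[u] & vertices[v]
        if common:  # an empty intersection produces no edge
            edges[(u, v)] = (len(common), proximity(vertices[u], vertices[v]))
    return edges


if __name__ == "__main__":
    vertices = build_vertices(OBSERVATIONS)
    edges = intersection_edges(vertices)
    print(sorted(vertices[("legs", 2)]))  # ['Chicken', 'Crow', 'Duck', 'Sealion']
    print(edges[(("fins", "yes"), ("legs", 2))])  # (1, 0.08333333333333333)
    print(len(edges))  # 5
```

Because vertices of the same variable (for example `("fins", "no")` and `("fins", "yes")`) share no observation, no edge is created between them, which is the independent-set property noted above.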

Within the clique that represents a single observation, the degree of each vertex will usually be the number of variables minus one. However, it can be less if there are missing values, or greater if, for a given observation, some variable takes more than one value.

3- Some useful results and techniques

The proposed formalization leads to many interesting results and allows for efficient implementations of data analysis techniques. One of those implementations can be seen in (Vairinhos 2003), where this formalism has been used to suggest interpretations of concepts discovered with biplots. Each vertex, and each path along the graph, corresponds to a concept described by intension. If the vertices that form such a path have a non-empty intersection, that intersection is the extension of the corresponding concept. In this case, the path also identifies a clique whose vertices share at least one observation. Let us now see some of the useful techniques and results developed for this formalism.

Constructing DAIG from tables using intersection graphs

Conceptually, building the DAIG data structure is just like the chemical analysis of a compound: the observation is broken down into its constituent atoms. For each variable that characterizes the observation, the existing graph is updated by adding the label of the new observation to the vertex that corresponds to its value. If necessary, edges are added to the other vertices that compose the data pattern. The edges can always be added later by finding non-empty intersections of the vertices, and their weights can then be computed, since they are a function of the cardinality of those intersections. Adding the edges as the observations are introduced, however, reduces computing time later, since it avoids computing empty intersections.

3.1 Data objects and cliques

As stated previously, observations correspond to cliques in the graph. If the observations are characterized by p variables, they will be p-cliques.
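The construction procedure above, in which each new observation contributes one clique, can be sketched incrementally in Python. This is a minimal sketch under stated assumptions, not the authors' program: observations arrive as dicts with unique labels, missing values are encoded as `None`, and `DataGraph`, `add_observation`, `is_clique` and `extension` are hypothetical names. Edge weights are counted as observations are added, so each weight equals the cardinality of the corresponding intersection without any empty intersection ever being computed. The toy data are also hypothetical, chosen to show that atoms which are pairwise adjacent (a clique) may still have an empty common intersection, i.e. correspond to no observation at all.

```python
from itertools import combinations


class DataGraph:
    """Hypothetical incrementally built intersection graph (a sketch, not the authors' program)."""

    def __init__(self):
        self.vertices = {}  # atom (variable, value) -> set of observation labels
        self.edges = {}  # frozenset({atom_u, atom_v}) -> number of shared observations

    def add_observation(self, label, record):
        """Break an observation into atoms and update vertices and edge weights.

        Assumes each label is added only once; missing values (None) produce no
        atom, so the observation becomes a clique of lower order.
        """
        atoms = [(var, val) for var, val in record.items() if val is not None]
        for atom in atoms:
            self.vertices.setdefault(atom, set()).add(label)
        # The observation is a clique: each pair of its atoms gains one co-occurrence,
        # so pairs that never co-occur (empty intersections) are never visited.
        for u, v in combinations(atoms, 2):
            key = frozenset((u, v))
            self.edges[key] = self.edges.get(key, 0) + 1

    def is_clique(self, *atoms):
        """True if every pair of the given atoms is joined by an edge."""
        return all(frozenset((u, v)) in self.edges for u, v in combinations(atoms, 2))

    def extension(self, *atoms):
        """Observations containing all the given atoms: the extension of the concept."""
        if not atoms:
            raise ValueError("a concept needs at least one atom")
        return set.intersection(*(self.vertices.get(atom, set()) for atom in atoms))


if __name__ == "__main__":
    g = DataGraph()
    # Hypothetical data, chosen so that one 3-clique matches no observation.
    g.add_observation("o1", {"A": "a1", "B": "b1", "C": "c2"})
    g.add_observation("o2", {"A": "a1", "B": "b2", "C": "c1"})
    g.add_observation("o3", {"A": "a2", "B": "b1", "C": "c1"})
    triple = (("A", "a1"), ("B", "b1"), ("C", "c1"))
    print(g.is_clique(*triple), g.extension(*triple))  # True set()
    print(g.extension(("A", "a1"), ("B", "b1"), ("C", "c2")))  # {'o1'}
    print(g.extension(("B", "b1"), ("C", "c1")))  # {'o3'}
```

Whether a clique is an actual observation is therefore decided by intersecting the vertex sets (`extension`), not by adjacency alone; the last query, with fewer atoms than variables, plays the role of a concept.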
It must be noted that many real problems have missing values. In this case, if only m < p variables were measured for an observation, the corresponding clique will also be of lower order (an m-clique). A clique in the graph may correspond to an observation, to a set of observations, or to no observation at all. A p-clique corresponds to a complete specification of a data observation, and will thus correspond to a single datum, a set of indistinguishable ones, or a non-existent object. Whether that observation exists can only be determined by computing the intersection of the vertices involved: if an empty set is obtained, then no observation with those characteristics was made, even though every pair of vertices in the clique shares some observation. An m-clique (with m < p) will be a generalization, or concept, since some variables that characterize the observations are left out. It is important to note that concepts do not have to be cliques. The set of observations that corresponds to a given concept is once again found by computing the intersection of the sets of the vertices.

3.2 Multivalued characteristics

One of the advantages of DAIG is the ease with which it deals with multivalued characteristics. Such variables are not very common, but they do occur, and the traditional tabular approach deals with them rather awkwardly. An example of a possibly multivalued variable is the nationality of a person: while most people have a single nationality, it is possible to have 2 or more. While this constitutes a problem for some representations, DAIG simply adds the identifier of the observation to each of the vertices that correspond to the different nationalities. The variable set will no longer be an independent set, but for most graph algorithms that is not important. This is also a case where the degree of the vertices can be greater than the number of variables minus one.

3.3 Representing contingency tables

Another big advantage of DAIG is the ease with which multiway contingency tables can be represented. A cell in a p-way contingency table can be seen as the intersection of p atoms, one for each of the p variables, having as absolute frequency the cardinality of that intersection. It is then easy to see that a non-empty cell in a p-way table corresponds to a p-clique whose vertices have a non-empty intersection, and that it can be represented in a parallel coordinates graph as a path connecting all the variables and, in a radar graph, as a polygon with p edges.

An example of this type of analysis is shown in Figure 4, where the 17-way contingency table of the 101 animals of the zoo dataset was filtered to show only the edges with a weight of more than 70. Instead of showing the numerical weights, these are coded as the thickness of the edges. The interpretation of this sketch is quite intuitive.

[Figure 4 - Sketch of the data graph of the 17 characteristics of 101 animals, filtered to include only edges with more than 70 occurrences. In effect, this corresponds to the most important information of the 17-way contingency table. Vertex labels visible in the sketch: Airborne, Has tail, No feathers, Venomous, Has backbone, Breathes, Domestic, Fins.]
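A minimal sketch of how contingency-table cells and Figure 4-style filtering follow directly from the vertex sets, again using the fins and legs columns of Figure 3 with illustrative encodings. `contingency_table` and `heavy_edges` are hypothetical helpers, not part of the authors' program: the first returns the non-empty cells of a p-way table as intersection cardinalities, the second keeps only edges whose weight exceeds a threshold (70 in Figure 4; 2 in this toy example).

```python
from itertools import combinations, product

# Observations from Figure 3, restricted to "fins" and "legs" (encodings are illustrative).
OBSERVATIONS = {
    "Antelope": {"fins": "no", "legs": 4},
    "Bear": {"fins": "no", "legs": 4},
    "Carp": {"fins": "yes", "legs": 0},
    "Catfish": {"fins": "yes", "legs": 0},
    "Chicken": {"fins": "no", "legs": 2},
    "Clam": {"fins": "no", "legs": 0},
    "Crow": {"fins": "no", "legs": 2},
    "Duck": {"fins": "no", "legs": 2},
    "Goat": {"fins": "no", "legs": 4},
    "Sealion": {"fins": "yes", "legs": 2},
    "Worm": {"fins": "no", "legs": 0},
}


def build_vertices(observations):
    """Map each atom (variable, value) to the set of observations containing it."""
    vertices = {}
    for obs_id, record in observations.items():
        for variable, value in record.items():
            vertices.setdefault((variable, value), set()).add(obs_id)
    return vertices


def contingency_table(vertices, variables):
    """Non-empty cells of the p-way table over `variables`.

    Each cell picks one value per variable; its frequency is the size of the
    intersection of the corresponding p atoms. Values of a variable must be
    mutually comparable so they can be sorted.
    """
    levels = [sorted({val for (var, val) in vertices if var == name}) for name in variables]
    table = {}
    for cell in product(*levels):
        members = set.intersection(*(vertices[(var, val)] for var, val in zip(variables, cell)))
        if members:  # cells with no matching observation are skipped
            table[cell] = len(members)
    return table


def heavy_edges(vertices, threshold):
    """Edges whose co-occurrence count exceeds `threshold` (the filtering used for Figure 4)."""
    edges = {}
    for u, v in combinations(vertices, 2):
        weight = len(vertices[u] & vertices[v])
        if weight > threshold:
            edges[frozenset((u, v))] = weight
    return edges


if __name__ == "__main__":
    v = build_vertices(OBSERVATIONS)
    for cell, count in contingency_table(v, ["fins", "legs"]).items():
        print(cell, count)
    # ('no', 0) 2 / ('no', 2) 3 / ('no', 4) 3 / ('yes', 0) 2 / ('yes', 2) 1
    print(sorted(heavy_edges(v, 2).values()))  # [3, 3]
```

The empty cell (fins = yes, legs = 4) never appears, which mirrors the fact that only non-empty cells correspond to cliques with a non-empty intersection.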
3.4 Sparse datasets

The proposed DAIG representation can be far more space-efficient than the traditional table-based representation when there are many missing values in the data, a situation that is common in real data for a number of reasons. A table-based representation has to reserve space for non-existent values, while the graph-based representation does not.
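The space argument can be illustrated with a toy count. The records below are hypothetical, with `None` marking values that were not measured; the sketch compares the n × p cells a rectangular table reserves with the number of (atom, observation) memberships the graph actually stores. It deliberately ignores per-vertex overhead such as dictionary keys, so it is only a rough indication, not a benchmark, and `stored_memberships` is a hypothetical helper.

```python
# Hypothetical sparse records (not from the paper); None marks an unmeasured value.
RECORDS = {
    "r1": {"colour": "red", "size": None, "shape": None, "weight": None},
    "r2": {"colour": None, "size": "large", "shape": None, "weight": None},
    "r3": {"colour": "red", "size": None, "shape": "round", "weight": None},
}


def build_vertices(records):
    """Keep only measured values: a missing value simply adds nothing to the graph."""
    vertices = {}
    for obs_id, record in records.items():
        for variable, value in record.items():
            if value is not None:
                vertices.setdefault((variable, value), set()).add(obs_id)
    return vertices


def stored_memberships(vertices):
    """Number of (atom, observation) entries actually stored by the graph."""
    return sum(len(members) for members in vertices.values())


if __name__ == "__main__":
    variables = {var for record in RECORDS.values() for var in record}
    table_cells = len(RECORDS) * len(variables)  # an n x p table reserves every cell
    print(table_cells, stored_memberships(build_vertices(RECORDS)))  # 12 4
```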

A data-analysis program that has proved useful and efficient has been developed using the DAIG framework. The program was originally developed as part of a PhD thesis (Vairinhos 2003), and improved versions (Vairinhos 2004) have been used to analyse ship maintenance data (Parreira, Vairinhos et al. 2007).

4- Conclusions and comparisons with other approaches

An alternative way of looking at data analysis was explained. This approach is based on intersection graph theory, providing an environment for reasoning about data and concepts in terms of points (vertices) and lines (paths) that represent relationships amongst values of different variables.

Graphs have been used for data analysis before, but in a very different way. In (Agresti 1996) the concept of Formal Concept Analysis is developed, based on the algebraic concept of a lattice and its associated graphs. Association graphs (see, for example, Agresti 1996) are used to represent correlations amongst variables. In (Whittaker 1990; Edwards 1995; Lauritzen 1996), graphical models are developed that use graphs to represent conditional dependence relations between variables. In those approaches, however, vertices correspond to variables (and not to individual values of those variables), and edges are added when there is a conditional dependence between two variables.

5- Bibliography

Agresti, A. (1996). An Introduction to Categorical Data Analysis. New York: John Wiley & Sons.
Edwards, D. (1995). Introduction to Graphical Modeling. Springer.
Lauritzen, S. L. (1996). Graphical Models. Oxford Science Publications.
McKee, T. A. and McMorris, F. R. (1999). Topics in Intersection Graph Theory. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).
Parreira, R. R., Vairinhos, V., et al. (2007). Análise de parâmetros de operação de máquinas marítimas. XIV Jornadas de Classificação e Análise de Dados (JOCLAD), Porto.
UCI (2008). Machine Learning Repository. University of California at Irvine.
Vairinhos, V. M. (2003). Desarrollo de un Sistema para Minería de Datos Basado en los Métodos Biplot. PhD thesis, Universidad de Salamanca, Salamanca, Spain.
Vairinhos, V. M. (2004). BiplotsPMD - Data Mining Centrada em Biplots. Apresentação de Um Protótipo. XI Jornadas de Classificação e Análise de Dados (JOCLAD), Lisbon.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. John Wiley.


More information

A note on self complementary brittle and self complementary quasi chordal graphs

A note on self complementary brittle and self complementary quasi chordal graphs Applied and Computational Mathematics 2013; 2(3): 86-91 Published online July 20, 2013 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20130203.13 A note on self complementary brittle

More information

The Buss Reduction for the k-weighted Vertex Cover Problem

The Buss Reduction for the k-weighted Vertex Cover Problem The Buss Reduction for the k-weighted Vertex Cover Problem Hong Xu Xin-Zeng Wu Cheng Cheng Sven Koenig T. K. Satish Kumar University of Southern California, Los Angeles, California 90089, the United States

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

H1 Spring C. A service-oriented architecture is frequently deployed in practice without a service registry

H1 Spring C. A service-oriented architecture is frequently deployed in practice without a service registry 1. (12 points) Identify all of the following statements that are true about the basics of services. A. Screen scraping may not be effective for large desktops but works perfectly on mobile phones, because

More information

Cantor s Diagonal Argument for Different Levels of Infinity

Cantor s Diagonal Argument for Different Levels of Infinity JANUARY 2015 1 Cantor s Diagonal Argument for Different Levels of Infinity Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely Abstract These notes develop the classic Cantor

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

The Fibonacci hypercube

The Fibonacci hypercube AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 40 (2008), Pages 187 196 The Fibonacci hypercube Fred J. Rispoli Department of Mathematics and Computer Science Dowling College, Oakdale, NY 11769 U.S.A. Steven

More information

SDMX self-learning package No. 5 Student book. Metadata Structure Definition

SDMX self-learning package No. 5 Student book. Metadata Structure Definition No. 5 Student book Metadata Structure Definition Produced by Eurostat, Directorate B: Statistical Methodologies and Tools Unit B-5: Statistical Information Technologies Last update of content December

More information

CSE 230 Computer Science II (Data Structure) Introduction

CSE 230 Computer Science II (Data Structure) Introduction CSE 230 Computer Science II (Data Structure) Introduction Fall 2017 Stony Brook University Instructor: Shebuti Rayana Basic Terminologies Data types Data structure Phases of S/W development Specification

More information

Instructions. Notation. notation: In particular, t(i, 2) = 2 2 2

Instructions. Notation. notation: In particular, t(i, 2) = 2 2 2 Instructions Deterministic Distributed Algorithms, 10 21 May 2010, Exercises http://www.cs.helsinki.fi/jukka.suomela/dda-2010/ Jukka Suomela, last updated on May 20, 2010 exercises are merely suggestions

More information

Which n-venn diagrams can be drawn with convex k-gons?

Which n-venn diagrams can be drawn with convex k-gons? Which n-venn diagrams can be drawn with convex k-gons? Jeremy Carroll Frank Ruskey Mark Weston Abstract We establish a new lower bound for the number of sides required for the component curves of simple

More information

Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection

Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection Petr Somol 1,2, Jana Novovičová 1,2, and Pavel Pudil 2,1 1 Dept. of Pattern Recognition, Institute of Information Theory and

More information

Learning Characteristic Structured Patterns in Rooted Planar Maps

Learning Characteristic Structured Patterns in Rooted Planar Maps Learning Characteristic Structured Patterns in Rooted Planar Maps Satoshi Kawamoto Yusuke Suzuki Takayoshi Shoudai Abstract Exting the concept of ordered graphs, we propose a new data structure to express

More information

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989 University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science October 1989 P Is Not Equal to NP Jon Freeman University of Pennsylvania Follow this and

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Lecture 13: May 10, 2002

Lecture 13: May 10, 2002 EE96 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 00 Dept. of Electrical Engineering Lecture : May 0, 00 Lecturer: Jeff Bilmes Scribe: Arindam Mandal, David Palmer(000).

More information

Cluster quality assessment by the modified Renyi-ClipX algorithm

Cluster quality assessment by the modified Renyi-ClipX algorithm Issue 3, Volume 4, 2010 51 Cluster quality assessment by the modified Renyi-ClipX algorithm Dalia Baziuk, Aleksas Narščius Abstract This paper presents the modified Renyi-CLIPx clustering algorithm and

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.854J / 18.415J Advanced Algorithms Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advanced

More information

The Simplex Algorithm

The Simplex Algorithm The Simplex Algorithm Uri Feige November 2011 1 The simplex algorithm The simplex algorithm was designed by Danzig in 1947. This write-up presents the main ideas involved. It is a slight update (mostly

More information

Predictive and comprehensible rule discovery using a multi-objective genetic algorithm

Predictive and comprehensible rule discovery using a multi-objective genetic algorithm Knowledge-Based Systems 19 (2006) 413 421 www.elsevier.com/locate/knosys Predictive and comprehensible rule discovery using a multi-objective genetic algorithm S. Dehuri a, R. Mall b, * a P.G. Department

More information

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33 Chapter 10 Fundamental Network Algorithms M. E. J. Newman May 6, 2015 M. E. J. Newman Chapter 10 May 6, 2015 1 / 33 Table of Contents 1 Algorithms for Degrees and Degree Distributions Degree-Degree Correlation

More information

Multi Domain Logic and its Applications to SAT

Multi Domain Logic and its Applications to SAT Multi Domain Logic and its Applications to SAT Tudor Jebelean RISC Linz, Austria Tudor.Jebelean@risc.uni-linz.ac.at Gábor Kusper Eszterházy Károly College gkusper@aries.ektf.hu Abstract We describe a new

More information

Visualization? Information Visualization. Information Visualization? Ceci n est pas une visualization! So why two disciplines? So why two disciplines?

Visualization? Information Visualization. Information Visualization? Ceci n est pas une visualization! So why two disciplines? So why two disciplines? Visualization? New Oxford Dictionary of English, 1999 Information Visualization Matt Cooper visualize - verb [with obj.] 1. form a mental image of; imagine: it is not easy to visualize the future. 2. make

More information

CSCI 5454 Ramdomized Min Cut

CSCI 5454 Ramdomized Min Cut CSCI 5454 Ramdomized Min Cut Sean Wiese, Ramya Nair April 8, 013 1 Randomized Minimum Cut A classic problem in computer science is finding the minimum cut of an undirected graph. If we are presented with

More information

CS 3EA3: Sheet 9 Optional Assignment - The Importance of Algebraic Properties

CS 3EA3: Sheet 9 Optional Assignment - The Importance of Algebraic Properties CS 3EA3: Sheet 9 Optional Assignment - The Importance of Algebraic Properties James Zhu 001317457 21 April 2017 1 Abstract Algebraic properties (such as associativity and commutativity) may be defined

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

Morphological Image Processing

Morphological Image Processing Morphological Image Processing Morphology Identification, analysis, and description of the structure of the smallest unit of words Theory and technique for the analysis and processing of geometric structures

More information

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, 00142 Roma, Italy e-mail: pimassol@istat.it 1. Introduction Questions can be usually asked following specific

More information

arxiv: v1 [math.co] 3 Apr 2016

arxiv: v1 [math.co] 3 Apr 2016 A note on extremal results on directed acyclic graphs arxiv:1604.0061v1 [math.co] 3 Apr 016 A. Martínez-Pérez, L. Montejano and D. Oliveros April 5, 016 Abstract The family of Directed Acyclic Graphs as

More information

Extraction of Evolution Tree from Product Variants Using Linear Counting Algorithm. Liu Shuchang

Extraction of Evolution Tree from Product Variants Using Linear Counting Algorithm. Liu Shuchang Extraction of Evolution Tree from Product Variants Using Linear Counting Algorithm Liu Shuchang 30 2 7 29 Extraction of Evolution Tree from Product Variants Using Linear Counting Algorithm Liu Shuchang

More information

14.1 Encoding for different models of computation

14.1 Encoding for different models of computation Lecture 14 Decidable languages In the previous lecture we discussed some examples of encoding schemes, through which various objects can be represented by strings over a given alphabet. We will begin this

More information

Evolution Module. 6.1 Phylogenetic Trees. Bob Gardner and Lev Yampolski. Integrated Biology and Discrete Math (IBMS 1300)

Evolution Module. 6.1 Phylogenetic Trees. Bob Gardner and Lev Yampolski. Integrated Biology and Discrete Math (IBMS 1300) Evolution Module 6.1 Phylogenetic Trees Bob Gardner and Lev Yampolski Integrated Biology and Discrete Math (IBMS 1300) Fall 2008 1 INDUCTION Note. The natural numbers N is the familiar set N = {1, 2, 3,...}.

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p.

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p. CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections 10.1-10.3 p. 1/106 CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer

More information

Arithmetic universes as generalized point-free spaces

Arithmetic universes as generalized point-free spaces Arithmetic universes as generalized point-free spaces Steve Vickers CS Theory Group Birmingham * Grothendieck: "A topos is a generalized topological space" *... it's represented by its category of sheaves

More information

arxiv: v1 [math.co] 4 Apr 2011

arxiv: v1 [math.co] 4 Apr 2011 arxiv:1104.0510v1 [math.co] 4 Apr 2011 Minimal non-extensible precolorings and implicit-relations José Antonio Martín H. Abstract. In this paper I study a variant of the general vertex coloring problem

More information

mimr A package for graphical modelling in R

mimr A package for graphical modelling in R DSC 2003 Working Papers (Draft Versions) http://www.ci.tuwien.ac.at/conferences/dsc-2003/ mimr A package for graphical modelling in R Søren Højsgaard Abstract The mimr package for graphical modelling in

More information

13 th Annual Johns Hopkins Math Tournament Saturday, February 18, 2012 Explorations Unlimited Round Automata Theory

13 th Annual Johns Hopkins Math Tournament Saturday, February 18, 2012 Explorations Unlimited Round Automata Theory 13 th Annual Johns Hopkins Math Tournament Saturday, February 18, 2012 Explorations Unlimited Round Automata Theory 1. Introduction Automata theory is an investigation of the intersection of mathematics,

More information