Circle Graphs: New Visualization Tools for Text-Mining

Size: px
Start display at page:

Download "Circle Graphs: New Visualization Tools for Text-Mining"

Transcription

1 Circle Graphs: New Visualization Tools for Text-Mining Yonatan Aumann, Ronen Feldman, Yaron Ben Yehuda, David Landau, Orly Liphstat, Yonatan Schler Department of Mathematics and Computer Science Bar-Ilan University Ramat-Gan, ISRAEL Tel: Fax: Abstract. The proliferation of digitally available textual data necessitates automatic tools for analyzing large textual collections. Thus, in analogy to data mining for structured databases, text mining is defined for textual collections. A central tool in text-mining is the analysis of concept relationship, which discovers connections between different concepts, as reflected in the collection. However, discovering these relationships is not sufficient, as they also have to be presented to the user in a meaningful and manageable way. In this paper we introduce a new family of visualization tools, which we coin circle graphs, which provide means for visualizing concept relationships mined from large collections. Circle graphs allow for instant appreciation of multiple relationships gathered from the entire collection. A special type of circle-graphs, called Trend Graphs, allows tracking of the evolution of relationships over time. 1 Introduction Most informal definitions [2] introduce knowledge discovery in databases (KDD) as the extraction of useful information from databases by large-scale search for interesting patterns. The vast majority of existing KDD applications and methods deal with structured databases, for example, client data stored in a relational database, and thus exploits data organized in records structured by categorical, ordinal, and continuous variables. However, a tremendous amount of information is stored in documents that are essentially unstructured. The availability of document collections and especially of online information is rapidly growing, so that an analysis bottleneck often arises also in this area. Thus, in analogy to data mining for structured data, text mining is defined for textual data. Text mining is the science of extracting information from hidden patterns in large textual collections. Text mining shares many characteristics with classical data mining, but also differs in some. Thus, it is necessary to provide special tools geared specifically to text mining. A central tool, found valuable in text mining, is the analysis of concept J.M. Zytkow and J. Rauch (Eds.): PKDD 99, LNAI 1704, pp , Springer Verlag Berlin Heidelberg 1999

2 278 Y. Aumann et al. relationship [3,4], defined as follows. Large textual corpuses are most commonly composed of a collection of separate documents (e.g. news articles, web pages). Each document refers to a set of concepts (terms). Text mining operations consider the distribution of concepts on the inter-document level, seeking to discover the nature and relationships of concepts as reflected in the collection as a whole. For example, in a collection of news articles, a large number of articles on politician X and "scandal" may indicate a negative image of the character, and alert for a new PR campaign. Or, for another example, a growing number of articles on both company Y and product Z may indicate a shift of focus in the company s interests, a shift which should be noted by its competitors. Notice that in both of these cases, the information is not provided by any single document, but rather from the totality of the collection. Thus, concept relationship analysis seeks to discover the relationship between concepts, as reflected by the totality of the corpus at hand. Clearly, discovering the concept relationships is only useful insofar as this information can be conveyed to the end-user. In practice, even medium-sized collections tend to give rise to a very large number of relationships. Thus, a mere listing of the discovered relationships is of little practical use for the end-user, as it is too large to comprehend. In addition, a linear list fails to show the structure arising from the entirety of relationships. Thus, we find that in order for mining of concept-relationship to be a useful tool for text-analysis, proper visualization techniques are necessary for presenting the results to the end user in a meaningful and manageable form. In this paper we introduce a new family of visualization tools for text-mining, which we call Circle Graphs. Circle graphs prove to be an effective tool for visualizing concept relationships discovered in text-mining. The graphs provide the user with an instant overall view of many relationships at once. Thus, circle graphs provide the extra benefit of surfacing the overall structure emerging from the multitude of relationships. We describe two specific types of circle graphs: 1. Category Connection Graphs: Provide a graphic representation of relationships between concept in different categories. 2. Context Connection Circle Graphs: Provide the user with a graphic representation of the connection between entities in a give context. 2 Circle Graphs We now describe the circle graphs visualization. We first give some basic definitions and notations.

3 Circle Graphs: New Visualization Tools for Text Mining Definitions and Notations Let T a be taxonomy. T is represented as a DAG (Directed Acyclic Graph), with the terms at the leaves. For a given node v T, we denote by Terms(v) the terms which are decedents of v. Let D be a collection of documents. For terms e 1 and e 2 we denote sup D (e 1,e) the number of documents in D which indicate a relationship between e 1 and e 2. The nature of the indication can be defined individually according to the context. In the current implementation we say that a document indicates a relationship if both terms appear in the document in the same sentence. This has proved to be a strong indicator. Similarly, for a collection D and terms e 1, e 2 and c, we denote by sup D (e 1,e 2,c) the number of documents which indicate a relationship between e 1 and e 2 in the context of c (e.g. relationship between the UK and Ireland in the context of peace talks). Again, the nature of indication may be determined in many ways. In the current implementation we require that they all appear in the same sentence. 2.2 Category Connection Maps Category Connection Maps provide a means for concise visual representation of connections between different categories, e.g. between companies and technologies, countries and people, or drugs and diseases. In order to define a category connection map, the user chooses any number of categories from the taxonomy. The system finds all the connections between the terms in the different categories. To visualize the output, all the terms in the chosen categories are depicted on a circle, with each category placed on a separate part on the circle. A line is depicted between terms of different categories which are related. A color coding scheme represents stronger links with darker colors. Formally, given a set C={v 1,v 2,,v k } of taxonomy nodes, and a document collection D, the category connection map is the weighted G defined as follows. The nodes of the graph are the set V=terms(v 1 ) terms(v 2 ) terms(v k ). Nodes u,w V are connected by an edge if: 1. u and w are from different categories 2. sup D (u,w)>0. 3. The weight of the edge (u,w) is sup D (u,w). An important point to notice regarding Category Connection Maps is that the map presents in a single picture information from the entire collection of documents. In the specific example, there is no single document that has the relationship between all the companies and the technologies. Rather, the graphs depicts aggregate knowledge from hundreds of documents. Thus, the user is provided with a bird s-eye summary view of data from across the collection.

4 280 Y. Aumann et al. The Category Connection Maps are dynamic in several ways. Firstly, the user can choose any node in the graph and the links from this node are highlighted. In addition, a double-click on any of the edges brings the list of documents which support the given relationship, together with the most relevant sentence in each document. Thus, in a way, the system is the opposite of search engines. Search engines point to documents, in the hope that the user will be able to find the necessary information. Circle category connection maps present the user with the information itself, which can then be backed by a list of documents. Figure 1 Context Circle Graph. The graph presents the connections between companies in the context of joint venture. Clusters are depicted separately. Color coded lines represent the strength of the connection. The information is based on 5,413 news articles obtained from Marketwatch.com. Only connection with weight 2 or more are depicted. 2.3 Context Circle-Graphs Context Circle-Graphs provide a visual means for concise representation of the relationship between many terms in a given context. In order to define a context circle graph the user defines: 1. A taxonomy category (e.g. companies ), which determines the nodes of the circle graph (e.g. companies) 2. An optional context node (e.g. joint venture ): which will determine the type of connection we wish to find among the graph nodes. Formally, for a set of taxonomy nodes vs, and a context node C, the Context Circle Graph is a weighted graph on the node set V=terms(vs). For each pair u,w V there is an edge between u and w, if there exists a context term c C, such that sup D (u,w,c)>0. In this case the weight of the edge is Σ c C sup D (u,w,c). If no

5 Circle Graphs: New Visualization Tools for Text Mining 281 context node is defined, then the connection can be in any context. Formally, in this case the root of the taxonomy is considered as the context. A Context Circle Graph for companies in the context of joint venture is depicted in Figure 1. In this case, the graph is clustered, as described below. The graph is based on 5,413 news documents downloaded from Marketwatch.com. The graph gives the user a summary of the entire collection in one visualization. The user can appreciate the overall structure of the connections between companies in this context, even before reading a single document! Clustering For Context Circle Graph we use clustering to identify clusters of nodes which are strongly inter-related in the given context. In the example of figure 1, the system identified six separate clusters. The edges between members of each cluster are depicted in a separate small Context Circle Graph, adjacent to the center graph. The center graph shows connections between terms of different clusters, and those with terms which are not in any cluster. We now describe the algorithm for determining the clusters. Note that the clustering problem here is different from the classic clustering problem. In the classic problem we are given points in some space, and seek to find clusters of points which are close to each other. Here, we are given a graph in which we are seeking to find dense sub-graphs of the graph. Thus, a different type of clustering algorithm is necessary. The algorithm is composed of two main steps. In the first step we assign weights to edges in the graph. The weight of an edge reflects the strength of the connection between the vertices. Edges incident to vertices which are in the same cluster should be associated with high weights. In the next step we identify sets of vertices which are dense-subgraphs. This step uses the weights assigned to the edges in the previous one. We first describe the weight-assignment method. In order to evaluate the strength of a link between a pair of vertices u and v, we consider two criteria: Let u be a vertex in the graph. We use the notation Γ(u) to represent the neighborhood of u. The cluster weight of (u,v) is affected by the similarity of Γ(u) and Γ(v). We assume that vertices within the same clusters have many common neighbors. Existence of many common neighbors is not a sufficient condition, since in dense graphs any two vertices may have some common neighbors. Thus, we emphasize the neighbors which are close to u and v in the sense of cluster weight. Suppose x Γ(u) Γ(u), if the cluster weights of (x,u) and (x,v) are high, there is a good chance that x belongs to the same cluster as u and v.

6 282 Y. Aumann et al. We can now define an update operation on an edge (u,v) which takes into account both criteria: w(u,v) = w(x,u) + w(x,v) x Γ (u) I Γ(v) x Γ (u) I Γ (v) The algorithm starts with initializing all weights to be equal, w(u,v)=1 for all u,v. Next, the update operation is applied to all edges iteratively. After a small number of iterations (set to 5 in the current implementation) it stops and outputs the values associated with each edge. We call this the cluster weight of the edge. The cluster weight has the following characteristic. Consider two vertices u and v within the same dense sub-graph. The edges within this sub-graph mutually affect each other. Thus the iterations drive cluster weight w(u,v) up. If, however, u and v do not belong to the dense sub-graph, the majority of edges affecting w(u,v) will have lower weights, resulting in a low cluster weight assigned to (u,v). After computing the weights, the second step of the algorithm finds the clusters. We define a new graph with the same set of vertices. In the new graph we consider only a small subset of the original edges, whose weights were the highest. In our experiments we took the top 10% of the edges. Since now almost all of the edges are likely to connect vertices within the same dense sub-graph, we thus separate the vertices into clusters by computing the connected components of the new graph and considering each component as a cluster. Figure 1 shows a circle context graph with six clusters. The cluster are depicted around the center graph. Each cluster is depicted in a different color. Nodes which are not in any cluster are colored gray. Note that the nodes of cluster appear both in the central circle and in the separate cluster graph. Edges within the cluster are depicted in the separate cluster graph. Edges between clusters are depicted in the central circle. References 1. Eick, S.G. and Wills, G.J.; Navigating Large Networks with Hierarchies. Visualization 93, pp , Fayyad, U,; Piatetsky-Shapiro, G.; and Smyth P. Knowledge Discover and Data Mining: Towards a Unifying Framework. In Proceedings of the 2 nd International Conference of Knowledge Discovery and Data Mining, 82-88, Feldman, R.; and Dagan, I.; KDT - Knowledge Discovery in Texts. In Proceedings of the 1sr International Conference of Knowledge Discovery and Data Mining Feldman, R.; Klosgen, W.; and Zilberstein, A.; Visualization Techniques to Explore Data Mining Results for Document Collections. In Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining, Hendley, R.J., Drew, N.S., Wood, A.M., and Beale R.; Narcissus: Visualizing Information. In Proc. Int. Symp. On Information Visualization, pp , 1995.

Text Mining via Information Extraction

Text Mining via Information Extraction Ronen Feldman, Yonatan Aumann, Moshe Fresko, Orly Liphstat, Binyamin Rosenfeld, Yonatan Schler Department of Mathematics and Computer Science Bar-Ilan University Ramat-Gan, ISRAEL Tel: 972-3-5326611 Fax:

More information

Visualization Techniques to Explore Data Mining Results for Document Collections

Visualization Techniques to Explore Data Mining Results for Document Collections Visualization Techniques to Explore Data Mining Results for Document Collections Ronen Feldman Math and Computer Science Department Bar-Ilan University Ramat-Gan, ISRAEL 52900 feldman@macs.biu.ac.il Willi

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

arxiv: v2 [cs.ds] 25 Jan 2017

arxiv: v2 [cs.ds] 25 Jan 2017 d-hop Dominating Set for Directed Graph with in-degree Bounded by One arxiv:1404.6890v2 [cs.ds] 25 Jan 2017 Joydeep Banerjee, Arun Das, and Arunabha Sen School of Computing, Informatics and Decision System

More information

Discovering interesting rules from financial data

Discovering interesting rules from financial data Discovering interesting rules from financial data Przemysław Sołdacki Institute of Computer Science Warsaw University of Technology Ul. Andersa 13, 00-159 Warszawa Tel: +48 609129896 email: psoldack@ii.pw.edu.pl

More information

Multi-relational Decision Tree Induction

Multi-relational Decision Tree Induction Multi-relational Decision Tree Induction Arno J. Knobbe 1,2, Arno Siebes 2, Daniël van der Wallen 1 1 Syllogic B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands, {a.knobbe, d.van.der.wallen}@syllogic.com

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

Visualisation of Temporal Interval Association Rules

Visualisation of Temporal Interval Association Rules Visualisation of Temporal Interval Association Rules Chris P. Rainsford 1 and John F. Roddick 2 1 Defence Science and Technology Organisation, DSTO C3 Research Centre Fernhill Park, Canberra, 2600, Australia.

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Association Rule Selection in a Data Mining Environment

Association Rule Selection in a Data Mining Environment Association Rule Selection in a Data Mining Environment Mika Klemettinen 1, Heikki Mannila 2, and A. Inkeri Verkamo 1 1 University of Helsinki, Department of Computer Science P.O. Box 26, FIN 00014 University

More information

CIS 121 Data Structures and Algorithms Minimum Spanning Trees

CIS 121 Data Structures and Algorithms Minimum Spanning Trees CIS 121 Data Structures and Algorithms Minimum Spanning Trees March 19, 2019 Introduction and Background Consider a very natural problem: we are given a set of locations V = {v 1, v 2,..., v n }. We want

More information

The Interestingness Tool for Search in the Web

The Interestingness Tool for Search in the Web The Interestingness Tool for Search in the Web e-print, Gilad Amar and Ran Shaltiel Software Engineering Department, The Jerusalem College of Engineering Azrieli POB 3566, Jerusalem, 91035, Israel iaakov@jce.ac.il,

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

Minimum Spanning Trees My T. UF

Minimum Spanning Trees My T. UF Introduction to Algorithms Minimum Spanning Trees @ UF Problem Find a low cost network connecting a set of locations Any pair of locations are connected There is no cycle Some applications: Communication

More information

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW Ana Azevedo and M.F. Santos ABSTRACT In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done

More information

Hardness of Subgraph and Supergraph Problems in c-tournaments

Hardness of Subgraph and Supergraph Problems in c-tournaments Hardness of Subgraph and Supergraph Problems in c-tournaments Kanthi K Sarpatwar 1 and N.S. Narayanaswamy 1 Department of Computer Science and Engineering, IIT madras, Chennai 600036, India kanthik@gmail.com,swamy@cse.iitm.ac.in

More information

A New Approach To Graph Based Object Classification On Images

A New Approach To Graph Based Object Classification On Images A New Approach To Graph Based Object Classification On Images Sandhya S Krishnan,Kavitha V K P.G Scholar, Dept of CSE, BMCE, Kollam, Kerala, India Sandhya4parvathy@gmail.com Abstract: The main idea of

More information

The strong chromatic number of a graph

The strong chromatic number of a graph The strong chromatic number of a graph Noga Alon Abstract It is shown that there is an absolute constant c with the following property: For any two graphs G 1 = (V, E 1 ) and G 2 = (V, E 2 ) on the same

More information

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition. 18.433 Combinatorial Optimization Matching Algorithms September 9,14,16 Lecturer: Santosh Vempala Given a graph G = (V, E), a matching M is a set of edges with the property that no two of the edges have

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

Michał Dębski. Uniwersytet Warszawski. On a topological relaxation of a conjecture of Erdős and Nešetřil

Michał Dębski. Uniwersytet Warszawski. On a topological relaxation of a conjecture of Erdős and Nešetřil Michał Dębski Uniwersytet Warszawski On a topological relaxation of a conjecture of Erdős and Nešetřil Praca semestralna nr 3 (semestr letni 2012/13) Opiekun pracy: Tomasz Łuczak On a topological relaxation

More information

Information Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining

Information Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining Information Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining 1 Vishakha D. Bhope, 2 Sachin N. Deshmukh 1,2 Department of Computer Science & Information Technology, Dr. BAM

More information

Straight-Line Drawings of 2-Outerplanar Graphs on Two Curves

Straight-Line Drawings of 2-Outerplanar Graphs on Two Curves Straight-Line Drawings of 2-Outerplanar Graphs on Two Curves (Extended Abstract) Emilio Di Giacomo and Walter Didimo Università di Perugia ({digiacomo,didimo}@diei.unipg.it). Abstract. We study how to

More information

arxiv: v1 [math.co] 3 Apr 2016

arxiv: v1 [math.co] 3 Apr 2016 A note on extremal results on directed acyclic graphs arxiv:1604.0061v1 [math.co] 3 Apr 016 A. Martínez-Pérez, L. Montejano and D. Oliveros April 5, 016 Abstract The family of Directed Acyclic Graphs as

More information

Preimages of Small Geometric Cycles

Preimages of Small Geometric Cycles Preimages of Small Geometric Cycles Sally Cockburn Department of Mathematics Hamilton College, Clinton, NY scockbur@hamilton.edu Abstract A graph G is a homomorphic preimage of another graph H, or equivalently

More information

International Journal of Scientific & Engineering Research Volume 8, Issue 5, May ISSN

International Journal of Scientific & Engineering Research Volume 8, Issue 5, May ISSN International Journal of Scientific & Engineering Research Volume 8, Issue 5, May-2017 106 Self-organizing behavior of Wireless Ad Hoc Networks T. Raghu Trivedi, S. Giri Nath Abstract Self-organization

More information

Web Structure Mining Community Detection and Evaluation

Web Structure Mining Community Detection and Evaluation Web Structure Mining Community Detection and Evaluation 1 Community Community. It is formed by individuals such that those within a group interact with each other more frequently than with those outside

More information

Realizing the 2-Associahedron

Realizing the 2-Associahedron Realizing the 2-Associahedron Patrick Tierney Satyan L. Devadoss, Advisor Dagan Karp, Reader Department of Mathematics May, 2016 Copyright 2016 Patrick Tierney. The author grants Harvey Mudd College and

More information

Strongly Connected Components

Strongly Connected Components Strongly Connected Components Let G = (V, E) be a directed graph Write if there is a path from to in G Write if and is an equivalence relation: implies and implies s equivalence classes are called the

More information

As an example of possible application we compute these invariants for connectome graphs which are studied in neuroscience.

As an example of possible application we compute these invariants for connectome graphs which are studied in neuroscience. STRONG CONNECTIVITY AND ITS APPLICATIONS arxiv:1609.07355v2 [cs.dm] 20 Oct 2016 PETERIS DAUGULIS Abstract. Directed graphs are widely used in modelling of nonsymmetric relations in various sciences and

More information

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 3-A Graphs Graphs A directed graph (or digraph) G is a pair (V, E), where V is a finite set, and E is a binary relation on V The set V: Vertex set of G The set E: Edge set of

More information

Approximation of Frequency Queries by Means of Free-Sets

Approximation of Frequency Queries by Means of Free-Sets Approximation of Frequency Queries by Means of Free-Sets Jean-François Boulicaut, Artur Bykowski, and Christophe Rigotti Laboratoire d Ingénierie des Systèmes d Information INSA Lyon, Bâtiment 501 F-69621

More information

Communication Networks I December 4, 2001 Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page 1

Communication Networks I December 4, 2001 Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page 1 Communication Networks I December, Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page Communication Networks I December, Notation G = (V,E) denotes a

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry Georg Pölzlbauer, Andreas Rauber (Department of Software Technology

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Toward the integration of informatic tools and GRID infrastructure for Assyriology text analysis

Toward the integration of informatic tools and GRID infrastructure for Assyriology text analysis 58 Rencontre Assyriologique Internationale (RAI) Private and State 16-20 July 2012 - Leiden Toward the integration of informatic tools and GRID infrastructure for Assyriology text analysis Giovanni Ponti,

More information

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA) International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 12 No. 1 Nov. 2014, pp. 217-222 2014 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/

More information

of optimization problems. In this chapter, it is explained that what network design

of optimization problems. In this chapter, it is explained that what network design CHAPTER 2 Network Design Network design is one of the most important and most frequently encountered classes of optimization problems. In this chapter, it is explained that what network design is? The

More information

Discharging and reducible configurations

Discharging and reducible configurations Discharging and reducible configurations Zdeněk Dvořák March 24, 2018 Suppose we want to show that graphs from some hereditary class G are k- colorable. Clearly, we can restrict our attention to graphs

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

Solutions to Midterm 2 - Monday, July 11th, 2009

Solutions to Midterm 2 - Monday, July 11th, 2009 Solutions to Midterm - Monday, July 11th, 009 CPSC30, Summer009. Instructor: Dr. Lior Malka. (liorma@cs.ubc.ca) 1. Dynamic programming. Let A be a set of n integers A 1,..., A n such that 1 A i n for each

More information

Binding Number of Some Special Classes of Trees

Binding Number of Some Special Classes of Trees International J.Math. Combin. Vol.(206), 76-8 Binding Number of Some Special Classes of Trees B.Chaluvaraju, H.S.Boregowda 2 and S.Kumbinarsaiah 3 Department of Mathematics, Bangalore University, Janana

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Graph Structure Over Time

Graph Structure Over Time Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines

More information

A Data Mining Case Study

A Data Mining Case Study A Data Mining Case Study Mark-André Krogel Otto-von-Guericke-Universität PF 41 20, D-39016 Magdeburg, Germany Phone: +49-391-67 113 99, Fax: +49-391-67 120 18 Email: krogel@iws.cs.uni-magdeburg.de ABSTRACT:

More information

Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph. Nechama Florans. Mentor: Dr. Boram Park

Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph. Nechama Florans. Mentor: Dr. Boram Park Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph Nechama Florans Mentor: Dr. Boram Park G: V 5 Vertex Clique Covers and Edge Clique Covers: Suppose we have a graph

More information

Applying Data Mining Techniques to Wafer Manufacturing

Applying Data Mining Techniques to Wafer Manufacturing Applying Data Mining Techniques to Wafer Manufacturing Elisa Bertino 1, Barbara Catania 2, and Eleonora Caglio 3 1 Università degli Studi di Milano Via Comelico 39/41 20135 Milano, Italy bertino@dsi.unimi.it

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

The Design Space of Software Development Methodologies

The Design Space of Software Development Methodologies The Design Space of Software Development Methodologies Kadie Clancy, CS2310 Term Project I. INTRODUCTION The success of a software development project depends on the underlying framework used to plan and

More information

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 The Survey Of Data Mining And Warehousing Architha.S, A.Kishore Kumar Department of Computer Engineering Department of computer engineering city engineering college VTU Bangalore, India ABSTRACT: Data

More information

Extremal Graph Theory: Turán s Theorem

Extremal Graph Theory: Turán s Theorem Bridgewater State University Virtual Commons - Bridgewater State University Honors Program Theses and Projects Undergraduate Honors Program 5-9-07 Extremal Graph Theory: Turán s Theorem Vincent Vascimini

More information

Generating Topology on Graphs by. Operations on Graphs

Generating Topology on Graphs by. Operations on Graphs Applied Mathematical Sciences, Vol. 9, 2015, no. 57, 2843-2857 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2015.5154 Generating Topology on Graphs by Operations on Graphs M. Shokry Physics

More information

ON DECOMPOSITION OF FUZZY BԐ OPEN SETS

ON DECOMPOSITION OF FUZZY BԐ OPEN SETS ON DECOMPOSITION OF FUZZY BԐ OPEN SETS 1 B. Amudhambigai, 2 K. Saranya 1,2 Department of Mathematics, Sri Sarada College for Women, Salem-636016, Tamilnadu,India email: 1 rbamudha@yahoo.co.in, 2 saranyamath88@gmail.com

More information

Redefining and Enhancing K-means Algorithm

Redefining and Enhancing K-means Algorithm Redefining and Enhancing K-means Algorithm Nimrat Kaur Sidhu 1, Rajneet kaur 2 Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India 1 Assistant Professor,

More information

An ICA-Based Multivariate Discretization Algorithm

An ICA-Based Multivariate Discretization Algorithm An ICA-Based Multivariate Discretization Algorithm Ye Kang 1,2, Shanshan Wang 1,2, Xiaoyan Liu 1, Hokyin Lai 1, Huaiqing Wang 1, and Baiqi Miao 2 1 Department of Information Systems, City University of

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

Lecture 22 Tuesday, April 10

Lecture 22 Tuesday, April 10 CIS 160 - Spring 2018 (instructor Val Tannen) Lecture 22 Tuesday, April 10 GRAPH THEORY Directed Graphs Directed graphs (a.k.a. digraphs) are an important mathematical modeling tool in Computer Science,

More information

Some Strong Connectivity Concepts in Weighted Graphs

Some Strong Connectivity Concepts in Weighted Graphs Annals of Pure and Applied Mathematics Vol. 16, No. 1, 2018, 37-46 ISSN: 2279-087X (P), 2279-0888(online) Published on 1 January 2018 www.researchmathsci.org DOI: http://dx.doi.org/10.22457/apam.v16n1a5

More information

SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING

SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING TAE-WAN RYU AND CHRISTOPH F. EICK Department of Computer Science, University of Houston, Houston, Texas 77204-3475 {twryu, ceick}@cs.uh.edu

More information

Isometric Diamond Subgraphs

Isometric Diamond Subgraphs Isometric Diamond Subgraphs David Eppstein Computer Science Department, University of California, Irvine eppstein@uci.edu Abstract. We test in polynomial time whether a graph embeds in a distancepreserving

More information

Tadeusz Morzy, Maciej Zakrzewicz

Tadeusz Morzy, Maciej Zakrzewicz From: KDD-98 Proceedings. Copyright 998, AAAI (www.aaai.org). All rights reserved. Group Bitmap Index: A Structure for Association Rules Retrieval Tadeusz Morzy, Maciej Zakrzewicz Institute of Computing

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

Graphs. Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1

Graphs. Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1 Graphs Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1 Warmup Discuss with your neighbors: Come up with as many kinds of relational data as you can (data that can be represented with a graph).

More information

Image Access and Data Mining: An Approach

Image Access and Data Mining: An Approach Image Access and Data Mining: An Approach Chabane Djeraba IRIN, Ecole Polythechnique de l Université de Nantes, 2 rue de la Houssinière, BP 92208-44322 Nantes Cedex 3, France djeraba@irin.univ-nantes.fr

More information

CMSC 380. Graph Terminology and Representation

CMSC 380. Graph Terminology and Representation CMSC 380 Graph Terminology and Representation GRAPH BASICS 2 Basic Graph Definitions n A graph G = (V,E) consists of a finite set of vertices, V, and a finite set of edges, E. n Each edge is a pair (v,w)

More information

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this Lecture Notes Array Review An array in C++ is a contiguous block of memory. Since a char is 1 byte, then an array of 5 chars is 5 bytes. For example, if you execute the following C++ code you will allocate

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 2 Sajjad Haider Spring 2010 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Generating edge covers of path graphs

Generating edge covers of path graphs Generating edge covers of path graphs J. Raymundo Marcial-Romero, J. A. Hernández, Vianney Muñoz-Jiménez and Héctor A. Montes-Venegas Facultad de Ingeniería, Universidad Autónoma del Estado de México,

More information

Towards Semantic Data Mining

Towards Semantic Data Mining Towards Semantic Data Mining Haishan Liu Department of Computer and Information Science, University of Oregon, Eugene, OR, 97401, USA ahoyleo@cs.uoregon.edu Abstract. Incorporating domain knowledge is

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Graph Algorithms Using Depth First Search

Graph Algorithms Using Depth First Search Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth

More information

Scenario-based Refactoring Selection

Scenario-based Refactoring Selection BABEŞ-BOLYAI University of Cluj-Napoca Faculty of Mathematics and Computer Science Proceedings of the National Symposium ZAC2014 (Zilele Academice Clujene, 2014), p. 32-41 Scenario-based Refactoring Selection

More information

Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data

Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data Wasserman and Faust, Chapter 3: Notation for Social Network Data Three different network notational schemes Graph theoretic: the most useful for centrality and prestige methods, cohesive subgroup ideas,

More information

Theoretical Computer Science. Testing Eulerianity and connectivity in directed sparse graphs

Theoretical Computer Science. Testing Eulerianity and connectivity in directed sparse graphs Theoretical Computer Science 412 (2011) 6390 640 Contents lists available at SciVerse ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Testing Eulerianity and connectivity

More information

Learning Graph Grammars

Learning Graph Grammars Learning Graph Grammars 1 Aly El Gamal ECE Department and Coordinated Science Laboratory University of Illinois at Urbana-Champaign Abstract Discovery of patterns in a given graph - in the form of repeated

More information

Introduction to Trajectory Clustering. By YONGLI ZHANG

Introduction to Trajectory Clustering. By YONGLI ZHANG Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem

More information

Text Mining: A Burgeoning technology for knowledge extraction

Text Mining: A Burgeoning technology for knowledge extraction Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.

More information

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44 Data Mining Piotr Paszek piotr.paszek@us.edu.pl Introduction (Piotr Paszek) Data Mining DM KDD 1 / 44 Plan of the lecture 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software

More information

The Rainbow Connection of a Graph Is (at Most) Reciprocal to Its Minimum Degree

The Rainbow Connection of a Graph Is (at Most) Reciprocal to Its Minimum Degree The Rainbow Connection of a Graph Is (at Most) Reciprocal to Its Minimum Degree Michael Krivelevich 1 and Raphael Yuster 2 1 SCHOOL OF MATHEMATICS, TEL AVIV UNIVERSITY TEL AVIV, ISRAEL E-mail: krivelev@post.tau.ac.il

More information

An Apriori-like algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents

An Apriori-like algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents An Apriori-lie algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents Guy Danon Department of Information Systems Engineering Ben-Gurion University of the Negev Beer-Sheva

More information

[8] that this cannot happen on the projective plane (cf. also [2]) and the results of Robertson, Seymour, and Thomas [5] on linkless embeddings of gra

[8] that this cannot happen on the projective plane (cf. also [2]) and the results of Robertson, Seymour, and Thomas [5] on linkless embeddings of gra Apex graphs with embeddings of face-width three Bojan Mohar Department of Mathematics University of Ljubljana Jadranska 19, 61111 Ljubljana Slovenia bojan.mohar@uni-lj.si Abstract Aa apex graph is a graph

More information

Lecture 6: Graph Properties

Lecture 6: Graph Properties Lecture 6: Graph Properties Rajat Mittal IIT Kanpur In this section, we will look at some of the combinatorial properties of graphs. Initially we will discuss independent sets. The bulk of the content

More information

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES Veronica Oliveira de Carvalho Professor of Centro Universitário de Araraquara Araraquara, São Paulo, Brazil Student of São Paulo University

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PAUL BALISTER Abstract It has been shown [Balister, 2001] that if n is odd and m 1,, m t are integers with m i 3 and t i=1 m i = E(K n) then K n can be decomposed

More information

A THREE AND FIVE COLOR THEOREM

A THREE AND FIVE COLOR THEOREM PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 52, October 1975 A THREE AND FIVE COLOR THEOREM FRANK R. BERNHART1 ABSTRACT. Let / be a face of a plane graph G. The Three and Five Color Theorem

More information

MINING OPERATIONAL DATA FOR IMPROVING GSM NETWORK PERFORMANCE

MINING OPERATIONAL DATA FOR IMPROVING GSM NETWORK PERFORMANCE MINING OPERATIONAL DATA FOR IMPROVING GSM NETWORK PERFORMANCE Antonio Leong, Simon Fong Department of Electrical and Electronic Engineering University of Macau, Macau Edison Lai Radio Planning Networks

More information

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2 Graph Theory S I I S S I I S Graphs Definition A graph G is a pair consisting of a vertex set V (G), and an edge set E(G) ( ) V (G). x and y are the endpoints of edge e = {x, y}. They are called adjacent

More information

Basic relations between pixels (Chapter 2)

Basic relations between pixels (Chapter 2) Basic relations between pixels (Chapter 2) Lecture 3 Basic Relationships Between Pixels Definitions: f(x,y): digital image Pixels: q, p (p,q f) A subset of pixels of f(x,y): S A typology of relations:

More information

Number Theory and Graph Theory

Number Theory and Graph Theory 1 Number Theory and Graph Theory Chapter 6 Basic concepts and definitions of graph theory By A. Satyanarayana Reddy Department of Mathematics Shiv Nadar University Uttar Pradesh, India E-mail: satya8118@gmail.com

More information

2. Background. 2.1 Clustering

2. Background. 2.1 Clustering 2. Background 2.1 Clustering Clustering involves the unsupervised classification of data items into different groups or clusters. Unsupervised classificaiton is basically a learning task in which learning

More information

Chapter 9 Graph Algorithms

Chapter 9 Graph Algorithms Introduction graph theory useful in practice represent many real-life problems can be if not careful with data structures Chapter 9 Graph s 2 Definitions Definitions an undirected graph is a finite set

More information

Discovering Paths Traversed by Visitors in Web Server Access Logs

Discovering Paths Traversed by Visitors in Web Server Access Logs Discovering Paths Traversed by Visitors in Web Server Access Logs Alper Tugay Mızrak Department of Computer Engineering Bilkent University 06533 Ankara, TURKEY E-mail: mizrak@cs.bilkent.edu.tr Abstract

More information

5 Matchings in Bipartite Graphs and Their Applications

5 Matchings in Bipartite Graphs and Their Applications 5 Matchings in Bipartite Graphs and Their Applications 5.1 Matchings Definition 5.1 A matching M in a graph G is a set of edges of G, none of which is a loop, such that no two edges in M have a common

More information