Data Mining Primitives, Languages, and System
- Meredith Sutton
- 6 years ago
Data Mining Primitives
- Task-relevant data
- The kinds of knowledge to be mined
- Background knowledge
- Interestingness measures
- Presentation and visualization of discovered patterns

Task-relevant data: This is the portion of the database to be investigated. Rather than mining the entire database, you can specify the portion of the database that is relevant for analysis. E.g., transactions involving customer purchases in Canada need to be retrieved.

The kinds of knowledge to be mined: This specifies the data mining functions to be performed, such as characterization, discrimination, association, classification, and clustering.

Background knowledge: Users can specify background knowledge, i.e., knowledge about the domain to be mined. Background knowledge is useful for guiding the knowledge discovery process and for evaluating the patterns found. There are several kinds of background knowledge, such as:
- Concept hierarchies: help in mining data at multiple levels of abstraction.
- Beliefs regarding relationships in the data: help to evaluate the discovered patterns according to their degree of unexpectedness or expectedness.

Interestingness measures: Interestingness measures are used
- to separate uninteresting patterns from knowledge,
- to guide the mining process, and
- to evaluate the discovered patterns.
Different kinds of knowledge have different interestingness measures. For example, association rule mining has interestingness measures such as:
- Support: the percentage of task-relevant data tuples for which the rule pattern appears.
- Confidence: an estimate of the strength of the implication of the rule.
Rules whose support and confidence values are below user-specified thresholds are considered uninteresting.

Presentation and visualization of discovered patterns: This refers to the form in which the discovered patterns are to be displayed.

A Data Mining Query Language
The language (DMQL) adopts an SQL-like syntax, so that it can easily be integrated with the relational query language SQL. The syntax of DMQL is defined in an extended BNF grammar, where [ ] represents 0 or 1 occurrence, { } represents 0 or more occurrences, and words in sans serif font represent keywords.

Syntax for task-relevant data specification: DMQL provides clauses for the specification of such information:
use database <database_name> | use data warehouse <data_warehouse_name>
in relevance to <attribute_or_dimension_list>
from <relation(s)/cube(s)>
[where <condition>]
[order by <order_list>]
[group by <grouping_list>]
[having <condition>]   // condition by which groups of data are considered relevant

Example:
use database AllElectronics_db
in relevance to I.name, I.price, C.income, C.age
from customer C, item I, purchase P, items_sold S
where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID and P.cust_ID = C.cust_ID and C.country = "Canada"
group by P.date
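For readers more familiar with SQL, the relational part of the DMQL example above can be approximated by an ordinary SQL query. The sketch below is an illustration only: it uses Python's built-in sqlite3 module with a tiny in-memory schema whose table and column names follow the example, while every row is made-up sample data. The `in relevance to` list becomes the SELECT list, `from` and `where` carry over, an ORDER BY is added only to make the output deterministic, and the `group by P.date` clause is left out to keep the sketch minimal.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Hypothetical in-memory database mirroring the relations in the DMQL
    example (customer C, item I, purchase P, items_sold S). Rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cust_ID INTEGER, name TEXT, income REAL, age INTEGER, country TEXT);
        CREATE TABLE item (item_ID INTEGER, name TEXT, price REAL);
        CREATE TABLE purchase (trans_ID INTEGER, cust_ID INTEGER, date TEXT);
        CREATE TABLE items_sold (trans_ID INTEGER, item_ID INTEGER);
        INSERT INTO customer VALUES (1, 'Ann', 52000.0, 34, 'Canada'), (2, 'Bob', 61000.0, 45, 'USA');
        INSERT INTO item VALUES (10, 'laptop', 1200.0), (11, 'printer', 150.0);
        INSERT INTO purchase VALUES (100, 1, '2006-10-01'), (101, 2, '2006-10-02');
        INSERT INTO items_sold VALUES (100, 10), (100, 11), (101, 10);
    """)
    return conn


# "in relevance to" -> SELECT list; "from" and "where" carry over unchanged.
TASK_RELEVANT_SQL = """
    SELECT I.name, I.price, C.income, C.age
    FROM customer C, item I, purchase P, items_sold S
    WHERE I.item_ID = S.item_ID
      AND S.trans_ID = P.trans_ID
      AND P.cust_ID = C.cust_ID
      AND C.country = 'Canada'
    ORDER BY I.name
"""


def task_relevant_data(conn: sqlite3.Connection) -> list:
    """Return only the task-relevant tuples (purchases by Canadian customers)."""
    return conn.execute(TASK_RELEVANT_SQL).fetchall()


if __name__ == "__main__":
    for row in task_relevant_data(build_toy_db()):
        print(row)
```

With this sample data the query returns `('laptop', 1200.0, 52000.0, 34)` and `('printer', 150.0, 52000.0, 34)`; the purchase made by the U.S. customer is filtered out by the country condition.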
Syntax for specifying the kind of knowledge to be mined: Specifying the kind of knowledge to be mined determines the data mining function to be performed.

Characterization:
<Mine_Knowledge_Specification> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>
This specifies that characteristic descriptions are to be mined. Example:
mine characteristics as customerPurchasing analyze count%

Discrimination:
<Mine_Knowledge_Specification> ::= mine comparison [as <pattern_name>] for <target_class> where <target_condition> {versus <contrast_class_i> where <contrast_condition_i>} analyze <measure(s)>
This specifies that discriminant descriptions are to be mined.

Association:
<Mine_Knowledge_Specification> ::= mine association [as <pattern_name>] [matching <metapattern>]
This specifies that association patterns are to be mined, optionally constrained to match a metapattern.

Classification:
<Mine_Knowledge_Specification> ::= mine classification [as <pattern_name>] analyze <classifying_attribute_or_dimension>
This specifies that patterns for data classification are to be mined.

Syntax for concept hierarchy specification: Concept hierarchies allow the mining of knowledge at multiple levels of abstraction.
use hierarchy <hierarchy_name> for <attribute_or_dimension>
Schema hierarchy: This can be defined as street < city < province_or_state < country.
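The schema hierarchy street < city < province_or_state < country can be illustrated with a small roll-up function. This is a minimal Python sketch, assuming each location record stores all four attributes (written province_or_state so that it is a valid identifier); `roll_up`, `LOCATION_LEVELS`, and the sample addresses are hypothetical names and data used only for illustration. Generalizing to a level keeps the attributes at or above that level and counts how many records fall under each generalized location.

```python
from collections import Counter
from typing import Dict, List

# Schema hierarchy from the text, lowest (most specific) level first.
LOCATION_LEVELS = ["street", "city", "province_or_state", "country"]


def roll_up(records: List[Dict[str, str]], level: str) -> Counter:
    """Generalize every record to `level` of the schema hierarchy and count
    the records under each generalized location."""
    start = LOCATION_LEVELS.index(level)  # raises ValueError for an unknown level
    kept = LOCATION_LEVELS[start:]        # attributes at or above `level`
    return Counter(tuple(r[a] for a in kept) for r in records)


# Hypothetical sample addresses, for illustration only.
ADDRESSES = [
    {"street": "12 Main St", "city": "Vancouver", "province_or_state": "British Columbia", "country": "Canada"},
    {"street": "7 Oak Ave", "city": "Vancouver", "province_or_state": "British Columbia", "country": "Canada"},
    {"street": "3 King St", "city": "Toronto", "province_or_state": "Ontario", "country": "Canada"},
]

if __name__ == "__main__":
    print(roll_up(ADDRESSES, "city"))
    print(roll_up(ADDRESSES, "country"))
```

Rolling up to `city` merges the two Vancouver addresses into one group with count 2, while rolling up to `country` collapses all three records into a single Canada group.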
Set-grouping hierarchy: This organizes the values for a given attribute into groups of constants or range values. For example:
define hierarchy age_hierarchy for age on customer as
level1: {young, middle_aged, senior} < level0: all
level2: {...} < level1: young
level2: {...} < level1: middle_aged
level2: {...} < level1: senior

Syntax for interestingness measure specification: Interestingness measures and thresholds can be specified by the user with the statement
with [<interest_measure_name>] threshold = <threshold_value>
E.g.:
with support threshold = 5%
with confidence threshold = 70%

Syntax for pattern presentation and visualization: The DMQL display statement for visualization of patterns is
display as <result_form>
where <result_form> could be any of the knowledge presentation/visualization forms, such as tables, pie charts, etc. To view the patterns at different levels of abstraction:
<multilevel_manipulation> ::= roll up on <attribute_or_dimension> | drill down on <attribute_or_dimension> | add <attribute_or_dimension> | drop <attribute_or_dimension>

Designing Graphical User Interfaces Based on a Data Mining Query Language
Inexperienced users may find a data mining query language awkward to use and its syntax difficult to remember. Instead, users may prefer to communicate with DM systems through a GUI. A data mining GUI may consist of the following functional components:
- Data collection and data mining query composition: This component allows the user to specify task-relevant data sets and to compose data mining queries.
- Presentation of discovered patterns: This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves, and other visualization techniques.
- Hierarchy specification and manipulation: This allows for concept hierarchy specification, either manually by the user or automatically. It also allows for modification of concept hierarchies by the user or automatically.
- Manipulation of data mining primitives: This allows for dynamic adjustment of data mining thresholds, as well as selection, display, and modification of concept hierarchies.
- Interactive multilevel mining: This allows roll-up or drill-down operations on discovered patterns.
- Other miscellaneous information: This includes online help manuals, indexed search, debugging, etc.

Architectures of Data Mining Systems
Database and data warehouse systems have become the mainstream information systems. Comprehensive information processing and data analysis infrastructures have been systematically constructed around database systems and data warehouses. A data mining system can be coupled with a DB/DW system using one of the following schemes:
- No coupling
- Loose coupling
- Semi-tight coupling
- Tight coupling

No coupling: No coupling means that a DM system will not utilize any function of a DB or DW system. It may fetch data from a particular source (such as a file system), process the data using some DM algorithms, and then store the mining results in another file.
Advantage: It is simple to implement.
Disadvantages: A DM system without coupling may spend a substantial amount of time finding, collecting, cleaning, and transforming data, whereas in DB/DW systems, data tend to be well organized, indexed, cleaned, and integrated, so that finding task-relevant, high-quality data becomes easy.
There are many tested, scalable algorithms and data structures implemented in DB/DW systems. Without any coupling to such systems, a DM system will need to use other tools, making it difficult to integrate the DM system into an information processing environment. Hence, no coupling represents a poor design.

Loose coupling: The DM system uses some facilities of a DB/DW system. It fetches data from a data repository managed by these systems, performs data mining, and then stores the mining results either in a file or in a designated place in the DB/DW.
Advantages: It is better than no coupling, since it can fetch any portion of the data stored in the DB/DW using query processing, indexing, and other system facilities. It benefits from the flexibility and efficiency provided by DB/DW systems.
Disadvantages: It is difficult to achieve high scalability and good performance with large data sets, since loosely coupled DM systems are memory-based and do not exploit the data structures and query optimization methods provided by DB/DW systems.

Semi-tight coupling: Besides the DB/DW facilities used in loose coupling, it also uses efficient implementations of a few essential DM primitives in the DB/DW system, such as sorting, indexing, aggregation, histogram analysis, multiway join, and precomputation of some essential statistical measures such as sum, count, etc. In addition, some frequently used intermediate mining results can be precomputed and stored in the DB/DW system.

Tight coupling: Here, the DM system is smoothly integrated into the DB/DW system, and the DM subsystem is treated as one functional component of an information system. Data mining queries and functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing methods of the DB/DW system.
Advantages: This is a highly desirable architecture. It facilitates efficient implementation of data mining functions, provides high system performance, and provides an integrated information processing environment.

Introduction to Data Generalization
Data generalization is a process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels. Methods for the efficient and flexible generalization of large data sets can be categorized according to two approaches: (1) the data cube approach and (2) the attribute-oriented induction approach.

Attribute-oriented induction
The general idea of attribute-oriented induction (AOI) is to first collect the task-relevant data using a relational database query and then perform generalization based on the examination of the number of distinct values of each attribute in the relevant set of data. The generalization is performed by either attribute removal or attribute generalization (also known as concept hierarchy ascension). Aggregation is performed by merging identical generalized tuples and accumulating their respective counts. This reduces the size of the generalized data set. The resulting generalized relation can be mapped into different forms for presentation to the user, such as charts or rules.

The essential operation of attribute-oriented induction is data generalization, which can be performed in one of two ways on the initial working relation: (1) attribute removal or (2) attribute generalization.
1. Attribute removal is based on the following rule: If there is a large set of distinct values for an attribute of the initial working relation, but either (1) there is no generalization operator on the attribute (e.g., there is no concept hierarchy defined for the attribute), or (2) its higher-level concepts are expressed in terms of other attributes, then the attribute should be removed from the working relation. In case 1, the attribute should be removed because it cannot be generalized. In case 2, the attribute should be removed because its higher-level concepts are already represented by other attributes; e.g., the higher-level concepts of street are expressed by city, province_or_state, and country.
2. Attribute generalization is based on the following rule: If there is a large set of distinct values for an attribute in the initial working relation, and there exists a set of generalization operators on the attribute, then a generalization operator should be selected and applied to the attribute.
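The two rules, together with the merging of identical generalized tuples and accumulation of counts described above, can be sketched in Python. This is a simplified illustration under stated assumptions: a single distinct-value threshold decides what counts as a "large" set of values, each concept hierarchy is a hypothetical child-to-parent dictionary, and the names `prepare_for_generalization` and `generalization` are borrowed from the algorithm outline in this section but given extra, made-up parameters. Generalized tuples are kept in a sorted list and inserted by binary search.

```python
import bisect
from typing import Dict, FrozenSet, List, Tuple

Row = Dict[str, str]
Hierarchy = Dict[str, str]  # hypothetical: maps a value to its parent concept


def climb_until(values, parent: Hierarchy, threshold: int) -> Dict[str, str]:
    """Rule 2: climb one level at a time until at most `threshold` distinct
    generalized values remain (or nothing can climb higher).
    Returns the mapping pairs (v, v')."""
    mapping = {v: v for v in values}
    while len(set(mapping.values())) > threshold:
        lifted = {v: parent.get(g, g) for v, g in mapping.items()}
        if lifted == mapping:  # top of the hierarchy reached
            break
        mapping = lifted
    return mapping


def prepare_for_generalization(W: List[Row], threshold: int,
                               hierarchies: Dict[str, Hierarchy],
                               expressed_elsewhere: FrozenSet[str] = frozenset()):
    """Per attribute: keep it, remove it (rule 1) or generalize it (rule 2)."""
    plan: Dict[str, Dict[str, str]] = {}
    for attr in W[0]:
        values = {row[attr] for row in W}
        if len(values) <= threshold:
            plan[attr] = {v: v for v in values}  # not a large set: keep as-is
        elif attr not in hierarchies or attr in expressed_elsewhere:
            continue  # rule 1: attribute removal
        else:
            plan[attr] = climb_until(values, hierarchies[attr], threshold)
    return plan


def generalization(W: List[Row], plan) -> Tuple[List[str], List[Tuple[tuple, int]]]:
    """Replace values by generalized values, merge identical tuples and keep
    counts in a sorted prime relation (binary-search insertion)."""
    attrs = list(plan)
    keys: List[tuple] = []
    counts: List[int] = []
    for row in W:
        t = tuple(plan[a][row[a]] for a in attrs)
        i = bisect.bisect_left(keys, t)
        if i < len(keys) and keys[i] == t:
            counts[i] += 1
        else:
            keys.insert(i, t)
            counts.insert(i, 1)
    return attrs, list(zip(keys, counts))


# Hypothetical sample data, for illustration only.
PARENT = {
    "Vancouver": "British Columbia", "Victoria": "British Columbia",
    "Toronto": "Ontario", "Seattle": "Washington",
    "British Columbia": "Canada", "Ontario": "Canada", "Washington": "USA",
}
SAMPLE_W = [
    {"name": "Ann", "gender": "F", "birth_place": "Vancouver"},
    {"name": "Bob", "gender": "M", "birth_place": "Victoria"},
    {"name": "Cid", "gender": "M", "birth_place": "Toronto"},
    {"name": "Dee", "gender": "F", "birth_place": "Seattle"},
]

if __name__ == "__main__":
    plan = prepare_for_generalization(SAMPLE_W, threshold=2, hierarchies={"birth_place": PARENT})
    print(generalization(SAMPLE_W, plan))
```

With a threshold of 2, `name` is removed (many values, no hierarchy), `gender` is kept, and `birth_place` climbs from cities through provinces/states to countries, giving the prime relation `(F, Canada): 1`, `(F, USA): 1`, `(M, Canada): 2`.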
This rule is based on the following reasoning. Using a generalization operator to generalize an attribute value within a tuple in the working relation makes the rule cover more of the original data tuples, thus generalizing the concept it represents. This corresponds to the generalization rule known as climbing generalization trees in learning from examples.

Controlling how high an attribute should be generalized is called attribute generalization control. If the attribute is generalized "too high," it may lead to overgeneralization, and the resulting rules may not be very informative. On the other hand, if the attribute is not generalized to a "sufficiently high level," then undergeneralization may result, where the rules obtained may not be informative either. Thus, a balance should be attained in attribute-oriented generalization.

There are many possible ways to control a generalization process. Two common approaches are described below.
- The first technique, called attribute generalization threshold control, either sets one generalization threshold for all of the attributes or sets one threshold for each attribute. If the number of distinct values in an attribute is greater than the attribute threshold, further attribute removal or attribute generalization should be performed.
- The second technique, called generalized relation threshold control, sets a threshold for the generalized relation. If the number of distinct tuples in the generalized relation is greater than the threshold, further generalization should be performed; otherwise, no further generalization should be performed. For example, if a user feels that the generalized relation is too small, she can increase the threshold, which implies drilling down. Conversely, to further generalize a relation, she can reduce the threshold, which implies rolling up.

Efficient implementation of attribute-oriented induction
Algorithm: Attribute-oriented induction.
Mining generalized characteristics in a relational database based on a user's data mining request.
Input: (i) a relational database DB; (ii) a data mining query, DMQuery; (iii) a_list, a list of attributes a_i; (iv) Gen(a_i), a set of concept hierarchies or generalization operators on attributes a_i; and (v) a_gen_thresh(a_i), attribute generalization thresholds for each attribute a_i.
Output: P, a prime generalized relation.
Method: The method is outlined as follows.
1. W ← get_task_relevant_data(DMQuery, DB);
2. prepare_for_generalization(W); This is performed by (1) scanning the initial working relation W once and collecting the distinct values for each attribute a_i; (2) computing the minimum desired level L_i for each attribute a_i based on its given or default attribute threshold; and (3) determining the mapping pairs (v, v') for each attribute a_i in W, where v is a distinct value of a_i in W and v' is its corresponding generalized value at level L_i.
3. P ← generalization(W); This is done by replacing each value v in W with its corresponding generalized value v' while accumulating counts and computing any other aggregate values. This step can be efficiently implemented in two variations:
(1) For each generalized tuple, insert the tuple into a sorted prime relation P by binary search: if the tuple is already in P, simply increase its count and other aggregate values accordingly; otherwise, insert it into P.
(2) Since in most cases the number of distinct values at the prime relation level is small, the prime relation can be coded as an m-dimensional array, where m is the number of attributes in P and each dimension contains the corresponding generalized attribute values. Each array element holds the corresponding count and other aggregate values, if any. The insertion of a generalized tuple is performed by measure aggregation in the corresponding array element.

Data cube implementation of attribute-oriented induction
The data cube implementation of attribute-oriented induction can be performed in two ways.
1. Construct a data cube on the fly for the given data mining query: This is desirable if either the task-relevant data set is too specific to match any predefined data cube or it is not very large. Since such a data cube is computed only after the query is submitted, the major motivation for constructing it is to facilitate efficient drill-down analysis.
2. Use a predefined data cube: An alternative method is to construct a data cube before a data mining query is posed to the system and use this predefined cube for subsequent data mining. This is desirable if the granularity of the task-relevant data can match that of the predefined data cube and the set of task-relevant data is quite large. Since such a data cube is precomputed, it facilitates attribute relevance analysis, attribute-oriented induction, dicing and slicing, roll-up, and drill-down. The cost one must pay is the cost of cube computation and the nontrivial storage overhead.

Methods of attribute relevance analysis
The general idea behind attribute relevance analysis is to compute some measure that is used to quantify the relevance of an attribute with respect to a given class. Such measures include information gain, the Gini index, uncertainty, and correlation coefficients.

Let S be a set of s training objects (tuples) where the class label of each tuple is known. Suppose that there are m classes. Let S contain s_i objects of class C_i, for i = 1, ..., m. An arbitrary object belongs to class C_i with probability s_i/s, where s is the total number of objects in set S. The expected information needed to classify a given tuple is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$
Suppose an attribute A with values {a_1, a_2, ..., a_v} is used to partition S into the subsets {S_1, S_2, ..., S_v}, where S_j contains those objects in S that have value a_j of A. Let S_j contain s_ij objects of class C_i. The expected information based on this partitioning by A is known as the entropy of A. It is the weighted average:
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$
The information gained by branching on A is defined by:
$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
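These three quantities can be computed directly from class counts. The Python sketch below is a minimal illustration: `expected_info`, `entropy`, `gain`, and `rank_attributes` are hypothetical helper names, rows are plain dictionaries, and the sample student rows are made up. `rank_attributes` also shows how the gain can be used to rank attributes and drop weakly relevant ones below a user-chosen threshold.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def expected_info(class_counts: Sequence[int]) -> float:
    """I(s_1, ..., s_m) = -sum_i (s_i / s) * log2(s_i / s); empty classes add nothing."""
    s = sum(class_counts)
    return -sum((c / s) * math.log2(c / s) for c in class_counts if c > 0)


def entropy(rows: List[Row], attr: str, label: str) -> float:
    """E(A): weighted average of I(s_1j, ..., s_mj) over the subsets S_j,
    where S_j holds the rows whose value of `attr` is a_j."""
    subsets: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        subsets[row[attr]][row[label]] += 1
    s = len(rows)
    return sum(sum(cnt.values()) / s * expected_info(list(cnt.values()))
               for cnt in subsets.values())


def gain(rows: List[Row], attr: str, label: str) -> float:
    """Gain(A) = I(s_1, ..., s_m) - E(A)."""
    overall = Counter(row[label] for row in rows)
    return expected_info(list(overall.values())) - entropy(rows, attr, label)


def rank_attributes(rows: List[Row], attrs: List[str], label: str,
                    min_gain: float = 0.0) -> List[Tuple[str, float]]:
    """Rank attributes by gain (highest first) and drop those below `min_gain`."""
    scored = sorted(((gain(rows, a, label), a) for a in attrs), reverse=True)
    return [(a, g) for g, a in scored if g >= min_gain]


# Hypothetical sample rows; "status" is the class label.
STUDENTS = [
    {"gpa": "excellent", "gender": "F", "status": "graduate"},
    {"gpa": "excellent", "gender": "M", "status": "graduate"},
    {"gpa": "good", "gender": "F", "status": "undergraduate"},
    {"gpa": "good", "gender": "M", "status": "undergraduate"},
]

if __name__ == "__main__":
    print(rank_attributes(STUDENTS, ["gender", "gpa"], "status", min_gain=0.1))
```

In this toy data, `gpa` separates the two classes perfectly (gain 1.0) while `gender` carries no information about the class (gain 0.0), so only `gpa` survives a threshold of 0.1.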
The attribute that maximizes Gain(A) is selected.

Attribute relevance analysis for class description is performed as follows.
1. Data collection: Collect data for both the target class and the contrasting class by query processing. For class comparison, both the target class and the contrasting class are provided by the user in the data mining query. For class characterization, the target class is the class to be characterized, whereas the contrasting class is the set of comparable data that are not in the target class.
2. Preliminary relevance analysis using conservative AOI: Attribute-oriented induction (AOI) can be used to perform some preliminary relevance analysis on the data by removing or generalizing attributes having a large number of distinct values (such as name and phone#). Such attributes are unlikely to be meaningful for concept description. To be conservative, the AOI should employ attribute generalization thresholds that are set reasonably large, so as to allow more attributes to be considered in the further relevance analysis by the selected measure in step 3. The relation obtained by such an attribute removal and attribute generalization process is called the candidate relation of the mining task.
3. Remove irrelevant or weakly relevant attributes using the selected measure: The selected relevance measure is used to evaluate (or rank) each attribute in the candidate relation. For example, the information gain measure described above may be used. The attributes are then sorted (i.e., ranked) according to their computed relevance measure values. Attributes that are irrelevant or only weakly relevant are then removed based on a set threshold. The resulting relation is called the initial target/contrasting class relation.

Mining Class Comparisons
Class comparison can be performed in the following steps:
- Data collection
- Dimension relevance analysis
- Synchronous generalization
- Presentation of the derived comparison
Working example:
use Big_University_DB
mine comparison as grad_vs_undergrad_students
in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
for graduate_students
where status in "graduate"
versus undergraduate_students
where status in "undergraduate"
analyze count%
from student

1. Data collection: collect the target and contrasting classes.
2. Attribute relevance analysis: remove the attributes name, gender, major, and phone#.
3. Synchronous generalization: controlled by user-specified dimension thresholds, producing the prime target and contrasting class(es) relations/cuboids.
4. Drill down, roll up, and other OLAP operations on the target and contrasting classes to adjust the levels of abstraction of the resulting description.
5. Presentation: as generalized relations, crosstabs, bar charts, pie charts, or rules, with contrasting measures (e.g., count%) to reflect the comparison between the target and contrasting classes.
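Steps 3 and 5 of this example can be illustrated on a single retained attribute such as birth_place. In this minimal Python sketch, synchronous generalization is modeled by applying the same one-level concept hierarchy (a hypothetical city-to-country dictionary) to both the target and the contrasting class, and count% is assumed to mean the percentage of a class's own tuples that take each generalized value. The function names and sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def generalize(rows: List[Row], attr: str, parent: Dict[str, str]) -> List[str]:
    """Climb one level of a concept hierarchy; values without a parent stay as-is."""
    return [parent.get(r[attr], r[attr]) for r in rows]


def count_percent(values: List[str]) -> Dict[str, float]:
    """Share (in %) of a class's tuples that take each generalized value."""
    n = len(values)
    return {v: 100.0 * c / n for v, c in Counter(values).items()}


def compare(rows: List[Row], attr: str, parent: Dict[str, str], class_attr: str,
            target: str, contrast: str) -> List[Tuple[str, float, float]]:
    """Generalize `attr` to the same level for both classes (synchronous
    generalization), then list count% for target and contrast side by side."""
    target_rows = [r for r in rows if r[class_attr] == target]
    contrast_rows = [r for r in rows if r[class_attr] == contrast]
    t = count_percent(generalize(target_rows, attr, parent))
    c = count_percent(generalize(contrast_rows, attr, parent))
    return [(v, t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))]


# Hypothetical hierarchy level and sample rows, for illustration only.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA", "Boston": "USA"}
STUDENTS = [
    {"birth_place": "Vancouver", "status": "graduate"},
    {"birth_place": "Toronto", "status": "graduate"},
    {"birth_place": "Vancouver", "status": "graduate"},
    {"birth_place": "Seattle", "status": "graduate"},
    {"birth_place": "Seattle", "status": "undergraduate"},
    {"birth_place": "Boston", "status": "undergraduate"},
    {"birth_place": "Boston", "status": "undergraduate"},
    {"birth_place": "Toronto", "status": "undergraduate"},
]

if __name__ == "__main__":
    for row in compare(STUDENTS, "birth_place", CITY_TO_COUNTRY, "status", "graduate", "undergraduate"):
        print(row)
```

Because both classes are generalized to the same level (country), their count% values are directly comparable: here 75% of the sample graduates versus 25% of the sample undergraduates were born in Canada.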
An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of
More informationInformation Integration
Chapter 11 Information Integration While there are many directions in which modern database systems are evolving, a large family of new applications fall undei the general heading of information integration.
More informationEcient Rule-Based Attribute-Oriented Induction. for Data Mining
Ecient Rule-Based Attribute-Oriented Induction for Data Mining David W. Cheung y H.Y. Hwang z Ada W. Fu z Jiawei Han x y Department of Computer Science and Information Systems, The University of Hong Kong,
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationComputing Data Cubes Using Massively Parallel Processors
Computing Data Cubes Using Massively Parallel Processors Hongjun Lu Xiaohui Huang Zhixian Li {luhj,huangxia,lizhixia}@iscs.nus.edu.sg Department of Information Systems and Computer Science National University
More informationA Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective
A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More informationData Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationDeccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus
Overview: Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple
More information5. MULTIPLE LEVELS AND CROSS LEVELS ASSOCIATION RULES UNDER CONSTRAINTS
5. MULTIPLE LEVELS AND CROSS LEVELS ASSOCIATION RULES UNDER CONSTRAINTS Association rules generated from mining data at multiple levels of abstraction are called multiple level or multi level association
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationSomething to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:
Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base
More informationData warehouse and Data Mining
Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationDiscovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *
Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical
More informationThis tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.
About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This
More informationData Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application
Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More information2. Discovery of Association Rules
2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationAssociation Rules. Berlin Chen References:
Association Rules Berlin Chen 2005 References: 1. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 8 2. Data Mining: Concepts and Techniques, Chapter 6 Association Rules: Basic Concepts A
More informationA Multi-Dimensional Data Model
A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in
More informationData Warehousing and OLAP
Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient
More informationDta Mining and Data Warehousing
CSCI6405 Fall 2003 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: q.gao@dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:
More informationAdvanced Data Management Technologies
ADMT 2017/18 Unit 10 J. Gamper 1/37 Advanced Data Management Technologies Unit 10 SQL GROUP BY Extensions J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements: I
More informationData Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA
Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI
More informationRocky Mountain Technology Ventures
Rocky Mountain Technology Ventures Comparing and Contrasting Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP) Architectures 3/19/2006 Introduction One of the most important
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationKnowledge Discovery in Databases. Databases. date name surname street city account no. payment balance
Databases date name surname street city account no. payment balance 980103 Jan Novak Dlouha 5 Praha 1 9945371 100.00 100.00 980105 Jan Novak Dlouha 5 Praha 1 9945371 1500.00 1600.00 980106 Jan Novak Dlouha
More informationTribhuvan University Institute of Science and Technology MODEL QUESTION
MODEL QUESTION 1. Suppose that a data warehouse for Big University consists of four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual
More informationSQL Server 2005 Analysis Services
atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of SQL Server
More informationData Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.
Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan
More informationGrid Computing Systems: A Survey and Taxonomy
Grid Computing Systems: A Survey and Taxonomy Material for this lecture from: A Survey and Taxonomy of Resource Management Systems for Grid Computing Systems, K. Krauter, R. Buyya, M. Maheswaran, CS Technical
More informationANU MLSS 2010: Data Mining. Part 2: Association rule mining
ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements
More informationBUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree
BUSINESS INTELLIGENCE SSAS - SQL Server Analysis Services Business Informatics Degree 2 BI Architecture SSAS: SQL Server Analysis Services 3 It is both an OLAP Server and a Data Mining Server Distinct
More informationIT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS
PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such
More information1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.
Instructions to the Examiners: 1. May the Examiners not look for exact words from the text book in the Answers. 2. May any valid example be accepted - example may or may not be from the text book 1. Attempt
More informationFig 1.2: Relationship between DW, ODS and OLTP Systems
1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions
More informationData Analysis and Data Science
Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical
More informationUNIT 2 Data Preprocessing
UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and
More informationVALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III & VI Section : CSE - 2 Subject Code : IT6702 Subject Name : Data warehousing
More informationREPORTING AND QUERY TOOLS AND APPLICATIONS
Tool Categories: REPORTING AND QUERY TOOLS AND APPLICATIONS There are five categories of decision support tools Reporting Managed query Executive information system OLAP Data Mining Reporting Tools Production
More informationUNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?
(Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Note: Question no. 1 is
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP Data Warehousing
More informationDecision Support Systems aka Analytical Systems
Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis
More informationC=(FS) 2 : Cubing by Composition of Faceted Search
C=(FS) : Cubing by Composition of Faceted Search Ronny Lempel Dafna Sheinwald IBM Haifa Research Lab Introduction to Multifaceted Search and to On-Line Analytical Processing (OLAP) Intro Multifaceted Search
More informationData Warehousing and Data Mining
Data Warehousing and Data Mining Lecture 3 Efficient Cube Computation CITS3401 CITS5504 Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement:
More informationData Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems
Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,
More informationData Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2012 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
More informationPreprocessing Short Lecture Notes cse352. Professor Anita Wasilewska
Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationDSS based on Data Warehouse
DSS based on Data Warehouse C_13 / 19.01.2017 Decision support system is a complex system engineering. At the same time, research DW composition, DW structure and DSS Architecture based on DW, puts forward
More information