INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad


INFORMATION TECHNOLOGY

DEFINITIONS AND TERMINOLOGY

Course Name : DATA WAREHOUSING AND DATA MINING
Course Code : AIT006
Program : B. Tech
Semester : VI
Branch : IT
Section : A, B
Academic Year :
Course Faculty : Ms. K. LaxmiNarayanamma, Assistant Professor, Dept. of IT

OBJECTIVES
I. To help students to consider in depth the terminology and nomenclature used in the syllabus.
II. To focus on the meaning of new words/terminology/nomenclature.

DEFINITIONS AND TERMINOLOGY QUESTION BANK

UNIT - I

1. Define database.
A database is a collection of information that is organized so that it can be easily accessed, managed, and updated. Data is organized into rows, columns, and tables, and it is indexed to make it easier to find relevant information.

2. What is a data warehouse?
Data warehousing is a technique for collecting and managing data from varied sources to provide meaningful business insights. It is a blend of technologies and components that allows the strategic use of data.

3. What is a data store?
A data store is a repository for storing, managing, and distributing data sets on an enterprise level. It is a broad term that incorporates all types of data that are produced, stored, and used by an organization.

4. What is data integration?
Data integration is a process in which heterogeneous data is retrieved and combined into a unified form and structure. Data integration allows different data types (such as data sets, documents, and tables) to be merged by users, organizations, and applications for use in personal or business processes and functions.

4. What is a data mart?
A data mart is a repository of data that is designed to serve a particular community of knowledge workers. Because data marts catalog specific data, they often require less space than enterprise data warehouses, making them easier to search and cheaper to run.

5. What is an enterprise data warehouse?
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources.

6. Define metadata.
Metadata is data that describes other data. Meta is a prefix that in most information technology usages means "an underlying definition or description." Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier.

7. What is operational data?
An operational data store (ODS) is a type of database that is often used as an interim logical area for a data warehouse. An ODS can be used for integrating disparate data from multiple sources so that business operations, analysis, and reporting can be carried out while business operations are occurring.

8. Define OLTP (online transaction processing).
OLTP (online transaction processing) is a class of software programs capable of supporting transaction-oriented applications on the Internet. Typically, OLTP systems are used for order entry, financial transactions, customer relationship management (CRM), and retail sales.

9. Define data cube.
An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications. An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions.

10. Define OLAP (online analytical processing).
OLAP (online analytical processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view. OLAP is the technology behind many business intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive "what if" scenario (budget, forecast) planning.

12. List OLAP operations.
1. Roll-up (drill-up)
2. Drill-down
3. Slice and dice
4. Pivot

13. Define the roll-up operation on a data cube.
The roll-up operation performs aggregation on a data cube either by climbing up a concept hierarchy or by dimension reduction.

14. Define the drill-down operation on a data cube.
Drill-down is the reverse of roll-up: it navigates from a higher-level summary to lower-level, more detailed data. Drill-down can be performed either by:
1. Stepping down a concept hierarchy for a dimension
2. Introducing a new dimension

15. Define the slice operation on a data cube.
The slice operation performs a selection on one dimension of the given cube, resulting in a sub-cube. It reduces the dimensionality of the cube.

16. Define the dice operation on a data cube.
The dice operation defines a sub-cube by performing a selection on two or more dimensions.
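The roll-up, slice, dice, and pivot operations defined above can be illustrated on a very small cube. The following is a minimal sketch in plain Python, not a real OLAP engine; the sales figures, the dimension names (`city`, `quarter`, `item`), and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical cube: each cell maps (city, quarter, item) -> units sold.
DIMS = ("city", "quarter", "item")
CUBE = {
    ("Hyderabad", "Q1", "phone"): 10,
    ("Hyderabad", "Q1", "laptop"): 4,
    ("Hyderabad", "Q2", "phone"): 7,
    ("Chennai", "Q1", "phone"): 5,
    ("Chennai", "Q2", "laptop"): 3,
}

def roll_up(cube, drop_dim):
    """Aggregate away one dimension (roll-up by dimension reduction)."""
    i = DIMS.index(drop_dim)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out)

def slice_cube(cube, dim, value):
    """Select a single value on one dimension, giving a sub-cube."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, criteria):
    """Select on two or more dimensions; criteria maps dim -> allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in criteria.items())
    }

def pivot(table):
    """Rotate a two-dimensional view by swapping its two axes."""
    return {(b, a): v for (a, b), v in table.items()}

by_city_quarter = roll_up(CUBE, "item")           # totals per (city, quarter)
q1_only = slice_cube(CUBE, "quarter", "Q1")       # cells for Q1 only
hyd_phones = dice(CUBE, {"city": {"Hyderabad"}, "item": {"phone"}})
by_quarter_city = pivot(by_city_quarter)          # same totals, axes swapped
```

In this sketch, drill-down corresponds to going back from `by_city_quarter` to the more detailed `CUBE`, which reintroduces the `item` dimension.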

17. Define the pivot operation on a data cube.
Pivot is also known as rotate. It rotates the data axes to view the data from different perspectives.

18. Distinguish OLTP and OLAP with respect to users and system orientation.
Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

19. Distinguish OLTP and OLAP with respect to data contents.
Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

20. List the distinguishing features of OLTP and OLAP.
1. Users and system orientation
2. Data contents
3. Database design
4. View
5. Access patterns

21. What is a multidimensional data model?
The multidimensional data model is an integral part of On-Line Analytical Processing (OLAP). Because OLAP is analytic, its queries are complex. The multidimensional data model is designed to solve complex queries in real time.

22. Define star schema.
In data warehousing and business intelligence (BI), a star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension contains reference information about the fact, such as date, product, or customer. A star schema is diagrammed by surrounding each fact with its associated dimensions. The resulting diagram resembles a star.

23. Define snowflake schema.
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables that are connected to multiple dimensions, and the dimension tables are normalized into multiple related tables.

UNIT - II

1. What is data mining?
Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.

2. What is the need for data mining?
In the present world, a huge amount of data is available in the information industry. This data is of no use until it is converted into useful knowledge.

3. List the steps included in data mining.
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation

4. What is data cleaning?
Data cleaning (data cleansing) is the process of altering data in a given storage resource to make sure that it is accurate and correct. There are many ways to pursue data cleansing in various software and data storage architectures; most of them center on the careful review of data sets and the protocols associated with any particular data storage technology.

5. What is data integration?
Data integration is the process of combining data from multiple data sources.

6. Define data selection.
Data selection is defined as the process of determining the appropriate data type and source, as well as suitable instruments to collect data. Data selection precedes the actual practice of data collection. In the data mining steps listed above, data selection is the step in which data relevant to the analysis task is retrieved from the database.

7. What is data transformation?
Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system. The usual process involves converting documents, but data conversions sometimes involve the conversion of a program from one computer language to another to enable the program to run on a different platform. The usual reason for this data migration is the adoption of a new system that is totally different from the previous one.
8. List the major components of a data mining system architecture.
1. Database or data warehouse server
2. Data mining engine
3. Pattern evaluation
4. Knowledge base
5. Graphical user interface

9. What is a data mining engine?
Data mining is a very important process in which potentially useful and previously unknown information is extracted from large volumes of data. A number of components are involved in the data mining process, and these components constitute the architecture of a data mining system. The data mining engine is the core component of this architecture: it contains the functional modules that perform tasks such as association, classification, and cluster analysis.

10. Define knowledge base with respect to data mining.
The knowledge base consists of data that is very important in the process of data mining. It provides input to the data mining engine and guides the engine in the process of pattern search.

11. List the applications of data mining.
1. Market analysis
2. Fraud detection
3. Customer retention
4. Production control
5. Science exploration

12. List the data mining functionalities.
1. Concept/class description
2. Association (correlation and causality)
3. Classification and prediction
4. Cluster analysis
5. Outlier analysis
6. Trend and evolution analysis

13. What are the predictive tasks of data mining?
1. Classification
2. Prediction
3. Time series analysis

14. What is the clustering principle?
Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.

15. List the different measures to evaluate patterns/rules.
1. Objective measures, based on statistics and structures of patterns, e.g., support, confidence, etc.
2. Subjective measures, based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.

16. List the data mining system classifications.
1. Based on different views of the data mining system
2. Kinds of databases to be mined
3. Kinds of knowledge to be discovered
4. Kinds of techniques utilized
5. Kinds of applications adapted

17. List the data mining task primitives.
1. Set of task-relevant data to be mined
2. Kind of knowledge to be mined
3. Background knowledge to be used in the discovery process
4. Interestingness measures and thresholds for pattern evaluation
5. Representation for visualizing the discovered patterns

18. Define data reduction.
Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. The basic concept is the reduction of multitudinous amounts of data down to the meaningful parts.

19. State the need for data reduction in data mining.
A database or data warehouse may store terabytes of data, so it may take a very long time to perform data analysis and mining on such huge amounts of data.

20. List the data reduction strategies.
1. Data cube aggregation
2. Dimensionality reduction
3. Data compression
4. Numerosity reduction
5. Discretization and concept hierarchy generation

UNIT - III

1. What is a market basket?
A market basket is a collection of items purchased by a customer in a single transaction, which is a well-defined business activity.

2. What is association rule mining?
Association rule mining finds frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

3. What is the need for association rule mining?
In a given transaction with multiple items, it tries to find the rules that govern how or why items are often bought together.

4. What are the measures of rule interestingness?
Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules.
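Rule support and confidence can be computed directly from a list of market baskets. The sketch below is illustrative only: the transactions and the function names `support` and `confidence` are hypothetical, and support is expressed as a fraction of all transactions.

```python
# Hypothetical market baskets: one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule under test: bread => milk
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here the rule bread => milk holds in two of the four baskets, and in two of the three baskets that contain bread, so its support is 0.5 and its confidence is 2/3.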

5. List the applications of association rule mining.
1. Market basket data analysis
2. Catalog design
3. Cross marketing

6. State the Apriori property.
The Apriori property states that all nonempty subsets of a frequent itemset must also be frequent. The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.

7. Define monotonic functions.
A monotonic function is a function that is either entirely nonincreasing or nondecreasing. A function is monotonic if its first derivative (which need not be continuous) does not change sign.

8. What is a multilevel association rule?
Multilevel association rules can be defined as association rules applied over different levels of data abstraction.

9. What is a multidimensional association rule?
A multidimensional association rule can be defined as a rule that contains two or more predicates/dimensions.

10. Define categorical attributes.
In statistics, a categorical variable is a variable that can take on one of a limited and usually fixed number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.

12. Define quantitative attributes.
In association rule mining, quantitative attributes are numeric attributes, such as age or income, whose values have an implicit ordering. A quantitative attribute (QA) can be used to compare a value against an upper limit and a lower limit. For example, the result of a test is inferred by comparing the user-defined value against an upper and a lower limit.

13. State constraint-based association mining.
Constraint-based association rule mining aims to develop a systematic method by which the user can find important associations among items in a database of transactions. For example, many retailers, such as supermarkets, carry a large number of items, so constraints help to focus the search on the associations of interest.

14. List the constraints used in mining.
1. Knowledge type constraints
2. Data constraints
3. Dimension/level constraints
4. Rule constraints
5. Interestingness constraints
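The bottom-up candidate generation of Apriori, together with the Apriori property (every nonempty subset of a frequent itemset is itself frequent), can be sketched as follows. This is a simplified illustration rather than an optimized implementation; the function name `apriori`, the `min_support_count` parameter, and the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Test candidates against the data and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)            # frequent 1-itemsets
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical baskets; an itemset is frequent if it occurs in >= 2 of them.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
result = apriori(BASKETS, min_support_count=2)
```

With these baskets, every single item and every pair is frequent, while the three-item set appears in only one basket and is therefore dropped.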

15. What is a closed frequent itemset?
It is an itemset that is both closed and frequent, that is, its support is greater than or equal to the minimum support. An itemset is closed in a data set if there exists no proper superset that has the same support count as the original itemset.

16. Why is frequent pattern growth fast?
1. No candidate generation
2. No candidate test
3. Use of a compact data structure
4. Elimination of repeated database scans
5. The basic operations are counting and FP-tree building

17. What is support?
Support is an indication of how frequently the itemset appears in the dataset.

18. What is confidence?
Confidence is an indication of how often the rule has been found to be true.

19. What is support and minimum support?
The minimum support and minimum confidence are set by the users and are parameters of the Apriori algorithm for association rule generation. These parameters are used to exclude rules in the result that have a support or a confidence lower than the minimum support and minimum confidence, respectively.

20. What is pruning in data mining?
Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by reducing overfitting.

21. What is a frequent itemset?
A frequent itemset is an itemset whose support is greater than or equal to the minimum support threshold.

UNIT - IV

1. Define classification in data mining.
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

2. What is prediction?
Prediction in data mining identifies data points purely from the description of another related data value. It is not necessarily related to future events, but the variables used are unknown. Prediction is used to estimate unknown or missing values. Prediction of numeric values in data mining is known as numeric prediction.

3. List the steps in classification.
1. Model construction: describing a set of predetermined classes
2. Model usage: classifying future or unknown objects

4. List the common machine learning algorithms.
1. Linear regression
2. Logistic regression
3. Decision tree
4. SVM
5. Naive Bayes
6. KNN
7. K-means
8. Random forest

5. Define a decision tree.
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

6. What is tree pruning in data mining?
Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by reducing overfitting.

7. State the use of a decision tree.
A decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data (but the resulting classification tree can be an input for decision making).

8. What are attribute selection measures?
1. Information gain
2. Gain ratio
3. Gini index

9. What is probabilistic learning?
Probabilistic learning calculates explicit probabilities for hypotheses and is among the most practical approaches to certain types of learning problems.

10. Define probabilistic prediction.
Probabilistic prediction predicts multiple hypotheses, weighted by their probabilities.
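The three attribute selection measures named in question 8 (information gain, gain ratio, and Gini index) can be written out for a small labelled data set. The following sketch uses the usual textbook formulas; the training rows, attribute positions, and function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def partition(rows, attr_index):
    """Group class labels (last column) by the value of one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    return groups

def information_gain(rows, attr_index):
    """Entropy of the labels minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    groups = partition(rows, attr_index)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, attr_index):
    """Information gain normalized by the split information."""
    split_info = entropy([row[attr_index] for row in rows])
    return information_gain(rows, attr_index) / split_info if split_info else 0.0

# Hypothetical training rows: (income, student, class label).
ROWS = [
    ("high", "no", "risky"),
    ("high", "yes", "safe"),
    ("low", "no", "risky"),
    ("low", "yes", "safe"),
]
```

In this toy data the `student` attribute (index 1) separates the classes perfectly and so has the highest information gain, while `income` (index 0) tells nothing about the class and has a gain of zero.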

11. Define Bayesian classification.
Bayesian classification is based on Bayes' theorem. Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.

12. Define lazy learning.
Lazy learning is a learning method in which generalization of the training data is delayed until a query is made to the system.

13. Define eager learning.
In eager learning, the system tries to generalize the training data before receiving queries.

14. State the disadvantage of lazy learning.
The disadvantages of lazy learning include the large space requirement to store the entire training dataset. Noisy training data in particular increases the case base unnecessarily, because no abstraction is made during the training phase.

15. State the reason why nearest neighbor is a lazy algorithm.
K-NN is a lazy learner because it does not learn a discriminative function from the training data but memorizes the training dataset instead.

16. Define regression analysis in data mining.
Regression is a data mining technique used to predict a range of numeric values (also called continuous values), given a particular dataset. Regression is used across multiple industries for business and marketing planning, financial forecasting, environmental modeling, and analysis of trends.

17. Give the methods for comparing classification and prediction.
The criteria for comparing classification and prediction methods are:
1. Accuracy
2. Speed
3. Robustness
4. Scalability
5. Interpretability

18. Define data science.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

19. Define accuracy of a classifier.
The accuracy of a classifier refers to its ability to predict the class label correctly. The accuracy of a predictor refers to how well a given predictor can guess the value of the predicted attribute for new data.
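The lazy behaviour of nearest-neighbor classification and the notion of classifier accuracy can be seen in a few lines of code. This is a minimal sketch with hypothetical data and function names; there is no training step at all, and all the work happens when a query arrives.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # No model is built in advance: the stored training set is scanned per query.
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(training, test, k=3):
    """Fraction of test tuples whose class label is predicted correctly."""
    correct = sum(1 for x, y in test if knn_predict(training, x, k) == y)
    return correct / len(test)

# Hypothetical two-dimensional points with class labels.
TRAIN = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]
TEST = [((1.1, 1.0), "low"), ((5.1, 5.0), "high")]

test_accuracy = accuracy(TRAIN, TEST)
```

Because the whole of `TRAIN` must be kept and scanned for every query, the sketch also shows the storage cost mentioned as a disadvantage of lazy learning.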

20. What is the Naïve Bayes algorithm?
Naive Bayes is a machine learning algorithm for classification problems. It is based on Bayes' probability theorem. It is primarily used for text classification, which involves high-dimensional training data sets.

UNIT - V

1. What is a cluster in data mining?
Clustering is the process of grouping a set of abstract objects into classes of similar objects. A cluster of data objects can be treated as one group. In cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.

2. Define cluster analysis.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

3. What is supervised learning?
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

4. What is machine learning?
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

5. Give examples of clustering.
1. Biology
2. Information retrieval
3. Land use
4. Marketing
5. City planning
6. Climate

6. Give the considerations for cluster analysis.
1. Partitioning criteria
2. Separation of clusters
3. Similarity measure
4. Clustering space

7. Give the major clustering approaches.
1. Partitioning approach
2. Hierarchical approach
3. Density-based approach
4. Grid-based approach

8. What is good clustering?
A good clustering method will produce high-quality clusters with high intra-class similarity and low inter-class similarity.

9. State the weaknesses of the K-means algorithm.
1. Applicable only when the mean is defined
2. Need to specify k, the number of clusters, in advance
3. Unable to handle noisy data and outliers
4. Not suitable for discovering clusters with non-convex shapes

10. Define time series database.
A time series database (TSDB) is a software system that is optimized for handling time series data, that is, arrays of numbers indexed by time (a datetime or a datetime range).

11. What are the types of data variables used in clustering analysis?
1. Interval-scaled variables
2. Binary variables
3. Nominal, ordinal, and ratio variables
4. Variables of mixed types

12. What is the categorization of major clustering methods?
1. Partitioning algorithms
2. Hierarchy algorithms
3. Density-based
4. Grid-based
5. Model-based

13. List the steps in the K-means clustering algorithm.
1. Initialize the centers of the clusters.
2. Assign each data point to the closest cluster.
3. Set the position of each cluster to the mean of all data points belonging to that cluster.
Steps 2 and 3 are repeated until the cluster assignments no longer change.

14. List the classification of clustering methods.
1. Partitioning method
2. Hierarchical method
3. Density-based method
4. Grid-based method
5. Model-based method
6. Constraint-based method

15. State the key difference between classification and clustering.
Classification takes data and puts it into predefined categories, whereas in clustering the set of categories into which the data is to be grouped is not known beforehand.

16. Define the EM (Expectation-Maximization) algorithm.
A commonly used algorithm for model-based clustering is the Expectation-Maximization (EM) algorithm. EM clustering is an iterative algorithm.

17. Define outlier.
In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set.
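The K-means steps listed in this unit (initialize the centers, assign each point to its closest cluster, move each center to the mean of its points) can be sketched as below. This is a minimal illustration under stated assumptions: the initial centers are supplied by the caller, the loop repeats the assign and update steps until the centers stop moving, and the function name `k_means` and the sample points are hypothetical.

```python
from math import dist

def k_means(points, centers, max_iter=100):
    """Return (centers, clusters) after iterating assignment and update steps."""
    centers = [tuple(c) for c in centers]       # step 1: initial centers (given)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 2: assign each data point to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: move each center to the mean of the points assigned to it.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:              # converged: nothing moved
            break
        centers = new_centers
    return centers, clusters

# Hypothetical two-dimensional data with two visible groups.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, final_clusters = k_means(POINTS, centers=[(1, 1), (8, 8)])
```

The need to pass in `centers` (and therefore k) up front reflects the weakness noted in this unit that the number of clusters must be specified in advance.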


More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information
