Table of Contents: Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques

Table of Contents

Foreword
Foreword to Second Edition
Preface
Acknowledgments
About the Authors

Chapter 1 Introduction
  1.1 Why Data Mining?
    1.1.1 Moving toward the Information Age
    1.1.2 Data Mining as the Evolution of Information Technology
  1.2 What Is Data Mining?
  1.3 What Kinds of Data Can Be Mined?
    1.3.1 Database Data
    1.3.2 Data Warehouses
    1.3.3 Transactional Data
    1.3.4 Other Kinds of Data
  1.4 What Kinds of Patterns Can Be Mined?
    1.4.1 Class/Concept Description: Characterization and Discrimination
    1.4.2 Mining Frequent Patterns, Associations, and Correlations
    1.4.3 Classification and Regression for Predictive Analysis
    1.4.4 Cluster Analysis
    1.4.5 Outlier Analysis
    1.4.6 Are All Patterns Interesting?
  1.5 Which Technologies Are Used?
    1.5.1 Statistics
    1.5.2 Machine Learning
    1.5.3 Database Systems and Data Warehouses
    1.5.4 Information Retrieval
  1.6 Which Kinds of Applications Are Targeted?
    1.6.1 Business Intelligence
    1.6.2 Web Search Engines
  1.7 Major Issues in Data Mining
    1.7.1 Mining Methodology
    1.7.2 User Interaction
    1.7.3 Efficiency and Scalability
    1.7.4 Diversity of Database Types
    1.7.5 Data Mining and Society
  1.8 Summary
  1.9 Exercises
  1.10 Bibliographic Notes

Chapter 2 Getting to Know Your Data
  2.1 Data Objects and Attribute Types
    2.1.1 What Is an Attribute?
    2.1.2 Nominal Attributes
    2.1.3 Binary Attributes
    2.1.4 Ordinal Attributes
    2.1.5 Numeric Attributes
    2.1.6 Discrete versus Continuous Attributes
  2.2 Basic Statistical Descriptions of Data
    2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
    2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
    2.2.3 Graphic Displays of Basic Statistical Descriptions of Data
  2.3 Data Visualization
    2.3.1 Pixel-Oriented Visualization Techniques
    2.3.2 Geometric Projection Visualization Techniques
    2.3.3 Icon-Based Visualization Techniques
    2.3.4 Hierarchical Visualization Techniques
    2.3.5 Visualizing Complex Data and Relations
  2.4 Measuring Data Similarity and Dissimilarity
    2.4.1 Data Matrix versus Dissimilarity Matrix
    2.4.2 Proximity Measures for Nominal Attributes
    2.4.3 Proximity Measures for Binary Attributes
    2.4.4 Dissimilarity of Numeric Data: Minkowski Distance
    2.4.5 Proximity Measures for Ordinal Attributes
    2.4.6 Dissimilarity for Attributes of Mixed Types
    2.4.7 Cosine Similarity
  2.5 Summary
  2.6 Exercises
  2.7 Bibliographic Notes

Chapter 3 Data Preprocessing
  3.1 Data Preprocessing: An Overview
    3.1.1 Data Quality: Why Preprocess the Data?
    3.1.2 Major Tasks in Data Preprocessing
  3.2 Data Cleaning
    3.2.1 Missing Values
    3.2.2 Noisy Data
    3.2.3 Data Cleaning as a Process
  3.3 Data Integration
    3.3.1 Entity Identification Problem
    3.3.2 Redundancy and Correlation Analysis
    3.3.3 Tuple Duplication
    3.3.4 Data Value Conflict Detection and Resolution
  3.4 Data Reduction
    3.4.1 Overview of Data Reduction Strategies
    3.4.2 Wavelet Transforms
    3.4.3 Principal Components Analysis
    3.4.4 Attribute Subset Selection
    3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
    3.4.6 Histograms
    3.4.7 Clustering
    3.4.8 Sampling
    3.4.9 Data Cube Aggregation
  3.5 Data Transformation and Data Discretization
    3.5.1 Data Transformation Strategies Overview
    3.5.2 Data Transformation by Normalization
    3.5.3 Discretization by Binning
    3.5.4 Discretization by Histogram Analysis
    3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses
    3.5.6 Concept Hierarchy Generation for Nominal Data
  3.6 Summary
  3.7 Exercises
  3.8 Bibliographic Notes

Chapter 4 Data Warehousing and Online Analytical Processing
  4.1 Data Warehouse: Basic Concepts
    4.1.1 What Is a Data Warehouse?
    4.1.2 Differences between Operational Database Systems and Data Warehouses
    4.1.3 But, Why Have a Separate Data Warehouse?
    4.1.4 Data Warehousing: A Multitiered Architecture
    4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
    4.1.6 Extraction, Transformation, and Loading
    4.1.7 Metadata Repository
  4.2 Data Warehouse Modeling: Data Cube and OLAP
    4.2.1 Data Cube: A Multidimensional Data Model
    4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
    4.2.3 Dimensions: The Role of Concept Hierarchies
    4.2.4 Measures: Their Categorization and Computation
    4.2.5 Typical OLAP Operations
    4.2.6 A Starnet Query Model for Querying Multidimensional Databases
  4.3 Data Warehouse Design and Usage
    4.3.1 A Business Analysis Framework for Data Warehouse Design
    4.3.2 Data Warehouse Design Process
    4.3.3 Data Warehouse Usage for Information Processing
    4.3.4 From Online Analytical Processing to Multidimensional Data Mining
  4.4 Data Warehouse Implementation
    4.4.1 Efficient Data Cube Computation: An Overview
    4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
    4.4.3 Efficient Processing of OLAP Queries
    4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP
  4.5 Data Generalization by Attribute-Oriented Induction
    4.5.1 Attribute-Oriented Induction for Data Characterization
    4.5.2 Efficient Implementation of Attribute-Oriented Induction
    4.5.3 Attribute-Oriented Induction for Class Comparisons
  4.6 Summary
  4.7 Exercises
  4.8 Bibliographic Notes

Chapter 5 Data Cube Technology
  5.1 Data Cube Computation: Preliminary Concepts
    5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
    5.1.2 General Strategies for Data Cube Computation
  5.2 Data Cube Computation Methods
    5.2.1 Multiway Array Aggregation for Full Cube Computation
    5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
    5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure
    5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP
  5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
    5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
    5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
  5.4 Multidimensional Data Analysis in Cube Space
    5.4.1 Prediction Cubes: Prediction Mining in Cube Space
    5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
    5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
  5.5 Summary
  5.6 Exercises
  5.7 Bibliographic Notes

Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
  6.1 Basic Concepts
    6.1.1 Market Basket Analysis: A Motivating Example
    6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
  6.2 Frequent Itemset Mining Methods
    6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
    6.2.2 Generating Association Rules from Frequent Itemsets
    6.2.3 Improving the Efficiency of Apriori
    6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets
    6.2.5 Mining Frequent Itemsets Using Vertical Data Format
    6.2.6 Mining Closed and Max Patterns
  6.3 Which Patterns Are Interesting? Pattern Evaluation Methods
    6.3.1 Strong Rules Are Not Necessarily Interesting
    6.3.2 From Association Analysis to Correlation Analysis
    6.3.3 A Comparison of Pattern Evaluation Measures
  6.4 Summary
  6.5 Exercises
  6.6 Bibliographic Notes

Chapter 7 Advanced Pattern Mining
  7.1 Pattern Mining: A Road Map
  7.2 Pattern Mining in Multilevel, Multidimensional Space
    7.2.1 Mining Multilevel Associations
    7.2.2 Mining Multidimensional Associations
    7.2.3 Mining Quantitative Association Rules
    7.2.4 Mining Rare Patterns and Negative Patterns
  7.3 Constraint-Based Frequent Pattern Mining
    7.3.1 Metarule-Guided Mining of Association Rules
    7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space
  7.4 Mining High-Dimensional Data and Colossal Patterns
    7.4.1 Mining Colossal Patterns by Pattern-Fusion
  7.5 Mining Compressed or Approximate Patterns
    7.5.1 Mining Compressed Patterns by Pattern Clustering
    7.5.2 Extracting Redundancy-Aware Top-k Patterns
  7.6 Pattern Exploration and Application
    7.6.1 Semantic Annotation of Frequent Patterns
    7.6.2 Applications of Pattern Mining
  7.7 Summary
  7.8 Exercises
  7.9 Bibliographic Notes

Chapter 8 Classification: Basic Concepts
  8.1 Basic Concepts
    8.1.1 What Is Classification?
    8.1.2 General Approach to Classification
  8.2 Decision Tree Induction
    8.2.1 Decision Tree Induction
    8.2.2 Attribute Selection Measures
    8.2.3 Tree Pruning
    8.2.4 Scalability and Decision Tree Induction
    8.2.5 Visual Mining for Decision Tree Induction
  8.3 Bayes Classification Methods
    8.3.1 Bayes' Theorem
    8.3.2 Naive Bayesian Classification
  8.4 Rule-Based Classification
    8.4.1 Using IF-THEN Rules for Classification
    8.4.2 Rule Extraction from a Decision Tree
    8.4.3 Rule Induction Using a Sequential Covering Algorithm
  8.5 Model Evaluation and Selection
    8.5.1 Metrics for Evaluating Classifier Performance
    8.5.2 Holdout Method and Random Subsampling
    8.5.3 Cross-Validation
    8.5.4 Bootstrap
    8.5.5 Model Selection Using Statistical Tests of Significance
    8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
  8.6 Techniques to Improve Classification Accuracy
    8.6.1 Introducing Ensemble Methods
    8.6.2 Bagging
    8.6.3 Boosting and AdaBoost
    8.6.4 Random Forests
    8.6.5 Improving Classification Accuracy of Class-Imbalanced Data
  8.7 Summary
  8.8 Exercises
  8.9 Bibliographic Notes

Chapter 9 Classification: Advanced Methods
  9.1 Bayesian Belief Networks
    9.1.1 Concepts and Mechanisms
    9.1.2 Training Bayesian Belief Networks
  9.2 Classification by Backpropagation
    9.2.1 A Multilayer Feed-Forward Neural Network
    9.2.2 Defining a Network Topology
    9.2.3 Backpropagation
    9.2.4 Inside the Black Box: Backpropagation and Interpretability
  9.3 Support Vector Machines
    9.3.1 The Case When the Data Are Linearly Separable
    9.3.2 The Case When the Data Are Linearly Inseparable
  9.4 Classification Using Frequent Patterns
    9.4.1 Associative Classification
    9.4.2 Discriminative Frequent Pattern-Based Classification
  9.5 Lazy Learners (or Learning from Your Neighbors)
    9.5.1 k-Nearest-Neighbor Classifiers
    9.5.2 Case-Based Reasoning
  9.6 Other Classification Methods
    9.6.1 Genetic Algorithms
    9.6.2 Rough Set Approach
    9.6.3 Fuzzy Set Approaches
  9.7 Additional Topics Regarding Classification
    9.7.1 Multiclass Classification
    9.7.2 Semi-Supervised Classification
    9.7.3 Active Learning
    9.7.4 Transfer Learning
  9.8 Summary
  9.9 Exercises
  9.10 Bibliographic Notes

Chapter 10 Cluster Analysis: Basic Concepts and Methods
  10.1 Cluster Analysis
    10.1.1 What Is Cluster Analysis?
    10.1.2 Requirements for Cluster Analysis
    10.1.3 Overview of Basic Clustering Methods
  10.2 Partitioning Methods
    10.2.1 k-means: A Centroid-Based Technique
    10.2.2 k-medoids: A Representative Object-Based Technique
  10.3 Hierarchical Methods
    10.3.1 Agglomerative versus Divisive Hierarchical Clustering
    10.3.2 Distance Measures in Algorithmic Methods
    10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees
    10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling
    10.3.5 Probabilistic Hierarchical Clustering
  10.4 Density-Based Methods
    10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density
    10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure
    10.4.3 DENCLUE: Clustering Based on Density Distribution Functions
  10.5 Grid-Based Methods
    10.5.1 STING: STatistical INformation Grid
    10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method
  10.6 Evaluation of Clustering
    10.6.1 Assessing Clustering Tendency
    10.6.2 Determining the Number of Clusters
    10.6.3 Measuring Clustering Quality
  10.7 Summary
  10.8 Exercises
  10.9 Bibliographic Notes

Chapter 11 Advanced Cluster Analysis
  11.1 Probabilistic Model-Based Clustering
    11.1.1 Fuzzy Clusters
    11.1.2 Probabilistic Model-Based Clusters
    11.1.3 Expectation-Maximization Algorithm
  11.2 Clustering High-Dimensional Data
    11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies
    11.2.2 Subspace Clustering Methods
    11.2.3 Biclustering
    11.2.4 Dimensionality Reduction Methods and Spectral Clustering
  11.3 Clustering Graph and Network Data
    11.3.1 Applications and Challenges
    11.3.2 Similarity Measures
    11.3.3 Graph Clustering Methods
  11.4 Clustering with Constraints
    11.4.1 Categorization of Constraints
    11.4.2 Methods for Clustering with Constraints
  11.5 Summary
  11.6 Exercises
  11.7 Bibliographic Notes

Chapter 12 Outlier Detection
  12.1 Outliers and Outlier Analysis
    12.1.1 What Are Outliers?
    12.1.2 Types of Outliers
    12.1.3 Challenges of Outlier Detection
  12.2 Outlier Detection Methods
    12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods
    12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods
  12.3 Statistical Approaches
    12.3.1 Parametric Methods
    12.3.2 Nonparametric Methods
  12.4 Proximity-Based Approaches
    12.4.1 Distance-Based Outlier Detection and a Nested Loop Method
    12.4.2 A Grid-Based Method
    12.4.3 Density-Based Outlier Detection
  12.5 Clustering-Based Approaches
  12.6 Classification-Based Approaches
  12.7 Mining Contextual and Collective Outliers
    12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection
    12.7.2 Modeling Normal Behavior with Respect to Contexts
    12.7.3 Mining Collective Outliers
  12.8 Outlier Detection in High-Dimensional Data
    12.8.1 Extending Conventional Outlier Detection
    12.8.2 Finding Outliers in Subspaces
    12.8.3 Modeling High-Dimensional Outliers
  12.9 Summary
  12.10 Exercises
  12.11 Bibliographic Notes

Chapter 13 Data Mining Trends and Research Frontiers
  13.1 Mining Complex Data Types
    13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences
    13.1.2 Mining Graphs and Networks
    13.1.3 Mining Other Kinds of Data
  13.2 Other Methodologies of Data Mining
    13.2.1 Statistical Data Mining
    13.2.2 Views on Data Mining Foundations
    13.2.3 Visual and Audio Data Mining
  13.3 Data Mining Applications
    13.3.1 Data Mining for Financial Data Analysis
    13.3.2 Data Mining for Retail and Telecommunication Industries
    13.3.3 Data Mining in Science and Engineering
    13.3.4 Data Mining for Intrusion Detection and Prevention
    13.3.5 Data Mining and Recommender Systems
  13.4 Data Mining and Society
    13.4.1 Ubiquitous and Invisible Data Mining
    13.4.2 Privacy, Security, and Social Impacts of Data Mining
  13.5 Data Mining Trends
  13.6 Summary
  13.7 Exercises
  13.8 Bibliographic Notes

Bibliography
Index

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III & VI Section : CSE - 2 Subject Code : IT6702 Subject Name : Data warehousing

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Code No: R Set No. 1

Code No: R Set No. 1 Code No: R05321204 Set No. 1 1. (a) Draw and explain the architecture for on-line analytical mining. (b) Briefly discuss the data warehouse applications. [8+8] 2. Briefly discuss the role of data cube

More information

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE)

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road 517583 QUESTION BANK (DESCRIPTIVE) Subject with Code : Data Warehousing and Mining (16MC815) Year & Sem: II-MCA & I-Sem Course

More information

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis. www..com www..com Set No.1 1. a) What is data mining? Briefly explain the Knowledge discovery process. b) Explain the three-tier data warehouse architecture. 2. a) With an example, describe any two schema

More information

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN Q.1 a. Define a Data warehouse. Compare OLTP and OLAP systems. Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and 2 Non volatile collection of data in support of management

More information

SCHEME OF COURSE WORK. Data Warehousing and Data mining

SCHEME OF COURSE WORK. Data Warehousing and Data mining SCHEME OF COURSE WORK Course Details: Course Title Course Code Program: Specialization: Semester Prerequisites Department of Information Technology Data Warehousing and Data mining : 15CT1132 : B.TECH

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015 Q.1 a. Briefly explain data granularity with the help of example Data Granularity: The single most important aspect and issue of the design of the data warehouse is the issue of granularity. It refers

More information

COMPUTER SCIENCE AND ENGINEERING TUTORIAL QUESTION BANK

COMPUTER SCIENCE AND ENGINEERING TUTORIAL QUESTION BANK COMPUTER SCIENCE AND ENGINEERING TUTORIAL QUESTION BANK Course Name Course Code Class Branch DATA WAREHOUSING AND DATA MINING A70520 IV B. Tech I Semester Computer Science and Engineering Year 2016 2017

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043 INFORMATION TECHNOLOGY DEFINITIONS AND TERMINOLOGY Course Name : DATA WAREHOUSING AND DATA MINING Course Code : AIT006 Program

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad -500 043 COMPUTER SCIENCE AND ENGINEERING TUTORIAL QUESTION BANK Course Name Course Code Class Branch DATA WAREHOUSING AND DATA MINING

More information

DEPARTMENT OF INFORMATION TECHNOLOGY IT6702 DATA WAREHOUSING & DATA MINING

DEPARTMENT OF INFORMATION TECHNOLOGY IT6702 DATA WAREHOUSING & DATA MINING DEPARTMENT OF INFORMATION TECHNOLOGY IT6702 DATA WAREHOUSING & DATA MINING UNIT I PART A 1. Define data mining? Data mining refers to extracting or mining" knowledge from large amounts of data and another

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad-500 014 Subject: DATA WAREHOUSING AND DATA MINING Class : IT III TUTORIAL QUESTION BANK PART A (Short Answer Questions) 1 Define data mining? 2

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Data Mining and Analytics. Introduction

Data Mining and Analytics. Introduction Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data

More information

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4 Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10. Code No: 126VW Set No. 1 JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD B.Tech. III Year, II Sem., II Mid-Term Examinations, April-2018 DATA WAREHOUSING AND DATA MINING Objective Exam Name: Hall Ticket

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

1. What are the nine decisions in the design of the data warehouse?

1. What are the nine decisions in the design of the data warehouse? 1. What are the nine decisions in the design of the data warehouse? 1. Choosing the process 2. Choosing the grain 3. Identifying and conforming the dimensions 4. Choosing the facts 5. Storing pre-calculations

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

Tribhuvan University Institute of Science and Technology MODEL QUESTION

Tribhuvan University Institute of Science and Technology MODEL QUESTION MODEL QUESTION 1. Suppose that a data warehouse for Big University consists of four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual

More information

IT6702 DATA WAREHOUSING AND DATA MINING TWO MARKS WITH ANSWER UNIT-1 DATA WAREHOUSING

IT6702 DATA WAREHOUSING AND DATA MINING TWO MARKS WITH ANSWER UNIT-1 DATA WAREHOUSING IT6702 DATA WAREHOUSING AND DATA MINING TWO MARKS WITH ANSWER UNIT-1 DATA WAREHOUSING 1. What are the uses of multifeature cubes? (Nov/Dec 2007) multifeature cubes, which compute complex queries involving

More information

UNIT 2 Data Preprocessing

UNIT 2 Data Preprocessing UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and

More information

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A. COPYRIGHT RESERVED End Sem (V) MCA (XXVIII) 2017 Time: 3 hours Full Marks: 70 Candidates are required to give their answers in their own words as far as practicable. The figures in the margin indicate

More information

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And

More information

A Review on Cluster Based Approach in Data Mining

A Review on Cluster Based Approach in Data Mining A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Table of contents 1 Introduction 2 Data mining

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

Data warehouses Decision support The multidimensional model OLAP queries

Traditional DBMSs are used by organizations for maintaining data to record day-to-day operations On-line Transaction Processing

2. Data Preprocessing

Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459

Elysium Technologies Private Limited::IEEE Final year Project

Contents Data mining Transactions Rule Representation, Interchange, and Reasoning in Distributed, Heterogeneous Environments Defeasible

Classification Algorithms in Data Mining

August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing - Dr. Sunnie S. Chung Deciding on the classification algorithms

Data Mining Concepts

Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

Clustering in Data Mining

Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

3. Data Preprocessing. 3.1 Introduction

3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation

By Mahesh R. Sanghavi Associate professor, SNJB's KBJ CoE, Chandwad

Data Analytics life cycle Discovery Data preparation Preprocessing requirements data cleaning, data integration, data reduction, data

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

(Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Note: Question no. 1 is

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,

Data Preprocessing Yudho Giri Sucahyo, Ph.D., CISA

Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI

Naïve Bayes for text classification

Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such

9. Conclusions. 9.1 Definition KDD

9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

Web Information Retrieval

Lucian Blaga University of Sibiu Hermann Oberth Engineering Faculty Computer Science Department Web Information Retrieval First Technical Report PhD title: Data Mining for unstructured data Author: Daniel

Applying Supervised Learning

When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

An Overview of Data Warehousing and OLAP Technology

CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection

Data Mining: Concepts and Techniques. (3rd ed.) Chapter 3

Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is becoming popular now? Application of Analytics in business Analytics Vs Data

Supervised and Unsupervised Learning (II)

Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised

Data Preprocessing. Data Mining 1

Today's real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogeneous sources.

Table of Contents. Rajesh Pandey Page 1

Table of Contents Chapter 1: Introduction to Data Mining and Data Warehousing 1.1 Review of Basic Concepts of Data Mining and Data Warehousing 1.2 Data Mining 1.2.1 Why Data Mining?

Road Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary

2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL

Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended

Contents. List of Figures. List of Tables. List of Algorithms. I Clustering, Data, and Similarity Measures 1

Contents List of Figures List of Tables List of Algorithms Preface I Clustering, Data, and Similarity Measures 1 Data Clustering 1.1 Definition of Data Clustering 1.2 The Vocabulary

Basic Data Mining Technique

What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data

Chapter 3: Supervised Learning

Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

K236: Basis of Data Science

Schedule of K236 Lecture 6: Data Preprocessing Lecturer: Tu Bao Ho and Hieu Chi Dam TA: Moharasan Gandhimathi and Nuttapong Sanglerdsinlapachai 1. Introduction to data science

AUTONOMOUS. Department of Computer Science and Engineering

Course Name: DWDM Course Number: Course Designation: Core Prerequisites: DBMS, SQL IV B Tech I Semester (2015-2016) Pallam Ravi/ B.JYOTHI Assistant

Data Preprocessing. Komate AMPHAWAN

1 Data cleaning (data cleansing) Attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. 2 Missing value

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

UNIT III: Data Warehouse and OLAP Technology: An Overview: What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4

COURSE PLAN. Computer Science & Engineering

COURSE PLAN FACULTY DETAILS: Name of the Faculty:: Designation: Department:: Asst. Professor Computer Science & Engineering COURSE DETAILS Name Of The Programme:: Lesson Plan Batch:: 2011-2015 Designation::Assistant

Contents. Part I Setting the Scene

Contents Part I Setting the Scene 1 Introduction 1.1 About Mobility Data 1.1.1 Global Positioning System (GPS) 1.1.2 Format of GPS Data 1.1.3 Examples of Trajectory Datasets 1.2

CPSC 340: Machine Learning and Data Mining. Hierarchical Clustering Fall 2017

Assignment 1 is due Friday. Admin Follow the assignment guidelines naming convention (a1.zip/a1.pdf). Assignment 0 grades posted

Winter Semester 2009/10 Free University of Bozen, Bolzano

Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter 17.7-17.11 (without 17.10)

University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

Clustering Part 4 DBSCAN

Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

Overview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8

Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions

CS 521 Data Mining Techniques Instructor: Abdullah Mueen

LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks

Bioinformatics - Lecture 07

Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

1 1960s: Evolution of Database Technology Data collection, database creation,

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00-09.15 Intro 09.15-10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45-12.00 Advanced Analytics using SAS

Data Mining: Concepts and Techniques. (3rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Table of contents 1 Introduction 2 Data preprocessing

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16th book Authoring many courses, articles Agenda Introduction Simple

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

1. Supervised Data Mining Classification Regression Outlier detection

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

Table of contents 1 Introduction 2 Data preprocessing

Machine Learning Techniques for Data Mining

Eibe Frank University of Waikato New Zealand 10/25/2000 PART VII Moving on: Engineering the input and output Applying a learner is not all Already