Table of Contents
- Rafe Lane
Data Mining: Concepts and Techniques

Table of Contents

Foreword xix
Foreword to Second Edition xxi
Preface xxiii
Acknowledgments xxxi
About the Authors xxxv

Chapter 1 Introduction 1 (38)
  1.1 Why Data Mining? 1 (4)
    1.1.1 Moving toward the Information Age 1 (1)
    1.1.2 Data Mining as the Evolution of Information Technology 2 (3)
  1.2 What Is Data Mining? 5 (3)
  1.3 What Kinds of Data Can Be Mined? 8 (7)
    1.3.1 Database Data 9 (1)
    1.3.2 Data Warehouses 10 (3)
    1.3.3 Transactional Data 13 (1)
    1.3.4 Other Kinds of Data 14 (1)
  1.4 What Kinds of Patterns Can Be Mined? 15 (8)
    1.4.1 Class/Concept Description: Characterization and Discrimination 15 (2)
    1.4.2 Mining Frequent Patterns, Associations, and Correlations 17 (1)
    1.4.3 Classification and Regression for Predictive Analysis 18 (1)
    1.4.4 Cluster Analysis 19 (1)
    1.4.5 Outlier Analysis 20 (1)
    1.4.6 Are All Patterns Interesting? 21 (2)
  1.5 Which Technologies Are Used? 23 (4)
    1.5.1 Statistics 23 (1)
    1.5.2 Machine Learning 24 (2)
    1.5.3 Database Systems and Data Warehouses 26 (1)
    1.5.4 Information Retrieval 26 (1)
  1.6 Which Kinds of Applications Are Targeted? 27 (2)
    1.6.1 Business Intelligence 27 (1)
    1.6.2 Web Search Engines 28 (1)
  1.7 Major Issues in Data Mining 29 (4)
    1.7.1 Mining Methodology 29 (1)
    1.7.2 User Interaction 30 (1)
    1.7.3 Efficiency and Scalability 31 (1)
    1.7.4 Diversity of Database Types 32 (1)
    1.7.5 Data Mining and Society 32 (1)
  1.8 Summary 33 (1)
  1.9 Exercises 34 (1)
  1.10 Bibliographic Notes 35 (4)

Chapter 2 Getting to Know Your Data 39 (44)
  2.1 Data Objects and Attribute Types 40 (4)
    2.1.1 What Is an Attribute? 40 (1)
    2.1.2 Nominal Attributes 41 (1)
    2.1.3 Binary Attributes 41 (1)
    2.1.4 Ordinal Attributes 42 (1)
    2.1.5 Numeric Attributes 43 (1)
    2.1.6 Discrete versus Continuous Attributes 44 (1)
  2.2 Basic Statistical Descriptions of Data 44 (12)
    2.2.1 Measuring the Central Tendency: Mean, Median, and Mode 45 (3)
    2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range 48 (3)
    2.2.3 Graphic Displays of Basic Statistical Descriptions of Data 51 (5)
  2.3 Data Visualization 56 (9)
    2.3.1 Pixel-Oriented Visualization Techniques 57 (1)
    2.3.2 Geometric Projection Visualization Techniques 58 (2)
    2.3.3 Icon-Based Visualization Techniques 60 (3)
    2.3.4 Hierarchical Visualization Techniques 63 (1)
    2.3.5 Visualizing Complex Data and Relations 64 (1)
  2.4 Measuring Data Similarity and Dissimilarity 65 (14)
    2.4.1 Data Matrix versus Dissimilarity Matrix 67 (1)
    2.4.2 Proximity Measures for Nominal Attributes 68 (2)
    2.4.3 Proximity Measures for Binary Attributes 70 (2)
    2.4.4 Dissimilarity of Numeric Data: Minkowski Distance 72 (2)
    2.4.5 Proximity Measures for Ordinal Attributes 74 (1)
    2.4.6 Dissimilarity for Attributes of Mixed Types 75 (2)
    2.4.7 Cosine Similarity 77 (2)
  2.5 Summary 79 (1)
  2.6 Exercises 79 (2)
  2.7 Bibliographic Notes 81 (2)

Chapter 3 Data Preprocessing 83 (42)
  3.1 Data Preprocessing: An Overview 84 (4)
    3.1.1 Data Quality: Why Preprocess the Data? 84 (1)
    3.1.2 Major Tasks in Data Preprocessing 85 (3)
  3.2 Data Cleaning 88 (5)
    3.2.1 Missing Values 88 (1)
    3.2.2 Noisy Data 89 (2)
    3.2.3 Data Cleaning as a Process 91 (2)
  3.3 Data Integration 93 (6)
    3.3.1 Entity Identification Problem 94 (1)
    3.3.2 Redundancy and Correlation Analysis 94 (4)
    3.3.3 Tuple Duplication 98 (1)
    3.3.4 Data Value Conflict Detection and Resolution 99 (1)
  3.4 Data Reduction 99 (12)
    3.4.1 Overview of Data Reduction Strategies 99 (1)
    3.4.2 Wavelet Transforms 100 (2)
    3.4.3 Principal Components Analysis 102 (1)
    3.4.4 Attribute Subset Selection 103 (2)
    3.4.5 Regression and Log-Linear Models: Parametric Data Reduction 105 (1)
    3.4.6 Histograms 106 (2)
    3.4.7 Clustering 108 (1)
    3.4.8 Sampling 108 (2)
    3.4.9 Data Cube Aggregation 110 (1)
  3.5 Data Transformation and Data Discretization 111 (9)
    3.5.1 Data Transformation Strategies Overview 112 (1)
    3.5.2 Data Transformation by Normalization 113 (2)
    3.5.3 Discretization by Binning 115 (1)
    3.5.4 Discretization by Histogram Analysis 115 (1)
    3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses 116 (1)
    3.5.6 Concept Hierarchy Generation for Nominal Data 117 (3)
  3.6 Summary 120 (1)
  3.7 Exercises 121 (2)
  3.8 Bibliographic Notes 123 (2)

Chapter 4 Data Warehousing and Online Analytical Processing 125 (62)
  4.1 Data Warehouse: Basic Concepts 125 (10)
    4.1.1 What Is a Data Warehouse? 126 (2)
    4.1.2 Differences between Operational Database Systems and Data Warehouses 128 (1)
    4.1.3 But, Why Have a Separate Data Warehouse? 129 (1)
    4.1.4 Data Warehousing: A Multitiered Architecture 130 (2)
    4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse 132 (2)
    4.1.6 Extraction, Transformation, and Loading 134 (1)
    4.1.7 Metadata Repository 134 (1)
  4.2 Data Warehouse Modeling: Data Cube and OLAP 135 (15)
    4.2.1 Data Cube: A Multidimensional Data Model 136 (3)
    4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models 139 (3)
    4.2.3 Dimensions: The Role of Concept Hierarchies 142 (2)
    4.2.4 Measures: Their Categorization and Computation 144 (2)
    4.2.5 Typical OLAP Operations 146 (3)
    4.2.6 A Starnet Query Model for Querying Multidimensional Databases 149 (1)
  4.3 Data Warehouse Design and Usage 150 (6)
    4.3.1 A Business Analysis Framework for Data Warehouse Design 150 (1)
    4.3.2 Data Warehouse Design Process 151 (2)
    4.3.3 Data Warehouse Usage for Information Processing 153 (2)
    4.3.4 From Online Analytical Processing to Multidimensional Data Mining 155 (1)
  4.4 Data Warehouse Implementation 156 (10)
    4.4.1 Efficient Data Cube Computation: An Overview 156 (4)
    4.4.2 Indexing OLAP Data: Bitmap Index and Join Index 160 (3)
    4.4.3 Efficient Processing of OLAP Queries 163 (1)
    4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP 164 (2)
  4.5 Data Generalization by Attribute-Oriented Induction 166 (12)
    4.5.1 Attribute-Oriented Induction for Data Characterization 167 (5)
    4.5.2 Efficient Implementation of Attribute-Oriented Induction 172 (3)
    4.5.3 Attribute-Oriented Induction for Class Comparisons 175 (3)
  4.6 Summary 178 (2)
  4.7 Exercises 180 (4)
  4.8 Bibliographic Notes 184 (3)

Chapter 5 Data Cube Technology 187 (56)
  5.1 Data Cube Computation: Preliminary Concepts 188 (6)
    5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell 188 (4)
    5.1.2 General Strategies for Data Cube Computation 192 (2)
  5.2 Data Cube Computation Methods 194 (24)
    5.2.1 Multiway Array Aggregation for Full Cube Computation 195 (5)
    5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward 200 (4)
    5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure 204 (6)
    5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP 210 (8)
  5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology 218 (9)
    5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data 218 (7)
    5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries 225 (2)
  5.4 Multidimensional Data Analysis in Cube Space 227 (7)
    5.4.1 Prediction Cubes: Prediction Mining in Cube Space 227 (3)
    5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities 230 (1)
    5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration 231 (3)
  5.5 Summary 234 (1)
  5.6 Exercises 235 (5)
  5.7 Bibliographic Notes 240 (3)

Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods 243 (36)
  6.1 Basic Concepts 243 (5)
    6.1.1 Market Basket Analysis: A Motivating Example 244 (2)
    6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules 246 (2)
  6.2 Frequent Itemset Mining Methods 248 (16)
    6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation 248 (6)
    6.2.2 Generating Association Rules from Frequent Itemsets 254 (1)
    6.2.3 Improving the Efficiency of Apriori 254 (3)
    6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets 257 (2)
    6.2.5 Mining Frequent Itemsets Using Vertical Data Format 259 (3)
    6.2.6 Mining Closed and Max Patterns 262 (2)
  6.3 Which Patterns Are Interesting? Pattern Evaluation Methods 264 (7)
    6.3.1 Strong Rules Are Not Necessarily Interesting 264 (1)
    6.3.2 From Association Analysis to Correlation Analysis 265 (2)
    6.3.3 A Comparison of Pattern Evaluation Measures 267 (4)
  6.4 Summary 271 (2)
  6.5 Exercises 273 (3)
  6.6 Bibliographic Notes 276 (3)

Chapter 7 Advanced Pattern Mining 279 (48)
  7.1 Pattern Mining: A Road Map 279 (4)
  7.2 Pattern Mining in Multilevel, Multidimensional Space 283 (11)
    7.2.1 Mining Multilevel Associations 283 (4)
    7.2.2 Mining Multidimensional Associations 287 (2)
    7.2.3 Mining Quantitative Association Rules 289 (2)
    7.2.4 Mining Rare Patterns and Negative Patterns 291 (3)
  7.3 Constraint-Based Frequent Pattern Mining 294 (7)
    7.3.1 Metarule-Guided Mining of Association Rules 295 (1)
    7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space 296 (5)
  7.4 Mining High-Dimensional Data and Colossal Patterns 301 (6)
    7.4.1 Mining Colossal Patterns by Pattern-Fusion 302 (5)
  7.5 Mining Compressed or Approximate Patterns 307 (6)
    7.5.1 Mining Compressed Patterns by Pattern Clustering 308 (2)
    7.5.2 Extracting Redundancy-Aware Top-k Patterns 310 (3)
  7.6 Pattern Exploration and Application 313 (6)
    7.6.1 Semantic Annotation of Frequent Patterns 313 (4)
    7.6.2 Applications of Pattern Mining 317 (2)
  7.7 Summary 319 (2)
  7.8 Exercises 321 (2)
  7.9 Bibliographic Notes 323 (4)

Chapter 8 Classification: Basic Concepts 327 (66)
  8.1 Basic Concepts 327 (3)
    8.1.1 What Is Classification? 327 (1)
    8.1.2 General Approach to Classification 328 (2)
  8.2 Decision Tree Induction 330 (20)
    8.2.1 Decision Tree Induction 332 (4)
    8.2.2 Attribute Selection Measures 336 (8)
    8.2.3 Tree Pruning 344 (3)
    8.2.4 Scalability and Decision Tree Induction 347 (1)
    8.2.5 Visual Mining for Decision Tree Induction 348 (2)
  8.3 Bayes Classification Methods 350 (5)
    8.3.1 Bayes' Theorem 350 (1)
    8.3.2 Naive Bayesian Classification 351 (4)
  8.4 Rule-Based Classification 355 (9)
    8.4.1 Using IF-THEN Rules for Classification 355 (2)
    8.4.2 Rule Extraction from a Decision Tree 357 (2)
    8.4.3 Rule Induction Using a Sequential Covering Algorithm 359 (5)
  8.5 Model Evaluation and Selection 364 (13)
    8.5.1 Metrics for Evaluating Classifier Performance 364 (6)
    8.5.2 Holdout Method and Random Subsampling 370 (1)
    8.5.3 Cross-Validation 370 (1)
    8.5.4 Bootstrap 371 (1)
    8.5.5 Model Selection Using Statistical Tests of Significance 372 (1)
    8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves 373 (4)
  8.6 Techniques to Improve Classification Accuracy 377 (8)
    8.6.1 Introducing Ensemble Methods 378 (1)
    8.6.2 Bagging 379 (1)
    8.6.3 Boosting and AdaBoost 380 (2)
    8.6.4 Random Forests 382 (1)
    8.6.5 Improving Classification Accuracy of Class-Imbalanced Data 383 (2)
  8.7 Summary 385 (1)
  8.8 Exercises 386 (3)
  8.9 Bibliographic Notes 389 (4)

Chapter 9 Classification: Advanced Methods 393 (50)
  9.1 Bayesian Belief Networks 393 (5)
    9.1.1 Concepts and Mechanisms 394 (2)
    9.1.2 Training Bayesian Belief Networks 396 (2)
  9.2 Classification by Backpropagation 398 (10)
    9.2.1 A Multilayer Feed-Forward Neural Network 398 (2)
    9.2.2 Defining a Network Topology 400 (1)
    9.2.3 Backpropagation 400 (6)
    9.2.4 Inside the Black Box: Backpropagation and Interpretability 406 (2)
  9.3 Support Vector Machines 408 (7)
    9.3.1 The Case When the Data Are Linearly Separable 408 (5)
    9.3.2 The Case When the Data Are Linearly Inseparable 413 (2)
  9.4 Classification Using Frequent Patterns 415 (7)
    9.4.1 Associative Classification 416 (3)
    9.4.2 Discriminative Frequent Pattern-Based Classification 419 (3)
  9.5 Lazy Learners (or Learning from Your Neighbors) 422 (4)
    9.5.1 k-Nearest-Neighbor Classifiers 423 (2)
    9.5.2 Case-Based Reasoning 425 (1)
  9.6 Other Classification Methods 426 (3)
    9.6.1 Genetic Algorithms 426 (1)
    9.6.2 Rough Set Approach 427 (1)
    9.6.3 Fuzzy Set Approaches 428 (1)
  9.7 Additional Topics Regarding Classification 429 (7)
    9.7.1 Multiclass Classification 430 (2)
    9.7.2 Semi-Supervised Classification 432 (1)
    9.7.3 Active Learning 433 (1)
    9.7.4 Transfer Learning 434 (2)
  9.8 Summary 436 (2)
  9.9 Exercises 438 (1)
  9.10 Bibliographic Notes 439 (4)

Chapter 10 Cluster Analysis: Basic Concepts and Methods 443 (54)
  10.1 Cluster Analysis 444 (7)
    10.1.1 What Is Cluster Analysis? 444 (1)
    10.1.2 Requirements for Cluster Analysis 445 (3)
    10.1.3 Overview of Basic Clustering Methods 448 (3)
  10.2 Partitioning Methods 451 (6)
    10.2.1 k-means: A Centroid-Based Technique 451 (3)
    10.2.2 k-medoids: A Representative Object-Based Technique 454 (3)
  10.3 Hierarchical Methods 457 (14)
    10.3.1 Agglomerative versus Divisive Hierarchical Clustering 459 (2)
    10.3.2 Distance Measures in Algorithmic Methods 461 (1)
    10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees 462 (4)
    10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling 466 (1)
    10.3.5 Probabilistic Hierarchical Clustering 467 (4)
  10.4 Density-Based Methods 471 (8)
    10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density 471 (2)
    10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure 473 (3)
    10.4.3 DENCLUE: Clustering Based on Density Distribution Functions 476 (3)
  10.5 Grid-Based Methods 479 (4)
    10.5.1 STING: STatistical INformation Grid 479 (2)
    10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method 481 (2)
  10.6 Evaluation of Clustering 483 (7)
    10.6.1 Assessing Clustering Tendency 484 (2)
    10.6.2 Determining the Number of Clusters 486 (1)
    10.6.3 Measuring Clustering Quality 487 (3)
  10.7 Summary 490 (1)
  10.8 Exercises 491 (3)
  10.9 Bibliographic Notes 494 (3)

Chapter 11 Advanced Cluster Analysis 497 (46)
  11.1 Probabilistic Model-Based Clustering 497 (11)
    11.1.1 Fuzzy Clusters 499 (2)
    11.1.2 Probabilistic Model-Based Clusters 501 (4)
    11.1.3 Expectation-Maximization Algorithm 505 (3)
  11.2 Clustering High-Dimensional Data 508 (14)
    11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies 508 (2)
    11.2.2 Subspace Clustering Methods 510 (2)
    11.2.3 Biclustering 512 (7)
    11.2.4 Dimensionality Reduction Methods and Spectral Clustering 519 (3)
  11.3 Clustering Graph and Network Data 522 (10)
    11.3.1 Applications and Challenges 523 (2)
    11.3.2 Similarity Measures 525 (3)
    11.3.3 Graph Clustering Methods 528 (4)
  11.4 Clustering with Constraints 532 (6)
    11.4.1 Categorization of Constraints 533 (2)
    11.4.2 Methods for Clustering with Constraints 535 (3)
  11.5 Summary 538 (1)
  11.6 Exercises 539 (1)
  11.7 Bibliographic Notes 540 (3)

Chapter 12 Outlier Detection 543 (42)
  12.1 Outliers and Outlier Analysis 544 (5)
    12.1.1 What Are Outliers? 544 (1)
    12.1.2 Types of Outliers 545 (3)
    12.1.3 Challenges of Outlier Detection 548 (1)
  12.2 Outlier Detection Methods 549 (4)
    12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods 549 (2)
    12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods 551 (2)
  12.3 Statistical Approaches 553 (7)
    12.3.1 Parametric Methods 553 (5)
    12.3.2 Nonparametric Methods 558 (2)
  12.4 Proximity-Based Approaches 560 (7)
    12.4.1 Distance-Based Outlier Detection and a Nested Loop Method 561 (1)
    12.4.2 A Grid-Based Method 562 (2)
    12.4.3 Density-Based Outlier Detection 564 (3)
  12.5 Clustering-Based Approaches 567 (4)
  12.6 Classification-Based Approaches 571 (2)
  12.7 Mining Contextual and Collective Outliers 573 (3)
    12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection 573 (1)
    12.7.2 Modeling Normal Behavior with Respect to Contexts 574 (1)
    12.7.3 Mining Collective Outliers 575 (1)
  12.8 Outlier Detection in High-Dimensional Data 576 (5)
    12.8.1 Extending Conventional Outlier Detection 577 (1)
    12.8.2 Finding Outliers in Subspaces 578 (1)
    12.8.3 Modeling High-Dimensional Outliers 579 (2)
  12.9 Summary 581 (1)
  12.10 Exercises 582 (1)
  12.11 Bibliographic Notes 583 (2)

Chapter 13 Data Mining Trends and Research Frontiers 585 (48)
  13.1 Mining Complex Data Types 585 (13)
    13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences 586 (5)
    13.1.2 Mining Graphs and Networks 591 (4)
    13.1.3 Mining Other Kinds of Data 595 (3)
  13.2 Other Methodologies of Data Mining 598 (9)
    13.2.1 Statistical Data Mining 598 (2)
    13.2.2 Views on Data Mining Foundations 600 (2)
    13.2.3 Visual and Audio Data Mining 602 (5)
  13.3 Data Mining Applications 607 (11)
    13.3.1 Data Mining for Financial Data Analysis 607 (2)
    13.3.2 Data Mining for Retail and Telecommunication Industries 609 (2)
    13.3.3 Data Mining in Science and Engineering 611 (3)
    13.3.4 Data Mining for Intrusion Detection and Prevention 614 (1)
    13.3.5 Data Mining and Recommender Systems 615 (3)
  13.4 Data Mining and Society 618 (4)
    13.4.1 Ubiquitous and Invisible Data Mining 618 (2)
    13.4.2 Privacy, Security, and Social Impacts of Data Mining 620 (2)
  13.5 Data Mining Trends 622 (3)
  13.6 Summary 625 (1)
  13.7 Exercises 626 (2)
  13.8 Bibliographic Notes 628 (5)

Bibliography 633 (40)
Index 673
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights
More informationIntroduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)
Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationData Preprocessing. Data Mining 1
Data Preprocessing Today s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogenous sources.
More informationTable of Contents. Rajesh Pandey Page 1
Table of Contents Chapter 1: Introduction to Data Mining and Data Warehousing... 4 1.1 Review of Basic Concepts of Data Mining and Data Warehousing... 4 1.2 Data Mining... 5 1.2.1 Why Data Mining?... 5
More informationRoad Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary
2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types
More informationName of the lecturer Doç. Dr. Selma Ayşe ÖZEL
Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended
More informationContents. List of Figures. List of Tables. List of Algorithms. I Clustering, Data, and Similarity Measures 1
Contents List of Figures List of Tables List of Algorithms Preface xiii xv xvii xix I Clustering, Data, and Similarity Measures 1 1 Data Clustering 3 1.1 Definition of Data Clustering... 3 1.2 The Vocabulary
More informationBasic Data Mining Technique
Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm
More informationThis tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.
About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This
More informationDATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY
DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationK236: Basis of Data Science
Schedule of K236 K236: Basis of Data Science Lecture 6: Data Preprocessing Lecturer: Tu Bao Ho and Hieu Chi Dam TA: Moharasan Gandhimathi and Nuttapong Sanglerdsinlapachai 1. Introduction to data science
More informationAUTONOMOUS. Department of Computer Science and Engineering
AUTONOMOUS Department of Computer Science and Engineering Course Name : DWDM Course Number : Course Designation: Core Prerequisites : DBMS,SQL IV B Tech I Semester (2015-2016) Pallam Ravi/ B.JYOTHI Assistant
More informationData Preprocessing. Komate AMPHAWAN
Data Preprocessing Komate AMPHAWAN 1 Data cleaning (data cleansing) Attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. 2 Missing value
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationData Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei
Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional
More informationDatabase design View Access patterns Need for separate data warehouse:- A multidimensional data model:-
UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationCOURSE PLAN. Computer Science & Engineering
COURSE PLAN FACULTY DETAILS: Name of the Faculty:: Designation: Department:: Asst. Professor Computer Science & Engineering COURSE DETAILS Name Of The Programme:: Lesson Plan Batch:: 2011-2015 Designation::Assistant
More informationContents. Part I Setting the Scene
Contents Part I Setting the Scene 1 Introduction... 3 1.1 About Mobility Data... 3 1.1.1 Global Positioning System (GPS)... 5 1.1.2 Format of GPS Data... 6 1.1.3 Examples of Trajectory Datasets... 8 1.2
More informationCPSC 340: Machine Learning and Data Mining. Hierarchical Clustering Fall 2017
CPSC 340: Machine Learning and Data Mining Hierarchical Clustering Fall 2017 Assignment 1 is due Friday. Admin Follow the assignment guidelines naming convention (a1.zip/a1.pdf). Assignment 0 grades posted
More informationWinter Semester 2009/10 Free University of Bozen, Bolzano
Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html
More informationCMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)
CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationOverview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8
Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions
More informationCS 521 Data Mining Techniques Instructor: Abdullah Mueen
CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks
More informationBioinformatics - Lecture 07
Bioinformatics - Lecture 07 Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles
More informationCS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,
More informationADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More information