Chapter 1, Introduction

Similar documents
Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1

Code No: R Set No. 1

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

SCHEME OF COURSE WORK. Data Warehousing and Data mining

DATA WAREHOUING UNIT I

Table Of Contents: xix Foreword to Second Edition

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

Contents. Foreword to Second Edition. Acknowledgments About the Authors

CSE5243 INTRO. TO DATA MINING

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE)

Introduction. IST557 Data Mining: Techniques and Applications. Jessie Li, Penn State University

Data mining fundamentals

COMP 465 Special Topics: Data Mining

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

D B M G Data Base and Data Mining Group of Politecnico di Torino

CSE-4412: Data Mining

Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

Question Bank. 4) It is the source of information later delivered to data marts.

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

COURSE PLAN. Computer Science & Engineering

Data warehouse and Data Mining


Contents. Preface to the Second Edition

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Concepts and Techniques. Data Mining: Slides related to: University of Illinois at Urbana-Champaign

Knowledge Discovery and Data Mining

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

CS249: ADVANCED DATA MINING

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

9. Conclusions. 9.1 Definition KDD

Data Mining and Data Warehousing Introduction to Data Mining

Data Mining: Dynamic Past and Promising Future

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Tribhuvan University Institute of Science and Technology MODEL QUESTION

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Winter Semester 2009/10 Free University of Bozen, Bolzano

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

CS 412 Intro. to Data Mining

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

Using Association Rules for Better Treatment of Missing Values

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

1. Inroduction to Data Mininig

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

Data Mining: Concepts and Techniques

Data Mining Course Overview

Data Mining Concepts

Data Mining and Analytics. Introduction

Chapter 3: Data Mining:

DATA MINING TRANSACTION

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA

Product presentations can be more intelligently planned

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Data Mining Jay Urbain, PhD. Credits: Nazli Goharian, Jiawei Han, Micheline Kamber, and Jian Pei

Data Preprocessing. Slides by: Shree Jaswal

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Data Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?

DATA MINING II - 1DL460

Parametric Comparisons of Classification Techniques in Data Mining Applications

Knowledge Discovery in Data Bases

CS570 Introduction to Data Mining

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

Overview of Web Mining Techniques and its Application towards Web

DATA MINING II - 1DL460

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

DATA MINING AND WAREHOUSING

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

An Overview of Data Warehousing and OLAP Technology

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Interestingness Measurements

A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Chapter 5, Data Cube Computation

Gene Clustering & Classification

DATA MINING RESEARCH: RETROSPECT AND PROSPECT

Introduction to Data Mining. Lijun Zhang

Data warehousing and Phases used in Internet Mining Jitender Ahlawat 1, Joni Birla 2, Mohit Yadav 3

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

An Improved Apriori Algorithm for Association Rules

Chapter 4, Data Warehouse and OLAP Operations

Machine Learning - Clustering. CS102 Fall 2017

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *

Domestic electricity consumption analysis using data mining techniques

Keyword Extraction by KNN considering Similarity among Features

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Detection and Deletion of Outliers from Large Datasets

Transcription:

CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from big Data (KDD) Informative pattern extraction from data General View of Data Mining Process Input Data Pre- Process Data Mining Post- Process Output Knowledge Cleaning, Normalization, Integration Pattern Mining, Classification, Clustering Evaluation, Interpretation, Visualization 1

Data Mining in Computer Science Alternative View of Data Mining Process pattern knowledge task-relevant data data warehouse database Data Mining in Business Intelligence Increasing potential to support business decisions Decision Making Data Presentation Interpretation, Visualization Data Mining Knowledge (Pattern) Discovery Data Exploration Selection, Summarization, Transformation Data Preprocessing & Integration Cleaning, Normalization, Warehousing Data Sources CEO Business Analyst Data Analyst DBA 2

Why Need Data Mining? Business Business data analysis for decision support Market analysis and management Risk analysis and management Fraud detection and security Science and Engineering Biomedical data analysis Patient treatment, disease diagnosis and drug discovery WWW data analysis Information retrieval and web management Geographic data analysis City planning and renewal Why Not Traditional Data Analysis? Explosive Growth of Data Terabytes or petabytes of data High Dimensionality of Data Hundreds or thousands of dimensions High Complexity of Data Stream data, sensor data Time-series data, temporal data Spatial data, spatio-temporal data, multimedia data Structural data, graphic data Combined, heterogeneous data format Data mining algorithms should handle these data!! 3

Data Mining Functions: (1) Generalization Data Cleaning and Reduction Statistical normalization methods Sampling and discretizing techniques Data Integration and Warehousing Multidimensional data modeling Dimension reduction techniques Data cube aggregation algorithms Data Transformation OLAP (online analytical process) operations Querying for selection and summarization Data Mining Functions: (2) Pattern Mining Frequent Pattern Mining Mining frequently occurred item-sets Mining frequently occurred sequential patterns Mining frequently occurred structural patterns (sub-graphs) Association Rule Mining Mining one-direction relations between two sets of data Correlation Mining Mining two-direction relations between two sets of data Coherent Pattern Mining Mining coherent sequential patterns Mining coherent structural patterns 4

Data Mining Functions: (3) Classification Supervised Learning Training data with class labels Prediction of classes of data with no class labels Typical Methods Decision tree-based induction Naïve bayesian classification K-nearest neighbors (knn) Support vector machine (SVM) Neural network Logistic regression Rule-based classification Pattern-based classification Data Mining Functions: (4) Clustering Unsupervised Learning Grouping data with no class labels Prediction of potential members with same class labels Typical Methods K-means Agglomerative hierarchical clustering Divisive hierarchical clustering Density-based clustering Grid-based clustering Pattern-based clustering Outlier analysis 5

Origins of Data Mining Inter-disciplinary Research Field Data Mining History of Data Mining 1991 ~ 1994 Workshops on Knowledge Discovery in Databases 1995 ~ current International Conference on Knowledge Discovery and Data Mining Sponsored by ACM from 1998 1993 Market basket problem ( Agrawal et al., ACM SIGMOD Conference ) 1994 Apriori algorithm ( Agrawal and Srikant, VLDB Conference ) 6

Conferences and Journals on Data Mining Conferences ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) IEEE Int. Conf. on Data Mining (ICDM) ACM Conf. on Information and Knowledge Management (CIKM) SIAM Int. Conf. on Data Mining (SDM) European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD) Pacific-Asian Conf. on Knowledge Discovery and Data Mining (PAKDD) Journals Data Mining and Knowledge Discovery (DMKD) by Springer IEEE Transactions on Knowledge and Data Engineering (TKDE) ACM Transactions on Knowledge Discovery from Data (TKDD) Questions? Lecture Slides on the Course Website, www.ecs.baylor.edu/faculty/cho/4352 7