Knowledge Discovery in Data Bases

Knowledge Discovery in Data Bases
Chien-Chung Chan
Department of CS, University of Akron
Akron, OH 44325-4003
2/24/99

Why KDD?
"We are drowning in information, but starving for knowledge." (John Naisbitt)
Growing gap between data generation and data understanding:
- Automation of business activities: telephone calls, credit card charges, medical tests, etc.
- Wal-Mart databases: over 20 million transactions per day [Babcock, 1994]
- Scientific and government databases:
  - Earth observation satellites: estimated to generate one terabyte (10^12 bytes) of data per day. At a rate of one picture per second.
  - Biology: the Human Genome database project has collected over gigabytes of data on the human genetic code [Fasman, Cuticchia, Kingsbury, 1994].
  - US Census data
  - NASA databases
- World Wide Web

What is KDD?
- KDD refers to the overall process of finding and interpreting useful patterns from data.
- Data mining refers to the application of algorithms for extracting patterns from data without the additional steps of the KDD process.

Steps of the KDD Process [Fayyad et al., 1996]
1. Selection
   - Learning the application domain
   - Creating a target dataset
2. Preprocessing
   - Data cleaning and preprocessing
3. Transformation
   - Data reduction and projection
4. Data Mining
   - Choosing the functions and algorithms of data mining: association rules, classification rules, clustering rules
5. Interpretation and Evaluation
   - Validating and verifying discovered patterns
6. Using discovered knowledge

Related Fields
- Statistics
  - Model evaluation and validation
- Machine learning and pattern recognition
  - Symbolic and neural network learning
- Databases
  - Data warehousing
  - OLAP (online analytical processing)
- Artificial intelligence
  - Knowledge representation
- Data visualization
  - Report generation

Typical Data Mining Tasks
1. Finding Association Rules [Rakesh Agrawal et al., 1993]
- Each transaction is a set of items.
- Given a set of transactions, an association rule is of the form X ⇒ Y, where X and Y are sets of items.
- Intuitive meaning: transactions of the database that contain X tend to contain Y.
- e.g.: 30% of transactions that contain beer also contain diapers; 2% of all transactions contain both of these items. Here 30% is called the confidence of the rule, and 2% is called the support of the rule.
- Problem: find all association rules that satisfy user-specified minimum support and minimum confidence constraints.
- Potential problem: too many rules may be generated. Use an interestingness measure to prune redundant rules.
- Applications:
  - Market basket analysis and cross-marketing
  - Catalog design
  - Store layout
  - Buying patterns
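The support and confidence figures in the beer-and-diapers example can be checked with a small Python sketch. The transaction data, the item names other than beer and diapers, and the function names `support` and `confidence` are hypothetical and chosen only to reproduce the 2% and 30% figures; this is an illustration of the two definitions, not the rule-mining algorithm of Agrawal et al.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of transactions containing `lhs` that also contain `rhs`."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    lhs_count = sum(1 for t in transactions if lhs <= t)
    if lhs_count == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count


# Hypothetical toy database: 300 transactions, 20 of which contain beer,
# and 6 of those 20 also contain diapers.
transactions = (
    [frozenset({"beer", "diapers"})] * 6
    + [frozenset({"beer", "chips"})] * 14
    + [frozenset({"milk", "bread"})] * 280
)

rule_support = support(transactions, {"beer", "diapers"})          # 6 / 300
rule_confidence = confidence(transactions, {"beer"}, {"diapers"})  # 6 / 20
```

A rule would be reported only if `rule_support` and `rule_confidence` both meet the user-specified minimum thresholds.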

2. Finding Sequential Patterns
- Each data sequence is a list of transactions.
- Find all sequential patterns with a user-specified minimum support.
- e.g.: consider a book-club database. A sequential pattern might be "5% of customers bought Foundation, then Foundation and Empire, and then Second Foundation."
- Applications:
  - Add-on sales
  - Customer satisfaction
  - Identifying symptoms/diseases that precede certain diseases

3. Finding Classification Rules
- Finding discriminant rules for objects of different classes.
- Approaches:
  - Finding decision trees
  - Finding production rules
- Applications:
  - Processing loan and credit card applications
  - Model identification
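The support of a sequential pattern (task 2 above) can be sketched in Python as the fraction of customers whose data sequence contains the pattern in order. The customer data and the function names `contains_pattern` and `sequence_support` are hypothetical; the sketch only counts support for one given pattern and does not search for all frequent patterns.

```python
def contains_pattern(sequence, pattern):
    """True if the itemsets in `pattern` occur, in order, in `sequence`.

    Each pattern element must be a subset of a distinct transaction, and
    the matched transactions must appear in the same order as the pattern.
    """
    pos = 0
    for element in pattern:
        # Advance to the next transaction that includes this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a strictly later transaction
    return True


def sequence_support(sequences, pattern):
    """Fraction of data sequences (customers) that contain `pattern`."""
    hits = sum(1 for s in sequences if contains_pattern(s, pattern))
    return hits / len(sequences)


# Hypothetical book-club data: one list of transactions per customer.
F, FE, SF = "Foundation", "Foundation and Empire", "Second Foundation"
customers = [
    [{F}, {FE, "Dune"}, {SF}],  # buys all three in order
    [{SF}, {FE}, {F}],          # same books, wrong order
    [{F}, {SF}],                # skips the middle book
    [{"Dune"}],                 # unrelated purchase
]
pattern = [{F}, {FE}, {SF}]
pattern_support = sequence_support(customers, pattern)
```

With this toy data only the first customer matches, so the pattern would be kept only if the minimum support is at most one in four.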

Information Systems
- An information system is a pair (U, A).
- Data tables
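One way to picture an information system as a data table is sketched below in Python. The slide does not define U and A; the sketch assumes the usual reading in which U is a finite set of objects (the table rows) and A is a finite set of attributes (the table columns), each assigning a value to every object. The objects, attributes, values, and function names are all hypothetical.

```python
# Assumption: U is a finite set of objects and A a finite set of attributes;
# every attribute maps each object to a value. All data here is made up.
U = ["x1", "x2", "x3"]
A = ["age", "income"]

TABLE = {
    "x1": {"age": "young", "income": "low"},
    "x2": {"age": "young", "income": "high"},
    "x3": {"age": "old", "income": "high"},
}


def value(obj, attr):
    """Return the value that attribute `attr` assigns to object `obj`."""
    return TABLE[obj][attr]


def as_rows():
    """Render the pair (U, A) as data-table rows, one row per object."""
    return [[obj] + [value(obj, a) for a in A] for obj in U]


rows = as_rows()
```

Each row of `rows` starts with the object name, followed by its value for every attribute in A, which is the data-table view of the pair (U, A).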

KDD Products
IBM Intelligent Miner
- Supports data-intensive decision-support applications.
The Quest Data Mining System [Rakesh Agrawal et al., 1996], IBM Almaden Research Center
- Tasks:
  - Association rules
  - Sequential patterns
  - Time-series clustering
  - Classification
  - Incremental mining
- Goals:
  - Discover patterns in very large databases, rather than simply verify that a pattern exists
  - Have a completeness property that guarantees that all patterns of certain types have been discovered
  - Have high performance and near-linear scaling on very large (multiple gigabytes) real-life databases

DBMiner [Han et al., Simon Fraser University]
- Interactive mining of multiple-level knowledge in relational databases.
- Tasks:
  - Generalization
  - Characterization
  - Association
  - Classification
  - Prediction

Some Challenges in KDD
Limitations of collected data [G. Wiederhold, 1996]
1. Limited breadth and coverage. e.g.: the data cover only the population that has a certain credit card.
2. Limited depth: essential variables may be missing. e.g.: in most clinical studies no data on diet is available, although for many medical problems food and drink are significant factors.
3. Data mining based on correlation alone can lead to nonsensical results. e.g.: "Drinking diet drinks leads to obesity." [P. Bonissone]
Heterogeneous data formats
- Databases
  - Relational
  - Object-oriented
- Web sites and pages
- Flat files (file systems)
Combining multiple data sources
Reasoning with uncertainty
KDDMS (Knowledge and Data Discovery Management System) [Imielinski, 1996]
KDD query languages