
CSE 626: Data Mining
Instructor: Sargur N. Srihari
E-mail: srihari@cedar.buffalo.edu
Phone: 645-6164, ext. 113

What is Data Mining?
Data mining can be viewed from different perspectives: computer science and engineering (CSE), business, and information technology (IT).
As a field of research in CSE, it is the science of extracting useful information from large data sets or databases.
It is also known as Knowledge Discovery and Data Mining (KDD), or Knowledge Discovery in Databases (KDD).

Data Mining Definitions
1. Analysis of datasets to find unsuspected relationships.
2. Summarization of data in novel ways that are understandable and useful to the data owner.
3. Extraction of knowledge from data: the non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data.
4. The process of discovering patterns, automatically or semi-automatically, in large quantities of data. The patterns discovered must be useful, that is, meaningful in that they lead to some advantage, usually an economic one.

Why Data Mining?
1. Large datasets are common, owing to advances in digital data acquisition and storage technology:
   Business: supermarket transactions, credit card usage records, telephone call details, government statistics
   Scientific: images of astronomical bodies, molecular databases, medical records
   International organizations produce more information in a week than many people could read in a lifetime.
2. Automatic data production leads to a need for automatic data consumption.
3. Large databases mean vast amounts of information.
4. The difficulty lies in accessing that information.

KDD is a multidisciplinary field
It draws on information retrieval, machine learning, pattern recognition, databases, statistics, visualization, artificial intelligence, and expert systems.

Terminology for Data
Each of these fields has its own vocabulary for data: structured data, unstructured data, training set, records, sample, table, data points, instances.

Course Textbook
Hand, David, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining, MIT Press, 2001.
Approach: fundamental principles, with emphasis on theory and algorithms.
Many other textbooks emphasize business applications and case studies.

Many Other Textbooks
1. Han, J., and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000. (Database perspective)
2. Witten, I. H., and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000. (Machine learning perspective)
3. Adriaans, P., and D. Zantinge, Data Mining, Addison-Wesley, 1998. (Layman perspective)
4. Groth, R., Data Mining: A Hands-on Approach for Business Professionals, Prentice-Hall PTR, 1997. (Business perspective)
5. Kennedy, R., Y. Lee, et al., Solving Data Mining Problems through Pattern Recognition, Prentice-Hall PTR, 1998. (Pattern recognition perspective)
6. Weiss, S., and N. Indurkhya, Predictive Data Mining: A Practical Guide, Morgan Kaufmann, 1998. (Statistical perspective)

More Data Mining Textbooks
7. Chakrabarti, S., Mining the Web, Morgan Kaufmann, 2003. (Emphasis on web pages and hyperlinks)
8. Dasu, T., and T. Johnson, Exploratory Data Mining and Data Cleaning, Wiley, 2003. (Focus on data quality)
9. Cios, K., W. Pedrycz, and R. Swiniarski, Data Mining Methods for Knowledge Discovery, Kluwer, 1998. (Focus on mathematical issues, e.g., rough sets)
10. Kantardzic, M., Data Mining: Concepts, Models, Methods, and Algorithms, IEEE-Wiley, 2003. (Focus on machine learning)
11. Pujari, A. K., Data Mining Techniques, Universities Press, 2001. (Database perspective)
12. Groth, R., Data Mining: A Hands-on Approach for Business Professionals, Prentice Hall, 1998. (Business user perspective, including software CD)

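Data Mining vs Statistics
In a data mining exercise, the objective plays no role in the data collection strategy: the data were usually collected for some other purpose. In this respect data mining differs from much of statistics, where data collection is typically designed around the question being asked. For this reason, data mining is referred to as secondary data analysis.
KDD turns out to be more complicated than initially thought: roughly 80% of the effort goes into preparing the data and 20% into mining it.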
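To give a concrete sense of what "preparing data" involves, here is a minimal Python sketch of two common preparation steps: dropping records with missing values and min-max scaling a numeric attribute. The record layout and the function names drop_incomplete and min_max_scale are hypothetical, chosen only for illustration; real preparation pipelines involve many more steps.

```python
from typing import Optional

# Hypothetical raw records: (customer_id, age, monthly_spend); None marks a missing value.
Record = tuple[str, Optional[float], Optional[float]]

def drop_incomplete(records: list[Record]) -> list[Record]:
    """Keep only records in which every field is present."""
    return [r for r in records if all(field is not None for field in r)]

def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [
    ("c1", 25.0, 120.0),
    ("c2", None, 80.0),   # missing age
    ("c3", 40.0, 200.0),
    ("c4", 55.0, None),   # missing spend
    ("c5", 30.0, 40.0),
]

clean = drop_incomplete(raw)
spend_scaled = min_max_scale([r[2] for r in clean])
print([r[0] for r in clean])  # ['c1', 'c3', 'c5']
print(spend_scaled)           # [0.5, 1.0, 0.0]
```

Dropping incomplete records is only one way to handle missing values; imputation is a common alternative when too many records would otherwise be lost.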

Query: Data Base vs Data Mining
Data base: used when you know exactly what you are looking for.
Query tool: SQL (Structured Query Language). Example: a table called Persons

LastName   FirstName  Address        City
Hansen     Ola        Timoteivn 10   Sandnes
Svendson   Tove       Borgvn 23      Sandnes
Pettersen  Kari       Storgt 20      Stavanger

The query SELECT LastName FROM Persons returns

LastName
Hansen
Svendson
Pettersen

Data mining: used when you only vaguely know what you are looking for.
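The Persons query can be reproduced with Python's built-in sqlite3 module and an in-memory database. This is a minimal sketch; the column types (TEXT) are an assumption, since the slide shows only the column names and values.

```python
import sqlite3

# In-memory database holding the Persons table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [
        ("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
        ("Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# The exact-match style of query a database is built for.
last_names = [row[0] for row in conn.execute("SELECT LastName FROM Persons")]
print(last_names)

# A query with a precise condition: everyone living in Sandnes.
in_sandnes = [
    row[0]
    for row in conn.execute("SELECT LastName FROM Persons WHERE City = ?", ("Sandnes",))
]
print(in_sandnes)
conn.close()
```

Without an ORDER BY clause SQL does not guarantee row order, although a simple SQLite table scan usually returns rows in insertion order. Both queries require knowing exactly what to ask for, which is the contrast the slide draws with data mining.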

Data Mining Tasks and Techniques
Data mining is not so much a single technique as the idea that there is more knowledge hidden in the data than shows itself on the surface. Any technique that helps extract more out of the data is useful.
Five major task types:
1. Exploratory Data Analysis (visualization)
2. Descriptive Modeling (density estimation, clustering)
3. Predictive Modeling (classification and regression)
4. Discovering Patterns and Rules (association rules)
5. Retrieval by Content (retrieve items similar to a pattern of interest)
Tasks 2 and 3 together constitute model building.
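As a small illustration of exploratory data analysis, the sketch below computes summary statistics and a crude text histogram for a single attribute using only Python's standard library. The data values and the bin width are made up for illustration.

```python
import statistics

# Hypothetical attribute values, e.g. basket sizes from supermarket transactions.
basket_sizes = [3, 5, 2, 8, 5, 4, 12, 5, 3, 6]

summary = {
    "n": len(basket_sizes),
    "mean": statistics.mean(basket_sizes),
    "median": statistics.median(basket_sizes),
    "stdev": statistics.stdev(basket_sizes),  # sample standard deviation
    "min": min(basket_sizes),
    "max": max(basket_sizes),
}
print(summary)

# Crude text histogram with bins of width 4: [0, 4), [4, 8), [8, 12), [12, 16).
bin_width = 4
counts: dict[int, int] = {}
for x in basket_sizes:
    start = (x // bin_width) * bin_width
    counts[start] = counts.get(start, 0) + 1
for start in sorted(counts):
    print(f"[{start:2d}, {start + bin_width:2d}) {'#' * counts[start]}")
```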
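Clustering, one form of descriptive modeling, can be sketched with k-means on one-dimensional data. This is a minimal sketch assuming absolute difference as the distance (Euclidean distance in one dimension), a fixed number of clusters, and initial centers supplied by the caller. K-means is one common clustering algorithm, not the only one, and the function name kmeans_1d is hypothetical.

```python
def kmeans_1d(
    points: list[float], centers: list[float], max_iter: int = 100
) -> tuple[list[float], list[int]]:
    """Lloyd's k-means on 1-D data. Returns final centers and each point's cluster index."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: abs(p - centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:
            break  # converged: centers (and hence assignments) no longer change
        centers = new_centers
    return centers, labels

data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, labels = kmeans_1d(data, centers=[0.0, 5.0])
print(centers, labels)  # [1.5, 8.5] [0, 0, 0, 1, 1, 1]
```

K-means only finds a local optimum of its score (within-cluster squared distance), so results can depend on the initial centers.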
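Predictive modeling learns a mapping from input attributes to a target variable. A minimal classification sketch is the 1-nearest-neighbor rule: predict the label of the closest training point. The training data, the labels, and the function name predict_1nn are hypothetical; nearest neighbor is used here only because it is short, not because the course singles it out.

```python
import math

# Hypothetical training set: (feature vector, class label).
training = [
    ((1.0, 1.0), "low risk"),
    ((1.5, 2.0), "low risk"),
    ((6.0, 5.0), "high risk"),
    ((7.0, 6.5), "high risk"),
]

def predict_1nn(x: tuple[float, float]) -> str:
    """Return the label of the training point nearest to x (Euclidean distance)."""
    _, label = min(training, key=lambda pair: math.dist(pair[0], x))
    return label

print(predict_1nn((2.0, 1.0)))  # low risk
print(predict_1nn((6.5, 6.0)))  # high risk
```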
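Association rules such as {bread} -> {butter} are commonly scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below scores given rules over a few made-up market-basket transactions; the transactions and helper names are hypothetical, and a full algorithm such as Apriori would additionally search for all frequent itemsets rather than score one rule at a time.

```python
# Hypothetical supermarket transactions, each a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

def support_count(itemset: set[str]) -> int:
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset: set[str]) -> float:
    """Fraction of transactions containing every item in itemset."""
    return support_count(itemset) / len(transactions)

def confidence(antecedent: set[str], consequent: set[str]) -> float:
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    count_a = support_count(antecedent)
    return support_count(antecedent | consequent) / count_a if count_a else 0.0

print(support({"bread", "butter"}))       # 0.6
print(confidence({"bread"}, {"butter"}))  # 0.75
```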
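Retrieval by content ranks stored items by their similarity to a query pattern. A minimal sketch, assuming items are text documents represented as bag-of-words term counts and compared with cosine similarity (one common choice among many); the documents and the function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical document collection.
docs = {
    "d1": "credit card fraud detection",
    "d2": "supermarket transaction analysis",
    "d3": "fraud detection in telephone call records",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    q = Counter(query.split())
    ranked = sorted(vectors, key=lambda name: cosine(q, vectors[name]), reverse=True)
    return ranked[:k]

print(retrieve("fraud detection"))  # ['d1', 'd3']
```

For images, the same ranking idea applies with feature vectors (for example color histograms) in place of term counts.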

Topics in Data Mining
1. Fundamentals
   Nature of data
   Measurement
   Summarizing and visualization (includes PCA)
   Uncertainty and inference
2. Data Mining Components
   Models
   Score functions
   Optimization and search
3. Data Mining Tasks and Algorithms
   Density estimation and clustering
   Classification (decision trees, neural networks, genetic algorithms)
   Regression
   Pattern discovery (association rules)
   Retrieval by content (includes image retrieval and text analytics)
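The component view of a data mining algorithm (a model, a score function, and an optimization method) can be made concrete with simple linear regression: the model is y = a + b*x, the score function is the sum of squared errors (SSE), and for this particular model the optimization has a closed-form solution. For models without one, an iterative search such as gradient descent plays the same role. This is a minimal sketch with made-up data; the function names are hypothetical.

```python
def sse(a: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Score function: sum of squared errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def fit_least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form minimizer of the SSE score for simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 1 + 2x
a, b = fit_least_squares(xs, ys)
print(a, b, sse(a, b, xs, ys))  # 1.0 2.0 0.0
```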
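Decision-tree classifiers choose splits using a score such as the Gini impurity, 1 - sum_k p_k^2, where p_k is the fraction of records in class k. The sketch below scores one candidate binary split by its size-weighted impurity; the labels are made up, and Gini is one of several split criteria (information gain is another).

```python
from collections import Counter

def gini(labels: list[str]) -> float:
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left: list[str], right: list[str]) -> float:
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes", "no"], ["no", "no"]
print(gini(parent))             # 0.5
print(split_gini(left, right))  # 0.25, lower than 0.5, so the split reduces impurity
```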