Introduction to Data Mining

Similar documents
This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

Knowledge Discovery and Data Mining

Winter Semester 2009/10 Free University of Bozen, Bolzano

DATA MINING II - 1DL460

Data Mining and Analytics. Introduction

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Question Bank. 4) It is the source of information later delivered to data marts.

Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

DATA MINING II - 1DL460

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data mining fundamentals

D B M G Data Base and Data Mining Group of Politecnico di Torino

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Knowledge Modelling and Management. Part B (9)

Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently New challenges: with a

Data warehouse and Data Mining

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Cse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

1. Inroduction to Data Mininig

Data Mining Course Overview

Clustering Analysis based on Data Mining Applications Xuedong Fan

An Indian Journal FULL PAPER. Trade Science Inc. Research on data mining clustering algorithm in cloud computing environments ABSTRACT KEYWORDS

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE)

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Privacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal

Knowledge Discovery and Data Mining

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

Chapter 1 Introduction to Data Mining

Data Mining and Knowledge Management Process

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Dynamic Data in terms of Data Mining Streams

1.1 What Motivated Data Mining? Why Is It Important?

Chapter 1, Introduction

Customer Clustering using RFM analysis

2 CONTENTS

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

COMP 465 Special Topics: Data Mining

SCHEME OF COURSE WORK. Data Warehousing and Data mining

Dynamic Clustering of Data with Modified K-Means Algorithm

Data Management Glossary

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

Research on Data Mining Technology Based on Business Intelligence. Yang WANG

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

DATA WAREHOUING UNIT I

Multivariate discretization of continuous valued attributes.

A Survey on Data Preprocessing Techniques for Bioinformatics and Web Usage Mining

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1

Machine Learning & Data Mining

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

Data Mining and Data Warehousing Introduction to Data Mining

MetaData for Database Mining

Data Mining Concepts

Table Of Contents: xix Foreword to Second Edition

DATA MINING AND WAREHOUSING

Chapter 4 Data Mining A Short Introduction

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

A Brief Introduction to Data Mining

Summary. Machine Learning: Introduction. Marcin Sydow

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce

Section A. 1. a) Explain the evolution of information systems into today s complex information ecosystems and its consequences.

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Knowledge Discovery and Data Mining 1 (VO) ( )

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Basic Data Mining Technique

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

DATA MINING Introductory and Advanced Topics Part I

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Applying Supervised Learning

Parametric Comparisons of Classification Techniques in Data Mining Applications

Incremental Frequent Pattern Mining. Abstract

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

劉介宇 國立台北護理健康大學 護理助產研究所 / 通識教育中心副教授 兼教師發展中心教師評鑑組長 Nov 19, 2012

ECT7110. Data Preprocessing. Prof. Wai Lam. ECT7110 Data Preprocessing 1

1 (eagle_eye) and Naeem Latif

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Nesnelerin İnternetinde Veri Analizi

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

TIM 50 - Business Information Systems

Transcription:

Introduction to JULY 2011 Afsaneh Yazdani

What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge

What motivated? Data mining can be viewed as a result of the natural evolution of Information Technology.

What motivated? Data mining tools perform data analysis and may uncover important data patterns.

What is data mining? Simply stated, data mining refers to extracting or mining knowledge from large amounts of data.

What is data mining? The process that finds a small set of precious nuggets from a great deal of raw material

What is data mining? Automated technique to extract a previously unknown piece of information from large data-bases.

What is data mining? Automated technique to extract a previously unknown piece of information from large data-bases. It concerns with data which have been collected for another purpose

Knowledge Discovery Process Consists of an iterative sequence of the following steps: 1- Data cleaning (to remove noise and inconsistent data) 2. Data integration (where multiple data sources may be combined) 3. Data selection (where data relevant to the analysis task are retrieved from the database) 4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)

Knowledge Discovery Process Consists of an iterative sequence of the following steps: 5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns) 6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interesting measures) 7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)

What is new in data mining? - Very huge data bases - Great use of new technology - Commercial interest - New attractive softwares

as a interdisciplinary development Data mining involves an integration of techniques from multiple disciplines such as: Database and data warehouse technology, statistics, machine learning, high-performance, computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial or temporal data analysis.

Database Technology Information Science Statistics Data Mining Visualization Machine Learning Other Disciplines

Purpose of data mining: Seeking for Models or Patterns

Purpose of data mining: Seeking for Models or Patterns Models Global summary of relationships between variables which helps to understand phenomenon and allows prediction Patterns Characteristic Structure exhibited by a few number of points

Purpose of data mining: Seeking for Models or Patterns DM models are not usually closed forms Models Global summary of relationships between variables which helps to understand phenomenon and allows prediction Patterns Characteristic Structure exhibited by a few number of points

Functionalities Concept/Class Description: Characterization and Discrimination Data characterization is a summarization of the general characteristics or features of a target class of data. There are several methods for effective data summarization and characterization. Simple data summaries based on statistical measures and plots.

Functionalities Concept/Class Description: Characterization and Discrimination Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. Discrimination descriptions expressed in rule form are referred to as discriminant rules.

Functionalities Mining Frequent Patterns, Associations, and Correlations Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns:

Functionalities Mining Frequent Patterns, Associations, and Correlations Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns: A frequent item-set typically refers to a set of items that frequently appear together in a transactional data set, such as milk and bread.

Functionalities Mining Frequent Patterns, Associations, and Correlations Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns: A frequently occurring subsequence, such as the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.

Functionalities Mining Frequent Patterns, Associations, and Correlations Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns: A substructure can refer to different structural forms, such as graphs, trees, or lattices, which may be combined with item-sets or subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern.

Functionalities Mining Frequent Patterns, Associations, and Correlations Mining frequent patterns leads to the discovery of interesting associations and correlations within data.

Functionalities Classification and Prediction Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. Derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).

Functionalities Classification and Prediction The derived model may be represented in various forms, such as classification IF-THEN rules, decision trees, mathematical formula, or neural networks.

Functionalities Classification and Prediction Decision Tree A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. Decision trees can easily be converted to classification rules.

Functionalities Classification and Prediction Neural Network A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.

Functionalities Clustering Unlike classification and prediction, which analyze classlabeled data objects, clustering analyzes data objects without consulting a known class label. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.

Functionalities Outlier Analysis A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Rather than using statistical or distance measures, deviation-based methods identify outliers by examining differences in the main characteristics of objects in a group.

Functionalities Evolution Analysis Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.

Functionalities Which patterns are interesting? A pattern represents knowledge if it is: - easily understood by humans - valid on test data with some degree of certainty - potentially useful, novel, - validates a hunch about which the user was curious.