DATA MINING II - 1DL460

1 DATA MINING II - 1DL460, Spring 2012
A second course in data mining
Kjell Orsborn, Uppsala Database Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden
5/18/12

2 Personnel
- Kjell Orsborn, lecturer, examiner, phone: , room: 1321 (house 1, floor 3)
- Tore Risch, lecturer, phone: , room: 1353 (house 1, floor 3)
- Erik Zeitler, lecturer, phone: , room: 1320 (house 1, floor 3)
- Andrej Andrejev, course assistant, phone: , room: 1306 (house 1, floor 3)
- Lars Melander, course assistant, lars.melander.it.uu.se, phone: , room: 1316 (house 1, floor 3)

3 Preliminary course contents
Lecture topics:
- Course intro: overview of topics in data mining 2
- Web mining
- Search engines
- Sequential association analysis
- Alternative association analysis
- Visual data exploration
- Cluster validation
- Advanced clustering methods: Chameleon, CURE, BIRCH (SNN, ROCK, Jarvis-Patrick)
- Alternative classification techniques: Naïve Bayes, support vector machines, ensemble methods
- Outlier detection
- Stream data mining
- Privacy-preserving data mining

4 Course contents continued
Assignments:
- Assignment 1: Web mining (HITS)
- Assignment 2: Implementation of association rule mining
- Assignment 3: Implementation of scalable DBSCAN
(Possible alternative to Assignment 3: a term paper on various data mining topics)

5 Examination
- Written examination: grades 3, 4 and 5
- Assignments: all 3 assignments must be passed

6 Introduction to Data Mining II (Tan, Steinbach, Kumar ch. 1)
Kjell Orsborn, Department of Information Technology, Uppsala University, Uppsala, Sweden

7 Data Mining
"The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions" (Simoudis, 1996).
- Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data, in contrast to information and knowledge that are already intuitive.
- Patterns and relationships are identified by examining the underlying rules and features in the data.
- Tends to work from the data up; the most accurate results normally require large volumes of data to deliver reliable conclusions.
- Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
- A relatively new technology, but already used in a number of industries.

8 Historic view of data mining
[Figure from Han et al.]

9 The data mining process
- Data cleaning (to remove noise and inconsistent data)
- Data integration (where multiple data sources may be combined)
- Data selection (where data relevant to the analysis task are retrieved from the database)
- Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations)
- Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
- Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
- Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
[Figure labels: Databases and files; Cleaning & Integration; Data Warehouse; Selection & Transformation; Data Mining; Patterns; Evaluation & Presentation; Knowledge]
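The first five steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the record fields (`customer`, `region`, `amount`), the two toy sources, and the threshold-based "mining" step are all hypothetical stand-ins and do not come from the slides.

```python
def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r.get("amount") is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, region):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]

def transform(records):
    """Data transformation: aggregate the total amount per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals

def mine(totals, min_total):
    """Stand-in mining step (assumption): customers with a high total."""
    return {c: t for c, t in totals.items() if t >= min_total}

# Hypothetical toy sources: one "database" and one "file".
database = [
    {"customer": "c1", "region": "north", "amount": 40},
    {"customer": "c2", "region": "north", "amount": None},  # noisy record
    {"customer": "c3", "region": "south", "amount": 70},
]
file_source = [
    {"customer": "c1", "region": "north", "amount": 80},
    {"customer": "c4", "region": "north", "amount": 15},
]

warehouse = integrate(clean(database), clean(file_source))
patterns = mine(transform(select(warehouse, "north")), min_total=100)
print(patterns)
```

Pattern evaluation and knowledge presentation are left out here; in practice they would filter `patterns` by an interestingness measure and render the result for the user.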

10 Why data mining?
"There was 5 exabytes of information created between the dawn of civilization through 2003," Schmidt said, "but that much information is now created every 2 days, and the pace is increasing... people aren't ready for the technology revolution that's going to happen to them..." (Eric Schmidt, Google)

11 Why data mining?
- The explosive growth of data: from terabytes, through petabytes, to exabytes
- Data collection from automated data collection tools, database systems, web, e-commerce, transactions, stocks, remote sensing, bioinformatics, scientific simulation, computerized society, news, digital cameras, ...
- Human analysts may take weeks to discover useful information
- Much of the data is never analyzed at all
[Figure: The Data Gap, total new disk (TB) compared with number of analysts. From: R. Grossman, C. Kamath, V. Kumar, Data Mining for Scientific and Engineering Applications]

12 Why mine data (commercial viewpoint)?
- Lots of data is being collected and warehoused
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank & credit card transactions
- Computers have become cheaper and more powerful
- Competitive pressure is strong
  - Provide better, customized services for an edge (e.g. in Customer Relationship Management)

13 Why mine data (scientific viewpoint)?
- Data collected and stored at enormous speeds (GB/hour)
  - Remote sensors on a satellite
  - Telescopes scanning the skies
  - Microarrays generating gene expression data
  - Scientific simulations generating terabytes of data
- Traditional techniques infeasible for raw data
- Data mining may help scientists
  - in classifying and segmenting data
  - in hypothesis formation

14 Why not traditional data analysis?
- Tremendous amount of data
  - Algorithms must be highly scalable to handle data volumes such as terabytes
- High dimensionality of data
  - A microarray may have tens of thousands of dimensions
- High complexity of data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structured data, graphs, social networks and multi-linked data
  - Heterogeneous databases and legacy databases
  - Spatial, spatiotemporal, multimedia, text and Web data
  - Software programs, scientific simulations
- New and sophisticated applications

15 Data mining tasks
- Prediction methods: use some variables to predict unknown or future values of other variables.
- Description methods: find human-interpretable patterns that describe the data.
From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining

16 Classification - definition
- Given a collection of records (training set)
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
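The train/test procedure described above can be sketched in a few lines. This is a minimal illustration, not the course's method: the 1-nearest-neighbour model, the helper names, the 30% test fraction, and the toy records are all assumptions chosen to keep the example short.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and divide them into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(training_set, attributes):
    """Assign the class of the closest training record (1-nearest neighbour)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_set,
                   key=lambda rec: squared_distance(rec[0], attributes))
    return label

def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    hits = sum(predict_1nn(training_set, attrs) == label
               for attrs, label in test_set)
    return hits / len(test_set)

# Hypothetical records: (attributes, class). Two well-separated groups.
records = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((1.1, 1.3), "low"), ((0.8, 0.9), "low"),
    ((5.0, 5.1), "high"), ((5.2, 4.9), "high"), ((4.8, 5.0), "high"),
    ((5.1, 5.3), "high"), ((4.9, 4.7), "high"),
]

training_set, test_set = train_test_split(records)
model_accuracy = accuracy(training_set, test_set)
print(f"accuracy on {len(test_set)} unseen records: {model_accuracy:.2f}")
```

The key point is that `accuracy` only ever scores records the model did not see during training, which is what makes it an estimate of performance on previously unseen data.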

17 Clustering - definition
- Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
- Similarity measures:
  - Euclidean distance if attributes are continuous.
  - Other problem-specific measures.
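One way to make the two conditions above concrete is to compare the average Euclidean distance within clusters against the average distance between clusters. The sketch below checks a given clustering rather than finding one; the function names and the toy points are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8

def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [(p, q) for cluster in clusters
             for p, q in combinations(cluster, 2)]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [(p, q) for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical clustering of six points with continuous attributes.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"intra-cluster: {intra:.3f}, inter-cluster: {inter:.3f}")
```

A clustering that fits the definition should give a clearly smaller intra-cluster value than inter-cluster value, as it does for these points.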

18 Association rule discovery - definition
- Given a set of records, each of which contains some number of items from a given collection:
  - Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
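The two rules can be checked against the five transactions in the table. The slide does not say how rule strength is measured; the sketch below assumes the usual support and confidence measures, and the function names are illustrative.

```python
# The five transactions from the table above, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    matching = sum(1 for items in transactions.values() if itemset <= items)
    return matching / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for antecedent, consequent in rules:
    s = support(antecedent | consequent, transactions)
    c = confidence(antecedent, consequent, transactions)
    print(f"{sorted(antecedent)} --> {sorted(consequent)}: "
          f"support={s:.2f}, confidence={c:.2f}")
```

Under these assumed measures, {Milk} --> {Coke} holds in 3 of the 4 transactions that contain Milk, and {Diaper, Milk} --> {Beer} holds in 2 of the 3 transactions that contain both Diaper and Milk.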

19 Sequential pattern discovery - definition
- Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
  (A B) (C) --> (D E)
- Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
[Figure: the pattern (A B) (C) (D E) annotated with the timing constraints <= xg, > ng, <= ws, <= ms]
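Before a rule can be formed, one has to decide whether an object's timeline contains a pattern such as (A B) (C) (D E). The sketch below tests plain ordered containment only; it deliberately ignores the timing constraints (xg, ng, ws, ms), and the timeline representation as (time, set of events) pairs is an assumption.

```python
def contains_pattern(timeline, pattern):
    """Return True if the elements of `pattern` occur in order in `timeline`.

    `timeline` is a list of (time, set_of_events) pairs and `pattern` is a
    list of event sets. Each pattern element must be a subset of the events
    at some time point, and successive elements must match at strictly
    later time points. Timing constraints are not checked.
    """
    position = 0
    for _, events in sorted(timeline, key=lambda entry: entry[0]):
        if position < len(pattern) and pattern[position] <= events:
            position += 1
    return position == len(pattern)

pattern = [{"A", "B"}, {"C"}, {"D", "E"}]

# Hypothetical timelines for two objects.
timeline_1 = [(1, {"A", "B", "F"}), (3, {"C"}), (4, {"D"}), (6, {"D", "E"})]
timeline_2 = [(1, {"C"}), (2, {"A", "B"}), (3, {"D", "E"})]

print(contains_pattern(timeline_1, pattern))
print(contains_pattern(timeline_2, pattern))
```

Greedy left-to-right matching is sufficient here because no gap or span limits apply; once timing constraints are enforced, an earlier match can rule out a valid later one, so a search with backtracking would be needed instead.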

20 Deviation or anomaly detection
- Detect significant deviations from normal behavior
- Applications:
  - Credit card fraud detection
  - Network intrusion detection
- Typical network traffic at university level may reach over 100 million connections per day
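A very simple way to flag "significant deviations from normal behavior" is a z-score test: mark values that lie more than a chosen number of standard deviations from the mean. The slide does not prescribe a method, so the z-score rule, the threshold of 2.0, and the traffic numbers below are all assumptions for illustration.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical connections-per-minute counts with one burst.
connections_per_minute = [100, 102, 98, 101, 99, 103, 97, 500]

outliers = find_outliers(connections_per_minute)
print(outliers)
```

Because a single extreme value inflates both the mean and the standard deviation, this rule becomes unreliable on small or heavily contaminated samples; median-based variants are a common alternative.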

21 Challenges of data mining
- Scalability
- Dimensionality
- Complex and heterogeneous data
- Data quality
- Data ownership and distribution
- Privacy preservation
- Streaming data

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 1 1 Acknowledgement Several Slides in this presentation are taken from course slides provided by Han and Kimber (Data Mining Concepts and Techniques) and Tan,

More information

INTRODUCTION TO DATA MINING

INTRODUCTION TO DATA MINING INTRODUCTION TO DATA MINING 1 Chiara Renso KDDLab - ISTI CNR, Italy http://www-kdd.isti.cnr.it email: chiara.renso@isti.cnr.it Knowledge Discovery and Data Mining Laboratory, ISTI National Research Council,

More information

Statistical Learning and Data Mining CS 363D/ SSC 358

Statistical Learning and Data Mining CS 363D/ SSC 358 Statistical Learning and Data Mining CS 363D/ SSC 358! Lecture: Introduction Pradeep Ravikumar pradeepr@cs.utexas.edu What is this course about (in 1 minute) Big Data Data Mining, Statistical Learning

More information

An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL

An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL Trends leading to Data Flood More data is generated: Bank, telecom, other business transactions... Scientific Data: astronomy, biology, etc Web, text,

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Data mining fundamentals

Data mining fundamentals Data mining fundamentals Elena Baralis Politecnico di Torino Data analysis Most companies own huge bases containing operational textual documents experiment results These bases are a potential source of

More information

Foundation of Data Mining: Introduction

Foundation of Data Mining: Introduction Foundation of Data Mining: Introduction Hillol Kargupta CSEE Department, UMBC hillol@cs.umbc.edu ITE 342, (410) 455-3972 www.cs.umbc.edu/~hillol Acknowledgement: Tan, Steinbach, and Kumar provided some

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional

More information

Data Mining: Introduction. Lecture Notes for Chapter 1. Introduction to Data Mining

Data Mining: Introduction. Lecture Notes for Chapter 1. Introduction to Data Mining Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Why Mine Data? Commercial Viewpoint

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2 Instructor: Jure Leskovec TAs: Aditya Parameswaran Bahman Bahmani Peyman Kazemian 3 Course website: http://cs246.stanford.edu

More information

Stats Overview Ji Zhu, Michigan Statistics 1. Overview. Ji Zhu 445C West Hall

Stats Overview Ji Zhu, Michigan Statistics 1. Overview. Ji Zhu 445C West Hall Stats 415 - Overview Ji Zhu, Michigan Statistics 1 Overview Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Overview Ji Zhu, Michigan Statistics 2 What is Data Mining? Data mining is a multi-disciplinary

More information

Data Mining Concept. References. Why Mine Data? Commercial Viewpoint. Why Mine Data? Scientific Viewpoint

Data Mining Concept. References. Why Mine Data? Commercial Viewpoint. Why Mine Data? Scientific Viewpoint References Discovering Knowledge in Data Daniel T Larose, 2005 Data Mining Concept Data Mining: Concepts and Techniques, 2nd Edition, 2005 Micheline Kamber, Jiawei Han Data Mining: Practical Machine Learning

More information

COMP90049 Knowledge Technologies

COMP90049 Knowledge Technologies COMP90049 Knowledge Technologies Data Mining (Lecture Set 3) 2017 Rao Kotagiri Department of Computing and Information Systems The Melbourne School of Engineering Some of slides are derived from Prof Vipin

More information

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING CS 7265 BIG DATA ANALYTICS INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, PhD Computer Science,

More information

COMP 465 Special Topics: Data Mining

COMP 465 Special Topics: Data Mining COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,

More information

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that

More information

Introduction to Data Mining CS 584 Data Mining (Fall 2016)

Introduction to Data Mining CS 584 Data Mining (Fall 2016) Introduction to Data Mining CS 584 Data Mining (Fall 2016) Huzefa Rangwala AssociateProfessor, Computer Science George Mason University Email: rangwala@cs.gmu.edu Website: www.cs.gmu.edu/~hrangwal Slides

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data

More information

Introduction to Data Mining. Komate AMPHAWAN

Introduction to Data Mining. Komate AMPHAWAN Introduction to Data Mining Komate AMPHAWAN 1 Data mining(1970s) = Knowledge Discovery in (very large) Databases : KDD Automatically Find hidden patterns and hidden association from raw data by using computer(s).

More information

CISC 4631 Data Mining Lecture 01:

CISC 4631 Data Mining Lecture 01: CISC 4631 Data Mining Lecture 01: Introduction to Data Mining 1 Let s Start By Seeing What You Know Quick Quiz Do you know what Data Mining is? Do you know of any examples of Data Mining? 2 What is Data

More information

Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently New challenges: with a

Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently New challenges: with a Data Mining and Information Retrieval Introduction to Data Mining Why Data Mining? Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L

Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L Books 2 Which Chapter from which Text Book? Chapter 1: Introduction from Han, Kamber, "Data Mining Concepts and Techniques", Morgan Kaufmann

More information

Includes Review of Syllabus OVERVIEW OF THE CLASS

Includes Review of Syllabus OVERVIEW OF THE CLASS Includes Review of Syllabus OVERVIEW OF THE CLASS What is this class about? This class will introduce data mining The types of problems that can be addressed The methods that can be used Focus will be

More information

Winter Semester 2009/10 Free University of Bozen, Bolzano

Winter Semester 2009/10 Free University of Bozen, Bolzano Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

Question Bank. 4) It is the source of information later delivered to data marts.

Question Bank. 4) It is the source of information later delivered to data marts. Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile

More information

DATABASE DESIGN I - 1DL300

DATABASE DESIGN I - 1DL300 DATABASE DESIGN I - 1DL300 Fall 2009 An introductury course on database systems http://user.it.uu.se/~udbl/dbt-ht2009/ alt. http://www.it.uu.se/edu/course/homepage/dbastekn/ht09/ Kjell Orsborn Uppsala

More information

Knowledge Discovery & Data Mining

Knowledge Discovery & Data Mining Announcements ISM 50 - Business Information Systems Lecture 17 Instructor: Magdalini Eirinaki UC Santa Cruz May 29, 2007 News Folio #3 DUE Thursday 5/31 Database Assignment DUE Tuesday 6/5 Business Paper

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 1 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

ISM 50 - Business Information Systems

ISM 50 - Business Information Systems ISM 50 - Business Information Systems Lecture 17 Instructor: Magdalini Eirinaki UC Santa Cruz May 29, 2007 Announcements News Folio #3 DUE Thursday 5/31 Database Assignment DUE Tuesday 6/5 Business Paper

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

COMP 6838 Data MIning

COMP 6838 Data MIning COMP 6838 Data MIning LECTURE 1: Introduction Dr. Edgar Acuna Departmento de Matematicas Universidad de Puerto Rico- Mayaguez math.uprm.edu/~edgar 1 Course s Objectives Understand the basic concepts to

More information

DATA MINING LECTURE 1. Introduction

DATA MINING LECTURE 1. Introduction DATA MINING LECTURE 1 Introduction What is data mining? After years of data mining there is still no unique answer to this question. A tentative definition: Data mining is the use of efficient techniques

More information

1. Inroduction to Data Mininig

1. Inroduction to Data Mininig 1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the

More information

CSE-4412: Data Mining

CSE-4412: Data Mining CSE-4412: Data Mining Welcome! Parke Godfrey www.cse.yorku.ca/course/4412/ January 9, 2007 Data Mining: Concepts and Techniques 1 Chapter 1. Introduction Why is data mining needed? What is data mining?

More information

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44 Data Mining Piotr Paszek piotr.paszek@us.edu.pl Introduction (Piotr Paszek) Data Mining DM KDD 1 / 44 Plan of the lecture 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Data Mining & Machine Learning

Data Mining & Machine Learning Data Mining & Machine Learning Dino Pedreschi & Anna Monreale Dipartimento di Infomatica Tutor: Riccardo Guidotti, Dipartimento di Informatica DIPARTIMENTO DI INFORMATICA - Università di Pisa Data Mining

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 02 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

Knowledge Discovery in Data Bases

Knowledge Discovery in Data Bases Knowledge Discovery in Data Bases Chien-Chung Chan Department of CS University of Akron Akron, OH 44325-4003 2/24/99 1 Why KDD? We are drowning in information, but starving for knowledge John Naisbett

More information

Data Mining and Warehousing

Data Mining and Warehousing Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.

More information

Data Mining and Data Warehousing Introduction to Data Mining

Data Mining and Data Warehousing Introduction to Data Mining Data Mining and Data Warehousing Introduction to Data Mining Quiz Easy Q1. Which of the following is a data warehouse? a. Can be updated by end users. b. Contains numerous naming conventions and formats.

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

CSE5243 INTRO. TO DATA MINING

CSE5243 INTRO. TO DATA MINING CSE5243 INTRO. TO DATA MINING Chapter 1. Introduction Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han CSE 5243. Course Page & Schedule Class Homepage:

More information

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Big Data Analytics The Data Mining process. Roger Bohn March. 2016 1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material

More information

Chapter 3: Data Mining:

Chapter 3: Data Mining: Chapter 3: Data Mining: 3.1 What is Data Mining? Data Mining is the process of automatically discovering useful information in large repository. Why do we need Data mining? Conventional database systems

More information

Data Mining & Feature Selection

Data Mining & Feature Selection دااگشنه رتبيت م عل م Data Mining & Feature Selection M.M. Pedram pedram@tmu.ac.ir Faculty of Engineering, Tarbiat Moallem University The 11 th Iranian Confernce on Fuzzy systems, 5-7 July, 2011 Contents

More information

INTRODUCTION TO DATA MINING ASSOCIATION RULES. Luiza Antonie

INTRODUCTION TO DATA MINING ASSOCIATION RULES. Luiza Antonie INTRODUCTION TO DATA MINING ASSOCIATION RULES Luiza Antonie Luiza Antonie, PhD WHO AM I? PDF on Record Linkage Department of Finance and Economics, University of Guelph Email: lantonie@uoguelph.ca Website:

More information

Chapter 4 Data Mining A Short Introduction

Chapter 4 Data Mining A Short Introduction Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING 1: Introduction Instructor: Yizhou Sun yzsun@cs.ucla.edu (Instructor for Today s class: Ting Chen) April 9, 2017 Course Information Course homepage: http://web.cs.ucla.edu/~yzsun/classes/2017spr

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining!! http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn! Uppsala Database Laboratory! Department of Information Technology,

More information

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

DATA MINING LECTURE 1. Introduction

DATA MINING LECTURE 1. Introduction DATA MINING LECTURE 1 Introduction What is data mining? After years of data mining there is still no unique answer to this question. A tentative definition: Data mining is the use of efficient techniques

More information

DATA MINING LECTURE 1. Introduction

DATA MINING LECTURE 1. Introduction DATA MINING LECTURE 1 Introduction What is data mining? After years of data mining there is still no unique answer to this question. A tentative definition: Data mining is the use of efficient techniques

More information

Code No: R Set No. 1

Code No: R Set No. 1 Code No: R05321204 Set No. 1 1. (a) Draw and explain the architecture for on-line analytical mining. (b) Briefly discuss the data warehouse applications. [8+8] 2. Briefly discuss the role of data cube

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Data Mining & Data Warehouse

Data Mining & Data Warehouse Data Mining & Data Warehouse Associate Professor Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology (1) 2016 2017 1 Points to Cover Why Do We Need Data Warehouses?

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information
