Data Mining and Data Warehousing
Transcription
1 CSCI6405 Fall 2003: Data Mining and Data Warehousing
Instructor: Qigang Gao, Office: CS219, Tel:
Teaching Assistant: Christopher Jordan
Office Hours: TR, 1:3-3: PM
9 October 2003
2 Lectures Outline
Part I: Overview on DM and DW
1. Introduction (ch1); Ass1 due: Tue Sep 23
2. Data preprocessing (ch3)
Part II: DW and OLAP
3. Data warehousing and OLAP (ch2); Ass2: Sep 23 - Oct 14
Part III: Data Mining Methods/Algorithms
4. Data mining primitives (ch4)
5. Classification data mining (ch7); Ass3: Oct 7 - Oct
6. Association data mining (ch6); Ass4: Oct 21 - Nov 5
7. Characterization data mining (ch5)
8. Clustering data mining (ch8)
Part IV: Mining Complex Types of Data
9. Mining the Web (ch9)
10. Mining spatial data (ch9)
Project presentations; project due: Dec 8
3 3. DATA PREPROCESSING (Ch3)
- Data Preprocessing (DPP) Concept
- Major Tasks of DPP
- A DPP Case Study
- Summary
4 Why Is Data Preprocessing Important?
- No quality data, no quality mining results!
- Quality decisions must be based on quality data, e.g., duplicate or missing data may cause incorrect or even misleading statistics.
- A data warehouse needs consistent integration of quality data.
- Data extraction, cleaning, and transformation make up the majority of the work of building a data warehouse.
5 Multi-Dimensional Measure of Data Quality
A well-accepted multidimensional view:
- Accuracy
- Completeness
- Consistency
- Timeliness
- Believability
- Value added
- Interpretability
- Accessibility
7 Why Data Preprocessing?
- Raw data have errors and inconsistencies (data cleaning).
- Data need to be integrated from different sources, and a single common format is needed (data integration and transformation).
- Irrelevant data should be removed (data reduction).
- Domain knowledge should be added to the prepared data (discretization and concept hierarchy generation).
8 Major Tasks of DPP
9 Major Tasks of DPP (cont)
- Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
- Data integration: integration of multiple databases, data cubes, or files.
- Data transformation: normalization and aggregation.
- Data reduction: obtains a representation that is reduced in volume but produces the same or similar analytical results.
- Data discretization: part of data reduction, but of particular importance, especially for numerical data.
10 Why data cleaning?
Data in the real world is dirty:
- incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data, e.g., occupation= (blank)
- noisy: containing errors or outliers, e.g., Salary= -1
- inconsistent: containing discrepancies in codes or names, e.g., Age= 42 with Birthday= 3/7/1997, or a rating that was 1, 2, 3 and is now A, B, C
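The three kinds of dirt on this slide can each be tested with a simple rule. The following is a minimal sketch, not a method from the course: the field names, the `find_problems` helper, and the reading of 3/7/1997 as 3 July 1997 are all assumptions made for illustration.

```python
from datetime import date

def find_problems(record, today):
    """Return a list of data-quality problems found in one record (a dict)."""
    problems = []
    # Incomplete: a required attribute is missing or blank.
    if not record.get("occupation"):
        problems.append("incomplete: occupation missing")
    # Noisy: a value outside the plausible range.
    salary = record.get("salary")
    if salary is not None and salary < 0:
        problems.append("noisy: negative salary")
    # Inconsistent: the stated age disagrees with the age derived from the birthday.
    age, birthday = record.get("age"), record.get("birthday")
    if age is not None and birthday is not None:
        had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
        derived = today.year - birthday.year - (0 if had_birthday else 1)
        if derived != age:
            problems.append("inconsistent: age does not match birthday")
    return problems

# Hypothetical record combining the slide's three examples.
record = {"occupation": "", "salary": -1, "age": 42, "birthday": date(1997, 7, 3)}
for problem in find_problems(record, today=date(2003, 10, 9)):
    print(problem)
```

A record that passes all three rules yields an empty list, so the same function can be used to split a table into clean and suspect rows.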
11 Why is data dirty?
Incomplete data comes from:
- n/a data values when collected
- different considerations between the time when the data was collected and when it is analyzed
- human/hardware/software problems
Noisy data comes from the process of data:
- collection
- entry
- transmission
Inconsistent data comes from:
- different data sources
- functional dependency violations
12 E.g. Data normalization for clustering mining
E.g., for clustering mining of a customer database: DB (Age, Income, Credit)
The distance between two data points C1 and C2, with attributes a1 = Age, a2 = Income, a3 = Credit:
d = ((C1_a1 - C2_a1)^2 + (C1_a2 - C2_a2)^2 + (C1_a3 - C2_a3)^2)^(1/2)
             Age   Income   Credit
Customer1:   32    4,       1,
Customer2:   24    3,       2,
Difference:  8     1,       8,
Normalized:  1     1/1      1/      (Income and Credit rescaled)
If we scale all the attributes to the same order of magnitude, we obtain a reliable distance measure between the different records.
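To make the effect of rescaling concrete, here is a small sketch that computes the Euclidean distance before and after min-max scaling. The customer values are hypothetical (the figures on the slide are not fully legible), and min-max scaling is one common choice of rescaling, not necessarily the one used in the lecture.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]

# Hypothetical (Age, Income, Credit) records.
customers = [
    (32, 40000, 10000),
    (24, 30000, 2000),
    (60, 31000, 2500),
]
scaled = min_max_scale(customers)

# Raw distances are dominated by Income and Credit; Age barely counts.
print(euclidean(customers[1], customers[2]))
# After scaling, the 36-year age gap is the main contribution.
print(euclidean(scaled[1], scaled[2]))
```

On the raw values, the second and third customers look close because their incomes and credits are similar. After scaling, their large age difference is no longer drowned out.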
13 A DPP Case Study
Business background:
The publisher sells five types of magazine: on cars, houses, sports, music, and comics. The aim of the data mining is to find new, interesting clusters of clients in order to set up a marketing exercise. The business is interested in questions such as "What is the typical profile of a reader of a car magazine?" and "Is there any correlation between an interest in cars and an interest in comics?"
Data mining task:
- Mining clusters of clients for a magazine publisher database.
- Data preparation for clustering: cleaned, integrated, normalized, numerically valued data, etc.
14 1. Data Selection
- The database should contain the subscription records for the magazines. It is a selection of operational data from the publisher's invoicing system and contains information about people who have subscribed to a magazine.
- The records consist of: client number, name, address, date of subscription, and type of magazine.
- To facilitate the DM process, a copy of this operational data is drawn and stored in a separate database (refer to Table 1).
15 Table 1. Original data
Columns: Client number | Name | Address | Date purchase | Magazine purchased
Name: Clinton, King, Jonson
Address: 2 Boulevard, 3 High Road
Magazine purchased: car, music, sports, house
16 2. Data Cleaning: remove duplications
Duplication of records: in an operational client database, some clients may be represented by several records. Possible causes include:
- negligence, such as people making typing errors
- clients moving from one place to another without notifying the company of the change of address
- people deliberately misspelling their names or giving incorrect information about themselves to avoid a negative decision
(Refer to Table 2)
17 Table 2. De-duplication (compare Table 1, original data)
Columns: Client number | Name | Address | Date purchase | Magazine purchased
Name: Clinton, King
Address: 2 Boulevard, 3 High Road
Magazine purchased: car, music, sports, house
18 De-duplication
De-duplication: duplicated records may be identified by a pattern recognition algorithm and then corrected.
E.g., two records in the database, one of them for Mr. Jonson, have different client numbers but the same address, which is a strong indication that they are the same person.
This type of pollution gives a company the impression that it has more clients than is in fact the case.
Of course, we can never be sure of this, but a de-duplication algorithm using pattern analysis techniques can identify the situation and present it to a user to make a decision.
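The "same address, different client number" signal described here can be sketched as a grouping step. This is a deliberately simple stand-in for the pattern recognition algorithm the slide mentions: it only normalizes case and whitespace, and the records, field names, and `duplicate_candidates` helper are hypothetical. In line with the slide, it reports candidates for a user to review and merges nothing on its own.

```python
from collections import defaultdict

def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def duplicate_candidates(records):
    """Group client numbers whose normalized addresses are identical."""
    by_address = defaultdict(list)
    for rec in records:
        by_address[normalize(rec["address"])].append(rec["client_number"])
    # Only addresses shared by more than one client number are suspicious.
    return [ids for ids in by_address.values() if len(ids) > 1]

# Hypothetical records; client numbers and addresses are made up.
records = [
    {"client_number": 101, "name": "Jonson", "address": "1 Example Road"},
    {"client_number": 102, "name": "Jonsen", "address": "1  example road "},
    {"client_number": 103, "name": "King", "address": "3 High Road"},
]
print(duplicate_candidates(records))
```

A real system would also compare names with an approximate string measure, since two unrelated people can share an address.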
19 2. Data Cleaning: correct domain inconsistency
Domain inconsistency: pollution caused by domain values that are not consistent with their definitions.
E.g., in the example table, one date means 1 January 191 (the company did not even exist at that time).
In some databases, analysis shows an unexpectedly high number of people born on 11 November: when people were forced to fill in a birth date on a screen and they either did not know it or did not want to divulge it, they were inclined to type in a filler value that reads as 11 November.
Untrue values of this kind can be disastrous in a data mining context. If information is unknown, it should be represented as such in the database.
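Representing unknown values "as such" usually means replacing impossible or filler dates with an explicit null. The sketch below assumes a hypothetical founding date for the company and a hypothetical filler date; neither comes from the slides.

```python
from datetime import date

def clean_date(value, earliest, sentinels=()):
    """Return the date unchanged, or None if it is a known filler value
    or lies before the earliest date that can possibly be valid."""
    if value is None or value in sentinels or value < earliest:
        return None
    return value

# Assumptions for illustration only.
COMPANY_FOUNDED = date(1950, 1, 1)   # hypothetical founding date
FILLERS = {date(1911, 11, 11)}       # hypothetical filler birth date

# A purchase dated before the company existed becomes unknown.
print(clean_date(date(1901, 1, 1), earliest=COMPANY_FOUNDED))
# A plausible purchase date is kept.
print(clean_date(date(1999, 4, 2), earliest=COMPANY_FOUNDED))
# A known filler birth date becomes unknown.
print(clean_date(date(1911, 11, 11), earliest=date(1880, 1, 1), sentinels=FILLERS))
```

Using `None` keeps the unknown values out of averages and distance computations, where a made-up date would silently distort the result.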
20 Table 3. Domain consistency
Columns: Client number | Name | Address | Date purchase | Magazine purchased
Name: Clinton, King
Address: 2 Boulevard, 3 High Road
Magazine purchased: car, music, sports, house
21 3. Data Integration (Enrichment)
Suppose that we have purchased extra information about our clients, consisting of date of birth, income, amount of credit, and whether or not an individual owns a car or a house. (Refer to Table 4)
* You therefore have to make a deliberate decision either to overlook it or to delete it. A general rule states that any deletion of data must be a conscious decision, made after a thorough analysis of the possible consequences.
22 Table 4. Additional data available for enrichment
Columns: Client name | Date of birth | Income | Credit | Car owner | House owner
Client name: Clinton
Income: $36,
Credit: $26,6
Car owner: yes
23 Table 5. Enriched table
Columns: Credit number | Name | Date of birth | Income | Credit | Car owner | House owner | Address | Date purchase made | Magazine purchased
Name: Clinton, King
Income: $36,
Credit: $26.6
Car owner: yes
Address: 2 Boulevard
Magazine purchased: car, music, sports, house
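Enrichment amounts to a left join of the client table with the purchased data. The sketch below joins on the client name, which is an assumption based on the "Client name" column of Table 4. The field names and values are hypothetical, and clients without a match get explicit `None` values rather than being dropped.

```python
def enrich(clients, extra, key="name"):
    """Left-join `extra` onto `clients` by `key`; unmatched clients get None."""
    lookup = {row[key]: row for row in extra}
    # Every field that the purchased data can contribute.
    extra_fields = sorted({field for row in extra for field in row} - {key})
    enriched = []
    for client in clients:
        merged = dict(client)
        match = lookup.get(client[key], {})
        for field in extra_fields:
            merged[field] = match.get(field)
        enriched.append(merged)
    return enriched

# Hypothetical rows.
clients = [
    {"name": "Clinton", "magazine": "car"},
    {"name": "King", "magazine": "music"},
]
extra = [{"name": "Clinton", "income": 36000, "car_owner": "yes"}]

for row in enrich(clients, extra):
    print(row)
```

Joining on a name is fragile when names are misspelled, which is one reason de-duplication and cleaning come before this step.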
24 4. Data Reduction
Remove the columns and rows that are not valuable to the DM process.
In Table 6, the column NAME and the row with multiple values are removed from the database.
In a real DM project, a lot of desirable data may be missing from the tables collected from the operational data, and most is possible to retrieve.
25 Table 6. Table with column and row removed (compare Table 5, enriched table)
Columns: Credit number | Date of birth | Income | Credit | Car owner | House owner | Address | Date purchase made | Magazine purchased
Income: $36,
Credit: $26.6
Car owner: yes
Address: 2 Boulevard
Magazine purchased: car, music, house
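The column and row removal that leads to Table 6 can be sketched as a filter plus a projection. Which rows count as "not valuable" is a judgment call. This sketch assumes it means rows with a missing value in a required column, and the field names and `reduce_table` helper are hypothetical.

```python
def reduce_table(rows, drop_columns, required):
    """Drop rows missing any required value, then drop unwanted columns."""
    kept = []
    for row in rows:
        if any(row.get(col) is None for col in required):
            continue  # a deliberate, rule-based deletion
        kept.append({k: v for k, v in row.items() if k not in drop_columns})
    return kept

# Hypothetical enriched rows.
rows = [
    {"name": "Clinton", "income": 36000, "magazine": "car"},
    {"name": "King", "income": None, "magazine": "music"},
]
print(reduce_table(rows, drop_columns={"name"}, required=("income",)))
```

Keeping the deletion rule in one function makes the decision explicit and easy to revisit, for example in fraud detection where a missing value may itself be informative.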
26 4. Data Reduction (cont)
In some cases, especially fraud detection, a lack of information can be a valuable indication of interesting patterns.
Up to this point, the process has consisted mainly of simple SQL operations.
27 5. Data transformation
For most databases, the information provided is much too detailed to be used as input to data mining algorithms, as in Table 6 (columns: Credit number | Date of birth | Income | Credit | Car owner | House owner | Address | Date purchase made | Magazine purchased).
Apply the following coding steps:
1. Address to region
2. Birth date to age
3. Divide income by 1
4. Divide credit by 1
5. Convert car yes/no to 1/0
6. Convert purchase date to month numbers, counted from a fixed starting date
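The six coding steps map naturally onto one function per record. In the sketch below, the divisor of 1000, the start date for month numbering, the region lookup table, and all field names are assumptions, since the exact values are not legible in the transcription.

```python
from datetime import date

def age_on(birth, today):
    """Whole years between birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

def months_since(start, when):
    """Number of whole calendar months from start to when."""
    return (when.year - start.year) * 12 + (when.month - start.month)

def code_record(rec, region_of, today, start, divisor=1000):
    """Apply the coding steps to one record (divisor is an assumed value)."""
    return {
        "region": region_of[rec["address"]],                   # step 1
        "age": age_on(rec["date_of_birth"], today),            # step 2
        "income": rec["income"] / divisor,                     # step 3
        "credit": rec["credit"] / divisor,                     # step 4
        "car_owner": 1 if rec["car_owner"] == "yes" else 0,    # step 5
        "house_owner": 1 if rec["house_owner"] == "yes" else 0,
        "month_of_purchase": months_since(start, rec["date_purchase"]),  # step 6
        "magazine": rec["magazine"],
    }

# Hypothetical record and lookup table.
rec = {
    "address": "1 Example Road", "date_of_birth": date(1970, 5, 20),
    "income": 36000, "credit": 26600, "car_owner": "yes",
    "house_owner": "no", "date_purchase": date(2003, 3, 15), "magazine": "car",
}
coded = code_record(rec, region_of={"1 Example Road": 1},
                    today=date(2003, 10, 9), start=date(2003, 1, 1))
print(coded)
```

After coding, every attribute except the magazine type is a small number, which is what a distance-based clustering algorithm needs.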
28 Table 7. An intermediate coding stage (compare Table 6, table with column and row removed)
Columns: Credit number | Age | Income | Credit | Car owner | House owner | Region | Month of purchase | Magazine purchased
Magazine purchased: car, music, house
29 The final table
Columns: Credit number | Age | Income | Credit | Car owner | House owner | Region | Car magazine | House | Sports | Music | Comic
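The final table replaces the single "Magazine purchased" column with one 0/1 column per magazine type, so that each client occupies exactly one row. A minimal sketch of that flattening step follows. The key name `client`, the sample rows, and the choice to keep the other attributes from the client's first row are assumptions.

```python
MAGAZINES = ("car", "house", "sports", "music", "comic")

def flatten(rows, key="client"):
    """Merge one-row-per-purchase data into one row per client,
    with a 0/1 indicator column for each magazine type."""
    by_client = {}
    for row in rows:
        if row[key] not in by_client:
            base = {k: v for k, v in row.items() if k != "magazine"}
            base.update({m: 0 for m in MAGAZINES})
            by_client[row[key]] = base
        by_client[row[key]][row["magazine"]] = 1
    return list(by_client.values())

# Hypothetical coded rows, one per purchase.
rows = [
    {"client": 1, "age": 33, "magazine": "car"},
    {"client": 1, "age": 33, "magazine": "music"},
    {"client": 2, "age": 40, "magazine": "comic"},
]
for row in flatten(rows):
    print(row)
```

With this layout, every record is a fixed-length numeric vector, so the Euclidean distance used for clustering is defined for any pair of clients.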
31 Summary
- Data preparation is a major issue and the most time-consuming process for both mining and warehousing.
- Data preparation includes data cleaning, integration, transformation, reduction, discretization, etc.
- Many DPP tools have been developed, but it is still an active research area because of the effort needed for
More informationCOURSE LISTING. Courses Listed. Training for Database & Technology with Modeling in SAP HANA. Last updated on: 30 Nov 2018.
Training for Database & Technology with Modeling in SAP HANA Courses Listed Einsteiger HA100 - SAP HANA Introduction Fortgeschrittene HA300 - SAP HANA 2.0 SPS03 Modeling HA301 - SAP HANA 2.0 SPS02 Advanced
More informationData Warehousing. Adopted from Dr. Sanjay Gunasekaran
Data Warehousing Adopted from Dr. Sanjay Gunasekaran Main Topics Overview of Data Warehouse Concept of Data Conversion Importance of Data conversion and the steps involved Common Industry Methodology Outline
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationOn-Line Analytical Processing (OLAP) Traditional OLTP
On-Line Analytical Processing (OLAP) CSE 6331 / CSE 6362 Data Mining Fall 1999 Diane J. Cook Traditional OLTP DBMS used for on-line transaction processing (OLTP) order entry: pull up order xx-yy-zz and
More informationCT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN
Q.1 a. Define a Data warehouse. Compare OLTP and OLAP systems. Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and 2 Non volatile collection of data in support of management
More informationAn Overview of various methodologies used in Data set Preparation for Data mining Analysis
An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of
More informationThe University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory
Warehousing Outline Andrew Kusiak 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934 Introduction warehousing concepts Relationship
More informationCS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)
CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm
More informationOverview of Web Mining Techniques and its Application towards Web
Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous
More informationKnowledge Discovery. URL - Spring 2018 CS - MIA 1/22
Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 03 : 13/10/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationData Preprocessing. Chapter Why Preprocess the Data?
Contents 2 Data Preprocessing 3 2.1 Why Preprocess the Data?........................................ 3 2.2 Descriptive Data Summarization..................................... 6 2.2.1 Measuring the Central
More informationWinter Semester 2009/10 Free University of Bozen, Bolzano
Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html
More informationCOMP 465 Special Topics: Data Mining
COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationCS 1655 / Spring 2013! Secure Data Management and Web Applications
CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More information