PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore


PESIT - Bangalore South Campus
Hosur Road (1 km before Electronic City), Bangalore 560 100
Department of MCA

COURSE INFORMATION SHEET
Data Warehousing and Data Mining (17MCA442)

1. GENERAL INFORMATION:
Academic Year: 2018
Semester: IV
Title: Data Warehousing and Data Mining
Code: 17MCA442
Duration (hrs): Lectures 48 Hrs, Seminars 4 Hrs, Total 52 Hrs

2. PRE REQUIREMENT STATEMENT:
Data warehousing and data mining are two major areas of exploration for knowledge discovery in databases. These topics gained great relevance, especially in the 1990s and early 2000s, with web data growing at an exponential rate. As businesses and scientific institutions collect more data, knowledge exploration techniques are needed to gain useful business intelligence. This course covers a wide spectrum of industry-standard techniques, using widely available database and tool packages for knowledge discovery. Data mining targets relatively unstructured data, for which more sophisticated techniques are needed. The course aims to cover powerful data mining techniques including clustering, association rules, and classification. It then teaches high-volume data processing mechanisms by building warehouse schemas such as star and snowflake. OLAP query retrieval techniques are also introduced.
Students should be familiar with statistics concepts. It may also be helpful to have some background in calculus, linear algebra, and computer science.

3. COURSE DESCRIPTION:
This course gives an introduction to methods and theory for the development of data warehouses and for data analysis using data mining. Topics include data quality and methods and techniques for data preprocessing; modeling and design of data warehouses; algorithms for classification, clustering, and association rule analysis; and practical use of software for data analysis.

4. LEARNING OUTCOMES
After completing the subject Data Warehousing and Data Mining, the student will be able to:
- Identify the techniques of classification and clustering, and calculate distances using centroids.
- Demonstrate knowledge of data preprocessing and data quality.
- Design data warehouses.
- Apply the acquired knowledge to understand data and select suitable methods for data analysis.

5. FACULTY DETAILS:
Faculty Name: Mrs. Jayanthi.R
Department: MCA
Room Number: 504
Phone Number: 8951112398
Mail-id: jayanthir@pes.edu
Contact Hours: College Hours
Consultation Time: By E-Mail

6. VENUE AND HOURS/WEEK:
All lectures will normally be held in Rooms 500, 501 and 506, 5th Floor.
Lecture Hours/week: 4 Hrs
All laboratory sessions will be held in Rooms 500 and 506, 5th Floor.

7. MODULE MAP:

Chapter title: Data Warehousing
Reference literature: R2: Chapter 3
% of portions covered: 11.54 (cumulative: 11.54)
Class #  Topic to be covered
 1.  Introduction
 2.  Operational data stores, ETL
 3.  Data warehouses
 4.  Design issues
 5.  Guidelines for data warehousing implementation
 6.  Data warehouse metadata

Chapter title: Online Analytical Processing (OLAP)
Reference literature: R2: Chapter 4
% of portions covered: 11.54 (cumulative: 23.08)
Class #  Topic to be covered
 7.  Introduction
 8.  Characteristics of OLAP systems
 9.  Multidimensional view and data cube
10.  Data cube implementations
11.  Data cube operations
12.  Implementation of OLAP and overview of OLAP software

Chapter title: Data Mining
Reference literature: T1: Chapter 1, 2
% of portions covered: 11.54 (cumulative: 34.62)
Class #  Topic to be covered
13.  Introduction
14.  Challenges, data mining tasks
15.  Types of data, data preprocessing
16.  Measures of similarity and dissimilarity
17.  Measures of similarity and dissimilarity (contd.)
18.  Data mining applications

Chapter title: Association analysis - basic concepts and algorithms
Reference literature: T1: Chapter 6
% of portions covered: 15.38 (cumulative: 50.00)
Class #  Topic to be covered
19.  Frequent itemset generation
20.  Rule generation
21.  Compact representation of frequent itemsets
22.  Alternative methods for generating frequent itemsets
23.  Alternative methods for generating frequent itemsets (contd.)
24.  FP-growth algorithm
25.  FP-growth algorithm (contd.)
26.  Evaluation of association patterns

Chapter title: Classification
Reference literature: T1: Chapter 4, 5 (5.1-5.3)
% of portions covered: 23.08 (cumulative: 73.08)
Class #  Topic to be covered
27.  Basics, general approach to solving a classification problem
28.  Decision tree
29.  Decision tree
30.  Rule-based classifier
31.  Rule-based classifier (contd.)
32.  Rule-based classifier (contd.)
33.  Nearest-neighbor classifier
34.  Bayesian classifiers
35.  Estimating predictive accuracy of classification methods
36.  Improving accuracy of classification methods
37.  Evaluation criteria for classification methods
38.  Multiclass problem

Chapter title: Clustering Techniques
Reference literature: T1: Chapter 8, 9; R2: Chapter 7
% of portions covered: 15.38 (cumulative: 88.46)
Class #  Topic to be covered
39.  Overview, features of cluster analysis
40.  Types of data and computing distance
41.  Types of data and computing distance (contd.)
42.  Types of cluster analysis methods
43.  Partitional methods
44.  Hierarchical methods
45.  Density-based methods
46.  Quality and validity of cluster analysis

Chapter title: Web Mining
Reference literature: R2: Chapter 10
% of portions covered: 11.54 (cumulative: 100)
Class #  Topic to be covered
47.  Introduction
48.  Web content mining
49.  Text mining
50.  Unstructured text, text clustering
51.  Mining spatial and temporal databases
52.  Mining spatial and temporal databases (contd.)

12. RECOMMENDED BOOKS/JOURNALS/WEBSITES
Text Books:
1. Jiawei Han and Micheline Kamber: Data Mining - Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, 2006.
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2005.
Reference Books:
1. Arun K Pujari: Data Mining Techniques, University Press, 2nd Edition, 2009.
2. G. K. Gupta: Introduction to Data Mining with Case Studies, 3rd Edition, PHI, New Delhi, 2009.
3. Alex Berson and Stephen J. Smith: Data Warehousing, Data Mining, and OLAP, Computing McGraw-Hill, 1997.

9. ASSIGNMENTS
Students must submit two assignments: one before the first internal exam and the second before the second internal exam.

Assignment questions:

1. Define: association analysis, itemset, transaction width, association rule.

2. For the following market basket transactions, compute the support and confidence for the rule {Milk, Diapers} -> {Beer}.

   Tid  Itemsets
   1    {Bread, Milk}
   2    {Bread, Diapers, Beer, Eggs}
   3    {Milk, Diapers, Beer, Cola}
   4    {Bread, Milk, Diapers, Beer}
   5    {Bread, Milk, Diapers, Cola}

3. Explain the Apriori principle and illustrate it with an itemset lattice.

4. Write the Apriori algorithm for finding frequent itemsets. Find the frequent patterns generated using Apriori for the following set of transactions.

   Tid  List of item_ids
   100  L1, L2, L5
   200  L2, L4
   300  L2, L3
   400  L1, L2, L4
   500  L1, L3
   600  L2, L3
   700  L1, L3
   800  L1, L2, L3, L5
   900  L1, L2, L3

5. Define maximal frequent itemsets and closed frequent itemsets.

6. Explain alternative methods for generating frequent itemsets.

7. Draw the FP-tree for the following transactions.

   Tid  List of items
   100  {M, O, N, K, E, Y}
   200  {D, O, N, K, E, Y}
   300  {M, A, K, E}
   400  {M, U, C, K, Y}
   500  {C, O, O, K, I, E}

8. Explain frequent itemset generation in the FP-Growth algorithm.

10. THEORY ASSESSMENT

WRITTEN EXAMINATION
Paper Structure
No. of Questions: 8 Main Questions
No. of questions to be answered: 5
Exam date:
Paper Duration: 3 Hrs
Total Marks: 100
Pass Marks: 40

CONTINUOUS ASSESSMENT
Parameters       Weighting
Test(s):         15 Marks
Assignment(s):   3 Marks
Attendance(s):   2 Marks
Total Marks:     20 Marks

11. QUESTION BANK
(Marks for each question are given in parentheses.)

1. What is data mining? (5)
2. Mention the data mining functionalities: classification, prediction, clustering, and evolution analysis. (5)
3. What are the challenges in the methodology of data mining technology? (5)
4. Discuss the issues to consider during data mining. (5)
5. What defines a data mining task? Explain at least 5 primitives. (5)
6. What is knowledge discovery? (5)
7. Explain the motivating challenges in the development of data mining. (5)
8. Explain the data mining tasks with examples. (5)

1. What is data? What do you mean by quality of data? (4)
2. What is a data set? Explain the various types of data sets. (10)
3. What is data preprocessing?
4. Explain the following, giving an example of each. (5 marks each)
   i. Aggregation
   ii. Sampling
   iii. Dimensionality reduction
   iv. Feature subset selection
   v. Feature creation
   vi. Discretization and binarization
   vii. Variable transformation
5. Explain the similarity and dissimilarity between 2 objects. (6)
6. What is Euclidean distance? Write the generalized Minkowski distance metric for various values of r. (8)
7. Explain the properties of Euclidean distance. (6)
8. What are the simple matching coefficient and the Jaccard coefficient? Explain with examples. (8)
9. What is meant by cosine similarity? Explain with an example. (6)
10. What is Bregman divergence? (5)
11. What are the issues related to proximity measures? (10)
12. Discuss the selection of the right proximity measure. (7)
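The proximity questions above (Euclidean and Minkowski distance, simple matching and Jaccard coefficients, cosine similarity) can be explored with a short Python sketch. This is only an illustration under stated assumptions, not a model answer: the function names and the sample vectors are made up for this example, binary attributes are assumed to be coded as 0/1, and returning 0.0 for the Jaccard coefficient of two all-zero vectors is a convention chosen here.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance between two equal-length numeric vectors.

    r = 1 gives city-block distance, r = 2 Euclidean distance, and
    r = math.inf the supremum (L-max) distance.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1.0 / r)


def smc(x, y):
    """Simple matching coefficient: matching attributes / all attributes."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard(x, y):
    """Jaccard coefficient for 0/1 vectors: 1-1 matches / attributes not 0-0."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    not_both_zero = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: two all-zero vectors are given similarity 0.0.
    return f11 / not_both_zero if not_both_zero else 0.0


def cosine(x, y):
    """Cosine similarity; raises ZeroDivisionError for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical sample data, for illustration only.
p, q = (0, 2), (2, 0)
print(minkowski(p, q, 1))         # 4.0 (city block)
print(minkowski(p, q, 2))         # about 2.828 (Euclidean)
print(minkowski(p, q, math.inf))  # 2 (supremum)

bx = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
by = (0, 0, 0, 0, 0, 0, 1, 0, 0, 1)
print(smc(bx, by))      # 0.7
print(jaccard(bx, by))  # 0.0

dx = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
dy = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
print(round(cosine(dx, dy), 3))  # 0.315
```

The binary example shows why the two coefficients can disagree: the seven shared zeros raise the simple matching coefficient to 0.7, while the Jaccard coefficient ignores 0-0 matches and is 0 because the vectors share no 1s.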

1. Define classification. Explain the purposes of using a classification model. (6)
2. Explain the general approach for building a classification model. (10)
3. What is a decision tree? How does a decision tree work? (10)
4. Explain Hunt's algorithm for inducing decision trees. (10)
5. What are the various methods for expressing attribute test conditions? Explain with examples. (12)
6. Explain the measures that can be used to determine the best way to split the records. (12)
7. Explain the decision tree induction algorithm. (10)
8. What are the various characteristics of decision tree induction? (12)
9. Explain the rule-based classifier with an example. (5)
10. Explain how a rule-based classifier works with a suitable example. (6)
11. Discuss the rule-based ordering scheme and the class-based ordering scheme. (10)
12. Explain the direct methods of extracting classification rules. (8)
13. Explain the indirect methods for rule extraction. (8)
14. What are the characteristics of rule-based classifiers? (10)
15. Explain the nearest-neighbor classifier. (6)
16. Discuss the k-nearest neighbor classification algorithm. (8)
17. Explain the characteristics of nearest-neighbor classifiers. (8)
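As a study aid for the nearest-neighbor questions above, the following is a minimal Python sketch of k-nearest neighbor classification. It assumes numeric features, Euclidean distance, and an unweighted majority vote; the function name `knn_predict` and the toy training records are hypothetical and exist only for this illustration. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `train` is a list of (feature_tuple, label) pairs. Distance is Euclidean.
    On a tied vote, the label encountered first among the nearest neighbors
    wins, because Counter.most_common keeps first-seen order for equal counts.
    """
    # Sort all training records by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in two dimensions.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"),
    ((7.0, 7.0), "B"),
    ((6.5, 6.0), "B"),
]

print(knn_predict(train, (1.2, 1.4), k=3))  # A
print(knn_predict(train, (6.2, 5.8), k=3))  # B
```

The sketch sorts the whole training set for every query, which shows why nearest-neighbor classifiers are called lazy learners: no model is built in advance, and all the cost is paid at classification time.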