Written Exam Data Warehousing and Data Mining course code:

January 2008 (13:30-17:00)

Remarks:

- The exercises are clearly marked as DM for data mining and DW for data warehousing, so that you can start with the topic you feel most confident about.
- Answer each exercise on a separate sheet, so that the correction can take place in parallel. If the exam paper is in booklet form, you can try to separate the sheets.
- Do not forget to put your name and student number on every sheet.
- Motivate your answers. The motivation / argumentation plays an important role in grading the exercise.
- You are allowed to use the study material and notes during the written exam.
- The practicum has to be completed satisfactorily before one is admitted to the written exam. The grade for the written exam is immediately the grade for the course. In case of doubt, the result of the practicum may be taken into account.
- There are 4 exercises. For each assignment, the number of points is given. In total, there are 40 points.

Assignment 1 (DM): Classification (15 pts)

A retailer wants, for marketing purposes, to distinguish between customers younger than 35 and customers older than 35. The following table summarizes the data set in the retailer's database in an abstract form. The relevant attributes, determined by domain knowledge, are for convenience denoted by A, B and C. The values for A are a1, a2 and a3. The values for B are b1 and b2. The values for C are c1 and c2.

| A  | B  | C  | Number of Instances: Y | Number of Instances: O |
|----|----|----|------------------------|------------------------|
| a1 | b1 | c1 |                        |                        |
| a2 | b1 | c1 | 0                      | 4                      |
| a3 | b1 | c1 | 6                      | 2                      |
| a1 | b2 | c1 |                        |                        |
| a2 | b2 | c1 | 6                      | 4                      |
| a3 | b2 | c1 | 0                      | 6                      |
| a1 | b1 | c2 | 0                      | 8                      |
| a2 | b1 | c2 | 8                      | 0                      |
| a3 | b1 | c2 | 2                      | 0                      |
| a1 | b2 | c2 | 0                      | 4                      |
| a2 | b2 | c2 | 2                      | 2                      |
| a3 | b2 | c2 | 4                      | 0                      |

Assume that the retailer wants to use Decision Trees to classify the customers into the classes young, denoted by Y, and old, denoted by O.

Part 1a
Compute the Classification error (pg. 150 handout Ch. 4) for the A attribute.

Part 1b
According to the Classification error, which attribute would be chosen as the first splitting attribute? For each attribute, show the contingency table and the corresponding Classification error.
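The classification error of a split can be sketched in Python as follows. This is a minimal sketch that assumes the usual definition: every child node predicts its majority class, and the error of the split is the fraction of all instances that fall outside the majority class of their node. The function name `classification_error` and the counts in `example` are hypothetical; the counts are deliberately not the exam data.

```python
from fractions import Fraction


def classification_error(contingency):
    """Weighted classification error of a split.

    `contingency` maps each attribute value to a (young, old) count pair.
    Each child node predicts its majority class, so the instances it
    misclassifies are exactly its minority count.
    """
    total = sum(y + o for y, o in contingency.values())
    misclassified = sum(min(y, o) for y, o in contingency.values())
    return Fraction(misclassified, total)


# Hypothetical counts, NOT the exam data: attribute value -> (Y, O)
example = {"v1": (8, 2), "v2": (3, 7), "v3": (5, 5)}
print(classification_error(example))  # 1/3
```

Using exact fractions rather than floats makes it easy to compare the errors of several candidate attributes without rounding issues.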

Part 1c
Draw the resulting Decision Tree of depth 1, based on your outcome of Part 1b. Repeat Part 1b for the children of the root node, i.e. the nodes on level 1. Draw the resulting Decision Tree of depth 2.

Part 1d
Compute the error rate of your Decision Tree of depth 2, using the resubstitution error (pg. 180 handout Ch. 4).

Part 1e
One could also use Naive Bayes as a classification approach. Assume a new customer nc comes in and has attribute values A = a2 and C = c1. How will this customer nc be classified if one uses:

- The partially unfolded Decision Tree of Part 1c.
- A Naive Bayes classifier.

Part 1f
Explain the main differences between a Decision Tree classifier and a Naive Bayes classifier.
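Part 1e describes a customer for whom only A and C are known. A minimal Python sketch of how a Naive Bayes classifier can score such a partial observation is given below. It assumes plain relative-frequency estimates without smoothing and simply leaves unobserved attributes out of the product. The function name `naive_bayes_scores` and the training rows are hypothetical and are not the exam data.

```python
from fractions import Fraction

# Hypothetical training rows, NOT the exam data: (A, B, C, class, count)
rows = [
    ("a1", "b1", "c1", "Y", 4), ("a1", "b1", "c1", "O", 1),
    ("a2", "b1", "c1", "Y", 1), ("a2", "b2", "c1", "O", 3),
    ("a2", "b2", "c2", "Y", 2), ("a1", "b2", "c2", "O", 4),
]


def naive_bayes_scores(rows, evidence):
    """Unnormalised score P(class) * product of P(attr = value | class).

    `evidence` maps an attribute index (0 = A, 1 = B, 2 = C) to a value.
    Attributes that were not observed are left out of the product.
    """
    class_totals = {}
    for *_, label, n in rows:
        class_totals[label] = class_totals.get(label, 0) + n
    grand_total = sum(class_totals.values())

    scores = {}
    for label, total in class_totals.items():
        score = Fraction(total, grand_total)  # class prior
        for idx, value in evidence.items():
            match = sum(n for *attrs, lab, n in rows
                        if lab == label and attrs[idx] == value)
            score *= Fraction(match, total)  # no smoothing in this sketch
        scores[label] = score
    return scores


scores = naive_bayes_scores(rows, {0: "a2", 2: "c1"})
print(max(scores, key=scores.get))  # Y
```

The scores are not normalised; only their ordering matters for the predicted class.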

Assignment 2: Association Rules (6 pts)

A supermarket stores all its transactions in a large database. This transaction database can be used for basket analysis. For the sake of simplicity and time, we focus only on a small part of the database and of all the items:

| transaction | items                                 |
|-------------|---------------------------------------|
| t1          | {bread, cheese, milk}                 |
| t2          | {bread, cheese, jelly, peanut butter} |
| t3          | {cheese, jelly, milk}                 |
| t4          | {bread, cheese, jelly}                |
| t5          | {milk, peanut butter}                 |
| t6          | {bread, cheese, milk, peanut butter}  |
| t7          | {jelly, milk}                         |
| t8          | {jelly, milk, peanut butter}          |
| t9          | {bread, cheese, milk, peanut butter}  |
| t10         | {jelly, peanut butter}                |
| t11         | {bread, cheese}                       |
| t12         | {bread, jelly, peanut butter}         |
| t13         | {cheese, milk}                        |
| t14         | {bread, cheese, jelly, milk}          |

Part of the transaction database.

Part 2a
Compute the support and confidence of the following association rules:

1. {cheese} ⇒ {bread}
2. {bread} ⇒ {cheese}
3. ∅ ⇒ {peanut butter}, with ∅ the empty set.

Part 2b
Compute all the association rules of the form X ⇒ {bread} with support s ≥ 50% and confidence α ≥ 60%.
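Support and confidence can be computed mechanically from the transaction table. The sketch below assumes the common definitions: the support of X ⇒ Y is the fraction of transactions containing X ∪ Y, and the confidence is the number of transactions containing X ∪ Y divided by the number containing X. The function names `support` and `confidence` are illustrative choices, not part of the exam.

```python
from fractions import Fraction

# Transactions t1..t14 as listed in the table of Assignment 2.
transactions = [
    {"bread", "cheese", "milk"},
    {"bread", "cheese", "jelly", "peanut butter"},
    {"cheese", "jelly", "milk"},
    {"bread", "cheese", "jelly"},
    {"milk", "peanut butter"},
    {"bread", "cheese", "milk", "peanut butter"},
    {"jelly", "milk"},
    {"jelly", "milk", "peanut butter"},
    {"bread", "cheese", "milk", "peanut butter"},
    {"jelly", "peanut butter"},
    {"bread", "cheese"},
    {"bread", "jelly", "peanut butter"},
    {"cheese", "milk"},
    {"bread", "cheese", "jelly", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent."""
    return Fraction(count(set(antecedent) | set(consequent)), len(transactions))


def confidence(antecedent, consequent):
    """Share of the transactions containing the antecedent that also
    contain the consequent. An empty antecedent matches every transaction."""
    return Fraction(count(set(antecedent) | set(consequent)), count(antecedent))


print(support({"cheese"}, {"bread"}))     # 1/2
print(confidence({"cheese"}, {"bread"}))  # 7/9
```

For a rule with an empty antecedent, support and confidence coincide under these definitions, because every transaction contains the empty set.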

Part 2c
Suppose one wants to compute only association rules of the form X ⇒ {bread} with a certain support s and confidence α. How must the Apriori algorithm be adapted in order to generate efficiently only association rules of the above form? Describe clearly only what must be adapted and how.
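One possible adaptation, which is not necessarily the answer the examiners had in mind, is to count candidates only in the transactions that contain the fixed consequent, since no other transaction can support X ∪ {bread}. The Python sketch below illustrates this idea on the transaction table of Assignment 2. The function name `rules_with_fixed_consequent` is hypothetical, and the candidate step is simplified: it extends each frequent antecedent by one item and omits Apriori's check that all subsets are frequent.

```python
transactions = [
    {"bread", "cheese", "milk"},
    {"bread", "cheese", "jelly", "peanut butter"},
    {"cheese", "jelly", "milk"},
    {"bread", "cheese", "jelly"},
    {"milk", "peanut butter"},
    {"bread", "cheese", "milk", "peanut butter"},
    {"jelly", "milk"},
    {"jelly", "milk", "peanut butter"},
    {"bread", "cheese", "milk", "peanut butter"},
    {"jelly", "peanut butter"},
    {"bread", "cheese"},
    {"bread", "jelly", "peanut butter"},
    {"cheese", "milk"},
    {"bread", "cheese", "jelly", "milk"},
]


def rules_with_fixed_consequent(transactions, target, min_sup, min_conf):
    """Level-wise search for rules X => {target}.

    Returns a dict mapping each antecedent X to (support, confidence).
    Assumes a non-empty transaction list and min_sup > 0.
    """
    n = len(transactions)
    # Only transactions containing `target` can support X | {target}.
    projected = [t - {target} for t in transactions if target in t]
    items = sorted(set().union(*projected))
    rules = {}
    level = [frozenset()]  # start from the empty antecedent
    while level:
        frequent = []
        for x in level:
            joint = sum(1 for t in projected if x <= t)  # count of X | {target}
            if joint / n < min_sup:
                continue  # anti-monotonicity: no superset can be frequent
            frequent.append(x)
            body = sum(1 for t in transactions if x <= t)  # count of X alone
            if body > 0 and joint / body >= min_conf:
                rules[x] = (joint / n, joint / body)
        # Simplified candidate step: extend each frequent antecedent by one item.
        level = list({x | {i} for x in frequent for i in items if i not in x})
    return rules


found = rules_with_fixed_consequent(transactions, "bread", 0.5, 0.6)
print(found)
```

The confidence still needs the count of X in the full database, so the projection saves work only on the support side.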

Assignment 3 (DW): Case Eniac (12 pts)

The alumni association [1] of Computer Science, called Eniac, wants to analyze how strong the relationship is between the company where students do their final master project and the company of the student's first job. They suspect that students often stay at the same company, i.e., have their first job at the same company as their master project, but it is unknown how often this occurs. Eniac is also interested in the degree to which the topic of the master project influences a student's first job, and in whether there is a significant difference in how long someone stays in his/her first job depending on whether or not that first job is at the same company as his/her master project.

Eniac would therefore like to set up a data warehouse in which its own data on members is merged with data from the ASAS system of the faculty. Eniac was founded in 1992, so data on its members has been collected since then. ASAS contains information on open, running, and finished internships (Dutch: "stages") and master projects. ASAS has been running since 2002 and hence contains data since then. For this exam question, you may assume that ASAS has complete data on all internships and master projects of the whole faculty since 2002 (which is not true in reality).

The data warehouse project needs to be rather cost efficient, so priority lies with a data warehouse focussed on the above questions rather than on extensibility for other questions.

Eniac (fictitious)

- Member: name, studentnumber, studyprogramme, startyear, dateofmasterdefense, masterprojectcompany id, currentjob id, address, address
- Company: company id, name
- Job: member id, company id, nrofjob, function

ASAS (simplified)

- Project: project id, kind, studentname, studentnumber, study id, supervisor id, projecttitle, projecttopic, description, status (open, running, or finished), company id, startdate, enddate
- Company: company id, name
- Studyprogramme: study id, name
- Supervisor: supervisor id, name, address

Figure 1: Databases

[1] According to the Merriam-Webster dictionary, alumnus means (1) a person who has attended or has graduated from a particular school, college, or university, or (2) a person who is a former member, employee, contributor, or inmate. In other words, alumni are former students of, in this case, Computer Science.

Part 3a (2 pts)

i) Does Figure 1 contain metadata or not? Explain your answer.

ii) Figure 1 contains many ambiguities that have to be clarified before a data warehouse can be set up. For example, Member.studyprogramme: does it contain a code like "CS" or the full name "Computer Science"? Moreover, in the past, study programmes have had different names, and there was a time when there was no separation between bachelor and master. Choose the 2 attributes, other than studyprogramme and Company.name, that you consider the most ambiguous and describe as accurately as possible which ambiguities have to be clarified for them.

Part 3b (4 pts)

i) The data is far from complete. Not all former students are members of Eniac (although many are), not every student does his master project externally at a company, etc. Discuss how problematic this is and advise how to deal with it in the data warehousing project.

ii) Both databases have a table with companies. You can't simply compare them on company name or id, while it is evident that this table plays a vital role in determining whether students have their first job with the same company as their master project. Describe as accurately as possible which problems or complexities you foresee with the conversion and comparison of these tables. Also explain how you advise approaching those problems and complexities.

Part 3c (5 pts)

i) Which attributes and/or tables are not needed in the data warehouse? Explain your answer.

ii) Give a design for the data structure of the data warehouse by means of a star schema with table names and attributes.

iii) Give an estimation for the number of rows of your fact table. Mention your assumptions and explicitly provide the calculation.

iv) With this data warehouse, can all business questions be fully answered? Explain as accurately as possible to what degree the questions can be answered and which considerations the analysts need to take into account when looking at the results.
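The company comparison problem of Part 3b ii) is typically approached with name normalisation followed by approximate matching. The Python sketch below illustrates that idea only; it is not a complete answer to the question. The suffix list `LEGAL_FORMS`, the threshold of 0.9, the function names, and the company names in the usage lines are all hypothetical assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_FORMS = {"bv", "nv", "inc", "ltd", "gmbh"}


def normalize(name):
    """Lower-case the name, drop punctuation and legal-form suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in LEGAL_FORMS]
    return " ".join(words)


def same_company(a, b, threshold=0.9):
    """Heuristic match on normalised names.

    The threshold is an assumption; borderline pairs would still need
    manual review or extra evidence such as an address.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Fictional company names, used only to show the behaviour.
print(normalize("Acme Software B.V."))                     # acme software
print(same_company("Acme Software B.V.", "ACME software"))  # True
```

String similarity alone cannot resolve mergers, renamed companies, or subsidiaries, so a manually maintained mapping table would likely still be needed alongside it.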

Part 3d (1 pt)

Eniac would like to repeat the analysis every year with fresh data. Discuss how you would approach this. Involve as many aspects as possible in your discussion and use proper data warehousing terminology where appropriate.

Assignment 4 (DW): Advanced Topics (7 pts)

Part 4a (3 pts)

[Table 1: columns Year and Total; rows Gotham City (containing cell (b)), Metropolis, and Total (containing cell (a)).]

Table 1: Number of Cars per Year per City.

Assume the numbers in Table 1 are the number of cars per city per year. What are the conditions that have to be true in order to calculate the total number of cars in cell (a) from the data given in Table 1? What are the conditions for calculating the total number of cars in cell (b) from the data given in Table 1? How could you discover whether these conditions are met? What could you do if these conditions are not met?

Part 4b (4 pts)

The Mail Order Company used a data warehouse for analyzing mail campaigns. The three Tables 3, 4, and 5 show different cross tables. Assume all differences in the cross tables are statistically significant. The three Figures i), ii), and iii) in Table 2 show different causal graphs, which encode alternative beliefs about the causal influences between the variables. Assume that each graph shows the complete causal model.

State for all nine combinations of the three cross tables and the three causal graphs: given the data table, would you reject the causal model (yes, no)? That is, which causal graph is inconsistent with which data table?

Explain why you think that causal graph iii) is consistent or inconsistent with Table 3.

[Figure: three causal graphs i), ii), and iii), each over the variables Mailing, Rich, and Order.]

Table 2: Causal Graphs i), ii), and iii). Each graph shows alternative beliefs about the causal influences between the variables. E.g., Graph i) means that if a person is Rich, this has a positive causal influence on whether he/she creates an Order. However, the fact that he/she got a Mailing does not cause a higher chance of an Order.

[Cross table: columns Mailing (Yes, No, Total); rows Order (Yes, No, Total).]

Table 3: Order reactions (yes, no) from the customers after mailing campaign (yes, no).

[Cross table: columns Rich (Yes, No, Total); rows Order (Yes, No, Total).]

Table 4: Order reactions (yes, no) from the customers depending on their wealth (Rich (yes, no)).

[Cross table: columns Mailing (Yes, No, Total), each subdivided by Rich (Yes, No, Total); rows Order (Yes, No, Total).]

Table 5: Order reactions (yes, no) from the customers depending on their wealth (Rich (yes, no)) and the mailing campaign (yes, no).


More information

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Improved Apriori Algorithm was Applied in the System of Elective Courses in Colleges and Universities

More information

NATIONAL ASSOCIATION OF SCHOOL PSYCHOLOGISTS (NASP)

NATIONAL ASSOCIATION OF SCHOOL PSYCHOLOGISTS (NASP) NATIONAL ASSOCIATION OF SCHOOL PSYCHOLOGISTS (NASP) Instructions on Completing SPA Program Review Template/Form: Option B For use with: Program-level plans to meet Specialized Professional Associations

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008 Course Syllabus Course Title: Advanced Databases Course Level: 4 Lecture Time: Course code:

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data

More information

Bachelor of Engineering Technology (Electronics & Controls) Curriculum Document. Australian College of Kuwait. (September 2015) BEEF15 - Version 5.

Bachelor of Engineering Technology (Electronics & Controls) Curriculum Document. Australian College of Kuwait. (September 2015) BEEF15 - Version 5. Bachelor of Engineering Technology (Electronics & Controls) Curriculum Document Australian College of Kuwait (September 2015) BEEF15 - Version 5.1 FOREWORD In this document, a curriculum for Bachelor of

More information

Guest Lecture. Daniel Dao & Chad Cotton

Guest Lecture. Daniel Dao & Chad Cotton Guest Lecture Daniel Dao & Chad Cotton OVERVIEW What is Civitas Learning What We Do Mission Statement Demo What I Do How I Use Databases Chad Cotton WHAT IS CIVITAS LEARNING Civitas Learning Mid-sized

More information

2 CONTENTS

2 CONTENTS Contents 5 Mining Frequent Patterns, Associations, and Correlations 3 5.1 Basic Concepts and a Road Map..................................... 3 5.1.1 Market Basket Analysis: A Motivating Example........................

More information

Department of Electrical Engineering and Computer Sciences Spring 2001 Instructor: Dan Garcia CS 3 Midterm #2. Personal Information

Department of Electrical Engineering and Computer Sciences Spring 2001 Instructor: Dan Garcia CS 3 Midterm #2. Personal Information University of California, Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Spring 2001 Instructor: Dan Garcia 2001-04-09 CS 3 Midterm #2 Personal Information Last

More information

CHAPTER 3: DATA MODELING USING THE ENTITY-RELATIONSHIP (ER) MODEL

CHAPTER 3: DATA MODELING USING THE ENTITY-RELATIONSHIP (ER) MODEL Chapter 3: Data Modeling Using the Entity-Relationship (ER) Model 1 CHAPTER 3: DATA MODELING USING THE ENTITY-RELATIONSHIP (ER) MODEL Answers to Selected Exercises 3.16 Consider the following set of requirements

More information

Individual Project. Agnieszka Jastrzębska Władysław Homenda Lucjan Stapp

Individual Project. Agnieszka Jastrzębska Władysław Homenda Lucjan Stapp Individual Project Individual Project Target: 1. Improvement of software development skill 2. to industrial method of building application in practical way Individual Project Slide 2/50 Individual Project

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 and External Memory 1 1 (2, 4) Trees: Generalization of BSTs Each internal node

More information

Higher National Unit Specification. General information for centres. Unit title: CAD: 3D Modelling. Unit code: DW13 34

Higher National Unit Specification. General information for centres. Unit title: CAD: 3D Modelling. Unit code: DW13 34 Higher National Unit Specification General information for centres Unit code: DW13 34 Unit purpose: This Unit is designed to introduce candidates to computerised 3D modelling and enable them to understand

More information

The Use of Soft Systems Methodology for the Development of Data Warehouses

The Use of Soft Systems Methodology for the Development of Data Warehouses The Use of Soft Systems Methodology for the Development of Data Warehouses Roelien Goede School of Information Technology, North-West University Vanderbijlpark, 1900, South Africa ABSTRACT When making

More information

1 Variations of the Traveling Salesman Problem

1 Variations of the Traveling Salesman Problem Stanford University CS26: Optimization Handout 3 Luca Trevisan January, 20 Lecture 3 In which we prove the equivalence of three versions of the Traveling Salesman Problem, we provide a 2-approximate algorithm,

More information

COMP Instructor: Dimitris Papadias WWW page:

COMP Instructor: Dimitris Papadias WWW page: COMP 5311 Instructor: Dimitris Papadias WWW page: http://www.cse.ust.hk/~dimitris/5311/5311.html Textbook Database System Concepts, A. Silberschatz, H. Korth, and S. Sudarshan. Reference Database Management

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 B-Trees and External Memory 1 (2, 4) Trees: Generalization of BSTs Each internal

More information

CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam. (signed)

CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam. (signed) CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam This midterm exam is open-book, open-note, open-computer, but closed-network. This means that if you want to have your laptop with

More information

Working with Data. L1 Introduction to Database & SQL

Working with Data. L1 Introduction to Database & SQL Working with Data L1 Introduction to Database & SQL Agenda Admin stuff Structure & Schedule of Module Assessment What tools and software is available? 1 Why? What is Data? Is Data Important? But is it

More information

EXAM PREPARATION GUIDE

EXAM PREPARATION GUIDE When Recognition Matters EXAM PREPARATION GUIDE PECB Certified ISO 14001 Lead Implementer www.pecb.com The objective of the PECB Certified ISO 14001 Lead Implementer examination is to ensure that the candidate

More information

Introduction to AI Spring 2006 Dan Klein Midterm Solutions

Introduction to AI Spring 2006 Dan Klein Midterm Solutions NAME: SID#: Login: Sec: 1 CS 188 Introduction to AI Spring 2006 Dan Klein Midterm Solutions 1. (20 pts.) True/False Each problem is worth 2 points. Incorrect answers are worth 0 points. Skipped questions

More information

Comparison of FP tree and Apriori Algorithm

Comparison of FP tree and Apriori Algorithm International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti

More information

Statistical Techniques in Robotics (STR, S15) Lecture#06 (Wednesday, January 28)

Statistical Techniques in Robotics (STR, S15) Lecture#06 (Wednesday, January 28) Statistical Techniques in Robotics (STR, S15) Lecture#06 (Wednesday, January 28) Lecturer: Byron Boots Graphical Models 1 Graphical Models Often one is interested in representing a joint distribution P

More information

INTRODUCTION USER POPULATION

INTRODUCTION USER POPULATION iscreen Usability Cognitive Walkthrough Report Christine Wania George Abraham INTRODUCTION Context and motivation The College of IST recently installed an interactive kiosk called iscreen, designed to

More information

FROM A RELATIONAL TO A MULTI-DIMENSIONAL DATA BASE

FROM A RELATIONAL TO A MULTI-DIMENSIONAL DATA BASE FROM A RELATIONAL TO A MULTI-DIMENSIONAL DATA BASE David C. Hay Essential Strategies, Inc In the buzzword sweepstakes of 1997, the clear winner has to be Data Warehouse. A host of technologies and techniques

More information

COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017

COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017 COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University ASSOCIATION ANALYSIS SETUP Many businesses

More information

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1 Basic Concepts :- 1. What is Data? Data is a collection of facts from which conclusion may be drawn. In computer science, data is anything in a form suitable for use with a computer. Data is often distinguished

More information

Artificial Intelligence Naïve Bayes

Artificial Intelligence Naïve Bayes Artificial Intelligence Naïve Bayes Instructors: David Suter and Qince Li Course Delivered @ Harbin Institute of Technology [M any slides adapted from those created by Dan Klein and Pieter Abbeel for CS188

More information

CS 1567 Intermediate Programming and System Design Using a Mobile Robot Aibo Lab3 Localization and Path Planning

CS 1567 Intermediate Programming and System Design Using a Mobile Robot Aibo Lab3 Localization and Path Planning CS 1567 Intermediate Programming and System Design Using a Mobile Robot Aibo Lab3 Localization and Path Planning In this lab we will create an artificial landscape in the Aibo pen. The landscape has two

More information

The Game of Criss-Cross

The Game of Criss-Cross Chapter 5 The Game of Criss-Cross Euler Characteristic ( ) Overview. The regions on a map and the faces of a cube both illustrate a very natural sort of situation: they are each examples of regions that

More information

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10) CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

More information

Oracle9i Data Mining. Data Sheet August 2002

Oracle9i Data Mining. Data Sheet August 2002 Oracle9i Data Mining Data Sheet August 2002 Oracle9i Data Mining enables companies to build integrated business intelligence applications. Using data mining functionality embedded in the Oracle9i Database,

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

COS 226 Fall 2015 Midterm Exam pts.; 60 minutes; 8 Qs; 15 pgs :00 p.m. Name:

COS 226 Fall 2015 Midterm Exam pts.; 60 minutes; 8 Qs; 15 pgs :00 p.m. Name: COS 226 Fall 2015 Midterm Exam 1 60 + 10 pts.; 60 minutes; 8 Qs; 15 pgs. 2015-10-08 2:00 p.m. c 2015 Sudarshan S. Chawathe Name: 1. (1 pt.) Read all material carefully. If in doubt whether something is

More information

CS157a Fall 2018 Sec3 Home Page/Syllabus

CS157a Fall 2018 Sec3 Home Page/Syllabus CS157a Fall 2018 Sec3 Home Page/Syllabus Introduction to Database Management Systems Instructor: Chris Pollett Office: MH 214 Phone Number: (408) 924 5145 Email: chris@pollett.org Office Hours: MW 4:30-5:45pm

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

Final Exam DATA MINING I - 1DL360

Final Exam DATA MINING I - 1DL360 Uppsala University Department of Information Technology Kjell Orsborn Final Exam 2012-10-17 DATA MINING I - 1DL360 Date... Wednesday, October 17, 2012 Time... 08:00-13:00 Teacher on duty... Kjell Orsborn,

More information

Introduction to Access 97/2000

Introduction to Access 97/2000 Introduction to Access 97/2000 PowerPoint Presentation Notes Slide 1 Introduction to Databases (Title Slide) Slide 2 Workshop Ground Rules Slide 3 Objectives Here are our objectives for the day. By the

More information