Decision Support Systems 2012/2013. MEIC - TagusPark. Homework #1. Due: 11.Mar.2013

1 Data Description and Pre-processing

1. A hospital is conducting a study on obesity in adult men and, as part of that study, tested the age and body fat for 18 randomly selected adults, with the following results:

[Table: Age and % Fat for the 18 sampled adults]

(a) (1/2 val.) Compute the mean, median and standard deviation for Age and % Fat. Include the expressions you used in your calculations and any additional elements you find relevant.

Note: In your calculations, take into consideration that the data above concerns a sample of the population, not the whole population.

The mean can be computed using the expression:

\[
\bar{X}_{Age} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \bar{X}_{Fat} = \frac{1}{N} \sum_{i=1}^{N} x_i,
\]

where, in each case, the x_i are the N = 18 values of the corresponding attribute.

The median can be computed by sorting the data and determining the middle element. In this case, since we have an even number of data-points,

\[
\mathrm{median}_{Age} = \frac{x_{N/2} + x_{N/2+1}}{2} = 33.5, \qquad \mathrm{median}_{Fat} = \frac{x_{N/2} + x_{N/2+1}}{2} = 17.4.
\]

Finally, the (sample) variance can be computed as:

\[
S^2_{Age} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X}_{Age})^2, \qquad S^2_{Fat} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X}_{Fat})^2,
\]

and we get

\[
s_{Age} = \sqrt{S^2_{Age}} = 8.71, \qquad s_{Fat} = \sqrt{S^2_{Fat}} = 5.29.
\]
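The three expressions in (a) can be checked numerically with a short Python sketch. The sample below is hypothetical (it is not the homework's table), and the function name `sample_summary` is ours; the point is only to show the even-sized median rule and the N - 1 divisor used for a sample.

```python
import math

# Hypothetical sample of ages; NOT the data from the homework table.
ages = [23, 27, 39, 41, 47, 50]


def sample_summary(values):
    """Return (mean, median, sample standard deviation) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n

    # Median: middle element of the sorted data, or the average of the
    # two middle elements when n is even.
    ordered = sorted(values)
    mid = n // 2
    if n % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    else:
        median = ordered[mid]

    # Sample variance: divide by n - 1 because the data is a sample,
    # not the whole population.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, median, math.sqrt(variance)


mean, median, std = sample_summary(ages)
print(mean, median, std)
```

Dividing by N instead of N - 1 would give the population standard deviation, which is slightly smaller for the same data.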

(b) (1/2 val.) Draw a scatter plot and a q-q plot based on the two variables. Include a brief explanation of the plots.

The scatter plot is obtained by plotting each pair of data-points (x_Age, x_Fat) as they appear in the original table. The resulting plot is:

[Figure: Scatter plot of Age vs. % of body fat; x-axis: Age (years), y-axis: Body fat (%)]

The q-q plot, on the other hand, is obtained by pairing the quantiles of the two attributes. In this case, since both attributes have the same number of data-points, the q-q plot can easily be obtained by sorting the values of each attribute and plotting the resulting pairs, to yield:

[Figure: Q-Q plot of Age vs. % of body fat; x-axis: Age (years), y-axis: Body fat (%)]

(c) (1 val.) Normalize the two variables using min-max normalization, so that the data fits in the interval [0, 1]. Include the expressions you used in your calculations and any additional elements you find relevant.
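The q-q pairing described in (b) and the min-max scaling requested in (c) can both be sketched in a few lines of Python. The values below are made up for illustration, and the helper names `qq_pairs` and `min_max` are ours.

```python
def qq_pairs(xs, ys):
    """Pair the sorted values of two equally sized samples.

    With equal sample sizes, the i-th smallest value of one attribute is
    plotted against the i-th smallest value of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes samples of equal size")
    return list(zip(sorted(xs), sorted(ys)))


def min_max(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for constant data")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical values; NOT the data from the homework table.
ages = [41, 23, 39, 27]
fat = [25.9, 9.5, 31.4, 7.8]

print(qq_pairs(ages, fat))  # points of the q-q plot
print(min_max(ages))        # ages rescaled to [0, 1]
```

Note that the q-q pairs generally differ from the scatter plot pairs: sorting each attribute separately discards which age belongs to which body fat measurement.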

To normalize the two variables, we use, in each dataset,

\[
x_{i,norm} = \frac{x_i - \min_j \{x_j\}}{\max_j \{x_j\} - \min_j \{x_j\}}
\]

to get:

[Table: Age_norm and % Fat_norm for the 18 sampled adults]

(d) (1 val.) Calculate the correlation coefficient (Pearson's product moment coefficient) between the two attributes. Based on this computation, justify whether the two variables are positively or negatively correlated.

The (sample) correlation coefficient between Age and % Fat can be computed as

\[
\mathrm{corr}(X, Y) = \frac{1}{N-1} \sum_{i=1}^{N} \frac{x_i - \bar{X}}{s_X} \cdot \frac{y_i - \bar{Y}}{s_Y}.
\]

In our case, we have corr(Age, % Fat) = 0.54 and we can conclude that, since corr(Age, % Fat) > 0, Age and % Fat are positively correlated.

2. (3 val.) Suppose that you want to predict the value of some (discrete) variable Y knowing that some other variable, X, takes some given value, x. For example, returning to the setup of Question 1, you could be interested in predicting the value of % Fat for a person with Age = 35. Show that, in terms of expected squared error:

\[
\mathcal{E} = \mathbb{E}\left[(Y - c)^2 \mid X = x\right],
\]

the value c = E[Y | X = x] is the best possible prediction for Y. Indicate all relevant computations.

Suggestion: Compute the minimum of the above expression with respect to c. Recall that E[f(Y) | X = x] denotes the expected value of f(Y) conditioned on the random variable X taking the value x, and is analytically given by

\[
\mathbb{E}\left[f(Y) \mid X = x\right] = \sum_{y} f(y) \, \mathbb{P}\left[Y = y \mid X = x\right].
\]

In order to compute the value of c that minimizes the squared error, we differentiate it with respect to c, to yield:

\[
\begin{aligned}
\frac{d\mathcal{E}}{dc} &= \frac{d}{dc} \mathbb{E}\left[(Y - c)^2 \mid X = x\right] \\
&= \mathbb{E}\left[\frac{d}{dc} (Y - c)^2 \,\middle|\, X = x\right] \\
&= \mathbb{E}\left[-2 (Y - c) \mid X = x\right] \\
&= -2 \, \mathbb{E}\left[Y \mid X = x\right] + 2c,
\end{aligned}
\]

due to the linearity of E[·]. Equating the above expression to 0 and solving for c, we finally get:

\[
c = \mathbb{E}\left[Y \mid X = x\right].
\]

Since the second derivative with respect to c equals 2 > 0, this stationary point is indeed a minimum.

2 OLAP Queries

3. Suppose that a data warehouse for a technical support company is structured around three dimensions, Time, Technician and Client, and the measures Count and Charge. The measure Count keeps track of the number of times that a client/technician was assisted/called upon. The measure Charge keeps track of the payments charged to each customer upon a visit by a technician.

(a) (1 val.) Starting with the cuboid [Day, Technician, Client], what specific OLAP operations should be performed in order to list the total fee collected by each technician in 2012?

Note: In your answer, you are free to assume any (reasonable) hierarchy for each of the different dimensions in the DW. You should indicate them in your answer.

We assume that each of the three dimensions is organized according to the following hierarchies:

- Time is organized in the hierarchy Day-Month-Semester-Year-all
- Technician is organized in Technician-Expertise-Area-all
- Client is organized in Client-Neighborhood-City-State-all.

Given these hierarchies, we would require the following OLAP operations:

1. A roll-up on the dimension Time from Day to Year.
2. A roll-up on the dimension Client from Client to all, to aggregate on this dimension.
3. A slice on the dimension Time to select Year = 2012.

(b) (1 val.) Write a Transact-SQL query to obtain the same result, assuming that the data are stored in a relational database with the scheme Fee(Day, Month, Year, Technician, Store, Client, Charge). You should use the relevant OLAP operators you practiced in the lab session (indicate only the query).

A possible T-SQL query would be:

    SELECT Technician, SUM(Charge)
    FROM Fee
    WHERE Year = 2012
    GROUP BY Technician WITH ROLLUP

where we used ROLLUP to also include the total amount charged in 2012 (aggregated over all technicians).

4. (2 val.) Give an example of a query that uses grouping with ROLLUP that cannot be expressed by a single clause. Besides your query, you should indicate the clauses necessary to obtain the same information. In your example, you can use the JoBS database used in the lab session (indicate only the queries, not the result).

Resorting to the JoBS database, as suggested, the query

    SELECT
        CASE WHEN GROUPING(E.EngineerName) = 1 THEN 'All Engineers' ELSE E.EngineerName END,
        CASE WHEN GROUPING(S.PartNumber) = 1 THEN 'All Parts' ELSE S.PartNumber END,
        SUM(S.UnitsHeld)
    FROM EngineerStock S
    JOIN Engineers E ON S.EngineerId = E.EngineerId
    GROUP BY E.EngineerName, S.PartNumber WITH ROLLUP

would require three clauses if the ROLLUP clause were not used,

    SELECT E.EngineerName, S.PartNumber, SUM(S.UnitsHeld)
    FROM EngineerStock S
    JOIN Engineers E ON S.EngineerId = E.EngineerId
    GROUP BY E.EngineerName, S.PartNumber
    UNION
    SELECT E.EngineerName, 'All Parts', SUM(S.UnitsHeld)
    FROM EngineerStock S
    JOIN Engineers E ON S.EngineerId = E.EngineerId
    GROUP BY E.EngineerName
    UNION
    SELECT 'All Engineers', 'All Parts', SUM(S.UnitsHeld)
    FROM EngineerStock S
    JOIN Engineers E ON S.EngineerId = E.EngineerId

in order to compute the total parts per Engineer and the total number of parts (overall).

2.1 Practical Questions (Using SQL Server 2008)

For the following questions, you should use the AdventureWorksDW2012 database. To this purpose, at the beginning of your code you should include the following SQL statement:

    USE AdventureWorksDW2012;
    GO

You can use the Object Explorer in the MS SQL Server Management Studio to explore the tables in this database as well as the attributes in each table. For completeness, Fig. 1 includes a (simplified) representation of the relevant tables and attributes for this homework, where the attributes in italic correspond to primary keys.

5. (3 val.) Write down the SQL query necessary to obtain the relation described in Fig. 2 from the table dbo.factinternetsales (see Fig. 1). This relation describes, for each order in dbo.factinternetsales, the first and last name of the corresponding customer, the postal code, state and country associated with the customer, the name of the product in the order, its shipping date, and the total amount paid by the customer. You need only to indicate the query, not the results.

In terms of the relations in Fig. 1,

- The attribute OrderNumber in the new table corresponds to the attribute SalesOrderNumber in dbo.factinternetsales;
- The attribute State in the new table corresponds to the attribute StateProvinceCode in the table dbo.dimgeography;

    dbo.dimcustomer:       CustomerKey, GeographyKey, FirstName, LastName, BirthDate, ...
    dbo.factinternetsales: SalesOrderNumber, ProductKey, ShipDateKey, CustomerKey, SalesAmount, ...
    dbo.dimgeography:      GeographyKey, EnglishCountryRegionName, City, StateProvinceCode, CountryRegionCode, PostalCode, ...
    dbo.dimproduct:        ProductKey, EnglishProductName, ModelName, ProductLine, ...
    dbo.dimdate:           DateKey, FullDateAlternateKey, CalendarYear, ...

Figure 1: Simplified schema of the AdventureWorksDW2008 that includes only the relevant tables and attributes.

    (NoName): OrderNumber, FirstName, LastName, PostalCode, State, Country, Product, ShipDate, Total

Figure 2: Table for question 5.

- The attribute Country in the new table corresponds to the attribute CountryRegionCode in the table dbo.dimgeography;
- The attribute Product in the new table corresponds to the attribute EnglishProductName in dbo.dimproduct;
- The attribute ShipDate in the new table corresponds to the attribute FullDateAlternateKey in dbo.dimdate;
- The attribute Total in the new table corresponds to the attribute SalesAmount in the table dbo.factinternetsales.

In your query you should use adequate JOIN operations. In particular, note that there may be orders for which not all of the information above is available, but which should still be included in the results of your query.

The SQL query would be:

    SELECT F.SalesOrderNumber AS OrderNumber,
           C.FirstName,
           C.LastName,
           G.PostalCode,
           G.StateProvinceCode AS State,
           G.CountryRegionCode AS Country,
           P.EnglishProductName AS Product,
           D.FullDateAlternateKey AS ShipDate,
           F.SalesAmount AS Total
    FROM dbo.factinternetsales F
    LEFT JOIN dbo.dimcustomer C ON F.CustomerKey = C.CustomerKey
    LEFT JOIN dbo.dimgeography G ON C.GeographyKey = G.GeographyKey
    LEFT JOIN dbo.dimproduct P ON F.ProductKey = P.ProductKey
    LEFT JOIN dbo.dimdate D ON F.ShipDateKey = D.DateKey

6. From the table dbo.factinternetsales (see Fig. 1),

(a) (2 val.) Write down the SQL query necessary to determine the total sales amount per calendar year. You should include both the query and the obtained results.

The SQL query is:

    SELECT D.CalendarYear AS Year, SUM(F.SalesAmount) AS Total
    FROM dbo.factinternetsales F
    JOIN dbo.dimdate D ON F.ShipDateKey = D.DateKey
    GROUP BY D.CalendarYear

The corresponding values are:

    Year    Total
    …       …,517,…
    …       …,158,…
    …       …,105,…
    …       …,576,978.98
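The reason for using LEFT JOIN in question 5 is that every order must appear in the result even when a matching dimension row is missing. As a rough illustration of that behaviour, the following Python sketch emulates a left join over in-memory rows. The rows, the customer name and the helper `left_join` are hypothetical and only loosely modeled on the tables in Fig. 1; the sketch also assumes that the dimension key is unique, as a primary key would be.

```python
# Hypothetical fact rows; NOT real AdventureWorks data.
orders = [
    {"OrderNumber": "SO1", "CustomerKey": 1, "Total": 100.0},
    {"OrderNumber": "SO2", "CustomerKey": 2, "Total": 250.0},   # no such customer
    {"OrderNumber": "SO3", "CustomerKey": None, "Total": 75.0},  # missing key
]

# Hypothetical dimension rows, indexed by their (unique) key.
customers = {1: {"FirstName": "Ana", "LastName": "Silva"}}


def left_join(fact_rows, dim_by_key, key, columns):
    """Emulate LEFT JOIN: keep every fact row and pad unmatched columns with None."""
    result = []
    for row in fact_rows:
        match = dim_by_key.get(row[key])
        joined = dict(row)
        for col in columns:
            joined[col] = match[col] if match is not None else None
        result.append(joined)
    return result


joined = left_join(orders, customers, "CustomerKey", ["FirstName", "LastName"])
for row in joined:
    print(row["OrderNumber"], row["FirstName"], row["LastName"], row["Total"])
```

An inner join would instead drop the second and third orders, which is exactly what the question asks us to avoid.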

(b) (2 val.) With a single query, determine the total sales amount both globally and per year (Suggestion: Use a ROLLUP clause). You should include both the SQL query and the obtained results.

The SQL query is:

    SELECT D.CalendarYear AS Year, SUM(F.SalesAmount) AS Total
    FROM dbo.factinternetsales F
    JOIN dbo.dimdate D ON F.ShipDateKey = D.DateKey
    GROUP BY D.CalendarYear WITH ROLLUP

The corresponding values are:

    Year    Total
    …       …,105,…
    …       …,576,…
    …       …,517,…
    …       …,158,…
    All     29,358,…

(c) (3 val.) Using the CUBE clause, determine the total sales amount across the two dimensions: Year, corresponding to the CalendarYear attribute in table dbo.dimdate (associated with the shipping date), and Country, corresponding to the EnglishCountryRegionName attribute in table dbo.dimgeography. Write down the adequate SQL query and express the results as a cross-tabulation.

The SQL query is:

    SELECT D.CalendarYear AS Year,
           G.EnglishCountryRegionName AS Country,
           SUM(F.SalesAmount) AS Total
    FROM dbo.factinternetsales F
    JOIN dbo.dimdate D ON F.ShipDateKey = D.DateKey
    JOIN dbo.dimcustomer C ON F.CustomerKey = C.CustomerKey
    JOIN dbo.dimgeography G ON C.GeographyKey = G.GeographyKey
    GROUP BY D.CalendarYear, G.EnglishCountryRegionName WITH CUBE

The corresponding cross-tabulation is:

    Country          …          …          …          …          All
    Australia        1,251,…    …,166,…    …,002,…    …,641,…    …,061,000.6
    Canada           143,…      …,…        …,…        …,…        …,977,844.9
    France           172,…      …,…        …,…        …,…        …,644,017.7
    Germany          219,…      …,…        …,021,…    …,125,…    …,894,312.3
    United Kingdom   280,…      …,…        …,271,…    …,255,…    …,391,712.2
    United States    1,038,…    …,172,…    …,721,…    …,457,…    …,389,789.5
    All              3,105,…    …,576,…    …,517,…    …,158,…    …,358,677.2
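To make the difference between ROLLUP and CUBE in questions 6(b) and 6(c) concrete, the following Python sketch computes the same kind of totals over a handful of in-memory rows. The rows are hypothetical (not AdventureWorks data), and the helper names `rollup_sets`, `cube_sets` and `aggregate` are ours; the sketch only illustrates which grouping sets each operator produces, not how SQL Server evaluates them.

```python
from collections import defaultdict
from itertools import combinations

ALL = "All"  # label used for an aggregated-away dimension


def rollup_sets(dims):
    """Grouping sets of ROLLUP: every prefix of dims, from the full list down to ()."""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]


def cube_sets(dims):
    """Grouping sets of CUBE: every subset of dims."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]


def aggregate(rows, dims, measure, grouping_sets):
    """Sum the measure once per grouping set, writing ALL for dropped dimensions."""
    totals = defaultdict(float)
    for row in rows:
        for kept in grouping_sets:
            key = tuple(row[d] if d in kept else ALL for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical rows; NOT real AdventureWorks data.
rows = [
    {"Year": 2005, "Country": "Canada", "Total": 10.0},
    {"Year": 2005, "Country": "France", "Total": 20.0},
    {"Year": 2006, "Country": "Canada", "Total": 5.0},
]
dims = ["Year", "Country"]

rollup = aggregate(rows, dims, "Total", rollup_sets(dims))
cube = aggregate(rows, dims, "Total", cube_sets(dims))

print(sorted(rollup.items(), key=str))
print(sorted(cube.items(), key=str))
```

With two dimensions, ROLLUP yields per-(Year, Country), per-Year and grand totals, whereas CUBE additionally yields the per-Country totals, which is what fills the bottom row of a cross-tabulation such as the one above.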


More information

An Overview of Data Warehousing and OLAP Technology

An Overview of Data Warehousing and OLAP Technology An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection

More information

OFFICIAL MICROSOFT LEARNING PRODUCT 10778A. Implementing Data Models and Reports with Microsoft SQL Server 2012 Companion Content

OFFICIAL MICROSOFT LEARNING PRODUCT 10778A. Implementing Data Models and Reports with Microsoft SQL Server 2012 Companion Content OFFICIAL MICROSOFT LEARNING PRODUCT 10778A Implementing Data Models and Reports with Microsoft SQL Server 2012 Companion Content 2 Implementing Data Models and Reports with Microsoft SQL Server 2012 Information

More information

Data Warehousing and Data Mining SQL OLAP Operations

Data Warehousing and Data Mining SQL OLAP Operations Data Warehousing and Data Mining SQL OLAP Operations SQL table expression query specification query expression SQL OLAP GROUP BY extensions: rollup, cube, grouping sets Acknowledgements: I am indebted

More information

Implementing Data Models and Reports with SQL Server 2014

Implementing Data Models and Reports with SQL Server 2014 Course 20466D: Implementing Data Models and Reports with SQL Server 2014 Page 1 of 6 Implementing Data Models and Reports with SQL Server 2014 Course 20466D: 4 days; Instructor-Led Introduction The focus

More information

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Syllabus. Syllabus. Motivation Decision Support. Syllabus Presentation: Sophia Discussion: Tianyu Metadata Requirements and Conclusion 3 4 Decision Support Decision Making: Everyday, Everywhere Decision Support System: a class of computerized information systems

More information
