Data Warehouse Testing. By: Rakesh Kumar Sharma

Similar documents
Fig 1.2: Relationship between DW, ODS and OLTP Systems

DATA MINING TRANSACTION

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

Data warehouse architecture consists of the following interconnected layers:

STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS. By: Dr. Tendani J. Lavhengwa

Full file at

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Data Mining Concepts & Techniques

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

Oracle Database 11g: Data Warehousing Fundamentals

Data Warehouse and Data Mining

Data warehousing in telecom Industry

Testing Masters Technologies

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Sql Fact Constellation Schema In Data Warehouse With Example

BI/DWH Test specifics

After completing this course, participants will be able to:

QUALITY MONITORING AND

Data Warehouse and Data Mining

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse

QA Microsoft Designing Business Intelligence Solutions with Microsoft SQL Server 2012

A Methodology for Integrating XML Data into Data Warehouses

Basics of Dimensional Modeling

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Call: SAS BI Course Content:35-40hours

Decision Support, Data Warehousing, and OLAP

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus

REPORTING AND QUERY TOOLS AND APPLICATIONS

CHAPTER 3 Implementation of Data warehouse in Data Mining

Data transfer, storage and analysis for data mart enlargement

Data Mining and Data Warehousing Introduction to Data Mining

An Overview of Data Warehousing and OLAP Technology

Adnan YAZICI Computer Engineering Department

Data Warehouse Design Using Row and Column Data Distribution

20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Microsoft SQL Server Training Course Catalogue. Learning Solutions

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

Data Management Glossary

Rocky Mountain Technology Ventures

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

Evolution of Database Systems

Advanced Data Management Technologies Written Exam

Decision Support Systems aka Analytical Systems

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

A Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997

Getting Started enterprise 88. Oracle Warehouse Builder 11gR2: operational data warehouse. Extract, Transform, and Load data to

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

COMPUTER MCQs. 1. DOS floppy disk does not have 1) a boot record 2) a file allocation table 3) a root directory

20767B: IMPLEMENTING A SQL DATA WAREHOUSE

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

Question Bank. 4) It is the source of information later delivered to data marts.

Integrating with Microsoft Visual Studio Team System. For Borland CaliberRM Users

Data Warehousing and OLAP Technology for Primary Industry

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

collection of data that is used primarily in organizational decision making.

The Data Organization

Oracle BI 11g R1: Build Repositories

Informatica Power Center 10.1 Developer Training

The Data Organization

Decision Support Systems

Data Warehousing and OLAP Technologies for Decision-Making Process

Designing Data Warehouses. Data Warehousing Design. Designing Data Warehouses. Designing Data Warehouses

Instant Data Warehousing with SAP data

Logical design DATA WAREHOUSE: DESIGN Logical design. We address the relational model (ROLAP)

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)

ETL and OLAP Systems

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

: How does DSS data differ from operational data?

Implement a Data Warehouse with Microsoft SQL Server

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehousing Concepts

Implementing a Data Warehouse with Microsoft SQL Server 2014

A Data Warehouse Implementation Using the Star Schema. For an outpatient hospital information system

A Case Study Building Financial Report and Dashboard Using OBIEE Part I

OBIEE Course Details

BUSINESS INTELLIGENCE AND OLAP

20463C-Implementing a Data Warehouse with Microsoft SQL Server. Course Content. Course ID#: W 35 Hrs. Course Description: Audience Profile

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

SQL Server Analysis Services

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

InfoSphere Warehouse V9.5 Exam.

Q1) Describe business intelligence system development phases? (6 marks)

Implementing a Data Warehouse with Microsoft SQL Server

Knowledge Modelling and Management. Part B (9)

Implementing Data Models and Reports with SQL Server 2014

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

Exam /Course 20767B: Implementing a SQL Data Warehouse

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong

Handout 12 Data Warehousing and Analytics.

Transcription:

Data Warehouse Testing By: Rakesh Kumar Sharma

Index...2 Introduction...3 About Data Warehouse...3 Data Warehouse definition...3 Testing Process for Data warehouse:...3 Requirements Testing :...3 Unit Testing :...4 Integration Testing :...4 Scenarios to be covered in Integration Testing...5 Validating the Report data...6 User Acceptance Testing...6 Conclusion...6

Introduction This document details the testing process involved in data warehouse testing and test coverage areas. It explains the importance of data warehouse application testing and the various steps of the testing process. About Data Warehouse Data warehouse is the main repository of the organization's historical data. It contains the data for management's decision support system. The important factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis (data mining) on the information within data warehouse without slowing down the operational systems. Data Warehouse definition Subject-oriented : Subject Oriented -Data warehouses are designed to help you analyse data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. The data is organized so that all the data elements relating to the same realworld event or object are linked together. Integrated : Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. The database contains data from most or all of an organization's operational applications and is made consistent. Time-variant : The changes to the data in the database are tracked and recorded to produce reports on data changed over time. In order to discover trends in business, analysts need large amounts of data. A data warehouse's focus on change over time is what is meant by the term time variant. Non-volatile : Data in the database is never over-written or deleted, once committed, the data is static, read-only, but retained for future reporting. Once entered into the warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyse what has occurred. Testing Process for Data warehouse: Testing for a Data warehouse consists of requirements testing, unit testing, integration testing and acceptance testing. Requirements Testing : The main aim for doing Requirements testing is to check stated requirements for completeness. Requirements can be tested on following factors. 1. Are the requirements Complete? 2. Are the requirements Singular? 3. Are the requirements Ambiguous? 4. Are the requirements Developable? 5. Are the requirements Testable? In a Data warehouse, the requirements are mostly around reporting. Hence it becomes more important to verify whether these reporting requirements can be catered using the data available. Successful requirements are those structured closely to business rules and address functionality and performance. These business rules and requirements provide a solid

foundation to the data architects. Using the defined requirements and business rules, high level design of the data model is created. Once requirements and business rules are available, rough scripts can be drafted to validate the data model constraints against the defined business rules. Unit Testing : Unit testing for data warehouses is WHITEBOX. It should check the ETL procedures/mappings/jobs and the reports developed. This is usually done by the developers. Unit testing will involve following 1. Whether ETLs are accessing and picking up right data from right source. 2. All the data transformations are correct according to the business rules and data warehouse is correctly populated with the transformed data. 3. Testing the rejected records that don t fulfil transformation rules. Integration Testing : After unit testing is complete, it should form the basis of starting integration testing. Integration testing should test out initial and incremental loading of the data warehouse. Integration testing will involve following 1. Sequence of ETLs jobs in batch. 2. Initial loading of records on data warehouse. 3. Incremental loading of records at a later date to verify the newly inserted or updated data. 4. Testing the rejected records that don t fulfil transformation rules. 5. Error log generation. The overall Integration testing life cycle executed is planned in four phases: Requirements Understanding, Test Planning and Design, Test Case Preparation and Test Execution.

QA Team Reviews BRD for completeness. QA Team builds Test Plan Business Requirement Document/Requirement Traceability Matrix Requirements Testing High Level Design document Review of HLD Develop Test Cases and SQL Queries Test Case Preparation Unit Testing Test Execution Functional Testing Regression Testing Performance Testing User Acceptance Testing (UAT) Process for Data warehouse Testing Scenarios to be covered in Integration Testing Integration Testing would cover End-to-End Testing for DWH. The coverage of the tests would include the below: 1. Count Validation - Record Count Verification DWH backend/reporting queries against source and target as a initial check. 2. Source Isolation - Validation after isolating the driving sources. 3. Dimensional Analysis - Data integrity between the various source tables and relationships. 4. Statistical Analysis - Validation for various calculations. 5. Data Quality Validation - Check for missing data, negatives and consistency. Field-by-Field data verification can be done to check the consistency of source and target data. 6. Granularity - Validate at the lowest granular level possible (Lowest in the hierarchy E.g. Country- City-Street start with test cases on street). 7. Other validations. - Graphs, Slice/dice, meaningfulness, accuracy

Validating the Report data Once the ETLs are tested for count and data verification, the data being showed onto the reports hold utmost importance. QA team should verify the data reported with the source data for consistency and accuracy. 1. Verify Report data with source - Although the data present in a data warehouse will be stored at an aggregate level compare to source systems. Here the QA team should verify the granular data stored in data warehouse against the source data available. 2. Field level data verification - QA team must understand the linkages for the fields displayed in the report and should trace back and compare that with the source systems. 3. Creating SQLs - Create SQL queries to fetch and verify the data from Source and Target. Sometimes it s not possible to do the complex transformations done in ETL. In such a case the data can be transferred to some file and calculations can be performed. User Acceptance Testing Here the system is tested with full functionality and is expected to function as in production. At the end of UAT, the system should be acceptable to the client for use in terms of ETL process integrity and business functionality and reporting. Conclusion Evolving needs of the business and changes in the source systems will drive continuous change in the data warehouse schema and the data being loaded. Hence, it is necessary that development and testing processes are clearly defined, followed by impact-analysis and strong alignment between development, operations and the business. Database Design Most data warehouses use a star schema to represent the multidimensional data model. The database consists of a single fact table and a single table for each dimension. Each tuple in the fact table consists of a pointer (foreign key often uses a generated key for efficiency) to each of the dimensions that provide its multidimensional coordinates, and stores the numeric measures for those coordinates. Each dimension table consists of columns that correspond to attributes of the dimension

Star schemas do not explicitly provide support for attribute hierarchies. Snowflake schemas provide a refinement of star schemas where the dimensional hierarchy is explicitly represented by normalizing the dimension tables.

Typical Data Warehousing Architecture Conceptual Model and Front End Tools Multidimensional Data