Best Practices - Pentaho Data Modeling

Contents

Overview
Best Practices for Data Modeling and Data Storage
Best Practices - Data Modeling
    Dimensional Models
    Database Optimization
    Database Indexing
    Schema Tables
    Default Values
Best Practices - Data Storage
    Data Storage
    Reporting Data
    Out-of-the-Box Configuration

Overview

This document provides best practices for designing and building your Pentaho solution for maximum speed, reuse, portability, maintainability, and knowledge transfer. It is not intended to demonstrate how to implement each best practice or to provide templates based on the best practices it defines.

Software Version: Pentaho 5.4, 6.x, 7.x

Best Practices for Data Modeling and Data Storage

The document is arranged in topic groups, with the individual best practices for each topic explained:

    Data Modeling
    Data Storage

Best Practices - Data Modeling

This section provides best practices and information on data models and on operating and improving databases, tables, and values:

    Dimensional Models
    Database Optimization
    Database Indexing
    Schema Tables
    Default Values

Dimensional Models

Use a dimensional model design whenever possible. Dimensional models are optimized for online queries and data warehousing. A star schema is a common example of a dimensional model. Dimensional models allow Mondrian and Pentaho to perform best at high volumes.
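The star schema mentioned under Dimensional Models can be sketched as follows. This is a minimal illustration, not a Pentaho template: it uses Python's built-in sqlite3 module as a stand-in for the analytic database, and every table and column name (dim_date, dim_product, fact_sales, and so on) is a hypothetical example that does not come from this document.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables.

    All table and column names are hypothetical examples.
    """
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- single-column integer key
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            category     TEXT NOT NULL,
            product_name TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            units       INTEGER NOT NULL,
            revenue     REAL NOT NULL
        );
    """)


def load_sample(conn):
    """Insert a few illustrative rows into each table."""
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240201, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Books", "Atlas"), (2, "Games", "Chess Set")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20240101, 1, 1, 10.0),
                      (20240201, 1, 2, 20.0),
                      (20240201, 2, 1, 50.0)])


def revenue_by_category_and_year(conn):
    """Aggregate the fact table by two dimension levels."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample(connection)
    for row in revenue_by_category_and_year(connection):
        print(row)
```

The analytic query joins the fact table directly to each dimension on a single integer key and groups by dimension attributes, which is the query shape a star schema is designed to keep simple.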

Database Optimization

Make sure the database server and instance are optimized for analytic workloads. Databases have specific parameters for analysis that do not apply to transaction workloads. Typically, this means providing the maximum amount of RAM and CPU to the database server and adjusting the Database Management System (DBMS) kernel parameters to use that additional capacity efficiently. Mondrian can only perform as fast as the database can return data.

Database Indexing

Make sure the database has standard indexing applied. Create indexes on all primary keys of a dimension and on all foreign keys in a fact table. Create indexes for each level of each hierarchy in all dimensions of all cubes of all schemas. A common approach to indexing can be found in Recommendations for database tuning.

Indexes on keys are especially important for high-cardinality dimensions and levels. Primary and foreign keys should be single-column INTEGER or BIGINT values. Keys should not be strings, GUIDs, or a combination of several fields. This allows the Mondrian-generated queries to perform at an optimal rate.

Schema Tables

Where possible, avoid using database views or structured query language (SQL) queries as tables in a schema. Define all tables in the schema as database tables. Normally, when an SQL query or view is desired, this indicates that more extracting, transforming, and loading (ETL) needs to be done to the data before analysis. The entire database view must be evaluated before filters are applied, which can lead to poor performance. Normally, if the dimensional model is set up properly, these techniques are not needed. If you are unable or unwilling to create a dimensional model and use ETL, these may be your only options.
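The Database Indexing guidance above can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in for the analytic database. The schema and index names are hypothetical, and index syntax and defaults vary by DBMS, so treat this as an illustration of which columns to index rather than as production DDL.

```python
import sqlite3

# Hypothetical star schema: integer primary keys on the dimensions,
# integer foreign keys on the fact table.
SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    revenue     REAL NOT NULL
);
"""

# In SQLite an INTEGER PRIMARY KEY is already the table's row key, so the
# dimension primary keys need no extra index here. Other DBMSs may differ.
INDEXES = """
-- Foreign keys in the fact table
CREATE INDEX idx_fact_sales_date_key ON fact_sales(date_key);
CREATE INDEX idx_fact_sales_product_key ON fact_sales(product_key);
-- One index per hierarchy level in each dimension
CREATE INDEX idx_dim_date_year ON dim_date(year);
CREATE INDEX idx_dim_date_month ON dim_date(month);
CREATE INDEX idx_dim_product_category ON dim_product(category);
"""


def build_indexed_schema():
    """Create the tables and apply the indexes in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(INDEXES)
    return conn


def index_names(conn, table):
    """Return the sorted names of the indexes defined on a table."""
    rows = conn.execute(f"PRAGMA index_list({table})").fetchall()
    return sorted(row[1] for row in rows)


if __name__ == "__main__":
    connection = build_indexed_schema()
    for name in ("fact_sales", "dim_date", "dim_product"):
        print(name, index_names(connection, name))
```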

Default Values

Pre-populate a record for all levels of all hierarchies with a default value. Use a value of N/A, unknown, or -1 to represent a value that was not found in a lookup. This allows the data to flow into the analytic database without being lost. N/A records can later be found and updated as appropriate.

Best Practices - Data Storage

This section provides best practices for storing and reporting data, as well as out-of-the-box configuration for reporting:

    Data Storage
    Reporting Data
    Out-of-the-Box Configuration

Data Storage

Data should reside on high-speed input-output storage. Use a physically mounted drive, or a storage area network (SAN) with Fibre Channel in the same data center. Avoid running the database on a virtual machine, if possible, unless data storage concerns can be addressed. Poor database performance can degrade the performance of the entire analytic project.

Reporting Data

Do not use the database provided with Pentaho to store your reporting data. Store your reporting data in other database platforms better suited to this workload. The Pentaho database is for metadata about Pentaho objects, not for reporting data.

Out-of-the-Box Configuration

Do not use an out-of-the-box configuration for your reporting data DBMS. Modify your memory and kernel parameters soon after installation, before loading data. Most out-of-the-box DBMS configurations perform well only on very small or demonstration data sets.
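As an illustration of the Default Values practice described earlier, the sketch below pre-populates a dimension with a -1 / N/A record and falls back to it when a lookup fails during loading. It uses Python's built-in sqlite3 module as a stand-in for the analytic database. The table names, column names, and helper functions are hypothetical, and in a real project this logic would more likely live in the ETL tool than in hand-written code.

```python
import sqlite3

UNKNOWN_KEY = -1  # key of the pre-populated "not found" record


def build(conn):
    """Create a hypothetical dimension and fact table with a default record."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_code TEXT NOT NULL,
            region        TEXT NOT NULL
        );
        CREATE TABLE fact_orders (
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            amount       REAL NOT NULL
        );
        -- Default record: every level of the hierarchy gets a value.
        INSERT INTO dim_customer VALUES (-1, 'N/A', 'N/A');
        INSERT INTO dim_customer VALUES (1, 'C001', 'EMEA');
    """)


def lookup_customer_key(conn, code):
    """Return the surrogate key for a code, or UNKNOWN_KEY if none exists."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_code = ?",
        (code,),
    ).fetchone()
    return row[0] if row else UNKNOWN_KEY


def load_orders(conn, orders):
    """Load (customer_code, amount) pairs without dropping unmatched rows."""
    for code, amount in orders:
        conn.execute(
            "INSERT INTO fact_orders VALUES (?, ?)",
            (lookup_customer_key(conn, code), amount),
        )


def revenue_by_region(conn):
    """Total order amount per region, including the N/A bucket."""
    return conn.execute("""
        SELECT c.region, SUM(f.amount)
        FROM fact_orders f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build(connection)
    load_orders(connection, [("C001", 100.0), ("C999", 25.0)])
    print(revenue_by_region(connection))
```

Because the unmatched order is stored against the -1 record instead of being rejected, it still appears in totals under N/A and can be reassigned once the missing customer is added to the dimension.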