Performance Tuning in Informatica Developer


© 2010 Informatica

Abstract

The Data Integration Service uses optimization methods to improve the performance of a mapping. You can choose an optimizer level to determine which optimization methods the Data Integration Service applies to the mapping at run-time. This article describes the Data Integration Service optimization methods and the optimizer levels that you can choose.

Supported Versions

Data Services 9.0 - 9.0.1
Data Quality 9.0 - 9.0.1

Table of Contents

Performance Tuning Overview
Optimization Methods
    Early Projection Optimization Method
    Early Selection Optimization Method
    Predicate Optimization Method
    Semi-Join Optimization Method
Setting the Optimizer Level for a Developer Tool Mapping
Setting the Optimizer Level for a Deployed Mapping

Performance Tuning Overview

The Developer tool contains features that allow you to tune the performance of mappings. You might be able to improve mapping performance by updating the optimizer level through the mapping configuration or mapping deployment properties.

If you notice that a mapping takes an excessive amount of time to run, you might want to change the optimizer level for the mapping. The optimizer level determines which optimization methods the Data Integration Service applies to the mapping at run-time. You can choose one of the following optimizer levels:

None. The Data Integration Service does not optimize the mapping. It runs the mapping exactly as you designed it.
Minimal. The Data Integration Service applies the early projection optimization method to the mapping.
Normal. The Data Integration Service applies the early projection, early selection, and predicate optimization methods to the mapping. This is the default optimizer level.
Full. The Data Integration Service applies the early projection, early selection, predicate optimization, and semi-join optimization methods to the mapping.

You set the optimizer level for a mapping in the mapping configuration or mapping deployment properties. The Data Integration Service applies different optimizer levels to the mapping depending on how you run the mapping. You can run a mapping in the following ways:

From the Run menu or mapping editor. The Data Integration Service uses the normal optimizer level.
From the Run dialog box. The Data Integration Service uses the optimizer level in the selected mapping configuration.

From the command line. The Data Integration Service uses the optimizer level in the application's mapping deployment properties.

You can also preview mapping output. When you preview mapping output, the Developer tool uses the optimizer level in the selected data viewer configuration.

Optimization Methods

To increase mapping performance, select an optimizer level for the mapping. The optimizer level controls the optimization methods that the Data Integration Service applies to a mapping.

The Data Integration Service can apply the following optimization methods:

Early projection. The Data Integration Service attempts to reduce the amount of data that passes through a mapping by identifying unused ports and removing the links between those ports. The Data Integration Service applies this optimization method when you select the minimal, normal, or full optimizer level.
Early selection. The Data Integration Service attempts to reduce the amount of data that passes through a mapping by applying the filters as early as possible. The Data Integration Service applies this optimization method when you select the normal or full optimizer level.
Predicate optimization. The Data Integration Service attempts to improve mapping performance by inferring new predicate expressions and by simplifying and rewriting the predicate expressions generated by a mapping or the transformations within the mapping. The Data Integration Service applies this optimization method when you select the normal or full optimizer level.
Semi-join. The Data Integration Service attempts to reduce the amount of data extracted from the source by decreasing the size of one of the join operand data sets. The Data Integration Service applies this optimization method when you select the full optimizer level.

The Data Integration Service can apply multiple optimization methods to a mapping at the same time. For example, it applies the early projection, early selection, and predicate optimization methods when you select the normal optimizer level.

Early Projection Optimization Method

The early projection optimization method causes the Data Integration Service to identify unused ports and remove the links between those ports. Identifying and removing links between unused ports improves performance by reducing the amount of data the Data Integration Service moves across transformations.

When the Data Integration Service processes a mapping, it moves the data from all connected ports in a mapping from one transformation to another. In large, complex mappings, or in mappings that use nested mapplets, some ports might not ultimately supply data to the target. The early projection method causes the Data Integration Service to identify the ports that do not supply data to the target. After the Data Integration Service identifies unused ports, it removes the links between all unused ports from the mapping.

The Data Integration Service does not remove all links. For example, it does not remove the following links:

Links connected to a Custom transformation
Links connected to transformations that call an ABORT() or ERROR() function, send email, or call a stored procedure

If the Data Integration Service determines that all ports in a transformation are unused, it removes all transformation links except the link to the port with the least data. The Data Integration Service does not remove the unused transformation from the mapping.

The Developer tool enables this optimization method by default.
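To make the idea concrete, the following Python sketch models a mapping as a set of links between named ports and drops every link that does not ultimately feed a target port. This is an illustration only; the port names and the function are hypothetical, and the sketch does not reflect how the Data Integration Service is implemented.

    # Hypothetical model: a mapping as a set of directed links between ports.
    # Walking backward from the target ports identifies the ports that actually
    # feed the target; links between any other ports can be dropped.

    def prune_unused_links(links, target_ports):
        """Return only the links that ultimately supply data to a target port."""
        used = set(target_ports)
        changed = True
        while changed:                      # propagate "used" upstream
            changed = False
            for src, dst in links:
                if dst in used and src not in used:
                    used.add(src)
                    changed = True
        return [(src, dst) for src, dst in links if dst in used]

    links = [("SRC.ID", "EXP.ID"), ("SRC.NAME", "EXP.NAME"),
             ("EXP.ID", "TGT.ID")]          # EXP.NAME feeds nothing downstream
    print(prune_unused_links(links, {"TGT.ID"}))
    # [('SRC.ID', 'EXP.ID'), ('EXP.ID', 'TGT.ID')] -- the SRC.NAME link is dropped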
Early Selection Optimization Method

The early selection optimization method applies the filters in a mapping as early as possible.

Filtering data early increases performance by reducing the number of rows that pass through the mapping. In the early selection method, the Data Integration Service splits, moves, or removes the Filter transformations in a mapping, and it might both split and move the same Filter transformation.

The Data Integration Service might split a Filter transformation if the filter condition is a conjunction. For example, the Data Integration Service might split the filter condition "A>100 AND B<50" into two simpler conditions, "A>100" and "B<50." Splitting the filter allows the Data Integration Service to move the simplified filters up the mapping pipeline separately, closer to the mapping source. Moving the filter conditions closer to the source reduces the number of rows that pass through the mapping.

The Data Integration Service might also remove Filter transformations from a mapping. It removes a Filter transformation when it can apply the filter condition to the transformation logic of the transformation immediately upstream of the original Filter transformation.

The Data Integration Service cannot always move a Filter transformation. For example, it cannot move a Filter transformation upstream of the following transformations:

Custom transformations
Transformations that call an ABORT() or ERROR() function, send email, or call a stored procedure
Transformations that maintain a count through a variable port, for example, COUNT=COUNT+1
Transformations that create branches in the mapping. For example, the Data Integration Service cannot move a Filter transformation upstream if it is immediately downstream of a Router transformation with two output groups.

The Data Integration Service does not move a Filter transformation upstream in the mapping if doing so changes the mapping results.

The Developer tool enables this optimization method by default. You might want to disable this method if it does not increase performance. For example, a mapping contains ports "P1" and "P2." "P1" is connected to an Expression transformation that evaluates "P2=f(P1)." "P2" is connected to a Filter transformation with the condition "P2>1." The filter drops very few rows. If the Data Integration Service moves the Filter transformation upstream of the Expression transformation, the Filter transformation must evaluate "f(P1)>1" for every row in source port "P1." The Expression transformation also evaluates "P2=f(P1)" for every row. If the function is resource intensive, moving the Filter transformation upstream nearly doubles the number of times it is called, which might degrade performance.

Predicate Optimization Method

The predicate optimization method causes the Data Integration Service to examine the predicate expressions generated by a mapping or the transformations within a mapping to determine whether the expressions can be simplified or rewritten to increase the performance of the mapping.

When the Data Integration Service runs a mapping, it generates queries against the mapping sources and performs operations on the query results based on the mapping logic and the transformations within the mapping. The generated queries and operations often involve predicate expressions. Predicate expressions represent the conditions that the data must satisfy. The filter and join conditions in Filter and Joiner transformations are examples of predicate expressions.
The Data Integration Service also attempts to apply predicate expressions as early as possible to improve mapping performance.

The predicate optimization method also causes the Data Integration Service to infer relationships implied by existing predicate expressions and to create new predicate expressions based on the inferences. For example, a mapping contains a Joiner transformation with the join condition "A=B" and a Filter transformation with the filter condition "A>5." The Data Integration Service might be able to add the inference "B>5" to the join condition.
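The following Python sketch combines the two ideas, adapting the examples above: it splits a conjunctive filter and applies each piece at its own source (early selection), and it applies the filter inferred from the join condition to the other join input (predicate optimization). The data and names are hypothetical, and the sketch illustrates the rewrites only, not the Data Integration Service's implementation.

    # Hypothetical illustration of early selection and predicate optimization.
    side_a = [{"A": 3}, {"A": 8}, {"A": 12}]
    side_b = [{"B": 2}, {"B": 8}, {"B": 40}]

    # Early selection: split "A > 5 AND B < 50" and apply each piece at its source.
    side_a = [row for row in side_a if row["A"] > 5]     # pushed toward source A
    side_b = [row for row in side_b if row["B"] < 50]    # pushed toward source B

    # Predicate optimization: the join condition A = B plus the filter A > 5
    # implies B > 5, so the inferred filter can also reduce side B before the join.
    side_b = [row for row in side_b if row["B"] > 5]     # inferred predicate

    # Join on A = B after both inputs have been reduced.
    joined = [(a, b) for a in side_a for b in side_b if a["A"] == b["B"]]
    print(joined)   # [({'A': 8}, {'B': 8})]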

The Data Integration Service uses the predicate optimization method with the early selection optimization method when it can apply both methods to a mapping. For example, when the Data Integration Service creates new filter conditions through the predicate optimization method, it also attempts to move them upstream in the mapping through the early selection method. Applying both optimization methods improves mapping performance when compared to applying either method alone.

The Data Integration Service applies this optimization method when it can run the mapping more quickly. It does not apply this method when doing so changes mapping results or worsens mapping performance.

When the Data Integration Service rewrites a predicate expression, it applies mathematical logic to the expression to optimize it. For example, the Data Integration Service might perform any or all of the following actions:

Identify equivalent variables across predicate expressions in the mapping and generate simplified expressions based on the equivalencies.
Identify redundant predicates across predicate expressions in the mapping and remove them.
Extract subexpressions from disjunctive clauses and generate multiple, simplified expressions based on the subexpressions.
Normalize a predicate expression.
Apply predicate expressions as early as possible in the mapping.

The Data Integration Service might not apply predicate optimization to a mapping when the mapping contains transformations with a datatype mismatch between connected ports. The Data Integration Service might not apply predicate optimization to a transformation when any of the following conditions are true:

The transformation contains explicit default values for connected ports.
The transformation calls an ABORT() or ERROR() function, sends email, or calls a stored procedure.
The transformation does not allow predicates to be moved. For example, a developer might create a Custom transformation that has this restriction.

The Developer tool enables this optimization method by default.

Semi-Join Optimization Method

The semi-join optimization method attempts to reduce the amount of data extracted from the source by modifying join operations in the mapping.

The Data Integration Service applies this method to a Joiner transformation when one input group has many more rows than the other and when the larger group has many rows with no match in the smaller group based on the join condition. The Data Integration Service attempts to decrease the size of the data set of one join operand by reading the rows from the smaller group, finding the matching rows in the larger group, and then performing the join operation. Decreasing the size of the data set improves mapping performance because the Data Integration Service no longer reads unnecessary rows from the larger group source. The Data Integration Service moves the join condition to the larger group source and reads only the rows that match the smaller group.

Before applying this optimization method, the Data Integration Service performs analyses to determine whether semi-join optimization is possible and likely to be worthwhile. If the analyses determine that this method is likely to increase performance, the Data Integration Service applies it to the mapping. The Data Integration Service then reanalyzes the mapping to determine whether there are additional opportunities for semi-join optimization, and it performs additional optimizations if appropriate.
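The following Python sketch illustrates the semi-join idea on in-memory lists: the keys of the smaller master group restrict the rows read from the larger detail group before the join is performed. The data is hypothetical and the sketch is an illustration of the concept only; the Data Integration Service applies the equivalent rewrite against the mapping sources rather than in memory.

    # Hypothetical semi-join sketch: restrict the large detail group to the keys
    # that actually appear in the small master group before performing the join.

    master = [{"id": 1, "name": "east"}, {"id": 2, "name": "west"}]     # small group
    detail = [{"id": k % 5, "amount": k} for k in range(1_000)]         # large group

    master_keys = {row["id"] for row in master}

    # Instead of feeding every detail row into the join, keep only rows whose
    # key matches the master group (for a relational source, this corresponds
    # to a filter on the join key pushed into the generated query).
    matching_detail = [row for row in detail if row["id"] in master_keys]

    joined = [(m, d) for m in master for d in matching_detail if m["id"] == d["id"]]
    print(len(detail), len(matching_detail), len(joined))   # 1000 400 400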
The Data Integration Service does not apply semi-join optimization unless the analyses determine that there is a high probability for improved performance.

For the Data Integration Service to apply the semi-join optimization method to a join operation, the Joiner transformation must meet the following requirements:

The join type must be normal, master outer, or detail outer. The Joiner transformation cannot perform a full outer join.

The detail pipeline must originate from a relational source.
The join condition must be a valid sort-merge-join condition. That is, each clause must be an equality of one master port and one detail port. If there are multiple clauses, they must be joined by AND.
If the mapping does not use target-based commits, the Joiner transformation scope must be All Input.
The master and detail pipelines cannot share any transformation.
The mapping cannot contain a branch between the detail source and the Joiner transformation.

The semi-join optimization method might not be beneficial in the following circumstances:

The Joiner transformation master source does not contain significantly fewer rows than the detail source.
The detail source is not large enough to justify the optimization. Applying the semi-join optimization method adds some overhead time to mapping processing. If the detail source is small, the time required to apply the semi-join method might exceed the time required to process all rows in the detail source.
The Data Integration Service cannot get enough source row count statistics for a Joiner transformation to accurately compare the time requirements of the regular join operation against the semi-join operation.

The Developer tool does not enable this method by default.

Setting the Optimizer Level for a Developer Tool Mapping

When you run a mapping through the Run menu or mapping editor, the Developer tool runs the mapping with the normal optimizer level. To run the mapping with a different optimizer level, run the mapping through the Run dialog box.

1. Open the mapping.
2. Select Run > Open Run Dialog. The Run dialog box appears.
3. Select a mapping configuration that contains the optimizer level you want to apply, or create a mapping configuration.
4. Click the Advanced tab.
5. Change the optimizer level, if necessary.
6. Click Apply.
7. Click Run to run the mapping. The Developer tool runs the mapping with the optimizer level in the selected mapping configuration.

Setting the Optimizer Level for a Deployed Mapping

Set the optimizer level for a mapping that you run from the command line by changing the mapping deployment properties in the application. The mapping must be in an application.

1. Open the application that contains the mapping.
2. Click the Advanced tab.
3. Select the optimizer level.
4. Save the application.

After you change the optimizer level, you must redeploy the application.

Authors

Lori Troy
Technical Writer

Ashlee Brinan
Principal Technical Writer