Jyotheswar Kuricheti


Agenda:
1. Performance Tuning Overview
2. Identify Bottlenecks
3. Optimizing at different levels: Target, Source, Mapping, Session, System


Performance Tuning Overview:

What is Performance Tuning? The goal of performance tuning.
How do you measure performance? Throughput.
Why is it critical? Load time is critical to meet the SLA for data availability in reports.
How do you improve performance?
1. Identify bottlenecks
2. Eliminate bottlenecks
3. Use the Test Load option to check whether a change improves performance
4. Add partitions
5. Change one variable at a time

Reasons for session performance issues:
- CPU: CPU-intensive operations, such as string manipulation inside an Expression transformation
- Memory/disk access: file system read/write issues; paging (lookup cache, etc.) due to lack of RAM; no free buffer blocks
- Network: database and PowerCenter servers connected over a WAN
- Input/output operations
- Poor or overly complex design
- Incorrect load strategies

Optimization can be done at different levels of Informatica:
1) Target level
2) Source level
3) Mapping level
4) Transformation level
5) Session level
6) Grid level
7) Component level
8) System level

Identify Bottlenecks:

WRT_8165 : TIMEOUT BASED COMMIT POINT
- A common message in the session log when there are session performance issues.
- It signifies that there are not enough rows available in memory to insert and issue a commit.
- It means there is a bottleneck in the source, the target, or one of the transformations; the bottleneck must be identified and removed to improve session performance.

Methods to identify performance bottlenecks:
1. Run test sessions
2. Analyze thread statistics
3. Analyze performance details
4. Monitor system performance

1. Run test sessions: Run a test load of a few records, reading from a flat file or writing to a flat file target, to isolate source and target bottlenecks. The throughput of the test run gives a precise measure of performance.
2. Analyze thread statistics: Analyze thread statistics to determine the optimal number of partition points.

3. Analyze performance details: Analyze performance details, such as performance counters, to determine where session performance decreases. Enable Collect Performance Data in the session properties; this is also useful when repository performance is a concern or when you want to see statistics in the Workflow Monitor.
4. Monitor system performance: Use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.

2. Using thread statistics: Thread statistics come from the session log file. Before going further, a few points about threads: the DTM (Data Transformation Manager) creates a master thread to run the session. For each target load order group in a mapping, the master thread can create several threads. The types of threads depend on the session properties and the transformations in the mapping; the number of threads depends on the partitioning information for each target load order group.
1. Mapping threads
2. Pre- and post-session threads
3. Reader threads
4. Transformation threads
5. Writer threads
Thread analysis judges mapping performance from the statistics of these threads; we can use the statistics to identify source, target, or transformation bottlenecks. The session log contains four figures that give performance details for each thread:

1. Run time: total time taken by a thread
2. Idle time: time during which the thread is idle
3. Busy percentage: (run time - idle time) / run time x 100
4. Thread work time: time taken by each transformation in a thread

Example:
MANAGER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_XXXX] has completed: Total Run Time = [576.620988] secs, Total Idle Time = [535.601729] secs, Busy Percentage = [7.113730].
MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_XXXX] has completed: Total Run Time = [577.301318] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [99.000000].
LKP_ADDRESS: 20.000000 percent
AGG_ADDRESS: 79.000000 percent
MANAGER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [TGT_XXXX] has completed: Total Run Time = [577.363934] secs, Total Idle Time = [492.602374] secs, Busy Percentage = [14.680785].

The thread with the highest busy percentage identifies the bottleneck in the session. Do not blindly add partition points just to spawn more threads: if the CPU is already busy with other work, partitioning only puts more pressure on it. A high busy percentage can also be ignored when the thread's total run time is under 60 seconds.
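Applying the formula to the reader thread above: (576.620988 - 535.601729) / 576.620988 x 100 = 7.11%, which matches the reported busy percentage. The transformation thread, busy for 99% of its run time, is therefore the bottleneck in this example, with AGG_ADDRESS accounting for 79% of its work and LKP_ADDRESS for 20%.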

Optimizing at different levels:

(1). Target-level optimization: There are two target types: flat file and database.
Flat file: If you face an issue with a flat file target, the problem usually lies not with the file itself but with the available storage space or the storage drive.
Database: While loading into a database, consider the following points:
1. Drop indexes and key constraints
2. Increase checkpoint intervals
3. Use bulk loading
4. Use external loading
5. Minimize deadlocks
6. Increase database network packet size
7. Optimize Oracle target databases

Drop indexes and key constraints: Loading is slower into tables that have indexes or key constraints defined. Use pre-session SQL commands to drop the indexes before the session loads, and rebuild the indexes or constraints with post-session SQL commands after the load completes (a sketch of these commands follows the details below).
Increase checkpoint intervals: Loading performance improves with fewer checkpoints, so increase the checkpoint interval in the database.
Use bulk loading: The Integration Service bypasses the database log, which speeds performance; the trade-off is that recovery is not possible.
Use external loaders: Almost every database ships its own loading utility, such as SQL*Loader for Oracle or the Teradata external loaders for Teradata; to increase performance further, run separate pipelines for separate partitions.
Minimize deadlocks: Avoid hitting the same target from multiple source systems at once; that is, do not try to populate a single target through multiple paths simultaneously. Use different target connection groups.
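A minimal sketch of the pre- and post-session SQL for the index handling described above; the table and index names are hypothetical and would be replaced with your own schema objects:

-- Pre-session SQL: drop the index so the load is not slowed by index maintenance.
DROP INDEX idx_tgt_sales_cust;

-- Post-session SQL: rebuild the index once the load has completed.
CREATE INDEX idx_tgt_sales_cust ON tgt_sales (customer_id);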

Increase database network packet size: If you determine that the problem is with the database, consult your DBA about increasing the network packet size, configured in listener.ora and tnsnames.ora (for Oracle, the session data unit, or SDU).
Optimize Oracle target databases: With help from your DBA, increase storage segment sizes or make other database-level changes, and tune the Oracle redo log settings in the init.ora file (an illustrative command follows).
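As an illustrative sketch only (the value is a placeholder, and any change belongs with your DBA): Oracle exposes the checkpoint interval mentioned earlier as an init parameter, which could be adjusted like this:

-- Fewer checkpoints generally mean faster sustained loads; an
-- SPFILE-scoped change takes effect at the next instance restart.
ALTER SYSTEM SET log_checkpoint_interval = 100000 SCOPE = SPFILE;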

(2). Source-level optimization:
1. Optimize the query
2. Use conditional filters
3. Increase database network packet size

Optimize the query: Join multiple sources in one Source Qualifier, use hints, and index the columns in GROUP BY and ORDER BY clauses. Configure the source database to run parallel queries (see the sketch after these notes).
Use conditional filters: Filter the source data, but analyze the data fully before applying source filters. Connect ports from the Source Qualifier only if they are needed in the target.
Increase database network packet size: As with targets, if the database is the problem, consult your DBA about increasing the network packet size in listener.ora and tnsnames.ora.
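A minimal sketch of a Source Qualifier SQL override following these guidelines; the tables, columns, and parallel degree are hypothetical, and the hint syntax shown is Oracle's:

-- Join in the database rather than in a Joiner transformation,
-- filter early, and ask the database for a parallel scan.
SELECT /*+ PARALLEL(o, 4) */
       o.order_id, o.order_date, c.customer_name
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  o.order_date >= DATE '2014-01-01'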

(3). Mapping-level optimization: Mapping-level optimization is a time-consuming process. Eliminate unwanted transformations, unwanted fields, and links. Do mapping optimization only after source- and target-level optimization.
1. Optimize flat file sources
2. Configure single-pass reading
3. Optimize Simple Pass Through mappings
4. Optimize filters
5. Optimize datatype conversions
6. Optimize expressions

Optimize flat file sources: For delimited files, avoid double or single quotes and escape characters; for fixed-width files, manage the sequential buffer length.
Configure single-pass reading: Consider single-pass reading if you have multiple sessions that use the same sources. It lets you populate multiple targets from one Source Qualifier and avoids using a Joiner for relational source tables.


Optimize Simple Pass Through mappings: If you are passing data straight from source to target, connect the Source Qualifier directly to the target. If you use the wizard to create a Simple Pass Through mapping, it adds an Expression transformation between the Source Qualifier and the target.
Optimize filters: If your source is a relational table, filter in the Source Qualifier; it removes rows that are not valid for the mapping before they enter the pipeline. For flat file sources, use a Filter transformation right after the Source Qualifier. Avoid complex filter conditions; prefer integer or true/false conditions.
Optimize datatype conversions: Eliminate unnecessary datatype conversions. Use integer values in the conditions of Lookup and Filter transformations, and know the source and target datatypes before converting.
Optimize expressions:
- Factor out common logic.
- Minimize aggregate function calls, e.g. use SUM(COLUMN_A + COLUMN_B) instead of SUM(COLUMN_A) + SUM(COLUMN_B).
- Call lookups conditionally.
- Use local variables in the Expression transformation.
- Use operators instead of functions.

(4). Transformation-level optimization: Transformation-level optimization can be considered part of mapping optimization; here is more detail on handling individual transformations effectively.
Optimizing Aggregator transformations: Aggregators often slow performance because they must group data before processing it.
1. Group on numeric columns instead of string and date columns.
2. Group on indexed columns.
3. Use sorted input: it reduces the amount of data cached, which improves performance.
4. Reduce complex logic in aggregator expressions.
5. Use incremental aggregation.
6. Filter data before you aggregate.
7. Limit port connections.

Optimizing Joiner transformations: A Joiner joins data from different sources into a single pipeline.
1. Designate as master the source with fewer duplicate key values.
2. Designate as master the source with fewer rows, since the Joiner compares each row of the detail source against the master.
3. Perform joins in the database; use the Source Qualifier to join relational tables.
4. Use sorted data for the join.
Optimizing Lookup transformations:
- Cache lookup tables, and use the appropriate cache type: static, shared, or persistent.
- Enable concurrent caches: set the number of additional concurrent pipelines to one or more.
- Optimize the lookup policy on multiple matches: with Use Any Value, performance can improve because the transformation does not index on all ports, yet it still returns the first value that matches the lookup condition.
- Reduce the number of cached rows.
- Override the ORDER BY statement (a sketch follows).
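A minimal sketch of a lookup SQL override for the last point; the table and columns are hypothetical. Sorting only on the lookup condition column reduces the work done while the cache is built, and the trailing two dashes comment out the ORDER BY clause that the Integration Service would otherwise append:

SELECT customer_id, customer_name
FROM   dim_customer
ORDER BY customer_id --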

Optimizing the lookup condition: When the condition uses the operators =, <, >, <=, >=, !=, place the equality conditions first and != last. Filter lookup rows, index the condition columns in the lookup table, and optimize multiple lookups.
Optimizing Sequence Generator transformations: Create a reusable Sequence Generator and use it in multiple mappings simultaneously. Configuring the Number of Cached Values property for the sequence numbers also gives good results.
Optimizing Sorter transformations: If the Integration Service cannot allocate enough memory to sort the data, it fails the session. For best performance, configure the Sorter cache size with a value less than or equal to the amount of physical RAM available on the Integration Service machine. The default size is 16 MB. Use the following formula to estimate the size of the incoming data:
#input rows x (Sum(column sizes) + 16)
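As an illustration with assumed numbers (not from the original): for 2,000,000 input rows whose column sizes sum to 104 bytes, the incoming data is 2,000,000 x (104 + 16) = 240,000,000 bytes, roughly 229 MB, so a Sorter cache of at least that size keeps the sort in memory.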

Optimizing Source Qualifier transformations: Use Select Distinct and source filters where appropriate, and tune the query.
Optimizing SQL transformations:
- Use query mode instead of script mode.
- Do not use transaction statements such as COMMIT or ROLLBACK in an SQL transformation query.
- In query mode, construct a static query using parameter binding instead of string substitution in the SQL Editor (see the sketch below).
- Choose a static connection instead of a dynamic connection.
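A minimal sketch of a static query with parameter binding, assuming a hypothetical input port CUST_ID and table dim_customer; the ?port? notation binds an input port's value into the query, so the database can reuse the prepared statement instead of parsing a new string per row:

SELECT customer_name, customer_tier
FROM   dim_customer
WHERE  customer_id = ?CUST_ID?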

(5). Session-level optimization:
1. Grid
2. Pushdown optimization
3. Concurrent sessions and workflows
4. Buffer memory
5. Caches
6. Target-based commit
7. Real-time processing
8. Staging areas
9. Log files
10. Error tracing
11. Post-session emails

1. Grid: A Load Balancer distributes tasks to nodes without overloading any node. A grid can improve performance when the bottleneck is in the extract and load steps of a session, or when memory or temporary storage is the bottleneck, e.g. Sorter, Aggregator, and Joiner transformations, which store intermediate results.
2. Pushdown optimization: The Integration Service executes SQL against the source or target database instead of processing the transformation logic itself (a hypothetical example follows this list).
3. Concurrent sessions and workflows: Run independent sessions and workflows concurrently to use the available hardware more fully.
4. Buffer memory: Adjust the DTM Buffer Size and the Default Buffer Block Size.
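To illustrate pushdown with a made-up example (the actual SQL is generated by the Integration Service, and the tables here are hypothetical): with full pushdown, a mapping's filter and expression logic might run inside the database as a single statement such as:

-- Filter and expression logic executed by the database,
-- not by the Integration Service.
INSERT INTO tgt_orders (order_id, amount_usd)
SELECT order_id, amount * exchange_rate
FROM   src_orders
WHERE  order_status = 'SHIPPED'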

5. Caches: Limit the number of connected ports. On a 64-bit platform, the Integration Service is not limited to the 2 GB cache limit of a 32-bit platform. If the allocated cache is not large enough to hold the data, the Integration Service stores the data in a temporary disk file, a cache file, and performance slows each time it pages to that file. The Transformation_readfromdisk and Transformation_writetodisk counters for an Aggregator, Rank, or Joiner transformation show how many times the Integration Service paged to disk while processing the transformation.
6. Target-based commit: If the commit interval is too high, the Integration Service may fill the database log file and cause the session to fail.

7. Real-time processing: Increase the flush latency to improve throughput. The source-based commit interval determines how often the Integration Service commits real-time data to the target; to obtain the lowest latency, set the source-based commit to 1.
8. Staging areas: The Integration Service can read multiple sources in a single pass, which can reduce the need for staging areas.
9. Log files: Workflows and sessions always create binary logs, which can be accessed in the Administrator tool.
10. Error tracing: Set the tracing level appropriately. Use Verbose for debugging; use Terse when you do not want to log error messages for rejected data.

11. Post-session emails: When you configure a post-session email to attach the session log, configure the session to write the log to file, and enable flat file logging.

(6). Optimizing grid deployments: Add nodes to the grid; increase storage capacity and bandwidth; use shared file systems; use a high-throughput network.

(7). Optimizing PowerCenter repository performance:
- Ensure the Repository Service process runs on the same machine where the repository database resides.
- Order conditions in object queries.
- Use a single-node tablespace for the PowerCenter repository if you install it on a DB2 database.
- Optimize the database schema for the PowerCenter repository on DB2 or Microsoft SQL Server by enabling the Optimize Database Schema option for the Repository Service in the Administration Console.

Optimizing Integration Service performance:
- Use native drivers instead of ODBC drivers for the Integration Service.
- Run the Integration Service in ASCII data movement mode if character data is 7-bit ASCII or EBCDIC; ASCII mode takes 1 byte to store each character, whereas Unicode mode takes 2 bytes.
- Cache PowerCenter metadata for the Repository Service.
- Run the Integration Service with high availability: it then recovers workflows and sessions that fail because of temporary network or machine failures. To support recovery, the Integration Service writes the state of each workflow and session to temporary files in a shared directory, which may decrease performance.

(8). Optimizing the system:
- Improve network speed: minimize the number of network hops between the source and target databases and the Integration Service. A local disk can move data 5 to 20 times faster than a network, so store flat file sources and targets on the Integration Service machine, move the target database to a server system if possible, and ask a network engineer to provide enough bandwidth.
- Use multiple CPUs to run multiple sessions in parallel.
- Reduce paging.
- Use processor binding.

Using pipeline partitions: After you tune the application, databases, and system for maximum single-partition performance, you may find that the system is under-utilized. At this point, you can configure the session to have two or more partitions. To improve performance, ensure the number of pipeline partitions equals the number of database partitions, use the database partitioning partition type for source and target databases, and enable parallel queries and inserts (a sketch follows). Typical partition types per transformation: Source Qualifier, pass-through; Filter, round-robin; Sorter, hash auto-keys. Delete the default partition point at the Aggregator.
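A minimal sketch (hypothetical table, Oracle syntax) of a range-partitioned target to pair with the database partitioning partition type, so that each session partition can load its own table partition in parallel:

CREATE TABLE tgt_sales (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER
)
-- One table partition per time range; session partitions
-- aligned with these can insert in parallel.
PARTITION BY RANGE (sale_date) (
    PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01'),
    PARTITION p2014 VALUES LESS THAN (DATE '2015-01-01')
);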

Performance counters: All transformations have counters; the Integration Service tracks the number of input rows, output rows, and error rows for each transformation. Some transformations have additional performance counters: right-click the session in the Workflow Monitor and choose Properties, then click the Properties tab in the details dialog box. The counters to watch are:
- Errorrows
- Readfromcache and Writetocache
- Readfromdisk and Writetodisk
- Rowsinlookupcache

If these counters display any number other than zero, you can increase the cache sizes to improve session performance.

© 2014 by Author (Jyotheswar Kuricheti). All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Author.