Informatica V9 Sizing Guide

Overview of Document

This document shows average sizing for V9 installs at three levels. The first is the size of the installed elements on the file system. The second is the runtime footprint of the general V9 services shared by all users. The third is the additional memory and disk overhead of an individual user's running mappings.

The mappings' contribution to disk and memory usage is usually the most critical figure and the most difficult to average without specific details. The details below can be used as a basis for scaling calculations that take into account the number of concurrent mappings submitted to the server, the transforms used in each mapping, and the size of the input data in rows and columns.

Base On-Disk Install Size

A typical Server Platform install takes about 3.2 GB of disk. This does not include disk usage for the reference data items listed in Appendix 1 and Appendix 2, nor does it take account of the database usage of a typical set of reference data items. The numbers below are for a full set of all these content elements on the file system. Depending on an individual customer's usage, Appendix 1 and Appendix 2 may be used to estimate disk sizing more exactly.

Element                         Size
Server Platform Install Size    3.2 GB
Identity Reference Data         600 MB
Address Reference Data          4 GB
Reference Table Data            3 GB

This gives a rounded base figure of about 12 GB. It does not include additional customer reference data or increases in Address Reference data, which is amended as country postal authorities add data.

General Runtime Memory Sizing

The sizing below is for an average running server with no disk/memory intensive mappings and no loaded content.

No   Service Name      Virtual Set   Working Set
1.   Admin Console     773K          133K
2.   MRS               1288K         407K
3.   Mapping Service   978K          254K
4.   Analyst Tool      702K          79K

This table shows the average sizes of the four V9 services in a typical configuration. The Virtual Set is the total memory in virtual memory and the Working Set is the physically resident memory usage.

Address Validation Reference Data

This data is loaded globally for all users. The customer's configuration dictates which AV reference files are loaded. The Address Validation file size guide in Appendix 1 may be used to estimate memory usage. The average size in memory of each loaded element is approximately the same as its disk footprint. For example, if a user runs a mapping that uses the following reference file:

United States Batch/Interactive   533 MB

the process memory size would be expected to grow by approximately 533 MB. This memory cost is paid once and lasts for the lifetime of the server; it covers all mappings run during that lifetime. For performance reasons, the loaded Address Validation data is not unloaded even when there are no current users.

User Created Mapping Memory and Disk Footprint

This section is split into three types of Data Quality component. The Standard elements do not incur any additional memory or disk cost beyond their standard running size. The Dynamic components are of two different types: reference data based transforms, which hold reference table lookup structures in memory, and Dynamic transforms, which can include items like third party engines, sort space or B-tree storage. The Dynamic transforms use both memory and disk, in amounts that can vary considerably depending on the data being processed.

Standard DQ Transformations

Comparison Transformation
Decision Transformation
Merge Transformation

None of these transforms has dynamic memory or disk usage that varies with the size of the data being processed. All these components are referred to as passive, since they process data rows in small batches and send them to the next component in the mapping immediately.

Reference Data Based Transformations

Case Convertor Transformation
Labeller Transformation
Parser Transformation
Standardiser Transformation

These transforms are all based around the use of reference data. While they are all passive, in that they process data immediately, they have initialisation costs that increase memory usage depending on configuration. This makes their memory usage dynamic with respect to the transform's configuration, but not with respect to the number of rows presented for processing.

While the reference data is managed in a database for editing, at runtime it is held in memory for performance. To optimise throughput, this in-memory storage is designed for speed rather than space efficiency. The current list of available reference tables is around 3.5K, so a list of tables and in-memory sizes is not included. Each transform has its own copy of the in-memory reference data.

To estimate the size, take the average number of bytes per column of the reference table, multiply it by the number of columns and by the number of rows, and then multiply the result by 1.3. This gives an approximate guide to the in-memory footprint. For example, a reference table with 10K rows and 6 columns, with an average byte count per column of 25, gives 10000 * 6 * 25 * 1.3 = 1,950,000 bytes, or approximately 2 MB of runtime memory usage. This runtime memory cost lasts for the lifetime of the mapping. All in-memory reference tables are freed when the mapping finishes.

Dynamic DQ Transformations

All the following components have dynamic memory and disk usage. These components are referred to as active. In general they store large numbers of rows internally for block processing, and their memory/disk requirements increase in line with the volume of input rows and the number of columns per row.

Address Validator Transformation

This component is treated in the General Runtime Memory Sizing section, as it affects all users as soon as the first mapping is run.

Association Transformation

This component makes extensive use of B-tree file based storage.
Each column used in the association has its own B-tree, and a general B-tree is used to store all the input data rows. The Informatica B-tree is space efficient but not compressed. The general sizing guideline is as follows:

Each association column B-tree (on disk): the total volume of data in that column plus 20 bytes per input row. The worked example later in this document computes this as rows * (key bytes + 20).
General storage cache (on disk): the size of the input data set, with an allowance of 10 bytes per row.
Internal memory map of association IDs and rows (in memory): no larger than 20 bytes * the number of rows.

Sorter Based Transforms

Consolidation Transformation
Key Generator Transformation

These transforms all contain standard Informatica sort transforms. Currently they are all set to auto, an internal configuration which attempts to give the transform as much memory as possible without affecting system performance. When a user wants more explicit control, the sort transform can be given a limit on the maximum amount of main memory it can use to sort data. The on-disk temp size will grow with the input, as all data rows must be stored by the sort transform.

Match Transformation

The match transform makes use of two different types of B-tree depending on its configuration. When a user has configured both a set of pass-through ports and Identity matching, both types will be used. In general it can be assumed that the B-tree storage will not significantly exceed the disk space the data would occupy if it sat outside the B-tree on the file system.

Worked Sizing for US Based Customer

Because any individual customer will have problem-specific requirements, the following example shows how the data in this document may be applied to create more accurate sizing estimates. The example shows the sizing for both disk and memory for a 4 user DQ installation using US Address Validation, US Identity Matching and US Reference Dictionaries. Although this number of users is small, the variable elements of disk/memory usage are magnified when multiple users concurrently run disk and memory intensive transforms. The transforms that have individual requirements per mapping run are indicated in the document.

Base Server Disk Requirements   12 GB (calculation shown above)
Base Memory Requirements        2 GB (calculation shown above)

The assumption here is that a mapping without disk/memory sensitive components will add little beyond the standard footprint. This will not be true of very complex mappings.

User 1: Running a matching mapping

Dual Source Identity with Source1 containing 1M rows and Source2 containing 100K rows; 6 columns with 25 bytes per column, plus 20 columns of pass-through data with 25 bytes per column.

This mapping will have 2 sorters from the key generation phase, 1 B-tree from matching, 1 B-tree from Identity, and internal memory usage for Identity and clustering.

Disk Usage
  B-tree 1 Identity     = 1,100,000 * 6 * 25  = 165 MB
  B-tree 2 Pass-through = 1,100,000 * 20 * 25 = 550 MB
Memory Usage
  Internal storage for the large number of transforms used for matching = 10 MB
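
The User 1 disk figures above can be reproduced with a short calculation. This is a minimal sketch, not an Informatica tool: the function name `btree_disk_bytes` is hypothetical, and it assumes (as the Match Transformation section suggests) that B-tree storage is roughly equal to the raw data volume, with 1 MB taken as 1,000,000 bytes.

```python
MB = 1_000_000  # assumption: decimal megabytes, matching the rounded figures in the text


def btree_disk_bytes(rows: int, columns: int, bytes_per_column: int) -> int:
    """Estimate B-tree disk usage as the raw data volume (rows * columns * bytes).

    Hypothetical helper: the guide only says B-tree storage will not
    significantly exceed the size of the data outside the B-tree.
    """
    return rows * columns * bytes_per_column


# Dual source: Source1 (1M rows) plus Source2 (100K rows)
total_rows = 1_000_000 + 100_000

identity_mb = btree_disk_bytes(total_rows, columns=6, bytes_per_column=25) / MB
passthrough_mb = btree_disk_bytes(total_rows, columns=20, bytes_per_column=25) / MB

print(f"B-tree 1 Identity:     {identity_mb:.0f} MB")
print(f"B-tree 2 Pass-through: {passthrough_mb:.0f} MB")
```

Changing the row counts, column counts or column widths gives a quick estimate for other matching mappings under the same assumption.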

User 2: Running an AV mapping

Single Source with Source1 containing 1M rows.

This mapping will have minimal transforms but will load all of the US Address Validation reference data.

  United States Batch/Interactive   533 MB
  United States GeoCoding           422 MB
  United States FastCompletion      380 MB

Total Disk added = 0
Memory Usage = 533 + 422 + 380 = approximately 1.3 GB

User 3: Running Standardisation

Single Source with Source1 containing 10M rows.

This mapping will have minimal transforms but will load 10 dictionaries to standardise. Assume each dictionary has 10K rows with 5 columns and 25 bytes average per column.

Total Disk added = 0
Memory Usage = 10000 * 5 * 25 * 1.3 = approximately 1.6 MB per dictionary
Total Memory = 16 MB

User 4: Running Association

Single Source with Source1 containing 10M rows and association running across 8 groups.

This mapping will not have other matching transforms and will source data directly from a single table. Each association key column will have a 10 byte key, and there will be 10 additional columns of row data, each 50 bytes wide.

Each key column B-tree will take 10M * (10 + 20) = 300 MB
General storage will take 10M * ((8 * 10) + (10 * 50)) = 5.8 GB
Total Disk = 300 MB * 8 columns + 5.8 GB = 8.2 GB
Total Memory = 10M * 20 = 200 MB

Total Additional Memory/Disk used by the 4 concurrently running mappings

Disk   = 165 MB + 550 MB + 8200 MB = 8915 MB
Memory = 10 MB + 1300 MB + 16 MB + 200 MB = 1526 MB
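
The figures for Users 3 and 4 and the combined totals can be checked with the sketch below. It is illustrative only: the function names are hypothetical, 1 MB is taken as 1,000,000 bytes, and the association formulas follow the worked example above (key bytes plus 20 bytes per row for each key B-tree, raw row size for general storage). The User 1 and User 2 figures are taken from the text as given.

```python
MB = 1_000_000  # assumption: decimal megabytes, as in the rounded figures above


def reference_table_memory(rows, columns, avg_bytes, factor=1.3):
    """In-memory size of one reference table: rows * columns * bytes * 1.3."""
    return rows * columns * avg_bytes * factor


def association_disk(rows, key_columns, key_bytes, data_columns, data_bytes):
    """On-disk size for an association, following the User 4 calculation."""
    # One B-tree per key column: key bytes plus 20 bytes of overhead per row
    key_btrees = key_columns * rows * (key_bytes + 20)
    # General storage holds every key and data column of every row
    general = rows * (key_columns * key_bytes + data_columns * data_bytes)
    return key_btrees + general


def association_memory(rows):
    """Upper bound on the in-memory association map: 20 bytes per row."""
    return rows * 20


# User 3: 10 dictionaries of 10K rows, 5 columns, 25 bytes per column
user3_mem_mb = 10 * reference_table_memory(10_000, 5, 25) / MB

# User 4: 10M rows, 8 key columns of 10 bytes, 10 data columns of 50 bytes
user4_disk_mb = association_disk(10_000_000, 8, 10, 10, 50) / MB
user4_mem_mb = association_memory(10_000_000) / MB

# Figures taken directly from the text for Users 1 and 2
user1_disk_mb, user1_mem_mb = 165 + 550, 10
user2_mem_mb = 533 + 422 + 380

total_disk_mb = user1_disk_mb + user4_disk_mb
total_mem_mb = user1_mem_mb + user2_mem_mb + user3_mem_mb + user4_mem_mb

print(f"Total disk:   {total_disk_mb:.0f} MB")
print(f"Total memory: {total_mem_mb:.0f} MB")
```

The unrounded memory total comes out slightly higher than the 1526 MB quoted above, because the text rounds User 2 down to 1300 MB and User 3 to 16 MB before adding.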

Summary

The data in this document estimates the standard disk and memory footprint of the V9 server. In addition, the two appendix tables at the end of the document allow a user to minimise the on-disk footprint of the install if this is required. The worked sizing example shows how to estimate a mapping's contribution to disk/memory by analysing the composition of the mapping and each transform's contribution to disk/memory usage. The example also shows the importance of factoring in the number of concurrent users and their likely usage when defining the total peak requirements of an individual installation.

Appendix 1: Address Validation Reference Data with On-Disk Size (Largest 50 Files)

File                                      Size
United States Batch/Interactive           533 MB
United Kingdom FastCompletion             501 MB
United States GeoCoding                   422 MB
United States FastCompletion              380 MB
United Kingdom Batch/Interactive          306 MB
France FastCompletion                     210 MB
France Batch/Interactive                  153 MB
Argentina FastCompletion                  120 MB
Brazil FastCompletion                     104 MB
Germany FastCompletion                    102 MB
Germany Batch/Interactive                 99 MB
United Kingdom Supplementary              94.5 MB
Italy FastCompletion                      92.9 MB
Argentina Batch/Interactive               90 MB
Canada FastCompletion                     83.1 MB
India FastCompletion                      83.1 MB
India Batch/Interactive                   80 MB
Germany GeoCoding                         73.5 MB
Brazil Batch/Interactive                  73.3 MB
Italy Batch/Interactive                   66 MB
Canada Batch/Interactive                  61.8 MB
United Kingdom GeoCoding                  51.8 MB
Sweden FastCompletion                     49 MB
Mexico FastCompletion                     48.5 MB
Australia FastCompletion                  44.6 MB
Russian Federation FastCompletion         44.3 MB
Mexico Batch/Interactive                  42.8 MB
Australia Batch/Interactive               40.9 MB
Russian Federation Batch/Interactive      40.5 MB
France GeoCoding                          39.7 MB
Portugal FastCompletion                   38.8 MB
Italy GeoCoding                           36.6 MB
Netherlands FastCompletion                35.5 MB
Canada GeoCoding                          32.7 MB
China FastCompletion                      28.4 MB
Netherlands Batch/Interactive             27.8 MB
Sweden Batch/Interactive                  27.4 MB
Spain GeoCoding                           25.6 MB
Australia GeoCoding                       25.4 MB
Spain FastCompletion                      23.7 MB
Chile FastCompletion                      23.4 MB
Netherlands GeoCoding                     22.7 MB
Portugal Batch/Interactive                22.5 MB
China Batch/Interactive                   21.4 MB
Finland GeoCoding                         18.8 MB
Switzerland FastCompletion                18.2 MB
Sweden GeoCoding                          17.8 MB
Chile Batch/Interactive                   16.8 MB
Belgium FastCompletion                    16.1 MB
Spain Batch/Interactive                   15.4 MB

The full list can be found at: http://www.addressdoctor.com/en/support/countrydownloadv5.asp

Appendix 2: Identity Based Matching Reference Data with On-Disk Size

File                      Size
IM_japan_i.zip            86,222,167
IM_japan.zip              86,222,153
IM_japan_r.zip            15,754,935
IM_gaelic.zip             9,237,372
IM_canada.zip             8,933,319
IM_international.zip      5,303,974
IM_chinese_s.zip          4,955,588
IM_south_africa.zip       4,260,152
IM_uk.zip                 4,241,637
IM_ireland.zip            4,241,357
IM_new_zealand.zip        4,200,805
IM_australia.zip          4,153,252
IM_usa.zip                4,134,750
IM_arabic_m.zip           3,893,388
IM_indonesia.zip          3,494,046
IM_cyrillic.zip           3,022,104
IM_arabic_r.zip           2,980,176
IM_singapore.zip          2,505,578
IM_india.zip              2,321,418
IM_chinese_t.zip          2,189,993
IM_aml.zip                2,083,153
IM_greek_l.zip            2,057,442
IM_switzerland.zip        2,028,497
IM_france.zip             1,950,898
IM_philippines.zip        1,896,332
IM_luxembourg.zip         1,812,614
IM_belgium.zip            1,696,864
IM_germany.zip            1,604,137
IM_brasil.zip             1,596,925
IM_portugal.zip           1,596,786
IM_korean_r.zip           1,588,819
IM_italy.zip              1,554,842
IM_turkey.zip             1,552,887
IM_hk_r.zip               1,542,915
IM_sweden.zip             1,528,272
IM_czech.zip              1,525,846
IM_netherlands.zip        1,476,954
IM_taiwan_r.zip           1,473,532
IM_denmark.zip            1,473,231
IM_slovakia.zip           1,458,393
IM_malaysia.zip           1,447,577
IM_thai_r.zip             1,443,929
IM_spain.zip              1,438,526
IM_chinese_r.zip          1,431,129
IM_colombia.zip           1,414,047
IM_argentina.zip          1,413,962
IM_indo_chin_r.zip        1,410,620
IM_chile.zip              1,400,965
IM_peru.zip               1,389,800
IM_vietnam_r.zip          1,379,744
IM_puerto_rico.zip        1,372,143
IM_mexico.zip             1,344,656
IM_thai.zip               1,279,607
IM_finland.zip            1,273,884
IM_norway.zip             1,273,795
IM_poland.zip             1,261,906
IM_greek.zip              1,247,548
IM_hungary.zip            1,205,908
IM_estonia.zip            1,092,791
IM_korean.zip             821,290
IM_ofac.zip               759,006
IM_hebrew.zip             754,978
IM_chinese_i.zip          544,844
IM_arabic.zip             297,401
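
As a rough illustration of how Appendix 1 can be used to trim the install to a customer's needs, the sketch below sums the on-disk size of only the Address Validation files a customer actually loads. The dictionary holds a small excerpt of Appendix 1, and the name `av_footprint_mb` is hypothetical. Because the guide states that in-memory size is approximately the same as on-disk size, the same total also approximates the server memory growth.

```python
# Excerpt of Appendix 1: (country, file type) -> on-disk size in MB
AV_FILES_MB = {
    ("United States", "Batch/Interactive"): 533,
    ("United States", "GeoCoding"): 422,
    ("United States", "FastCompletion"): 380,
    ("United Kingdom", "Batch/Interactive"): 306,
    ("United Kingdom", "FastCompletion"): 501,
    ("Canada", "Batch/Interactive"): 61.8,
}


def av_footprint_mb(selected):
    """Sum the sizes of the selected (country, file type) reference files."""
    return sum(AV_FILES_MB[key] for key in selected)


# A US-only customer loading all three US file types
us_only = [
    ("United States", "Batch/Interactive"),
    ("United States", "GeoCoding"),
    ("United States", "FastCompletion"),
]

print(f"US-only AV footprint: {av_footprint_mb(us_only)} MB")
```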