Configuring AWS IAM Authentication for Informatica Cloud Amazon Redshift Connector

Copyright Informatica LLC 2015, 2017. Informatica, the Informatica logo, and Informatica Cloud are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html

Abstract

You can use AWS Identity and Access Management (IAM) to control individual and group access to Amazon Redshift resources. You can configure AWS IAM to run tasks on the Secure Agent that is installed on the EC2 system. The AWS IAM service provides enhanced security. This article describes the guidelines to configure IAM authentication for Informatica Cloud Amazon Redshift Connector.

Supported Versions

Informatica Cloud Fall 2016 December

Table of Contents

Overview
Create Minimal Amazon S3 Bucket Policy
Create the Amazon EC2 Role
Create the Amazon Redshift Role
Add Amazon Redshift Role to the Redshift Cluster
Create an Amazon Redshift Connection
Create a Data Synchronization Task

Overview

You are a business analyst for an e-commerce organization. The organization stores product and customer data in an on-premise MySQL database. You want to securely read data from the on-premise MySQL database and write the data to Amazon Redshift, where you can analyze it to make business decisions and enhance customer relationships.

To control access to Amazon Redshift resources, you can define permissions for users by configuring AWS Identity and Access Management (IAM).

Perform the following steps to configure IAM authentication:

1. Create a minimal Amazon S3 bucket policy.
2. Create an Amazon EC2 Role and an EC2 instance.
3. Create an Amazon Redshift Role ARN.
4. Add the Amazon Redshift Role ARN to the Amazon Redshift cluster.
5. Create an Amazon Redshift connection.
6. Create a Data Synchronization task.

Create Minimal Amazon S3 Bucket Policy

The minimal Amazon S3 bucket policy ensures that Amazon Redshift Connector can perform read and write operations successfully. You can restrict user operations and user access to a particular Amazon S3 bucket by assigning an AWS IAM policy to the users. Configure the AWS IAM policy through the AWS console.
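
If you prefer to generate the policy document in a script rather than type it in the AWS console, you can assemble it as plain JSON. The following Python sketch is an illustration only and is not part of the connector. It builds a policy that contains the minimum permissions that this section describes. The function name build_minimal_s3_policy and the bucket name my-redshift-staging are hypothetical. The sketch assumes that the bucket-level actions (ListBucket, GetBucketPolicy) apply to the bucket ARN and that the object-level actions apply to the objects under it, so it lists both resources.

```python
import json

# Minimum permissions named in this article for Amazon Redshift Connector.
MINIMUM_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:ListBucket",
    "s3:GetBucketPolicy",
]


def build_minimal_s3_policy(bucket_name: str) -> dict:
    """Return an IAM policy document (as a dict) for one S3 bucket.

    Hypothetical helper for illustration; not an Informatica or AWS API.
    """
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(MINIMUM_ACTIONS),
                "Resource": [
                    # Objects in the bucket (object-level actions).
                    f"arn:aws:s3:::{bucket_name}/*",
                    # The bucket itself (bucket-level actions).
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "my-redshift-staging" is a placeholder bucket name.
    print(json.dumps(build_minimal_s3_policy("my-redshift-staging"), indent=2))
```

You can paste the printed JSON into the policy editor in the AWS console after you replace the placeholder bucket name with your own.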

The following minimum permissions are required to successfully read data from and write data to Amazon Redshift resources:

- PutObject
- GetObject
- GetObjectVersion
- DeleteObject
- DeleteObjectVersion
- ListBucket
- GetBucketPolicy

The following snippet shows a sample Amazon S3 bucket policy. The first resource covers the objects in the bucket, and the second resource covers the bucket itself, which bucket-level actions such as ListBucket require:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:ListBucket",
                    "s3:GetBucketPolicy"
                ],
                "Resource": [
                    "arn:aws:s3:::<specify_bucket_name>/*",
                    "arn:aws:s3:::<specify_bucket_name>"
                ]
            }
        ]
    }

You can create the Amazon S3 bucket in any of the following regions that Amazon S3 supports:

- Asia Pacific (Mumbai)
- Asia Pacific (Seoul)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Tokyo)
- Canada (Central)
- EU (Ireland)
- EU (Frankfurt)
- South America (Sao Paulo)
- US East (N. Virginia)
- US East (Ohio)
- US West (N. California)
- US West (Oregon)

Create the Amazon EC2 Role

Create an Amazon EC2 Role to assign to the EC2 system on which you install the Secure Agent.

1. Log in to the AWS Console.
2. Click Identity and Access Management.
3. Select Role under the Details menu and click Create New Role.
4. Specify the name of the role in the Set Role Name page.
5. Click Next Step.

6. Select the Amazon EC2 role type in the Select Role Type page.
7. Select the required Amazon S3 Policy in the Attach Policy page.
8. Click Next Step.
9. Review the Role Name, Role ARN, Trusted Entities, and Policies values in the Review page.
10. Click Create Role.

After you create the Amazon EC2 Role, create an EC2 instance and assign the Amazon EC2 Role to the instance. For more information about creating an EC2 instance and assigning an Amazon EC2 Role to the Amazon EC2 instance, see the AWS documentation.

Create the Amazon Redshift Role

Use the Amazon Redshift Role for secure access to Amazon Redshift resources.

1. Log in to the AWS Console.
2. Click Identity and Access Management.
3. Select Role under the Details menu and click Create New Role.
4. Specify the name of the role in the Set Role Name page.
5. Click Next Step.
6. Select the Amazon Redshift role type in the Select Role Type page.
7. Select the required Amazon S3 Policy in the Attach Policy page.
8. Click Next Step.
9. Review the Role Name, Role ARN, Trusted Entities, and Policies values in the Review page.
10. Click Create Role.

The Amazon Redshift Role is created and has an ARN, for example, arn:aws:iam::123123456789:role/redshift_write. You must assign this role to the Amazon Redshift cluster to successfully perform the read and write operations.

Add Amazon Redshift Role to the Redshift Cluster

After you create an Amazon Redshift Role, you must associate the role with an Amazon Redshift cluster to read data from and write data to the Amazon Redshift target.

1. Log in to the AWS Console.

2. Click Amazon Redshift under the Database option.
3. Click Clusters under Dashboard and select your cluster.
4. Click Manage IAM Roles. The Manage IAM Roles dialog box appears.
5. Select the required Amazon Redshift Role. For example, arn:aws:iam::123123456789:role/redshift_write.
6. Click Apply changes.

After you add the Amazon Redshift Role to the Redshift cluster, install the Secure Agent on the EC2 instance. For more information about installing a Secure Agent, see the Informatica Cloud online help.

Create an Amazon Redshift Connection

Create an Amazon Redshift connection and specify the connection properties so that IAM controls access to Amazon Redshift resources. When you create an Amazon Redshift connection, do not provide the Access Key ID and Secret Access Key.

[Image: sample values in the Amazon Redshift connection properties]

The Secure Agent uses the user name, password, and JDBC URL to validate the connection. When you configure the IAM Role, the Secure Agent uses the Amazon Resource Name (ARN) associated with the IAM Role to access the data in the Amazon Redshift target. When you run the Data Synchronization task, the Secure Agent validates the IAM policy.

Create a Data Synchronization Task

Create a Data Synchronization task to read data from an on-premise MySQL database and write the data to the Amazon Redshift target for analysis. Configure AWS IAM authentication for secure and controlled access to Amazon Redshift resources when you run the Data Synchronization task.

1. Select Task Wizard on the Informatica Cloud home page.

2. Select Data Synchronization from the menu. The Data Synchronization page appears.
3. Select New. The Definition tab appears.
4. Provide the task details.
   [Image: sample task details]
5. Select Next. The Source tab appears.
6. Provide source details to read data from the MySQL source.
   [Image: sample source details]
7. Select Next. The Target tab appears.
8. Select the target Connection and Target Object required for the task.
   [Image: sample target details]
9. Select Next.
10. The Data Filters tab appears. Configure data filters if required. The default is Process all rows.

11. Select Next. In the Field Mapping tab, map the source fields to the target fields.
12. Select Next. The Schedule tab appears.
13. Provide the appropriate values for the following advanced target properties:
    - S3 Bucket Name.
    - CopyOptions Property File. Specify the AWS IAM role that you created.
    Verify that the Amazon S3 bucket and the Amazon Redshift cluster reside in the same region.
    [Image: sample advanced target properties]
14. Click Save and Run the task.

The Secure Agent writes the data to the Amazon Redshift target when you specify the ARN in the advanced target properties.

The following snippet shows a sample COPY command from the session log:

    WRITER_1_*_1> Amazon_RedshiftWriter_10004 [2017-01-16 11:46:33.745] [INFO] The agent is running the following SQL query:
    copy public.master_account1_new (id, isdeleted, masterrecordid, name, type, parentid,
      billingstreet, billingcity, billingstate, billingpostalcode, billingcountry,
      billinglatitude, billinglongitude, shippingstreet, shippingcity, shippingstate,
      shippingpostalcode, shippingcountry, shippinglatitude, shippinglongitude, phone, fax,
      accountnumber, website, photourl, sic, industry, annualrevenue, numberofemployees,
      ownership, tickersymbol, description, rating, site, ownerid, createddate, createdbyid,
      lastmodifieddate, lastmodifiedbyid, systemmodstamp, lastactivitydate, lastvieweddate,
      lastreferenceddate, jigsaw, jigsawcompanyid, accountsource, sicdesc,
      customerpriority__c, sla__c, numberoflocations__c, upsellopportunity__c,
      slaserialnumber__c, slaexpirationdate__c, active__c, myemail__c, test1__c)
    from 's3://sample.name.bucket.csv.'
    credentials 'aws_iam_role=arn:aws:iam::123123456789:role/redshift_read'
    MAXERROR 1 CSV QUOTE '\037' NULL '' ACCEPTINVCHARS '?' DELIMITER '\036' ROUNDEC
    IGNOREHEADER 1 GZIP COMPUPDATE OFF;
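
To see how the IAM Role ARN fits into a COPY command like the one in the log, you can reconstruct the statement yourself. The following Python sketch is an illustration only. It does not show how the Secure Agent builds the query. The function name build_copy_statement and the ARN pattern are assumptions made for this example. The pattern is simplified: it accepts only the standard aws partition and a 12-digit account ID. The sketch also assumes trusted input and does not escape table names, column names, or paths.

```python
import re

# Simplified, assumed shape of an IAM role ARN:
# arn:aws:iam::<12-digit account ID>:role/<role name, optionally with a path>
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def build_copy_statement(table, columns, s3_path, role_arn, options=""):
    """Return a Redshift COPY statement that authenticates with an IAM role.

    Hypothetical helper for illustration; inputs are assumed to be trusted.
    """
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        raise ValueError(f"not a recognized IAM role ARN: {role_arn!r}")
    column_list = ", ".join(columns)
    statement = (
        f"copy {table} ({column_list}) "
        f"from '{s3_path}' "
        f"credentials 'aws_iam_role={role_arn}'"
    )
    if options:
        # Format and load options, for example "CSV GZIP IGNOREHEADER 1".
        statement += " " + options
    return statement + ";"


if __name__ == "__main__":
    print(
        build_copy_statement(
            table="public.master_account1_new",
            columns=["id", "name", "type"],
            s3_path="s3://sample.name.bucket.csv.",
            role_arn="arn:aws:iam::123123456789:role/redshift_read",
            options="MAXERROR 1 CSV IGNOREHEADER 1 GZIP COMPUPDATE OFF",
        )
    )
```

Because the credentials clause carries only the role ARN, the statement contains no Access Key ID or Secret Access Key.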

After you run the task, you are redirected to the Monitor Log page, where you can monitor the status of Data Synchronization tasks.

Authors

Fariyal Arif
Documentation Trainee

Chanchal Das
Lead Technical Writer

Shivaprasad Yallappagoudar
Lead QA Engineer