Informatica Data Lake Management on the AWS Cloud


Informatica Data Lake Management on the AWS Cloud

Quick Start Reference Deployment

January 2018

Informatica Big Data Team
Vinod Shukla
AWS Quick Start Reference Team

Contents

Overview
Informatica Components
Costs and Licenses
Architecture
Informatica Services on AWS
Planning the Data Lake Management Deployment
Deployment Options
Prerequisites
Deployment Steps
Step 1. Prepare Your AWS Account
Step 2. Upload Your Informatica License
Step 3. Launch the Quick Start
Step 4. Monitor the Deployment
Step 5. Download and Install Informatica Developer
Manual Cleanup
Troubleshooting
Using Informatica Data Lake Management on AWS
Transient and Persistent Clusters
Common AWS Architecture Patterns for Informatica Data Lake Management
Process Flow
Additional Resources
GitHub Repository
Document Revisions

This Quick Start deployment guide was created by Amazon Web Services (AWS) in partnership with Informatica. Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start reference deployment guide provides step-by-step instructions for deploying the Informatica Data Lake Management solution on the AWS Cloud. A data lake uses a single, Hadoop-based data repository that you create to manage the supply and demand of data. Informatica's solution on the AWS Cloud integrates, organizes, administers, governs, and secures large volumes of both structured and unstructured data. The solution delivers actionable, fit-for-purpose, reliable, and secure information for business insights.

Consider the following key principles when you implement a data lake:

- The data lake must remove barriers to onboarding data of any type and size from any source.
- Data must be easily refined and immediately provisioned for consumption.
- Data must be easy to find, retrieve, and share within the organization.
- Data is a corporate asset, managed accountably and collaboratively by data governance, data quality, and data security initiatives.

This Quick Start is for users who want to deploy and develop an Informatica Data Lake Management solution on the AWS Cloud.

Informatica Components

The Data Lake Management solution uses the following Informatica products:

- Informatica Big Data Management enables your organization to process large, diverse, and fast-changing datasets so you can get insights into your data. Use Big Data Management to perform big data integration and transformation without writing or maintaining Apache Hadoop code. Collect diverse data faster, build business logic in a visual environment, and eliminate hand-coding to get insights from your data.

- Informatica Enterprise Data Catalog brings together all data assets in an enterprise and presents a comprehensive view of the data assets and their relationships. Enterprise Data Catalog captures the technical, business, and operational metadata for a large number of data assets that you use to determine the effectiveness of enterprise data. From across the enterprise, Enterprise Data Catalog gathers information related to metadata, including column data statistics, data domains, data object relationships, and data lineage. A comprehensive view of enterprise metadata can help you make critical decisions on data integration, data quality, and data governance in the enterprise.

- The Developer tool includes the native and Hadoop run-time environments for optimal processing. In the native environment, the Data Integration Service processes the data. In the Hadoop environment, the Data Integration Service pushes the processing to nodes in a Hadoop cluster.

Costs and Licenses

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start.

The AWS CloudFormation template for this Quick Start includes configuration parameters that you can customize. Some of these settings, such as instance type, will affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you will be using.

This Quick Start requires a license to deploy the Informatica Data Lake Management solution, as described in the Prerequisites section. To sign up for a demo license, contact Informatica.

Architecture

Figure 1 shows the typical components of a generic data lake management solution.

Figure 1: Components of a data lake management solution

The solution includes the following core components, beginning with the lower part of the diagram in Figure 1:

- Big Data Infrastructure: From a connectivity perspective (for example, on-premises, cloud, IoT, unstructured, and semi-structured sources), the solution reliably accommodates an expanding volume and variety of data types. The solution can scale up (by increasing individual hardware capacity) or scale out (by increasing infrastructure capacity linearly for parallel processing), and can be deployed directly into your AWS environment.

- Big Data Storage: The solution can store large amounts of a variety of data (structured, unstructured, semi-structured) at scale, with performance that guarantees timely delivery of data to business analysts.

- Big Data Processing: The solution can process data at any latency, such as real time, near real time, and batch, using big data processing frameworks such as Apache Spark.

- Metadata Intelligence manages all the metadata from a variety of data sources. For example, a data catalog manages data generated by big data and by traditional sources. To do this, it collects, indexes, and applies machine learning to metadata. It also provides metadata services such as semantic search, automated data domain discovery and tagging, and data intelligence that can guide user behavior.

- Big Data Integration: A data lake architecture must integrate data from various disparate data sources, at any latency, with the ability to rapidly develop ELT (extract, load, and transform) or ETL (extract, transform, and load) data flows.

- Big Data Governance and Quality are critical to a data lake, especially when dealing with a variety of data. The purpose of big data governance is to deliver trusted, timely, and relevant information to support business outcomes.

- Big Data Security is the process of minimizing data risk. Activities include discovering, identifying, classifying, and protecting sensitive data, as well as analyzing its risk based on value, location, protection, and proliferation.

- Finally, Intelligent Data Applications (Self-Service Data Preparation, Enterprise Data Catalog, and Data Security Intelligence) provide data analysts, data scientists, data stewards, and data architects with a collaborative self-service platform for data governance and security that can discover, catalog, and prepare data for big data analytics.

Informatica Services on AWS

Deploying this Quick Start with default parameters builds the Informatica Data Lake environment illustrated in Figure 2 in the AWS Cloud. The Quick Start deployment automatically creates the following Informatica elements:

- Domain
- Model Repository Service
- Data Integration Service

In addition, the deployment automatically embeds Hadoop clusters in the virtual private cloud (VPC) for metadata storage and processing. The deployment then assigns the connection to the Amazon EMR cluster for the Hadoop Distributed File System (HDFS) and Hive. It also sets up connections to enable scanning of Amazon Simple Storage Service (Amazon S3) and Amazon Redshift environments as part of the data lake.

The Informatica domain and repository database are hosted on Amazon Relational Database Service (Amazon RDS) using Oracle, which handles management tasks such as backups, patch management, and replication.

To access Informatica services on the AWS Cloud, you can install the Informatica client to run Big Data Management on a Microsoft Windows machine. You can then access Enterprise Data Catalog by using a web browser.

Figure 2 shows the Informatica Data Lake Management solution deployed on AWS.

Figure 2: Informatica Data Lake Management solution deployed on AWS

The Quick Start sets up a highly available architecture that spans two Availability Zones, and a VPC configured with public and private subnets according to AWS best practices. Managed network address translation (NAT) gateways are deployed into the public subnets and configured with Elastic IP addresses for outbound internet connectivity.

The Quick Start also installs and configures the following Informatica services during the one-click deployment:

- Informatica domain, which is the fundamental administrative unit of the Informatica platform. The Informatica platform has a service-oriented architecture that provides the ability to scale services and share resources across multiple machines.

- Model Repository Service, which manages the model repository, a relational database that stores all the metadata for projects created using Informatica client tools. The model repository also stores run-time and configuration information for applications that are deployed to a Data Integration Service.

- Data Integration Service, which is a compute component within the Informatica domain that manages requests to submit big data integration, big data quality, and profiling jobs to the Hadoop cluster for processing.

- Content Management Service, which manages reference data. It provides reference data information to the Data Integration Service and Informatica Developer.

- Analyst Service, which runs the Analyst tool in the Informatica domain. The Analyst Service manages the connections between the service components and the users who log in to the Analyst tool. You can perform column and rule profiling, manage scorecards, and manage bad records and duplicate records in the Analyst tool.

- Profiling, which helps you find the content, quality, and structure of data sources of an application, schema, or enterprise. A profile is a repository object that finds and analyzes data irregularities across data sources in the enterprise, as well as hidden data problems that put data projects at risk. The profiling results include unique values, null values, data domains, and data patterns. When you use this Quick Start, you can run profiling on the Data Integration Service (the default) or on Hadoop.

- Business Glossary, which consists of online glossaries of business terms and policies that define important concepts within an organization. Data stewards create and publish terms that include information such as descriptions, relationships to other terms, and associated categories. Glossaries are stored in a central location for easy lookup by consumers. Glossary assets include business terms, policies, and categories that contain information that consumers might search for. A glossary is a high-level container that stores Glossary assets. A business term defines relevant concepts within the organization, and a policy defines the business purpose that governs practices related to the term. Business terms and policies can be associated with categories, which are descriptive classifications.

- Catalog Service, which runs Enterprise Data Catalog and manages connections between service components and external applications.

- An embedded Hadoop cluster that uses Hortonworks, running HDFS, HBase, YARN, and Solr.

- Informatica Cluster Service, which runs and manages all Hadoop services, the Apache Ambari server, and the Apache Ambari agents on the embedded Hadoop cluster.

- Metadata and Catalog, which include the metadata persistence store, search index, and graph database in an embedded Hadoop cluster. The catalog represents an indexed inventory of all the data assets in the enterprise that you configure in Enterprise Data Catalog. Enterprise Data Catalog organizes all the enterprise metadata in the catalog and enables the users of external applications to discover and understand the data.

The Informatica domain and the Informatica Model Repository databases are configured on Amazon RDS using Oracle.

Planning the Data Lake Management Deployment

Deployment Options

This Quick Start provides two deployment options:

- Deployment of the Data Lake Management solution into a new VPC (end-to-end deployment). This option builds a new virtual private cloud (VPC) with public and private subnets, and then deploys the Informatica Data Lake Management solution into that infrastructure.

- Deployment of the Data Lake Management solution into an existing VPC. This option provisions data lake components into your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure CIDR blocks, instance types, and data lake settings, as discussed later in this guide.

Prerequisites

Specialized Knowledge

Before you deploy this Quick Start, we recommend that you become familiar with the following AWS services:

- Amazon VPC
- Amazon EC2
- Amazon EMR

If you are new to AWS, see the Getting Started Resource Center.

Technical Requirements

Before you deploy this Quick Start, verify the following prerequisites:

- You have an account with AWS, and you know the account login information.

- You have purchased a license for the Informatica Data Lake Management solution. To sign up for a demo license, contact Informatica, your sales representative, or the consulting partner you're working with. The license file has a name like AWSDatalakeLicense.key.

Deployment Steps

Step 1. Prepare Your AWS Account

1. If you don't already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions.

2. Use the region selector in the navigation bar to choose the AWS Region where you want to deploy the Informatica Data Lake Management solution on AWS.

3. Create a key pair in your preferred region. When you log in to any Amazon EC2 system or Amazon EMR cluster, you use a private key file for authentication. The file has a file name extension of .pem. If you do not have an existing .pem key to use, follow the instructions in the AWS documentation to create a key pair.

   Note: Your administrator might ask you to use a particular existing key pair.

   When you create a key pair, you save the .pem file to your desktop system. At the same time, AWS saves the key pair to your account. Make a note of the key pair that you want to use for the Data Lake Management instance, so that you can provide the key pair name during network configuration.

4. If necessary, request a service limit increase for the Amazon EC2 M3 and M4 instance types. You might need to do this if you already have an existing deployment that uses these instance types and you think you might exceed the default limit with this reference deployment.

Step 2. Upload Your Informatica License

Upload the license for the Informatica Data Lake Management solution to an S3 bucket, following the instructions in the Amazon S3 documentation. You will be prompted for the bucket name during deployment. To sign up for a demo license, contact Informatica, your sales representative, or the consulting partner you're working with.
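If you prefer to script this step, the following sketch uses the AWS SDK for Python (Boto3) to create a bucket and upload the license key. It is a minimal illustration, not part of the Quick Start itself; the bucket name and file name are placeholders, and the key file must sit at the top level of the bucket.

    import boto3

    # Hedged sketch: upload the Informatica license key to an S3 bucket with Boto3.
    s3 = boto3.client("s3")

    bucket = "my-informatica-license-bucket"   # placeholder bucket name
    license_file = "AWSDatalakeLicense.key"    # local path to the license file

    # Create the bucket if it does not already exist (in us-east-1 no
    # LocationConstraint is needed; other regions require one).
    s3.create_bucket(Bucket=bucket)

    # The key file must be at the top level of the bucket, not in a subfolder.
    s3.upload_file(license_file, bucket, license_file)
    print("Uploaded license to s3://{}/{}".format(bucket, license_file))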

Step 3. Launch the Quick Start

Note: You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service you will be using in this Quick Start. Prices are subject to change.

1. Choose one of the following options to launch the AWS CloudFormation template into your AWS account. For help choosing an option, see Deployment Options earlier in this guide.

   Option 1: Deploy the data lake into a new VPC on AWS (Launch)
   Option 2: Deploy the data lake into an existing VPC on AWS (Launch)

   Important: If you're deploying Informatica Data Lake Management into an existing VPC, make sure that your VPC has two private and two public subnets in different Availability Zones for the database instances. These subnets require NAT gateways or NAT instances in their route tables, to allow the instances to download packages and software without exposing them to the internet. You'll also need the domain name option configured in the DHCP options, as explained in the Amazon VPC documentation. You'll be prompted for your VPC settings when you launch the Quick Start.

   Each deployment takes about two hours to complete.

2. Check the region that's displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for Informatica Data Lake Management will be built. The template is launched in the US East (Ohio) Region by default.

3. On the Select Template page, keep the default setting for the template URL, and then choose Next.

4. On the Specify Details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.

In the following tables, parameters are listed by category and described separately for the two deployment options:

- Parameters for deploying Informatica components into a new VPC
- Parameters for deploying Informatica components into an existing VPC

Note: The templates for the two scenarios share most, but not all, of the same parameters. For example, the template for an existing VPC prompts you for the VPC and subnet IDs in your existing VPC environment. You can also download the templates and edit them to create your own parameters based on your specific deployment scenario.

Option 1: Parameters for deploying into a new VPC (View template)

Network Configuration:

- Availability Zones (AvailabilityZones): The two Availability Zones that will be used to deploy Informatica Data Lake Management components. The Quick Start preserves the logical order you specify.
- VPC CIDR (VPCCIDR): The CIDR block for the VPC. The default is a /16 block.
- Private Subnet 1 CIDR (PrivateSubnet1CIDR): The CIDR block for the private subnet located in Availability Zone 1. The default is a /19 block.
- Private Subnet 2 CIDR (PrivateSubnet2CIDR): The CIDR block for the private subnet located in Availability Zone 2. The default is a /19 block.
- Public Subnet 1 CIDR (PublicSubnet1CIDR): The CIDR block for the public (DMZ) subnet located in Availability Zone 1. The default is a /20 block.
- Public Subnet 2 CIDR (PublicSubnet2CIDR): The CIDR block for the public (DMZ) subnet located in Availability Zone 2. The default is a /20 block.
- IP Address Range (RemoteAccessCIDR): The CIDR IP range that is permitted to access the Informatica domain and the Amazon EMR cluster. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses.

Amazon EC2 Configuration:

- Informatica Embedded Cluster Size (ICSClusterSize): The size of the Informatica embedded cluster. Choose Small (c4.8xlarge, single node), Medium (c4.8xlarge, three nodes), or Large (c4.8xlarge, six nodes). The default is Small.
- Informatica Domain Instance Type (InformaticaServerInstanceType): The EC2 instance type for the instance that hosts the Informatica domain. The two options are c4.4xlarge and c4.8xlarge. The default is c4.4xlarge.
- Key Pair Name (KeyPairName): A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region when you set up your AWS account.

Amazon EMR Configuration:

- EMR Cluster Name (EMRClusterName): The name of the Amazon EMR cluster where the Data Lake Management instance will be deployed.
- EMR Core Instance Type (EMRCoreInstanceType): The instance type for Amazon EMR core nodes. The default is m4.xlarge.
- EMR Core Nodes (EMRCoreNodes): The number of core nodes. Enter a value between 1 and 500.
- EMR Master Instance Type (EMRMasterInstanceType): The instance type for the Amazon EMR master node. The default is m4.xlarge.
- EMR Logs Bucket Name (EMRLogBucket): The S3 bucket where the Amazon EMR logs will be stored.

Amazon RDS Configuration:

- Informatica Database Username (DBUser): The user name for the database instance associated with the Informatica domain and services (such as Model Repository Service, Data Integration Service, and Content Management Service). The user name is an 8-18 character string. The default is awsquickstart.
- Informatica Database Instance Password (DBPassword): The password for the database instance associated with the Informatica domain and services. The password is an 8-18 character string.

Amazon Redshift Configuration:

- Redshift Cluster Type (RedshiftClusterType): The type of cluster. You can specify single-node or multi-node. If you specify multi-node, use the Redshift Number of Nodes parameter to specify how many nodes to provision in your cluster. The default is single-node.
- Redshift Database Name (RedshiftDatabaseName): The name of the first database to create when the cluster is created. The default is dev.
- Redshift Database Port (RedshiftDatabasePort): The port number on which the cluster accepts incoming connections. The default is 5439.
- Redshift Number of Nodes (RedshiftNumberOfNodes): The number of compute nodes in the cluster. For multi-node clusters, this parameter must be greater than 1. The default is 1.
- Redshift Node Type (RedshiftNodeType): The compute, memory, storage, and I/O capacity of the cluster's nodes. For node size specifications, see the Amazon Redshift documentation. The default is ds2.xlarge.
- Redshift Username (RedshiftUsername): The user name that is associated with the master user account for the cluster that is being created. The default is defaultuser.
- Redshift Password (RedshiftPassword): The password that is associated with the master user account for the cluster that is being created. The password must be an 8-64 character string that contains at least one uppercase letter, one lowercase letter, and one number.

Informatica Enterprise Catalog and BDM Configuration:

- Informatica Administrator Username (InformaticaAdminUser): The administrator user name for accessing Big Data Management. You can specify any string. Make a note of the user name and password; you will use them later to log in to the Administrator tool and configure the Informatica domain.
- Informatica Administrator Password (InformaticaAdminPassword): The administrator password for accessing Big Data Management. You can specify any string.
- License Key Location (InformaticaKeyS3Bucket): The name of the S3 bucket in your account that contains the Informatica license key.
- License Key Name (InformaticaKeyName): The Informatica license key name; for example, INFALicense_10_2.key. Note: The key file must be in the top level of the S3 bucket, not in a subfolder.
- Import Sample Content (ImportSampleData): Select Yes to import sample catalog data. You can use the sample data to get started with the product. The default is No.

AWS Quick Start Configuration:

Informatica recommends that you do not change the default values for the parameters in this category.

- Quick Start S3 Bucket Name (QSS3BucketName): The S3 bucket name for the Quick Start assets. This bucket name can include numbers, lowercase letters, uppercase letters, and hyphens (-), but should not start or end with a hyphen. You can specify your own bucket if you copy all of the assets and submodules into it and want to customize the templates and override the Quick Start behavior for your specific implementation. The default is quickstart-reference.
- Quick Start S3 Key Prefix (QSS3KeyPrefix): The S3 key name prefix for your copy of the Quick Start assets. This prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). This parameter enables you to customize or extend the Quick Start for your specific implementation. The default is informatica/datalake/latest/.

Option 2: Parameters for deploying into an existing VPC (View template)

Network Configuration:

- VPC (VPCID): The ID of your existing VPC where you want to deploy the Informatica Data Lake Management solution. The VPC must meet the following requirements: it must be set up with public access through the internet via an attached internet gateway, the DNS Resolution property of the VPC must be set to Yes, and the Edit DNS Hostnames property of the VPC must be set to Yes.
- Informatica Domain Subnet (InformaticaServerSubnetID): A publicly accessible subnet ID where the Informatica domain will reside. Select one of the available subnets listed.
- Informatica Database Subnets (DBSubnetIDs): The IDs of two private subnets in the selected VPC. Note: These subnets must be in different Availability Zones in the selected VPC.
- IP Address Range (IPAddressRange): The CIDR IP range that is permitted to access the Informatica domain and the Informatica embedded cluster. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses.

Amazon EC2 Configuration:

- Key Pair Name (KeyPairName): A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region when you set up your AWS account.
- Informatica Domain Instance Type (InformaticaServerInstanceType): The EC2 instance type for the instance that hosts the Informatica domain. The two options are c4.4xlarge and c4.8xlarge. The default is c4.4xlarge.
- Informatica Embedded Cluster Size (ICSClusterSize): The size of the Informatica embedded cluster. Choose Small (c4.8xlarge, single node), Medium (c4.8xlarge, three nodes), or Large (c4.8xlarge, six nodes). The default is Small.

Amazon EMR Configuration:

- EMR Master Instance Type (EMRMasterInstanceType): The instance type for the Amazon EMR master node. The default is m4.xlarge.
- EMR Core Instance Type (EMRCoreInstanceType): The instance type for Amazon EMR core nodes. The default is m4.xlarge.
- EMR Cluster Name (EMRClusterName): The name of the Amazon EMR cluster where the Data Lake Management instance will be deployed.
- EMR Core Nodes (EMRCoreNodes): The number of core nodes. Enter a value between 1 and 500.
- EMR Logs Bucket Name (EMRLogBucket): The S3 bucket where the Amazon EMR logs will be stored.

Amazon RDS Configuration:

- Informatica Database Username (DBUser): The user name for the database instance associated with the Informatica domain and services (such as Model Repository Service, Data Integration Service, and Content Management Service). The user name is an 8-18 character string. The default is awsquickstart.
- Informatica Database Instance Password (DBPassword): The password for the database instance associated with the Informatica domain and services. The password is an 8-18 character string.

Amazon Redshift Configuration:

- Redshift Database Name (RedshiftDatabaseName): The name of the first database to create when the cluster is created. The default is dev.
- Redshift Cluster Type (RedshiftClusterType): The type of cluster. You can specify single-node or multi-node. If you specify multi-node, use the Redshift Number of Nodes parameter to specify how many nodes to provision in your cluster. The default is single-node.
- Redshift Number of Nodes (RedshiftNumberOfNodes): The number of compute nodes in the cluster. For multi-node clusters, this parameter must be greater than 1. The default is 1.
- Redshift Node Type (RedshiftNodeType): The compute, memory, storage, and I/O capacity of the cluster's nodes. For node size specifications, see the Amazon Redshift documentation. The default is ds2.xlarge.
- Redshift Username (RedshiftUsername): The user name that is associated with the master user account for the cluster that is being created. The default is defaultuser.
- Redshift Password (RedshiftPassword): The password that is associated with the master user account for the cluster that is being created. The password must be an 8-64 character string that contains at least one uppercase letter, one lowercase letter, and one number.
- Redshift Database Port (RedshiftDatabasePort): The port number on which the cluster accepts incoming connections. The default is 5439.

Informatica Enterprise Catalog and BDM Configuration:

- Informatica Administrator Username (InformaticaAdminUsername): The administrator user name for accessing Big Data Management. You can specify any string. Make a note of the user name and password; you will use them later to log in to the Administrator tool and configure the Informatica domain.
- Informatica Administrator Password (InformaticaAdminPassword): The administrator password for accessing Big Data Management. You can specify any string.
- License Key Location (InformaticaKeyS3Bucket): The name of the S3 bucket in your account that contains the Informatica license key.
- License Key Name (InformaticaKeyName): The Informatica license key name; for example, INFALicense_10_2.key. Note: The key file must be in the top level of the S3 bucket, not in a subfolder.
- Import Sample Content (ImportSampleData): Select Yes to import sample catalog data. You can use the sample data to get started with the product. The default is No.

AWS Quick Start Configuration:

Informatica recommends that you do not change the default values for the parameters in this category.

- Quick Start S3 Bucket Name (QSS3BucketName): The S3 bucket name for the Quick Start assets. This bucket name can include numbers, lowercase letters, uppercase letters, and hyphens (-), but should not start or end with a hyphen. You can specify your own bucket if you copy all of the assets and submodules into it and want to customize the templates and override the Quick Start behavior for your specific implementation. The default is quickstart-reference.
- Quick Start S3 Key Prefix (QSS3KeyPrefix): The S3 key name prefix for your copy of the Quick Start assets. This prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). This parameter enables you to customize or extend the Quick Start for your specific implementation. The default is informatica/datalake/latest/.

When you finish reviewing and customizing the parameters, choose Next.

5. On the Options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you're done, choose Next.

6. On the Review page, review and confirm the template settings. Under Capabilities, select the check box to acknowledge that the template will create IAM resources.

7. Choose Create to deploy the stack.
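As an alternative to the console steps above, you can launch the stack with the AWS SDK for Python (Boto3). This is a hedged sketch rather than the documented launch procedure: the template URL, stack name, and parameter values are placeholders, and only a few of the parameters from the tables above are shown.

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-2")  # US East (Ohio), the default region

    response = cfn.create_stack(
        StackName="Informatica-Data-Lake",
        # Placeholder URL; point this at the Quick Start master template for the
        # option you chose (new VPC or existing VPC).
        TemplateURL="https://s3.amazonaws.com/my-bucket/templates/informatica-datalake.template",
        Parameters=[
            {"ParameterKey": "KeyPairName", "ParameterValue": "my-keypair"},
            {"ParameterKey": "InformaticaKeyS3Bucket", "ParameterValue": "my-informatica-license-bucket"},
            {"ParameterKey": "InformaticaKeyName", "ParameterValue": "AWSDatalakeLicense.key"},
            # ...provide the remaining required parameters from the tables above...
        ],
        # Acknowledges that the template creates IAM resources (step 6 above).
        Capabilities=["CAPABILITY_IAM"],
    )
    print("Stack ID:", response["StackId"])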

Step 4. Monitor the Deployment

During deployment, you can monitor the creation of the cluster instance and the Informatica domain, and get more information about system resources.

1. Choose the stack that you are creating, and then choose the Events tab to monitor the creation of the stack. Figure 3 shows part of the Events tab.

   Figure 3: Monitoring the deployment in the Events tab

   When stack creation is complete, the Status field shows CREATE_COMPLETE, and the Outputs tab displays a list of the stacks that have been created, as shown in Figure 4.

   Figure 4: Stack creation complete

2. Choose the Resources tab. This tab displays information about the stack and the Data Lake instance. You can select the linked physical ID properties of individual resources to get more information about them, as shown in Figure 5.

   Figure 5: Resources tab

3. Choose the Outputs tab. When the Informatica domain setup is complete, the Outputs tab displays the following information:

   - RedShiftIamRole: Amazon Resource Name (ARN) for the Amazon Redshift IAM role
   - EICCatalogURL: URL for the Informatica EIC user console
   - InstanceID: Informatica domain host name
   - InformaticaAdminConsoleURL: URL for the Informatica administrator console
   - EtcHostFileEntry: Entry to add to the /etc/hosts file to enable access to the domain, using the host name of the administrative server
   - EICAdminURL: URL for the EIC administrator
   - EMRResourceManagerURL: URL for the Amazon EMR Resource Manager
   - RedShiftClusterEndpoint: Amazon Redshift cluster endpoint
   - CloudFormationLogs: Location of the AWS CloudFormation installation log
   - S3DatalakeBucketName: Name of the S3 bucket used for the data lake
   - InstanceSetupLogs: Location of the setup log for the Informatica domain EC2 instance
   - InformaticaHadoopInstallLogs: Location of the master node Hadoop installation log
   - InformaticaDomainDatabaseEndPoint: Informatica domain database endpoint
   - InformaticaAdminConsoleServerLogs: Location of the Informatica domain installation log
   - InformaticaHadoopClusterURL: URL to the IHS Hadoop gateway node

   - InformaticaBDMDeveloperClient: Location where you can download the Informatica Developer tool (see Step 5)

   Note: If the Outputs tab is not populated with this information, wait for domain setup to complete.

4. Use the links in the Outputs tab to access Informatica management tools. For example:

   - InformaticaAdminConsoleURL opens the Instance Administration screen. You can use this screen to manage Informatica services and resources. You can also get additional information about the instance, such as the public DNS and public IP address.
   - EICAdminURL administers the Enterprise Data Catalog environment.
   - EICCatalogURL accesses Enterprise Data Catalog. See the Informatica Enterprise Data Catalog User Guide for information about logging in to Enterprise Data Catalog.

Step 5. Download and Install Informatica Developer

Informatica Developer (the Developer tool) is an application that you use to design and implement data integration, data quality, data profiling, data services, and big data solutions. You can use the Developer tool to import metadata, create connections, and create data objects. You can also use the Developer tool to create and run profiles, mappings, and workflows.

1. Log in to the AWS CloudFormation console and choose the Outputs tab.

2. Right-click the value of the InformaticaBDMDeveloperClient key to download the Developer tool client installer.

3. Uncompress and launch the installer to install the Developer tool on a local drive.
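The Outputs tab can also be read programmatically. The following hedged Boto3 sketch waits for the stack to finish creating and then prints selected output values, such as the InformaticaAdminConsoleURL and the InformaticaBDMDeveloperClient download location; the stack name and region are placeholders.

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-2")
    stack_name = "Informatica-Data-Lake"   # placeholder stack name

    # Block until CREATE_COMPLETE; the deployment takes about two hours, so widen
    # the waiter's polling window accordingly.
    cfn.get_waiter("stack_create_complete").wait(
        StackName=stack_name,
        WaiterConfig={"Delay": 60, "MaxAttempts": 180},
    )

    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

    print("Admin console:   ", outputs.get("InformaticaAdminConsoleURL"))
    print("EDC catalog:     ", outputs.get("EICCatalogURL"))
    print("Developer client:", outputs.get("InformaticaBDMDeveloperClient"))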

Manual Cleanup

If you deploy the Quick Start for a new VPC, Amazon EMR creates security groups that are not deleted when you delete the Amazon EMR cluster. To clean up after deployment, follow these steps:

1. Delete the Amazon EMR cluster.

2. Delete the Amazon EMR-managed security groups (ElasticMapReduce-master, ElasticMapReduce-slave). Because the groups reference each other, first delete the circularly dependent rules, and then delete the security groups themselves.

3. Delete the AWS CloudFormation stack.

Troubleshooting

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If you encounter this error in the AWS CloudFormation console, we recommend that you relaunch the template with Rollback on failure set to No. (This setting is under Advanced in the AWS CloudFormation console, Options page.) With this setting, the stack's state will be retained and the instance will be left running, so you can troubleshoot the issue. (You'll want to look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log.)

Important: When you set Rollback on failure to No, you'll continue to incur AWS charges for this stack. Be sure to delete the stack when you've finished troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation on the AWS website.

Q. I encountered an error while installing the Informatica domain and services.

A. We recommend that you view the /installation.log file to get more information about the errors you encountered.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. We recommend that you launch the Quick Start templates from the location we've provided or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a non-S3 location, you might encounter template size limitations when you create the stack. For more information about AWS CloudFormation limits, see the AWS documentation.
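If you prefer to script step 2 of the manual cleanup above, the following hedged Boto3 sketch revokes the circularly dependent rules in the EMR-managed security groups and then deletes the groups; the VPC ID is a placeholder.

    import boto3

    ec2 = boto3.client("ec2")

    groups = ec2.describe_security_groups(
        Filters=[
            {"Name": "vpc-id", "Values": ["vpc-0123abcd"]},   # placeholder VPC ID
            {"Name": "group-name",
             "Values": ["ElasticMapReduce-master", "ElasticMapReduce-slave"]},
        ]
    )["SecurityGroups"]

    # The two groups reference each other, so remove their ingress rules first.
    for group in groups:
        if group["IpPermissions"]:
            ec2.revoke_security_group_ingress(
                GroupId=group["GroupId"], IpPermissions=group["IpPermissions"]
            )

    # With the circular dependency gone, the groups themselves can be deleted.
    for group in groups:
        ec2.delete_security_group(GroupId=group["GroupId"])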

Using Informatica Data Lake Management on AWS

After you deploy this Quick Start, you can use any of the patterns described in this section to work with the Informatica Data Lake Management solution on AWS.

Transient and Persistent Clusters

Amazon EMR provides two ways to configure a cluster: transient and persistent. Transient clusters shut down when their jobs are complete. For example, if a batch-processing job pulls web logs from Amazon S3 and processes the data once a day, it is more cost-effective to use a transient cluster to process the web log data and shut down the nodes when processing is complete. Persistent clusters continue to run after data processing is complete. The Informatica Data Lake Management solution supports both cluster types. For more information, see the Amazon EMR best practices whitepaper.

This Quick Start sets up a persistent EMR cluster with a configurable number of core nodes, as defined by the EMRCoreNodes parameter.
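For the transient case, a cluster can be launched with its processing steps and configured to terminate when those steps finish. The following Boto3 sketch illustrates the transient pattern only; it is not part of the Quick Start, and the cluster name, EMR release, roles, and S3 paths are placeholders.

    import boto3

    emr = boto3.client("emr")

    response = emr.run_job_flow(
        Name="weblog-batch-transient",
        ReleaseLabel="emr-5.10.0",                      # placeholder EMR release
        LogUri="s3://my-emr-logs-bucket/",
        Instances={
            "MasterInstanceType": "m4.xlarge",
            "SlaveInstanceType": "m4.xlarge",
            "InstanceCount": 3,
            # Transient behavior: shut the cluster down when the steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "process-weblogs",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://my-bucket/jobs/process_weblogs.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print("Transient cluster ID:", response["JobFlowId"])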

Common AWS Architecture Patterns for Informatica Data Lake Management

Informatica Data Lake Management supports the following patterns, which leverage AWS for big data processing.

Pattern 1: Using Amazon S3

In this first pattern, data is loaded to Amazon S3 using Informatica. For data processing, the Informatica Big Data Management mapping logic pulls data from Amazon S3 and sends it to Amazon EMR for processing. Amazon EMR does not copy the data to local disk or HDFS. Instead, the mappings open multithreaded HTTP connections to Amazon S3, pull data into the Amazon EMR cluster, and process the data in streams, as illustrated in Figure 6.

Figure 6: Pattern 1 using Amazon S3

Pattern 2: Using HDFS and Amazon S3 as Backup Storage

In this pattern, Informatica writes data directly to HDFS, leverages the Amazon EMR task nodes to process the data, and periodically copies data to Amazon S3 as backup storage, as illustrated in Figure 7. The advantage of this pattern is the ability to process data without copying it to Amazon EMR for each job. Although copying data to Amazon EMR may improve performance, the disadvantage is durability: because Amazon EMR uses ephemeral disks to store data, data could be lost if an EC2 instance in the Amazon EMR cluster fails. HDFS replicates data within the Amazon EMR cluster and can usually recover from node failures. However, data loss could still occur if the number of lost nodes is greater than your replication factor. Informatica recommends that you back up HDFS data to Amazon S3 periodically.

Figure 7: Pattern 2 using HDFS and Amazon S3 as backup
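One way to run that periodic backup is to submit an s3-dist-cp step to the persistent EMR cluster. The following Boto3 sketch is a hedged illustration; the cluster ID, HDFS path, and backup bucket are placeholders, and in practice you would run it on a schedule.

    import boto3

    emr = boto3.client("emr")

    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",   # placeholder: the persistent EMR cluster ID
        Steps=[{
            "Name": "backup-hdfs-to-s3",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "s3-dist-cp",
                    "--src", "hdfs:///user/informatica/data",        # placeholder HDFS path
                    "--dest", "s3://my-backup-bucket/hdfs-backup/",  # placeholder backup bucket
                ],
            },
        }],
    )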

Pattern 3: Using Amazon Kinesis and Kinesis Firehose for Real-Time and Streaming Analytics

In the third pattern, unbounded event streams that are continuously generated by devices, IoT applications, and cloud applications are ingested into Amazon Kinesis in real time, using Informatica Edge Data Streaming. With Informatica Big Data Streaming, which leverages the existing Informatica platform, streaming pipelines can be built using pre-built transformations, connectors, and parsers. These elements are optimized to run on an Amazon EMR cluster in streaming mode using Spark Streaming. They can consume data records from an Amazon Kinesis stream and act as a producer that writes data to a defined Amazon Kinesis Firehose delivery stream. Data can be persisted to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service (Amazon ES), and delivered as JSON and binary payloads. For more information about deploying Informatica Big Data Streaming on AWS, contact Informatica or your implementation partner.

Figure 8 shows the Informatica Big Data Streaming architecture.

Figure 8: Pattern 3 using the Informatica Big Data Streaming architecture
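On the ingest side of this pattern, producers write event records to the Kinesis stream that the streaming pipeline consumes. The sketch below is a minimal Boto3 illustration of such a producer (it stands in only conceptually for Informatica Edge Data Streaming); the stream name and event fields are placeholders.

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    event = {
        "device_id": "sensor-42",            # placeholder device identifier
        "temperature": 21.7,
        "timestamp": "2018-01-15T10:00:00Z",
    }

    kinesis.put_record(
        StreamName="iot-events",                    # placeholder stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["device_id"],            # keeps each device's records ordered
    )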

Pattern 4: Using AWS for Self-Service Data Discovery and Preparation

In the last pattern, Informatica Enterprise Data Lake provides data analysts with a collaborative, self-service big data discovery and preparation solution. Analysts can rapidly discover raw data and turn it into insights, with quality and governance powered by data intelligence deployed on AWS.

When deployed on AWS, Informatica Enterprise Data Lake leverages the existing Informatica platform, which allows analysts to discover, search, and explore data assets for analysis using an AI-driven data catalog. The Data Lake Management solution makes recommendations based on the behavior and shared knowledge of the data assets used for analysis. Once analysts find the relevant data, they can blend, transform, cleanse, and enrich it by using a Microsoft Excel-like data preparation interface, at scale on an Amazon EMR cluster. Data is prepared, published, and made available for consumption in the data lake. An analyst can assess the prepared data using ad hoc queries to generate charts, tables, and other visual formats. IT can operationalize the analysts' ad hoc data preparation steps into Informatica big data mappings, which run in batch on an Amazon EMR cluster.

You can deploy Informatica Enterprise Data Lake on the same AWS infrastructure that supports Informatica Big Data Management and Informatica Enterprise Data Catalog. Figure 9 shows the data flows for Informatica Enterprise Data Lake.

Figure 9: Data flows used in pattern 4

Process Flow

Figure 10 shows the process flow for using the Informatica Data Lake Management solution on AWS. It illustrates the data flow through the Informatica Data Lake Management solution and Amazon EMR, Amazon S3, and Amazon Redshift.

Figure 10: Informatica Data Lake Management solution process flow using Amazon EMR

The numbers in Figure 10 refer to the following steps:

Step 1: Collect and move data from on-premises systems into Amazon S3 storage. Consider offloading infrequently used data, and batch-load raw data to a defined landing zone in Amazon S3.

Step 2: Collect cloud application data and streaming data generated by machines and sensors in Amazon S3 storage instead of staging it in a temporary file system or a data warehouse.

Step 3: Discover and profile data stored in Amazon S3, using Amazon EMR as the processing infrastructure. Profile data to better understand its structure and context. Parse raw data, in multi-structured or unstructured formats, to extract features and entities, and cleanse data with data quality tasks. To prepare data for analysis, execute prebuilt transformations and data quality rules natively in EMR.

Step 4: Match duplicate data within and across big data sources, and link the matches to create a single view.

Step 5: Perform data masking to protect confidential data, such as credit card information, social security numbers, names, addresses, and phone numbers, from unintended exposure and to reduce the risk of data breaches. Data masking helps IT organizations manage access to their most sensitive data, providing enterprise-wide scalability, robustness, and connectivity to a vast array of databases.

Step 6: Data analysts and data scientists can prepare and collaborate on data for analytics by using semantic search, data discovery, and intuitive data preparation tools for interactive analysis with trusted, secure, and governed data assets.

Step 7: After cleansing and transforming data on Amazon EMR, move high-value curated data back to Amazon S3 or to Amazon Redshift. From there, users can directly access the data with BI reports and applications.
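For step 7, one common way to move curated data from Amazon S3 into Amazon Redshift is a COPY command. The following hedged sketch issues the COPY over a standard PostgreSQL connection using the psycopg2 library; the cluster endpoint, credentials, table, S3 prefix, and IAM role ARN (the RedShiftIamRole stack output) are all placeholders.

    import psycopg2

    # Placeholder connection details; use the RedShiftClusterEndpoint stack output.
    conn = psycopg2.connect(
        host="my-cluster.xxxxxxxx.us-east-2.redshift.amazonaws.com",
        port=5439,
        dbname="dev",
        user="defaultuser",
        password="MyRedshiftPassw0rd",
    )

    copy_sql = """
        COPY analytics.curated_orders
        FROM 's3://my-datalake-bucket/curated/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        FORMAT AS CSV;
    """

    # The transaction is committed when the 'with conn' block exits without error.
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)   # Redshift loads the S3 files in parallel

    conn.close()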

Additional Resources

AWS services

- AWS CloudFormation
- Amazon EBS
- Amazon EC2
- Amazon EMR
- Amazon Redshift
- Amazon S3
- Amazon VPC

Informatica

- Informatica Network: a source for product documentation, Knowledge Base articles, and other information

Quick Start reference deployments

- AWS Quick Start home page

GitHub Repository

You can visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.

Document Revisions

Date            Change
January 2018    Initial publication

© 2018, Amazon Web Services, Inc. or its affiliates, and Informatica LLC. All rights reserved.

Notices

This document is provided for informational purposes only. It represents AWS's current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS's products or services, each of which is provided "as is" without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

CloudHealth. AWS and Azure On-Boarding

CloudHealth. AWS and Azure On-Boarding CloudHealth AWS and Azure On-Boarding Contents 1. Enabling AWS Accounts... 3 1.1 Setup Usage & Billing Reports... 3 1.2 Setting Up a Read-Only IAM Role... 3 1.3 CloudTrail Setup... 5 1.4 Cost and Usage

More information

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

Infosys Information Platform. How-to Launch on AWS Marketplace Version 1.2.2

Infosys Information Platform. How-to Launch on AWS Marketplace Version 1.2.2 Infosys Information Platform How-to Launch on AWS Marketplace Version 1.2.2 Copyright Notice 2016 Infosys Limited, Bangalore, India. All Rights Reserved. Infosys believes the information in this document

More information

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud WHITE PAPER / AUGUST 8, 2018 DISCLAIMER The following is intended to outline our general product direction. It is intended for

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive

More information

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights

More information

EdgeConnect for Amazon Web Services (AWS)

EdgeConnect for Amazon Web Services (AWS) Silver Peak Systems EdgeConnect for Amazon Web Services (AWS) Dinesh Fernando 2-22-2018 Contents EdgeConnect for Amazon Web Services (AWS) Overview... 1 Deploying EC-V Router Mode... 2 Topology... 2 Assumptions

More information

Transit VPC Deployment Using AWS CloudFormation Templates. White Paper

Transit VPC Deployment Using AWS CloudFormation Templates. White Paper Transit VPC Deployment Using AWS CloudFormation Templates White Paper Introduction Amazon Web Services(AWS) customers with globally distributed networks commonly need to securely exchange data between

More information

Hortonworks DataPlane Service

Hortonworks DataPlane Service Data Steward Studio Administration () docs.hortonworks.com : Data Steward Studio Administration Copyright 2016-2017 Hortonworks, Inc. All rights reserved. Please visit the Hortonworks Data Platform page

More information

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome

More information

Introduction to AWS GoldBase. A Solution to Automate Security, Compliance, and Governance in AWS

Introduction to AWS GoldBase. A Solution to Automate Security, Compliance, and Governance in AWS Introduction to AWS GoldBase A Solution to Automate Security, Compliance, and Governance in AWS September 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

At Course Completion Prepares you as per certification requirements for AWS Developer Associate. [AWS-DAW]: AWS Cloud Developer Associate Workshop Length Delivery Method : 4 days : Instructor-led (Classroom) At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

More information

Store, Protect, Optimize Your Healthcare Data in AWS

Store, Protect, Optimize Your Healthcare Data in AWS Healthcare reform, increasing patient expectations, exponential data growth, and the threat of cyberattacks are forcing healthcare providers to re-evaluate their data management strategies. Healthcare

More information

AWS Glue. Developer Guide

AWS Glue. Developer Guide AWS Glue Developer Guide AWS Glue: Developer Guide Copyright 2017 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection

More information

AWS Landing Zone. AWS User Guide. November 2018

AWS Landing Zone. AWS User Guide. November 2018 AWS Landing Zone AWS User Guide November 2018 Copyright (c) 2018 by Amazon.com, Inc. or its affiliates. AWS Landing Zone User Guide is licensed under the terms of the Amazon Software License available

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

Pexip Infinity and Amazon Web Services Deployment Guide

Pexip Infinity and Amazon Web Services Deployment Guide Pexip Infinity and Amazon Web Services Deployment Guide Contents Introduction 1 Deployment guidelines 2 Configuring AWS security groups 4 Deploying a Management Node in AWS 6 Deploying a Conferencing Node

More information

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing

More information

AWS Remote Access VPC Bundle

AWS Remote Access VPC Bundle AWS Remote Access VPC Bundle Deployment Guide Last updated: April 11, 2017 Aviatrix Systems, Inc. 411 High Street Palo Alto CA 94301 USA http://www.aviatrix.com Tel: +1 844.262.3100 Page 1 of 12 TABLE

More information

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its

More information

Amazon AppStream 2.0: Getting Started Guide

Amazon AppStream 2.0: Getting Started Guide 2018 Amazon AppStream 2.0: Getting Started Guide Build an Amazon AppStream 2.0 environment to stream desktop applications to your users April 2018 https://aws.amazon.com/appstream2/ 1 Welcome This guide

More information

Overview of AWS Security - Database Services

Overview of AWS Security - Database Services Overview of AWS Security - Database Services June 2016 (Please consult http://aws.amazon.com/security/ for the latest version of this paper) 2016, Amazon Web Services, Inc. or its affiliates. All rights

More information

Securing Amazon Web Services (AWS) EC2 Instances with Dome9. A Whitepaper by Dome9 Security, Ltd.

Securing Amazon Web Services (AWS) EC2 Instances with Dome9. A Whitepaper by Dome9 Security, Ltd. Securing Amazon Web Services (AWS) EC2 Instances with Dome9 A Whitepaper by Dome9 Security, Ltd. Amazon Web Services (AWS) provides business flexibility for your company as you move to the cloud, but new

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: +34916267792 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive data integration platform

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

Microsoft SharePoint Server 2013 on the AWS Cloud: Quick Start Reference Deployment

Microsoft SharePoint Server 2013 on the AWS Cloud: Quick Start Reference Deployment Microsoft SharePoint Server 2013 on the AWS Cloud: Quick Start Reference Deployment Mike Pfeiffer August 2014 Last updated: April 2015 (revisions) Table of Contents Abstract... 3 What We ll Cover... 4

More information

SAP Vora - AWS Marketplace Production Edition Reference Guide

SAP Vora - AWS Marketplace Production Edition Reference Guide SAP Vora - AWS Marketplace Production Edition Reference Guide 1. Introduction 2 1.1. SAP Vora 2 1.2. SAP Vora Production Edition in Amazon Web Services 2 1.2.1. Vora Cluster Composition 3 1.2.2. Ambari

More information

Hortonworks and The Internet of Things

Hortonworks and The Internet of Things Hortonworks and The Internet of Things Dr. Bernhard Walter Solutions Engineer About Hortonworks Customer Momentum ~700 customers (as of November 4, 2015) 152 customers added in Q3 2015 Publicly traded

More information

Oracle WebLogic Server 12c on AWS. December 2018

Oracle WebLogic Server 12c on AWS. December 2018 Oracle WebLogic Server 12c on AWS December 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only. It represents

More information

Datameer for Data Preparation:

Datameer for Data Preparation: Datameer for Data Preparation: Explore, Profile, Blend, Cleanse, Enrich, Share, Operationalize DATAMEER FOR DATA PREPARATION: EXPLORE, PROFILE, BLEND, CLEANSE, ENRICH, SHARE, OPERATIONALIZE Datameer Datameer

More information

4) An organization needs a data store to handle the following data types and access patterns:

4) An organization needs a data store to handle the following data types and access patterns: 1) A company needs to deploy a data lake solution for their data scientists in which all company data is accessible and stored in a central S3 bucket. The company segregates the data by business unit,

More information

AWS Service Catalog. User Guide

AWS Service Catalog. User Guide AWS Service Catalog User Guide AWS Service Catalog: User Guide Copyright 2017 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in

More information

Hackproof Your Cloud Responding to 2016 Threats

Hackproof Your Cloud Responding to 2016 Threats Hackproof Your Cloud Responding to 2016 Threats Aaron Klein, CloudCheckr Tuesday, June 30 th 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Changing Your Perspective Moving

More information

Community Edition Getting Started Guide. July 25, 2018

Community Edition Getting Started Guide. July 25, 2018 Community Edition Getting Started Guide July 25, 2018 Copyright 2018 by Qualys, Inc. All Rights Reserved. Qualys and the Qualys logo are registered trademarks of Qualys, Inc. All other trademarks are the

More information

Information empowerment for your evolving data ecosystem

Information empowerment for your evolving data ecosystem Information empowerment for your evolving data ecosystem Highlights Enables better results for critical projects and key analytics initiatives Ensures the information is trusted, consistent and governed

More information

Tetration Cluster Cloud Deployment Guide

Tetration Cluster Cloud Deployment Guide First Published: 2017-11-16 Americas Headquarters Cisco Systems, Inc. 170 West Tasman Drive San Jose, CA 95134-1706 USA http://www.cisco.com Tel: 408 526-4000 800 553-NETS (6387) Fax: 408 527-0883 THE

More information

Security & Compliance in the AWS Cloud. Amazon Web Services

Security & Compliance in the AWS Cloud. Amazon Web Services Security & Compliance in the AWS Cloud Amazon Web Services Our Culture Simple Security Controls Job Zero AWS Pace of Innovation AWS has been continually expanding its services to support virtually any

More information

Automating Elasticity. March 2018

Automating Elasticity. March 2018 Automating Elasticity March 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only. It represents AWS s current product

More information

Introduction to AWS GoldBase

Introduction to AWS GoldBase Introduction to AWS GoldBase A Solution to Automate Security, Compliance, and Governance in AWS October 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

@Pentaho #BigDataWebSeries

@Pentaho #BigDataWebSeries Enterprise Data Warehouse Optimization with Hadoop Big Data @Pentaho #BigDataWebSeries Your Hosts Today Dave Henry SVP Enterprise Solutions Davy Nys VP EMEA & APAC 2 Source/copyright: The Human Face of

More information

Architecting for Greater Security in AWS

Architecting for Greater Security in AWS Architecting for Greater Security in AWS Jonathan Desrocher Security Solutions Architect, Amazon Web Services. Guy Tzur Director of Ops, Totango. 2015, Amazon Web Services, Inc. or its affiliates. All

More information

AWS 101. Patrick Pierson, IonChannel

AWS 101. Patrick Pierson, IonChannel AWS 101 Patrick Pierson, IonChannel What is AWS? Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help

More information

The Emerging Data Lake IT Strategy

The Emerging Data Lake IT Strategy The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,

More information

Virtual Private Cloud. User Guide. Issue 21 Date HUAWEI TECHNOLOGIES CO., LTD.

Virtual Private Cloud. User Guide. Issue 21 Date HUAWEI TECHNOLOGIES CO., LTD. Issue 21 Date 2018-09-30 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Ltd. 2018. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any

More information

ForeScout CounterACT. (AWS) Plugin. Configuration Guide. Version 1.3

ForeScout CounterACT. (AWS) Plugin. Configuration Guide. Version 1.3 ForeScout CounterACT Hybrid Cloud Module: Amazon Web Services (AWS) Plugin Version 1.3 Table of Contents Amazon Web Services Plugin Overview... 4 Use Cases... 5 Providing Consolidated Visibility... 5 Dynamic

More information

Move Amazon RDS MySQL Databases to Amazon VPC using Amazon EC2 ClassicLink and Read Replicas

Move Amazon RDS MySQL Databases to Amazon VPC using Amazon EC2 ClassicLink and Read Replicas Move Amazon RDS MySQL Databases to Amazon VPC using Amazon EC2 ClassicLink and Read Replicas July 2017 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided

More information

Enroll Now to Take online Course Contact: Demo video By Chandra sir

Enroll Now to Take online Course   Contact: Demo video By Chandra sir Enroll Now to Take online Course www.vlrtraining.in/register-for-aws Contact:9059868766 9985269518 Demo video By Chandra sir www.youtube.com/watch?v=8pu1who2j_k Chandra sir Class 01 https://www.youtube.com/watch?v=fccgwstm-cc

More information

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India (AWS) Overview: AWS is a cloud service from Amazon, which provides services in the form of building blocks, these building blocks can be used to create and deploy various types of application in the cloud.

More information

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Splunk & AWS Gain real-time insights from your data at scale Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Forward-Looking Statements During the course of this presentation, we may

More information

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless

More information

Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility. AWS Whitepaper

Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility. AWS Whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility AWS Whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility: AWS Whitepaper Copyright 2018 Amazon Web

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect. Amazon.com, Inc. and its affiliates. All rights reserved.

Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect. Amazon.com, Inc. and its affiliates. All rights reserved. Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect Amazon.com, Inc. and its affiliates. All rights reserved. Learning about Cloud Computing with AWS What is Cloud Computing and

More information

Managing and Auditing Organizational Migration to the Cloud TELASA SECURITY

Managing and Auditing Organizational Migration to the Cloud TELASA SECURITY Managing and Auditing Organizational Migration to the Cloud 1 TELASA SECURITY About Me Brian Greidanus bgreidan@telasasecurity.com 18+ years of security and compliance experience delivering consulting

More information

Amazon Virtual Private Cloud. Getting Started Guide

Amazon Virtual Private Cloud. Getting Started Guide Amazon Virtual Private Cloud Getting Started Guide Amazon Virtual Private Cloud: Getting Started Guide Copyright 2017 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks

More information