Amazon Web Services: Foundational Services for Research Computing
Mike Kuentz, WWPS Solutions Architect
April 2017
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Global Infrastructure
16 Regions, 42 Availability Zones, 68 Edge Locations
Region & Number of Availability Zones:
AWS GovCloud (2)
US West: Oregon (3), Northern California (3)
US East: N. Virginia (5), Ohio (3)
Canada: Central (2)
South America: São Paulo (3)
EU: Ireland (3), Frankfurt (2), London (2)
Asia Pacific: Singapore (2), Sydney (2), Tokyo (3), Seoul (2), Mumbai (2)
China: Beijing (2)
Announced Regions: Paris, Ningxia
Zoom In: AWS Region
[Diagram: a sample US Region containing Availability Zone A, Availability Zone B, and Availability Zone C]
Zoom In: AWS AZ
[Diagram: a sample Availability Zone containing multiple datacenters]
Foundational Services
Compute: VMs, Auto Scaling, and Load Balancing
Storage: Object, Block, Archival, Import/Export
Databases: Relational, NoSQL, Caching, Migration
Amazon Elastic Compute Cloud (EC2): Instance Families
General purpose: M4, M3, T2
Compute optimized: C4, C3
Memory optimized: R4, X1
Dense-storage: D2
High-I/O optimized: I3
GPU: G2, P2
FPGA enabled: F1
Memory Optimized
Announced in October, to be available first half 2016
New X1 instances will feature up to 2 TB of memory
Powered by up to four Intel Xeon E7 processors
Processors have high memory bandwidth and large L3 caches
Up to 100 vCPUs in a single EC2 instance
Designed to support high-performance, memory-bound applications
Designed for demanding enterprise workloads, including production installations of SAP HANA, Microsoft SQL Server, Apache Spark, and Presto
Excellent fit for many HPC workloads
GPU Instances
Field Programmable Gate Arrays (FPGA)
The DRAGEN Genome pipeline enables ultra-rapid analysis of Next Generation Sequencing (NGS) data, reducing the time required to analyze a whole genome at 30x coverage from 10 to 30 hours with the current industry standard, BWA-MEM + GATK-HC software, to ~22 minutes, with the same level of accuracy for both SNPs and INDELs.
Source: Edico Genome
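To put the quoted times in perspective, the short sketch below converts them into a speedup range. It uses only the figures on the slide (10 to 30 hours versus about 22 minutes); the function name is hypothetical and the result is plain arithmetic, not a benchmark.

```python
def speedup_range(baseline_hours_low, baseline_hours_high, accelerated_minutes):
    """Return (low, high) speedup factors for a baseline given in hours
    versus an accelerated run given in minutes."""
    low = baseline_hours_low * 60 / accelerated_minutes
    high = baseline_hours_high * 60 / accelerated_minutes
    return low, high


# Figures quoted on the slide: 10 to 30 hours versus ~22 minutes.
low, high = speedup_range(10, 30, 22)
print(f"Roughly {low:.0f}x to {high:.0f}x faster")
```

With the slide's numbers this works out to roughly 27x to 82x.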
Performant Network
AWS proprietary 10 Gb networking
Highest performance in the largest EC2 instance sizes
Full bisection bandwidth in placement groups, with no network oversubscription
Enhanced Networking
Available on C3, C4, M4, R3, I2, X1
Over 1M PPS performance, reduced instance-to-instance latencies, more consistent network performance
Multiple storage options on AWS
EFS: Highly available, multi-AZ, fully managed network-attached Elastic File System. For near-line, highly available storage of files in a traditional NFS format.
EC2+EBS: Create a single-AZ shared file system using EC2 and EBS, with third-party or open source software. For near-line storage of files optimized for high I/O performance.
Amazon S3: Secure, durable, highly scalable object storage. Fast access, low cost. For long-term durable storage of data, in a readily accessible get/put access format.
Glacier: Secure, durable, long-term, highly cost-effective object storage. For long-term storage and archival of data that is infrequently accessed.
Amazon Elastic Block Store (EBS) Volume Types
Amazon Simple Storage Service (S3)
Web-accessible object store
Natively online (HTTP/HTTPS)
Massive horizontal scale
More genomics tools now support it!
$ samtools view http://s3.amazonaws.com/na12878.bam chr01:1000-1100
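Because S3 objects are served over plain HTTP/HTTPS, a tool that needs only one region of a large file can typically ask for a byte range instead of downloading the whole object. The sketch below builds such a request with Python's standard library without sending it. The URL is the illustrative one from the slide, the byte offsets are made up, and `build_range_request` is a hypothetical helper, not part of samtools or any AWS SDK.

```python
import urllib.request


def build_range_request(url, start, end):
    """Build (but do not send) an HTTP GET for bytes start..end inclusive."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    request = urllib.request.Request(url)
    # The standard HTTP Range header asks the server for part of the object.
    request.add_header("Range", f"bytes={start}-{end}")
    return request


# Illustrative URL from the slide; the offsets are arbitrary.
req = build_range_request("http://s3.amazonaws.com/na12878.bam", 0, 65535)
print(req.get_method(), req.full_url, req.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would perform the fetch; that step is left out here so the sketch runs without network access.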
Amazon S3 Standard - Infrequent Access
Designed for data that is accessed less frequently but requires rapid access when needed (e.g. genomic cohort reanalysis)
Much lower cost to store, pay to retrieve ($0.0125/GB/month + $0.01/GB retrieved*)
[Diagram: Amazon EC2, Amazon S3, Internet]
* US-East pricing
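The store-cheaply, pay-to-retrieve trade-off can be estimated with simple arithmetic. The sketch below uses only the two US-East prices quoted on the slide; the function name and the example volumes (10 TB stored, 1 TB retrieved) are assumptions for illustration, and it ignores request fees and any minimum object size or storage duration charges.

```python
# Prices quoted on the slide (US-East, at the time of the talk).
STORAGE_USD_PER_GB_MONTH = 0.0125
RETRIEVAL_USD_PER_GB = 0.01


def standard_ia_monthly_cost(stored_gb, retrieved_gb):
    """Estimate one month of Standard-IA cost: storage plus retrieval only."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    retrieval = retrieved_gb * RETRIEVAL_USD_PER_GB
    return storage + retrieval


# Hypothetical cohort: 10,000 GB stored, 1,000 GB pulled back for reanalysis.
cost = standard_ia_monthly_cost(10_000, 1_000)
print(f"Estimated monthly cost: ${cost:.2f}")
```

For this hypothetical cohort the estimate is $135.00 for the month, of which $125.00 is storage.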
AWS Database Services
Amazon RDS, Amazon Redshift, DynamoDB, ElastiCache, Aurora
Key Amazon RDS Features
Amazon RDS configuration options: Multi-AZ, Push-Button Scaling, Read Replicas, Provisioned IOPS
Goals: improve availability, increase throughput, reduce latency
[Diagram: a Region with two Availability Zones]
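As a rough illustration of how these options are expressed in practice, the sketch below assembles request parameters for a Multi-AZ primary with Provisioned IOPS and for a read replica. The parameter names follow the boto3 RDS client calls `create_db_instance` and `create_db_instance_read_replica`; the helper functions, the identifier, the engine, the instance class, and the sizes are all assumptions chosen for the example, and required credentials such as the master username and password are omitted.

```python
def primary_instance_params(identifier, storage_gb, iops):
    """Parameters for a Multi-AZ primary using Provisioned IOPS storage."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",              # assumed engine for illustration
        "DBInstanceClass": "db.m4.large",  # assumed instance class
        "AllocatedStorage": storage_gb,
        "StorageType": "io1",              # Provisioned IOPS storage
        "Iops": iops,
        "MultiAZ": True,                   # standby in a second Availability Zone
    }


def read_replica_params(source_identifier, replica_number):
    """Parameters for a read replica of an existing instance."""
    return {
        "DBInstanceIdentifier": f"{source_identifier}-replica-{replica_number}",
        "SourceDBInstanceIdentifier": source_identifier,
    }


primary = primary_instance_params("research-db", storage_gb=500, iops=5000)
replica = read_replica_params("research-db", 1)
print(primary["MultiAZ"], replica["DBInstanceIdentifier"])
# With boto3 installed and credentials configured, these dictionaries could be
# passed as keyword arguments, e.g. rds.create_db_instance(**primary, ...).
```

Keeping the parameters in plain dictionaries makes the configuration easy to review before any API call is made.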
Traditional HPC / HTC clusters
[Diagram: a Scalable Compute Cluster with PNFS storage, alongside S3, EMR, RDS, SQS, Elasticsearch, Machine Learning, and other services]
Data Security and Compliance
The Shared Responsibility model
AWS infrastructure (audited): Facilities, Physical security, Compute infrastructure, Storage infrastructure, Network infrastructure, Virtualization layer (EC2), Hardened service endpoints, Rich IAM capabilities
+
Customer/Partner: Network configuration, Security groups, OS firewalls, Operating systems, Applications, Proper service configuration, Auth & acct management, Authorization policies
=
Re-focus your security professionals on a subset of the problem
Take advantage of high levels of uniformity and automation
Store and analyze restricted-access genomics on AWS bit.ly/aws-dbgap
Enabling Compliant Workloads
Build HIPAA-eligible applications that store, process, and transmit PHI
Business Associate Agreement (BAA) addendum available
Services covered under the AWS BAA addendum:
Compute: Amazon EC2, Amazon EMR
Block Level, Object, and Archival Storage: Amazon EBS, Amazon S3, Amazon Glacier
Database and Data Warehousing: DynamoDB, Amazon RDS, Amazon Redshift
Traffic Distribution: Elastic Load Balancing
Thank you! Architecting for Genomic Data Security and Compliance in AWS http://bit.ly/aws-dbgap Architecting for HIPAA Security and Compliance on Amazon Web Services http://bit.ly/aws-hipaa