Deep Dive on Amazon Elastic File System

Similar documents
Introducing Amazon Elastic File System (EFS)

Amazon Elastic File System

Agenda. Introduction Storage Primer Block Storage Shared File Systems Object Store On-Premises Storage Integration

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

AWS Storage Gateway. Amazon S3. Amazon EFS. Amazon Glacier. Amazon EBS. Amazon EC2 Instance. storage. File Block Object. Hybrid integrated.

Cloud and Storage. Transforming IT with AWS and Zadara. Doug Cliche, Storage Solutions Architect June 5, 2018

CIT 668: System Architecture. Amazon Web Services

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

AWS_SOA-C00 Exam. Volume: 758 Questions

AWS: Basic Architecture Session SUNEY SHARMA Solutions Architect: AWS

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

File Storage Level 100

About Intellipaat. About the Course. Why Take This Course?

Oracle WebLogic Server 12c on AWS. December 2018

Aurora, RDS, or On-Prem, Which is right for you

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

AWS Storage Optimization. AWS Whitepaper

ActiveNET. #202, Manjeera Plaza, Opp: Aditya Park Inn, Ameerpetet HYD

Creating your Virtual Data Centre

AWS Administration. Suggested Pre-requisites Basic IT Knowledge

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

Pass4test Certification IT garanti, The Easy Way!

Advanced Architectures for Oracle Database on Amazon EC2

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

Designing Fault-Tolerant Applications

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014

AWS Services for Data Migration Luke Anderson Head of Storage, AWS APAC

Security & Compliance in the AWS Cloud. Amazon Web Services

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

LINUX, WINDOWS(MCSE),

Amazon AWS-Solution-Architect-Associate Exam

Introduction to Database Services

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect

Introduction to Amazon Cloud & EC2 Overview

Building Extreme-Scale File Services in the Oracle Public Cloud Ed Beauvais, Director Product Management

Security & Compliance in the AWS Cloud. Vijay Rangarajan Senior Cloud Architect, ASEAN Amazon Web

WHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud.

Training on Amazon AWS Cloud Computing. Course Content

Amazon Web Services Training. Training Topics:

AWS Solution Architect Associate

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Oracle IaaS, a modern felhő infrastruktúra

Building a Modular and Scalable Virtual Network Architecture with Amazon VPC

Security Aspekts on Services for Serverless Architectures. Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance

Amazon Web Services and Feb 28 outage. Overview presented by Divya

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

MarkLogic Cloud Service Pricing & Billing Effective: October 1, 2018

Expected Learning Outcomes Introduction To AWS

Scaling Massive Content Stores in the Cloud. CloudExpo New York June Alfresco Founder & CTO

Using SQL Server on Amazon Web Services

POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN. PostgresConf US

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems

Project Presentation

Introduction to Amazon Web Services. Jeff Barr Senior AWS /

Protecting Your Data in AWS. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Next Generation Storage for The Software-Defned World

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION

Amazon ElastiCache. User Guide API Version

Security: Michael South Americas Regional Leader, Public Sector Security & Compliance Business Acceleration

High School Technology Services myhsts.org Certification Courses

Enroll Now to Take online Course Contact: Demo video By Chandra sir

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Introduction to Cloud Computing

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Configuring AWS for Zerto Virtual Replication

We are ready to serve Latest IT Trends, Are you ready to learn? New Batches Info

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

Amazon Web Services (AWS) Training Course Content

Amazon Aurora Deep Dive

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

AWS Solution Architect (AWS SA)

Determining the IOPS Needs for Oracle Database on AWS

Scaling on AWS. From 1 to 10 Million Users. Matthias Jung, Solutions Architect

HPE Digital Learner AWS Certified SysOps Administrator (Intermediate) Content Pack

SQL Server Performance on AWS. October 2018

Amazon Aurora Relational databases reimagined.

Amazon Aurora Deep Dive

AWS Certifications. Columbus Amazon Web Services Meetup - February 2018

Microsoft Windows Server Failover Clustering (WSFC) and SQL Server AlwaysOn Availability Groups on the AWS Cloud: Quick Start Reference Deployment


CLOUD ECONOMICS: HOW TO QUANTIFY THE BENEFITS OF MOVING TO THE CLOUD

Red Hat Storage Server for AWS

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

Amazon Elastic Compute Cloud (EC2)

4 Myths about in-memory databases busted

Emerging Technologies for HPC Storage

Principal Solutions Architect. Architecting in the Cloud

Security on AWS(overview) Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance

Data Protection Done Right with Dell EMC and AWS

Automating Elasticity. March 2018

Object Storage Level 100

ArcGIS 10.3 Server on Amazon Web Services

POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Store, Protect, Optimize Your Healthcare Data in AWS

Overview of AWS Security - Database Services

Introduction to Amazon Web Services

Transcription:

Deep Dive on Amazon Elastic File System Yong S. Kim AWS Business Development Manager, Amazon EFS Paul Moran Technical Account Manager, Enterprise Support 28 th of June 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

What to expect from this session Recognize why and when to use Amazon EFS Understand key technical/security concepts Learn how to leverage EFS s performance Review EFS s economics

What to expect from this session Recognize why and when to use Amazon EFS Understand key technical/security concepts Learn how to leverage EFS s performance Review EFS s economics

How EFS fits in to the AWS storage platform Amazon EFS File Amazon S3 Amazon Glacier Object Data Transfer Amazon EBS (persistent) Block Amazon EC2 Instance Store (ephemeral) Snowball Storage Gateway Direct Connect 3 rd Party Connectors Transfer Acceleration Kinesis Firehose

We focused on changing the game 1 2 3 Simple Elastic Scalable Highly durable Highly available

1 Amazon EFS is Simple Fully managed - No hardware, network, file layer - Create a scalable file system in seconds! Seamless integration with existing tools and apps - NFS v4.1 widespread, open - Standard file system access semantics - Works with standard OS file system APIs Simple pricing = simple forecasting

2 Amazon EFS is Elastic File systems grow and shrink automatically as you add and remove files No need to provision storage capacity or performance You pay only for the storage space you use, with no minimum fee

3 Amazon EFS is Scalable File systems can grow to petabytes of capacity Throughput scales automatically as file systems grow Consistent low latencies regardless of file system size Support for thousands of concurrent NFS connections

Highly Durable and Highly Available (Multi-AZ) Every file system object is redundantly stored across multiple Availability Zones in a Region Designed to sustain Availability Zone offline conditions Superior to traditional NAS availability models Appropriate for production/tier 0 applications

How to think about EFS relative to EBS Amazon EFS Amazon EBS PIOPS Performance Per-operation latency Throughput scale Low, consistent Multiple GBs per second Lowest, consistent Single GB per second Data availability / durability Stored redundantly across multiple AZs Stored redundantly in a single AZ Characteristics Access 1 to 1000s of EC2 instances, from multiple AZs, concurrently Single EC2 instance in a single AZ Use cases Big Data and analytics, media processing workflows, content management, web serving, home directories Boot volumes, transactional and NoSQL databases, data warehousing & ETL

Do you need an EFS file system? If you have an application running on EC2 or use case that requires a file system AND Requires multi-attach OR GBs/s throughput OR Multi-AZ availability/durability OR Requires automatic scaling (grow/shrink) of storage

Access your EFS file system via AWS Direct Connect On-premises servers Direct Connect EFS in your Amazon VPC

Direct Connect support addresses three of the scenarios Migration Bursting Tiering Backup / DR

What customers are using EFS for today Web serving Content management Database backups Container storage Home directories Analytics Media and Entertainment workflows Workflow management

Where is EFS available today? US West (Oregon) US East (N. Virginia) US East (Ohio) EU (Ireland) Asia Pacific (Sydney) More coming soon!

What to expect from this session Recognize why and when to use Amazon EFS Understand key technical/security concepts Learn how to leverage EFS s performance Review EFS s economics

EFS s Design VPC REGION EC2 AVAILABILITY ZONE 1 AVAILABILITY ZONE 2 File system EC2 EC2 EC2 AVAILABILITY ZONE 3 Data can be accessed from any AZ in the Region while maintaining full consistency

What is a file system? The primary resource in EFS for storing files and directories Regional construct 10 per account per region (soft) Default throughput limit 3 GB/s (soft) Accessible from EC2 VPC, EC2-Classic via ClassicLink Accessible from on-premises AWS Direct Connect

What is a mount target? To access your file system within a VPC, you create mount targets in the VPC A mount target is an NFS endpoint that lives in your VPC A mount target has an IP address and a DNS name you use in your mount command A mount target is highly available VPC EC2 Mount target EC2 EC2 EC2 REGION AVAILABILITY ZONE 1 AVAILABILITY ZONE 2 AVAILABILITY ZONE 3

Mount EFS NFSv4.0 NFSv4.1 Linux Kernel 4+

Mount an EFS File System Launch EC2 instance from EC2 Console Connect to the instance Make a directory Mount EFS file system Query disk file system & file system table df; df -ht; df -h -t nfsv4; mount -t nfsv4 mount t nfs4 o nfsvers=4.1 [file system DNS name]:/ /[user s target directory]

Recommended kernel version and NFS mount options Kernel version Mount options Use Linux kernel 4.0+ (e.g., Amazon Linux 2016.03.0, Ubuntu 15.10 or 16.04) Mount via NFSv4.1 Specify 1MB read/write buffers ( rsize / wsize ) Ensure operations are asynchronous Recommend the following mount options: -o nfsvers=4.1, rsize=1048576,wsize=1048576,hard, timeo=600,retrans=2,async

Resources for Amazon EFS Tags Typical key-value pair Create & associate tag with file system Up to 50 tags per file system

Resources for Amazon EFS Mount Targets One or more per file system Create in a VPC Subnet One per Availability Zone Must be in the same VPC

Resources for Amazon EFS Security Groups Standard VPC Security Group Same VPC as subnet Up to five per mount target Allow inbound TCP port 2049 from NFS clients

Several security mechanisms Control network traffic to and from file systems (mount targets) by using VPC security groups and network ACLs Control file and directory access by using POSIX permissions Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM) EFS supports action-level and resource-level permissions

The AWS Management Console, CLI, and SDK each allow you to perform a variety of management tasks Create a file system Create and manage mount targets Tag a file system Delete a file system View details on file systems in your AWS account

All EFS AWS CLI Commands aws efs create-file-system aws efs create-mount-target aws efs create-tags aws efs delete-file-system aws efs delete-mount-target aws efs delete-tags aws efs describe-file-systems aws efs describe-mount-target-security-groups aws efs describe-mount-targets aws efs describe-tags aws efs modify-mount-target-security-groups

What to expect from this session Recognize why and when to use Amazon EFS Understand key technical/security concepts Learn how to leverage EFS s performance Review EFS s economics

Amazon EFS is designed for wide spectrum of performance needs High throughput and parallel I/O Genomics Big data analytics Scale-out jobs Web serving Home directories Content management Metadata-intensive jobs Low latency and serial I/O

Amazon EFS has a distributed data storage design EC2 EC2 EC2 EC2 File systems distributed across unconstrained number of servers Avoids bottlenecks/constraints of traditional file servers Enables high levels of aggregate IOPS/throughput EC2 EC2 Data also distributed across Availability Zones (durability, availability)

Choose the performance mode best suited to your workload Mode What s it for? Advantages Tradeoffs When to use General purpose (default) Latency-sensitive applications and general-purpose workloads Lowest latencies for file operations Limit of 7,000 ops/sec Best choice for most workloads Max I/O Large-scale and dataheavy applications Virtually unlimited ability to scale out throughput/iops Slightly higher latencies Consider if 10s (or more) instances access your file system concurrently

Use the PercentIOLimit CloudWatch metric to determine if you re constrained by General Purpose mode

Burst Model Based on size of file system Starts w/ 2.1 TiB burst credits Min. burst throughput 100 MiB/s Baseline throughput 50 MiB/s per TiB Burst throughput 100 MiB/s Per TiB

Burst Model Examples File System Size (GiB) Baseline Aggregate Throughput (MiB/s) Burst Aggregate Throughput (MiB/s) Maximum Burst Duration (Min/Day) 10 0.5 100 7.2 512 25 100 360 1024 50 100 720 4096 200 400 720 16384 800 1600 720

Burst Model Throughput (MiB/s) Current Baseline Current throughput is above baseline consuming burst credits Time

Burst Model Throughput (MiB/s) Baseline Current Current throughput is below baseline adding burst credits Time

I/O Size Implication I/O size impacts throughput of serialized operations Throughput 4 KB 32 KB 256 KB 2 MB 16 MB I/O size

How to take advantage of EFS s distributed architecture: Parallelise 30000 25000 20000 Aggregate IOPS of parallel writes using 10 m4.xlarge instances IOPS 15000 10000 5000 0 0 20 40 60 80 100 120 140 160 # of Total Threads Parallelise via multiple threads and/or multiple instances

Use CloudWatch for a number of views of file system performance DataReadIOBytes DataWriteIOBytes MetadataIOBytes TotalIOBytes BurstCreditBalance PermittedThroughput ClientConnections PercentIOLimit Measure throughput ( Sum of bytes divided by seconds in time period) or ops/sec ( Data Samples divided by seconds in time period) Monitor your burst credit usage over time to ensure sufficient throughput capacity Compare to actual throughput to determine whether you re being constrained by the burst model View the number of clients connected to your file system Determine whether you re being constrained by General Purpose mode (PercentIOLimit at or near 100%)

Transferring media assets to EFS Size ranges from a few GB to 100+GB per file Data sources: Amazon S3 Amazon EBS

Transferring many small files to EFS Size ranges from 64K to 256K Data sources: Amazon S3 Amazon EBS

GNU parallel For people who live life in the parallel lane Tool for executing jobs in parallel Similar to xargs Replace loops in shell scripts GNU parallel makes sure output from the commands is the same output as you would get if you had run the commands sequentially https://www.gnu.org/software/parallel/

As with copying from within EC2, using a script based on the GNU parallel tool reduces transfer time Total Time to Copy 26200 Files vs Number of Threads Time 900 800 700 600 500 400 300 200 100 0 0 2 4 6 8 10 12 14 16 18 Number of Threads

Use parallel threads GNU parallel # Create destination directory tree from source find. -type d -print0 parallel -j $N_THREADS -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1 # Copy files find.! \( -type d \) -print0 parallel -j $N_THREADS -0 "cp - f {} ${DST_DIR}/{}"

Results Large files 50 instances Small files 300 instances

Summary / tl;dr Parallelise everything Threads Instances Test, test, test Capture & analyze test data Check your burst credit earn/spend rate when testing ensure sufficient amount of storage Less than $5/hr for 300 instances

What to expect from this session Recognize why and when to use Amazon EFS Understand key technical/security concepts Learn how to leverage EFS s performance Review EFS s economics

Operating your own multi-attach file storage on the cloud is complex and expensive Replicate EBS volumes (1 per EC2 instance) Substantial management overhead (sync data, provision and manage volumes) Costly (one volume per instance) Use an NFS server or shared file layer Complex to set up and maintain Scale challenges HA challenges Costly (compute + storage)

Do It Yourself Cost and Complexity NFS Clients NFS Clients NFS Clients NFS Server NFS Server NFS Server Volume Volume Volume Volume Volume Volume

EFS TCO example Let s say you need to store ~500 GB and require high availability and durability Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilisation) and fully replicate the data to a second Availability Zone for availability/durability Example comparative cost: Storage (2x 600 GB EBS gp2 volumes): $132 per month Compute (2x m4.xlarge instances): Inter-AZ data transfer costs (est.): Total $320 per month $135 per month $587 per month EFS cost is (500GB * $0.33/GB-month*) = $165 per month, with no additional charges

EFS: Simple and Fully Managed NFS Clients NFS Clients Single Namespace NFS Clients Mount Target Mount Target Mount Target

EFS Economics No minimum commitments or up-front fees No need to provision storage in advance No other fees, charges, or billing dimensions Price: $0.30/GB-Month (US Regions) $0.33/GB-Month (EU Ireland) $0.36/GB-Month (AP Sydney)

Reference AWS Loft EFS Hands-on Walk-through - https://bit.ly/awsloft2017 AWS 10-minute Tutorials - https://aws.amazon.com/getting-started/tutorials/ Amazon EFS Web page - https://aws.amazon.com/efs/ YouTube AWS Channel - https://www.youtube.com/user/amazonwebservices Reference Architecture - https://aws.amazon.com/architecture/ QuickStarts - https://aws.amazon.com/architecture/ qwiklabs - https://aws.qwiklabs.com/

Thank you! Yong Kim yongskim@amazon.com Paul Moran pdmoran@amazon.com