CIT 668: System Architecture Amazon Web Services
Topics
  1. AWS Global Infrastructure
  2. Foundation Services
     1. Compute
     2. Storage
     3. Database
     4. Network
  3. AWS Economics
Amazon Services Architecture
Regions and Availability Zones
  AWS resources are either
    Global
    Tied to a region
    Tied to an availability zone
  Regions are completely isolated from each other.
  Availability zones are data centers within a region.
  https://aws.amazon.com/about-aws/globalinfrastructure/
Edge Locations
  Content delivery network (CDN)
    Goal: serve content with low latency and high availability.
    Solution: cache content in multiple geographically distributed data centers so that a data center is near each user.
  [Figure: Traditional Content Delivery vs. Content Delivery Network]
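The CDN idea above can be sketched as picking, for each user, the edge location with the smallest great-circle distance. This is only an illustration: the location names and coordinates are hypothetical, and real CDNs typically route by DNS and measured latency rather than raw geographic distance.

```python
import math

# Hypothetical edge locations: name -> (latitude, longitude) in degrees.
EDGE_LOCATIONS = {
    "ashburn": (39.04, -77.49),
    "dublin": (53.35, -6.26),
    "tokyo": (35.68, 139.69),
}

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_edge(user_location):
    """Return the name of the edge location geographically closest to the user."""
    return min(EDGE_LOCATIONS,
               key=lambda name: great_circle_km(user_location, EDGE_LOCATIONS[name]))
```

A user near Paris, for example, would be sent to the hypothetical "dublin" location by `nearest_edge((48.85, 2.35))`.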
Foundation Services: Compute, Storage, Database, Networking
Compute
  Elastic Compute Cloud (EC2)
    Create virtual servers in the cloud in seconds.
    Set up with any OS and software.
    Manage with administrative access.
  Auto Scaling
    Create and remove EC2 instances based on triggers.
    Time- and date-based triggers.
    Resource-based triggers.
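The two kinds of Auto Scaling trigger can be sketched as a single decision function. This is a toy model, not the Auto Scaling API: the function name, the thresholds, and the business-hours floor are all assumptions chosen for illustration.

```python
from datetime import datetime

def desired_instances(current, cpu_percent, now,
                      min_size=1, max_size=10,
                      scale_out_above=70.0, scale_in_below=30.0,
                      business_hours=(9, 17), business_min=4):
    """Return the instance count a simple scaling policy would ask for."""
    # Resource-based trigger: react to average CPU utilization.
    if cpu_percent > scale_out_above:
        target = current + 1
    elif cpu_percent < scale_in_below:
        target = current - 1
    else:
        target = current

    # Time-based trigger: keep a higher minimum during business hours.
    start, end = business_hours
    floor = business_min if start <= now.hour < end else min_size

    # Never go below the floor or above the configured maximum.
    return max(floor, min(max_size, target))
```

With these assumed defaults, a fleet of 2 instances at 85% CPU at 03:00 grows to 3, while the same fleet at 50% CPU at 10:00 is raised to the business-hours floor of 4.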
Amazon EC2 is
  A web service that enables you to launch and manage server instances.
  Designed to make web-scale computing easier for developers.
  A simple web service interface that provides programmable control of your cloud resources.
EC2 Features
  Elastic
    Allows you to instantiate one to thousands of server instances, either manually or automatically.
  Flexible
    Choice of multiple instance types, OSes, and software packages.
  Available
    SLA commitment of 99.95% availability in each region.
  Pay as You Go
    Pay for resources as you need them; reserved instances offer lower pricing for longer commitments.
Amazon Machine Image (AMI)
  Virtual root disk image
    Contains OS
    Contains most applications
  Start a VM by booting an AMI
    Creates an instance
  Catalog of pre-built AMIs
    OS: Linux (many distros), OpenSolaris, Windows
    Software: Apache, MySQL, Oracle, WordPress, etc.
    Available at http://aws.amazon.com/amis
Instance
  An instance is a VM running the OS and software from an AMI.
  You can launch many instances of the same AMI.
  Other users can launch instances of that AMI too.
  Each instance is a separate and independent virtual server.
EC2 Instance Types
  General Purpose
    Balanced compute, memory, and network resources.
    Useful for typical server applications.
  Compute Optimized
    Many vCPUs with lowest cost per vCPU.
    High-traffic web apps, video encoding, analytics.
  GPU Instances
    Provide access to GPUs with hundreds of CUDA cores.
    Gaming and 3D graphics.
  Memory Optimized
    High memory with lowest cost per GiB of RAM.
    Databases and distributed caches.
  Storage Optimized
    Large storage and high-speed storage (SSD) versions.
    Large databases and file servers.
General Purpose (m1) Instance Types
                      Small Instance      Large Instance      Extra Large Instance
  RAM                 1.7 GiB             7.5 GiB             15 GiB
  Virtual cores       1                   2                   4
  EC2 Compute Units   1                   4                   8
  Instance storage    160 GB              2 x 420 GB          4 x 420 GB
  Platform            32-bit or 64-bit    64-bit              64-bit
  1 EC2 Compute Unit = early-2006 1.7 GHz Xeon CPU
Access Identifiers
  AWS uses a set of different access identifiers
  Use public key cryptography
    Public identifier kept on the service or on the instance
      Can be shared with anyone
    Private identifier kept on your PC
      Must be kept secret
Elastic Block Store Volume
  An addressable virtual disk
  Can be attached to an instance
    Format
    Mount
    Store files
  Volumes have a lifetime independent of the instance
    Disk storage persists even if the instance is terminated
Block Device Mapping
  Map system devices to AWS block storage.
  Each mapping records: VM Device Name, AWS Volume ID, Status, Timestamp, DeleteOnTermination
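A block device mapping can be pictured as a small record per attached volume. The sketch below models the fields listed on the slide. The class, the field names, and the volume IDs are hypothetical and are not the EC2 API's own types.

```python
from dataclasses import dataclass

@dataclass
class BlockDeviceMapping:
    device_name: str             # device name seen inside the VM, e.g. /dev/sda1
    volume_id: str               # AWS volume ID (made-up values below)
    status: str                  # e.g. "attached"
    attach_time: str             # timestamp of the attachment
    delete_on_termination: bool  # whether the volume is deleted with the instance

def volumes_surviving_termination(mappings):
    """Return IDs of volumes that persist after the instance is terminated."""
    return [m.volume_id for m in mappings if not m.delete_on_termination]

mappings = [
    BlockDeviceMapping("/dev/sda1", "vol-11111111", "attached", "2013-01-01T00:00:00Z", True),
    BlockDeviceMapping("/dev/sdf", "vol-22222222", "attached", "2013-01-02T00:00:00Z", False),
]
```

Here only the hypothetical data volume `vol-22222222` would outlive the instance, because its DeleteOnTermination flag is false.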
Security Group
  A Security Group defines the set of permitted inbound connections for an instance.
    Each group is a named access control list.
    Entries specify allowed protocols, ports, and IPs.
    Essentially a firewall.
  A single Security Group can be applied to multiple instances.
  Multiple Security Groups can be applied to a single instance.
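A security group behaves like an allow-list: a connection is permitted if any rule in any group applied to the instance matches it. The sketch below is a local model of that logic using the standard `ipaddress` module. The `Rule` and `SecurityGroup` classes are hypothetical and are not AWS types.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    protocol: str   # "tcp" or "udp"
    from_port: int  # first allowed port
    to_port: int    # last allowed port (inclusive)
    cidr: str       # allowed source range, e.g. "0.0.0.0/0"

@dataclass
class SecurityGroup:
    name: str
    rules: list = field(default_factory=list)

    def allows(self, protocol, port, source_ip):
        """True if at least one rule permits this inbound connection."""
        src = ipaddress.ip_address(source_ip)
        return any(
            r.protocol == protocol
            and r.from_port <= port <= r.to_port
            and src in ipaddress.ip_network(r.cidr)
            for r in self.rules
        )

def instance_allows(groups, protocol, port, source_ip):
    """With several groups on one instance, the permitted set is their union."""
    return any(g.allows(protocol, port, source_ip) for g in groups)
```

For example, an instance carrying a "web" group (TCP 80 from anywhere) and an "admin" group (TCP 22 from one office range) accepts HTTP from any address but SSH only from that range.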
S3 and EBS Instance Lifecycles
  [Figure: lifecycle of an S3-backed instance and of an EBS-backed instance]
  Data remains accessible if the instance is rebooted or (EBS-only) stopped.
  Data cannot be recovered after an instance is terminated.
  http://shlomoswidler.com/2009/07/ec2-instance-life-cycle.html
S3 and EBS-backed Instance Differences
EC2 Resources
  Persistent Resources
    Elastic IP Addresses
    Elastic Block Store Volumes
    Elastic Load Balancers
    Security Groups
    Amazon Machine Images
  Ephemeral Resources
    Instances, including
      Instance memory state
      Instance disk state
      Non-elastic IP address
      DNS name
  How can you maintain a running system if your servers are transient and unreliable?
AMI Types
  Public
    AMIs made available by Amazon and the EC2 community.
  Private
    AMIs that you own and create; may be developed from Public AMIs.
  Shared
    AMIs built by developers and shared with the EC2 community.
  Paid
    AMIs that you purchase or that come with a service contract from a company such as Red Hat.
Security Credentials
  Credentials to Administer Instances
    AWS Management Console: Amazon account
    Query and Third Party UIs: Secret access key
    SOAP, EC2 CLI: X.509 certificate and private key
  Credentials to Connect to an Instance
    Amazon EC2 key pair
    Windows administrator password
  Credentials to Build Instances
    UNIX: X.509 certificate and private key
    Windows: Amazon account
Instance Network Addresses
  EC2 instances are assigned 2 IPs at launch
    Private RFC 1918 IP address for internal use
    Public IP address NAT-mapped to the private IP
  EC2 instances are assigned 2 DNS names at launch
    Internal: resolves only inside EC2
    Public: associated with the instance until stopped
  Elastic IP addresses
    Static IP addresses you map to an instance
    Can keep and remap elastic IP addresses
    Charged only for allocated but unused elastic IPs
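The private address is drawn from the RFC 1918 ranges, which are not routable on the public Internet. A minimal sketch of telling the two kinds of address apart is below. The helper name and the sample addresses are illustrative only.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """True if the IPv4 address falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)
```

An address such as 10.1.2.3 is private under this check, while 172.32.0.1 is not, since it lies just outside 172.16.0.0/12.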
Using Tags
  Can tag
    AMIs
    Instances
    EBS Volumes
    EBS Snapshots
  but not
    Elastic IPs
    Key pairs
    Security groups
Storage
  Elastic Block Store (EBS) provides
    Off-instance storage
    Persistence beyond instance lifetime
    High availability and reliability
    Attach and detach from a running instance
    Exposure as a device within an instance
  Simple Storage Service (S3) provides
    Highly available and reliable storage for objects.
    Objects can be up to 5TB in size.
    Objects are accessible simply via a URL.
  Amazon Glacier provides
    Cheap, reliable long-term backup with 24-hour turnaround.
Elastic Block Store (EBS)
  EBS volumes are up to 1TB in size
    Attach to any EC2 instance in the same AZ
    Create snapshots at any time
    Create new volumes based on snapshots
  Reliability
    Annual Failure Rate (AFR) of 0.1-0.5%
    Commodity hard disk AFR is ~4%
    About as reliable as a RAID set
    Use snapshots for backups
  Pricing per GB-month
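The comparison with a RAID set can be checked with back-of-the-envelope arithmetic. The sketch below uses a deliberately crude model, which is an assumption of this note rather than a published AWS figure: two mirrored commodity disks fail independently, and data is lost only if both fail in the same year.

```python
def mirrored_pair_afr(disk_afr):
    """Naive annual failure rate of a 2-disk mirror (independent failures, no rebuild)."""
    return disk_afr ** 2

def survival_probability(afr, years):
    """Probability of no failure over the given number of years at a constant AFR."""
    return (1 - afr) ** years

# A 4% commodity disk AFR gives roughly 0.16% for the mirrored pair,
# which falls inside the 0.1-0.5% range quoted for EBS volumes.
pair_afr = mirrored_pair_afr(0.04)
```

Under the same model, a single 4% AFR disk has about an 82% chance of surviving five years, which is why snapshots are still recommended for backups.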
EBS Snapshots
  Snapshots are saved to S3
    Not visible through the S3 API.
    Snapshots are point-in-time images of EBS volumes; new volumes can be created from them.
  Snapshots are fast
    Incremental (copy on write): only blocks changed since the last snapshot need to be saved.
  http://blog.rightscale.com/2008/08/20/amazon-ebs-explained/
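Why later snapshots are fast can be seen in a toy model where a volume is a list of fixed-size blocks and each snapshot stores only the blocks that differ from the previous one. This sketches the incremental idea only. It is not how EBS is implemented, and it assumes the volume size does not change between snapshots.

```python
def take_snapshot(volume, previous=None):
    """Return (full_image, changed_blocks) for a volume given as a list of blocks.

    changed_blocks maps block index -> data and is all that would need to be
    uploaded; full_image is the logical content the snapshot represents.
    """
    if previous is None:
        # First snapshot: every block has to be saved.
        changed = dict(enumerate(volume))
    else:
        # Later snapshots: save only blocks that differ from the last snapshot.
        changed = {i: block for i, block in enumerate(volume) if previous[i] != block}
    return list(volume), changed

volume = [b"boot", b"data", b"logs"]
snap1, saved1 = take_snapshot(volume)         # saves all three blocks
volume[1] = b"DATA"                           # modify one block
snap2, saved2 = take_snapshot(volume, snap1)  # saves only block 1
```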
S3 Features
  An Internet-scale data storage service
    All data is stored redundantly in multiple AZs
    Data is located in the region you specify
  Stores objects from 1 byte to 5TB in size
    Objects are stored in a bucket and retrieved via a unique, developer-assigned URL
    You can have 100 named buckets
    Each bucket can store an unlimited number of objects in a flat namespace.
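The bucket-plus-key addressing and the flat namespace can be sketched in a few lines. The bucket name and keys below are hypothetical. The URL form assumes the virtual-hosted style, `https://<bucket>.s3.amazonaws.com/<key>`, and ignores region-specific endpoints and access permissions.

```python
from urllib.parse import quote

def object_url(bucket, key):
    """Build a virtual-hosted-style URL for an object (assumed URL form)."""
    # quote() keeps "/" unescaped, so keys that look like paths stay readable.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"

def list_keys(keys, prefix=""):
    """Emulate listing by prefix: 'folders' are just shared key prefixes."""
    return sorted(k for k in keys if k.startswith(prefix))

keys = ["lectures/week 1.mp4", "lectures/week 2.mp4", "syllabus.pdf"]
url = object_url("cit668-media", keys[0])
```

Because the namespace is flat, `list_keys(keys, "lectures/")` simply filters key strings; there is no real directory named `lectures`.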
Applications of S3
  Fast, scalable, and reliable web file hosting
    Especially useful for audio and video files
  http://aws.amazon.com/articles/1073
Amazon Glacier
  Cloud-based backup and long-term storage
    Durable: data stored on multiple devices at multiple sites.
    Cheap: as low as $0.01 per GB-month.
    Slow: retrieval guaranteed within 24 hours; usually requires 3-5 hours.
  Organize data in vaults.
    Store archives (up to 40TB) in vaults.
    Can have up to 1000 vaults.
    Jobs notify the user of completion using Amazon SNS.
Databases
  Amazon Relational Database Service (RDS)
    Managed relational database services.
    Access via standard database protocols and SQL.
  Amazon SimpleDB
    Non-relational (NoSQL) flexible database service.
    Access via web service requests.
    Table size limited to 10GB.
  Amazon DynamoDB
    Scalable NoSQL database service introduced in 2012.
    No table size limits; automatically partitions and scales.
Relational Database Service (RDS)
  Users create their own DB instances
    DB types: MySQL, Oracle, PostgreSQL, MSSQL.
    Instance types with different CPU, RAM, storage.
    Can create replicated DB instances across AZs.
  Amazon manages
    Software installation and updates.
    Backups.
    System administration.
SimpleDB
  Cloud-based non-relational (NoSQL) data store
  Data is stored in domains (tables)
    Domains are limited to 10GB in size.
    Domains have a set of attributes (columns)
    Attributes can have up to 256 values
    Domains can have up to a billion items (rows)
  SimpleDB can be queried using a simple version of SQL via web service requests.
    Does not support JOIN operations
Attributes can be added dynamically
  [Figure: initial model for the person domain; effect of adding a middle name attribute]
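The schemaless domain model can be imitated with nested dictionaries: each item holds its own attributes, each attribute holds a list of values, and a new attribute appears as soon as any item uses it. The `Domain` class below is a hypothetical in-memory stand-in, not the SimpleDB web service API, and the person data is made up.

```python
from collections import defaultdict

class Domain:
    """Toy model of a domain: item name -> attribute -> list of values."""

    def __init__(self, name):
        self.name = name
        self.items = defaultdict(lambda: defaultdict(list))

    def put(self, item_name, attribute, value):
        # No schema change is needed: a new attribute is created on first use.
        self.items[item_name][attribute].append(value)

    def attributes(self):
        """All attribute names currently used by any item in the domain."""
        return sorted({a for item in self.items.values() for a in item})

    def select(self, attribute, value):
        """Names of items where the attribute has the given value."""
        return sorted(name for name, attrs in self.items.items()
                      if value in attrs.get(attribute, []))

person = Domain("person")
person.put("p1", "first_name", "Ada")
person.put("p1", "last_name", "Lovelace")
person.put("p2", "first_name", "Alan")
before = person.attributes()
person.put("p2", "middle_name", "Mathison")  # attribute added dynamically
after = person.attributes()
```

Only item `p2` carries the new `middle_name` attribute; `p1` is unchanged, which is the effect the slide's figure illustrates.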
DynamoDB
  Highly reliable and scalable key/value store.
    Stores associative arrays rather than tables.
    Keys can have multiple values.
  Fast
    High throughput (built on SSDs).
    Very low latency (<10ms).
  Users reserve desired throughput.
    DynamoDB reconfigures itself to meet reservations.
Networking
  Virtual Private Cloud (VPC)
    Logically isolated segment of the AWS cloud.
    Complete control over the virtual network, including subnets, routing tables, IP address ranges, etc.
    Security and privacy.
  AWS Direct Connect
    Dedicated private 1 or 10Gbps connection to AWS.
    Available in about a dozen major data centers.
    Consistent latency and throughput.
    Lower data transfer pricing.
AWS Economics
AWS Economics
  AWS prices its resources based on
    Time: an hour of CPU time
    Volume: GB of transferred data
    Count: number of messages queued
    Time and Space: GB-month of data storage
  Billing is done at the beginning of the month
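The four pricing dimensions combine into a monthly bill as a simple sum of usage times rate. The rates below are hypothetical round numbers chosen for illustration, not actual AWS prices, and the function names are assumptions of this sketch.

```python
# Hypothetical prices in USD; real AWS rates vary by service, region, and date.
RATES = {
    "instance_hours": 0.10,      # time: per instance-hour
    "gb_transferred": 0.12,      # volume: per GB of data transfer
    "messages": 0.0000005,       # count: per queued message
    "gb_months_stored": 0.03,    # time and space: per GB-month of storage
}

def gb_months(gigabytes, days_stored, days_in_month=30):
    """Storage is billed on size and duration: 100 GB for half a month is 50 GB-months."""
    return gigabytes * days_stored / days_in_month

def monthly_bill(usage):
    """Sum usage * rate over every metered dimension present in the usage dict."""
    return sum(RATES[dimension] * amount for dimension, amount in usage.items())

usage = {
    "instance_hours": 720,                 # one instance running all month
    "gb_transferred": 50,
    "messages": 1_000_000,
    "gb_months_stored": gb_months(100, 30),
}
total = monthly_bill(usage)
```

With these assumed rates the instance-hours term dominates the total, which is typical of a lightly used single-server deployment.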
EC2 Instance Pricing Linux Instances Windows Instances
Instance Pricing Options
  Reserved Instances
    Reservations for 1 to 3 years.
    Price discounted by up to 65%.
    Instance type can change within instance class.
    Instances can be moved between AZs.
  Spot Instances
    Bid on spare Amazon compute capacity.
    If your bid exceeds the current SpotPrice, you have an instance.
    Your instance runs until the SpotPrice exceeds your bid.
    Useful for large computations whose results are not needed at a specific time.
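The spot rule above can be simulated over a series of hourly prices: the instance runs in every hour where the bid covers the spot price and is interrupted when the price rises above the bid. The price series is made up, and treating a price equal to the bid as "running" is an assumption of this sketch.

```python
def simulate_spot(bid, hourly_prices):
    """Return (hours_run, interruptions) for a bid against a list of hourly spot prices."""
    hours_run = 0
    interruptions = 0
    running = False
    for price in hourly_prices:
        if price <= bid:
            # The bid covers the current price: the instance runs this hour.
            running = True
            hours_run += 1
        else:
            # The spot price exceeded the bid: a running instance is terminated.
            if running:
                interruptions += 1
            running = False
    return hours_run, interruptions

prices = [0.03, 0.04, 0.06, 0.04, 0.07, 0.08]  # hypothetical USD per hour
result = simulate_spot(0.05, prices)
```

A workload that tolerates interruptions, such as a batch job that checkpoints its progress, simply resumes in the next hour where the price falls back under the bid.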
EC2 Communication Charges
Key Points
  AWS architecture
    Global infrastructure
    Fundamental Services
    Application Services
    Management and Administration
  Fundamental Services
    Compute: EC2, Auto Scaling
    Storage: EBS, S3
    Database: RDS, SimpleDB, DynamoDB
    Networking: VPC, Direct Connect
Key Points
  AMIs are virtual disk images
    A single AMI may have many instances
  Instances are running VMs
    Run in an AZ located in a region.
    Use a key pair to access via ssh.
  On instance termination
    Local storage is lost, except for EBS volumes.
    DNS name and IP address are lost.
    Use elastic IPs or your own DNS for permanent addresses.
  EC2 bills for time, data transfer, and storage.