What is cloud computing? What are private and public clouds? What are IaaS, PaaS, and SaaS? What is Amazon Web Services (AWS)? What is an Amazon Machine Image (AMI)? What is Amazon Elastic Compute Cloud (EC2)? What are the differences between EBS-backed and instance store? What is the Amazon EC2 pricing for a 1-year term of d2.2xlarge? What are Amazon S3 buckets and objects? What is the AWS Free Usage Tier for S3?
Cloud computing means storing and accessing data and programs over the Internet instead of your computer's hard drive. It provides web- and Internet-based, on-demand computational services; infrastructure complexity is transparent to the end user; horizontal scaling comes at no additional cost; and throughput increases.
Public clouds: Amazon Web Services, Windows Azure, Google App Engine.
Private cloud infrastructure software: Eucalyptus, Nimbus, OpenNebula.
Private cloud infrastructure software manages the provisioning of virtual machines for a cloud that provides infrastructure as a service. It coordinates many components: hardware and OS; network services (DNS, DHCP); the VMM (hypervisor); VM image archives; the user front end; etc.
Types of clouds:
- Infrastructure as a Service (IaaS), e.g. Amazon EC2
- Platform as a Service (PaaS), e.g. Microsoft Azure, Google App Engine
- Software as a Service (SaaS), e.g. Salesforce
AWS service categories:
- Compute: Elastic Compute Cloud (EC2), Elastic MapReduce, Auto Scaling
- Storage: Simple Storage Service (S3), Elastic Block Store (EBS), AWS Import/Export
- Messaging: Simple Queue Service (SQS), Simple Notification Service (SNS)
- Database: SimpleDB, Relational Database Service (RDS)
- Content Delivery: CloudFront
- Networking: Elastic Load Balancing, Virtual Private Cloud
- Monitoring: CloudWatch
- Workforce: Mechanical Turk
Virtualization is the creation of a virtual (rather than actual) version of something, such as an operating system, a server, a storage device, or network resources. There are three areas of IT where virtualization is making inroads: network virtualization, storage virtualization, and server virtualization. Different virtualization techniques:
- User-mode Linux
- Full (pure) virtualization (e.g. VMware), which was hard until processors gained virtualization extensions (hardware-assisted virtualization)
- Paravirtualization (e.g. Xen), which uses modified guest OSes
An Amazon Machine Image (AMI) is a master image for the creation of virtual servers (known as EC2 instances) in the Amazon Web Services (AWS) environment. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html
You can select an AMI to use based on the following characteristics: region, operating system, architecture (32-bit or 64-bit), launch permissions, and storage for the root device.
Launch permissions:
- Public: the owner grants launch permissions to all AWS accounts.
- Explicit: the owner grants launch permissions to specific AWS accounts.
- Implicit: the owner has implicit launch permissions for an AMI.
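Programmatically, the same characteristics can be used as filters when listing AMIs. Below is a minimal sketch using the boto3 SDK (boto3 is not shown in the original; the helper name and filter values are illustrative, and calling the function requires the SDK plus valid AWS credentials):

```python
def find_amis(region="us-east-1", architecture="x86_64", owner="amazon"):
    """Return the AMI IDs that match the given characteristics."""
    # boto3 is imported lazily so the module loads even where the SDK
    # is not installed; invoking this function needs boto3 + credentials.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)  # region is set per client
    response = ec2.describe_images(
        Owners=[owner],
        Filters=[
            {"Name": "architecture", "Values": [architecture]},
            {"Name": "root-device-type", "Values": ["ebs"]},  # EBS-backed AMIs
        ],
    )
    return [image["ImageId"] for image in response["Images"]]
```

Launch permissions are reflected here too: `describe_images` only returns AMIs the caller is allowed to see (public images, images shared explicitly with the account, or the account's own).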
Amazon EC2 is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity and give you the flexibility to choose the appropriate mix of resources for your applications. Each instance type includes one or more instance sizes, allowing you to scale your resources to the requirements of your target workload.
EC2 highlights: elastic web-scale computing; completely controlled; flexible cloud hosting services; designed for use with other Amazon Web Services; reliable; secure; inexpensive.
Pricing models: On-Demand Instances, Reserved Instances, Spot Instances.
http://aws.amazon.com/ec2/
Steps to launch an EC2 instance from the console:
1. Choose AMI
2. Choose Instance Type
3. Configure Instance
4. Add Storage
5. Tag Instance
6. Configure Security Group
7. Review
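The seven console steps above correspond to parameters of a single API call. A hedged sketch using boto3's `run_instances` (the helper name and the specific values, such as the 8 GiB root volume, are illustrative assumptions, not prescribed by the original):

```python
def launch_instance(ami_id, key_name, security_group_id,
                    instance_type="t2.micro", region="us-east-1"):
    """Launch one EC2 instance, mirroring the console wizard steps."""
    import boto3  # requires the AWS SDK and configured credentials

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(
        ImageId=ami_id,                        # 1. choose AMI
        InstanceType=instance_type,            # 2. choose instance type
        MinCount=1, MaxCount=1,                # 3. configure instance (count)
        BlockDeviceMappings=[{                 # 4. add storage (8 GiB root volume)
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 8, "VolumeType": "gp2"},
        }],
        TagSpecifications=[{                   # 5. tag instance
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "tutorial-instance"}],
        }],
        SecurityGroupIds=[security_group_id],  # 6. configure security group
        KeyName=key_name,                      # key pair used to connect later
    )
    return response["Instances"][0]["InstanceId"]
```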
To connect to a Windows instance, download the RDP shortcut file named after the instance's public IP address, e.g. 52.1.234.76.rdp.
Amazon Simple Storage Service (Amazon S3) provides developers and IT teams with secure, durable, highly scalable object storage. Amazon S3 is easy to use, with a simple web services interface to store and retrieve any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use. There is no minimum fee and no setup cost. Amazon S3 can be used alone or together with other AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), and Amazon Glacier, as well as third-party storage repositories and gateways. Amazon S3 provides cost-effective object storage for a wide variety of use cases including cloud applications, content distribution, backup and archiving, disaster recovery, and big data analytics.
Features: durable, low cost, available, secure, scalable, event notifications, high performance, integrated, easy to use.
Use cases: backup and archiving, content storage & distribution, big data analytics, static website hosting, cloud-native application data, disaster recovery.
A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket, and bucket names must be globally unique. Just as a bucket holds water, an Amazon bucket is a container for your files. You can name your buckets as you like, but each name must be unique across the entire Amazon system. Follow a domain naming convention, like downloads.xyz.com or media.xyz.com. In the example above, a media.xyz.com bucket on Amazon S3 will correspond to a URL like http://media.xyz.com.s3.amazonaws.com/, while downloads.xyz.com will correspond to a URL like http://downloads.xyz.com.s3.amazonaws.com/
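Since the bucket's virtual-hosted-style URL follows directly from its name, the mapping can be expressed as a one-line helper (an illustrative sketch, not an AWS API):

```python
def s3_bucket_url(bucket_name):
    """Return the virtual-hosted-style URL for an S3 bucket name."""
    return "http://" + bucket_name + ".s3.amazonaws.com/"

print(s3_bucket_url("media.xyz.com"))  # http://media.xyz.com.s3.amazonaws.com/
```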
Objects are the entities stored in Amazon S3. An object consists of object data and metadata; the metadata is a set of key-value pairs, while the object data itself is opaque to S3. Accessing objects in S3 buckets: move data into and out of S3 buckets; set access privileges.
Let us go online and create a bucket.
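As an alternative to the console, a bucket can be created and used programmatically. A minimal sketch with boto3 (assumes the SDK is installed and credentials are configured; the helper name is illustrative):

```python
def create_bucket_and_upload(bucket_name, key, body, region="us-east-1"):
    """Create an S3 bucket, store one object in it, and read it back."""
    import boto3  # requires the AWS SDK and configured credentials

    s3 = boto3.client("s3", region_name=region)
    # Bucket names must be globally unique; regions other than us-east-1
    # additionally require a CreateBucketConfiguration/LocationConstraint.
    s3.create_bucket(Bucket=bucket_name)
    s3.put_object(Bucket=bucket_name, Key=key, Body=body)  # store an object
    obj = s3.get_object(Bucket=bucket_name, Key=key)       # retrieve it
    return obj["Body"].read()
```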
EBS is a type of storage designed specifically for Amazon EC2 instances. Amazon EBS allows you to create volumes that can be mounted as devices by EC2 instances. Amazon EBS volumes behave as if they were raw, unformatted external hard drives: they can be formatted with a file system such as ext3 (Linux) or NTFS (Windows) and mounted on an EC2 instance, with files accessed through the file system. They have user-supplied device names and provide a block device interface. S3, by contrast, provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. The service aims to maximize benefits of scale and to pass those benefits on to developers. S3 needs software to read and write files, but it is hugely scalable: it stores six copies of data for high availability and redundancy, and is rumoured to be written in Erlang.
Amazon EMR is a service that enables you to analyze and process vast amounts of data. It does this by distributing the computational work across a cluster of virtual servers running in the Amazon cloud. The cluster is managed using an open-source framework called Hadoop. Amazon EMR has made enhancements to Hadoop:
- Hadoop clusters running on Amazon EMR use EC2 instances as virtual Linux servers for the master and slave nodes
- Amazon S3 is used for bulk storage of input and output data
- CloudWatch monitors cluster performance and raises alarms
- Data can move into and out of DynamoDB using Amazon EMR and Hive
http://docs.aws.amazon.com/elasticmapreduce/latest/developerguide/emr-what-is-emr.html
You are going to use Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Elastic MapReduce (EMR). There are several ways to interact with Amazon Web Services; in this tutorial, we will focus on one of them, the Amazon Console: a graphical interface that you can use to launch and manage job flows, which is the easiest way to get started.
Create an AWS account - http://aws.amazon.com/ Sign up for EC2 cloud compute services - http://aws.amazon.com/ec2/ Set up security credentials (under the menu Account > Security Credentials) - of the three kinds of credentials, you need to create an Access Key; use it to access S3 storage. Sign up for S3 storage services - http://aws.amazon.com/s3/ Sign up for EMR - http://aws.amazon.com/elasticmapreduce/
Streaming: Hadoop streaming is the built-in utility provided with Hadoop. Streaming supports any scripting language, such as Python or Ruby. It is easy to read and debug, numerous libraries and data are available, and it is fast and simple. You can script your data analysis process and avoid writing code by using the existing libraries. Streaming is a low level interface. You are responsible for converting your problem definition into specific Map and Reduce tasks and then implementing those tasks via scripts. Custom JAR: The Custom JAR job flow type supports MapReduce programs written in Java. You can leverage your existing knowledge of Java using this method. While you have the most flexibility of any job flow type in designing your job flow, you must know Java and the MapReduce API. Custom JAR is a low level interface. You are responsible for converting your problem definition into specific Map and Reduce tasks and then implementing those tasks in your JAR.
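A streaming job flow like the one described above could also be started programmatically rather than from the console. A sketch using boto3's `run_job_flow` (the release label, instance types, roles, and S3 script locations are all illustrative assumptions):

```python
def start_streaming_cluster(bucket, input_prefix, output_prefix):
    """Start an EMR cluster that runs one Hadoop streaming step."""
    import boto3  # requires the AWS SDK, credentials, and the default EMR roles

    emr = boto3.client("emr", region_name="us-east-1")
    response = emr.run_job_flow(
        Name="word-count",
        ReleaseLabel="emr-5.36.0",           # an EMR release bundling Hadoop
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,              # 1 master + 2 slave nodes
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "streaming word count",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-files", "s3://" + bucket + "/mapper.py,s3://" + bucket + "/reducer.py",
                    "-mapper", "mapper.py",
                    "-reducer", "reducer.py",
                    "-input", "s3://" + bucket + "/" + input_prefix,
                    "-output", "s3://" + bucket + "/" + output_prefix,
                ],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return response["JobFlowId"]
```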
You can also use Amazon EMR to analyze and process data without writing a line of code. Several open-source applications run on top of Hadoop and make it possible to run map-reduce jobs and manipulate data using either a SQL-like syntax with Hive, or a specialized language called Pig Latin. Amazon EMR is integrated with Apache Hive and Apache Pig.
The World Factbook, produced for US policymakers and coordinated throughout the US Intelligence Community, marshals facts on every country, dependency, and geographic entity in the world. We share this information with the people of all nations in the belief that knowledge of the truth underpins the functioning of free societies. The Factbook provides information on the history, people, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. https://www.cia.gov/library/publications/the-world-factbook/
Calculate the word frequencies of the CIA World Factbook. Worked example with three input lines:

Input:             foo car bar | foo bar foo | car car car
Mapping:           (foo,1) (car,1) (bar,1) | (foo,1) (bar,1) (foo,1) | (car,1) (car,1) (car,1)
Shuffling/Sorting: bar,<1,1>  car,<1,1,1,1>  foo,<1,1,1>
Reducing:          bar,2  car,4  foo,3
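The stages above can be simulated locally in plain Python. This is only a sketch of the data flow; real Hadoop distributes the same stages across the cluster's nodes:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapping: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle_phase(pairs):
    """Shuffling/sorting: group the values by key, in key order."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reducing: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["foo car bar", "foo bar foo", "car car car"]
print(reduce_phase(shuffle_phase(map_phase(lines))))
# {'bar': 2, 'car': 4, 'foo': 3}
```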
Mapper.py:

    import sys

    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(word.lower() + "\t" + "1")

Reducer.py:

    import sys

    # Sum the counts emitted by the mapper, then print "word<TAB>count".
    counts = {}
    for line in sys.stdin:
        word, count = line.split("\t")
        counts[word] = counts.get(word, 0) + int(count)
    for word, count in counts.items():
        print(word + "\t" + str(count))