Assignment thesis Integrate Cloud Infrastructure in Unified Hosting Environment


Assignment thesis
Integrate Cloud Infrastructure in Unified Hosting Environment

Writer: Tri Vo Hoang
Responsible: Dipl.-Inf. Josef Spillner
Chair owner: Prof. Dr. rer. nat. habil. Dr. h. c. Alexander Schill

Institute for Systems Architecture
Chair of Computer Networks
TU Dresden

STATEMENT OF AUTHORSHIP

I hereby certify that this document has been written by myself and describes my own work on "Integrate Cloud Infrastructure in Unified Hosting Environment", unless otherwise acknowledged in the text. All verbatim extracts are distinguished by quotation marks, and all sources of information are specifically acknowledged by clear cross-references to author and work. I declare that this work has not been submitted for any other degree.

Vo, Hoang Tri
Dresden,

Table of Contents

Overview and motivation ... 5
Chapter 1. INTRODUCTION TO CLOUD COMPUTING
    General overview
    History of the cloud forming
    Why Cloud architectures?
    Disadvantages of cloud computing
    Cloud basic components
    Public, private and hybrid clouds
    Vendors and their products on the market
        Amazon Web Services
        Google
        Microsoft Azure ... 15
Chapter 2. AMAZON WEB SERVICES
    Overview
    Storage Concepts
        Object concepts
        Security concepts
    Elastic Compute Cloud (EC2) Concepts
        AMI Image: Bundle, Upload, Register and Launch
        Availability zones
        Private, public and elastic IP addresses
        Elastic Block Store (EBS)
        Security groups
        Security key pairs ... 22
Chapter 3. EUCALYPTUS
    Architecture overview
        Node Controller (NC)
        Cluster Controller (CC)
        Cloud Controller (CLC)
    Monitor and control
        Users control
        Images in Eucalyptus
        Instances control
        Service level agreements control
    Eucalyptus Networking
        Public interface
        Private interface
        Eucalyptus networking modes: System mode, Static mode, Managed mode
    Why Eucalyptus? Comparison to other cloud open source systems
        AppScale
        Enomalism
        Nimbus ... 33
Chapter 4. HYPERVISOR
    Differences between various hypervisors
        Full Virtualization using Binary Translation
        Paravirtualization and Xen
        Hardware Assisted Virtualization and the KVM Hypervisor
    Xen and its networking
    Summary: Xen or KVM?
    The roles of hypervisors in a Cloud computing system ... 38
Chapter 5. INTEGRATION OF EUCALYPTUS & UHE
    Unified Hosting Environment
        Motivation
        UHE concepts, design and architecture
    An integration solution model for virtual images in UHE
        Deployment
        Adaptation with Eucalyptus
        Monitoring with Eucalyptus ... 44
Chapter 6. INSTALLATION AND IMPLEMENTATION ... 46
    PART A. INSTALLATION
        Xen installation
        Launching an instance from Xen
        Eucalyptus installation ... 49
    PART B. IMPLEMENTATION
        Deployment of virtual images in UHE
        Monitoring resources on node machines
        How to send a REST request in Eucalyptus and Amazon EC2
            A REST request example
            List of request operations supported by Eucalyptus
        Using an Amazon Client API: PHP
        Using an Amazon Client API: Python ... 65
Chapter 7. FROM TEST CASES TO A REAL CLOUD
    Test cases
    Implementation for a real cloud
        Use elastic IP address
        HOW TO: download Amazon Images for use with Eucalyptus
        HOW TO: use Amazon Images to work with Eucalyptus
    My Troubleshooting Logs ... 69
        Eucalyptus installation issues ... 69
        When running an instance ... 70
        When deploying a virtual image into UHE ... 70
SUMMARY ... 71
REFERENCES

Overview and motivation

Unified Hosting Environment (UHE) is defined as a composite environment consisting of various service platforms, ranging from OSGi service platforms, conventional web servers for web applications and operating systems implementing the Linux Standard Base (LSB) service interface to BPEL engines and virtualized machines. These types of services can be deployed dynamically in one environment and then interact with the administration controller for monitoring and adaptation of the deployed services. My writing is a part of this project; it brings the elastic computing of a cloud infrastructure into UHE. Put another way: you have a virtual image which contains a system with full web service functionality, and you want to deploy that image into a hosting environment (UHE) immediately, so that on the one hand customers can launch it on demand or shut it down after use, and on the other hand the hosting environment's administrator can perform monitoring tasks in parallel. To achieve this goal, we will begin with a general view of cloud computing, its advantages, infrastructure and components, and also the supporting vendors. Then in chapter two we will explore Amazon Web Services in detail, not only because Amazon is one of the first pioneers in this market today, but also because its concept of the cloud is very close to Eucalyptus, an open source software infrastructure for implementing cloud computing on clusters. We will therefore use Eucalyptus, described in chapter three, to implement an architecture consisting of one cloud controller and many node controllers, on which our virtual instances can be launched from images and managed dynamically.
Besides understanding the Eucalyptus architecture and networking in chapter three, we also have to look at the hypervisors in detail and compare the various hypervisors, since this so-called virtual machine monitor is the heart of the life cycle control of our instances; it is covered in chapter four. Then we will discuss UHE in general and a concept model which shows how to integrate Eucalyptus into UHE, so that they can work together to fulfill our requirements: deployment, monitoring and adaptation. Finally, in chapter six, after understanding all the concepts, we will install Eucalyptus on a machine and run some tests to launch an instance with a hypervisor (Xen). In this test environment, which consists of one Eucalyptus and one UHE system, we will use a Python script for deployment. For monitoring and managing the instances, we will build a monitoring site that interacts with the Eucalyptus controller. I chose the Amazon PHP library because it is easy to modify to work with Eucalyptus, and furthermore it can be used to build a browser-based management web site; the advantage of a browser is its independence from any particular software requirements and systems. Included with my writing are the above Python scripts and a demo site written in PHP that interacts with the Eucalyptus cloud. In summary, my writing shows readers the integration of Eucalyptus with the UHE system. At the end is a review of how to bring our services from the test environment onto a real business market.
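The Python scripts and the PHP site mentioned above both talk to the Eucalyptus controller through its Amazon-compatible query interface, which later chapters treat in detail. As a foretaste, the following is a minimal sketch of how such a query request can be signed (here with HMAC-SHA256, Signature Version 2); the endpoint, access key, secret key and timestamp are placeholders, and the exact parameter set varies by service and API version:

```python
# A minimal sketch of signing an Amazon/Eucalyptus Query API request
# (Signature Version 2). Endpoint and keys below are placeholders for
# illustration, not real credentials.
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_query_request(host, path, params, secret_key):
    """Return the canonical query string with its Signature parameter appended."""
    # 1. Sort the parameters by name and percent-encode names and values.
    canonical = "&".join(
        "%s=%s" % (quote(k, safe=""), quote(v, safe=""))
        for k, v in sorted(params.items())
    )
    # 2. The string to sign covers the HTTP verb, host, path and parameters.
    string_to_sign = "\n".join(["GET", host.lower(), path, canonical])
    # 3. Sign with HMAC-SHA256 and base64-encode the digest.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return canonical + "&Signature=" + quote(signature, safe="")


# Example: a DescribeInstances call against a placeholder endpoint.
params = {
    "Action": "DescribeInstances",
    "AWSAccessKeyId": "AKIDEXAMPLE",          # placeholder access key
    "SignatureMethod": "HmacSHA256",
    "SignatureVersion": "2",
    "Timestamp": "2009-07-01T12:00:00Z",      # placeholder timestamp
    "Version": "2009-04-04",
}
query = sign_query_request("ec2.amazonaws.com", "/", params, "SECRETKEYEXAMPLE")
# The full request URL would be "https://ec2.amazonaws.com/?" + query
```

Pointing the same kind of signed request at a Eucalyptus endpoint (typically port 8773 with the path /services/Eucalyptus) instead of ec2.amazonaws.com is essentially what allows Amazon client libraries to be reused against a Eucalyptus cloud.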

Chapter one
INTRODUCTION TO CLOUD COMPUTING

The term Cloud Computing should not be strange to a computer scientist nowadays, as it has grown fast over the last 10 years with the support of the Internet network infrastructure. Little by little, people and companies have changed the way they use resources to develop or maintain their own data. For a simple example, take how they manage their data: data can be kept on a personal computer as usual, or stored with a centralized third-party provider which already has all the resources they want, available on demand whenever they need them. This chapter guides the reader through sections that provide an understanding of Cloud Computing and its history; yes, its history, but not a boring one, because we will soon find that the development of Cloud Computing was sooner or later a foregone trend once John McCarthy1 opined in 1960 that computation may someday be organized as a public utility. The term cloud began to come into commercial use in the early 1990s and has remained up to today. Furthermore, we will give a short description of its characteristics, together with a discussion of the advantages and disadvantages of this new trend, and point out some ideas of how people would like to use cloud architectures in real scenarios. But first of all, let's take a first shot at its definition and continue with a general overview to get a picture of where the cloud stands today within and versus traditional architectures.

Definition
Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the cloud that supports them. [1]

1 John McCarthy (computer scientist), who received the Turing Award in 1971 for his major contributions to the field of Artificial Intelligence (AI)

1. General overview
To make it clear, we can understand cloud computing as technologies that rely on the Internet to satisfy the computing needs of users who do not generally own the physical infrastructure. All services are often provided by a third party, with several common business applications available online. Users can choose the services they want and access them from a web browser, while the software and data are stored at the third-party company. The concept generally includes combinations of the following terms we often hear about: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS), explained in the following.

Illustration 1: The 3 main models of the cloud as a pyramid. Author: Dustin Amrhein, Staff Software Engineer, IBM

SaaS [2], or under the simpler name Service On Demand, is one of the first developed models of cloud computing; the concept was notably extended by Microsoft through the development of Web Services. SaaS is a model of software deployment whereby a provider develops its own services and licenses its applications to customers for use as services on demand. SaaS is very useful in that licenses can be shared across one organization and between organizations. This reduces the cost for every device: instead of purchasing software at a fixed cost up front, SaaS enables licensing only the variable amount of software needed. Besides, upgrading and installing software patches on every device in an organization is complicated, and with SaaS we can let the vendor handle the rest. Users only need to concentrate on their projects, not on installation and patching. But SaaS also reduces the freedom of consumers when they depend too much on their vendors. This disadvantage will be discussed later in section 4, together with other characteristics of the cloud.
Platform as a service (PaaS) [3] is the delivery of a computing platform and solution stack as a service. With no software downloads or installation needed for end users, and without the cost and complexity of buying and managing the underlying hardware and software layers, PaaS provides the complete life cycle of a whole service entirely from the Internet within the actual target delivery

platform, such as the life cycle of hosting, deployment, design, development and testing, as well as team collaboration with interactive visualization tools and state management. It can have web services integrated and marshalled, supporting SOAP and REST for creating compositions of multiple web services. These services are all provided as an integrated solution over the web. By bringing all the built-in infrastructure services together, cost is reduced, and furthermore security and scalability are gained compared to obtaining, testing and integrating these pieces separately. PaaS is preferable for having an already stable platform and for easier development of interactive multi-user applications. But the lower flexibility of PaaS may not suit sites which grow quickly, when these sites need new complex features that may be difficult to implement on a web-based platform.

Infrastructure as a Service (IaaS) [4] is the delivery of computer infrastructure (typically a platform virtualization environment) as a service [wiki], or under a friendlier name, everything as a service, which includes all services from server hosting, software and data center space to network equipment. Consumers pay on a utility computing basis (e.g. per instance-hour) or for the amount of resources they consume. In short, consumers can have their resources on demand on working days and turn them off when they don't use them anymore, for example at night or during uncritical times and seasons, so they don't have to keep paying for hosting resources: in this world of virtual environment machines, a resource instance can be started from a virtual image and shut down again. We can add more volumes to a running instance on demand, and moreover create snapshots from a volume for storage backup. In general, we are simply talking about bringing services to the customers when they need them. So what do providers offer to best support their customers?
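To make the pay-per-use idea concrete, here is a small illustrative calculation; the rates and quantities are invented for the example and are not actual provider prices:

```python
# Hypothetical utility-computing bill: an instance billed per instance-hour,
# plus storage and outbound transfer. All rates here are made up for
# illustration; real providers publish their own price lists.
def monthly_bill(instance_hours, storage_gb_months, transfer_gb,
                 hour_rate=0.10, storage_rate=0.15, transfer_rate=0.17):
    """Return the total monthly cost in dollars."""
    return (instance_hours * hour_rate
            + storage_gb_months * storage_rate
            + transfer_gb * transfer_rate)


# An instance that runs only 8 hours a day on 22 working days and is shut
# down overnight, with 20 GB stored and 50 GB transferred out:
part_time = monthly_bill(8 * 22, 20, 50)     # 176 instance-hours
# The same instance left running around the clock for a 30-day month:
always_on = monthly_bill(24 * 30, 20, 50)    # 720 instance-hours
```

Shutting the instance down outside working hours cuts the compute part of the bill by roughly three quarters, which is exactly the on-demand advantage described above.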
Normally the service provider will also support many further features to improve its services, such as load balancing, elastic block storage with scalability, elastic IP addresses and backup storage, which will be discussed in detail with the Amazon concepts of chapter two. Sometimes the distributed character of the cloud makes you confuse it with grid computing, a form of distributed computing whereby a supercomputer is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. We have to mention grid computing here because in cloud computing, users can also take a very large task, divide it and distribute it in parallel over several node machines. This is a service supported by Amazon Web Services, a very popular and successful cloud provider.

2. History of the cloud forming
[1] When people talk about Cloud Computing, they always remember the 1960s, when we already had centralized systems to which users connected through dumb terminals, over serial RS-232 connections, to mainframe computers. But this limited both freedom and creativity. After that came the time of the PC, a personal computer strong enough to satisfy everybody's needs. In the early 1990s, in reference to large Asynchronous Transfer Mode (ATM) networks, the

term cloud had already come into commercial use. By the turn of the 21st century it began to appear more widely, although most of the focus at that time was limited to SaaS. In 1999, Salesforce.com applied many technologies developed by companies such as Google and Yahoo! to business applications, providing the concept of on-demand SaaS customizable by customers with limited technical support required. In the early 2000s, Microsoft extended the concept of SaaS through the development of web services. IBM detailed these concepts in 2001 in the Autonomic Computing Manifesto, which described advanced automation techniques such as self-monitoring, self-healing, self-configuring and self-optimizing in the management of complex IT systems. In the early to mid 2000s, both Google and Amazon independently developed their own cloud computing architectures on which to run their businesses. Having developed this infrastructure, they realized that their own infrastructure had become a service in itself, which could be sold on a per-usage basis to developers. Amazon, in particular, realized that it could sell its platform as a service. Thus, Amazon is often seen as a front runner in the commercialization of cloud computing, in particular in coming up with billing and usage models. In 2007, Google, IBM and a number of universities formed a research cloud to provide a cloud computing environment for student researchers to develop new cloud computing techniques and applications. Although it cannot be compared to Amazon, it gives students the opportunity to test, understand and further develop cloud computing. One of the open source systems out there that attracts researchers is Eucalyptus. It originated in the Computer Science Department at the University of California, Santa Barbara as a research project. We will take a deep look at it, together with Amazon Web Services, in the later chapters.

3. Why Cloud architectures?
[1] The following is a checklist of features gained by using the cloud.
1. No startup time, no fixed costs. Traditionally, a large-scale system needs an upfront investment in hardware (PCs, routers, power supplies, cooling...), and it takes time before the project can really start. Now the startup time is zero. As you can see in the illustration below on the left, the capital expenditure, or so-called fixed costs, occurs at setup time at the beginning. This type of cost can't be avoided; it includes infrastructure costs for hardware, software, services and management overhead. The operational expenditure is the variable costs; it varies based on the resources consumed, such as electricity, hosting service and bandwidth usage over a period of time. With the cloud, this is all we need to pay, and cloud providers give us an instant service on demand. Theoretically this sounds great, but in practice the project manager has to calculate again. Although he might get rid of the upfront fixed costs at the beginning, he might actually pay more in variable costs if the growth rate of the cloud model's cost line is much higher than the capital expense. In that case the fixed costs would be relatively small and the cloud model might not make great fiscal sense. Illustration 2 shows two cost lines, the traditional line and the cloud model line, which cross at one point. This is the critical point for the project manager making his decision whether

it is more efficient to use the cloud architecture or not. The decision depends on how far the variable on the X axis will go; the X axis, in some cases, is how long the project will take. Will this variable pass the critical point? If so, the cloud model is not efficient anymore.

Illustration 2: Diagram showing the economics of cloud computing versus traditional IT, including capital expenditure (CapEx) and operational expenditure (OpEx), by Sam Johnston [wiki]
Illustration 3: Diagram showing low fixed costs and fast-growing variable costs of the cloud model, and the critical point for choosing a model

2. You scale as you grow. This is the case when your brilliant brain has a very nice idea for a service. You would have started to invest heavily, but the service did not get famous, and you became the victim of your own failure. Here, the cloud solution is low risk because you scale as your service's success grows.
3. Efficient resource utilization. No more worry about running out of capacity. Developers can manage their resources on request and scale them elastically on their own.
4. You pay for what you consume. The customers are not liable for the entire infrastructure. The difference from a desktop application is that in the cloud we use the infrastructure of a third party and get billed exactly for what was used.
5. Parallel processing and load balancing. With cloud architectures, it is possible to launch 500 instances and process the same job in one hour. To give an example from a popular vendor: Amazon Web Services uses an open source distributed processing framework called Hadoop, which allows computation over large datasets by splitting them into manageable small jobs and spreading these small jobs across several machines. It can automatically manage the overall process by launching jobs, processing each job no matter where the data is physically located, and at the end collecting the job output into a final result.
Then it shuts down all the virtual node machines it used before, releasing all resources back to the cloud. [5] Nice talk! And you wonder if it can be real? A live example makes it more convincing: SmugMug, an online photo storage application that stores more than

half a petabyte of data on S3 from Amazon, estimates cost savings on service and storage to be close to $1 million. It is a heavy user of Elastic Compute Cloud (EC2) computing resources to meet surges in demand. [6] But everything has its disadvantages, which will be discussed in the next section.

4. Disadvantages of cloud computing
Talking about the great benefits of the cloud doesn't mean it is totally perfect to use. Since everything is in the cloud, first of all you cannot reach or back up your data right away just by plugging a USB stick into your laptop. Moreover, cloud computing has been criticized for limiting the freedom of users, in that you depend totally on your service provider. Is it possible for customers to use application software other than the one chosen by the provider? People can become trapped with their vendors, who may exploit the dependence of users for their own sake. In particular, users don't have the option of installing new applications, and certain tasks have to be approved by the administrator. Moreover, users store their private data with a third party, and who says the service providers won't use it for statistical marketing purposes, like the usage habits of people? Finally, concerning security: when everything is centralized, the problem of security management is bigger, since people are not sure of a totally secure system, and cloud providers make attractive targets for hacking, with all the riches behind their firewalls. "Cloud computing has unique attributes that require risk assessment in areas such as data integrity, recovery, and privacy, and an evaluation of legal issues in areas such as e-discovery, regulatory compliance, and auditing", Gartner says.2

5. Cloud basic components
[1] Successful implementation of cloud computing requires the proper implementation of six components.
1.
Client: When talking about a cloud computing system, it's helpful to divide it into two sections, the front end and the back end, which connect to each other through a network, usually the Internet. The front end is the side of the computer user, or client; the back end is the cloud section of the system. The front end includes the client's computer and the application required to access the cloud computing system. Not all cloud computing systems have the same user interface: services like web-based e-mail programs leverage existing web browsers like Internet Explorer or Firefox, while certain systems require pre-installed applications to ensure smooth interaction between the system and the clients.
2. Service: A cloud service (e.g. a Web Service) is a software system designed to support interoperable machine-to-machine interaction over a network, which may be accessed by other cloud computing components, service providers or end users directly.
3. Applications: A cloud application leverages the cloud in its software architecture, often eliminating the need to install and run the application on the customer's own computer, thus alleviating the burden of software maintenance,
2 Gartner.com, one of the world's leading information technology research and advisory companies.

ongoing operation and support. Examples are the web application Facebook and software as a service (Google Apps, SAP and Salesforce).
4. Platform as a service: For regular websites or applications, the application connects directly to the server. In the cloud computing world, the application is instead launched onto another layer called the platform, which facilitates the deployment of applications. The platform usually comes with a programming language such as .NET, Python, Java or Ruby on Rails. So, whoever is looking for a cloud computing provider will have to consider the set of programming languages that can be run on the platform.
5. Storage as a service: the delivery of data storage as a service, often billed on a utility computing basis (for example per gigabyte per month). Consistency and availability of service are very important in cloud computing; the storage will naturally be required to be available and backed up with multiple duplicates all the time.
6. Infrastructure (Infrastructure as a service): the delivery of a platform virtualization environment as a service. This can be considered as the platform behind the storage, as the infrastructure helps the storage deal with load problems through load balancing and management.

6. Public, private and hybrid clouds
[3] Now that we know the cloud computing architectures so far, let's take a look at the three major types of clouds.
1. Public clouds are cloud services which exist beyond the company firewall and are fully hosted and managed by the cloud provider, not by the customer company. Whether it is software, application infrastructure or physical infrastructure, the cloud provider takes on the responsibilities of installation, management, provisioning and maintenance. Customers only have to choose and pay for the resources they use, so under-utilization is eliminated. Public clouds typically charge a monthly usage fee per GB combined with bandwidth transfer charges.
However, consumers have fewer configuration options than they would have if the resources were controlled directly by them (private cloud). And since we have little control over the infrastructure, processes requiring tight security and regulatory compliance do not fit public clouds.

Illustration 4: Cloud types [3]

2. Private clouds. The difference between private and public storage clouds is simple: where is the cloud deployed? A public cloud is offered as a service, usually over an Internet connection. Private clouds are deployed within the company firewall and managed by the enterprise's own organization. The storage is typically not shared outside the enterprise, and full control is retained by the organization. Scaling the cloud is as simple as adding another server to the pool, and the self-managing architecture expands the cloud by adding performance and capacity. By taking full control of setting up and maintaining the cloud, a company has all available configuration options and is typically more secure. But establishing such a system brings difficulties and can have prohibitive costs.
3. Hybrid clouds are a combination of public and private clouds. These clouds are typically created by the enterprise, with the cloud provider taking a part of the management responsibilities. The hybrid cloud leverages services in both the public and the private space; depending on a company's goals and needs, it obtains services from the public or the private cloud, as appropriate. A well-constructed hybrid cloud can serve secure, mission-critical processes, such as receiving customer payments, as well as those that are secondary to the business, such as employee payroll processing.

7. Vendors and their products on the market
7.1 Amazon Web Services
[9] Amazon, as a pioneer of cloud computing, provides a number of offerings which are of interest to developers. One of the best-known cloud services from Amazon is the EC2 (Elastic Compute Cloud) service. EC2 allows for the creation of virtual machine instances from AMIs (Amazon Machine Images) that run on Amazon's own infrastructure. Amazon's S3 service is an online storage service which is particularly attractive to startup companies who need to scale their storage capability.
It can be used as an adjunct to other Amazon cloud services, such as EC2. This means that an AMI, perhaps a Linux machine running PHP, can use Amazon S3 as its data store; as the data traffic grows, the S3 service expands elastically. Amazon's SimpleDB is a fast and simple cloud-based database which provides indexing, storage and access. It is significantly simpler than a fully fledged relational database since it requires no schema, it indexes data automatically, and it provides APIs for storage and access. Amazon's SQS (Simple Queue Service) provides a queue service, similar to JMS but with a RESTful interface. You can use SQS in conjunction with Amazon's other cloud services, or as part of any other application which can connect to it using a simple HTTP GET or POST. For a hybrid application, it is a suitable replacement for a JMS queue: it can be accessed through its RESTful XML interface, allowing for easy integration with an existing application. SQS is probably the most obvious choice for this particular hybrid application. Many software providers have partnered with Amazon to help their customers leverage EC2. For example, IBM and Amazon have partnered to offer much of IBM's most popular enterprise software, such as DB2, Informix and

WebSphere on EC2.
7.2 Google
[10] Google provides a cloud computing platform called App Engine, which is based on Google's long-established underlying platform. This includes GFS (Google File System) and Bigtable, a database system built on GFS. Programming in the Google App Engine is done using Python: programmers write their applications in Python and then run them on the App Engine framework. Languages other than Python will be supported in the future. A local emulator of the App Engine environment can be downloaded for development purposes. App Engine is free and includes up to 500 MB of storage and enough CPU bandwidth to serve five million page views per day. The Google App Engine provides some useful infrastructure, including both its GFS-derived data store and a memcache implementation. However, it does not provide an out-of-the-box queue mechanism. You do have a full Python programming environment, so you can simply create your own JMS replacement on top of the App Engine. The data store is well suited to a hybrid application, and it takes very little Python to whip up a RESTful interface to your queue.

Properties                            | Amazon EC2                                          | Google App Engine                | Microsoft Live Mesh                                             | Sun Grid
Focus                                 | Infrastructure                                      | Platform                         | Infrastructure                                                  | Infrastructure
Service Type                          | Compute (EC2), Storage (S3)                         | Web Application                  | Storage                                                         | Compute
Virtualization                        | OS level, running on a Xen hypervisor               | Application container            | OS level                                                        | Job Management System (Sun Grid Engine)
Dynamic Negotiation of QoS Parameters | none                                                | none                             | none                                                            | none
User access interface                 | Amazon EC2 command-line tools                       | Web-based administration console | Web-based Live Desktop and any devices with Live Mesh installed | Job submission scripts, Sun Grid Web Portal
Web API                               | Yes                                                 | Yes                              | Unknown                                                         | Yes
Value-added Service Providers         | Yes                                                 | No                               | No                                                              | Yes
Programming Framework                 | Customizable Linux-based Amazon Machine Image (AMI) | Python                           | Not applicable                                                  | Solaris OS, C, C++, Fortran

A comparison table between vendors, by M. O'Neill from IBM [8]

7.3 Microsoft Azure
[11] As you might expect, Windows and .NET feature prominently in Windows Azure. Azure is a web-based, scalable hosting environment for applications. Developers can build applications using Microsoft's popular desktop coding tool Visual Studio, then deploy them to the Windows Azure platform, where they can be accessed from any computer or Internet-connected mobile device. The Azure platform provides numerous services, such as infrastructure services for file storage and data access, as well as more specialized services like search and contact management. It also includes the .NET Service Bus, Microsoft's implementation of the classic Enterprise Service Bus (ESB) design pattern. One of the simplest use cases for an ESB is a message queue, so it could definitely serve as a replacement for a JMS queue. The .NET Service Bus is also developer-friendly: it supports both a lightweight RESTful interface that uses XML, and a stronger SOAP-based interface that includes a full implementation of the WS-* standards. Both of these interfaces allow for easy interoperability between an existing application and the .NET Service Bus.

SUMMARY
In this chapter we had an overview of cloud computing. The cloud has the main advantages of lower variable costs and resources on demand, but it limits the freedom of users, who depend totally on their service provider, and it carries security risks. There are terms that we should distinguish, whereby a public cloud is fully hosted and managed by the cloud provider, not by the customer company, and private clouds are deployed within the company firewall and managed by the enterprise's own organization.
We also covered the three service types of the cloud; one of them deserves repeating: Infrastructure as a Service (IaaS), which includes all services from server hosting and software to data center space and network equipment. Consumers pay only on a utility computing basis, such as per hour or per amount of resources consumed. This is the service Amazon offers its customers. We will discuss Amazon in the next chapter, because Amazon and the open-source Eucalyptus are very close to each other, and we will use Eucalyptus in this writing project to solve the exercises. These reasons are explained in more detail at the beginning of the next chapter.

Chapter two
AMAZON WEB SERVICES

In the first chapter we took a look at cloud computing architectures, their characteristics and components, and also briefly mentioned some vendors in this industry. Now that we have the picture of the cloud, it is time to take a deeper look into the real industry to figure out what kinds of services vendors provide, how we use and interact with these services in general, and what we need to know as application developers. For these reasons, Amazon Web Services (AWS) is worth choosing as the vendor to explore. Readers may wonder why I write about AWS in such detail. Not only is AWS a successful pioneer of cloud computing but, as I mentioned at the beginning, this writing is part of the project Unified Hosting Environment. In this project I need to find out how to integrate, monitor and adapt an existing cloud service into the project, and we found that Eucalyptus, an open-source software infrastructure for implementing cloud computing on clusters, fits our needs. Eucalyptus is developed very closely to AWS, as it uses the AWS interface. This is the second reason to describe AWS in more detail, since AWS and Eucalyptus have several functions and concepts in common.

1. Overview [9]
AWS provides a set of services for ready-to-use computing infrastructure. The computing platform was built and refined over the years by Amazon and is now available to anyone with access to the Internet. A key point is that the infrastructure is elastic and can scale up and down based on demand. Four popular services stand out: storage, computing, messaging, and datasets.
1. Storage: you can store your files, documents and everything else using Amazon S3. The only differences between it and a regular hosting service are its scalable, reliable, highly available, low-cost storage.
It costs about $0.150 per GB per month of storage used and $0.100 per GB for all data transferred in and out.
2. Computing: Amazon Elastic Compute Cloud (EC2) presents a true virtual computing environment, allowing you to use web service interfaces to launch virtual instances with a variety of operating systems, from Ubuntu Linux to Windows. The system you choose to launch may already have applications installed, provided by vendors such as Sun, IBM and Oracle. So you can have your custom application environment, manage your network's access permissions, run your image on as many or as few systems as you desire, and turn them off when they are unneeded.
3. Messaging: in the cloud it is important to let your application components communicate with each other in a decoupled, asynchronous way. Other vendors use JMS for this purpose; Amazon has its own service, called Amazon Simple Queue Service (SQS).
4. Datasets: Amazon SimpleDB (SDB) provides scalable, indexed, zero-maintenance storage, along with processing and fast querying for datasets. It is significantly simpler than a relational database since it requires no schema, it

indexes data automatically, and it provides APIs for storage and access. All these services can be composed with each other to implement very fast, scalable and robust applications. But I only explain two of them in detail, the storage service S3 and the compute service EC2, because their concepts are used again by Eucalyptus, and it is necessary to understand them before we can continue with the further chapters.

2. Storage Concepts
Illustration 5: Amazon S3 [9] [12]
The following gives only a first overview of the S3 functionality; each point is described in more detail right after it. You can write, read, and delete data objects. Each object is stored in a bucket and retrieved via a unique, developer-assigned key. Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users. S3 uses standards-based REST and SOAP interfaces; the default download protocol is HTTP. A BitTorrent protocol interface is provided to lower costs for high-scale distribution.

2.1 Object concepts [12]
S3 is a service with which we can store data on the Internet, and the storage scales to fit our needs. To use S3 we need to understand three basic concepts: buckets, objects and keys. A bucket is simply a container for objects stored in Amazon S3. Every object is contained within a bucket. Think of a bucket as analogous to a folder, or a

directory, on the file system. One of the distinctions between a file folder and a bucket is that each bucket and its contents are addressable using a URL. For example, if the object named photos/puppy.jpg is stored in the peter bucket, it is addressable by a URL formed from the bucket name and the key. You can't create a bucket within a bucket.
Objects: think of an object as the file you want to store. Each stored object is composed of two entities: data and metadata. The data is the actual thing being stored, such as a PDF file, a Word document, a video file, etc. The metadata describes the associated object and is specified by us as key-value pairs when the object is sent to S3 for storage. Some examples of metadata are the HTTP Content-Type, the date the object was last modified, and any other custom metadata specific to you or your application.
A key: each object stored within an S3 bucket is identified using a unique key, and each object inside a bucket has exactly one key. The bucket and the key together provide the unique identification for each object stored in S3. In the example above, peter is the bucket name and photos/puppy.jpg is the key.

2.2 Security concepts [12]
Because each bucket and object created in S3 belongs to the user account that created it, you have to grant permissions to other users and customers for them to be able to see the list of objects in your S3 buckets or to download the data contained within them. When storing data, we should consider these security rules:
a) Authentication: when you send a request to S3, the request should be authenticated. Each user account is given a unique access key ID for this purpose.
b) Authorization: a user trying to access a resource must have the permissions or rights to that resource. Each S3 object has an access control list (ACL) associated with it that explicitly identifies the grants and permissions for that resource.
c) Integrity: each S3 request must be digitally signed by the requesting user with a secret key.
The digital signature is sent together with the request message.
d) Encryption: you can access S3 through the HTTPS protocol to ensure that the data is transmitted over an encrypted connection.
e) Non-repudiation: each S3 request is time-stamped and serves as proof of the transaction.
For the above security reasons, each and every REST request made to S3 must go through the following standard steps:
1) The request and all needed parameters are assembled into a string.
2) The request string is then signed with the secret access key of the user to create a keyed-Hash Message Authentication Code (HMAC) signature of the request string.
3) The calculated signature is then added as a parameter to the HTTP request.

4) S3 supports both REST and SOAP interfaces for sending user requests. For example, if you send a REST request, the signature is included as a query parameter or as HTTP content, depending on whether it is a GET or POST request respectively.
5) Amazon S3 checks whether the provided signature is valid. If the signature is valid, S3 processes the request.
Further details on which parameters of the request must be signed are not shown here; they are covered in the later implementation chapter.

3. Elastic Compute Cloud Concepts (EC2) [12]
This service makes it simple to create, launch, and provision virtual instances. To use Amazon EC2, you simply create an Amazon Machine Image (AMI) containing your applications, libraries, data and associated configuration settings, or choose one of the pre-configured template images already provided by Amazon to start an instance immediately. Then you upload the AMI into the Amazon S3 service mentioned above. After starting an instance you can configure its security and network access, attach additional volumes to it, or associate it with an elastic static IP address to communicate with the world, among many other possibilities. Afterwards you can terminate your instances and pay only for the resources that you actually consumed, such as instance hours or data transfer. All these tasks can be performed via Amazon's command-line tools, via the web service APIs, or via a variety of management tools provided by third parties. The following are important concepts of the EC2 service that you need to know.

3.1 AMI
An AMI is an encrypted machine image that contains all information necessary to boot instances of your software. For example, an AMI might contain Linux, Apache, and your web site, or it might contain Linux, Hadoop, and a custom application. AMIs are stored in Amazon S3.
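The S3 request-signing steps described in the security section above (assemble the request string, sign it with the user's secret key via HMAC, attach the signature as a parameter) can be sketched in Python. This is only a minimal illustration using the standard library: the string-to-sign layout, the date and the key below are invented assumptions, not Amazon's exact specification.

```python
import base64
import hashlib
import hmac

def sign_request(string_to_sign, secret_key):
    # Step 2: keyed-hash (HMAC) signature of the assembled request string.
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    # The binary digest is base64-encoded before being sent.
    return base64.b64encode(digest).decode()

# Step 1: assemble the request parameters into a string (layout simplified;
# the real specification prescribes the exact fields and their order).
string_to_sign = "GET\n/peter/photos/puppy.jpg\nThu, 01 Jan 2009 00:00:00 GMT"

# Step 3: the signature would be attached to the HTTP request as a parameter.
signature = sign_request(string_to_sign, "EXAMPLE-SECRET-KEY")
print(signature)
```

The server, which holds the same secret key, recomputes the signature over the received request string and compares the two values; a match both authenticates the sender and shows that the request was not altered in transit.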
Illustration 6: EC2 Flow [9]

3.2 Image: Bundle, Upload, Register and Launch
The graphic above explains the basic flow for using EC2. To begin, you

can create an AMI from scratch (Linux and UNIX only) or base it on an existing AMI. To use a file system image with EC2, you must first bundle it as an AMI (step 2). This step splits your encrypted image into manageable parts ready for upload to the S3 storage and creates a manifest file containing a list of all image parts with their checksums; you then register the image with Amazon EC2. After bundling, you can launch as many instances of the AMI as you want and let them run in the availability zones you choose (see below), then administer and use your instances as you would any servers.

Illustration 7: Availability zones [9]

3.3 Availability zones
Amazon has multiple data centers in separate geographical locations, and you can launch your instances in different locations. One region (USA or Europe) has several availability zones within it. Each availability zone is engineered by Amazon to be insulated from failures in the others. By launching instances in separate regions, you can design your application to be closer to specific customers or to meet legal or other requirements. By launching instances in separate availability zones, you can protect your applications from the failure of a single location. If you do not specify an availability zone when launching an instance, Amazon automatically chooses one for you based on the current system health and capacity.

3.4 Private, public and elastic IP addresses
Each instance is automatically assigned a private and a public address, which are directly mapped to each other through Network Address Translation (NAT). The private IP address is used for communication between instances. It is associated with the instance for its lifetime and is only returned to Amazon EC2 when the instance terminates. It is recommended to use the private address when communicating between instances, because this ensures that your network traffic

follows the highest-bandwidth, lowest-cost, and lowest-latency path through Amazon's network.

Illustration 8: Elastic IP Address [9]

The second one, the public IP address, can of course be used to access the instance over the Internet. But each time you launch an instance, this address changes. The disadvantage of the public address is that if you are using any kind of dynamic DNS mapping to connect a DNS name to the IP address, it can take as long as 24 hours before a change is propagated across the Internet. To solve this problem, Amazon EC2 provides elastic IP addresses. An elastic IP address is a static IP address associated with your EC2 account, not with a specific instance, and it remains associated with your account unless you explicitly release it back to EC2. Amazon allows you to remap your given IP addresses to any instance on your own, rather than waiting for a data technician to reconfigure or replace your host, or waiting for DNS to propagate to all of your customers. Consider an instance failure: you just quickly start another instance and remap the address to it (or use an existing instance). In Illustration 8 above, the administrator decides to replace a web server with another one. To do this, the administrator starts a new instance (1), disassociates an elastic IP address from a running instance (2), associates the elastic IP address with the new instance (3), and terminates the old instance (4). Notice that at any given time, only a single instance can be mapped to an elastic IP address. When you associate an elastic IP address with an instance, the instance's current public IP address is released to the Amazon EC2 public address pool. If you disassociate an elastic IP address from an instance, the instance is automatically assigned a new public IP address.

3.5 Elastic Block Store (EBS)
EBS lets you create volumes that can be mounted as block-level devices on a running instance.
You can also create snapshots from these volumes and later recreate a

volume from the snapshot. Each snapshot represents the state of a volume at a specific point in time. You can thus easily store files and data that need to persist beyond the lifetime of an instance on an EBS volume, then easily attach and reattach that volume to any instance you want.

Illustration 9: Elastic Block Store use case [9]

This is an advantage in many use cases. In the illustration above, Amazon EBS allows you to attach any instance to a storage volume. In the event that your running instance fails, your volume, which holds all the necessary data intact, is automatically unmounted, and you can start another healthy instance from an image and mount the volume on it. This makes for a very quick recovery. Another use case: EBS volumes exist separately from the actual instances and persist until you delete them. Suppose you run an instance periodically to perform a batch-processing job on a large and growing data set. At the end of the job you shut down the instance but keep the volume. The next time you want to process the data set, you launch a new instance and mount the existing volume on it. Note that one volume can only be attached to one instance at a time; however, one instance can mount many volumes. Each EBS volume is located in an availability zone, and the instance to which the volume is attached must be in the same availability zone.

3.6 Security groups
All instances launched within the EC2 environment run inside a security group. A security group is a named collection of access rules. These rules specify which incoming network traffic should be delivered to the instance, while all other traffic is denied. For example, you can restrict access by IP address or by classless inter-domain routing (CIDR) rules, which let you specify a port range and transport protocol.

3.7 Security key pairs
A public/private SSH key pair is specified for every running instance.
By using this key pair, you can log in to the console of a running instance. EC2 adds the public key to the launched instance, and you can then use the private key to SSH into it. Note that this key pair is different from the access key ID and secret key mentioned in the S3 security section, which might be confusing. The access key

ID and secret key are for making requests to Amazon Web Services; the public/private key pair is for users to securely log into a running instance.

SUMMARY
In this chapter we covered various concepts common to Amazon Web Services and the open-source system Eucalyptus. Amazon provides a storage service, S3, in which users store data and images; the stored files are managed and addressed using the bucket and key concepts. EC2 is an elastic computing system which enables users to launch instances from images (AMIs) on demand. Here we met several concepts. By launching instances in separate availability zones, users can protect their applications from the failure of a single location. The elastic IP address is a powerful feature: it is associated with a user account, not with a specific instance, so users can remap their given IP addresses to any of their own instances; otherwise, every launched instance gets a different public IP address. With the Elastic Block Store functionality we can mount and unmount a storage volume on a running instance. For security reasons, we can define a security group: a named set of rules specifying which incoming network traffic should be delivered to the instance, while all other traffic is denied. Finally, we should remember that every request to both S3 and EC2 has to be authenticated and timestamped using the access key ID and secret access key of the user. Besides this, users create a public/private SSH key pair of their own, which is used to log into the console of a running instance. Eucalyptus, the open-source system which we will discuss in the next chapter, can be considered an emulator of Amazon Web Services. Eucalyptus implements all the above concepts for controlling the life cycle of instances.
Just like Amazon, it offers availability zones, elastic IP addresses, the Elastic Block Store, an access key ID and secret access key for authentication, and a public/private SSH key pair for logging into the console of an instance. For file storage, Eucalyptus uses Walrus as its storage system instead of Amazon's S3.
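As a toy illustration of the elastic IP behaviour summarized above, the following plain-Python sketch models remapping an account's elastic address from an old instance to a new one. It has no AWS dependency; all addresses, instance IDs and class names are invented for the example.

```python
import itertools

# Toy model of elastic vs. pool-assigned public addresses (invented values).
_public_pool = itertools.count(1)

def fresh_public_ip():
    # Each launch (or disassociation) draws a new address from the pool.
    return "203.0.113.%d" % next(_public_pool)

class Instance:
    def __init__(self, instance_id):
        self.id = instance_id
        self.public_ip = fresh_public_ip()

class ElasticIP:
    """A static address owned by the account, remappable between instances."""
    def __init__(self, address):
        self.address = address
        self.instance = None

    def associate(self, instance):
        # Associating replaces the instance's pool-assigned public address.
        self.instance = instance
        instance.public_ip = self.address

    def disassociate(self):
        # The instance is immediately given a new public address.
        if self.instance is not None:
            self.instance.public_ip = fresh_public_ip()
            self.instance = None

old = Instance("i-old")           # web server currently in use
eip = ElasticIP("198.51.100.7")
eip.associate(old)
new = Instance("i-new")           # (1) start a replacement instance
eip.disassociate()                # (2) detach the address from the old one
eip.associate(new)                # (3) attach it to the new instance
print(new.public_ip)              # 198.51.100.7
```

The point of the model is the invariant the summary states: the elastic address survives instance replacement, while pool-assigned public addresses change with every launch or disassociation.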

Chapter three
EUCALYPTUS

Amazon is often seen as a front runner in the commercialization of cloud computing. A number of universities formed a research cloud to provide a cloud computing environment for student researchers to develop new cloud computing techniques and applications. Although it cannot be compared to Amazon, it gives students the opportunity to test, understand and further develop cloud computing. One of the open-source systems out there that attracts researchers is Eucalyptus. Eucalyptus, which stands for Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems, is an open-source software infrastructure for implementing cloud computing on clusters. Originating as a research project in the Computer Science Department at the University of California, Santa Barbara, the software is now maintained by Eucalyptus Systems, a company founded by the original authors of the software to continue the effort. According to a report from Eucalyptus, Amazon seemed to be the best documented of the available choices at the time Eucalyptus development began, and also the most commercially successful, so they chose to implement its interface. The current interface of Eucalyptus is compatible with Amazon's EC2, S3, and EBS interfaces, which we already discussed in detail in chapter two. This means you can use some of Amazon's tools, such as Amazon's command-line tools or Amazon-developed APIs, to interact with Eucalyptus directly. There are, however, some differences between the two interfaces, and Amazon's client applications need to be modified for use with Eucalyptus, which we will discuss further in the implementation chapter. The infrastructure is also designed to support multiple client-side interfaces. Because Eucalyptus and Amazon share a common interface, the concepts of Amazon EC2 and S3 that we discussed apply to Eucalyptus as well.
Concretely, the following terms are used again in Eucalyptus: bucket and key for object storage, request authentication for security, elastic IP addresses, the elastic block store, security groups, and SSH key pairs for console login. Readers who have not read about these concepts before are encouraged to look them up. In this chapter we therefore only have to focus on the main architecture of Eucalyptus: how the whole cloud is built for elastic computing or, in other words, what exactly images and running instances are, and what they run on so that instances can start from images on demand.

1. Architecture overview [13]
First of all, a few words about public IP addresses, to clarify the idea behind Eucalyptus's architecture. Since public IP addresses are usually scarce, and the security ramifications of allowing complete access from the public Internet can be daunting, administrators usually deploy clusters as pools of machines on private, unroutable networks, with a single head node responsible for routing traffic between the worker pool and a public network. For example, an administrator might configure a single front-end machine with a publicly

accessible IP address, while the node machines are connected via a private network; the node machines are behind a firewall and cannot be contacted from the outside world. In order to make all of these types of resources part of a single cloud, Eucalyptus adopts the hierarchical architecture depicted in Illustration 10, where the three hierarchical levels are shown.

Illustration 10: Eucalyptus architecture [13]

1.1 Node Controller (NC)
This is where the virtual machine instances are executed on the physical resources; the NC is responsible for instance start-up, inspection, shutdown, and cleanup. There are typically many NCs in a Eucalyptus installation, but only one NC needs to execute per physical machine, since a single NC can manage multiple virtual machine instances on that machine. In more detail, those instances are launched and managed by the underlying hypervisor installed on the NC. The current version of Eucalyptus supports two hypervisors, Xen and KVM. Simply put, these are virtual machine environments that make it possible to create an image of a specific system, to run instances from such images on demand, and to control them. By using these hypervisors, Eucalyptus can provide elastic computing on demand. For interaction as a web service, the NC interface is described via a WSDL document that defines the instance data structure and the instance control operations the NC supports, such as runInstance, describeInstance, terminateInstance, describeResource and startNetwork. An important operation is describeResource, which reports the current physical resource characteristics (compute cores, memory, and disk capacity) to the caller; the startNetwork operation sets up and configures the virtual Ethernet overlay. These operations forward the given command parameters to the hypervisor and let it handle the rest.
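To make these operations concrete, here is a minimal, hypothetical node controller sketched in Python. The class, the resource figures and the bookkeeping are invented for illustration; a real NC exposes these operations via WSDL and delegates the actual work to the Xen or KVM hypervisor.

```python
class NodeController:
    """Toy node controller exposing the operations named above.
    Resource figures are invented for the example."""
    def __init__(self, cores, memory_mb, disk_gb):
        self.free = {"cores": cores, "memory_mb": memory_mb, "disk_gb": disk_gb}
        self.instances = {}

    def describe_resource(self):
        # Reports current physical resource characteristics to the caller.
        return dict(self.free)

    def run_instance(self, instance_id, cores, memory_mb, disk_gb):
        # Refuse the request if the node lacks free capacity.
        need = {"cores": cores, "memory_mb": memory_mb, "disk_gb": disk_gb}
        if any(self.free[k] < v for k, v in need.items()):
            raise RuntimeError("insufficient resources on this node")
        for k, v in need.items():
            self.free[k] -= v
        # A real NC would now ask the hypervisor (Xen or KVM) to boot the VM.
        self.instances[instance_id] = need
        return instance_id

    def terminate_instance(self, instance_id):
        # Return the terminated instance's resources to the free pool.
        for k, v in self.instances.pop(instance_id).items():
            self.free[k] += v

nc = NodeController(cores=4, memory_mb=4096, disk_gb=100)
nc.run_instance("i-1", cores=2, memory_mb=1024, disk_gb=10)
print(nc.describe_resource())  # {'cores': 2, 'memory_mb': 3072, 'disk_gb': 90}
```

A cluster-level scheduler can call describe_resource() on each of its nodes and pick the first one with enough free capacity for an incoming request.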

1.2 Cluster Controller (CC)
A collection of NCs logically belongs to a single head node called the Cluster Controller, from which they receive requests and to which they return reports. The CC is responsible for gathering state information from its collection of NCs, scheduling incoming VM instance execution requests to individual NCs, and managing the configuration of the public and private instance networks. The WSDL that describes the CC interface is similar to the NC interface, except that each operation is plural instead of singular (runInstances, describeInstances, terminateInstances, describeResources). When a CC receives a runInstances request, it performs a simple scheduling task: it determines which NCs can support the incoming instance by querying each NC through describeResource, and chooses the first NC that has enough free resources. The CC also implements a describeResources operation; however, instead of reporting the actual physical resources available, this operation takes as input a description of the resources a single instance would occupy, and returns the number of instances of that type that can be simultaneously executed on its NCs.

1.3 Cloud Controller (CLC)
Each Eucalyptus installation has a single Cloud Controller. On the one hand, it is the entry point for users to interact with the whole system. On the other hand, the CLC is responsible for processing incoming user-initiated or administrative requests, making high-level VM instance scheduling decisions, processing service level agreements (SLAs) and maintaining persistent system and user metadata. The CLC contains several web service implementations. The services are configured and managed by an enterprise service bus (ESB) that publishes services and mediates the handling of user requests, while decoupling the service implementation from message routing and transport details. Note that the CLC's services follow the same conceptual rules as Amazon EC2.
For example, the CLC handles user requests and requires authentication for each request just like an EC2 request, and it maintains persistent system and user metadata. Likewise, for every running VM instance a public/private SSH key pair is associated with the specific instance so that users can log into its console, as with Amazon EC2. With this, the CLC implementation can function like Amazon EC2: you can interact with the CLC using Amazon's EC2 client tools through both the SOAP message and the Query interfaces. Amazon provides a WSDL document that describes a SOAP-based web service client interface to their service, as well as a document describing an HTTP Query-based interface, both of which can be translated by the CLC user interface service into Eucalyptus operations. What these messages and queries look like is described in detail in the implementation chapter.

2. Monitor and control
2.1 User control [13]
The diagram on the next page shows more details about the services the CLC handles. In addition, it supports some administrative tasks, such as adding

and removing users and disk images through a web-based interface. At any time, the administrator can find out which instances a user is executing and terminate them.

2.2 Images in Eucalyptus
To run in Eucalyptus, an image needs a Xen- (or KVM-) compatible guest OS kernel and, optionally, a RAM disk image. For example, Eucalyptus images (using Xen) are synonymous with images used by Xen, which are disk partitions. If you can use an image/kernel/ramdisk combination with Xen, then you should be able to get it to work with Eucalyptus. Note that all users may upload and register images with Eucalyptus (depending on the access granted to them by the Eucalyptus administrator), but only the admin user may ever upload or register kernels or ramdisks. After an image is added, any user can run instances of that image. Administrators are able to temporarily disable or permanently remove an image. Finally, the administrator can also add and remove nodes from a cluster controller's configuration.

Illustration 11: Overview of services that comprise the Cloud Controller. Lines indicate the flow of messages, where the dashed lines correspond to internal service messages. [13]

2.3 Instance control
Persistent metadata in Eucalyptus is managed by a component of the CLC named the VmControl. On the one hand, this component continuously maintains the state of the resources, such as the number of instances each CC could potentially create. On the other hand, when a user requests to launch an instance, it coordinates the associated image, key pairs, networks, and security groups to validate and resolve the user's request. Messages are then sent to the CCs involved in the allocation. The involved CC takes care of scheduling the request to its controlled NCs, which create the VM instances and respond accordingly. In Eucalyptus the user can also choose the availability zone in which their instances

run. Each availability zone corresponds to a single cluster within the Eucalyptus cloud. The advantage is that the networking within a single availability zone can be made much faster. The availability zone concepts of Amazon and Eucalyptus are similar in that allocations are separated across zones to reduce the chance of correlated failure; however, under Eucalyptus each availability zone is restricted to a single cluster, whereas at Amazon the zones are much broader.

2.4 Service level agreement control
Service level agreements (SLAs) are implemented as extensions to the message handling service, which can inspect, modify, and reject messages. The VmControl keeps the system state up to date. Each CC is passively polled to obtain the state of its instance availability, allocations, virtual network, and registered images. Information gathered via polling is treated as ground truth, and user requests are handled in transactions that commit only when they are reflected in the current resources.

3. Eucalyptus Networking [13]
In the cloud, a VM instance network solution must address connectivity, isolation, and performance. Connectivity means that all virtual machines on the same NC, or on different NCs under Eucalyptus control, must be able to communicate with each other. Besides connectivity, the network also has to provide isolation between instances. This is important because users are granted super-user access to their provisioned VMs, so they may have super-user access to the underlying network interfaces. This can cause security concerns: if two instances are running on one physical machine, a user of one VM may have the ability to snoop on and influence network packets belonging to another. Note that current hypervisors do not prevent this.

3.1 Public interface
The public interface is used for communication outside of a given set of VM instances.
For example, in an environment that has available public IP addresses, these addresses may be assigned to VM instances at instance boot time. In environments where instances are connected to a private local network, and this local network has a router that supports external communication through network address translation (NAT), the public interface may be assigned a valid private IP address given by the router.

3.2 Private interface
The private interface is used only for inter-VM communication across zones, where VM instances are running inside separate private networks (zones) but need to communicate with one another. Illustration 12 on the next page makes clear that the instance's private interface is connected via a bridge to a virtual software Ethernet system called Virtual Distributed Ethernet (VDE). In VDE, users can specify and control virtual Ethernet switch and cable abstractions that are implemented as programs. When a Eucalyptus system is initiated, it sets up a VDE network overlay that

creates one VDE switch per CC and NC component and many VDE wires established between the switches. The VDE switches support a spanning tree protocol, which allows redundant links to exist while preventing loops in the network.

Illustration 12: Each VM instance is assigned a public interface for external network connections, and a private interface connected to a fully virtual Ethernet network for inter-VM communication [13]

At instance run time, the NC responsible for controlling the VM creates a new Ethernet bridge that is connected to the local VDE switch, and configures the instance to attach its private interface to the new bridge. At this point, the requirement of instance connectivity is satisfied, because any VM started on any NC will be able to contact any other VM over the virtual Ethernet. Currently, Eucalyptus allows the administrator to define a class B IP subnet to be used by instances connected to the private network, and each new instance is assigned a dynamic IP address from within the specified subnet. The second requirement of the virtual network is network traffic isolation between instances. As mentioned at the beginning, we want that if two instances, owned by separate users, are running on the same host or on different hosts connected to the same physical Ethernet, they do not have the ability to inspect or modify each other's network traffic. To solve this problem, Eucalyptus simply uses the concept of a virtual local area network (VLAN). With VLANs, every set of instances owned by a particular user is assigned a tag, inserted into every transmitted frame header, that is then used as an identifier for that user's instances. VDE switch ports then only forward packets that have the same VLAN tag.
So a set of instances is only forwarded traffic on the VDE ports that other instances of the set are attached to, and all traffic they generate is tagged with a VLAN identifier at the virtual switch level, isolating instance network traffic even when two instances run on the same physical resource. Illustration 13 shows how two instances, owned by user A and user B and running on the same physical resource, are connected to the VDE network through ports

configured to only forward traffic matching the particular VM's assigned VLAN.

Illustration 13: If two different users' instances (A and B) are running on the same resource, VLAN tagging is used to isolate network traffic between the VMs.

4. Eucalyptus Networking Mode [14]

The CC can be configured to set up the public interface network in three ways, corresponding to three common environments that Eucalyptus currently supports: SYSTEM, STATIC and MANAGED mode.

Illustration 14: In system mode, an external DHCP server assigns an IP address dynamically to each VM instance when it starts. Running instances are connected to the front end through a software bridge br0 [15]

System mode

This mode instructs Eucalyptus to attach the VM's public interface directly to a software Ethernet bridge (br0 in the example). This software bridge is connected to the physical machine's network, allowing the administrator to handle VM network

DHCP requests the same way they handle regular DHCP requests. VM instances typically obtain an IP address via DHCP, just as any machine on a DHCP-managed local network would. In this mode, therefore, an external DHCP server with a dynamic pool of IP addresses must already be set up; it assigns an IP address to each instance on an NC at start time. System mode is mostly useful for users who want to test Eucalyptus on their laptops or desktops, since the requirement of isolation between instances is not met.

Static mode

This mode offers the Eucalyptus administrator more control over VM IP address assignment. Here Eucalyptus does not rely on an external DHCP server; it runs its own controlled DHCP server, and the administrator configures Eucalyptus with a map of MAC address/IP address pairs. When a VM is instantiated, Eucalyptus takes the next free MAC/IP pair, assigns it to the VM, and sets up a static entry in its controlled DHCP server. As in system mode, Eucalyptus then attaches the VM's Ethernet device to the physical Ethernet through a software bridge. This mode is useful for administrators who have a pool of MAC/IP addresses that they always wish to assign to their VMs. Note that running Eucalyptus in system or static mode disables some key functionality: the definition of security groups, elastic IPs as in Amazon EC2, the isolation of network traffic between VMs described in the private-interface section above, and finally the availability of the metadata service (the URL used to obtain instance-specific information).

Managed mode

The third networking configuration, managed mode, has the most features of the three. In this mode, as in static mode, the CC runs its own DHCP server and allows the administrator to define a network.
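The static-mode bookkeeping described above (an administrator-supplied MAC/IP map from which each new VM takes the next free pair) can be sketched as follows. `StaticPool` and all addresses are invented for illustration; Eucalyptus keeps this map in its own configuration and DHCP server.

```python
class StaticPool:
    """Toy sketch of static-mode MAC/IP assignment (not Eucalyptus code)."""

    def __init__(self, mac_ip_map):
        self.free = dict(mac_ip_map)   # MAC -> IP still available
        self.leases = {}               # instance id -> (MAC, IP)

    def assign(self, instance_id):
        # Take the lowest free MAC/IP pair for the new VM.
        mac, ip = sorted(self.free.items())[0]
        del self.free[mac]
        self.leases[instance_id] = (mac, ip)
        # A real system would now write a static host entry for this
        # MAC/IP pair into the Eucalyptus-controlled DHCP server.
        return mac, ip

pool = StaticPool({"d0:0d:01:02:03:04": "192.168.7.10",
                   "d0:0d:01:02:03:05": "192.168.7.11"})
print(pool.assign("i-0001"))  # -> ('d0:0d:01:02:03:04', '192.168.7.10')
```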
The administrator also defines an interface on the CC that is connected to this network, and a range of IP addresses that are dynamically assigned to instances as they start. Eucalyptus users can define a number of named networks, or security groups, and apply network ingress rules to any VM that runs within such a network. When a user runs a VM instance, they specify the name of the network the VM is to be a member of, and Eucalyptus selects a subset of the entire IP range in which the other VMs of the same network reside. A user can specify ingress rules that apply to a given network, such as allowing ping (ICMP) or ssh (TCP, port 22) traffic to reach their VMs. This lets Eucalyptus expose a capability similar to Amazon's security groups. In addition, the administrator can specify a pool of public IP addresses that users may allocate and then assign to VMs, either at boot or dynamically at run time; this capability is similar to Amazon's elastic IPs. Eucalyptus administrators who require security groups, elastic IPs and VM network isolation must use this mode.

2. Why Eucalyptus? A comparison with other open source cloud systems

I would like to begin this discussion by acknowledging the strong advantages of Eucalyptus. It was designed to be as easy to install and as non-intrusive as possible. Packaged for various distributions such as Ubuntu, CentOS, OpenSUSE and Debian, Eucalyptus is easy to install. Another important point is that the external interface to Eucalyptus is based on an already popular API developed by Amazon, so Eucalyptus can attract users coming from Amazon Web Services without any friction: people do not have to change their usage habits or their favourite programming languages. Amazon provides API libraries in several languages, such as PHP, Java, C#, Python and Ruby, and because Eucalyptus exposes the same interface and concepts, already described in the chapter on Amazon, these libraries can be reused. Apart from small modifications, nothing needs to change on the client side to work with Eucalyptus immediately, and further functions can be developed as needed. In addition to the commercial cloud offerings mentioned at the end of chapter 1 (Amazon, Google AppEngine, Microsoft Azure, Sun Grid), there are several open source projects aimed at resource provisioning with the help of virtualization, such as AppScale, Enomalism and Nimbus.

2.1 AppScale

AppScale is an open source implementation of the Google AppEngine (GAE) that enables users to deploy, test, debug, measure and monitor their GAE applications prior to deployment on Google's proprietary resources. An AppScale image can be executed over two virtualized systems, Xen and KVM, and also over two cloud systems, Amazon EC2 and Eucalyptus. Like any other image, AppScale can be uploaded and registered to Eucalyptus together with its associated kernel and ramdisk, and executed afterwards. In short, AppScale lets people test Google App Engine applications on their own clusters with greater scalability and reliability than the Google SDK provides, but it is not yet ready for production.
2.2 Enomalism [17]

Enomalism is a TurboGears-based, Python web server application that uses the libvirt library from Red Hat to manage multiple hypervisors. It currently supports the KVM, Xen, OpenVZ, Linux Containers and VirtualBox hypervisors as well as the Amazon EC2 service. One advantage of Enomalism is that it allows you to manage your resources across any of its supported technologies from one dashboard: a web-based interface that provides a real-time analysis of how all platform components are operating. It is very convenient that virtual machines can be configured and controlled visually (see the example snapshot below). Enomalism also supports disk image creation from its own VMCast service, from which you can easily create disk images for common Linux distributions.

Illustration 15: Enomalism Dashboard [17]

Compared with Eucalyptus, Enomalism offers a fully functional visual control, especially for creating images of common Linux distributions; Eucalyptus has no such visual control. The most inconvenient aspect of Eucalyptus is that a user has to create an image himself, for example with a utility like VM builder (unless he wants to use the prebuilt images provided by the administrator). On the other hand, Eucalyptus lets users develop their own client controllers using the same API as Amazon, which gives more ways to extend functionality than Enomalism, which has its own RESTful API.

2.3 Nimbus [18]

Nimbus is another interesting cloud computing infrastructure with several components. Its cloud controller is called the workspace service, which publishes information about each workspace (node) available to the user. Nimbus currently works only on Xen, but KVM support is planned. Unlike Eucalyptus, Nimbus has two kinds of interfaces. One is based on the Web Services Resource Framework (WSRF), i.e. Web Services with state management (WS-Notification), and comes with its own client tool (with commands similar to the EC2 client). The second supported interface is EC2, but only a few functions are supported: describe images, run/describe/terminate/reboot instances and add/delete key pair; the remaining EC2 functions are unsupported.

In summary, Eucalyptus installs easily from several distributions and behaves like an emulator of Amazon. For the requirements of UHE, we need a container where a virtual image can be registered and instances run on a network overlay that isolates the traffic of different users (I do not see this promised by the other open source clouds above). The image may be an AMI coming from a strong cloud provider, Amazon, and Amazon users can interact with Eucalyptus exactly as they always do with Amazon.
As for the poor client GUI of Eucalyptus, there are visual client tools such as Cloud42, Elasticfox and especially the recently developed EC2Dream, which provide all EC2 and S3 functionality for Eucalyptus. EC2Dream also offers very detailed, illustrated instructions for Eucalyptus users to get started.
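To make the "same API" argument concrete: all of the Amazon client libraries mentioned above ultimately build signed EC2 Query API requests, and pointing them at a Eucalyptus front end only changes the endpoint. The sketch below shows, with the standard library only, the SignatureVersion 2 signing these libraries perform; the host, path, access key and secret are placeholders (the host/path shown are typical for a Eucalyptus installation, not a verified value).

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_ec2_query(host, path, params, secret_key):
    """Build a SignatureVersion-2-signed EC2 Query URL (simplified sketch)."""
    params = dict(params, SignatureMethod="HmacSHA256", SignatureVersion="2")
    # Canonical query string: parameters sorted by name, percent-encoded.
    canonical = "&".join(
        "%s=%s" % (urllib.parse.quote(str(k), safe="-_.~"),
                   urllib.parse.quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items()))
    string_to_sign = "\n".join(["GET", host, path, canonical])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()
    return "http://%s%s?%s&Signature=%s" % (
        host, path, canonical, urllib.parse.quote(signature, safe=""))

url = sign_ec2_query("localhost:8773", "/services/Eucalyptus",
                     {"Action": "DescribeImages",
                      "AWSAccessKeyId": "admin",
                      "Version": "2009-04-04"},
                     "secret")
print(url)
```

A client library for Amazon that implements this scheme works unchanged against Eucalyptus once host, path and credentials are swapped.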

Chapter four HYPERVISOR

As we know from the previous chapter, Eucalyptus relies on a virtual machine monitor installed on each NC, which controls the life cycle of instances: creating images, launching and controlling instances, and terminating them on demand. These virtual machine monitors are hypervisors: platform virtualization software that allows multiple operating systems to run concurrently on a host computer. Eucalyptus supports two hypervisors for instance control, KVM and Xen; Amazon uses Xen. Understanding Xen and KVM in more detail helps administrators manage the compatibility between the two systems Eucalyptus and Amazon, since EMIs (Eucalyptus Machine Images) and AMIs (Amazon Machine Images) are created for KVM/Xen and for the Xen hypervisor respectively. This chapter goes into more detail about hypervisors. It will not dive deep into the code inside a hypervisor, but it provides the knowledge necessary to understand the later steps in the implementation chapter. Since there are several types of hypervisor that differ in their technology, readers should understand them in order to choose a hypervisor on the market that fits their needs. We will discuss this now.

1. Differences between various hypervisors [19]

To manage parallel operating systems on one machine, a virtualization layer is added between the hardware and the operating systems, as seen in the figure below. This virtualization layer allows multiple operating system instances to run concurrently within virtual machines on a single computer, dynamically partitioning and sharing the available physical resources such as CPU, storage, memory and I/O devices.

Illustration 16: Virtualization layer [19]

The x86 architecture offers four levels of privilege, known as rings 0, 1, 2 and 3, to

operating systems and applications to manage access to the computer hardware. While user-level applications typically run in ring 3, the operating system needs direct access to memory and hardware and must execute its privileged instructions in ring 0. Virtualizing the x86 architecture requires placing a virtualization layer under the operating system to create and manage the virtual machines that deliver shared resources.

Illustration 17: Without virtualization (left) and with binary-translation virtualization (right) [19]

1.1 Full virtualization using binary translation

This virtualization type uses a combination of binary translation and direct execution techniques. The hypervisor layer translates all guest operating system instructions on the fly into new instructions that have the intended effect on the virtual hardware, and caches the results for future use, while user-level instructions run unmodified at native speed. Note that the guest OS is not aware it is being virtualized and requires no modification. Full virtualization offers the best isolation and security for virtual machines and simplifies migration and portability, but its big disadvantage is, of course, the overhead of binary translation. VMware's virtualization products and Microsoft Virtual Server are examples of full virtualization.

1.2 Paravirtualization and Xen

Hypervisors like Xen use a form of virtualization known as paravirtualization: in short, the guest operating system must be modified before use with the virtualization layer. Paravirtualization involves modifying the guest OS kernel to replace non-virtualizable instructions with hypercalls that communicate directly with the hypervisor. The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and time keeping.
Paravirtualization differs from full virtualization in that the guest OS knows it is virtualized; its advantage over full virtualization is significantly lower overhead. The fact that paravirtualization requires a modified OS kernel is usually not a problem, since a hypervisor like Xen gets "xenified" kernel support from various operating systems such as Debian, OpenSolaris, OpenSUSE and Fedora. But since paravirtualization cannot support unmodified operating systems, and the Windows

system cannot be modified, this technology cannot be used to virtualize Windows. To solve this problem, Xen (from version 3.0 on) introduces the capability to run Microsoft Windows unmodified as a guest operating system, provided the processor supports the hardware virtualization extensions of Intel VT or AMD-V. This kind of virtualization is described next.

Illustration 18: Paravirtualization [19]

1.3 Hardware-assisted virtualization and KVM [19] [20]

Here the hardware vendors come into play: they developed new features to simplify virtualization techniques. Both Intel and AMD support virtualization instructions in hardware, as Virtualization Technology (VT-x) and AMD-V respectively. KVM (Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware that takes advantage of this hardware feature; the following shows the KVM-based architecture. Under this model every virtual machine is a regular Linux process, scheduled by the standard Linux scheduler, and runs at close to native speed. A KVM-based system's privileged footprint is truly small, only the host kernel plus a few thousand lines of kernel-mode driver, yet it has unlimited hardware access. Using KVM, one can run multiple virtual machines with unmodified Linux or Windows images. Processors with Intel VT and AMD-V became available in 2006, so only newer systems contain the hardware-assist features, and some laptop vendors, for example Sony with certain VAIO models, disable this VT hardware support.

Illustration 19: KVM-based architecture [20]
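Whether a given machine offers these hardware-assist features can be read from the CPU flags on Linux: `vmx` indicates Intel VT-x and `svm` indicates AMD-V. The sketch below parses /proc/cpuinfo-style text (the sample string is invented); note that even when the flag is present, vendors may still have disabled the feature in firmware, as mentioned above.

```python
def vt_support(cpuinfo_text):
    """Return the detected hardware-virtualization flavour, or None."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None  # no hardware assist: KVM would fall back to emulation

# Real use would be: vt_support(open("/proc/cpuinfo").read())
sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr pae vmx sse2\n"
print(vt_support(sample))  # -> Intel VT-x
```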

2. Hypervisor Xen and its networking [21]

The following part describes what the network inside a Xen environment looks like when Xen starts its running instances. It is necessary to understand these networking rules before we reach the implementation chapter of this writing. There is a lot of confusion around Xen networking; the diagram below may help explain it to Xen beginners.

A Xen system has multiple layers, the lowest and most privileged of which is Xen itself. Xen may host multiple guest operating systems, each executed within a secure virtual machine. In Xen terminology, each running instance is called a domain. Domains are scheduled by Xen to make effective use of the available physical CPUs, and each guest OS manages its own applications. The first domain, domain 0 (dom0), is created automatically when the system boots and has special management privileges. Dom0 builds the other guest domains (called domU) and manages their virtual devices; it also performs administrative tasks such as suspending, resuming and migrating other virtual machines. Finally, keep in mind that every domain is connected to a central software bridge (for example xenbr0 in the diagram below). Xen creates, by default, seven pairs of connected virtual Ethernet interfaces for use by dom0: veth0 is connected to vif0.0, veth1 to vif0.1, and so on. Think of each pair as two Ethernet interfaces connected by an internal crossover Ethernet cable, where one end (for example vif4.0) is attached to the central bridge and the other end is the interface within the domU whose IP and MAC address you can configure, as you always do when you mount an image into a directory and edit the interface definition under /etc/network/interfaces.

Illustration 20: Xen networking. The privileged dom0 manages all domUs upon it. All domains are connected through a bridge xenbr0 to the outside world; peth0 is the real physical Ethernet card of your machine.
[21]

When Xen starts up, it runs the network-bridge script, which:

1. creates a new bridge named xenbr0 *
2. brings the real Ethernet interface eth0 down
3. copies the IP and MAC addresses of eth0 to the virtual network interface veth0
4. renames the real interface eth0 to peth0
5. renames the virtual interface veth0 to eth0
6. attaches peth0 and vif0.0 to the bridge xenbr0
7. brings the bridge, peth0, eth0 and vif0.0 up

When a domU starts up, Xen (running in dom0) runs the vif-bridge script, which attaches vif<id#>.0 to xenbr0 and brings vif<id#>.0 up.

* Note that in Xen 3.3, the default bridge name is the same as the real physical interface it is attached to, so it will be eth0 instead of xenbr0; the real physical interface eth0 is renamed to peth0 as usual.

3. Summary: Xen or KVM?

To summarize this chapter, we should remember two types of virtualization: paravirtualization and hardware-assisted virtualization. Hardware-assisted virtualization needs hardware support in the machine (Intel VT or AMD-V); paravirtualization does not, but requires a modified OS kernel to run. We also discussed Xen and KVM, the hypervisors that use paravirtualization and hardware-assisted virtualization respectively. In testing, KVM has been consistently more stable and easier to configure than any other hypervisor: it does not require special kernels and therefore does not suffer from device-support issues. The downside of KVM is that it requires virtualization instructions built into the CPU to function correctly; otherwise it falls back to QEMU mode, which is significantly slower. If your CPU supports virtualization (most modern CPUs do), using KVM as your hypervisor is highly recommended.

4. The role of hypervisors in a cloud computing system

Looking at the structure of Eucalyptus in chapter 3, Eucalyptus can be considered a collection of many node machines plus a head node controller; the hypervisor is installed on the node machines.
On its own, a hypervisor can already control the life cycle of instances. Furthermore, hypervisors like Xen and KVM can perform additional monitoring tasks to obtain resource information from all the instances they control. The following table shows some example commands that Xen provides.

'xm' command   Description
create         Create a domain/domU (launch an instance)
destroy        Terminate a domain immediately
list           List information about all or some domains
top            Monitor a host and its domains in real time

These examples show that a hypervisor like Xen can manage the whole instance life cycle by itself (call 'xm help' for more functions). Eucalyptus is therefore only a central connection which, on the one hand, receives

tasks from users through a service interface and passes them on to a concrete node machine, where the hypervisor carries out the rest of the task. In a concrete example with Xen (illustration below), after Xen starts the instances, every domU together with dom0 is connected through a central software bridge (named eth0). This bridge in turn is connected to the real physical interface (peth0) of the machine, through which the local network on a node machine connects to the outside world. From this point, there are various kinds of networking in the Eucalyptus cloud architecture you can deal with. As described in the Eucalyptus networking section, you can have an external DHCP server dynamically assign IP addresses for the whole local network on the nodes. Or, if you choose the more advanced managed mode, you can have a DHCP server installed on the cluster node that reserves a range of IP addresses to be dynamically assigned to instances as they start; this mode gives you more control over the IP assignment for each instance on a node and over the isolation between instances.

Illustration 21: Hypervisor Xen in the Eucalyptus structure [Tri Vo Hoang]

The illustration above shows a Xen hypervisor installed on a node machine. As one part of Eucalyptus, the node controller interacts with the hypervisor installed on the same machine and exposes these operations as a web service to the controller above it, the cluster controller. In summary, the hypervisor plays a very important role in the cloud structure, not only in Eucalyptus but also in Amazon and other cloud systems: it is the heart that controls all local instances on the node machines, while the cloud controller node is the head that hands out tasks to its collected node machines.

Chapter five INTEGRATION OF EUCALYPTUS & UHE

This chapter shows how to integrate Eucalyptus into a unified hosting environment (UHE) so that the two can work together to fulfil our requirements: deployment, monitoring and adaptation. First, of course, we have to understand the UHE model, what it is for and what its concepts are. After that we discuss options other than Eucalyptus for the UHE and why we chose Eucalyptus for this purpose. Finally, an integration solution model is described in detail before we proceed to the implementation chapter.

1. Unified Hosting Environment

1.1 Motivation

The world of web services is not as simple as one programming language and one service description model. It is in fact heterogeneous, including various programming languages from Java to C#, and different execution models as well as packaging formats: Axis, .NET archives, BPEL, OSGi and so on. For the purpose of this writing, a service may even be packed into a virtual machine image, shipped to the host and launched on demand. For service providers this variety makes it much more difficult to offer the best hosting and availability guarantees for every service; even a simple task such as listing all the services you host is not easy. A Unified Hosting Environment (UHE) is defined as a composite environment consisting of various service platforms. All of the above services can be deployed into one environment dynamically and can then interact with the administration controller in terms of deployment, monitoring and adaptation.

1.2 UHE concepts [22]

In general, for deployment, a service package is routed to its associated container. The packages are assumed to have two properties: self-contained and self-described.
A package can be self-contained, as is often the case with servlets and their associated libraries in WAR packages, or it can rely on a dependency resolution and configuration mechanism, as with LSB package managers or OSGi bundle loaders. In my case, a virtual image is in fact a root file system including all software dependencies necessary to run its web service; once started, the web services are ready for use, so the image is considered self-contained. Second, the self-described property of a package is the declared configuration of requirements the service needs in order to run, which makes it easier for the container to resolve dependencies when the package is deployed. If a package is not self-contained, the UHE must resolve its dependencies, otherwise the service will not run [22]. If a package is not self-described, the UHE must meet all its implicit requirements, otherwise the service will not run either. For deployment, the UHE needs to deal with these properties, which are shown again in the table below. Most service packages do not ship with their containers and dependency libraries, but their requirements can be analyzed and supported by the UHE.

Package type         Self-contained   Self-described
1. OSGi bundle       possibly         yes
2. Axis WAR          possibly         no
3. Webapp/DEB        no               yes
4. System/RPM        no               yes
5. ODE BPEL archive  no               yes
6. VM image          yes              no

Table: Properties of service packages [22]

1.3 Design and architecture [23]

Puq is a Python implementation of the UHE and is structured into three modules, for deployment, monitoring and adaptation of services. Each module depends on a specific container adapter between itself and the respective service container. At the moment, however, only the deployment module is available; monitoring and adaptation exist only at the concept level, so I have not seen the container adapter in real code. In this writing I therefore focus on the deployment module and describe its main design, to answer the question of how a service package is analyzed and deployed into the system. This is useful for the later section of this chapter, which draws up a model for deploying a virtual machine image into the UHE system.

Deployment begins by passing the location of the service package as a parameter to a Python script, the package analyzer. The service package is then analyzed for its package type; this mostly depends on matching the package's file extension and, if it is a zip package, some special files contained in it. The deployment script also looks for an existing WSDL file inside the package and parses it for interesting information such as the service name. It then calls the handler associated with the concrete package type. Every handler has its own handling job; for example, if the package is an Axis WAR archive, the handler copies it into its respective container, Tomcat. The service description (WSDL) extracted from the package is also registered with the UHE's database, named ConQo, after its service endpoint has been modified for the concrete hosting environment.
The data stored in ConQo will be used for further monitoring and adaptation purposes.

Illustration 22: Puq deployment module [23]
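The extension-based dispatch just described might look like the sketch below. The handler names and the extension-to-type mapping are hypothetical, chosen only to mirror the package types from the table; they are not Puq's actual code.

```python
import os

# Hypothetical mapping from package extension to handler name.
HANDLERS = {
    ".jar": "deploy_osgi_bundle",
    ".war": "deploy_axis_war",      # copied into its container, Tomcat
    ".deb": "deploy_webapp_deb",
    ".rpm": "deploy_system_rpm",
    ".zip": "deploy_bpel_archive",  # would also check for a BPEL descriptor
    ".img": "deploy_vm_image",      # registered with Eucalyptus/Walrus
}

def pick_handler(package_path):
    """Choose a handler for a service package based on its extension."""
    ext = os.path.splitext(package_path)[1].lower()
    try:
        return HANDLERS[ext]
    except KeyError:
        raise ValueError("unknown package type: %s" % package_path)

print(pick_handler("calculator-service.war"))  # -> deploy_axis_war
```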

3. An integration solution model for virtual images in UHE

3.1 Deployment

The following diagram shows my solution model for deploying a virtual image into the UHE. First, to handle the virtual image package, we use our Eucalyptus system for deploying and controlling the life cycle of images: register, run, monitor and terminate. Second, the UHE acts as a coordinator for all tasks. As with other service packages, after the deployment of a virtual image finishes, the UHE holds the information about the input service package, such as service name and service description, in the UHE's database ConQo. With virtual images we have one additional piece of information to store in the database: the imageId of the service package, returned by Eucalyptus when the image was first registered. By knowing the imageId registered by Eucalyptus, the UHE can perform several tasks for later monitoring and adaptation of that virtual image via Eucalyptus. The deployment process in detail is as follows.

Illustration 23: Deployment steps 1-4

1. Like any other service package, the virtual image is passed to the deployment script as a parameter. The deployment analyzer mounts the virtual image on a loop device and then searches for an existing WSDL file in the image. Note that the administrator can configure a default loop device for images to be mounted on, and the user can specify a directory path to his WSDL to shorten the search; otherwise the script searches the entire image directory tree. Once the WSDL is found, the script parses it for the service name.
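Step 1 might be sketched as follows, assuming the image has already been mounted (the mount itself needs root privileges, e.g. `mount -o loop image.img /mnt/uhe`, and is not shown). The function and variable names are invented for the example; only the WSDL namespace is standard.

```python
import os
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"

def find_wsdl(mount_point, hint_dir=None):
    """Search the user-hinted directory first, then the whole image tree."""
    roots = [os.path.join(mount_point, hint_dir)] if hint_dir else []
    roots.append(mount_point)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(".wsdl"):
                    return os.path.join(dirpath, name)
    return None

def service_name(wsdl_path):
    """Read the name attribute of the first <service> element."""
    tree = ET.parse(wsdl_path)
    service = tree.getroot().find(WSDL_NS + "service")
    return service.get("name") if service is not None else None
```

With a mounted image at `/mnt/uhe`, the analyzer would call `service_name(find_wsdl("/mnt/uhe"))` to obtain the name that is later stored in ConQo.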

2. The image is now registered with Walrus. Eucalyptus uses Walrus for storing all data; the concepts for managing images and data are the same as in Amazon's storage service S3, which we already covered in the Amazon chapter.

3. After successful registration, we receive the imageId of our registered virtual image in return.

4. As with other service packages, the service description is modified with a new service endpoint. For a virtual image, the administrator can allocate an elastic IP address from an IP pool and associate it with a user account. This feature was covered in the Amazon EC2 concepts, but I repeat it here: in Amazon and Eucalyptus, every user account can hold a static IP address, called an elastic IP address, which the user can later assign to whichever of his running instances he chooses. The advantage is that if an instance fails, the user can simply start another instance from the image and assign it his static IP address; his elastic IP address does not change, so from the outside perspective the service endpoint does not change either (see section 2.1 of the previous chapter for details). So in step 4, the service endpoint in the service description is modified with a given elastic IP address owned by a user account. After that, the imageId together with the service name and service description is stored in the ConQo database.

3.2 Adaptation with Eucalyptus

As we know from the previous chapters, Eucalyptus exposes a web service interface through which users interact with the cloud: running an instance, logging in or mounting a loop device into the running instance, saving changes to another image, backing up to a volume and terminating the instance. Both users and administrators are able to control the whole life cycle of an image.
In the deployment process, an input virtual image was registered with Eucalyptus and we stored the imageId in the UHE database together with its service name and service description. As the illustration above shows, a user or administrator can now query the database for the web service he wants to run; the UHE starts the associated imageId and the running instance is served to users immediately. For monitoring and adaptation, administrators and users can work in parallel. They have several ways to interact with Eucalyptus, either via a command line tool (the Amazon EC2 command line tools or Eucalyptus's euca2ools) or via a client GUI. A client GUI can be built by developers using the Amazon library APIs in different languages such as Java, Python and PHP. I also implemented a demo client, based on the PHP Amazon API, to interact with Eucalyptus. How to send a request to Eucalyptus, how to use the client API and a list of all request operations are described in sections B.3 and B.4 of the next, implementation chapter. Administrators can specify five default resource options for launched instances: the number of CPUs an instance may use, how much RAM it is given and its maximum disk size on the node machine. This is configured on the administration management web site (localhost:8443/); a snapshot is shown below.

Illustration 24: VM types, a snapshot from the Eucalyptus administrator's management site

When running an instance, users choose which of the default resource options to give to the instance with the -t parameter:

#ec2-run-instances emi-XXXXXX -k mykey --kernel eki-XXXXXX -t m1.small

The example above chooses the 'm1.small' option, which uses 1 CPU and 128 MB RAM and can grow up to 1 GB of disk space.

3.3 Monitoring with Eucalyptus

As of the current version, Eucalyptus supports various interface functions for users and administrators to inspect the status of instances. These monitoring functions, like the adaptation functions above, can be called via the command line or via an Amazon library API. Until now, however, Eucalyptus only permits a limited set of monitoring functions: listing all running instances and their status (pending, running, terminated), an instance's owner and location, when it was started, and its public/private/elastic IP address. The following fields returned by #ec2-describe-instances are of interest to administrators monitoring instances:

- ownerId: access key ID of the user who owns the reservation.
- imageId: image ID of the AMI used to launch the instance.
- kernelId, ramdiskId: kernel and ramdisk IDs associated with the instance.
- instanceState: running, pending or terminated.
- launchTime: the time the instance was launched.
- ipAddress: IP address of the instance.
- platform: platform of the instance (e.g., Windows).

For further monitoring functionality, such as the consumption of resources on

instances (CPU/RAM usage in %), the current version of Eucalyptus offers nothing. The developers plan to implement this in version 1.6; they have no fixed release date yet, but it may come out at the end of the year. In fact, the whole of Eucalyptus can be summarized as a web service. As we already know from chapter three, the cloud consists of three components. The Cloud Controller at the front end is a web service written in Java, which distributes VM instance control events from users to clusters of resources. The Cluster Controller, written in C, passes these control events on to the individual resources on each node machine it controls. Finally, the Node Controller commands and interacts directly with a hypervisor to perform the tasks, and offers these functions as its own web service to the Cluster Controller. All three components are web services, each with its own web service description, and the heart of the system is the hypervisor. So if the Eucalyptus developers want to build extra monitoring or adaptation functionality, they will go the same way and build extra web services for the Node Controller. Meanwhile, Eucalyptus has not announced when version 1.6, whose features should make monitoring the resources on the node machines easier and more global, will be released. Since the Eucalyptus web services are hard coded, we cannot modify them to add our own web service. But if we want extra resource monitoring functionality right now, the Xen hypervisor ships with a XenAPI, written in Python, which lets users call Xen remotely via the XML-RPC protocol. These API calls can be issued over two transports: SSL-encrypted TCP on port 443 (https) over an IP network, or plaintext over a local Unix domain socket. The instructions on how to use this API are poorly and unclearly documented, but I will walk you through it in section B.2 of the next chapter. 
You will see that we can get all the information Xen collects about the instances it controls, including CPU/RAM usage in %, and the average blocked, wait and execution times for each domain. We can also set a maximal memory for a running domain. I also searched for a KVM API but could not find one; furthermore, I could not have tested it even if one did exist. As promised, the next chapter covers installation and protocol details. The implementation of the above deployment steps is in that chapter under part B, section 1. The theoretical concepts end here, so please refer back if any steps remain unclear during the installation and implementation instructions.

Chapter six INSTALLATION AND IMPLEMENTATION

In this chapter I will guide you through the installation of an Ubuntu system that contains Eucalyptus as the cloud computing architecture, together with Xen as the hypervisor controlling the life cycle of running instances. Why did I choose to install Xen instead of KVM? Simply because my VAIO laptop does not support hardware-assisted virtualization, as this technology only became available in newer processors. KVM is much easier to install, and you do not encounter as many problems and bugs as with Xen, since Ubuntu from its Intrepid version ships a Xen installation package but no xenified kernel for Ubuntu; and as you know from the previous chapter, Xen needs a modified OS kernel to work with, since it uses paravirtualization. That is the main problem. Besides, Eucalyptus is designed to be installed with the Cluster Controller on a different machine from the Node Controller, so if you put everything on one machine, the networking setup becomes a little confusing; more importantly, you can then only use Eucalyptus's system networking mode. After the installation steps, we will deploy a virtual image into UHE. Then we will explore in detail the differences between Eucalyptus and Amazon in message protocols and image usage: they share a common interface and concepts, but there are some changes inside. Along with these notes on the differences, a small implementation of a client interface, based on an Amazon API and modified for Eucalyptus, will be shown. At the end is a description of all web service operations supported by both EC2 and Eucalyptus, which should be useful for monitoring and adaptation in further projects.

PART A. INSTALLATION

1. Xen Installation [24]

Neither Eucalyptus nor Xen has well documented instructions for installing Xen. 
Eucalyptus installs KVM automatically during its own installation, but Xen we have to install and get working manually. The following instructions are collected from what people teach each other on mailing lists. The hypervisor works independently of Eucalyptus, so I recommend installing Xen first, to make isolating and solving errors easier if any occur. Only continue with Eucalyptus after you can create and run an instance from Xen on its own. Before entering this section: you can use KVM if your CPU supports it. First check whether your CPU supports hardware virtualization; if it does, the command egrep '(vmx|svm)' --color=always /proc/cpuinfo should display something, e.g. like this:

#egrep '(vmx|svm)' --color=always /proc/cpuinfo
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr...

If nothing is displayed, your processor does not support hardware virtualization and you must continue with Xen. Now, Xen needs a modified OS kernel to start with. It is true that Ubuntu Intrepid and the current version, Jaunty, do not include a

dom0 Linux kernel, but they do still include a Xen 3.3 hypervisor package. According to Evan Broder, this is likely just a workload issue: Intrepid runs a Linux kernel version which Xen does not explicitly support, so the Ubuntu kernel team would have had to forward-port the Xen patches. This kind of mismatch will go away when Xen dom0 support gets into the mainstream kernel, which may actually happen soon. But for now some people have found a workaround: we can use a xenified Debian dom0 kernel on Ubuntu.

1. Install Xen on the node machine

apt-get install python2.5 ubuntu-xen-server

2. Download the kernel package

You can download the package from the links below or from my writing repository:

Kernel: ...image-xen-686/download
Modules: ...modules-xen-686/download

3. Extract the two downloaded files using

sudo dpkg -i linux-image-xen-686_lenny2_i386.deb linux-modules-xen-686_lenny2_i386.deb

This generates two files, the xenified Debian kernel and its ramdisk, in the current directory, and also automatically adds the following entries to /boot/grub/menu.lst:

title Xen 3.3 / Ubuntu 9.04, kernel xen-686
uuid e75b035c-f49f-4e3d-a57d-6e823aac9428
kernel /boot/xen-3.3.gz
module /boot/vmlinuz-xen-686 root=UUID=e75b035c-f49f-4e3d-a57d-6e823aac9428 ro console=tty0
module /boot/initrd.img-xen-686 quiet

These entries let you boot into the Xen dom0 environment at startup. Notice the two lines beginning with module: the first and second module lines point to the two generated files from above, the kernel and the ramdisk respectively. If the grub update utility does not create these entries automatically, you have to add them manually; just make sure every line points to the correct file path.

4. 
Getting dom0 networking running

The key to getting the network running is removing the Ubuntu network manager:

#remove the Ubuntu network manager
apt-get remove network-manager

then update the file /etc/network/interfaces to look like this:

auto lo
iface lo inet loopback

#software bridge eth0 for instances to connect to
auto eth0
iface eth0 inet dhcp

This should work fine for most people who have a DHCP server running, which is common both in corporate environments and in environments with an ADSL router.

5. Restart the computer into the Xen dom0 environment

Choose the entry titled Xen 3.3 / Ubuntu 9.04 at startup. Once you have booted into Xen's environment, Xen configures its network as described in the Xen networking section of chapter 4. In summary, the real physical interface is now peth0 (renamed from eth0), and it connects to an additional software bridge eth0. This is the bridge that every running instance's interface should bind to.

2. Launch an instance from Xen [24]

Before running Eucalyptus, we have to make sure that Xen works properly. Eucalyptus provides a small pre-built image that we can download and use to test launching an instance.

1. Download
File to download: euca-ttylinux.tgz (or take it from my installation repository).

2. Unpack the package
The package contains an image and an associated kernel. This image does not require a ramdisk to start.

3. Configure the image's interface
First make sure the image's network interface is configured correctly; you can mount the image into a directory to check. Its interface configuration should look like this, for example:

# network device name
INTERFACE="eth0"
# set to "yes" to use DHCP instead of the settings below
DHCP="yes"

4. Create a text file named xen.cfg with the following content

#path to the associated kernel in the downloaded package
kernel = "/root/ttylinux/vmlinuz-xen"
#RAM size
memory = 128
#instance's name
name = "ttylinux"
#connect the instance's interface to bridge eth0
vif = [ 'bridge=eth0' ]
dhcp = "dhcp"
#assign the instance a static IP address (*)
ip=''
gateway=''
netmask=''
#path to the used image
disk = ['file:/root/ttylinux/ttylinux.img,xvda1,w']
root = "/dev/xvda1 ro"
extra = 'xencons=tty'

(*) A note on assigning the instance a static IP address. 
I encountered a problem here when launching this instance and letting a DHCP server assign it an IP address dynamically: Xen sometimes could not obtain one, so the command

just stopped with the output DHCP pending... If you encounter this problem too, assigning the instance a static IP address is recommended.

5. Create a template for Ubuntu Jaunty based on Debian Edgy

Before we can use the Xen console (xm create -c domain, or xm console domain) we have to update our Ubuntu. Our Xen runs on a Debian kernel, and the virtual console of this kernel is found on /dev/hvc0; therefore, if we want to be able to use the Xen console [25], we must run a getty on /dev/hvc0 as follows:

#link edgy.d as a new folder jaunty.d
cd /usr/lib/xen-tools
sudo ln -s edgy.d jaunty.d
gksu gedit /usr/lib/xen-tools/jaunty.d/30-disable-gettys

Then edit the file 30-disable-gettys in the new folder jaunty.d: open the text file fix_getty.txt in my installation folder from the repository and include its contents in the opened 30-disable-gettys file (the text is a little too long to reproduce here).

6. Launch an instance

#launch an instance using our configuration file above
xm create -c /root/xen.cfg

If everything is fine, you can

#list all the instance domains of Xen
xm list

or

#connect to the instance named ttylinux
xm console ttylinux

3. Eucalyptus Installation [26]

This section guides you through installing Eucalyptus in system mode with all components on one machine. For more details on Eucalyptus networking, please read section 3 of chapter 3.

Note 1. In a basic Eucalyptus setup, the system is composed of two machines (a front end and a node): the front end runs both the Cloud and the Cluster Controller, and the node machine runs the Node Controller. The node is where a hypervisor (Xen or KVM) is set up. In system mode we need an external DHCP server, which assigns IP addresses to the launching instances. The structure is illustrated in the diagram below.

Note 2. 
It is also possible to install both front end and node together on one machine, although this may cause some networking problems. Note that system mode is only for testing, since some security features are only supported in managed mode.

Illustration 25: Eucalyptus system mode [15]

Note 3. If you wish to access Eucalyptus from behind a firewall (i.e., the EC2 and AMI tools and the cloud will be on different sides of a firewall), port 8773 must be open. Additionally, if you plan to register your Eucalyptus installation with a cloud management platform, ports 8773 and 8443 must be open.

Note 4. If you use KVM instead of Xen: KVM is installed together with Eucalyptus automatically, so there is nothing more to do for the hypervisor installation.

Installation steps:

1. Boot into the Xen dom0 environment. We will install Eucalyptus into dom0. (If you use KVM, which runs directly on hardware, booting into a modified kernel is not required.) I recommend configuring Xen (/etc/xen/xend-config.sxp) with these settings and restarting:

(xend-http-server yes)
(xend-unix-server yes)
(xend-port 8000)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-address localhost)
(network-script network-bridge)
(vif-script vif-bridge)
(dom0-min-mem 196)
(dom0-cpus 0)
(vncpasswd '')

2. Install the packages

Ubuntu provides complete Eucalyptus installation packages, so installing Eucalyptus on Ubuntu fulfills all prerequisites automatically. (*) For us, front end and node machine are the same computer.

#install cloud and cluster on the front-end machine
apt-get install eucalyptus-cloud eucalyptus-cc

On the node machine:

#install the node controller on the node machine
apt-get install eucalyptus-nc

3. Networking configuration

We already configured our network during the Xen installation, so we can skip this step. To repeat: in the Xen environment, our real physical interface is

named peth0, with the software bridge eth0 connected to it.

4. Configure & register Eucalyptus components

First, configure Eucalyptus (/etc/eucalyptus/eucalyptus.conf) with these settings:

#choose the hypervisor
HYPERVISOR="xen"
#real physical interface of the node machine (Xen renames eth0
#to peth0 when it starts)
VNET_INTERFACE="peth0"
#software bridge that instances will be connected to
VNET_BRIDGE="eth0"
VNET_DHCPDAEMON="/usr/sbin/dhcpd3"
#run Eucalyptus in SYSTEM mode
VNET_MODE="SYSTEM"

Second, make sure all the components are running:

/etc/init.d/eucalyptus-cloud start
/etc/init.d/eucalyptus-cc start
/etc/init.d/eucalyptus-nc start

On the front end:

#register a cluster (on localhost) with the cloud controller
euca_conf -addcluster <clustername> localhost

To register the node machine with the cluster, edit /etc/eucalyptus/eucalyptus.conf:

#the node machine is also on localhost
NODES=""

(*) If you have a node machine on a different host, then on the front-end machine:

#if you have a node machine on a different host
euca_conf -addnode <node_hostname>

5. Initial login

#enter the admin web site the first time with admin/admin

Illustration 26: Administrator manager snapshot

6. Download certificates

We will use the Amazon EC2 command line tool to connect to the Cloud Controller. For security reasons, this tool requires a public/private key pair to send commands to the cloud. You can download the certificate archive from the login page; then extract the zip file into a directory (for example ~/euca).

7. Install a command line tool

Here we have two choices: the Amazon command line tools, or euca2ools from Eucalyptus. I do not use euca2ools in my test environment because I encountered various bugs while running it. I mention it here because euca2ools is more compatible with Eucalyptus: it avoids the problem that the EC2 tool fails when you upload files into an existing bucket, which is an incompatibility between Eucalyptus and Amazon.

a. Amazon command line tools (EC2)

Eucalyptus is not updated for the newest EC2 tools, so we first install them from the Ubuntu packages and then remove them again, just to ensure all dependencies get installed, and then download the new version from the Amazon homepage. This is necessary, or you will hit bugs when running the EC2 tools against Eucalyptus.

cd ~/euca
apt-get install ec2-api-tools ec2-ami-tools
apt-get remove ec2-api-tools ec2-ami-tools
wget .../ec2-api-tools.zip
unzip ec2-api-tools.zip
mv ec2-api-tools ec2
wget .../ec2-ami-tools.zip
unzip ec2-ami-tools.zip
mv ec2-ami-tools ec2ami

Insert the following entries into the text file ~/euca/eucarc, then source it every time you use the EC2 tools:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export EC2_HOME=~/euca/ec2
export EC2_AMITOOL_HOME=~/euca/ec2ami
export PATH=$PATH:$EC2_HOME/bin:$EC2_AMITOOL_HOME/bin

source ~/euca/eucarc

b. euca2ools command line tool

euca2ools is a very new command line tool from Eucalyptus. It is based on BOTO, an open source Python library which is actually an Amazon library API for connecting to Amazon. 
To install: download euca2ools (it is also in my repository under the installation directory), then:

tar zxvf euca2ools-1.0-*.tar.gz
cd euca2ools-1.0-*
sudo -s
echo "deb file://${PWD} ./" >> /etc/apt/sources.list

apt-get update
apt-get install euca2ools

8. Bundle, upload and register an image

Eucalyptus uses the same concepts as Amazon S3 to manage uploaded files and registered images, namely buckets and keys. Please read the Amazon S3 concepts in section 2 of chapter 2 for more details. In this example we use the small ttylinux image with its associated xenified kernel, both prebuilt by Eucalyptus; we already downloaded ttylinux when testing Xen.

Bundle, upload and register the kernel:

mkdir ~/kernel
#bundle the kernel into small chunks in the directory ~/kernel
ec2-bundle-image -i ~/ttylinux/vmlinuz-xen -d ~/kernel --kernel true
#upload the image chunks to a bucket named bucket-kernel
ec2-upload-bundle -b bucket-kernel -m ~/kernel/vmlinuz-xen.manifest.xml
#register the object inside bucket-kernel with Eucalyptus
ec2-register bucket-kernel/vmlinuz-xen.manifest.xml

The same for the image:

mkdir ~/image
ec2-bundle-image -i ~/ttylinux/ttylinux.img -d ~/image
ec2-upload-bundle -b bucket-image -m ~/image/ttylinux.img.manifest.xml
ec2-register bucket-image/ttylinux.img.manifest.xml

Our small image does not need a ramdisk to boot. If you need a ramdisk to launch other, bigger instances, upload and register the ramdisk file with Eucalyptus in the same way as above.

9. Launch an instance

Before running an instance of your image, you should first create an SSH keypair that you can use to log into your instance as root once it boots. The key is stored, so you only have to do this once. Run the following commands:

#create an SSH keypair mykey and save it
ec2-add-keypair mykey > ~/euca/mykey.priv
chmod 0600 ~/euca/mykey.priv

(*) You can always run 'ec2-describe-keypairs' to get a list of created keys.

#list all registered images we have on Eucalyptus
ec2-describe-images

In the returned list you will see the image and kernel with their IDs. IDs for kernels, machine images and ramdisks begin with eki, emi and eri respectively. 
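Since the tools only print flat listings, it helps in scripts to sort the returned identifiers by their prefix before launching. A small sketch (the function name and example IDs are mine; the eki/emi/eri prefixes are as described above):

```python
def classify_ids(ids):
    """Group Eucalyptus identifiers by prefix:
    eki = kernel image, emi = machine image, eri = ramdisk image."""
    kinds = {"eki": "kernel", "emi": "image", "eri": "ramdisk"}
    grouped = {"kernel": [], "image": [], "ramdisk": []}
    for identifier in ids:
        kind = kinds.get(identifier[:3])
        if kind is not None:      # ignore anything that is not an e?i ID
            grouped[kind].append(identifier)
    return grouped

print(classify_ids(["emi-0A1B2C3D", "eki-4E5F6A7B"]))
```

A run-instances call would then pair grouped["image"][0] with --kernel grouped["kernel"][0].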
#specify which imageId to launch, with which kernelId and key
ec2-run-instances <emi-XXX> --kernel <eki-XXX> -k mykey

(*) ttylinux does not need a ramdisk; for other images, add --ramdisk <eri-XXX>.

It might take a while before an instance is launched completely. 

#monitor the running instances' status
ec2-describe-instances

In the output you should see information about the instance, including its state. While first-time caching is being performed, the instance's state will be 'pending'; as soon as the instance has started, the state becomes 'running'. As soon as the instance acquires an IP address from DHCP, you will see the public and private address fields change from ' ' to a usable IP. If your instance's IP address always stays at ' ', please read the troubleshooting notes for a solution.

#log into your new instance with mykey
ssh -i ~/euca/mykey.priv root@<ip_of_instance>

For errors that may occur, please read my troubleshooting logs in the last chapter: From test to a real cloud.

10. Commands for managing buckets/keys

In order to delete an image, you must first deregister it:

ec2-deregister <emi-XXXXXXXX>

Then you can remove the files stored in your bucket. Assuming you have sourced your 'eucarc' to set up the EC2 client tools:

ec2-delete-bundle -a $EC2_ACCESS_KEY -s $EC2_SECRET_KEY --url $S3_URL -b <bucket> -p <file prefix>

If you would like to remove the image and the bucket, add the '--clear' option:

ec2-delete-bundle -a $EC2_ACCESS_KEY -s $EC2_SECRET_KEY --url $S3_URL -b <bucket> -p <file prefix> --clear

The administrator can set 'default' registered kernel/ramdisk identifiers that will be used if a kernel/ramdisk is unspecified by the options above. This is accomplished by logging in to the administrative interface.

PART B. IMPLEMENTATION

1. Deployment of virtual images in UHE

This section implements, step by step, the model concept of chapter five, section 3; please have a look at it before going on. Below are some important notes and instructions to speed up your first test.

a. 
Implementation in detail

First, like other service packages, a virtual image is passed to deployer_service.py as an image location parameter, and serviceanalyser.py matches its format type. The analyser script already contains a matching MIME type for virtual images, but somehow my test image, taken from Eucalyptus, was not recognized, so I simply match images by the extension .img, which every image from Eucalyptus has. Second, the administrator can configure a default directory for mounting an image. After mounting, the analyser script searches for a WSDL file; by default, I set the directory in the image to search to /etc/servicedescription. This can be configured in deployer.ini: please change the tag wsdl_dir to / if you want to

set the configuration to search all of the image's directories by default. The user has the ability to search again if he typed a wrong path, or he may exit. Third, after the package format is recognized, handler_eucalyptus.py is called as the handler for virtual images. The image is bundled, uploaded and registered with Walrus (Eucalyptus) by calling the EC2 command line tools from the Python script. Note that I currently use ec2-upload-bundle (from the Amazon command line tools) for uploading a bundled image into Walrus. This reports a 409 error when uploading to a bucket name that already exists, a known compatibility issue when using the ec2 tools with Eucalyptus. The workaround is to use ec2-delete-bundle (below) to delete the existing bucket before uploading to a bucket with the same name, or to use a different bucket name:

ec2-delete-bundle -a $EC2_ACCESS_KEY -s $EC2_SECRET_KEY --url $S3_URL -b <bucket> -p <file prefix> --clear

This is of course annoying in real production. There is an alternative: Eucalyptus has just come out with its new command line tool, euca2ools. After installing it (for installation instructions see section 1.7b above), you can switch to this tool just by editing handler_eucalyptus.py and replacing the following commands, with no other changes needed:

ec2-bundle-image -> euca-bundle-image
ec2-upload-bundle -> euca-upload-bundle
ec2-register -> euca-register

The reason I did not use euca2ools in my tests is that when I tried euca-register and some other commands in my test laptop environment, bugs occurred; I have already posted them in the Eucalyptus forum. If you do not hit these bugs in your tests, please use the euca2ools command line instead. euca2ools is at the moment at version 1.0.

b. 
These instructions will speed up your first test

Requirements: Eucalyptus & EC2 tools installed (section 1 above)

Download the certificates (if you have not already) and extract the certificate files into Sandbox/eucalyptus/certificate. Configure handler/deployer.ini; you can leave the default configuration unchanged, but you need to update the element tags under #ec2 environment configuration with your new certificate information taken from Sandbox/eucalyptus/certificate/eucarc:

EC2_PRIVATE_KEY
EC2_CERT
EUCALYPTUS_CERT
EC2_ACCESS_KEY
EC2_SECRET_KEY
EC2_USER_ID

The ConQo database is extended with an additional column imageid:

ALTER TABLE servicedescriptions ADD COLUMN imageid character varying(256);

From the handler directory run:

python deployer ../test-archives/ttylinux.img

2. Monitoring resources on node machines

As described in section 4 of chapter 5, we can use the XenAPI to communicate with Xen and perform tasks from a client remotely. But first I want to introduce a command line tool, xentop.

Xentop

As a Linux user, you have probably used the top command, which displays information, such as CPU and memory usage, about the processes running on a particular system. Xen has a similar command, xentop, which shows information about all the domains running on your node system in real time. It has to be run as root.

XenAPI [27]

The following are my instructions on how to bind the XenAPI in a Python script. Once Xen is installed, the API is a single file named XenAPI.py under the directory /usr/lib/python2.6/site-packages/xen/xm (or, if you download Xen's source code, under /tools/python/xen/xm/). You can include it in your Python script and use it:

import sys
sys.path.append('/usr/lib/python2.6/site-packages')
from xen.xm.XenAPI import Session

session = Session('httpu:///var/run/xend/xen-api.sock')
session.login_with_password('root', '')

Did you notice the two lines at the end? They create a session object and return a reference for the client. This reference is passed as an argument to all future API calls. Examples of monitoring operations once you have the session:

xenapi = session.xenapi
print xenapi.VM.get_all_records()
print xenapi.VM_metrics.get_all_records()
print xenapi.VM.get_all()

The API has many classes to interact with; we are only interested in the monitoring classes: VM and VM_metrics. A VM object represents a particular virtual machine instance; we can call control methods such as start and suspend, and read fields like power_state, memory_static_max and name_label. Of these, we are mainly interested in the method get_all_records(), which returns all the resource information currently in use that Xen knows about, including all attribute fields (name_label, memory_static_max, VCPUs_max, power_state...) of dom0 and every domU. The return value is a Python dictionary (pairs of key and value) for programmers to work with. Another interesting operation of this class is set_memory_dynamic_max_live, if you want to set more memory in the database and on a running VM. The second class in the example, VM_metrics, gives you more detailed information on CPU state. All of the classes' attributes and operations are very well documented in the Xen API reference.

Note that in the example above I use a local socket to connect to Xen. If you want to connect to Xen remotely, use:

session = Session('http://localhost:8006/')
session.login_with_password('me', 'mypassword')

In addition, please edit /etc/xen/xm-config.xml and uncomment:

<server type='Xen-API' uri='http://localhost:8006/' username='me' password='mypassword' />

Also open /etc/xen/xend-config.sxp and enable the XML-RPC functionality:

(xend-tcp-xmlrpc-server yes)
(xend-unix-xmlrpc-server yes)
(xend-tcp-xmlrpc-server-address 'localhost')
(xend-tcp-xmlrpc-server-port 8006)
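Working with the dictionary that get_all_records() returns is then plain Python. The following sketch condenses such a record set into the per-domain fields discussed above; the record contents are illustrative stand-ins built from those field names, not captured from a live Xen:

```python
def summarize_domains(records):
    """Condense a VM.get_all_records()-style dict (opaque ref -> record)
    into name -> state and memory, using the fields discussed above."""
    summary = {}
    for record in records.values():
        name = record.get("name_label", "?")
        summary[name] = {
            "state": record.get("power_state"),
            # memory_static_max is reported in bytes
            "max_mem_mb": int(record.get("memory_static_max", 0)) // (1024 * 1024),
        }
    return summary

# Illustrative records shaped like the XenAPI fields named in the text
fake_records = {
    "OpaqueRef:1": {"name_label": "Domain-0", "power_state": "Running",
                    "memory_static_max": 196 * 1024 * 1024},
    "OpaqueRef:2": {"name_label": "ttylinux", "power_state": "Running",
                    "memory_static_max": 128 * 1024 * 1024},
}
print(summarize_domains(fake_records))
```

With a real session, the argument would be session.xenapi.VM.get_all_records() instead of the fake dictionary.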

3. How to send a REST request to Eucalyptus and Amazon EC2

Eucalyptus has recently released a new version with many fixes; in particular, we can now use POST requests, with which earlier versions had bugs. Eucalyptus now fully supports both SOAP and REST requests between the client and the Cloud Controller. In this section I will show what a REST message and its required parameters look like, and compare it to a request to Amazon EC2, since there are only small differences between the two. Knowing these differences will help the reader modify any client API developed for Amazon to work with Eucalyptus. Note that Eucalyptus has not published an official WSDL so far; but because Eucalyptus implements the same service interface as Amazon, the operation names and message types are the same. The only difference is the service endpoint, so developers can simply take the Amazon WSDL and modify its service endpoint.

You can send Query requests over either HTTP or HTTPS. Regardless of which protocol you use, you must include a signature in every Query request. The method described in the following procedure is known as signature version 2. This is the content of the message which we have to sign for every request:

StringToSign = HTTPVerb + "\n" +
               Value_Of_Host_Header_In_Lowercase + "\n" +
               HTTPRequestURI + "\n" +
               Canonicalized_Query_String

Before explaining the components of this string, first note the web service endpoints of Amazon and Eucalyptus, i.e., the addresses a request is sent to:

Amazon: ec2.amazonaws.com
Eucalyptus: localhost:8773/services/Eucalyptus

The different service endpoints lead to different request string components:

- Value_Of_Host_Header_In_Lowercase is the address of the web service interface. Amazon: ec2.amazonaws.com. Eucalyptus: localhost:8773 (since we want to connect to the Eucalyptus on localhost).
- HTTPRequestURI is the HTTP absolute path component of the URI up to, but not including, the query string; if it is empty, use a forward slash (/). Amazon: /. Eucalyptus: /services/Eucalyptus.
- Canonicalized_Query_String is a string of query parameters. It is in the form

of &queryparameter=value pairs, built by the following rules: first, sort the query parameters by name using natural byte ordering of their UTF-8 representation; then URL-encode the parameter names and values. Note to separate parameter names from values with the equals sign (=), even if the parameter value is empty, and to separate the name-value pairs with an ampersand (&).

4. A REST request example

This is an example of sending the request DescribeImages, which returns information about all the images we have. All other request actions will be listed later in section B.6. The request to prepare:

Action=DescribeImages
&Version=...
&Expires=...T12%3A00%3A00Z
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&AWSAccessKeyId=<Access Key ID>

Then this is the string we need to sign:

GET\n
localhost:8773\n
/services/Eucalyptus\n
AWSAccessKeyId=<Access Key ID>
&Action=DescribeImages
&Expires=...T12%3A00%3A00Z
&SignatureMethod=HmacSHA256
&SignatureVersion=2
&Version=...

We sign the above string with the user's secret key. <Secret Key> and <Access Key ID> can be obtained from the user login web site. The following is the complete request to send:

http://localhost:8773/services/Eucalyptus?Action=DescribeImages
&Version=...
&Expires=...T12%3A00%3A00Z
&Signature=<URLEncode(Base64Encode(Signature))>
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&AWSAccessKeyId=<Access Key ID>

The above is an example using the action DescribeImages; a list of further web service actions is in the next section below.

5. List of request operations supported by Eucalyptus [28]

The following is an overview of all the main and important operations supported by Eucalyptus, categorized into monitoring and adaptation functions. An operation is used in a REST request in the form, for example, ?Action=DescribeImages (the required query parameters are given above in section 4).

1. Monitoring

Operation                      Description

Images
DescribeAvailabilityZones      Lists availability zone information for users.
DescribeImages                 Lists all images from which the user can choose and launch an instance.
DescribeImageAttribute         Returns an attribute of a specified image, such as its launch permissions, the kernel/ramdisk associated with it, or its operating system platform.

Instances
DescribeInstances              Returns information about instances that you own, such as: instance ID, state (pending, running or terminated), launch time, DNS name, IP address, owner ID...

Elastic Addresses
DescribeAddresses              Lists the elastic IP addresses assigned to your account, or provides information about a specific address: which instance ID is currently running at this public address.

Securities
DescribeSecurityGroups         Returns information about security groups that you own: allowed IP address range, port range, permission protocol.
DescribeKeyPairs               Lists all key pairs available to you (in case you forgot a name), or returns information on a specified key pair.

Volumes
DescribeVolumes                Describes all volumes, or a specified volume that you own: volume ID, state (is it in use?), size of the volume, and the ID of the instance in which the volume is currently mounted.

Snapshots
DescribeSnapshots              Returns information about all snapshots owned by the account, such as the associated volume, state (pending?), the progress of the creation process (e.g. 80%) and the start time.

2. Adaptation

Operation                      Description

Images
RegisterImage                  Registers an image. Images must be registered with the Cloud Controller before they can be launched.
DeregisterImage                Deregisters the specified EMI. Once deregistered, the EMI cannot be used to launch new instances.
ModifyImageAttribute           Modifies an attribute of an image.

Instances
RunInstances                   Launches instances from a specified image.
RebootInstances                Requests a reboot of one or more instances.
TerminateInstances             Shuts down one or more instances.

Elastic Addresses
AllocateAddress                Returns an elastic IP address from the IP pool for use with the account.
AssociateAddress               Associates an allocated elastic IP address with an instance.
DisassociateAddress            Disassociates the IP address from the instance to which it is assigned.
ReleaseAddress                 Releases an elastic IP address associated with your account back to the IP pool controller.

Securities
CreateSecurityGroup            Creates a new security group. Every instance is launched in a security group.
AuthorizeSecurityGroupIngress  Adds permissions to a security group: allowed IP range, port range, or permission protocol.
DeleteSecurityGroup            Deletes a security group that you own.
RevokeSecurityGroupIngress     Revokes permissions from a security group.

Key Pairs
CreateKeyPair                  Creates a new 2048-bit RSA key pair with the specified name.
DeleteKeyPair                  Deletes the specified key pair that you own.

Volumes
CreateVolume                   Creates a new volume to which any instance within the same availability zone can attach.
DeleteVolume                   Deletes a volume that you own.
AttachVolume                   Mounts a specified volume in one running instance.
DetachVolume                   Unmounts a volume from one running instance.

Snapshots
CreateSnapshot                 Creates a snapshot of a volume and stores it, for backups, for making identical copies of instance devices, and for saving data before shutting down an instance.
DeleteSnapshot                 Deletes a snapshot of a volume that you own.

In summary, with the above list of actions you can perform several tasks for monitoring and adapting an elastic compute cloud; its concepts are discussed in chapter 2. Once again, remember that every request has to be authenticated and must follow the common rules, which are described in section B.1 of

this chapter in detail. For a description of the query parameters (request and response), please refer to the Amazon EC2 API web site; the request parameters are well described there and are too extensive to copy here again (query apis.html).

6. Using an Amazon Client API: PHP

Amazon has many client APIs in various languages already developed, such as PHP, Java, C#, Python and Ruby. Developers can use these APIs to interact with the Cloud Controller more easily and conveniently than with the EC2 command line tools, so we can use them for our Eucalyptus with some small modifications to the socket connection. Some Amazon client APIs, like the Java API, come with a client socket that has been compiled into a class, which means there is no original source code for us to work with. I therefore chose the Amazon PHP API, as its client socket code is easy to modify and simple to understand.

We first begin with some notes. In general, the client socket sends a request, together with an automatically computed signature as described above, to the Cloud Controller, and then parses the reply XML with its XSLT processor to extract the information. There are two places we have to modify: first, the client socket, which connects to Amazon, and second, the XSLT templates used to parse the XML response.

1. Modification of the Client Socket

The Amazon client API originally points to Amazon's service URL, so we have to change the address to that of our Cloud Controller. Secondly, the Amazon PHP client API uses signature version 2 and POST requests. Note that a newer Eucalyptus version has been released that supports both of these protocols, so there is not much work in modifying the API to fit Eucalyptus. (Last month I had to use this API with signature version 1 and a GET request, since the previous version did not support POST requests.) Besides, as a protection against replay attacks, Amazon requires a time stamp in every client request, and the client has to adjust its local time to the US time zone to synchronize with Amazon's server time. Since our Eucalyptus runs at home for testing, we will have to adjust the time.

2. Modification of the XSLT Templates

The returned XML message carries a namespace URI that includes the EC2 API version. Since Eucalyptus does not track the latest EC2 version, its namespace is still an older one, so we need to change the template namespace of the Amazon client to the one Eucalyptus uses. The modification steps follow. You can use the library from my repository, as it already contains a modified client socket and the associated modified XSL templates. You can also run /Amazon/EC2/Samples/index.php as my demo.

Prerequisites:

apt-get install php5 libapache2-mod-php5 php5-mcrypt php5-xsl

Modification steps:

1. Download the PHP library for Amazon EC2 from AWS.

2. Configure /Amazon/EC2/Samples/.config.inc.php with the user's Access Secret Key and Access ID (obtained from the user login web site).

3. Open Amazon/EC2/Client.php (this is the client socket).

4. Search for and update the service address in

   private $_config = array (
       'ServiceURL' => '...',   // point this to the Cloud Controller
       'UserAgent' => 'Amazon EC2 PHP5 Library',
       'SignatureVersion' => 1,
       'SignatureMethod' => 'HmacSHA256',

5. Search for the function _calculateStringToSignV2 and update

   $data = 'POST';
   ...
   $data .= $endpoint['host'];

   to

   $data = 'POST';
   ...
   $data .= $endpoint['host'] . ":8773";

6. Now change the time. Search for the function _getFormattedTimestamp and update

   return gmdate("Y-m-d\TH:i:s.\\0\\0\\0\\Z", time());

   to

   return gmdate("Y-m-d\TH:i:s.\\0\\0\\0\\Z", time() + 7200);

7. Update the namespace. Assuming you want to use the function DescribeImages, open Amazon/EC2/Model/DescribeImagesResponse.php and update the namespace URI in

   $xpath->registerNamespace('a', '...');

   to the one used by Eucalyptus. Then open the XML template Amazon/EC2/Model/DescribeImagesResponse.xslt, search for the old namespace URI and replace it with the Eucalyptus one.

All done!
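To see why the namespace change in step 7 matters, the following short Python sketch illustrates the effect of the namespace when parsing a DescribeImagesResponse, in the same way the PHP client's XSLT templates and XPath lookups do. The namespace URI here is a made-up placeholder and the image IDs are illustrative:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the EC2 API version namespace.
EUCA_NS = "http://ec2.amazonaws.com/doc/2009-04-04/"

response = """<DescribeImagesResponse xmlns="%s">
  <imagesSet>
    <item><imageId>emi-20b65349</imageId></item>
    <item><imageId>emi-22b6534b</imageId></item>
  </imagesSet>
</DescribeImagesResponse>""" % EUCA_NS

root = ET.fromstring(response)

# Same idea as $xpath->registerNamespace('a', ...) in the PHP client:
# the prefix 'a' is bound to the namespace URI the server actually uses.
image_ids = [e.text for e in root.findall(".//a:imageId", {"a": EUCA_NS})]

# With a mismatched namespace URI the very same query finds nothing,
# which is exactly what happens when the template namespace is not updated.
missing = root.findall(".//a:imageId", {"a": "http://example.org/wrong-ns/"})
```

So if the client's templates still register the old Amazon namespace while Eucalyptus answers with its own, every lookup silently returns empty results.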

Included with the API are full examples of every function, in the directory Amazon/EC2/Samples/; you can also view my demo index.php in the same directory. It provides some monitoring functions: listing all images, kernels and ramdisks registered with the cloud, and listing all availability zones and key pairs. Further functions, such as launching or terminating an instance, can be implemented with this PHP API without problems by following the examples under Amazon/EC2/Samples/.

7. Using an Amazon Client API: Python [29]

Boto is an open source library written in Python, which I have tested successfully for connecting and sending requests to the Cloud Controller. To use it, simply download the source code and, as root, go to the download directory and install the Python package:

python setup.py install

Before interacting with the Eucalyptus EC2 interface, we define a connection:

region = RegionInfo(name="eucalyptus", endpoint="localhost")
connection = boto.connect_ec2(
    aws_access_key_id="accessid",
    aws_secret_access_key="secretkey",
    is_secure=False,
    region=region,
    port=8773,
    path="/services/eucalyptus")

The <Access Secret Key> and <Access ID> values are obtained from the user login web site. Now we can call a function. For example:

images = connection.get_all_images()
>>> print images
[Image:emi-20b65349, Image:emi-22b6534b]

For more functions, please see the tutorial documentation included with the downloaded source code.

The Last Chapter
FROM TEST CASES TO A REAL CLOUD

1. Test cases

The following is a list of cases that have been tested during deployment of a virtual image:

- An image is successfully deployed when deployer.ini is set correctly, the WSDL is found on the image, and the image is uploaded to a non-existing bucket name (if using the EC2 tools).
- euca2ools has been tested for the functions euca-bundle-image and euca-upload-image, both in a shell terminal and inside the Python script handler_eucalyptus.py, while euca-register fails when tested in a shell terminal; perhaps this is because euca2ools has only just been released in version 1.
- The image cannot be mounted if the mount point is not free; the program exits. But since the admin can configure a specific mount point that no other program will use (for example Sandbox/eucalyptus/tmp), there should be no errors. The image is unmounted when the analyzed job exits.
- If the user types a wrong path so that the WSDL cannot be found, the user has two options: type it again or exit.
- If deployer.ini is configured with a wrong certificate, private key or access secret key, or any wrong environment variables, the image cannot be bundled or uploaded to a bucket; the program exits.

2. Implementation for a real cloud

1. Use of elastic IP addresses

Illustration 27: the use of elastic IP addresses in our solution model

As we already know from the section on Eucalyptus networking modes in chapter 3, Eucalyptus has a MANAGED mode. In this networking mode, the cloud, cluster and node controllers are all installed on different machines, and the feature of elastic IP addresses (described in chapter 2, section 3.5) is provided. In detail, a DHCP server is installed on a Cluster Controller; it reserves a range of IP addresses when it starts and later hands an IP address to a launched instance.
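As a sketch of how this setup looks in configuration, the following eucalyptus.conf fragment for the Cluster Controller outlines MANAGED mode with a public (elastic) IP pool. The VNET_* option names follow the Eucalyptus 1.x configuration file conventions; the interface names and addresses are made-up example values, not taken from this work:

```
# eucalyptus.conf on the Cluster Controller (example values)
VNET_MODE="MANAGED"
VNET_PUBINTERFACE="eth0"          # public interface
VNET_PRIVINTERFACE="eth1"         # private interface towards the nodes
VNET_DHCPDAEMON="/usr/sbin/dhcpd" # DHCP server handing addresses to instances
VNET_SUBNET="10.0.0.0"            # private address range for instances
VNET_NETMASK="255.255.0.0"
VNET_ADDRSPERNET="32"             # addresses per security group network
VNET_PUBLICIPS="192.168.1.100-192.168.1.120"  # pool of elastic IP addresses
```

With such a configuration, the Cluster Controller's DHCP daemon reserves the configured range at startup and maps the addresses from VNET_PUBLICIPS to instances on demand, exactly as described above.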


More information

Amazon Web Services (AWS) Training Course Content

Amazon Web Services (AWS) Training Course Content Amazon Web Services (AWS) Training Course Content SECTION 1: CLOUD COMPUTING INTRODUCTION History of Cloud Computing Concept of Client Server Computing Distributed Computing and it s Challenges What is

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Nabil Abdennadher nabil.abdennadher@hesge.ch 2017/2018 1 Plan Context Definition Market Cloud service models Cloud deployments models Key drivers to adopting the Cloud Barriers

More information

Amazon Web Services Training. Training Topics:

Amazon Web Services Training. Training Topics: Amazon Web Services Training Training Topics: SECTION1: INTRODUCTION TO CLOUD COMPUTING A Short history Client Server Computing Concepts Challenges with Distributed Computing Introduction to Cloud Computing

More information

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome

More information

Modernize Your Backup and DR Using Actifio in AWS

Modernize Your Backup and DR Using Actifio in AWS FOR AWS Modernize Your Backup and DR Using Actifio in AWS 150105H FOR AWS Modernize Your Backup and DR Using Actifio in AWS What is Actifio? Actifio virtualizes the data that s the lifeblood of business.

More information

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

PrepAwayExam.   High-efficient Exam Materials are the best high pass-rate Exam Dumps PrepAwayExam http://www.prepawayexam.com/ High-efficient Exam Materials are the best high pass-rate Exam Dumps Exam : SAA-C01 Title : AWS Certified Solutions Architect - Associate (Released February 2018)

More information

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs

More information

THE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER

THE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER WHITE PAPER THE ZADARA CLOUD An overview of the Zadara Storage Cloud and VPSA Storage Array technology Zadara 6 Venture, Suite 140, Irvine, CA 92618, USA www.zadarastorage.com EXECUTIVE SUMMARY The IT

More information

Cloud Computing. What is cloud computing. CS 537 Fall 2017

Cloud Computing. What is cloud computing. CS 537 Fall 2017 Cloud Computing CS 537 Fall 2017 What is cloud computing Illusion of infinite computing resources available on demand Scale-up for most apps Elimination of up-front commitment Small initial investment,

More information

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India (AWS) Overview: AWS is a cloud service from Amazon, which provides services in the form of building blocks, these building blocks can be used to create and deploy various types of application in the cloud.

More information

Enroll Now to Take online Course Contact: Demo video By Chandra sir

Enroll Now to Take online Course   Contact: Demo video By Chandra sir Enroll Now to Take online Course www.vlrtraining.in/register-for-aws Contact:9059868766 9985269518 Demo video By Chandra sir www.youtube.com/watch?v=8pu1who2j_k Chandra sir Class 01 https://www.youtube.com/watch?v=fccgwstm-cc

More information

Data safety for digital business. Veritas Backup Exec WHITE PAPER. One solution for hybrid, physical, and virtual environments.

Data safety for digital business. Veritas Backup Exec WHITE PAPER. One solution for hybrid, physical, and virtual environments. WHITE PAPER Data safety for digital business. One solution for hybrid, physical, and virtual environments. It s common knowledge that the cloud plays a critical role in helping organizations accomplish

More information

ElasterStack 3.2 User Administration Guide - Advanced Zone

ElasterStack 3.2 User Administration Guide - Advanced Zone ElasterStack 3.2 User Administration Guide - Advanced Zone With Advance Zone Configuration TCloud Computing Inc. 6/22/2012 Copyright 2012 by TCloud Computing, Inc. All rights reserved. This document is

More information

OpenIAM Identity and Access Manager Technical Architecture Overview

OpenIAM Identity and Access Manager Technical Architecture Overview OpenIAM Identity and Access Manager Technical Architecture Overview Overview... 3 Architecture... 3 Common Use Case Description... 3 Identity and Access Middleware... 5 Enterprise Service Bus (ESB)...

More information

Module Day Topic. 1 Definition of Cloud Computing and its Basics

Module Day Topic. 1 Definition of Cloud Computing and its Basics Module Day Topic 1 Definition of Cloud Computing and its Basics 1 2 3 1. How does cloud computing provides on-demand functionality? 2. What is the difference between scalability and elasticity? 3. What

More information

Chapter 4. Fundamental Concepts and Models

Chapter 4. Fundamental Concepts and Models Chapter 4. Fundamental Concepts and Models 4.1 Roles and Boundaries 4.2 Cloud Characteristics 4.3 Cloud Delivery Models 4.4 Cloud Deployment Models The upcoming sections cover introductory topic areas

More information

Cloud Computing Lecture 4

Cloud Computing Lecture 4 Cloud Computing Lecture 4 1/17/2012 What is Hypervisor in Cloud Computing and its types? The hypervisor is a virtual machine monitor (VMM) that manages resources for virtual machines. The name hypervisor

More information

Oracle WebLogic Server 12c on AWS. December 2018

Oracle WebLogic Server 12c on AWS. December 2018 Oracle WebLogic Server 12c on AWS December 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only. It represents

More information

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT By Joshua Kwedar Sr. Systems Engineer By Steve Horan Cloud Architect ATS Innovation Center, Malvern, PA Dates: Oct December 2017 INTRODUCTION

More information

Today s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325

Today s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325 Today s Objec4ves Virtualiza4on Cloud Compu4ng Amazon Web Services Oct 23, 2017 Sprenkle - CSCI325 1 Data Center What did you think? Oct 23, 2017 Sprenkle - CSCI325 2 1 10/23/17 Oct 23, 2017 Sprenkle -

More information

Deploying High Availability and Business Resilient R12 Applications over the Cloud

Deploying High Availability and Business Resilient R12 Applications over the Cloud Deploying High Availability and Business Resilient R12 Applications over the Cloud Session ID#: 13773 Deploying R12 applications over the cloud - The best practices you need to know and the pitfalls to

More information

OpenNebula on VMware: Cloud Reference Architecture

OpenNebula on VMware: Cloud Reference Architecture OpenNebula on VMware: Cloud Reference Architecture Version 1.2, October 2016 Abstract The OpenNebula Cloud Reference Architecture is a blueprint to guide IT architects, consultants, administrators and

More information

Mitigating Risks with Cloud Computing Dan Reis

Mitigating Risks with Cloud Computing Dan Reis Mitigating Risks with Cloud Computing Dan Reis Director of U.S. Product Marketing Trend Micro Agenda Cloud Adoption Key Characteristics The Cloud Landscape and its Security Challenges The SecureCloud Solution

More information

Cloud computing and Citrix C3 Last update 28 July 2009 Feedback welcome -- Michael. IT in the Clouds. Dr Michael Harries Citrix Labs

Cloud computing and Citrix C3 Last update 28 July 2009 Feedback welcome -- Michael. IT in the Clouds. Dr Michael Harries Citrix Labs Cloud computing and Citrix C3 Last update 28 July 2009 Feedback welcome -- Michael IT in the Clouds Dr Michael Harries Citrix Labs Just hot air? The term cloud computing has been much hyped in recent past.

More information

Amazon Web Services and Feb 28 outage. Overview presented by Divya

Amazon Web Services and Feb 28 outage. Overview presented by Divya Amazon Web Services and Feb 28 outage Overview presented by Divya Amazon S3 Amazon S3 : store and retrieve any amount of data, at any time, from anywhere on web. Amazon S3 service: Create Buckets Create

More information

Seminar report Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE

Seminar report Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE A Seminar report On Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE SUBMITTED TO: SUBMITTED BY: www.studymafia.org www.studymafia.org Acknowledgement

More information

Cloud Computing. Technologies and Types

Cloud Computing. Technologies and Types Cloud Computing Cloud Computing Technologies and Types Dell Zhang Birkbeck, University of London 2017/18 The Technological Underpinnings of Cloud Computing Data centres Virtualisation RESTful APIs Cloud

More information

PROTECT YOUR DATA FROM MALWARE AND ENSURE BUSINESS CONTINUITY ON THE CLOUD WITH NAVLINK MANAGED AMAZON WEB SERVICES MANAGED AWS

PROTECT YOUR DATA FROM MALWARE AND ENSURE BUSINESS CONTINUITY ON THE CLOUD WITH NAVLINK MANAGED AMAZON WEB SERVICES MANAGED AWS PROTECT YOUR DATA FROM MALWARE AND ENSURE BUSINESS CONTINUITY ON THE CLOUD WITH NAVLINK MANAGED AMAZON WEB SERVICES MANAGED AWS Improved performance Faster go-to-market Better security In today s disruptive

More information

Network Services, Cloud Computing and Virtualization

Network Services, Cloud Computing and Virtualization Network Services, Cloud Computing and Virtualization Client Side Virtualization Purpose of virtual machines Resource requirements Emulator requirements Security requirements Network requirements Hypervisor

More information

YOUR APPLICATION S JOURNEY TO THE CLOUD. What s the best way to get cloud native capabilities for your existing applications?

YOUR APPLICATION S JOURNEY TO THE CLOUD. What s the best way to get cloud native capabilities for your existing applications? YOUR APPLICATION S JOURNEY TO THE CLOUD What s the best way to get cloud native capabilities for your existing applications? Introduction Moving applications to cloud is a priority for many IT organizations.

More information

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo Document Sub Title Yotpo Technical Overview 07/18/2016 2015 Yotpo Contents Introduction... 3 Yotpo Architecture... 4 Yotpo Back Office (or B2B)... 4 Yotpo On-Site Presence... 4 Technologies... 5 Real-Time

More information

Cloud Computing Technologies and Types

Cloud Computing Technologies and Types Cloud Computing Technologies and Types Jo, Heeseung From Dell Zhang's, Birkbeck, University of London The Technological Underpinnings of Cloud Computing Data centers Virtualization RESTful APIs Cloud storage

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information