Introduction to Windows Azure
Cloud Computing Futures Group, Microsoft Research
Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Jaliya Ekanayake


1 Introduction to Windows Azure
Cloud Computing Futures Group, Microsoft Research
Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Jaliya Ekanayake

2 Data centers range in size from edge facilities to megascale.
Economies of scale: approximate costs for a small data center (1000 servers) and a larger, 100K-server center.

Technology      Cost in small data center     Cost in large data center      Ratio
Network         $95 per Mbps/month            $13 per Mbps/month
Storage         $2.20 per GB/month            $0.40 per GB/month
Administration  ~140 servers/administrator    >1000 servers/administrator    7.1

Each data center is 11.5 times the size of a football field.
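The ratio column compares the two centers: small-center cost divided by large-center cost for network and storage, and large-center servers per administrator divided by the small-center figure for administration. A quick Python check using only the figures in the table (the variable names are illustrative):

```python
# Figures from the table: (small data center, large data center)
network_usd_per_mbps_month = (95.0, 13.0)
storage_usd_per_gb_month = (2.20, 0.40)
servers_per_admin = (140, 1000)  # "~140" and ">1000" on the slide

def cost_ratio(small: float, large: float) -> float:
    """How many times more the small center pays per unit of resource."""
    return small / large

network_ratio = cost_ratio(*network_usd_per_mbps_month)
storage_ratio = cost_ratio(*storage_usd_per_gb_month)
# For administration, a higher servers-per-admin figure is the cheaper side.
admin_ratio = servers_per_admin[1] / servers_per_admin[0]

for name, r in [("network", network_ratio),
                ("storage", storage_ratio),
                ("administration", admin_ratio)]:
    print(f"{name}: {r:.1f}x")
```

With these inputs the network ratio comes to about 7.3, storage to 5.5, and administration to about 7.1. Because the large center manages more than 1000 servers per administrator, the administration figure is a lower bound.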


6 Highly-available Fabric Controller (FC)
A bunch of machines in data centers, run by the Fabric Controller, which:
- Owns all data center hardware
- Uses its inventory to host services
- Deploys applications to free resources
- Maintains the health of those applications
- Maintains the health of the hardware
  - If a node goes offline, the FC will try to recover it
  - If a failed node can't be recovered, the FC migrates its role instances to a new node: a suitable replacement location is found, and existing role instances are notified of the change
- Manages the service life cycle starting from bare metal
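The recovery behavior on this slide (recover the node if possible; otherwise find a replacement, move the role instances, and notify the others) can be modeled in a few lines. This is a minimal sketch under stated assumptions: `Node`, `FabricControllerSketch`, the `recoverable` flag, and the fewest-instances placement rule are all hypothetical and do not describe how the real Fabric Controller is built.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    healthy: bool = True
    recoverable: bool = True          # hypothetical: whether a reboot/reimage would work
    roles: list = field(default_factory=list)  # role instance names hosted here

class FabricControllerSketch:
    """Hypothetical model of the FC recovery policy; not the real FC."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.notifications = []  # (role instance, message) pairs

    def try_recover(self, node: Node) -> bool:
        # Assumption: recovery succeeds exactly when the node is marked recoverable.
        if node.recoverable:
            node.healthy = True
        return node.healthy

    def find_replacement(self, failed: Node) -> Node:
        # Simple placement rule: the healthy node hosting the fewest instances.
        # Raises ValueError if no healthy node is left.
        candidates = [n for n in self.nodes if n is not failed and n.healthy]
        return min(candidates, key=lambda n: len(n.roles))

    def handle_offline(self, node: Node) -> None:
        node.healthy = False
        if self.try_recover(node):
            return
        target = self.find_replacement(node)
        moved, node.roles = node.roles, []
        target.roles.extend(moved)
        # Notify the existing (not moved) role instances of the change.
        message = f"{', '.join(moved)} moved to {target.name}"
        for n in self.nodes:
            if n.healthy:
                for inst in n.roles:
                    if inst not in moved:
                        self.notifications.append((inst, message))
```

Keeping recovery and migration as separate steps matches the slide's order: migration happens only when recovery fails.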

7 [Diagram: an optimized hypervisor running a host virtual machine and up to 7 guest VMs]
- At minimum (Small): CPU: … GHz x64; Memory: 1.7GB; Network: … Mbps; Local storage: 500GB
- Up to (Extra Large): CPU: 8 cores; Memory: 14.2 GB; Local storage: 2+ TB


9 Azure Platform
- Compute: Web Role, Worker Role
- Storage: Blobs, Queues, Tables, Drives

10 A closer look
[Diagram: Application, Compute, Fabric; Storage (Blobs, Drives, Tables, Queues) accessed over HTTP]
Access
- Data is exposed via .NET and RESTful interfaces
- Data can be accessed by Windows Azure apps and by other on-premises or cloud applications

11 Account -> Container -> Blob
Account: jared
  Container: images, with blobs PIC01.JPG and PIC02.JPG
  Container: movies, with blob MOV1.AVI
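Because storage is exposed through RESTful interfaces, each blob in the account/container/blob hierarchy has its own URL. A minimal sketch, assuming the public Windows Azure endpoint format `https://<account>.blob.core.windows.net/<container>/<blob>`; the `blob_url` helper is hypothetical and only builds the address, it does not call the service.

```python
from urllib.parse import quote

def blob_url(account: str, container: str, blob: str, use_https: bool = True) -> str:
    """Build the REST address of a blob: account -> container -> blob."""
    scheme = "https" if use_https else "http"
    # Escape characters such as spaces; keep "/" so names like
    # "2010/PIC01.JPG" read as virtual directories.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob, safe='/')}"

# The hierarchy from the slide
storage = {"images": ["PIC01.JPG", "PIC02.JPG"], "movies": ["MOV1.AVI"]}
for container, blobs in storage.items():
    for name in blobs:
        print(blob_url("jared", container, name))
```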

12 Number of blob containers
- A storage account can have as many blob containers as will fit within the storage account limit
Blob container
- A container holds a set of blobs
- Access policies are set at the container level: private or publicly accessible
- Metadata can be associated with a container
  - Metadata are <name, value> pairs
  - Up to 8KB per container
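A client can check the 8KB metadata limit before sending a request. This sketch assumes 8KB means 8,192 bytes, counted as the UTF-8 length of every name plus every value; the slide does not say exactly how the service counts, and `validate_container_metadata` is a hypothetical helper.

```python
METADATA_LIMIT_BYTES = 8 * 1024  # "Up to 8KB per container" (assumed 8192 bytes)

def metadata_size(metadata: dict) -> int:
    # Assumption: size = total UTF-8 bytes of every name and every value.
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())

def validate_container_metadata(metadata: dict) -> None:
    """Raise ValueError if the <name, value> pairs exceed the assumed limit."""
    size = metadata_size(metadata)
    if size > METADATA_LIMIT_BYTES:
        raise ValueError(f"container metadata is {size} bytes; limit is {METADATA_LIMIT_BYTES}")

validate_container_metadata({"owner": "jared", "category": "images"})  # within the limit
```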

13 Block blob
- Targeted at streaming workloads
- Each blob consists of a sequence of blocks
- Each block is identified by a block ID
- Size limit: 200GB per blob
Page blob
- Targeted at random read/write workloads
- Each blob consists of an array of pages
- Each page is identified by its offset from the start of the blob
- Size limit: 1TB per blob
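The difference between the two blob types shows up in how they are written. A block blob is assembled from uploaded blocks that are then committed in order by block ID, while a page blob is overwritten in place at page-aligned offsets. A minimal in-memory model follows. The two-step upload mirrors the Put Block / Put Block List operations of the Blob REST API, but the classes are hypothetical, and the 512-byte page size is an assumption not stated on the slide.

```python
PAGE_SIZE = 512  # assumption: page blobs are written in 512-byte pages

class BlockBlobModel:
    """Sequence of blocks, each identified by a block ID (hypothetical model)."""

    def __init__(self):
        self.uncommitted = {}   # block ID -> bytes, staged but not yet visible
        self.committed = []     # the blob's content, as an ordered list of blocks

    def put_block(self, block_id: str, data: bytes) -> None:
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids: list) -> None:
        # The blob becomes exactly the listed blocks, in the listed order.
        self.committed = [self.uncommitted[b] for b in block_ids]

    def read(self) -> bytes:
        return b"".join(self.committed)

class PageBlobModel:
    """Array of pages addressed by byte offset (hypothetical model)."""

    def __init__(self, size: int):
        if size % PAGE_SIZE:
            raise ValueError("size must be a multiple of the page size")
        self.data = bytearray(size)  # unwritten pages read as zeros

    def write_pages(self, offset: int, data: bytes) -> None:
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("writes must be page-aligned")
        if offset + len(data) > len(self.data):
            raise ValueError("write past end of blob")
        self.data[offset:offset + len(data)] = data  # random write in place

    def read_pages(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])
```

Staging blocks before committing suits streaming uploads (blocks can arrive in any order), while offset addressing suits random read/write workloads such as a disk image.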

14 Account -> Container -> Blob -> Block or Page
[Diagram: account jared holds containers images (PIC01.JPG, PIC02.JPG) and movies (MOV1.AVI); a blob is made up of Block or Page 1, Block or Page 2, Block or Page 3, ...]

15 Queues
- Scalable message paths
- Provide loose synchronization
- Any number of messages
- One week of persistence
- Maximum message size: 8KB
- Visibility timeout
[Diagram: producers (P1, P2) put messages on a queue read by consumers (C1, C2)]
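The visibility timeout is what makes queue processing fault tolerant: a consumer that gets a message hides it for a while, and if the consumer dies before deleting it, the message becomes visible again for another consumer. A minimal in-memory sketch with an injectable clock; `QueueSketch` and its methods are hypothetical, and only the 8KB size limit and the one-week lifetime come from the slide.

```python
import itertools
import time

MAX_MESSAGE_BYTES = 8 * 1024          # "Maximum size 8KB"
MESSAGE_TTL_SECONDS = 7 * 24 * 3600   # "One week of persistence"

class QueueSketch:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._ids = itertools.count()
        self.messages = {}  # id -> [body, expires_at, visible_at]

    def put(self, body: bytes) -> None:
        if len(body) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds 8KB")
        now = self.clock()
        self.messages[next(self._ids)] = [body, now + MESSAGE_TTL_SECONDS, now]

    def get(self, visibility_timeout: float):
        """Return (id, body) of a visible message and hide it, or None."""
        now = self.clock()
        for mid, msg in list(self.messages.items()):
            body, expires_at, visible_at = msg
            if expires_at <= now:
                del self.messages[mid]             # older than a week: dropped
            elif visible_at <= now:
                msg[2] = now + visibility_timeout  # hidden from other consumers
                return mid, body
        return None

    def delete(self, mid: int) -> None:
        # A consumer deletes the message only after finishing the work.
        self.messages.pop(mid, None)
```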

16 Provides structured storage
- Massively scalable tables
  - Billions of entities (rows) and TBs of data
  - Can use thousands of servers as traffic grows
- Data is replicated several times
Table
- A storage account can create many tables
- The table name is scoped by the account
- A table is a set of entities (i.e., rows)
Entity
- A set of properties (columns)
- Required properties: PartitionKey, RowKey and Timestamp
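Every entity is addressed by its PartitionKey and RowKey together, and entities that share a PartitionKey form a partition. A minimal in-memory sketch of that data model; `TableSketch` is hypothetical (it is not the storage client library), and stamping Timestamp on every write is a simplification of the server-maintained property.

```python
from datetime import datetime, timezone

class TableSketch:
    """Entities keyed by (PartitionKey, RowKey); hypothetical, not the Azure SDK."""

    def __init__(self, name: str):
        self.name = name
        self.entities = {}  # (PartitionKey, RowKey) -> entity dict

    def upsert(self, entity: dict) -> dict:
        for required in ("PartitionKey", "RowKey"):
            if required not in entity:
                raise ValueError(f"missing required property {required}")
        stored = dict(entity)
        # Timestamp is maintained by the store, not supplied by the caller.
        stored["Timestamp"] = datetime.now(timezone.utc)
        self.entities[(entity["PartitionKey"], entity["RowKey"])] = stored
        return stored

    def get(self, partition_key: str, row_key: str) -> dict:
        return self.entities[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> list:
        # Plain filter here; in the service, single-partition queries are the cheap case.
        return sorted((e for (pk, _), e in self.entities.items() if pk == partition_key),
                      key=lambda e: e["RowKey"])
```

Entities in the same table need not share the same set of properties beyond the three required ones; the sketch reflects that by storing plain dicts.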

17 [Figure: table entities grouped into Partition 1 and Partition 2. Source: Windows Azure Table: Programming Table Storage]

18 Windows Azure Drive
- A Windows Azure Drive is a page blob formatted as a single-volume NTFS Virtual Hard Disk (VHD)
  - Drives can be up to 1TB
- A VM can dynamically mount up to 8 drives
- A page blob can be mounted read/write by only one VM at a time
- Remote access via the page blob
  - You can upload a VHD to its page blob using the blob interface, and then mount it as a drive
  - You can download the drive through the page blob interface
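The mounting rules above amount to a little bookkeeping: at most 8 drives per VM, at most 1TB per drive, and at most one read/write mounter per page blob. A sketch that enforces them; `DriveMountRegistry` and the blob names are hypothetical, and 1TB is assumed to mean 2^40 bytes.

```python
MAX_DRIVES_PER_VM = 8
MAX_DRIVE_BYTES = 1 << 40  # "up to 1TB" (assumed 2**40 bytes)

class DriveMountRegistry:
    """Tracks which VM holds each page blob mounted read/write (hypothetical)."""

    def __init__(self):
        self.writer_of = {}   # page blob name -> VM id
        self.mounted_by = {}  # VM id -> set of page blob names

    def mount(self, vm: str, blob: str, size_bytes: int) -> None:
        if size_bytes > MAX_DRIVE_BYTES:
            raise ValueError("drive larger than 1TB")
        drives = self.mounted_by.setdefault(vm, set())
        if blob in drives:
            return  # already mounted by this VM
        if len(drives) >= MAX_DRIVES_PER_VM:
            raise RuntimeError(f"{vm} already has {MAX_DRIVES_PER_VM} drives mounted")
        owner = self.writer_of.get(blob)
        if owner is not None:
            raise RuntimeError(f"{blob} is mounted read/write by {owner}")
        self.writer_of[blob] = vm
        drives.add(blob)

    def unmount(self, vm: str, blob: str) -> None:
        if self.writer_of.get(blob) == vm:
            del self.writer_of[blob]
            self.mounted_by[vm].discard(blob)
```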

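19 A closer look
[Diagram: HTTP traffic passes through a load balancer to Web Role instances (IIS hosting ASP.NET, WCF, etc.); Worker Role instances run main() { }; each role instance runs in a VM with an agent, on the Fabric]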

20 Using queues for reliable messaging
1) The Web Role (ASP.NET, WCF, etc.) receives work
2) It puts the work in a queue
3) The Worker Role (main() { }) gets the work from the queue
4) The Worker Role does the work
To scale, add more of either role.
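The four steps map onto a producer/consumer program. In this sketch Python's thread-safe `queue.Queue` stands in for the Windows Azure queue and threads stand in for role instances; the function names are hypothetical. Unlike an Azure queue, `queue.Queue` has no visibility timeout, so a message taken by a worker that crashes is lost here.

```python
import queue
import threading

work_queue = queue.Queue()  # stands in for a Windows Azure queue
STOP = None                 # sentinel used only to end this sketch

def web_role_handle_request(payload: str) -> None:
    # 1) Receive work (e.g. from an HTTP request) and 2) put it in the queue.
    work_queue.put(payload)

def worker_role_main(results: list, lock: threading.Lock) -> None:
    # The worker role's main() loop: 3) get work from the queue, 4) do the work.
    while True:
        item = work_queue.get()
        if item is STOP:
            return
        with lock:
            results.append(item.upper())  # placeholder for real work

def run(requests: list, n_workers: int = 3) -> list:
    """Scale out by raising n_workers (more worker role instances)."""
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker_role_main, args=(results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for r in requests:
        web_role_handle_request(r)
    for _ in workers:
        work_queue.put(STOP)  # one stop signal per worker, queued after all work
    for w in workers:
        w.join()
    return sorted(results)

print(run(["job-a", "job-b", "job-c"]))  # ['JOB-A', 'JOB-B', 'JOB-C']
```

Neither side needs to know how many instances the other has; that independence is what lets each role be scaled separately.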

21 Queues are the application glue
- Decouple parts of the application, so they are easier to scale independently
- Resource allocation: different priority queues and backend servers
- Mask faults in worker roles (reliable messaging)
Use inter-role communication for performance
- TCP communication between role instances
- Define your ports in the service model
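One way to use different priority queues is to have each worker drain the high-priority queue before touching the low-priority one. A minimal sketch with `collections.deque` standing in for the queues; `next_task` is hypothetical.

```python
from collections import deque

def next_task(queues_by_priority: list):
    """Take work from the highest-priority non-empty queue (index 0 = highest)."""
    for q in queues_by_priority:
        if q:
            return q.popleft()
    return None  # nothing to do; a real worker would sleep and poll again

high = deque(["urgent-1"])
low = deque(["batch-1", "batch-2"])
order = []
while (task := next_task([high, low])) is not None:
    order.append(task)
    if task == "batch-1":
        high.append("urgent-2")  # new urgent work jumps ahead of the remaining batch work
print(order)  # ['urgent-1', 'batch-1', 'urgent-2', 'batch-2']
```

Strict priority can starve the low-priority queue under sustained urgent load; the other option named on the slide is to give each queue its own pool of backend servers.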


23 [Diagram: local development workflow, at work or at home: develop your app; run it on the Development Fabric and Development Storage; version it in source control; the application works locally]

24 What's the value add?
Provide a platform that is scalable and available:
- Services are always running; rolling upgrades/downgrades
- Failure of any node is expected; state has to be replicated
- Failure of a role (app code) is expected; automatic recovery
- Services can grow to be large; provide state management that scales automatically
- Handle dynamic configuration changes due to load or failure
- Manage data center hardware: from CPU cores, nodes, and racks to network infrastructure and load balancers

25 Key takeaways
- Cloud services have specific design considerations
  - Always on, distributed state, large scale, fault tolerance
- Scalable infrastructure demands a scalable architecture
  - Stateless roles and durable queues
- Windows Azure frees service developers from many platform issues
  - Windows Azure manages both services and servers


27 [Diagram: architecture of a BLAST service on Windows Azure]
- Web Role: Web Portal and Web Service, handling job registration
- Job Management Role: Job Scheduler and Scaling Engine
- Global dispatch queue feeding Worker instances
- Database updating Role: fetches the NCBI databases
- Job Registry: Azure Table
- Azure Blob: BLAST databases, temporary data, etc.
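The flow in this diagram (register the job, split it into tasks on the global dispatch queue, let workers report completion) can be sketched with in-memory stand-ins: a dict for the Job Registry table and a deque for the dispatch queue. `register_job`, `complete_task`, and the `chunk_size` parameter are hypothetical; chunk size is the work-factoring choice that a later slide says has large performance impacts.

```python
from collections import deque
import uuid

job_registry = {}          # stands in for the Job Registry (Azure Table)
dispatch_queue = deque()   # stands in for the global dispatch queue

def register_job(sequences: list, chunk_size: int) -> str:
    """Register a BLAST job and enqueue one task per chunk of query sequences."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    job_id = str(uuid.uuid4())
    chunks = [sequences[i:i + chunk_size] for i in range(0, len(sequences), chunk_size)]
    job_registry[job_id] = {"status": "queued", "tasks": len(chunks), "done": 0}
    for n, chunk in enumerate(chunks):
        # Each task names its job so the worker can update the registry when finished.
        dispatch_queue.append({"job_id": job_id, "task": n, "queries": chunk})
    return job_id

def complete_task(task: dict) -> None:
    """Called by a worker after it has run BLAST on one task's queries."""
    entry = job_registry[task["job_id"]]
    entry["done"] += 1
    entry["status"] = "finished" if entry["done"] == entry["tasks"] else "running"
```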


30 Always design with failure in mind
  - On large jobs it will happen, and it can happen anywhere
Factoring work into optimal sizes has large performance impacts
  - The optimal size may change depending on the scope of the job
Test runs are your friend
  - Blowing $20,000 of computation is not a good idea
Make ample use of logging features
  - When failure does happen, it's good to know where
Cutting 10 years of computation down to 1 week is great!
  - Little cloud development headaches are probably worth it
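Designing for failure and logging go together: a worker can retry a failed task a bounded number of times and log each failure with enough context to find it later. A minimal sketch using Python's standard `logging` module; `run_with_retries` and its linear backoff policy are hypothetical choices, not taken from the slides.

```python
import logging
import time

log = logging.getLogger("worker")

def run_with_retries(task_id: str, work, max_attempts: int = 3, backoff_s: float = 1.0):
    """Run work(); log every failure with the task id and retry with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = work()
            log.info("task %s succeeded on attempt %d", task_id, attempt)
            return result
        except Exception:
            # log.exception records the traceback, so the failure point is kept.
            log.exception("task %s failed on attempt %d/%d", task_id, attempt, max_attempts)
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)
```

Call `logging.basicConfig(level=logging.INFO)`, or attach a handler that writes to durable storage, so the messages outlive the worker instance that produced them.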

31 Thank you!