S1 2009. 3. 11 Han Jae-sun (한재선) (jshan0000@gmail.com), Adjunct Professor, KAIST Graduate School of Information & Media Management & CEO, NexR http://www.web2hub.com
S2 Big Switch: Power Burden Iron Works Edison Power Plant & Power Grid
S3 Big Switch: Computing Corporate Data Center PC Cloud Computing Center & Internet
S4 Definition of Cloud Computing
  "A pool of abstracted, highly scalable, and managed compute infrastructure capable of hosting end-customer applications and billed by consumption"
    - Is Cloud Computing Ready for The Enterprise?, Forrester Research
  "Cloud computing is an emerging approach to shared infrastructure in which large pools of systems are linked together to provide IT services"
    - Press release on Blue Cloud, IBM
  "A style of computing where IT-related capabilities are provided as a service, allowing users to access technology-enabled services from the Internet ("in the cloud") without knowledge of, expertise with, or control over the technology infrastructure that supports them"
    - Cloud Computing, Wikipedia
  "A paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, table computers, notebooks, wall computers, handhelds, etc"
    - ORGs for Scalable, Robust, Privacy-Friendly Client Cloud Computing, IEEE Internet Computing
S5 My Definition
  For companies
    Providing IT infrastructure and an environment to develop, host, and run services and apps, on demand, with pay-as-you-go pricing, as a service
  For end-users
    Providing resources and services to store data and run applications, on various devices, anytime, anywhere, as a service
S6 Features
  Prescripted & Abstracted Infrastructure
  Fully Virtualized
  Equipped with Dynamic Infrastructure Software
  Pay by Consumption
  Free of Long-Term Contracts
  Application and OS Independent
  Free of Software or Hardware Installation
  Source: Is Cloud Computing Ready For The Enterprise, Forrester Research
S7 Advantages
  Economies of scale
  Cost
    No upfront CapEx (capital expenditure)
    Pay-as-you-go pricing model
  Scalability
    Scale capacity on demand
    Handling dynamic workloads
  Productivity
    Easy to use
    Reduced time-to-market
  Maintenance
    Easy or no management
    Instant software updates
S8 The Evolution of Computing Grid Computing Utility Computing Cloud Computing
S9 Grid + Utility + Autonomic = Cloud Computing
  Grid Computing
    A form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks
  Utility Computing
    The packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility such as electricity
  Autonomic Computing
    Computer systems capable of self-management
  Source: Cloud Computing, Wikipedia
S10 Difference from Previous Computing
  [Diagram labels: Enterprise, Cloud Computing, As a Service, Grid Computing, Utility Computing, End-User]
S11 Trends: x-computing
  Source: Google Trends
S12 Market Volume Estimation
  Year 2011: $160 billion
    = $95 billion (business and productivity apps: e-mail, office, CRM, etc.)
    + $65 billion (online advertising)
  Source: The Cloud Wars: $100+ billion at stake, Merrill Lynch research note
S13 Cloud Computing Market http://peterlaird.blogspot.com/2008/09/visual-map-of-cloud-computingsaaspaas.html
S14 Classifying Cloud Computing
  Cloud Services/Applications (Software as a Service)
    Apple MobileMe, Google Apps, Nokia Ovi, Salesforce.com Apps, etc.
  Cloud Platform (Platform as a Service)
    Google App Engine, force.com, Facebook F8, Bungee Labs, etc.
  Cloud Infrastructure (Infrastructure as a Service)
    Amazon S3 & EC2, Joyent, GoGrid, AT&T, etc.
  Enterprise Cloud Computing
  Cloud Computing Software
    Hadoop, 3Tera, Xen, VMware, NexR VCC, IBM Blue Cloud, etc.
S15 Cloud Infrastructure
  Definition
    Offering virtualized infrastructure resources such as storage, compute, and network over the Internet for services and apps
S16 Players: Cloud Infrastructure
S17 Case Study: Amazon Web Services
  The First & Best Cloud Computing
  Data as a Service
    E-Commerce Service, Historical Pricing
  People as a Service
    Mechanical Turk
  Search as a Service
    Alexa Web Info. Service, Alexa Top Sites, Alexa Site Thumbnail, Alexa Web Search Platform
  Infrastructure as a Service (Cloud Infrastructure)
    Simple Queue Service, Simple Storage Service, Elastic Compute Cloud, SimpleDB
S18 Amazon Cloud Infrastructure
  S3: $0.15 per GB-Month
  EC2: $0.10 per Instance-Hour
  SimpleDB: $1.50 per GB-Month
S19 Why Amazon AWS?
  Online video mixing utility
    25,000 to 250,000 users in 3 days
    At peak, 20,000 new users per hour
    50 to 4,000 instances (servers) in 5 days
    At peak, 40 new instances (servers) per hour
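As a rough check on the growth figures above, the average hourly rates can be compared with the quoted peaks. This is only a sketch: the function name is made up for illustration, and it assumes growth was spread evenly over 24-hour days, which the slide does not claim.

```python
def average_rate_per_hour(start, end, days):
    """Average increase per hour, assuming even growth over 24-hour days."""
    return (end - start) / (days * 24)


# Figures taken from the slide.
users_per_hour = average_rate_per_hour(25_000, 250_000, days=3)
instances_per_hour = average_rate_per_hour(50, 4000, days=5)

print(f"average new users per hour: {users_per_hour:.0f} (slide peak: 20,000)")
print(f"average new instances per hour: {instances_per_hour:.1f} (slide peak: 40)")
```

The gap between average and peak is the point of the slide: capacity has to follow the peak, not the average.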
S20 Success Story: NY Times
  Image Processing at New York Times
    Convert 11 million articles (1851-1980) from TIFF format into PDF
    Using Amazon S3 and EC2 for hardware, Hadoop for software
  Source: http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
S21 NY Times Architecture
  Amazon S3: TIFF images (4 TB), PDF (1.5 TB)
  Amazon EC2 (100 instances): AMI, Hadoop MapReduce
S22 NY Times Cost
  S3
    Storage: 5.5 TB
    Data transfer-in: 4 TB
  EC2
    Instances: 100 x 24 hours
  Estimate from http://calculator.s3.amazonaws.com/calc5.html: only $1,465
  Actually under $400
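The calculator estimate can be approximated by hand from the per-unit prices on the Amazon Cloud Infrastructure slide. The sketch below rests on three assumptions that are not in the slides: storage is billed for one full month, 1 TB is counted as 1,000 GB, and inbound transfer costs $0.10 per GB. Under those assumptions the total comes out at $1,465, the same as the calculator figure. It does not explain why the actual bill was under $400.

```python
from decimal import Decimal

# Prices from the Amazon Cloud Infrastructure slide.
S3_STORAGE_PER_GB_MONTH = Decimal("0.15")
EC2_PER_INSTANCE_HOUR = Decimal("0.10")
# Assumption: the transfer-in price is not on the slides; $0.10 per GB is
# used here for illustration only.
ASSUMED_TRANSFER_IN_PER_GB = Decimal("0.10")
GB_PER_TB = 1000  # assumption: decimal terabytes


def estimate_cost(storage_tb, transfer_in_tb, instances, hours):
    """Return (storage, transfer, compute, total) in US dollars.

    Storage is charged for one full month (an assumption).
    """
    storage = Decimal(str(storage_tb)) * GB_PER_TB * S3_STORAGE_PER_GB_MONTH
    transfer = Decimal(str(transfer_in_tb)) * GB_PER_TB * ASSUMED_TRANSFER_IN_PER_GB
    compute = instances * hours * EC2_PER_INSTANCE_HOUR
    return storage, transfer, compute, storage + transfer + compute


storage, transfer, compute, total = estimate_cost(5.5, 4, 100, 24)
print(f"storage ${storage:.2f} + transfer ${transfer:.2f} "
      f"+ compute ${compute:.2f} = ${total:.2f}")
```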
S23 Success Story: Entrip
S24 Amazon S3
  Source: Building Blocks for True Internet Applications, Jeff Barr
S25 Amazon S3: Concepts
  Object
    Opaque data to be stored (1 byte to 5 gigabytes)
    Authentication and access controls
  Bucket
    Object container, any number of objects
    100 buckets per account
  Key
    Unique object identifier within bucket
    Up to 1024 bytes long
    Flat object storage model
  Standards-Based Interfaces
    REST and SOAP
    URL-addressability: every object has a URL
    Example: http://johnsmith.s3.amazonaws.com/photos/puppy.jpg
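The relationship between bucket, key, and URL can be sketched in a few lines. The helper name `object_url` is hypothetical. The 1024-byte key limit and the URL pattern come from the slide, and the sketch assumes the limit applies to the UTF-8 encoded key.

```python
from urllib.parse import quote

MAX_KEY_BYTES = 1024  # key length limit from the slide


def object_url(bucket, key):
    """Build the URL of an object from its bucket name and key."""
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("key exceeds 1024 bytes")
    # The storage model is flat: "/" in a key is just a character,
    # so it is left unescaped and only looks like a directory path.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(object_url("johnsmith", "photos/puppy.jpg"))
```

With the slide's example values this prints the same URL shown on the slide.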
S26 Amazon S3: REST API
  Just HTTP requests
  Common header
    Authorization: AWS [aws-access-key-id]:[header-signature]
  Create bucket request (like mkdir)
    PUT /[bucket-name] HTTP/1.0
  Put object request (like write)
    PUT /[bucket-name]/[key-name] HTTP/1.0
  List bucket request (like ls)
    GET /[bucket-name] HTTP/1.0
  Get object request (like read)
    GET /[bucket-name]/[key-name] HTTP/1.0
  Delete object request (like rm)
    DELETE /[bucket-name]/[key-name] HTTP/1.0
  Delete bucket request (like rmdir)
    DELETE /[bucket-name] HTTP/1.0
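The `[header-signature]` part of the common header is not spelled out on the slide. The sketch below shows how it was typically computed under the HMAC-SHA1 request-signing scheme S3 used at the time, for the simple case of a request without any `x-amz-` headers. The function name, the key id, and the secret are placeholders made up for illustration.

```python
import base64
import hashlib
import hmac


def authorization_header(access_key_id, secret_key, verb, resource,
                         date, content_md5="", content_type=""):
    """Build an 'AWS id:signature' Authorization value (no x-amz- headers)."""
    # The string to sign is the request's key fields joined by newlines.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key_id}:{signature}"


# Placeholder credentials; a "get object" request for the slide's example object.
header = authorization_header(
    "EXAMPLE-KEY-ID", "example-secret-key",
    "GET", "/johnsmith/photos/puppy.jpg",
    "Tue, 27 Mar 2007 19:36:42 +0000")
print("Authorization:", header)
```

Because the secret key never travels with the request, the server can recompute the same signature and reject requests that were altered or signed with the wrong key.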
S27 Amazon S3: Use Cases
  Source: Building Blocks for True Internet Applications, Jeff Barr
S28 Amazon EC2
  $0.10/Hr, $0.40/Hr, $0.80/Hr
  Source: Building Blocks for True Internet Applications, Jeff Barr
S29 Amazon EC2: Concepts
  Amazon Machine Image (AMI)
    Bootable root disk stored in S3
    Pre-defined or user-built
    Catalog of user-built AMIs
    OS: Fedora, CentOS, Gentoo, Debian, Ubuntu, Windows Server
    App stack: LAMP, mpiBLAST, Hadoop
  Instance
    Running copy of an AMI
    Launch in less than 2 minutes
    Start/stop programmatically
    Xen-based virtualization
  Simple APIs
    SOAP & HTTP Query
    Launch and control instances
S30 Amazon EC2: Use Cases
  Source: Building Blocks for True Internet Applications, Jeff Barr
S31 Amazon AWS Extensions
  Free S3 Firefox Plugin
  Free EC2 Firefox Plugin
  RightScale: Commercial EC2 Management Service
S32 Cloud Platform (PaaS)
  Definition
    Platform offering all of the facilities required to support the end-to-end life cycle of building and delivering web applications and services, entirely available from the Internet, with no software downloads or installation for developers (from Platform as a service, Wikipedia)
    Also known as Cloudware
  Supporting functions
    Workflow management: design, development, testing, deployment, hosting, maintenance
    Development tools: Web GUI, client SDK, team collaboration, version control, developer community facilitation
    Cloud infrastructure: storage, computation, persistence, state management, scalability, web service integration, database integration, security
S33 Players: Cloud Platform
S34 Case Study: Google App Engine
  Run your web applications on Google's infrastructure
    http://code.google.com/appengine/
  Offering infrastructure for free
    500 MB storage, 10 GB bandwidth in and out per day, 5 million page views per month
  Offering a Python development environment
S35 Google App Engine
  1. Scalable Service Infrastructure
  2. Python Runtime and APIs
  3. Software Development Kit
  4. Web-based Admin Console
  5. Scalable Datastore
S36 Scalable Service Infra
  Layers, top to bottom:
    Services
    Python Runtime: Service Execution
    Bigtable: Distributed Data Store
    GFS: Distributed File System
    Commodity PC Cluster
S37 Software Development Kit
  App Engine SDK
    Web Server (development, testing)
    Uploader (deploy)
    Python Framework: webapp, Django
    API local version: datastore, Google Account, URL Fetch, Mail
S38 App Engine APIs
S39 Scalable Datastore
  Datastore
  Model Class
  GQL Query
S40 Web-based Admin Console
S41 Billing & Expected Pricing
S42 Code: Using webapp framework
helloworld.py
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            # Reply to GET / with a plain-text greeting.
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Hello, webapp World!')

    # Map URL paths to request handlers.
    application = webapp.WSGIApplication(
        [('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
app.yaml
    application: helloworld
    version: 1
    runtime: python
    api_version: 1

    handlers:
    - url: /.*
      script: helloworld.py
Testing the app
    google_appengine/dev_appserver.py helloworld/
    http://localhost:8080/
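For readers without the App Engine SDK, the same request/response flow can be tried with a plain WSGI callable. This is a sketch of what a handler like MainPage amounts to underneath, not the webapp framework itself. The helper `call` is made up for illustration and passes only the two environ keys this example reads.

```python
def application(environ, start_response):
    """Plain WSGI callable that answers GET / with a text greeting."""
    is_root = environ.get("PATH_INFO", "/") == "/"
    if is_root and environ.get("REQUEST_METHOD") == "GET":
        status, body = "200 OK", b"Hello, webapp World!"
    else:
        status, body = "404 Not Found", b"Not found"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(app, path, method="GET"):
    """Invoke a WSGI app in-process and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call(application, "/"))
```

To serve it over HTTP on the port used on the slide, the standard library's `wsgiref.simple_server.make_server("", 8080, application).serve_forever()` should be enough for local experiments.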
S43 Code: Using User Service
helloworld.py
    from google.appengine.api import users
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            user = users.get_current_user()
            if user:
                # Signed in: greet the user by nickname.
                self.response.headers['Content-Type'] = 'text/plain'
                self.response.out.write('Hello, ' + user.nickname())
            else:
                # Not signed in: send the user to the login page first.
                self.redirect(users.create_login_url(self.request.uri))

    application = webapp.WSGIApplication(
        [('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
S44 Code: Using Datastore
helloworld.py
    from google.appengine.api import users
    from google.appengine.ext import db
    from google.appengine.ext import webapp

    # Defining the data model
    class Greeting(db.Model):
        author = db.UserProperty()
        content = db.StringProperty(multiline=True)
        date = db.DateTimeProperty(auto_now_add=True)

    # Storing data
    class Guestbook(webapp.RequestHandler):
        def post(self):
            greeting = Greeting()
            if users.get_current_user():
                greeting.author = users.get_current_user()
            greeting.content = self.request.get('content')
            greeting.put()
            self.redirect('/')

    # Querying data
    class MainPage(webapp.RequestHandler):
        def get(self):
            greetings = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
            for greeting in greetings:
                if greeting.author:
S45 Code: Uploading the App
  Registering the app
    1. Sign in to App Engine: http://appengine.google.com/
    2. Create an app: http://application-id.appspot.com/
  Uploading the app
    appcfg.py update helloworld/
    http://application-id.appspot.com
S46 Cloud Service Definition Consumer and Business products, services and solutions that are delivered and consumed in real-time over the Internet (IDC exchange)
S47 Players: Cloud Services
S48 Case Study: Apple MobileMe
  Apple's cloud services
    Announced in June 2008
    Sync over the web between PC, Mac, and iPhone
    Push service to iPhone within several seconds
    Mac-like web UI (with Ajax & DHTML)
S49 MobileMe Components
  Three layers of components
    Apple's hosted web apps
      Mail, Contacts, Calendar, Gallery, and iDisk
      Developed using the SproutCore JavaScript framework
      Appearance of the Mac OS X desktop
    Server cloud of online services
      Secret
      Linux, Mac OS X, Solaris, Apache, NetCache
      AppleIDiskServer: WebDAV file shares
    Client-side push and sync apps (on iPhone, desktop)
      Wide-Area Bonjour mechanism
S50 Cloud Software
  Definition
    Software that helps build and run cloud computing services and environments
  There is a great deal of such software
    Especially open source cloud software: LAMP, Hadoop
S51 Players: Cloud Software
S52 Case Study: Hadoop
  Hadoop                                    Google Platform
  Nutch: Open Source Search Engine          Google Search
  MapReduce: Distributed Data Processing    MapReduce
  HBase: Distributed Data Store             Bigtable
  HDFS: Distributed File System             GFS
  Commodity PC Cluster                      Commodity PC Cluster
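The MapReduce layer in this stack follows a simple pattern: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below imitates that flow in a single Python process with a word count. It is not Hadoop's API, all function names are illustrative, and a real job would spread the phases across the cluster and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle, and reduce over a list of input records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["cloud computing", "grid computing", "utility computing"])
print(counts)
```

Because each map call sees one record and each reduce call sees one key, both phases can run in parallel on many machines, which is what makes the model fit a commodity PC cluster.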
S53 Hadoop Ecosystem - Power of Open Platform
  HDFS, MapReduce, HBase, HOD, Streaming, Fuse-DFS, EC2 Support
  NexR VCC: Hadoop on virtualization?
  Cascading: Workflow management for Hadoop MapReduce
  Yahoo Pig: Query language interface on Hadoop
  Yahoo ZooKeeper: Distributed management
  IBM MapReduce Tools: Eclipse plug-in for MapReduce programs
  Facebook Hive: Data warehousing on Hadoop
  Parhely: ORM for HBase
  Katta: Distributed indexing with Hadoop
  Mahout & Hama: Machine learning using Hadoop MapReduce
S54 Caution Be practical!
S55 Key Issues To Overcome - From Forrester Report
  Concerns about stability
    Few big-name players offering clouds
    Few enterprise reference accounts
  Concerns around security
  Lack of commercial ISV support
  Little geographic locality
  Not for the faint-of-tech
  Not very enterprise friendly
S56 Key Issues To Overcome - Others
  Integration with in-house systems
  Application licensing complexity
  Privacy
  Constant network connectivity
  Confidence in service providers
  Open standards
  Interoperability between services
S57 Cloud Computing Incidents Database
  Source: CloudComputing:Incidents Database, Wikipedia
S58 Outage of Cloud Computing
  Amazon S3 outage
    8 hours on July 20, 2008 (affected: all)
    Cause: design fault (server-to-server communication)
  Flexiscale outage
    2 days from August 26, 2008 (affected: all)
    Cause: engineer mistake
  Gmail outage
    2 hours on August 11, 2008 (affected: many)
    Cause: change management
  Apple MobileMe outage
    Several hours on July 10, 2008 (affected: many)
    Cause: migration from .Mac to MobileMe
  Source: CloudComputing:Incidents Database, Wikipedia
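To put the outage durations in perspective, each one can be converted into a yearly availability percentage. This is a back-of-the-envelope sketch that assumes each incident was the only downtime in a 365-day year, which the slide does not state. The MobileMe outage is left out because "several hours" is not a number.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def availability_percent(outage_hours):
    """Yearly availability if this outage were the only downtime in the year."""
    return 100 * (1 - outage_hours / HOURS_PER_YEAR)


# Durations from the slide, in hours.
outages = {
    "Amazon S3 (8 hours)": 8,
    "Flexiscale (2 days)": 48,
    "Gmail (2 hours)": 2,
}
for name, hours in outages.items():
    print(f"{name}: {availability_percent(hours):.3f}%")
```

On this reading, even a single eight-hour incident takes a service below 99.95% for the year, which helps explain why stability heads the list of concerns.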
S59 Closure of Cloud Computing
  MediaMax/Linkup
    Cloud storage service
    Lost half of its users' files in July 2007
    20,000 paying users affected
    Service finally closed in July 2008
  Zimki
    Early cloud platform service (from 2006)
    Service closed in December 2007
    Cause: investment ceased
  Sources: CloudComputing:Incidents Database, Wikipedia; http://www.theregister.co.uk/2007/07/30/canon_stalls_fotango/, The Register
S60 Where is Cloud Computing? More political & psychological than technical
S61 Thank You!!!