Distributed Data Management. Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig


12.0 The Cloud
12.1 Map & Reduce
12.2 Cloud beyond Storage
12.3 Computing as a Service
  - SaaS
  - PaaS
  - IaaS

12.1 Map & Reduce
- Just storing massive amounts of data is often not enough!
  - Often, we also need to process and transform that data
- Large-scale data processing
  - Use thousands of worker nodes within a computation cluster to process large data batches
  - But we don't want the hassle of managing things
- Map & Reduce provides
  - Automatic parallelization & distribution
  - Fault tolerance
  - I/O scheduling
  - Monitoring & status updates

12.1 Map & Reduce
- Initially implemented by Google for building the Google search index
  - i.e. crawling the web, building the inverted word index, computing PageRank, etc.
- General framework for parallel high-volume data processing
  - J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, Symp. Operating System Design and Implementation, San Francisco, USA, 2004
- Also available as an open source implementation as part of Apache Hadoop

12.1 Map & Reduce
- Base idea
  - There is a large amount of input data, identified by a key
    - i.e. input given as key-value pairs
    - e.g. all web pages of the internet, identified by their URL
  - A map operation is a simple function which accepts one input key-value pair
    - A map operation runs as an autonomous thread on one single node of a cluster
      - Many map jobs can run in parallel on different input keys
    - Returns for a single input key-value pair a set of intermediate key-value pairs
      - map(key, value) → set of intermediate (key, value) pairs
    - After a map job is finished, the node is free to perform another map job for the next input key-value pair
      - A central controller distributes map jobs to free nodes

12.1 Map & Reduce
- After input data is mapped, reduce jobs can start
- reduce(key, values) is run for each unique key emitted by map()
  - Each reduce job is also run autonomously on one single node
    - Many reduce jobs can run in parallel on different intermediate key groups
  - Reduce emits the final output of the map-reduce operation
  - Each reduce job takes all map tuples with a given key as input
  - Usually generates one, but possibly more, output tuples

12.1 Map & Reduce
- Each reduce is executed on a set of intermediate map results which have the same key
  - To efficiently select that set, the intermediate key-value pairs are usually shuffled
    - i.e. just sorted and grouped by their respective key
  - After shuffling, reduce input data can be selected by a simple range scan
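
The shuffle step described above can be sketched in a few lines of Python. This is only an in-memory illustration under the assumption that all intermediate pairs fit on one machine; the function name `shuffle` and the sample pairs are made up for this example.

```python
from itertools import groupby
from operator import itemgetter

def shuffle(pairs):
    """Sort intermediate (key, value) pairs by key and group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))  # sort by key only
    # After sorting, all pairs with the same key are adjacent (a "range"),
    # so one sequential pass is enough to build each reduce input.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

intermediate = [("db", 1), ("and", 1), ("db", 1), ("p2p", 1)]
grouped = shuffle(intermediate)
print(grouped)
```

Each element of `grouped` is exactly the input of one reduce(key, values) call.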

12.1 Map & Reduce
- Example: Counting words in documents

    map(key, value):
      // key: doc name
      // value: text of doc
      for each word w in value:
        emit(w, 1);

    reduce(key, values):
      // key: a word
      // values: list of counts
      result = 0;
      for each v in values:
        result += v;
      emit(key, result);

12.1 Map and Reduce
- Example: Counting words in documents
  - doc1: distributed db and p2p
  - doc2: map and reduce is a distributed processing technique for db
  - map(key, value) emits: (distributed, 1), (db, 1), (and, 1), (p2p, 1), (map, 1), (and, 1), (reduce, 1), (is, 1), (a, 1), (distributed, 1), ...
  - reduce(key, values) emits: (distributed, 2), (db, 2), (and, 2), (p2p, 1), (map, 1), (reduce, 1), (is, 1), ...
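
The word-count example can be run end to end with a small single-process Python sketch. The driver `map_reduce` is a hypothetical stand-in for the real framework: it does no distribution or fault handling and only mimics the map, group-by-key, and reduce phases.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: text of the document
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word; values: all counts emitted for that word
    yield (key, sum(values))

def map_reduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            output[out_key] = out_value
    return output

docs = {
    "doc1": "distributed db and p2p",
    "doc2": "map and reduce is a distributed processing technique for db",
}
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts)
```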

12.1 Map and Reduce
- Improvement: Combiners
  - Combiners are mini-reducers that run in-memory after the map phase
  - Used to group rare map keys into larger groups
    - e.g. word counts: group multiple extremely rare words under one key (and mark that they are grouped)
  - Used to reduce network and worker scheduling overhead
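
As an illustration of the overhead reduction, the sketch below shows the common pre-aggregation form of a combiner for word counting: partial sums are built in memory on the map node, so fewer pairs have to cross the network. It does not implement the rare-key grouping variant mentioned above, and the function name `map_with_combiner` is an assumption for this example.

```python
from collections import Counter

def map_with_combiner(doc_text):
    """Word-count map over one document with local, in-memory pre-aggregation."""
    local = Counter()
    for word in doc_text.split():
        local[word] += 1          # combine locally instead of emitting (word, 1)
    return sorted(local.items())  # one pair per distinct word leaves the node

text = "and and and db and"
without_combiner = [(word, 1) for word in text.split()]
with_combiner = map_with_combiner(text)
print(len(without_combiner), "pairs vs.", len(with_combiner), "pairs")
```

The reduce function stays unchanged, because summing partial sums gives the same totals as summing single counts.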

12.1 Map & Reduce
- Responsibility of the map and reduce master
  - Often also called scheduler
  - Assign map and reduce tasks to workers on nodes
    - Usually, map tasks are assigned to worker nodes as a batch and not one by one
      - Often called a split, i.e. a subset of the whole input data
      - Split often implemented by a simple hash function with as many buckets as worker nodes
    - The full split data is assigned to a worker node, which starts a map task for each input key-value pair
  - Check for node failure
  - Check for task completion
  - Route map results to reduce tasks
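
A hash-based split with one bucket per worker node might look like the following sketch. The helper names `split_of` and `make_splits` and the example URLs are assumptions; CRC32 is used only because Python's built-in hash() of strings changes between runs.

```python
import zlib

def split_of(key, num_workers):
    """Map an input key to one of num_workers buckets (splits)."""
    return zlib.crc32(key.encode("utf-8")) % num_workers

def make_splits(keys, num_workers):
    """Partition all input keys into num_workers splits."""
    splits = [[] for _ in range(num_workers)]
    for key in keys:
        splits[split_of(key, num_workers)].append(key)
    return splits

urls = ["http://a.example/", "http://b.example/",
        "http://c.example/", "http://d.example/"]
splits = make_splits(urls, 3)
for worker, split in enumerate(splits):
    print("worker", worker, "gets", split)
```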

12.1 Map & Reduce
- Map and Reduce overview [figure]

12.1 Map and Reduce
- Master is responsible for worker node fault tolerance
  - Handled via re-execution
    - Detect failure via periodic heartbeats
    - Re-execute completed + in-progress map tasks
    - Re-execute in-progress reduce tasks
    - Task completion committed through master
  - Robust: once lost 1600 of 1800 machines, but finished OK
- Master failures are not handled
  - Unlikely due to redundant hardware
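
The re-execution rules above can be written down as a small Python sketch. The task records, field names and timeout value are hypothetical. In the original MapReduce paper, completed map tasks are redone because their output lives on the failed machine's local disk, while completed reduce output is already in the global file system.

```python
def failed_workers(last_heartbeat, now, timeout):
    """Workers whose last heartbeat is older than `timeout` count as failed."""
    return {worker for worker, seen in last_heartbeat.items() if now - seen > timeout}

def tasks_to_reexecute(tasks, dead):
    """Apply the re-execution rules to tasks that ran on failed workers."""
    redo = []
    for task in tasks:
        if task["worker"] not in dead:
            continue
        if task["kind"] == "map" and task["state"] in ("in_progress", "completed"):
            redo.append(task["id"])   # map: redo completed and in-progress tasks
        elif task["kind"] == "reduce" and task["state"] == "in_progress":
            redo.append(task["id"])   # reduce: redo only in-progress tasks
    return redo

heartbeats = {"w1": 100, "w2": 40}  # worker -> time of last heartbeat (seconds)
tasks = [
    {"id": "m1", "kind": "map", "state": "completed", "worker": "w2"},
    {"id": "m2", "kind": "map", "state": "in_progress", "worker": "w1"},
    {"id": "r1", "kind": "reduce", "state": "completed", "worker": "w2"},
    {"id": "r2", "kind": "reduce", "state": "in_progress", "worker": "w2"},
]
dead = failed_workers(heartbeats, now=105, timeout=30)
redo = tasks_to_reexecute(tasks, dead)
print(dead, redo)
```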

12.1 Map and Reduce
- Showcase: machine usage during web indexing
  - Fine-granularity tasks: map tasks >> machines
    - Minimizes time for fault recovery
    - Can pipeline shuffling with map execution
    - Better dynamic load balancing
  - Showcase uses 200,000 map & 5,000 reduce tasks
  - Running on 2,000 machines

12.1 MR - Performance
- [Figures]

12.1 MR - PageRank
- PageRank is one of the major algorithms behind Google Search
  - See our wonderful IR lecture (No. 12)!
- Key question: How important is a given website?
  - Importance is independent of the query
- Idea: other pages vote for a site by linking to it
  - Also called giving credit to
  - Pages with many votes are probably important
  - If an important site votes for another site, that vote has a higher weight than when an unimportant site votes
- [Figure: pages t_1, t_2, t_3 and x]

12.1 MR - PageRank
- Given page x with in-bound links t_1, ..., t_n, where
  - C(t) is the out-degree of t
  - α is the probability of a random jump
  - N is the total number of nodes in the graph

  $$PR(x) = \alpha \frac{1}{N} + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$

12.1 MR - PageRank
- Properties of PageRank
  - Can be computed iteratively
  - Effects at each iteration are local
- Sketch of algorithm:
  - Start with seed PR_i values
  - Each page distributes PR_i credit to all pages it links to
  - Each target page adds up credit from multiple inbound links to compute PR_{i+1}
  - Iterate until values converge

12.1 MR - PageRank
- Map step: distribute PageRank credits to link targets
- Reduce step: gather up PageRank credit from multiple sources to compute the new PageRank value
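
One PageRank iteration expressed as a map step and a reduce step could look like the sketch below. It runs in a single process, assumes every page has at least one out-link, and uses α = 0.15 as an example value. The function names and the tagged "links"/"credit" records are illustration choices, not part of the lecture.

```python
from collections import defaultdict

def pagerank_map(page, state):
    rank, out_links = state
    yield (page, ("links", out_links))  # pass the graph structure along
    for target in out_links:
        # each link target receives an equal share PR(t) / C(t) of the credit
        yield (target, ("credit", rank / len(out_links)))

def pagerank_reduce(page, values, alpha, n):
    out_links, credit = [], 0.0
    for tag, payload in values:
        if tag == "links":
            out_links = payload
        else:
            credit += payload
    # PR(x) = alpha * 1/N + (1 - alpha) * sum of received credit
    return page, (alpha / n + (1 - alpha) * credit, out_links)

def iterate(graph, alpha=0.15):
    """One map + shuffle + reduce round over {page: (rank, out_links)}."""
    n = len(graph)
    groups = defaultdict(list)
    for page, state in graph.items():
        for key, value in pagerank_map(page, state):
            groups[key].append(value)
    return dict(pagerank_reduce(page, values, alpha, n)
                for page, values in groups.items())

# three pages linking in a cycle, seeded with equal rank
graph = {"a": (1 / 3, ["b"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
new_graph = iterate(graph)
print(new_graph)
```

In practice `iterate` would be repeated until the ranks change by less than some threshold.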

12.1 MR - Performance
- Turbo-charging Map and Reduce
- Naïve approach for implementing Map and Reduce
  - Move data to workers
    - Have a cluster of computation nodes
      - A master, multiple workers
    - Master has access to all data
    - Master splits the data and assigns map tasks
    - Master transfers input data to workers
    - Map results are somehow transferred to reduce workers
      - Directly? Pipelined? Via master?
  - In short: a lot of data shipping is necessary

12.1 MR - Performance
- Location-aware file system approach
  - Rely on a distributed file system like GFS or HDFS
    - Or even on higher layers like BigTable or HBase
    - All those systems are especially designed for increased Map and Reduce performance
  - Idea:
    - Each processing node runs a GFS chunk server and a Map & Reduce worker
    - Input data is stored in large chunks in GFS
    - Start a worker task which uses a local chunk as batch map input
      - Read sequentially through the local chunk
      - GFS as well as BigTable are optimized for sequential scans

12.1 MR - Performance
- Map workers sequentially append intermediate key-value pairs to another chunk (local or remote)
  - GFS as well as BigTable are optimized for append operations
- Reduce workers also scan through local chunks as input and append results to a local or remote chunk
- File system is responsible for distributing data
  - Very easy scheduling for the master
    - Just assign local data to workers
  - Fault tolerant (data loss improbable)
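
The "just assign local data to workers" rule can be illustrated with a greedy Python sketch. The chunk-to-node map, the node names, and the fallback to a remote worker when no local one is idle are assumptions for this example; a real scheduler would be more elaborate.

```python
def assign_map_tasks(chunk_locations, idle_workers):
    """Give each chunk to an idle worker, preferring one that stores it locally.

    chunk_locations: chunk id -> set of nodes holding a replica of that chunk
    idle_workers:    nodes currently free to run one map task each
    """
    assignment = {}
    free = set(idle_workers)
    for chunk, holders in sorted(chunk_locations.items()):
        local = sorted(holders & free)
        if local:
            worker, kind = local[0], "local"          # no input data is shipped
        elif free:
            worker, kind = sorted(free)[0], "remote"  # chunk must be read remotely
        else:
            break                                     # no idle worker left
        assignment[chunk] = (worker, kind)
        free.remove(worker)
    return assignment

chunk_locations = {"c1": {"n1", "n2"}, "c2": {"n2"}, "c3": {"n4"}}
assignment = assign_map_tasks(chunk_locations, ["n1", "n2", "n3"])
print(assignment)
```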

12.2 The Cloud

12.2 The Cloud
- The term cloud computing is often seen as a successor of client-server architectures
  - Often used as a synonym for centralized, on-demand, pay-what-you-use provisioning of general computation resources
    - e.g. compared to utility providers like electric power grids or water supply
    - Computing as a commodity
- Cloud is used as a metaphor for the Internet
  - Users or applications just use computation resources provided on the internet instead of using local hardware or software

12.2 The Cloud
- Computation resources can mean a lot of things:
  - Dynamic access to raw metal
    - Raw storage space or CPU time
    - Fully operational servers are provided by the cloud
  - Low-level services and platforms
    - e.g. runtime platforms like the Java JRE
      - User can run applications directly on the cloud platform
      - No own servers or platform software needed
    - e.g. abstracted storage space like space within a database or a file system
      - This is what we did in the last weeks!

12.2 The Cloud
- Software services
  - i.e. some functionality required by user software is provided by the cloud
    - Used via web service remote procedure calls
    - e.g. delegate the rendering of a map in a user application to Google Maps
- Full software functionality
  - e.g. rented web applications replacing traditional server or desktop applications
    - e.g. rent CRM software online from SalesForce, use Google Apps instead of MS Office, etc.

12.2 The Cloud
- Underlying base problem
  - Successfully running IT departments and IT infrastructure can be very difficult and expensive for companies
- High fixed costs
  - Acquiring and paying competent IT staff
    - Competent staff is often very hard to get
  - Buying and maintaining servers
  - Correctly hosting hardware
    - Proper power and cooling facilities, network connections, server racks, etc.
  - Buying and maintaining software

12.2 The Cloud
- Load and utilization issues
  - How many hardware resources are required by each application and/or service?
  - How to handle scaling issues?
    - What happens if demand increases or declines?
  - How to handle spike loads?
    - Digg effect
- Traditional data centers are notoriously underutilized, often idle 85% of the time
  - Over-provisioning for future growth or spikes
  - Insufficient capacity planning and sizing
  - Improper understanding of scalability requirements
  - etc.

12.2 The Cloud
- Cloud computing centrally unifies computation resources and provides them on demand
  - Degree of centralization and provision may differ
    - Centralize hardware within a department? A company? A number of companies? Globally?
    - Provide resources only to oneself? To some partners? To anybody?
    - How to compensate for resource usage?
      - Provide resources by a rental model (e.g. monthly fee)?
      - Provide resources metered on a what-is-used basis (e.g. similar to electricity or water)?
      - Provide resources for free?

12.2 The Cloud
- Usually, three types of clouds are distinguished
  - Public cloud
  - Private cloud
  - Hybrid cloud

12.2 The Cloud
- Public cloud
  - Traditional cloud computing
  - Services and resources are offered via the internet to anybody willing to pay for them
    - User just pays for services; usually no acquisition, administration or maintenance of hardware/software necessary
  - Services usually provided by off-site 3rd-party providers
  - Open for use by the general public
    - Exists beyond the firewall, fully hosted and managed by the vendor
    - Customers are individuals, corporations and others
    - e.g. Amazon Web Services and Google AppEngine
  - Offers startups and SMBs quick setup, scalability, flexibility and automated management
    - Pay-as-you-go model helps startups to start small and go big
  - Security and compliance?
    - Reliability and privacy concerns hinder the adoption of the cloud
      - Amazon S3 services were down for 6 hours in 2010
      - What will Amazon do with all the data?

12.2 The Cloud
- Private cloud
  - Cloud computing hardware is within the premises of a company, behind the corporate firewall
  - Resources are only provided internally for various departments
  - Private clouds are still fully bought, built, and maintained by the company using them
    - But usually not exclusive to single departments!
    - Still, costs could be prohibitive and may by far exceed those of public clouds
  - Fine-grained control over resources
  - More secure as they are internal to the organization
  - Schedule and reshuffle resources based on business demands
  - Ideal for apps with tight security and regulatory requirements
  - Development requires hardware investments and in-house expertise

12.2 The Cloud
- Hybrid cloud
  - Both private and public cloud services, or even non-cloud services, are used or offered simultaneously
  - State of the art for most companies relying on cloud technology

12.2 The Cloud
- Properties promised by cloud computing
  - Agility
    - Resources are quickly available when needed
    - i.e. servers need not be ordered and built, software doesn't need to be configured and installed, etc.
  - Costs
    - Capital expenditure is converted to operational expenditure
  - Independence
    - Services are available everywhere and for any device

12.2 The Cloud
- Multi-tenancy
  - Resources are shared by a larger pool of users
    - Resources can be centralized, which reduces the costs
    - Load distribution of users differs
      - Peak loads can usually be distributed
    - Overall utilization and efficiency of resources is better
- Reliability
  - Most cloud services promise durable and reliable resources due to distribution and replication
- Scalability
  - If a user needs more resources or performance, it can easily be provisioned

12.2 The Cloud
- Low maintenance
  - Cloud services or applications are not installed on users' machines, but maintained centrally by specialized staff
- Transparency and metering
  - Costs for computation resources are directly visible and transparent
  - Pay-what-you-use models
- Cloud computing generally promises to be beneficial for fast-growing startups, SMBs and enterprises alike
  - Cost-effective solutions to key business demands
  - Improved overall efficiency

12.2 The Cloud
- The cloud heavily encourages a self-service model
  - Users can simply request the resources they need
- [Figure from cloudscaling.com]

12.3 XaaS
- Anything-as-a-Service
  - XaaS = X as a Service
  - In general, cloud providers offer any computation resource as a service
  - In the long run, all computation needs of a company should be modeled, provided and used as a service
    - e.g. in Amazon's private and public cloud infrastructures: everything is a service!

12.3 XaaS
- Services provide a strictly defined functionality with certain guarantees
  - Service description and service-level agreement (SLA)
    - The service description explains what is offered by the service
    - The SLA further clarifies the provisioning guarantees
      - Often: performance, latency, reliability, availability, etc.

12.3 XaaS
- Usually, three main resources may be offered as a service
  - Software as a Service (SaaS)
  - Platform as a Service (PaaS)
  - Infrastructure as a Service (IaaS)
- [Figure: layers Client, Application, Platform, Infrastructure, Server]

12.3 XaaS
- Application services (services on demand)
  - Gmail, Google Calendar
  - Payroll, HR, CRM, etc.
  - Sugar CRM, IBM Lotus Live
- Platform services (resources on demand)
  - Middleware, integration, messaging, information, connectivity, etc.
  - Amazon AWS, Boomi, CastIron, Google AppEngine
- Infrastructure as services (physical assets as services)
  - IBM Blue House, VMWare Cloud Edition, Amazon EC2, Microsoft Azure Platform, ...

12.3 XaaS
- [Figure: individuals, corporations and non-commercial users access the cloud; cloud middleware covers storage provisioning, OS provisioning, network provisioning, service (apps) provisioning, SLA (monitoring), security, billing and payment; below it sit resources and services: storage, network, OS]

12.3 IaaS
- Infrastructure as a Service (IaaS)
  - Provides raw computation infrastructure, i.e. usually a virtual server
    - e.g. see hardware virtualization (VMWare & co.)
    - Successor to dedicated server rental
  - For the user, a virtual server is similar to a real server
    - Has CPU cores, main memory, hard disk space, etc.
    - Usually provided as a self-service raw machine
    - User is responsible for installing and maintaining software, e.g. operating system, databases or server software
    - User does not need to buy, host or maintain the actual hardware

12.3 IaaS
- The IaaS provider can host multiple virtual servers on a single, real machine
  - Often, … virtual servers per real server
  - Virtualization is used to abstract server hardware for virtual servers
    - Virtual systems are also often called virtual machines (neutral term) or appliances (usually suggesting preinstalled OS and software)
    - Virtualization of hardware is usually handled by a so-called hypervisor, e.g. Xen, KVM, VMWare, Hyper-V, ...

12.3 IaaS
- In short, IaaS is virtualization on multiple hardware machines
  - Normal server
    - 1 machine with one OS
  - Traditional virtualization
    - 1 machine hosting multiple virtual servers
  - Distributed application
    - 1 appliance running on multiple machines
  - IaaS
    - Multiple machines running multiple virtual servers
    - Dynamic load balancing between machines

  #appliances | 1 machine                  | many machines
  1           | Normal server              | Distributed appliance
  many        | Traditional virtualization | IaaS

12.3 IaaS
- Hypervisor is responsible for allocating available resources to VMs
  - Dispatch VMs to machines
  - Relocate VMs to balance load
  - Distribute resources
    - Network adaptors, logical disks, RAM, CPU cores, etc.
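
Dispatching VMs to machines can be pictured with the small Python sketch below. The placement policy (pick the machine with the most free cores that still fits the VM) is an assumption made for illustration; the slides do not name a specific policy, and real hypervisor managers weigh many more resources.

```python
def place_vm(machines, vm_cores, vm_ram_gb):
    """Pick a machine for a new VM and reserve its resources.

    machines: name -> {"cores": free cores, "ram_gb": free RAM}; updated in place.
    Returns the chosen machine name, or None if the VM fits nowhere.
    """
    candidates = [name for name, free in machines.items()
                  if free["cores"] >= vm_cores and free["ram_gb"] >= vm_ram_gb]
    if not candidates:
        return None
    # assumed policy: the machine with the most free cores gets the VM
    best = max(candidates, key=lambda name: (machines[name]["cores"], name))
    machines[best]["cores"] -= vm_cores
    machines[best]["ram_gb"] -= vm_ram_gb
    return best

machines = {
    "m1": {"cores": 4, "ram_gb": 16},
    "m2": {"cores": 8, "ram_gb": 32},
}
first = place_vm(machines, vm_cores=4, vm_ram_gb=8)
too_big = place_vm(machines, vm_cores=16, vm_ram_gb=1)
print(first, too_big, machines)
```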

12.3 IaaS
- Usually, virtual machines offered by IaaS infrastructures cannot grow arbitrarily big
  - Usually capped by actual server size or a smaller server group
- Really big applications are usually deployed in so-called Pods
  - Similar to database shards
  - Group of machines running one or multiple appliances
  - Machines within a Pod are very tightly networked

12.3 IaaS
- i.e. each Pod is a full copy of given virtual machines with full OS and application installed
  - Usually, there are multiple copies of a given Pod (and its VMs)
    - Each Pod is responsible for a disjoint part of the whole workload
  - Pods are usually scattered across availability zones (e.g. data centers or a certain rack)
    - Physically separated, usually with own power/network, etc.

12.3 IaaS
- IaaS Pods
- [Figure from CloudScaling.com]

12.3 IaaS
- Simplified Pod example: Google Mail
  - Multiple Pods, each Pod running on multiple machines with a full and independent installation of the Gmail software
  - Load balancer decides during user log-in which Pod will handle the user session
    - Users are distributed across Pods
  - Pods are flexible by using the shared GFS file system

12.3 IaaS
- Mission-critical applications should be designed such that they run in multiple availability zones on multiple Pods
  - Cloud control system (CCS) is responsible for distribution and replication

12.3 IaaS
- Pod architectures
  - Each Pod consists of multiple machines with mainboards, CPUs, and main memory
  - Question: where to put secondary storage?
- Usually, three options
  - Storage area network (SAN)
  - Direct attached storage (DAS)
  - Network attached storage (NAS)
- or: Storage service! (e.g. GFS & co.)

12.3 IaaS
- SAN Pods
  - Individual servers don't have their own secondary storage
  - Storage area network provides shared hard disk storage for all machines of a Pod
  - Pro
    - All machines have access to the same data
    - Allows for dynamic load balancing or migration of appliances
      - e.g. VMware vMotion
  - Con
    - Very, very expensive
    - Higher latency than direct attached storage

12.3 IaaS
- SAN Pods [figure]

12.3 IaaS
- DAS Pods
  - Each server has its own set of hard drives
    - Accessing data from other servers may be difficult
  - Pro
    - Cheap
    - Low latency for accessing local data
  - Con
    - Usually, no shared data access
    - Usually, difficult to live-migrate appliances (due to no shared data)
  - But: by using clever storage abstractions, common problems can be circumvented
    - Use a distributed file system or a distributed data store!
    - e.g. Amazon S3 & SimpleDB, Google GFS & BigTable, Apache HBase & HDFS, etc.

12.3 IaaS
- DAS Pods [figure]

12.3 Amazon EC2
- IaaS example: Amazon EC2
  - Elastic Compute Cloud is one of the core services of the Amazon cloud infrastructure
    - Public IaaS cloud
  - Customers may rent virtual servers hosted at Amazon's data centers
    - Can freely install OS and applications as needed
  - Virtual servers are offered in different sizes and are paid by CPU usage
  - Basic storage is offered within the VM, but usually additional storage services are used by applications, which cost extra
    - e.g. S3, SimpleDB, or DynamoDB

12.3 Amazon EC2
- Example: t2.micro
  - 1.0 GB memory
  - 1 vCPU unit
    - 1 virtual core
    - 1 vCPU is roughly one 2.5 GHz Xeon core
  - No dedicated storage
    - Has to use AWS network storage
  - Burstable performance: 6 CPU credits per hour
    - 1 CPU credit = 1 minute of full CPU performance
  - Costs $0.013 per hour
    - $9.30 per month
  - Usually, many users start with the small instance; also heavily used for testing
- From July 2010

12.3 Amazon EC2
- Example: m3.xlarge
  - 15 GB memory
  - 4 vCPU units
    - Total of 13 ECU (Elastic Compute Units)
    - 1 ECU is roughly equal to a 1.5 GHz Xeon core
  - 80 GB instance storage on SSD
    - More storage via AWS
  - Costs $0.28 per hour
    - $201 per month

12.3 Amazon EC2
- Example: i2.8xlarge
  - 244 GB of memory
  - 32 vCPU
    - Total of 104 ECU
  - 6400 GB of instance storage on SSD
  - Costs $6.82 per hour
    - $4910 per month
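
The monthly prices of the three instance examples can be re-derived from the hourly rates with a short calculation. The 30-day month (720 hours) is an assumption. With it, the m3.xlarge and i2.8xlarge results match the quoted figures once cut to whole dollars, while t2.micro comes out at $9.36, slightly above the quoted $9.30. The last lines derive the t2.micro baseline from the credit rule given above.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month of continuous use

hourly_usd = {"t2.micro": 0.013, "m3.xlarge": 0.28, "i2.8xlarge": 6.82}
monthly_usd = {name: round(rate * HOURS_PER_MONTH, 2)
               for name, rate in hourly_usd.items()}

# t2.micro burst model: 6 credits per hour, 1 credit = 1 minute of full CPU,
# so the sustainable baseline is 6 of every 60 minutes at full speed.
credits_per_hour = 6
baseline_share = credits_per_hour / 60

print(monthly_usd)
print("t2.micro baseline CPU share:", baseline_share)
```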

12.3 Amazon EC2
- Rough estimations (Oct 2009)
  - Roughly 40,000 servers
  - Uses standard server racks with 16 machines per rack
    - Mostly packed with 2U dual-socket quad-core Intel Xeons
      - Roughly matches the High-Mem Quad XL instance
    - Uses around eight 500 GB RAID-0 disks
    - Target cost around $2500 per machine on average
  - 75% of the machines are in the US, the remainder in Europe and Asia
  - Amazon aims at a utilization rate of 75%
  - Very rough guesses state that Amazon may earn $25,264 per hour with EC2!
- From Oct 2009

12.3 PaaS
- Platform as a Service (PaaS)
  - Provides software platforms on demand
    - e.g. runtime engines (Java VM, .NET runtime, etc.), storage systems (distributed file systems or databases), web services, communication services, etc.
  - PaaS systems are usually used to develop and host web applications or web services
    - User applications run on the provided platform
  - In contrast to IaaS, no installation and maintenance of operating system and server applications necessary
    - Centrally managed and maintained
    - Services or runtimes are directly usable

12.3 Google AppEngine
- Google AppEngine provides users a managed Python or Java runtime
  - Web applications can be directly hosted in AppEngine
    - Just upload your WAR file and you are done
  - Users are billed by resource usage
    - Some free resources provided every day
    - 1 GB in- and out-traffic, 6.5 hours CPU, 500 MB storage overall

  Resource           | Unit       | Unit cost
  Outgoing bandwidth | GB         | $0.12
  Incoming bandwidth | GB         | $0.10
  CPU time           | CPU hours  | $0.10
  Stored data        | GB / month | $0.15
  Recipients emailed | recipients | $
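
How such usage-based billing adds up can be sketched for the traffic and CPU rows of the price list. The sketch assumes that the free daily quota applies per direction of traffic and is subtracted before billing; storage and email are left out because their units or prices are not comparable or not given. The function name `daily_bill` and the usage numbers are made up.

```python
PRICE_USD = {"out_gb": 0.12, "in_gb": 0.10, "cpu_hours": 0.10}
FREE_PER_DAY = {"out_gb": 1.0, "in_gb": 1.0, "cpu_hours": 6.5}  # assumed per-direction quota

def daily_bill(usage):
    """Charge only the usage that exceeds the free daily quota."""
    total = 0.0
    for resource, used in usage.items():
        billable = max(0.0, used - FREE_PER_DAY[resource])
        total += billable * PRICE_USD[resource]
    return round(total, 2)

# 10 GB of billable outgoing traffic and 10 billable CPU hours
bill = daily_bill({"out_gb": 11.0, "in_gb": 0.5, "cpu_hours": 16.5})
print(bill)
```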

12.3 Google AppEngine
- Each application can access system resources up to a fixed maximum
  - AppEngine is not fully scalable!
- AppEngine max values (2010)
  - CPU: 1730 hours CPU per day; 72 minutes CPU per minute
  - Data in or out: 1 TB per day; 10 GB per minute
  - Requests: 43M web service calls per day; 30K calls per minute
  - Data storage: no limit (uses BigTable, which can scale in size!)

12.3 Amazon SimpleDB
- Amazon SimpleDB is a data storage system roughly similar to Google BigTable
  - Simple table-centric database engine
- SimpleDB is directly ready to use
  - No user configuration or administration
  - Accessible via web service
- SimpleDB is highly available, uses flexible schemas, and eventual consistency
  - Similar to HBase or BigTable

12.3 Amazon SimpleDB
- Any application may use SimpleDB for data storage
  - Simple web service provided to interact with SimpleDB
    - Create or delete a table (called domain)
    - Put and delete rows
    - Query for rows
- Users pay for storage, data transfer, and computation time
  - 25 hours of computation time (for querying) are free per month
    - Later: $0.154 per machine hour in 2009
    - Later: $0.140 per machine hour in …
  - … GB of data transfer is free per month
    - Later: $0.15 per GB in 2009
    - Later: $0.12 per GB in …
  - … GB of data storage is free per month
    - Later: $0.28 per GB in 2009
    - Later: $0.25 per GB in 2014

12.3 SaaS
- Software as a Service (SaaS)
  - Full applications are offered on demand
    - Users just need to consume the software; no installation or maintenance necessary
  - All administrative and maintenance tasks are performed by the cloud provider
    - e.g. hosting physical hardware, maintaining platforms, maintaining software, dealing with security, scalability, etc.

12.3 SalesForce
- Salesforce.com
  - On-demand CRM software
    - Customer relationship management
  - Cooperation with Google Apps in early summer
  - Provides simple online services for
    - Customer database
    - Lead management
    - Call center
    - Customer portal
    - Knowledge bases
    - Collaboration environments
    - etc.

12.3 SalesForce
- [Figures]

12.3 SalesForce
- Bills per month and user, based on edition

12.3 Google Apps
- Google Apps
  - Provides standard office applications on demand
    - i.e. targeting the lower end of the customer base of Microsoft Office
    - MS counters with Office 365
  - Google Apps provides
    - Email & groupware
    - Spreadsheets
    - Documents
    - Presentations
    - Online forms
    - Drawings
    - etc.

12.3 Google Apps
- [Figure]

Next Semester
- Multimedia Databases
- Information Retrieval
- Relational Databases 1

Thank you for your attention!


More information

Data Centers and Cloud Computing

Data Centers and Cloud Computing Data Centers and Cloud Computing CS677 Guest Lecture Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack Robert Collazo Systems Engineer Rackspace Hosting The Rackspace Vision Agenda Truly a New Era of Computing 70 s 80 s Mainframe Era 90

More information

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama Basics of Cloud Computing Lecture 2 Cloud Providers Satish Srirama Outline Cloud computing services recap Amazon cloud services Elastic Compute Cloud (EC2) Storage services - Amazon S3 and EBS Cloud managers

More information

Cloud Computing Lecture 4

Cloud Computing Lecture 4 Cloud Computing Lecture 4 1/17/2012 What is Hypervisor in Cloud Computing and its types? The hypervisor is a virtual machine monitor (VMM) that manages resources for virtual machines. The name hypervisor

More information

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Data Centers and Cloud Computing. Slides courtesy of Tim Wood Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

DEEP DIVE INTO CLOUD COMPUTING

DEEP DIVE INTO CLOUD COMPUTING International Journal of Research in Engineering, Technology and Science, Volume VI, Special Issue, July 2016 www.ijrets.com, editor@ijrets.com, ISSN 2454-1915 DEEP DIVE INTO CLOUD COMPUTING Ranvir Gorai

More information

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES Introduction Amazon Web Services (AWS), which was officially launched in 2006, offers you varying cloud services that are not only cost effective but scalable

More information

Motivation. Map in Lisp (Scheme) Map/Reduce. MapReduce: Simplified Data Processing on Large Clusters

Motivation. Map in Lisp (Scheme) Map/Reduce. MapReduce: Simplified Data Processing on Large Clusters Motivation MapReduce: Simplified Data Processing on Large Clusters These are slides from Dan Weld s class at U. Washington (who in turn made his slides based on those by Jeff Dean, Sanjay Ghemawat, Google,

More information

2013 AWS Worldwide Public Sector Summit Washington, D.C.

2013 AWS Worldwide Public Sector Summit Washington, D.C. 2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic

More information

Cloud Computing. Ennan Zhai. Computer Science at Yale University

Cloud Computing. Ennan Zhai. Computer Science at Yale University Cloud Computing Ennan Zhai Computer Science at Yale University ennan.zhai@yale.edu About Final Project About Final Project Important dates before demo session: - Oct 31: Proposal v1.0 - Nov 7: Source code

More information

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing.

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing. Cloud Computing By: Muhammad Naseem Assistant Professor Department of Computer Engineering, Sir Syed University of Engineering & Technology, Web: http://sites.google.com/site/muhammadnaseem105 Email: mnaseem105@yahoo.com

More information

Cloud Computing Introduction & Offerings from IBM

Cloud Computing Introduction & Offerings from IBM Cloud Computing Introduction & Offerings from IBM Gytis Račiukaitis IT Architect, IBM Global Business Services Agenda What is cloud computing? Benefits Risks & Issues Thinking about moving into the cloud?

More information

Large Scale Computing Infrastructures

Large Scale Computing Infrastructures GC3: Grid Computing Competence Center Large Scale Computing Infrastructures Lecture 2: Cloud technologies Sergio Maffioletti GC3: Grid Computing Competence Center, University

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

CHEM-E Process Automation and Information Systems: Applications

CHEM-E Process Automation and Information Systems: Applications CHEM-E7205 - Process Automation and Information Systems: Applications Cloud computing Jukka Kortela Contents What is Cloud Computing? Overview of Cloud Computing Comparison of Cloud Deployment Models Comparison

More information

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama Basics of Cloud Computing Lecture 2 Cloud Providers Satish Srirama Outline Cloud computing services recap Amazon cloud services Elastic Compute Cloud (EC2) Storage services - Amazon S3 and EBS Cloud managers

More information

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems Middle East Technical University Jeren AKHOUNDI (1836345) Ipek Deniz Demirtel (1997691) Derya Nur Ulus (1899608) CENG553 Database Management Systems * Introduction to Cloud Computing * Cloud DataBase as

More information

EBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud

EBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud EBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud Contents Introduction... 3 What is VMware Cloud on AWS?... 5 Customer Benefits of Adopting VMware Cloud on AWS... 6 VMware Cloud

More information

Architekturen für die Cloud

Architekturen für die Cloud Architekturen für die Cloud Eberhard Wolff Architecture & Technology Manager adesso AG 08.06.11 What is Cloud? National Institute for Standards and Technology (NIST) Definition On-demand self-service >

More information

Faculté Polytechnique

Faculté Polytechnique Faculté Polytechnique INFORMATIQUE PARALLÈLE ET DISTRIBUÉE CHAPTER 7 : CLOUD COMPUTING Sidi Ahmed Mahmoudi sidi.mahmoudi@umons.ac.be 13 December 2017 PLAN Introduction I. History of Cloud Computing and

More information

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved BERLIN 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Introduction to Amazon EC2 Danilo Poccia Technical Evangelist @danilop 2015, Amazon Web Services, Inc. or its affiliates. All

More information

CLOUD COMPUTING. Lecture 4: Introductory lecture for cloud computing. By: Latifa ALrashed. Networks and Communication Department

CLOUD COMPUTING. Lecture 4: Introductory lecture for cloud computing. By: Latifa ALrashed. Networks and Communication Department 1 CLOUD COMPUTING Networks and Communication Department Lecture 4: Introductory lecture for cloud computing By: Latifa ALrashed Outline 2 Introduction to the cloud comupting Define the concept of cloud

More information

Cloud Computing introduction

Cloud Computing introduction Cloud and Datacenter Networking Università degli Studi di Napoli Federico II Dipartimento di Ingegneria Elettrica e delle Tecnologie dell Informazione DIETI Laurea Magistrale in Ingegneria Informatica

More information

COMPARING COST MODELS - DETAILS

COMPARING COST MODELS - DETAILS COMPARING COST MODELS - DETAILS SOFTLAYER TOTAL COST OF OWNERSHIP (TCO) CALCULATOR APPROACH The Detailed comparison tab in the TCO Calculator provides a tool with which to do a cost comparison between

More information

Windows Servers In Microsoft Azure

Windows Servers In Microsoft Azure $6/Month Windows Servers In Microsoft Azure What I m Going Over 1. How inexpensive servers in Microsoft Azure are 2. How I get Windows servers for $6/month 3. Why Azure hosted servers are way better 4.

More information

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION Steve Bertoldi, Solutions Director, MarkLogic Agenda Cloud computing and on premise issues Comparison of traditional vs cloud architecture Review of use

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs

More information

Introduction to Cloud Computing and Virtual Resource Management. Jian Tang Syracuse University

Introduction to Cloud Computing and Virtual Resource Management. Jian Tang Syracuse University Introduction to Cloud Computing and Virtual Resource Management Jian Tang Syracuse University 1 Outline Definition Components Why Cloud Computing Cloud Services IaaS Cloud Providers Overview of Virtual

More information

Cloud Computing Concepts, Models, and Terminology

Cloud Computing Concepts, Models, and Terminology Cloud Computing Concepts, Models, and Terminology Chapter 1 Cloud Computing Advantages and Disadvantages https://www.youtube.com/watch?v=ojdnoyiqeju Topics Cloud Service Models Cloud Delivery Models and

More information

CLOUD COMPUTING ABSTRACT

CLOUD COMPUTING ABSTRACT Ruchi Saraf CSE-VII Sem CLOUD COMPUTING By: Shivali Agrawal CSE-VII Sem ABSTRACT Cloud computing is the convergence and evolution of several concepts from virtualization, distributed application design,

More information

Top 40 Cloud Computing Interview Questions

Top 40 Cloud Computing Interview Questions Top 40 Cloud Computing Interview Questions 1) What are the advantages of using cloud computing? The advantages of using cloud computing are a) Data backup and storage of data b) Powerful server capabilities

More information

INFS 214: Introduction to Computing

INFS 214: Introduction to Computing INFS 214: Introduction to Computing Session 13 Cloud Computing Lecturer: Dr. Ebenezer Ankrah, Dept. of Information Studies Contact Information: eankrah@ug.edu.gh College of Education School of Continuing

More information

CSE Lecture 11: Map/Reduce 7 October Nate Nystrom UTA

CSE Lecture 11: Map/Reduce 7 October Nate Nystrom UTA CSE 3302 Lecture 11: Map/Reduce 7 October 2010 Nate Nystrom UTA 378,000 results in 0.17 seconds including images and video communicates with 1000s of machines web server index servers document servers

More information

Lecture 20: WSC, Datacenters. Topics: warehouse-scale computing and datacenters (Sections )

Lecture 20: WSC, Datacenters. Topics: warehouse-scale computing and datacenters (Sections ) Lecture 20: WSC, Datacenters Topics: warehouse-scale computing and datacenters (Sections 6.1-6.7) 1 Warehouse-Scale Computer (WSC) 100K+ servers in one WSC ~$150M overall cost Requests from millions of

More information

Cloud Computing: Making the Right Choice for Your Organization

Cloud Computing: Making the Right Choice for Your Organization Cloud Computing: Making the Right Choice for Your Organization A decade ago, cloud computing was on the leading edge. Now, 95 percent of businesses use cloud technology, and Gartner says that by 2020,

More information

ZeroStack vs. AWS TCO Comparison ZeroStack s private cloud as-a-service offers significant cost advantages over public clouds.

ZeroStack vs. AWS TCO Comparison ZeroStack s private cloud as-a-service offers significant cost advantages over public clouds. @ZeroStackInc sales@zerostack.com www.zerostack.com ZeroStack vs. AWS TCO Comparison ZeroStack s private cloud as-a-service offers significant cost advantages over public clouds. White Paper Introduction

More information

Introduction to Cloud Computing. [thoughtsoncloud.com] 1

Introduction to Cloud Computing. [thoughtsoncloud.com] 1 Introduction to Cloud Computing [thoughtsoncloud.com] 1 Outline What is Cloud Computing? Characteristics of the Cloud Computing model Evolution of Cloud Computing Cloud Computing Architecture Cloud Services:

More information

Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Copyright 2011, Oracle and/or its affiliates. All rights reserved. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Lesson 14: Cloud Computing

Lesson 14: Cloud Computing Yang, Chaowei et al. (2011) 'Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing?', International Journal of Digital Earth, 4: 4, 305 329 GEOG 482/582 : GIS Data

More information

Understanding Cloud Migration. Ruth Wilson, Data Center Services Executive

Understanding Cloud Migration. Ruth Wilson, Data Center Services Executive Understanding Cloud Migration Ruth Wilson, Data Center Services Executive rhwilson@us.ibm.com Migrating to a Cloud is similar to migrating data and applications between data centers with a few key differences

More information

On-Premises Cloud Platform. Bringing the public cloud, on-premises

On-Premises Cloud Platform. Bringing the public cloud, on-premises On-Premises Cloud Platform Bringing the public cloud, on-premises How Cloudistics came to be 2 Cloudistics On-Premises Cloud Platform Complete Cloud Platform Simple Management Application Specific Flexibility

More information

COMP6511A: Large-Scale Distributed Systems. Windows Azure. Lin Gu. Hong Kong University of Science and Technology Spring, 2014

COMP6511A: Large-Scale Distributed Systems. Windows Azure. Lin Gu. Hong Kong University of Science and Technology Spring, 2014 COMP6511A: Large-Scale Distributed Systems Windows Azure Lin Gu Hong Kong University of Science and Technology Spring, 2014 Cloud Systems Infrastructure as a (IaaS): basic compute and storage resources

More information

Parallel Computing: MapReduce Jin, Hai

Parallel Computing: MapReduce Jin, Hai Parallel Computing: MapReduce Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! MapReduce is a distributed/parallel computing framework introduced by Google

More information

White Paper. Platform9 ROI for Hybrid Clouds

White Paper. Platform9 ROI for Hybrid Clouds White Paper Platform9 ROI for Hybrid Clouds Quantifying the cost savings and benefits of moving from the Amazon Web Services (AWS) public cloud to the Platform9 hybrid cloud. Abstract Deciding whether

More information

VMware on IBM Cloud:

VMware on IBM Cloud: VMware on IBM Cloud: How VMware customers can deploy new or existing applications with SoftLayer resources. Introduction This paper focuses on how existing VMware customers can gain a strategic advantage

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

Cloud Computing. Technologies and Types

Cloud Computing. Technologies and Types Cloud Computing Cloud Computing Technologies and Types Dell Zhang Birkbeck, University of London 2017/18 The Technological Underpinnings of Cloud Computing Data centres Virtualisation RESTful APIs Cloud

More information

Cloud + Big Data Putting it all Together

Cloud + Big Data Putting it all Together Cloud + Big Data Putting it all Together Even Solberg 2009 VMware Inc. All rights reserved 2 Big, Fast and Flexible Data Big Big Data Processing Fast OLTP workloads Flexible Document Object Big Data Analytics

More information

vrealize Business Standard User Guide

vrealize Business Standard User Guide User Guide 7.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this

More information

Mobile Cloud Computing

Mobile Cloud Computing MTAT.03.262 -Mobile Application Development Lecture 8 Mobile Cloud Computing Satish Srirama, Huber Flores satish.srirama@ut.ee Outline Cloud Computing Mobile Cloud Access schemes HomeAssignment3 10/20/2014

More information

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Welcome to the New Era of Cloud Computing

Welcome to the New Era of Cloud Computing Welcome to the New Era of Cloud Computing Aaron Kimball The web is replacing the desktop 1 SDKs & toolkits are there What about the backend? Image: Wikipedia user Calyponte 2 Two key concepts Processing

More information

POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN. PostgresConf US

POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN. PostgresConf US POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN PostgresConf US 2018 2018-04-20 ABOUT ME Alexander Kukushkin Database Engineer @ZalandoTech Email: alexander.kukushkin@zalando.de

More information

Data center interconnect for the enterprise hybrid cloud

Data center interconnect for the enterprise hybrid cloud WHITEPAPER Data center interconnect for the enterprise hybrid cloud The world is moving to the cloud. Everything from entertainment and consumer mobile applications to enterprise software and government

More information

vsan Mixed Workloads First Published On: Last Updated On:

vsan Mixed Workloads First Published On: Last Updated On: First Published On: 03-05-2018 Last Updated On: 03-05-2018 1 1. Mixed Workloads on HCI 1.1.Solution Overview Table of Contents 2 1. Mixed Workloads on HCI 3 1.1 Solution Overview Eliminate the Complexity

More information

BUILDING A PRIVATE CLOUD. By Mark Black Jay Muelhoefer Parviz Peiravi Marco Righini

BUILDING A PRIVATE CLOUD. By Mark Black Jay Muelhoefer Parviz Peiravi Marco Righini BUILDING A PRIVATE CLOUD By Mark Black Jay Muelhoefer Parviz Peiravi Marco Righini HOW PLATFORM COMPUTING'S PLATFORM ISF AND INTEL'S TRUSTED EXECUTION TECHNOLOGY CAN HELP 24 loud computing is a paradigm

More information

Course Overview. ECE 1779 Introduction to Cloud Computing. Marking. Class Mechanics. Eyal de Lara

Course Overview. ECE 1779 Introduction to Cloud Computing. Marking. Class Mechanics. Eyal de Lara ECE 1779 Introduction to Cloud Computing Eyal de Lara delara@cs.toronto.edu www.cs.toronto.edu/~delara/courses/ece1779 Course Overview Date Topic Sep 14 Introduction Sep 21 Python Sep 22 Tutorial: Python

More information

IT your way - Hybrid IT FAQs

IT your way - Hybrid IT FAQs Hybrid IT IT your way - Hybrid IT FAQs Create a strategy that integrates in-house and outsourced IT services to meet ever-changing business requirements. Combine on-premise and off premise solutions Mix

More information

THE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER

THE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER WHITE PAPER THE ZADARA CLOUD An overview of the Zadara Storage Cloud and VPSA Storage Array technology Zadara 6 Venture, Suite 140, Irvine, CA 92618, USA www.zadarastorage.com EXECUTIVE SUMMARY The IT

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

Cloud Computing Briefing Presentation. DANU

Cloud Computing Briefing Presentation. DANU Cloud Computing Briefing Presentation Contents Introducing the Cloud Value Proposition Opportunities Challenges Success Stories DANU Cloud Offering Introducing the Cloud What is Cloud Computing? IT capabilities

More information

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017. MySQL In the Cloud Migration, Best Practices, High Availability, Scaling Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017 1 Let me start. With some Questions! 2 Question One How Many of you

More information

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure Nutanix Tech Note Virtualizing Microsoft Applications on Web-Scale Infrastructure The increase in virtualization of critical applications has brought significant attention to compute and storage infrastructure.

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies Lecture 8 Cloud Programming & Software Environments: High Performance Computing & AWS Services Part 2 of 2 Spring 2015 A Specialty Course

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model

More information

Acknowledgements. Beyond DBMSs. Presentation Outline

Acknowledgements. Beyond DBMSs. Presentation Outline Acknowledgements Beyond RDBMSs These slides are put together from a variety of sources (both papers and slides/tutorials available on the web) Sharma Chakravarthy Information Technology Laboratory Computer

More information

CLOUD COMPUTING PRIMER

CLOUD COMPUTING PRIMER CLOUD COMPUTING PRIMER for Small and Medium-Sized Businesses CONTENTS 1 Executive Summary 2 ABCs of Cloud Computing An IT Revolution 3 The Democratization of Computing Cloud Computing Service Models SaaS

More information

MapReduce and Friends

MapReduce and Friends MapReduce and Friends Craig C. Douglas University of Wyoming with thanks to Mookwon Seo Why was it invented? MapReduce is a mergesort for large distributed memory computers. It was the basis for a web

More information

Dell EMC Hyper-Converged Infrastructure

Dell EMC Hyper-Converged Infrastructure Dell EMC Hyper-Converged Infrastructure New normal for the modern data center GLOBAL SPONSORS Traditional infrastructure and processes are unsustainable Expensive tech refreshes, risky data migrations

More information

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale

More information

Private Cloud Database Consolidation Name, Title

Private Cloud Database Consolidation Name, Title Private Cloud Database Consolidation Name, Title Agenda Cloud Introduction Business Drivers Cloud Architectures Enabling Technologies Service Level Expectations Customer Case Studies Conclusions

More information

How to Keep UP Through Digital Transformation with Next-Generation App Development

How to Keep UP Through Digital Transformation with Next-Generation App Development How to Keep UP Through Digital Transformation with Next-Generation App Development Peter Sjoberg Jon Olby A Look Back, A Look Forward Dedicated, data structure dependent, inefficient, virtualized Infrastructure

More information

Go Cloud. VMware vcloud Datacenter Services by BIOS

Go Cloud. VMware vcloud Datacenter Services by BIOS Go Cloud VMware vcloud Datacenter Services by BIOS Is your IT infrastructure always in tune with your business? If a market opportunity suddenly arises, can your business respond in time? Or is the opportunity

More information

CompSci 516: Database Systems

CompSci 516: Database Systems CompSci 516 Database Systems Lecture 12 Map-Reduce and Spark Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements Practice midterm posted on sakai First prepare and

More information

Oracle IaaS, a modern felhő infrastruktúra

Oracle IaaS, a modern felhő infrastruktúra Sárecz Lajos Cloud Platform Sales Consultant Oracle IaaS, a modern felhő infrastruktúra Copyright 2017, Oracle and/or its affiliates. All rights reserved. Azure Window collapsed Oracle Infrastructure as

More information

SQL Server SQL Server 2008 and 2008 R2. SQL Server SQL Server 2014 Currently supporting all versions July 9, 2019 July 9, 2024

SQL Server SQL Server 2008 and 2008 R2. SQL Server SQL Server 2014 Currently supporting all versions July 9, 2019 July 9, 2024 Current support level End Mainstream End Extended SQL Server 2005 SQL Server 2008 and 2008 R2 SQL Server 2012 SQL Server 2005 SP4 is in extended support, which ends on April 12, 2016 SQL Server 2008 and

More information

Getting Hybrid IT Right. A Softchoice Guide to Hybrid Cloud Adoption

Getting Hybrid IT Right. A Softchoice Guide to Hybrid Cloud Adoption Getting Hybrid IT Right A Softchoice Guide to Hybrid Cloud Adoption Your Path to an Effective Hybrid Cloud The hybrid cloud is on the radar for business and IT leaders everywhere. IDC estimates 1 that

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

CS 61C: Great Ideas in Computer Architecture. MapReduce

CS 61C: Great Ideas in Computer Architecture. MapReduce CS 61C: Great Ideas in Computer Architecture MapReduce Guest Lecturer: Justin Hsia 3/06/2013 Spring 2013 Lecture #18 1 Review of Last Lecture Performance latency and throughput Warehouse Scale Computing

More information

Introduction to data centers

Introduction to data centers Introduction to data centers Paolo Giaccone Notes for the class on Switching technologies for data centers Politecnico di Torino December 2017 Cloud computing Section 1 Cloud computing Giaccone (Politecnico

More information

Enterprise Cloud Computing. Eddie Toh Platform Marketing Manager, APAC Data Centre Group Cisco Summit 2010, Kuala Lumpur

Enterprise Cloud Computing. Eddie Toh Platform Marketing Manager, APAC Data Centre Group Cisco Summit 2010, Kuala Lumpur 1 Enterprise Cloud Computing Eddie Toh Platform Marketing Manager, APAC Data Centre Group Cisco Summit 2010, Kuala Lumpur Agenda 2 Fundamentals of Enterprise Cloud Computing IT & Cloud Computing Requirements

More information

Lecture 09: VMs and VCS head in the clouds

Lecture 09: VMs and VCS head in the clouds Lecture 09: VMs and VCS head in the Hands-on Unix system administration DeCal 2012-10-29 1 / 20 Projects groups of four people submit one form per group with OCF usernames, proposed project ideas, and

More information

Map Reduce Group Meeting

Map Reduce Group Meeting Map Reduce Group Meeting Yasmine Badr 10/07/2014 A lot of material in this presenta0on has been adopted from the original MapReduce paper in OSDI 2004 What is Map Reduce? Programming paradigm/model for

More information

Map Reduce. Yerevan.

Map Reduce. Yerevan. Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate

More information

What is Cloud Computing? Cloud computing is the dynamic delivery of IT resources and capabilities as a Service over the Internet.

What is Cloud Computing? Cloud computing is the dynamic delivery of IT resources and capabilities as a Service over the Internet. 1 INTRODUCTION What is Cloud Computing? Cloud computing is the dynamic delivery of IT resources and capabilities as a Service over the Internet. Cloud computing encompasses any Subscriptionbased or pay-per-use

More information