Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud
Sudipto Das, Divyakant Agrawal, Amr El Abbadi
Report by: Basil Kohler
January 4, 2013

Prerequisites

This report presents and discusses the paper ElaStraS - An Elastic Transactional Datastore in the Cloud [1]. Some of the information goes beyond the scope of the paper itself; the sources for those sections are listed in the references at the end. Parts of the discussion in Section 4 also reflect the discussion held at the end of my talk.

1 Introduction

Cloud computing makes it possible to build highly scalable systems that use the virtually infinite resources of the cloud. Currently, many web-based applications still rely on traditional relational database systems. However, these databases usually do not scale elastically, which makes them inefficient to use in the cloud. In the paper, the authors describe a transactional datastore that scales elastically and is therefore suited to a cloud environment. Section 1.1 defines the cloud further and explains why cloud services are useful for web applications. Section 1.2 explains why such a system is needed and which problems it solves.

1.1 The Cloud

Cloud computing is a new, abstract term for old and well-known principles. It describes resources that are provided as a service. These resources can be the hardware itself, for example computing power, storage and network bandwidth. Companies like Amazon offer these resources as a service; this is called Infrastructure as a Service (IaaS). A resource can also be a software system or a development platform that provides tools and libraries to build applications in the cloud. Google App Engine and Microsoft's Azure are examples of such Platform as a Service (PaaS) offerings. The paper focuses on the IaaS environment.

There are several reasons why cloud services have recently become so popular.
Most of the resources in the cloud can be viewed as virtually infinite. Cloud providers operate huge datacenters with a lot of spare capacity, and they usually have enough capital to add more resources and infrastructure when required. Because the resources are virtually infinite and provided as a service, the system is very elastic: one can always request new resources or release unneeded ones. This saves money, because an application only pays for the resources it actually uses. This is called the pay-as-you-go model. Another cost advantage is that there is no initial investment in infrastructure, which also transfers the cost and risk of maintaining the infrastructure to the provider.

Cloud computing also has downsides. Cloud services can become costly, depending on the kind of service. For example, backup storage is already very cheap, but bandwidth or storage with low response times is still quite expensive. I think this will become less of an issue in the future, but currently it has to be taken into account. Security is another problematic subject: many companies are probably not comfortable with storing sensitive data in the datacenter of another company.

Figure 1: Overview of the 3-tier architecture of a typical web application.

1.2 ElaStraS Motivation

Web-based applications are typically built as a 3-tier architecture, illustrated in Figure 1. On top are the web servers, which handle user requests and send back responses. The web servers talk to the application servers, which contain the logic of the application. The application servers read data from and store data to the database server. The web and application servers can be scaled easily, because their instances do not depend on each other. The database server, however, cannot be scaled by simply starting new instances. That would require a distributed database, which in turn needs costly distributed transactions whose locking can lower performance. The second issue with scaling the database is the partitioning of data.
Section 2.1 discusses this subject; the main question is how to distribute the data across several instances.

To overcome this bottleneck, one might be tempted to use a cloud storage service like Amazon S3. Such services provide virtually infinite storage, and the data can be replicated to several datacenters around the world. The big drawback, however, is that all transactional management is lost: S3 and similar services are simple key-value stores, and they are only eventually consistent. S3 therefore cannot simply be used as a database server. This motivated the authors to design and build ElaStraS, a system that relies on distributed storage, still provides certain transactional guarantees, and can be scaled elastically.

Figure 2: Overview of the ElaStraS system. OTM: Owning Transaction Manager. HTM: High Level Transaction Manager.

2 The ElaStraS System

ElaStraS is built on cloud resources and runs on Amazon Web Services (AWS). Each component is an EC2 instance; the web and application servers are cloud instances as well. Web applications usually do not require complex schemas, which is why the authors decided to build a simple key-value store instead of a full-fledged relational database. ElaStraS is therefore a datastore, not a database.

Figure 2 shows an overview of the ElaStraS system. At the top are the web and application servers. The application servers forward all requests to the load balancer, which simply distributes them to the components of ElaStraS. At the bottom is the S3 service, where all data is stored persistently. In the middle are the three main components of ElaStraS. The OTMs are the owning transaction managers. Each OTM has exclusive access to a certain partition of the data and can therefore read and write this data. The HTMs are the high level transaction managers. They can only read data; any write transactions they receive are forwarded to the OTMs. The master server manages the state of the system, is responsible for starting new instances, and assigns partitions to the OTMs.

Section 2.1 explains what partitions are and how ElaStraS uses them. Section 2.2 covers the properties of Amazon's distributed storage S3. The three main components are described in more detail in Sections 2.3 (OTM), 2.4 (HTM) and 2.5 (Master Server).

2.1 Partitioning

Partitioning of data [4] is a well-known principle in distributed database management systems. In certain cases it increases the performance and availability of data. Data can be partitioned either horizontally or vertically.

Horizontal partitioning is also known as sharding: the rows of a table are distributed to several partitions. For example, the customers of a Swiss web shop could be partitioned by the canton they live in, so there would be one partition for Basel, one for Zürich, one for Bern, and so on. If the application knows that a transaction only works on a single partition, no distributed transaction is required. In the example above, a customer's order can be processed entirely on the partition of the customer's canton.

Vertical partitioning is less common. It means that a table is split by its columns. It is basically the same principle as normalization, but vertical partitioning can go further. For example, a table could be split into its dynamic and its static columns. This prevents locks on the static data while the dynamic data is written and thus increases read performance.

ElaStraS uses partitions to grant the OTMs exclusive read and write access. There are two partition configurations, called static and dynamic(1).
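The canton example above can be sketched in a few lines of Python. This is a minimal illustration of horizontal partitioning that the application knows about, not ElaStraS code; the names `Customer`, `CANTON_TO_PARTITION`, `partition_for` and `is_local_transaction`, as well as the partition names, are hypothetical. A transaction whose rows all map to one partition can run locally; anything else would need a distributed commit protocol.

```python
from dataclasses import dataclass

# Hypothetical static scheme chosen by the database designer: one partition
# per canton. The application knows this mapping.
CANTON_TO_PARTITION = {"BS": "p_basel", "ZH": "p_zurich", "BE": "p_bern"}


@dataclass(frozen=True)
class Customer:
    customer_id: int
    canton: str


def partition_for(customer: Customer) -> str:
    """Horizontal partitioning: a customer's row lives in its canton's partition."""
    return CANTON_TO_PARTITION[customer.canton]


def is_local_transaction(customers: list[Customer]) -> bool:
    """True if every row the transaction touches lies in a single partition.

    Such a transaction can run on that partition alone; otherwise a distributed
    commit protocol (for example 2PC) would be required.
    """
    return len({partition_for(c) for c in customers}) == 1


alice = Customer(1, "BS")
bob = Customer(2, "ZH")
print(is_local_transaction([alice]))       # True: an order by one customer
print(is_local_transaction([alice, bob]))  # False: touches Basel and Zürich
```

Because the mapping is fixed and visible to the application, the check can be made before the transaction starts, which is what makes local transactions possible in this kind of scheme.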
Static partitioning means that the data is partitioned manually by the database designer; the web shop example above is such a case. The application using the database is aware of the partitioning scheme, so it can use local transactions on these partitions. In the dynamic configuration, the master server creates the partitions dynamically, building them for certain value ranges. For example, it could compute hash values of a certain column and partition the data according to ranges of these values. However, the application is then not aware of the partitions and therefore cannot use local transactions. Section 2.6 (Minitransactions) explains how full-fledged distributed transactions can be avoided while still providing certain transactional semantics.

(1) Not to be confused with the vertical partitioning into dynamic and static data described above.

2.2 Amazon S3 Properties

Amazon's distributed storage service S3 is popular and used by many applications to store and back up huge volumes of data. It provides high availability of 99.99%, extremely high durability [5], and virtually infinite storage. Furthermore, its response time is virtually constant regardless of the number of users. However, the response time is high: compared to a local disk, it is at least two orders of magnitude larger.

Figure 3: Response time and bandwidth of S3 for varying page size [2].

Figure 3 shows S3 response times measured over the internet [2]. S3 access is therefore expensive in terms of latency and should be avoided where possible [2].

2.3 Owning Transaction Manager

OTMs have exclusive access to a partition of the data, so they can write to it without any distributed synchronization. They can also cache the partition in the memory of their instance. An OTM is therefore similar to a traditional database. The difference is that it stores the data in S3 and has no fast access to persistent storage: when an EC2 instance crashes, all data on its local disk is lost, and, as Section 2.2 concluded, access to S3 is slow. To still use conventional database logging, the logs are written to Amazon's Elastic Block Store (EBS). This service is specifically designed to persist data of EC2 instances, but it costs more than S3. This allows the OTM to provide ACID guarantees for transactions that only write to a single partition in the static partitioning configuration.

Because one of the main goals of ElaStraS is elasticity, the OTMs have to be scalable at runtime, or at least without too much effort. The authors did not explain in detail how this should work, so I can only think of some scenarios. In dynamic partitioning, the master server could notice the high load of an OTM, stop the instance, split its partition in two, and start one OTM for each of the new partitions. This would cause a short downtime of part of the system, but I think we can assume that the overall performance gain would be worth it. For static partitioning, I do not know how this could work: the master server cannot simply split a partition, because the application has to know the partitions to make local transactions possible.

2.4 High Level Transaction Manager

HTMs are simpler than OTMs because they can only read data.
They have access to all partitions and can simply cache the data in their local memory, so read performance can be scaled very easily by adding more HTMs. If an HTM receives a write transaction, it acts as the coordinator of a distributed transaction. If the partitions are static, it simply forwards the transaction to the owning OTM and waits for the result. With dynamic partitioning, or for a global transaction, the HTM has to run the two-phase commit (2PC) protocol. Since 2PC is expensive, minitransactions are used, which allow for a more lightweight version of 2PC. Section 2.6 (Minitransactions) gives some details.
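The division of labour between OTMs and HTMs described in Sections 2.3 and 2.4 can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not the authors' implementation: plain dictionaries stand in for S3 (`store`) and for the EBS log (`log`), all class and method names are hypothetical, and multi-partition writes simply raise an error where ElaStraS would fall back to a minitransaction-based 2PC.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class OwningTransactionManager:
    """Hypothetical OTM: the only component that writes its partition."""
    partition: str
    store: dict                                # stand-in for slow S3
    log: list = field(default_factory=list)    # stand-in for the EBS log
    cache: dict = field(default_factory=dict)  # in-memory copy of the partition

    def read(self, key: str) -> Any:
        if key not in self.cache:              # only a cache miss reaches S3
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def write(self, updates: dict) -> None:
        self.log.append(dict(updates))         # log first (write-ahead) ...
        self.cache.update(updates)             # ... then apply in memory

    def checkpoint(self) -> None:
        self.store.update(self.cache)          # push cached state to S3 later


@dataclass
class HighLevelTransactionManager:
    """Hypothetical HTM: serves reads, forwards writes to the owning OTM."""
    owners: dict                               # partition name -> OTM
    partition_of: Callable[[str], str]         # key -> partition name
    cache: dict = field(default_factory=dict)

    def read(self, key: str) -> Any:
        # Sketch only: no invalidation when another HTM writes the key.
        if key not in self.cache:
            self.cache[key] = self.owners[self.partition_of(key)].read(key)
        return self.cache[key]

    def write(self, updates: dict) -> None:
        partitions = {self.partition_of(k) for k in updates}
        if len(partitions) != 1:
            raise NotImplementedError("multi-partition write needs 2PC")
        self.owners[partitions.pop()].write(updates)
        for k in updates:                      # drop now-stale local copies
            self.cache.pop(k, None)
```

For example, with `s3 = {"a": 1}`, an HTM reading `"a"` caches 1; a write of `{"a": 2}` is logged and applied at the OTM, and the dictionary standing in for S3 only sees the new value after `checkpoint()`, which mirrors how caching keeps the slow store out of the common path.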
2.5 Master Server

The master server manages and persistently stores all metadata of the system, including the partition assignments. It also controls the running OTMs and HTMs and can start new instances or shut instances down. As already mentioned, S3 is too slow to store critical data, and it is only eventually consistent. Because the master server is a single point of failure, its state must always be consistent and replicated. That is why the master server uses Google's Chubby lock service, which implements distributed synchronization with the Paxos protocol. This allows the metadata of the server to be replicated and stored persistently. However, Paxos is expensive, because every update requires agreement among the replicas. Since the master server is not in the data path of ElaStraS, it should nevertheless not become a bottleneck of the system.

2.6 Minitransactions

Minitransactions were introduced for the Sinfonia service [3]. In this section I briefly describe what a minitransaction is and how minitransactions are used in Sinfonia. The goal of a minitransaction is to allow lightweight distributed transactions that are still powerful. A minitransaction consists of a set of compare items, a set of read items, and a set of write items. A modified version of 2PC is used: instead of first executing the actions and then running 2PC to decide on commit or rollback, most actions are executed directly in the first phase of 2PC. A coordinator sends the minitransaction to the participants, which lock the affected items and evaluate all compare items. If all comparisons succeed, the participants read the read items and log the write items. The coordinator then decides on commit or rollback and sends its decision to the participants, which apply the logged write items on commit. In Sinfonia, the coordinator does not keep any log; instead, a separate recovery component recovers from a coordinator crash by communicating with the participants.
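The Sinfonia-style protocol described above can be illustrated with a small single-process simulation. This is a simplified sketch, not Sinfonia's or ElaStraS's code: participants are plain dictionaries with a lock set and a redo log, locking is a non-blocking try-lock that makes a participant vote abort on conflict, there is no crash recovery, and all names (`Participant`, `exec_and_prepare`, `finish`, `run_minitransaction`) are hypothetical. Phase 1 locks, compares, reads and logs; phase 2 applies the logged writes only if every participant voted to commit.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical memory node taking part in minitransactions."""
    data: dict
    locked: set = field(default_factory=set)
    redo_log: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)  # tid -> (locked keys, writes)

    def exec_and_prepare(self, tid, compares, reads, writes):
        """Phase 1: lock, compare, read, log writes. Returns (vote, read values)."""
        keys = {k for k, _ in compares} | set(reads) | set(writes)
        if keys & self.locked:                     # try-lock failed: vote abort
            return False, {}
        self.locked |= keys
        if any(self.data.get(k) != expected for k, expected in compares):
            self.locked -= keys                    # comparison failed: vote abort
            return False, {}
        self.redo_log.append((tid, dict(writes)))  # logged now, applied on commit
        self.pending[tid] = (keys, dict(writes))
        return True, {k: self.data.get(k) for k in reads}

    def finish(self, tid, commit):
        """Phase 2: apply logged writes on commit, then release the locks."""
        entry = self.pending.pop(tid, None)
        if entry is None:                          # this participant voted abort
            return
        keys, writes = entry
        if commit:
            self.data.update(writes)
        self.locked -= keys


def run_minitransaction(tid, items):
    """Coordinator. items: list of (participant, compares, reads, writes).

    compares is a list of (key, expected value), reads a list of keys and
    writes a dict. Keys are assumed distinct across participants. Returns the
    read values if the minitransaction commits, else None.
    """
    votes, values = [], {}
    for participant, compares, reads, writes in items:
        vote, read_values = participant.exec_and_prepare(tid, compares, reads, writes)
        votes.append(vote)
        values.update(read_values)
    commit = all(votes)
    for participant, *_ in items:
        participant.finish(tid, commit)
    return values if commit else None
```

Because the read items are evaluated in phase 1, the returned values are those from before the minitransaction's own writes were applied; the write items themselves have to be fixed before the minitransaction starts.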
Not logging at the coordinator further simplifies the protocol, because the participants do not have to acknowledge the commit, but it makes the recovery procedure more complicated.

It is not clear how exactly minitransactions are used in ElaStraS. Furthermore, I am not sure why the authors assume that the transaction is executed in an extra step before 2PC starts, since in Sinfonia the transaction is executed in the first phase. The separation into compare, read and write items certainly simplifies the transactions, but it limits what a single transaction can do: because all items are fixed before the minitransaction starts, a value read within a minitransaction cannot determine what that same minitransaction writes.

3 Conclusion

The cloud is a good fit for web applications because they can be developed at low cost and still scale up later. The load on web-based applications often fluctuates, which can be accommodated by the elasticity of cloud services. Web-based applications often use a typical 3-tier architecture, in which the web and application servers can be scaled up easily but the database is a bottleneck. ElaStraS shows how one could overcome this bottleneck while still providing transactional guarantees in certain situations. By partitioning the data and giving a single OTM exclusive access to each partition, transactions are possible within a single partition. Elasticity is achieved by spawning new HTM and OTM instances. Heavy caching limits access to the high-latency S3 storage. The master server persists the state of the system and manages all metadata.

4 Discussion

The paper presents several interesting concepts and the key aspects of building a system on distributed storage. These concepts can also be found in related work, for example in [2]. Figure 4 gives an overview of these concepts.

Splitting of the data: The storage itself has to be split to give the instances exclusive access. Without splitting, every write access would have to be synchronized, which would be too expensive.

Caching: The data stored in the distributed storage has to be cached by the instance accessing it; otherwise there would be too many accesses to the distributed storage. Care has to be taken when this data is written: some mechanism is required to log the writes.

Master: A master server is also quite common in distributed systems. It stores and manages the state of the system and avoids complex and expensive protocols for synchronizing the state between the instances. The master must not lie in the data path, and one has to make sure that it cannot become a bottleneck. It is also a single point of failure, which means that its data has to be stored persistently and consistently.

Elasticity: The core components of the system should preferably be scalable. This means that the master server can spawn or shut down instances at any point in time, which keeps the system efficient while providing enough performance.

Even though ElaStraS is an interesting composition and application of these key components, the authors do not really present any new concepts. However, the paper is a good starting point to get a general idea of how one could build a similar system.
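The elasticity concept, together with the scenario I speculated about in Section 2.3 for dynamic partitioning, can be sketched as a master-side policy that splits overloaded key ranges. This is purely hypothetical; the paper does not describe such a procedure. The names `Partition`, `split_hot_partitions` and `owner_of`, the load metric and the threshold are assumptions, and stopping and starting instances is reduced to a callback that returns the name of a fresh OTM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Partition:
    """Half-open range [lo, hi) of the key-hash space, owned by one OTM."""
    lo: int
    hi: int
    owner: str


def split_hot_partitions(partitions: list[Partition],
                         load: dict[Partition, float],
                         threshold: float,
                         new_owner: Callable[[], str]) -> list[Partition]:
    """Return a new assignment in which every overloaded range is split in two.

    Following the scenario in Section 2.3, each half gets a freshly started OTM;
    ranges at or below the threshold keep their current owner.
    """
    result = []
    for p in partitions:
        if load.get(p, 0.0) > threshold and p.hi - p.lo > 1:
            mid = (p.lo + p.hi) // 2
            result.append(Partition(p.lo, mid, new_owner()))
            result.append(Partition(mid, p.hi, new_owner()))
        else:
            result.append(p)
    return result


def owner_of(partitions: list[Partition], key_hash: int) -> str:
    """Route a key hash to the OTM that owns its range."""
    for p in partitions:
        if p.lo <= key_hash < p.hi:
            return p.owner
    raise KeyError(key_hash)
```

Because the application does not see these ranges, such a split is only transparent in the dynamic configuration; in the static configuration the application depends on the partition boundaries.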
Figure 4: Abstract illustration of the key concepts for building a system on distributed storage.

The authors claim that the system is elastic, but they do not elaborate on how it can be scaled in the static configuration. The system can clearly scale in the dynamic configuration, but then the application is not able to use local transactions on a partition. In the end, the static configuration is therefore the more interesting one, and I could not understand how the system could scale for write transactions in this configuration. Furthermore, ElaStraS is supposed to gain performance through minitransactions and the more lightweight 2PC protocol. But again, it was not clear to me how exactly this works and what the trade-offs and advantages of minitransactions are. In general, I think an actual use case for ElaStraS would have helped to point out its advantages and to explain in detail how the system works.

Finally, I need to apologise. In my presentation I claimed that the reference to the Sinfonia paper pointed to a PowerPoint presentation. This was an error on my side when looking up the reference: the presentation showed up at the top of my search results, and the actual paper was further down.

References

[1] S. Das, D. Agrawal, A. El Abbadi; ElaStraS - An Elastic Transactional Datastore in the Cloud; Department of Computer Science, UC Santa Barbara, CA, USA, 2009
[2] M. Brantner, D. Florescu, D. Graf, D. Kossmann, T. Kraska; Building a database on S3; In SIGMOD, 2008
[3] M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, C. Karamanolis; Sinfonia: a new paradigm for building scalable distributed systems; In SOSP, 2007
[4] Wikipedia; accessed on 2 January 2013; last modified on 17 December 2012 at 20:31
[5] AWS; accessed on 2 January
More informationTHE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER
WHITE PAPER THE ZADARA CLOUD An overview of the Zadara Storage Cloud and VPSA Storage Array technology Zadara 6 Venture, Suite 140, Irvine, CA 92618, USA www.zadarastorage.com EXECUTIVE SUMMARY The IT
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationDatacenter replication solution with quasardb
Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION
More informationMotivation There are applications for which it is critical to establish certain availability, consistency, performance etc.
1 Motivation Motivation There are applications for which it is critical to establish certain availability, consistency, performance etc. Banking Web mail KOS, CourseWare (to some degree) Questions How
More informationCS5412: TRANSACTIONS (I)
1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of
More informationHigh Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2
High Noon at AWS ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2 Introduction Amazon Web Services (AWS) are gaining popularity, and for good reasons. The Amazon Relational Database
More informationFederated Array of Bricks Y Saito et al HP Labs. CS 6464 Presented by Avinash Kulkarni
Federated Array of Bricks Y Saito et al HP Labs CS 6464 Presented by Avinash Kulkarni Agenda Motivation Current Approaches FAB Design Protocols, Implementation, Optimizations Evaluation SSDs in enterprise
More informationOracle Rdb Hot Standby Performance Test Results
Oracle Rdb Hot Performance Test Results Bill Gettys (bill.gettys@oracle.com), Principal Engineer, Oracle Corporation August 15, 1999 Introduction With the release of Rdb version 7.0, Oracle offered a powerful
More informationUsing MySQL for Distributed Database Architectures
Using MySQL for Distributed Database Architectures Peter Zaitsev CEO, Percona SCALE 16x, Pasadena, CA March 9, 2018 1 About Percona Solutions for your success with MySQL,MariaDB and MongoDB Support, Managed
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationHow do we build TiDB. a Distributed, Consistent, Scalable, SQL Database
How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationPersistent Storage with Docker in production - Which solution and why?
Persistent Storage with Docker in production - Which solution and why? Cheryl Hung 2013-2017 StorageOS Ltd. All rights reserved. Cheryl 2013-2017 StorageOS Ltd. All rights reserved. 2 Why do I need storage?
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 7: Mutable State (2/2) November 13, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are
More informationDistributed Systems COMP 212. Revision 2 Othon Michail
Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.
More informationEngineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05
Engineering Goals Scalability Availability Transactional behavior Security EAI... Scalability How much performance can you get by adding hardware ($)? Performance perfect acceptable unacceptable Processors
More informationNewSQL Without Compromise
NewSQL Without Compromise Everyday businesses face serious challenges coping with application performance, maintaining business continuity, and gaining operational intelligence in real- time. There are
More informationWLS Neue Optionen braucht das Land
WLS Neue Optionen braucht das Land Sören Halter Principal Sales Consultant 2016-11-16 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information
More informationCloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech
Cloud Computing DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech Managing servers isn t for everyone What are some prohibitive issues? (we touched on these last time) Cost (initial/operational)
More informationFIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS
WHITE PAPER FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS Over the past 15 years, server virtualization has become the preferred method of application deployment in the enterprise datacenter.
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationConsistency and Scalability
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015) Consistency and Scalability Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah Copyright 2015 Noah
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationCPET 581 Cloud Computing: Technologies and Enterprise IT Strategies
CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies Lecture 8 Cloud Programming & Software Environments: High Performance Computing & AWS Services Part 2 of 2 Spring 2015 A Specialty Course
More informationRok: Decentralized storage for the cloud native world
Whitepaper Rok: Decentralized storage for the cloud native world Cloud native applications and containers are becoming more and more popular, as enterprises realize their benefits: containers are great
More informationLast time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson
Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to
More informationMySQL Performance Improvements
Taking Advantage of MySQL Performance Improvements Baron Schwartz, Percona Inc. Introduction About Me (Baron Schwartz) Author of High Performance MySQL 2 nd Edition Creator of Maatkit, innotop, and so
More informationModule Day Topic. 1 Definition of Cloud Computing and its Basics
Module Day Topic 1 Definition of Cloud Computing and its Basics 1 2 3 1. How does cloud computing provides on-demand functionality? 2. What is the difference between scalability and elasticity? 3. What
More informationECE Enterprise Storage Architecture. Fall ~* CLOUD *~. Tyler Bletsch Duke University
ECE590-03 Enterprise Storage Architecture Fall 2017.~* CLOUD *~. Tyler Bletsch Duke University Includes material adapted from the course Information Storage and Management v2 (module 13), published by
More informationMySQL High Availability
MySQL High Availability And other stuff worth talking about Peter Zaitsev CEO Moscow MySQL Users Group Meetup July 11 th, 2017 1 Few Words about Percona 2 Percona s Purpose To Champion Unbiased Open Source
More informationRAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE
RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting
More informationPerformance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences
Performance and Forgiveness June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Margo Seltzer Architect Outline A consistency primer Techniques and costs of consistency
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the
More informationContinuous Data Protection
Continuous Data Protection Comprehensive protection of data is a critical responsibility of a data warehouse. This includes both protection against unauthorized access and protection against data loss
More informationHadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017
Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google
More informationTROPIC: Transactional Resource Orchestration Platform In the Cloud
TROPIC: Transactional Resource Orchestration Platform In the Cloud Changbin Liu, Yun Mao*, Xu Chen*, Mary Fernandez*, Boon Thau Loo, Jacobus Van der Merwe* * netdb.cis.upenn.edu/dmf 1 Motivation Infrastructure
More informationAtomicity. Bailu Ding. Oct 18, Bailu Ding Atomicity Oct 18, / 38
Atomicity Bailu Ding Oct 18, 2012 Bailu Ding Atomicity Oct 18, 2012 1 / 38 Outline 1 Introduction 2 State Machine 3 Sinfonia 4 Dangers of Replication Bailu Ding Atomicity Oct 18, 2012 2 / 38 Introduction
More informationDetermining the IOPS Needs for Oracle Database on AWS
Determining the IOPS Needs for Oracle Database on AWS Abdul Sathar Sait December 2014 Contents Abstract 2 Introduction 2 Storage Options for Oracle Database 3 IOPS Basics 4 Estimating IOPS for an Existing
More informationMegastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.
Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database. Presented by Kewei Li The Problem db nosql complex legacy tuning expensive
More informationScaling DreamFactory
Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud
More informationIntroduction To Cloud Computing
Introduction To Cloud Computing What is Cloud Computing? Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g.,
More informationFundamentals Large-Scale Distributed System Design. (a.k.a. Distributed Systems 1)
Fundamentals Large-Scale Distributed System Design (a.k.a. Distributed Systems 1) https://columbia.github.io/ds1-class/ 1 Interested in... 1. scalable web services? 2. big data? 3. and the large-scale
More information6.824 Final Project. May 11, 2014
6.824 Final Project Colleen Josephson cjoseph@mit.edu Joseph DelPreto delpreto@mit.edu Pranjal Vachaspati pranjal@mit.edu Steven Valdez dvorak42@mit.edu May 11, 2014 1 Introduction The presented project
More informationIntegrity in Distributed Databases
Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................
More informationSummary: Open Questions:
Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization
More information