Middle East Technical University
Jeren AKHOUNDI (1836345)
Ipek Deniz Demirtel (1997691)
Derya Nur Ulus (1899608)
CENG553 Database Management Systems
* Introduction to Cloud Computing
* Cloud Database as a Service (DBaaS)
* Big Data and Cloud Database Examples
* Cloud computing relies on sharing computing resources.
* "Cloud" is used as a metaphor for the Internet.
* Cloud computing is a type of Internet-based computing that allows users to access software applications over the Internet rather than installing them on each workstation.
* https://cloud.google.com : Google has a private cloud that it uses to deliver many different services to its users, including email access and document applications.
* http://office.microsoft.com : Microsoft has Microsoft SharePoint Online Services, which allows content and business intelligence tools to be moved into the cloud. Microsoft also makes its Office applications available in the cloud.
If you store your data on a cloud storage system, you'll be able to retrieve data from any location that has Internet access. You don't need to carry around a physical storage device or use the same computer to save and retrieve your information. If you have the proper storage system, you could even allow other people to access the data, turning a personal project into a collaborative effort.
* On-demand self-service: A consumer can provision computing capabilities, such as server time and network storage, automatically as needed, without requiring human interaction with each service's provider.
* Broad network access: The cloud must be accessible over the Internet from a broad range of devices.
* Resource pooling: The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control over or knowledge of the exact location of the provided resources, but may be able to specify location at a higher level of abstraction (e.g., country, state, or data center). Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.
* Rapid elasticity: Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
* Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be managed, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.
* Software as a Service (SaaS): SaaS is a complete operating environment with applications, management, and the user interface. Consumers purchase the ability to access and use an application or service that is hosted in the cloud. Example: the online version of Microsoft Office.
* Platform as a Service (PaaS): The client can deploy its applications on the cloud infrastructure or use applications that were programmed using languages and tools that are supported by the PaaS service provider. The client is responsible for installing and managing the application that it is deploying.
* Infrastructure as a Service (IaaS): This is the most basic of the cloud service models. Consumers control and manage the systems in terms of the operating systems, applications, storage, and network connectivity.
* Private Cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers.
* Community Cloud: The cloud is shared among a number of organizations with similar interests and requirements.
* Public Cloud: The cloud is made available to the public on a commercial basis by a cloud service provider.
* Hybrid Cloud: The cloud is composed of a number of clouds of any type, whose interfaces allow data and applications to be moved from one cloud to another.
* Clients access the database over the Internet from the cloud database service provider, and it is delivered to users on demand.
* The data needs to be distributed over data centers in different locations.
* The database must be accessible at all times, so that users can get the data whenever they need it.
* The cloud database holds its data in different data centers located at different locations.
* A cloud database has multiple nodes, designed for query services, spread across data centers in different geographical locations as well as corporate data centers.
* This means that the nodes must be linked to give easy and complete access to the database through the cloud services.
* For this purpose, peer-to-peer communication is preferred.
* The reason for adopting peer-to-peer communication is that any single node can accept any kind of query issued by the user.
* Once the user issues a query, the receiving node first determines the kind of query and which node is best suited to answer it.
* After the query has been identified, it is transferred to that specific node.
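The routing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `"read"`/`"write"` query kinds, and the keyword-based classification are hypothetical and are not taken from any particular cloud DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    handles: set                      # query kinds this node serves best (hypothetical labels)
    peers: list = field(default_factory=list)

    def classify(self, query: str) -> str:
        """Rough classification by the first SQL keyword (an assumption for this sketch)."""
        first = query.strip().split()[0].upper()
        return "read" if first == "SELECT" else "write"

    def submit(self, query: str) -> str:
        """Accept any query, then run it locally or forward it to a better-suited peer."""
        kind = self.classify(query)
        if kind in self.handles:
            return f"{self.name} executed {kind} query"
        for peer in self.peers:
            if kind in peer.handles:
                return peer.submit(query)
        raise RuntimeError(f"no node can handle a {kind} query")


# Two peers that know about each other; either one can be the entry point.
reader = Node("node-a", {"read"})
writer = Node("node-b", {"write"})
reader.peers.append(writer)
writer.peers.append(reader)

print(reader.submit("INSERT INTO t VALUES (1)"))  # node-b executed write query
```

The client does not need to know which node is responsible: it can contact any peer, and the query ends up on the node that declares itself suitable.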
* The figure above shows how a node fetches data from DBMS data and files.
* In addition, the cloud DBMS (CDBMS) maintains its own store for data that is frequently used by the nodes, which improves CDBMS performance.
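One way to picture the store of frequently used data is a small cache in front of the slower backing storage. The sketch below assumes a least-recently-used eviction policy; the slides do not say which policy a CDBMS uses, and `CachingNode` and its fields are hypothetical names.

```python
from collections import OrderedDict


class CachingNode:
    """Node that keeps recently used rows in memory (LRU policy assumed)."""

    def __init__(self, backend: dict, capacity: int = 2):
        self.backend = backend        # stands in for the DBMS data and files
        self.capacity = capacity
        self.cache = OrderedDict()    # key -> value, oldest entry first
        self.backend_reads = 0        # counts how often the slow path is taken

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            return self.cache[key]
        self.backend_reads += 1              # cache miss: go to the backend
        value = self.backend[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return value


node = CachingNode({"a": 1, "b": 2, "c": 3})
node.get("a")
node.get("a")                 # served from the cache, no backend read
print(node.backend_reads)     # 1
```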
* A cloud database is expected to receive a huge number of queries, and as the number of queries grows, the CDBMS may face performance issues.
* A cloud DBMS has many nodes, but these nodes are not always enough.
* A query overload needs to be handled immediately. For this purpose, the CDBMS instantly starts a new node that shares the query load on the database.
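The scale-out rule can be expressed as a simple capacity calculation. The following is a minimal sketch, assuming each node can serve a fixed number of pending queries; `capacity_per_node`, `Cluster`, and the node names are hypothetical, and a real CDBMS would also release nodes when the load drops.

```python
import math


def nodes_needed(pending_queries: int, capacity_per_node: int) -> int:
    """Smallest number of nodes that covers the pending load (at least one)."""
    return max(1, math.ceil(pending_queries / capacity_per_node))


class Cluster:
    def __init__(self, capacity_per_node: int):
        self.capacity_per_node = capacity_per_node
        self.nodes = ["node-1"]

    def rebalance(self, pending_queries: int) -> list:
        """Start new nodes until the cluster can absorb the pending queries."""
        target = nodes_needed(pending_queries, self.capacity_per_node)
        while len(self.nodes) < target:
            self.nodes.append(f"node-{len(self.nodes) + 1}")
        return self.nodes


cluster = Cluster(capacity_per_node=100)
print(cluster.rebalance(250))  # ['node-1', 'node-2', 'node-3']
```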
* In a CDBMS, different nodes may be involved in resolving each query. The most effective way to handle the database is to use distributed queries.
* A distributed query can be understood as a combination of many sub-queries, each of which contacts one of the distributed nodes to retrieve information. Because there are several sub-queries, there are also several partial results; since the answers are distributed, they are joined at the end.
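This pattern is often called scatter-gather: send the same sub-query to every node, then merge the partial answers. The sketch below assumes each node is just an in-memory list of rows; `query_node`, `distributed_query`, and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def query_node(rows, predicate):
    """Sub-query executed on one node: filter that node's local rows."""
    return [row for row in rows if predicate(row)]


def distributed_query(nodes, predicate):
    """Scatter the sub-query to all nodes in parallel, then join the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda rows: query_node(rows, predicate), nodes)
        # pool.map yields results in node order, so the merge is deterministic.
        return [row for part in partials for row in part]


# Each inner list plays the role of one node's share of the data.
nodes = [
    [{"id": 1, "city": "Ankara"}, {"id": 2, "city": "Izmir"}],
    [{"id": 3, "city": "Ankara"}],
    [{"id": 4, "city": "Bursa"}],
]
result = distributed_query(nodes, lambda row: row["city"] == "Ankara")
print([row["id"] for row in result])  # [1, 3]
```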
* Reduced costs: You do not need a high-powered, high-priced computer to run cloud computing's web-based applications. In fact, your PC in this scenario does not even need a CD or DVD drive, as no software has to be loaded and no document files need to be saved locally. Instead of purchasing expensive software applications, you can get much of what you need for free. There's no need to spend big money on hardware, software, or licensing fees. One-time payment and pay-as-you-go options make the cloud even more attractive.
*Instant software updates: Another advantage to cloud computing is that you are no longer faced with choosing between obsolete software and high upgrade costs. When the application is web-based, updates happen automatically. When you access a web-based application, you get the latest version.
* Unlimited storage capacity: Cloud computing offers virtually limitless storage.
* Increased data reliability: Unlike desktop computing, in which a hard disk crash can destroy all your valuable data, a computer crashing in the cloud should not affect the storage of your data. Backing up and recovering data is simplified, since the data now resides in the cloud and not on a physical device.
* Universal document access: Leaving a document behind on another machine is not a problem with cloud computing, because you do not take your documents with you. Instead, they stay in the cloud, and you can access them whenever you have a computer and an Internet connection.
* Less training: It takes fewer people to do more work on a cloud, with a minimal learning curve on hardware and software issues.
* Device independence: Applications provided through the cloud can be accessed from any device: a computer, a smartphone, an iPad, etc.
* Easier group collaboration: Sharing documents leads directly to better collaboration.
* Environmentally friendly: The cloud is in general more efficient than typical IT infrastructure and takes fewer resources to compute, thus saving energy.
* Requires a constant Internet connection: Cloud computing is impossible if you cannot connect to the Internet.
* Can be slow: Even with a fast connection, web-based applications can sometimes be slower than accessing a similar software program on your desktop PC. Everything about the program, from the interface to the current document, has to be sent back and forth between your computer and the computers in the cloud.
* Features might be limited: This situation is bound to change, but today many web-based applications simply are not as full-featured as their desktop-based counterparts. For example, you can do a lot more with Microsoft PowerPoint than with Google Presentation's web-based offering.
* Stored data might not be secure: Security is the biggest concern when it comes to cloud computing. With cloud computing, all your data is stored in the cloud. The question is: "How secure is the cloud?"
* Dependency and vendor lock-in: One of the major disadvantages of cloud computing is the implicit dependency on the provider. If a user wishes to switch to another provider, transferring large amounts of data from the old provider to the new one can be painful and cumbersome.
* Stored data can be lost: In theory, data stored in the cloud is safe because it is replicated across multiple machines. But on the off chance that your data goes missing, you have no physical or local backup. Put simply, relying on the cloud puts you at risk if the cloud lets you down.
End of part 2
* Handling Big Data:
  - Traditional relational database management systems are hard to scale out because of dependencies between tables arising from foreign keys.
  - Cloud-based applications usually do not have access to centralized, high-performance servers, but instead to a large number of distributed commodity systems.
  - Huge datasets require data entities to be distributed and processed independently. Because of their ACID guarantees, relational databases have difficulty achieving this.
  - Solution: NoSQL ("Not only SQL")
* Characteristics:
  - schema-free
  - easy replication support
  - simple API
  - eventually consistent / BASE (not ACID)
  - huge amounts of data
* The name is read as "Not only SQL" to emphasize that NoSQL systems may also support SQL-like query languages.
* The idea behind NoSQL: by giving up ACID constraints, one can achieve much higher performance and scalability.
* NoSQL offers a schema-free storage solution with limited query capabilities, which enables extreme scale-out through easy data replication.
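To make "schema-free" and "eventually consistent" concrete, here is a toy model of two replicas that accept writes independently and converge when they synchronize. It assumes a last-write-wins rule based on a version number; `Replica`, `sync`, and the sample documents are hypothetical and do not correspond to any specific NoSQL product.

```python
class Replica:
    """One copy of the data; documents are plain dicts with no fixed schema."""

    def __init__(self):
        self.data = {}   # key -> (version, document)

    def write(self, key, version, doc):
        current = self.data.get(key)
        # Last-write-wins: keep the document with the highest version.
        if current is None or version > current[0]:
            self.data[key] = (version, doc)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions so that the replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, doc) in list(src.data.items()):
            dst.write(key, version, doc)


r1, r2 = Replica(), Replica()
r1.write("u1", 1, {"name": "Ada"})
stale = r2.read("u1")                 # None: r2 has not seen the write yet
sync(r1, r2)
fresh = r2.read("u1")                 # now visible on r2
# A later write may add fields freely, since there is no schema to migrate.
r2.write("u1", 2, {"name": "Ada", "email": "ada@example.com"})
sync(r1, r2)
```

Between the write and the `sync` call, the two replicas disagree; that window is what "eventually consistent" accepts in exchange for letting each replica answer without coordinating.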
* IDC predicts that big data is growing at an annual rate of 60% for structured and unstructured data.
* The cloud computing model is a good match for big data, since cloud computing provides virtually unlimited resources on demand.
* With cloud technology, providers are rolling out more ways to host these databases in the public cloud, freeing users from dedicating their own hardware to them while providing the ability to scale the databases to large capacities.
* A common feature of many NoSQL databases is that data is automatically redistributed to new machines when they are added to the cluster, which also improves performance.
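One common technique for redistributing data when a machine joins is consistent hashing, sketched below. The slides do not name the mechanism, so treat this as an assumption; `HashRing`, the virtual-node count, and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes    # virtual points per node, to spread the load
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> node name

    def add_node(self, node: str):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        """A key belongs to the first ring position after its hash (wrapping around)."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
ring.add_node("node-a")
ring.add_node("node-b")
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-c")             # a new machine joins the cluster
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys that now fall to `node-c` change owner; everything else stays where it was, which is why adding a machine does not force a full reshuffle.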
* Virtual machine image
  - Cloud platforms allow users to rent virtual machine instances for a limited time.
  - It is possible to run a NoSQL database on these virtual machines.
  - Users can upload their own machine image with a database installed on it, use ready-made machine images that already include an optimized installation of a database, or install the NoSQL database on a running machine instance.
* Database as a service
  - Some cloud platforms offer familiar NoSQL database products, such as MongoDB, Redis, and Cassandra, as a service, without the user having to launch a virtual machine instance for the database.
  - Application owners do not have to install and maintain the database themselves, and they pay according to usage.
  - Some database-as-a-service providers offer additional features, such as clustering or high availability, that are not available in the on-premises version of the database.
* Native cloud NoSQL databases
  - Some providers offer a NoSQL database service that is available only in the cloud.
  - A well-known example is Amazon's SimpleDB, a simple NoSQL key-value store.
  - SimpleDB cannot be installed on a local machine and cannot be used on any cloud platform except Amazon's.
* AWS has a variety of cloud-based database services, including both relational and NoSQL databases.
* Amazon Relational Database Service (RDS): runs MySQL, Oracle, SQL Server, or PostgreSQL database engines and scales compute and storage.
* Amazon SimpleDB: a NoSQL database service for smaller datasets.
* Amazon DynamoDB: a scalable NoSQL database service. It is a solid-state drive (SSD)-backed database that automatically replicates workloads across at least three availability zones.
* An animation about NoSQL databases and DynamoDB: http://www.youtube.com/watch?v=oz-7wjj9hz0
* Microsoft uses its SQL Server technology to provide a relational database, allowing customers to access either a SQL database on its cloud or hosted SQL Server instances on virtual machines.
* Microsoft also emphasizes hybrid databases that combine data on a customer's premises with data in the Azure cloud through SQL Data Sync.
* Microsoft also has a cloud-hosted NoSQL database service named Tables, while Blobs (binary large object storage) are optimized for media files such as audio and video.
* Google Cloud SQL is a MySQL database.
* It is easy to use, doesn't require any software installation or maintenance, and is ideal for small to medium-sized applications.
* All data is replicated in multiple locations for high availability and durability.
* Choice of billing options:
  - The per-use option means you only pay for the time you access your data.
  - The package option allows you to control your costs for more frequent access.
* Up to 16GB of RAM and 100GB of data storage
* Create and manage instances in the Google Developers Console
* Data stored in datacenters in the EU or the US
* Synchronous or asynchronous geographic replication
* Java and Python compatibility
* Support for connecting with the Secure Sockets Layer (SSL) protocol
* Cassandra runs on major cloud platforms such as Google Cloud Platform.
* Open sourced by Facebook in 2008.
* A NoSQL solution.
* Uses the virtual machine image deployment model.
* A highly scalable, eventually consistent, distributed, structured key-value store.
* Keys map to multiple values, which are grouped into column families.
* Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
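The phrase "keys map to multiple values, which are grouped into column families" can be pictured as nested dictionaries. The sketch below only illustrates the shape of the data; it is not Cassandra's API, and `keyspace`, `insert`, `get`, and the family names are hypothetical.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
keyspace = defaultdict(lambda: defaultdict(dict))


def insert(family, row_key, columns):
    """Add or overwrite columns for one row key inside one column family."""
    keyspace[family][row_key].update(columns)


def get(family, row_key, column=None):
    """Return the whole row, or a single column when one is named."""
    row = keyspace[family].get(row_key, {})
    return row if column is None else row.get(column)


# The same row key carries different sets of columns in different families.
insert("profile", "u1", {"name": "Ada"})
insert("profile", "u1", {"city": "Ankara"})
insert("activity", "u1", {"last_login": "2014-01-01"})

print(get("profile", "u1"))   # {'name': 'Ada', 'city': 'Ankara'}
```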
* MongoDB is a NoSQL solution.
* MongoLab gives users access to MongoDB on a variety of major cloud providers, including AWS, Azure, and Joyent.
* Like other gateway-type services, MongoLab also integrates with various platform as a service (PaaS) tools at the application tier.
* MongoLab runs on either shared or dedicated environments, with the latter being slightly more expensive.
* Rackspace's database product, Cloud Databases, comes as either a cloud or a managed hosted offering.
* Rackspace emphasizes the container-based virtualization of Cloud Databases, which it says allows for higher database performance than if the service were run entirely on virtualized infrastructure.
* Cloud Databases also incorporates a SAN storage network and is based on an OpenStack platform.
* Rackspace has announced a NoSQL database in its cloud from the provider Cloudant.
* EnterpriseDB focuses on the open source PostgreSQL database.
* It works with Oracle database applications.
* With EnterpriseDB's Postgres Plus Advanced Server, organizations can run applications written for on-premises Oracle databases on EnterpriseDB, which runs in clouds from Amazon Web Services and HP.
* It also offers binary replication and scheduled backups.
* Unlike other databases in the cloud, StormDB runs its fully distributed, relational database on bare-metal servers, meaning there is no virtualization of machines.
* StormDB officials claim this leads to better performance and easier management because users do not have to choose the size of the virtual machine instances their database runs on.
* Despite running on bare metal, customers do share clusters of servers, although StormDB promises there is isolation among customer databases.
* StormDB also automatically shards databases in its cloud. The company is currently in a free beta.