CHOOSING A DATABASE-AS-A-SERVICE
AN OVERVIEW OF OFFERINGS BY MAJOR PUBLIC CLOUD SERVICE PROVIDERS

Warner Chaves, Principal Consultant, Microsoft Certified Master, Microsoft MVP

With contributors:
Danil Zburivsky, Director of Big Data and Data Science
Vladimir Stoyak, Principal Consultant for Big Data, Certified Google Cloud Platform Qualified Developer
Derek Downey, Practice Advocate, Open Source Databases
Manoj Kukreja, Big Data and IT Security Specialist, CISSP, CCAH and OCP

When it comes to running your data in the public cloud, there is a range of Database-as-a-Service (DBaaS) offerings from all three major public cloud providers. Knowing which is best for your use case can be challenging. This paper provides a high-level overview of the main DBaaS offerings from Amazon, Microsoft, and Google. After reading this white paper, you'll have a high-level understanding of the most popular data repositories and data analytics service offerings from each vendor, you'll know the key differences among the offerings, and you'll know which ones are best for each use case. With this information, you can direct your more detailed research to a manageable number of options.

This white paper does not discuss private cloud providers or colocation environments, streaming, data orchestration, or Infrastructure-as-a-Service (IaaS) offerings. This paper is targeted at IT professionals with a good understanding of databases, as well as business people who want an overview of data platforms in the cloud.

WHAT IS A DBAAS OFFERING?

A DBaaS is a database running in the public cloud. Three things define a DBaaS:

- The service provider installs and maintains the database software, including backups and other common database administration tasks.
- The service provider also owns and manages the operating system, hypervisors, and bare-metal hardware.
- Application owners pay according to their usage of the service. Usage of the service must be flexible: users can scale up or down on demand and also create and destroy environments on demand. These operations should be possible through code with no provider intervention.

FOUR CATEGORIES OF DBAAS OFFERINGS

To keep things simple, we've created four categories of DBaaS offerings. Your vehicles of choice are:

- The Corollas: These are the classic RDBMS services in the cloud: Amazon Relational Database Service (RDS), Microsoft Azure SQL Database, and Google Cloud SQL.
- The Formula One offerings: These special-purpose offerings ingest and query data very quickly but might not offer all the amenities of the Corollas. Options include Amazon DynamoDB, Microsoft Azure DocumentDB, Google Cloud Datastore, and Google Cloud Bigtable.
- The 18-wheelers: These data warehouses of structured data in the cloud include Amazon Redshift, Microsoft Azure SQL Data Warehouse, and Google BigQuery.
- The container ships: These Hadoop-based big-data systems can carry anything, and include Amazon Elastic MapReduce (EMR), Microsoft Azure HDInsight, and Google Cloud Dataproc. This category also includes the further automated offering of Azure Data Lake.

The rest of this white paper discusses each category and the Amazon, Microsoft, and Google offerings within each category. We describe each offering, explain what it is well suited for, provide expert tips or additional relevant information, and give high-level pricing information.

COROLLAS

With the Corollas, just like with the car, you know what you're getting and you know what to expect. This type of classic RDBMS service gets you from point A to point B reliably. It's not the flashiest or newest thing on the block, but it gets the job done.

AMAZON RDS

Amazon Relational Database Service (RDS) is the granddaddy of DBaaS offerings available on the Internet. RDS is an automation layer that Amazon has built on top of MySQL, MariaDB, Oracle, PostgreSQL, and SQL Server. Amazon has also developed its own MySQL fork called Amazon Aurora, which also lives inside RDS.

RDS is an easy way to transition into DBaaS because the service mimics the on-premises experience very closely. You simply provision an RDS instance, which maps very closely to the virtual machine models that Amazon offers. Amazon then installs the bits, manages patches and backups, and can also manage high availability, so you do not need to plan and execute these tasks yourself. RDS is very good for lift-and-shift cloud migrations. It makes it easy for existing staff to take advantage of the service because it mimics the on-premises experience, be it physical or virtual.

The storage is very flexible: this is both a pro and a con. The pro is that you have a lot of control over storage. The con is that there are so many storage options, you need the knowledge to choose the best one for your use case. Amazon has general-purpose storage, provisioned IOPS (input/output operations per second), and two categories of magnetic storage. The storage method you choose will depend on your particular use cases.

Be aware that Amazon does not make every patch version of all products available on RDS. Instead, Amazon makes only some major service packs or Oracle patch levels available. As a result, the exact patch level that you have on premises might not map to a patch level on RDS. In this situation, do not move to a patch level below the one you have, because that may result in product regressions. Instead, wait until Amazon has deployed a patch level higher than what you have. At that point, it should be fairly safe to start testing if you want to migrate to RDS.
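As a concrete illustration, here is a minimal sketch of provisioning an RDS instance with the AWS SDK for Python (boto3). The identifier, credentials, and sizing values are hypothetical; pick the instance class and storage option that fit your workload.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Amazon installs the bits and manages patching, backups, and (with
# MultiAZ) high availability for this instance.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",       # hypothetical instance name
    Engine="mysql",
    DBInstanceClass="db.m4.large",       # maps closely to an EC2 VM model
    AllocatedStorage=100,                # storage size in GB
    StorageType="io1",                   # provisioned IOPS storage option
    Iops=1000,
    MultiAZ=True,                        # Amazon manages standby and failover
    MasterUsername="admin",
    MasterUserPassword="...",            # placeholder credential
)
```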

The hourly rate for RDS depends on:

- whether you bring your own license or Amazon leases you the license;
- how much compute power you choose: the number of cores and the amount of memory and temporary disk on the instance;
- the storage you require; and
- whether you pre-purchased with Reserved Instances.

MICROSOFT AZURE SQL DATABASE

Microsoft Azure SQL Database is a cloud-first SQL Server fork. The term "cloud-first" means that Microsoft now tests and deploys its code continuously with Azure SQL Database, and the code and lessons learned are implemented in the retail SQL Server product, whether that product runs on premises or on a virtual machine.

Even if you don't have any investment in SQL Server, Azure SQL Database is an excellent DBaaS platform because of the investments made to support elastic capabilities and ease of scaling horizontally. As you need more capacity, you just add more databases. It's also easy to manage the databases by pooling resources, performing elastic queries, and performing elastic job executions. You could deploy your own code to do something similar in Amazon RDS, but in Azure SQL Database, Microsoft has already built it for you. In addition, Azure SQL Database makes it easy to build an elastic application on a relational service. This capability supports the Software-as-a-Service (SaaS) model, wherein you have many clients and each has a database. The SaaS provider gets a data layer that is easier to manage and scale than if they were running on their own infrastructure.

Unlike Amazon RDS, Azure SQL Database does not map exactly to a retail database product, such as Oracle, SQL Server, or open-source MySQL. It is closely related to SQL Server but is not licensed or sold in a similar way. As a result, Azure SQL Database does not have any licensing component. At the same time, Azure SQL Database does not give you much control over the hardware. With Amazon RDS, you need to select CPUs, memory, and your storage layout. Azure SQL Database does all this for you. The only thing you need to choose is the service tier, which determines how much power your database has. There are three service tiers: Basic, Standard, and Premium. Each also has sub-tiers to increase or decrease performance. If you have many databases in Azure SQL Database, you can also choose the elastic database pool pricing option to increase your savings by sharing resources.
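Because the service tier is the one knob you control, scaling is a single T-SQL statement. Here is a minimal sketch using pyodbc; the server, database, and credentials are hypothetical, and this assumes the Microsoft ODBC Driver for SQL Server is installed.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # hypothetical logical server
    "DATABASE=master;UID=admin_user;PWD=..."  # placeholder credentials
)
conn.autocommit = True  # ALTER DATABASE cannot run inside a transaction

# Move the database up to a bigger Standard sub-tier ahead of a busy period,
# paying the higher hourly rate only while you need it.
conn.execute("ALTER DATABASE appdb MODIFY (SERVICE_OBJECTIVE = 'S3')")
```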

Azure SQL Database is a good choice if you already have Transact-SQL (T-SQL) skills in-house. If you have a large investment in SQL Server, Azure SQL Database is the most natural way to take advantage of DBaaS offerings in the cloud. It's also a very good web-scale relational service in its own right because of all the investments made to support the SaaS model.

You do need to ensure that you do proper SQL tuning to be able to choose the right service tier for your needs. In the past, scaling up was difficult because all equipment was on premises. Now it's very easy to increase the power of the service, and therefore to pay more money. However, just because scaling up is easy does not mean it's always what you need to do. If you perform proper SQL tuning, you will not need to pay more for raw power.

Azure SQL Database has a simple pricing model. You pay an hourly rate for the service tier your database is running on: Basic, Standard, or Premium. Each has a different size limit for the database and provides more performance as you go up the tiers.

GOOGLE CLOUD SQL

Google Cloud SQL is a managed MySQL database service that is very similar to Amazon RDS for MySQL and Amazon Aurora. You select an instance and deploy it without needing to install any software. Cloud SQL automates all your backups, replication, patches, and updates anywhere in the world while ensuring high availability. Automatic failover ensures your database will be available when you need it. Cloud SQL Second Generation introduces per-minute, pay-per-use billing, automatic sustained-use discounts, and instance tiers to fit any budget.

Cloud SQL does have restrictions on:

- anything related to loading or dumping the database to a local file system,
- installing plugins,
- creating user-defined functions,
- the performance schema,
- SUPER privileges, and
- storage engines: InnoDB is the only engine supported for Second Generation instances.
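Because Cloud SQL speaks the standard MySQL wire protocol, existing MySQL client code works unchanged. A minimal sketch with the PyMySQL driver; the host, credentials, and database name are hypothetical.

```python
import pymysql  # any standard MySQL driver works against Cloud SQL

conn = pymysql.connect(
    host="203.0.113.10",      # hypothetical Cloud SQL instance IP
    user="app_user",
    password="...",           # placeholder credential
    database="appdb",
)
with conn.cursor() as cur:
    cur.execute("SELECT @@version")  # it really is MySQL under the hood
    print(cur.fetchone())
```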

Pricing for Cloud SQL Second Generation is made up of three components: instance pricing, storage pricing, and network pricing. The instance charge is based on the machine type you choose; storage and network usage are billed separately.

FORMULA ONE OFFERINGS

The Formula One DBaaS offerings are fit-for-purpose offerings. They do not have all the functionality of the mature RDBMS products, but they do a limited number of things very well. A Formula One car is built purely for speed. It does not have a cup holder, heated seats, or satellite radio. However, it's fit for purpose, and that purpose is to go fast. (Admittedly, you might miss some of the amenities that you are used to in a regular car.) Similarly, the Formula One DBaaS offerings are built for a purpose: to ingest and query data very quickly. Think of them as NoSQL in the cloud. The NoSQL movement was popularized by large web applications such as Google and Facebook as a way to differentiate their database platforms from the classic RDBMS offerings. NoSQL products usually handle horizontal scalability with more ease, have more relaxed restrictions on schema (if any), and forego some of the ACID requirements as a trade-off for more speed.

AMAZON DYNAMODB

Amazon DynamoDB is a very popular service offered through Amazon Web Services (AWS). It's essentially a NoSQL document/key-value table store. All you need to define is the table and either its key, or its key and sort order. The schema is completely flexible and is up to you. DynamoDB is best suited for applications with known query patterns that don't require complex transactions and that ingest large volumes of data. DynamoDB is built for scale-out growth of high-ingest applications because the Amazon scale-out architecture guarantees that you will not run out of space. You don't need to worry about the scale-out; you just need to know that this is how Amazon has architected the service. For example, when you specify a partition key for records, they will all be distributed to the same nodes that Amazon builds transparently behind the scenes for your data.

This offering does not have an optimizer, so it does not support ad hoc SQL querying the way a relational product does. It's more like a set of normalized instantiated views based on the indexes that you have created on your data. Querying is not done with SQL; it is performed through a different type of specification. Amazon provides SDKs in many languages, including Java, .NET, and Python, and you use these SDKs to develop queries.
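A minimal sketch of the SDK-based access pattern with boto3. The table, key names, and item contents are hypothetical, and the table is assumed to already exist with "user_id" as the partition key and "event_time" as the sort key.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("user_events")  # hypothetical table

# No fixed schema: items can carry nested maps and lists alongside the keys.
table.put_item(Item={
    "user_id": "u123",           # partition key
    "event_time": 1471000000,    # sort key
    "payload": {"page": "/home", "tags": ["mobile", "new"]},
})

# Queries are expressed against the keys, not as ad hoc SQL.
resp = table.query(KeyConditionExpression=Key("user_id").eq("u123"))
for item in resp["Items"]:
    print(item["event_time"], item["payload"])
```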

Developing queries this way does require a bit of learning, but it's not a major time investment.

Although DynamoDB does not have a fixed schema, it does support complex schemas. For example, fields are denormalized: some fields can be lists, others can be maps or sets. The service also exposes a stream-based API, so if you need to replicate data changes from DynamoDB to another system, you can do so through that API.

Because this service does not support ad hoc querying, your schema can have a huge impact on what your application is allowed to do. DynamoDB also has a finite number of indexes that you can apply: five global indexes for each table. You need to keep in mind the indexing limits and the lack of an optimizer, and ensure that your schema will be able to support your future application requirements.

The cost of DynamoDB is based on storage, how much data you have, and the I/O rate: your number of requests for read units and write units. If you have any streams, you will also pay for the streams read rate.

MICROSOFT AZURE DOCUMENTDB

Microsoft Azure DocumentDB is a NoSQL document database that is essentially a repository for JSON (JavaScript Object Notation) documents. JSON documents have no schema restrictions. They can contain almost any type of field, and they can also have nested fields. This DBaaS is NoSQL denormalized, with built-in support for partitioned collections: you can specify a field in the JSON documents and Azure DocumentDB will partition the documents based on that field. Azure DocumentDB also has built-in geo-replication support, so you can have, for example, a DocumentDB collection reading and writing on the east coast of the United States and a replica of that collection available for reads in the central United States. If there's an issue with the DocumentDB on the east coast, you can fail over to the other geo-region for very high availability.

Azure DocumentDB is a good choice for JSON-based storage, and it's very easy to set up and start storing documents. Retrieval is also easy because the database supports full-blown SQL-style queries, so you don't need to learn any new query language.
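A minimal sketch of storing and querying JSON documents, assuming the pydocumentdb package (the Python SDK for DocumentDB at the time of writing); the account URI, key, and collection link are hypothetical, and the exact method names may differ between SDK versions.

```python
import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    "https://myaccount.documents.azure.com:443/",  # hypothetical account
    {"masterKey": "..."})                          # placeholder key

coll_link = "dbs/appdb/colls/orders"  # hypothetical database/collection

# JSON documents go in as-is: no schema restrictions, nesting allowed.
client.CreateDocument(coll_link,
                      {"id": "1", "customer": "acme",
                       "lines": [{"sku": "A1", "qty": 3}], "total": 42.5})

# Retrieval uses familiar SQL-style syntax rather than a new query language.
for doc in client.QueryDocuments(
        coll_link, "SELECT * FROM orders o WHERE o.customer = 'acme'"):
    print(doc["total"])
```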

If you don't specify any indexes, the system applies automatic indexing policies. However, keep in mind that indexing consumes storage: the more indexes you have, the more storage you consume and pay for. Some of that storage could be going to indexes you never use, so ensure that the automatic indexing policies work for your use case. If it doesn't make sense to index a field because you never search on it, you can disable that index through a custom policy. Also, partitioned collections have limits: each partition key can hold no more than 10 GB of documents. If you need more than this amount per partition key, you will probably want to design with a very high-granularity partition key.

Azure DocumentDB offers some pre-defined tiers for billing based on common usage patterns. However, if you want to customize the system, you can easily select your individual compute power, referred to as request units, plus the amount of storage that you want for the collection.

GOOGLE CLOUD DATASTORE

Google Cloud Datastore is Google's version of a NoSQL cloud service, similar to Amazon DynamoDB and Microsoft Azure DocumentDB. From an architecture perspective, Cloud Datastore is similar to other key/value stores. The data model is organized around entities, which loosely resemble rows in a relational table. Entities can have multiple properties, but no rigid schema is imposed on them: two entities of a similar type don't need to have the same number or type of properties. An interesting feature of Cloud Datastore is built-in support for hierarchical data.

In addition to all the properties you would expect from a cloud NoSQL DBaaS, such as massive scalability, high availability, and flexible storage, Cloud Datastore also supports some unique capabilities, including out-of-the-box transaction support and encryption at rest. Google also provides tight integration of Cloud Datastore with other Google Cloud Platform services. Applications running in Google App Engine can use Cloud Datastore as their default database. You can also load data from Cloud Datastore into Google BigQuery for analytics purposes.

There are multiple ways to access data in Cloud Datastore. There are client libraries for most popular programming languages as well as a REST interface. Google also provides GQL, a query language roughly modelled on SQL that can ease the transition from relational databases to the NoSQL world.
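A minimal sketch using the google-cloud-datastore client library; the project, kind, and property names are hypothetical.

```python
from google.cloud import datastore

client = datastore.Client(project="my-project")  # hypothetical project

# Entities have no rigid schema: properties can vary from entity to entity.
task = datastore.Entity(key=client.key("Task"))
task.update({"owner": "warner", "done": False, "tags": ["cloud", "dbaas"]})
client.put(task)

# Single-property queries work without any extra index configuration.
query = client.query(kind="Task")
query.add_filter("done", "=", False)
for entity in query.fetch():
    print(entity["owner"], entity["tags"])
```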

Cloud Datastore automatically indexes all properties of an entity, making simple single-property queries possible without any additional configuration. More complex multi-property indexes can be created by defining them in a special configuration file.

Similar to other cloud NoSQL services, Cloud Datastore is priced according to the amount of storage the database requires and the number of operations it performs. Google defines prices for reads, writes, and deletes per 100,000 entities. However, simple requests such as fetching an entity by its key (a very common operation) are free.

GOOGLE CLOUD BIGTABLE

Google Cloud Bigtable is Google's NoSQL big-data database service. It's the cloud version of the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail. Bigtable is designed to handle massive workloads at consistently low latency and high throughput, so it's a great choice for both operational and analytical applications, including Internet of Things (IoT) use cases, user analytics, and financial data analysis.

This public cloud service gives you instant access to all the engineering effort that Google has put into Bigtable over the years. The Apache HBase-like database is flexible and robust, and lacks some of HBase's inherited issues, such as Java garbage collection stalls. In addition, Cloud Bigtable is completely managed, so you don't need to provision hardware, install software, or handle failures.

Cloud Bigtable does not have strong typing; it's basically a massive key-value table. As data comes in, it is treated as binary strings. This DBaaS also does not offer any querying through SQL: you have the key, and with it you can get the value. Cloud Bigtable is also built for very large tables, so it's not worth considering for anything less than about 1 terabyte.

Pricing for Cloud Bigtable is based on:

- the number of Cloud Bigtable nodes that you provision in your project for each hour (you will be billed for a minimum of one hour);
- the amount of storage that your tables use over a one-month period; and
- the amount of network bandwidth used. Some types of network egress traffic are subject to bandwidth charges.
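A minimal sketch of the key-in, value-out access pattern with the google-cloud-bigtable client; the instance, table, column family, and row key are hypothetical, and the table is assumed to already exist.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("analytics-inst").table("events")  # hypothetical

# No typing and no SQL: cells are bytes under a column family, addressed
# purely by row key.
row = table.row(b"user123#20160801")
row.set_cell("metrics", b"clicks", b"42")
row.commit()

# Reads are point lookups (or scans) by key.
print(table.read_row(b"user123#20160801"))
```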

18-WHEELERS

The 18-wheelers can handle the heavy load of structured data. These are basically data warehouses in the cloud: they store and easily query large amounts of structured data.

AMAZON REDSHIFT

Amazon Redshift is the granddaddy of the 18-wheeler DBaaS offerings. It is Amazon's modified PostgreSQL with columnar storage. Other columnar storage-type offerings include HPE Vertica, Microsoft SQL Server Parallel Data Warehouse (PDW) and SQL Server columnstores, Oracle Exadata Database Machine (Exadata), and Oracle Database In-Memory. All these technologies achieve excellent compression ratios through columnar storage: instead of storing the data by rows, they store it by columns, which makes scans of the data very fast.

Redshift is a relational massively parallel processing (MPP) data warehouse, so there are multiple nodes rather than just one big machine. The service works with SQL queries and also allows you to write your own modules in Python. Because Redshift is scaled per node, if you need more power you need to add another node. This means you select both compute and storage together, and the service is charged per node, per hour. Redshift gives you a lot of control over specific node configurations, so you can choose how many cores and how much memory the nodes have. You can also decide whether to pay more for the fastest storage on the nodes through solid-state drives (SSDs), or save some money by using hard-drive-based storage attached to the nodes.

Redshift is a very good warehousing solution for all your data. If you have a big footprint on AWS, Redshift is definitely the warehousing solution you want. You do need to watch node count and configurations. The ideal configuration of your Redshift cluster depends on your workload and workload patterns, so you need to decide whether it's better to have fewer nodes with really high specs or more nodes with less compute or memory. Based on your analysis, you then need to properly tune Redshift for your workload and warehouse design.

Also be aware of possible copy issues due to Amazon Simple Storage Service (S3) consistency. Amazon recommends that you use manifest files to specify exactly what you want to load, so that you are not simply reading file names off S3 and, because of the eventual consistency, missing a file.
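Redshift speaks the PostgreSQL wire protocol, so a standard driver such as psycopg2 works. A minimal sketch of the manifest-based COPY pattern described above; the cluster endpoint, table, bucket, and IAM role are hypothetical.

```python
import psycopg2

conn = psycopg2.connect(
    host="mycluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
    port=5439, dbname="warehouse", user="admin", password="...")

with conn, conn.cursor() as cur:
    # Load exactly the files listed in the manifest, instead of listing an
    # S3 prefix and risking a missed file due to eventual consistency.
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/loads/sales.manifest'
        CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftLoad'
        MANIFEST GZIP DELIMITER '|'
    """)
```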

Finally, Redshift does require regular maintenance to keep the statistics and tables up to date. If you do any updates or deletes, the service has an operation called VACUUM that keeps the tables optimally organized for fast retrieval.

Redshift is billed by the hour, per node. The cost of each node depends on the configuration of cores, amount of memory, and type of storage you select.

MICROSOFT AZURE SQL DATA WAREHOUSE

Microsoft Azure SQL Data Warehouse is Microsoft's response to Redshift. It's fully relational, with 100 percent SQL-type queries, and highly compatible with SQL Server's T-SQL. If you have SQL Server investments, it is very easy to adopt SQL Data Warehouse. Like Redshift, storage is columnar and the service is MPP. Data is split into storage distributions when you load it. The architecture is distributed, so a query is sent to all the different nodes to help resolve your questions.

Azure SQL Data Warehouse scales compute and storage independently. Unlike Redshift, where you always need to scale by a full node, Azure SQL Data Warehouse allows you to add just compute if compute is all you need. You can also add more storage and keep the same amount of compute. A very powerful capability is that you can pause compute completely. For example, if you don't have much load on your data warehouse during the weekend, you can decide to shut it down and pause it completely for maximum savings.

Azure SQL Data Warehouse is an excellent enterprise warehousing solution, particularly if you already have a lot of data on Azure services. If you have a pause-friendly workload, this service will provide very good savings. Unlike Redshift, which gives you a lot of control over the configuration of the nodes, Azure SQL Data Warehouse gives you no control over hardware. It's 100 percent Platform-as-a-Service (PaaS). You simply select a compute unit, called a Data Warehousing Unit (DWU), and the number of DWUs gives you an idea of the power you get for the data warehouse.
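Because compute scales independently, resizing is again a single T-SQL statement. A minimal sketch with pyodbc; the server name and DWU target are hypothetical, and note that pausing compute entirely is done through the portal, PowerShell, or the REST API rather than through T-SQL.

```python
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydwserver.database.windows.net;"  # hypothetical server
    "DATABASE=master;UID=admin_user;PWD=...")  # placeholder credentials
conn.autocommit = True

# Scale compute up for the Monday load; storage is unaffected.
conn.execute("ALTER DATABASE mydw MODIFY (SERVICE_OBJECTIVE = 'DW400')")
```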

Be aware that at the time of publication, not all T-SQL data types are supported yet. For example, if you need to store spatial data, you can store it as binary for now, but you won't have full support for all the spatial functions. Before you start a full migration to Azure SQL Data Warehouse, carefully review which functionality is available. However, if your data warehouse holds only regular structured types, it's definitely wise to consider this service now.

Azure SQL Data Warehouse has two separate cost components: storage and compute. Compute is elastic and is billed by the hour based on the number of DWUs you provision.

GOOGLE BIGQUERY

Google BigQuery is a mix of an 18-wheeler and a container ship. A container ship is a big-data, Hadoop-style service. BigQuery is a hybrid because it is based on a structured schema, but at the same time it allows easy integration with Google Cloud Dataproc and applying schema on read over storage. The service supports regular tables with data stored inside the service, as well as virtual tables where you put schema on read. The same goes for external tables: you can map BigQuery to other services inside Google, such as Google Cloud Storage, and have those tables defined inside BigQuery for use in your analytic queries.

BigQuery is Google Cloud Platform's serverless analytics data warehouse, so you do not need to manage hardware, software, or the operating system. Google has replaced its original SQL with a standards-compliant dialect that enables more advanced query planning and optimization. There are also new data types, additional support for timestamps, and extended JOIN support. BigQuery also has a streaming interface: instead of running an Extract, Transform and Load (ETL) process based mostly on fixed-schedule batch processing, you can have a streaming flow that constantly brings inserts directly into BigQuery using an API or the Cloud Dataflow engine.

BigQuery is a very good one-stop shop if you have streaming data, relational data, and file-based data, because it can put schema on read.
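A minimal sketch of the on-demand query model with the google-cloud-bigquery client; the project, dataset, and table are hypothetical. You pay for the bytes the query scans, not for idle compute.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Compute is provisioned on demand for this one query and billed by the
# amount of data it reads.
job = client.query("""
    SELECT page, COUNT(*) AS hits
    FROM `my-project.weblogs.events`   -- hypothetical dataset and table
    WHERE event_date >= '2016-08-01'
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
for row in job.result():
    print(row.page, row.hits)
```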

But watch out for high-compute queries, where Google estimates that a query takes too much compute to resolve at the regular per-query rate: as of August 2016, the limit is 1 terabyte, and above it the extra compute costs $5 per terabyte. You might receive an error message telling you, in effect, that the query needs to run at a higher compute tier. The cost of that query will be higher, so you need to watch out for runaway costs.

Hadoop can be attached to BigQuery tables, but this requires a temporary data copy: the BigQuery Hadoop connector copies data temporarily to Google Cloud Storage (GCS) for Hadoop. Don't be surprised if you incur some GCS costs for this type of operation.

BigQuery is billed per storage and per query, with automatic lower pricing tiers after 90 days of data being idle. Costs are based on the amount of data you have and how much of it you read. If you are streaming data in, you pay extra for it. With BigQuery, you pay only for data read by queries. For example, if you have a 20-TB warehouse in BigQuery but you're only running 1 to 10 queries per day, you pay for only those queries. You do not need to pay for provisioned compute and storage the way you do with Redshift. With Azure SQL Data Warehouse you also pay for compute, but at least you can pause it. BigQuery goes one step further by charging only for the specific queries that you run. As a result, you don't even need to think about starting and pausing compute. You simply use compute on demand whenever you want to run a query.

CONTAINER SHIPS

The container ships are big-data systems that carry everything, in any shape or form. They are really Hadoop-as-a-Service, and this is very attractive because on-premises Hadoop deployments have a high cost to experiment: the high cost of curiosity. You need to build your Hadoop service and have enough storage and enough nodes before you can start your data exploration. If you instead do your data exploration in the cloud, you can let the cloud deploy all the power you need. If you need a very large cluster, you don't need to make any capital expenditures to get up and running. You also don't need to make operational expenditures to manage the cluster. You simply create and destroy as needed, and you pay for storage in the cloud. All the major cloud providers offer this type of service.

All of the container ships follow a similar pattern. You pick a machine model for your nodes, deploy the cluster with a given size by picking how many nodes you want, and then attach the ship to a storage service that it can read data from. The Amazon storage service is S3. The Microsoft Azure services are Azure Data Lake and Azure Blob Storage. Google uses Google Cloud Storage.

After the cluster is deployed, you use it as a Hadoop installation to run MapReduce, Apache Spark, Apache Storm, or any other Hadoop-based service.

AMAZON ELASTIC MAPREDUCE

Amazon Elastic MapReduce (EMR) is a managed Hadoop framework that makes it easy, fast, and cost-effective to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

Amazon EMR releases are packaged using a system based on Apache Bigtop, an open-source project associated with the Hadoop ecosystem. In addition to Hadoop and Spark ecosystem projects, each Amazon EMR release provides components that enable cluster and resource management, interoperability with other AWS services, and additional configuration optimizations for installed software.

Amazon provides the AWS Data Pipeline service for automating recurring clusters: it implements an orchestration layer that automatically starts the cluster, submits jobs, handles exceptions, and tears down the cluster when the job is done.

Amazon charges per hour for EMR. One way to minimize costs is to deploy some of the compute nodes on Spot Instances; this provides savings of up to 90 percent.
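A minimal sketch of launching a transient EMR cluster with boto3, using Spot Instances for the worker nodes as suggested above; the cluster name, instance types, and bid price are hypothetical.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

resp = emr.run_job_flow(
    Name="nightly-etl",                       # hypothetical cluster name
    ReleaseLabel="emr-5.0.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m4.large",
             "InstanceCount": 1, "Market": "ON_DEMAND"},
            # Workers on Spot Instances for savings of up to 90 percent.
            {"InstanceRole": "CORE", "InstanceType": "m4.xlarge",
             "InstanceCount": 4, "Market": "SPOT", "BidPrice": "0.10"},
        ],
        # Tear the cluster down automatically once the steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(resp["JobFlowId"])
```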

MICROSOFT AZURE HDINSIGHT

Microsoft Azure HDInsight is an Apache Hadoop distribution that deploys and provisions managed Hadoop clusters. The service can process unstructured or semi-structured data and has programming extensions for C#, Java, and .NET, so you can use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs. HDInsight is tightly integrated with Excel, so you can visualize and analyze your Hadoop data in compelling new ways using a tool that's familiar to your business users.

HDInsight incorporates R Server for Hadoop, a cloud implementation of one of the most popular programming languages for statistical computing and machine learning. It gives you the familiarity of R with the scalability and performance of Hadoop. HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This lets you do large transactional processing (OLTP) of non-relational data, enabling use cases like interactive websites or having sensor data write to Azure Blob Storage. HDInsight also includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale, and Apache Spark, an open-source project in the Apache ecosystem that can run large-scale data analytics applications in memory.

HDInsight is priced based on storage and the cost of the cluster. The cost of the cluster is an hourly rate per node.

GOOGLE CLOUD DATAPROC

Google Cloud Dataproc is a managed Apache Hadoop, Apache Spark, Apache Pig, and Apache Hive service that lets you use open-source data tools for batch processing, querying, streaming, and machine learning. Dataproc helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. From a networking perspective, Dataproc supports subnets, role-based access, and clusters with no public IP.

Similar to Amazon EMR, Dataproc releases are packaged using a system based on Apache Bigtop, an open-source project associated with the Hadoop ecosystem. Although some of the tools from the Hadoop ecosystem might not be enabled by default, it is very easy to add them to the deployment. One advantage of Dataproc over EMR is how fast a cluster can be deployed: for most configurations the time is less than 90 seconds. Also, after the first 10 minutes there is by-the-minute billing, which makes Dataproc a great contender for building blocks of a more complex ETL pipeline.

Another advantage of Dataproc over other managed Hadoop services is its integration with Google Cloud Storage as an alternative to the Hadoop Distributed File System. This integration provides immediate consistency; by contrast, it usually takes 1 to 3 minutes before files become visible on, for example, S3. Immediate consistency in Dataproc means that the same storage can be accessed across multiple clusters in a consistent manner. There is no global orchestration and scheduling service available from Google yet (similar to AWS Data Pipeline), so a custom Luigi, Oozie, or Airflow deployment will need to be set up and maintained.
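Given the fast deployment and per-minute billing, a common pattern is a short-lived cluster per job. A minimal sketch driving the gcloud CLI from Python; the cluster name, zone, job class, and jar path are hypothetical.

```python
import subprocess

# Create a short-lived cluster (typically ready in under 90 seconds for
# most configurations), run one Spark job against data in Google Cloud
# Storage, then tear the cluster down to stop the billing.
subprocess.check_call([
    "gcloud", "dataproc", "clusters", "create", "etl-cluster",  # hypothetical
    "--zone", "us-central1-a", "--num-workers", "4"])

subprocess.check_call([
    "gcloud", "dataproc", "jobs", "submit", "spark",
    "--cluster", "etl-cluster",
    "--class", "com.example.ETLJob",           # hypothetical job class
    "--jars", "gs://my-bucket/jobs/etl.jar"])  # hypothetical jar location

subprocess.check_call([
    "gcloud", "dataproc", "clusters", "delete", "etl-cluster", "--quiet"])
```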

Google is also still working on deeper integration between Dataproc and Stackdriver, Google's integrated monitoring, logging, and diagnostics tool. An integration at the job level should be available soon; in the meantime, the Dataproc user interface does provide access to the required logs.

Pricing for Dataproc is based on storage and the cost of the cluster. The cost of the cluster is an hourly rate per node.

MICROSOFT AZURE DATA LAKE

Microsoft Azure Data Lake is Microsoft's one step up from Hadoop-as-a-Service. The Azure Data Lake service is separated into storage and analytics. The storage service has no limit on size, including no limit on the size of a single file. The analytics service can run large data jobs on demand, very similar to how BigQuery runs queries on demand.

Because Azure Data Lake is a big-data type of repository, you can mix tables, you can mix files, and you can have external tables. Azure Data Lake does all this through the U-SQL language, which is a mix of SQL and C#. If you have DBAs in your company, or developers who know SQL and C#, it is easy for them to become productive very quickly with Azure Data Lake without needing to learn all the different pieces of the Hadoop ecosystem, such as Pig and Hive. If you do need a full Hadoop cluster, for example to use some Mahout algorithms on your data, you can attach an HDInsight cluster directly to Azure Data Lake and run from that. You also have the option of on-demand analytics through U-SQL.

Analytics can also be scaled dynamically to increase compute. You simply increase the number of analytic units, which are the nodes running your queries. Because analytics are performed per job, you can easily control your cost of using the service. Each time you submit a job, there's a fixed cost.

Azure Data Lake is excellent for leveraging T-SQL and .NET skills to provide Platform-as-a-Service (PaaS) big-data analytics. The barrier to entry for doing big-data analytics is very low in terms of learning new skills. Be aware that this service is still in public preview at the time of this writing. For this reason, it has a limit of 50 analytic units per job and 3 concurrent jobs per account. However, if you have a strong use case, reach out to Microsoft Support, because they can lift these restrictions.
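To give a flavor of the SQL-plus-C# style, here is a hypothetical U-SQL job, shown as a Python string since jobs are submitted through the portal, PowerShell, or the SDKs rather than run locally; the file paths and column names are assumptions.

```python
# A hypothetical U-SQL job: SQL-style set logic over files in the store,
# with C#-style typing on the extracted columns.
USQL_JOB = r"""
@sales =
    EXTRACT customer string, amount decimal
    FROM "/input/sales.csv"
    USING Extractors.Csv();

@totals =
    SELECT customer, SUM(amount) AS total
    FROM @sales
    GROUP BY customer;

OUTPUT @totals
    TO "/output/totals.csv"
    USING Outputters.Csv();
"""
```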

Azure Data Lake has two cost components: storage and jobs. Your total storage cost depends on how much you store and the volume of data transfers. Jobs have a flat rate per job plus a charge based on the number of analytic units, which govern how many compute resources you get.

SUMMARY

When it comes to choosing a DBaaS, you have a variety of options. The Corollas are the classic RDBMS services in the cloud: not flashy, but reliable. The Formula One offerings are built for purpose. They don't have all the functionality of the mature RDBMS products, but they ingest and query data very quickly. The 18-wheelers are data warehouses in the cloud that store and easily query large amounts of structured data. The container ships are big-data systems that carry everything; think of them as Hadoop-as-a-Service.

All of these offerings can improve delivery because all the management tasks are automated. As a result, there's less chance of human error and less chance of quality issues during maintenance. All of the offerings also reduce time-to-market, enable faster ROI, and reduce capital expenditures.

Before you choose a service, you need to understand all of them, then closely consider your requirements. You don't want to deploy DocumentDB, then realize later that what you really needed was an RDBMS service. You don't want to choose Redshift, only to discover that you'd have been better served by BigQuery. Think about your relational data, your NoSQL unstructured data, and your big structured data requirements for warehousing. Maybe you're also adopting big-data analytics. With the right public cloud service for your use case, you can leverage your data to gain insights, then use those insights to gain competitive advantages.

For more information about how Pythian can help you choose the right DBaaS for your needs, please visit:

ABOUT THE AUTHOR

Warner Chaves is a principal consultant at Pythian, a Microsoft Certified Master, and a Microsoft MVP. Warner has been recognized by his colleagues for his ability to remain calm and collected under pressure. His transparency and candor enable him to develop meaningful relationships with his clients, where he welcomes the opportunity to be challenged. Originally from Costa Rica, Warner is fluent in English and Spanish.

CONTRIBUTORS

Danil Zburivsky is Pythian's director of big data and data science. Danil leads a team of big data architects and data scientists who help customers worldwide achieve their most ambitious goals for large-scale data platforms. He is recognized for his expertise in architecting, building, and supporting large mission-critical data platforms using MySQL, Hadoop, and MongoDB. Danil is a popular speaker at industry events and has authored the book Hadoop Cluster Deployment.

Vladimir Stoyak is a principal consultant for big data. Vladimir is a certified Google Cloud Platform Qualified Developer and a principal consultant on Pythian's big data team. He has more than 20 years of expertise working with big data and machine learning technologies including Hadoop, Kafka, Spark, Flink, HBase, and Cassandra. Throughout his career in IT, Vladimir has been involved in a number of startups. He was director of application services for Fusepoint, which was recently acquired by CenturyLink. He also founded AlmaLOGIC Solutions Incorporated, an e-learning analytics company.

Derek Downey is the practice advocate for the open-source database practice at Pythian, helping to align technical and business objectives for the company and for its clients. Derek loves automating MySQL, implementing visualization strategies, and creating repeatable training environments.

Manoj Kukreja is a big data and IT security specialist whose qualifications include a degree in computer science, a master's degree in engineering, and the CISSP, CCAH, and OCP designations. With more than twenty years of experience in the planning, creation, and deployment of complex, large-scale infrastructures, Manoj has worked for large public- and private-sector organizations, including US and Canadian government agencies. Manoj has expertise in NoSQL and big data technologies including Hadoop, MySQL, MongoDB, and Oracle.

ABOUT PYTHIAN

Pythian is a global IT services company that helps businesses become more competitive by using technology to reach their business goals. We design, implement, and manage systems that directly contribute to revenue and business success. Our services deliver increased agility and business velocity through IT transformation, and high system availability and performance through operational excellence. Our highly skilled technical teams work as an integrated extension of our clients' organizations to deliver continuous transformation and uninterrupted operational excellence using our expertise in databases, cloud, DevOps, big data, advanced analytics, and infrastructure management.

Pythian, The Pythian Group, "love your data", pythian.com, and Adminiscope are trademarks of The Pythian Group Inc. Other product and company names mentioned herein may be trademarks or registered trademarks of their respective owners. The information presented is subject to change without notice. Copyright <year> The Pythian Group Inc. All rights reserved.


Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET SOLUTION SHEET Syncsort DMX-h Simplifying Big Data Integration Goals of the Modern Data Architecture Data warehouses and mainframes are mainstays of traditional data architectures and still play a vital

More information

CLOUD COMPUTING PRIMER

CLOUD COMPUTING PRIMER CLOUD COMPUTING PRIMER for Small and Medium-Sized Businesses CONTENTS 1 Executive Summary 2 ABCs of Cloud Computing An IT Revolution 3 The Democratization of Computing Cloud Computing Service Models SaaS

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

CHEM-E Process Automation and Information Systems: Applications

CHEM-E Process Automation and Information Systems: Applications CHEM-E7205 - Process Automation and Information Systems: Applications Cloud computing Jukka Kortela Contents What is Cloud Computing? Overview of Cloud Computing Comparison of Cloud Deployment Models Comparison

More information

When, Where & Why to Use NoSQL?

When, Where & Why to Use NoSQL? When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Ian Choy. Technology Solutions Professional

Ian Choy. Technology Solutions Professional Ian Choy Technology Solutions Professional XML KPIs SQL Server 2000 Management Studio Mirroring SQL Server 2005 Compression Policy-Based Mgmt Programmability SQL Server 2008 PowerPivot SharePoint Integration

More information

New Approaches to Big Data Processing and Analytics

New Approaches to Big Data Processing and Analytics New Approaches to Big Data Processing and Analytics Contributing authors: David Floyer, David Vellante Original publication date: February 12, 2013 There are number of approaches to processing and analyzing

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Modern ETL Tools for Cloud and Big Data Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Agenda Landscape Cloud ETL Tools Big Data ETL Tools Best Practices

More information

Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers

Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers Watson Data Platform Reference Architecture Business

More information

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp. Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020

More information

Shine a Light on Dark Data with Vertica Flex Tables

Shine a Light on Dark Data with Vertica Flex Tables White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,

More information

Your New Autonomous Data Warehouse

Your New Autonomous Data Warehouse AUTONOMOUS DATA WAREHOUSE CLOUD Your New Autonomous Data Warehouse What is Autonomous Data Warehouse Autonomous Data Warehouse is a fully managed database tuned and optimized for data warehouse workloads

More information

CenturyLink for Microsoft

CenturyLink for Microsoft Strategic Partner Alliances CenturyLink for Microsoft EMPOWER REACH AGILITY 2017 CenturyLink. All Rights Reserved. The CenturyLink mark, pathways logo and certain CenturyLink product names are the property

More information

Introduction to K2View Fabric

Introduction to K2View Fabric Introduction to K2View Fabric 1 Introduction to K2View Fabric Overview In every industry, the amount of data being created and consumed on a daily basis is growing exponentially. Enterprises are struggling

More information

FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM

FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM RECOMMENDATION AND JUSTIFACTION Executive Summary: VHB has been tasked by the Florida Department of Transportation District Five to design

More information

Oracle Autonomous Database

Oracle Autonomous Database Oracle Autonomous Database Maria Colgan Master Product Manager Oracle Database Development August 2018 @SQLMaria #thinkautonomous Safe Harbor Statement The following is intended to outline our general

More information

Oracle Exadata: Strategy and Roadmap

Oracle Exadata: Strategy and Roadmap Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended

More information

Accelerate your Azure Hybrid Cloud Business with HPE. Ken Won, HPE Director, Cloud Product Marketing

Accelerate your Azure Hybrid Cloud Business with HPE. Ken Won, HPE Director, Cloud Product Marketing Accelerate your Azure Hybrid Cloud Business with HPE Ken Won, HPE Director, Cloud Product Marketing Mega trend: Customers are increasingly buying cloud services from external service providers Speed of

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its

More information

Transform your data estate with cloud, data and AI

Transform your data estate with cloud, data and AI Transform your data estate with cloud, data and AI The world is changing Data will grow to 44 ZB in 2020 Today, 80% of organizations adopt cloud-first strategies AI investment increased by 300% in 2017

More information

SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility

SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility Control Any Data. Any Cloud. Anywhere. SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility Understanding SoftNAS Cloud SoftNAS, Inc. is the #1 software-defined

More information

Total Cost of Ownership: Benefits of the OpenText Cloud

Total Cost of Ownership: Benefits of the OpenText Cloud Total Cost of Ownership: Benefits of the OpenText Cloud OpenText Managed Services in the Cloud delivers on the promise of a digital-first world for businesses of all sizes. This paper examines how organizations

More information

Cloud Computing: Making the Right Choice for Your Organization

Cloud Computing: Making the Right Choice for Your Organization Cloud Computing: Making the Right Choice for Your Organization A decade ago, cloud computing was on the leading edge. Now, 95 percent of businesses use cloud technology, and Gartner says that by 2020,

More information

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation

More information

Why All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts

Why All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts White Paper Analytics & Big Data Why All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts Table of Contents page Compression...1 Early and Late Materialization...1

More information

CloudSwyft Learning-as-a-Service Course Catalog 2018 (Individual LaaS Course Catalog List)

CloudSwyft Learning-as-a-Service Course Catalog 2018 (Individual LaaS Course Catalog List) CloudSwyft Learning-as-a-Service Course Catalog 2018 (Individual LaaS Course Catalog List) Microsoft Solution Latest Sl Area Refresh No. Course ID Run ID Course Name Mapping Date 1 AZURE202x 2 Microsoft

More information

August 23, 2017 Revision 0.3. Building IoT Applications with GridDB

August 23, 2017 Revision 0.3. Building IoT Applications with GridDB August 23, 2017 Revision 0.3 Building IoT Applications with GridDB Table of Contents Executive Summary... 2 Introduction... 2 Components of an IoT Application... 2 IoT Models... 3 Edge Computing... 4 Gateway

More information

28 February 1 March 2018, Trafo Baden. #techsummitch

28 February 1 March 2018, Trafo Baden. #techsummitch #techsummitch 28 February 1 March 2018, Trafo Baden #techsummitch Transform your data estate with cloud, data and AI #techsummitch The world is changing Data will grow to 44 ZB in 2020 Today, 80% of organizations

More information

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security Bringing OpenStack to the Enterprise An enterprise-class solution ensures you get the required performance, reliability, and security INTRODUCTION Organizations today frequently need to quickly get systems

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Integrate MATLAB Analytics into Enterprise Applications

Integrate MATLAB Analytics into Enterprise Applications Integrate Analytics into Enterprise Applications Aurélie Urbain MathWorks Consulting Services 2015 The MathWorks, Inc. 1 Data Analytics Workflow Data Acquisition Data Analytics Analytics Integration Business

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

IBM dashdb Local. Using a software-defined environment in a private cloud to enable hybrid data warehousing. Evolving the data warehouse

IBM dashdb Local. Using a software-defined environment in a private cloud to enable hybrid data warehousing. Evolving the data warehouse IBM dashdb Local Using a software-defined environment in a private cloud to enable hybrid data warehousing Evolving the data warehouse Managing a large-scale, on-premises data warehouse environments to

More information

BUYER S GUIDE TO AWS AND AZURE RESERVED INSTANCES

BUYER S GUIDE TO AWS AND AZURE RESERVED INSTANCES WHITEPAPER BUYER S GUIDE TO AWS AND AZURE RESERVED INSTANCES Maximizing RI Cost-Saving Potential www.cloudcheckr.com For the right workloads, those that are predictable and stable, utilizing reserved instances

More information

CORPORATE PERFORMANCE IMPROVEMENT DOES CLOUD MEAN THE PRIVATE DATA CENTER IS DEAD?

CORPORATE PERFORMANCE IMPROVEMENT DOES CLOUD MEAN THE PRIVATE DATA CENTER IS DEAD? CORPORATE PERFORMANCE IMPROVEMENT DOES CLOUD MEAN THE PRIVATE DATA CENTER IS DEAD? DOES CLOUD MEAN THE PRIVATE DATA CENTER IS DEAD? MASS MIGRATION: SHOULD ALL COMPANIES MOVE TO THE CLOUD? Achieving digital

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should

More information

Minimizing the Risks of OpenStack Adoption

Minimizing the Risks of OpenStack Adoption Minimizing the Risks of OpenStack Adoption White Paper Minimizing the Risks of OpenStack Adoption Introduction Over the last five years, OpenStack has become a solution of choice for enterprise private

More information

Autonomous Data Warehouse in the Cloud

Autonomous Data Warehouse in the Cloud AUTONOMOUS DATA WAREHOUSE CLOUD` Connecting Your To Autonomous in the Cloud DWCS What is It? Oracle Autonomous Database Warehouse Cloud is fully-managed, highperformance, and elastic. You will have all

More information

Aurora, RDS, or On-Prem, Which is right for you

Aurora, RDS, or On-Prem, Which is right for you Aurora, RDS, or On-Prem, Which is right for you Kathy Gibbs Database Specialist TAM Katgibbs@amazon.com Santa Clara, California April 23th 25th, 2018 Agenda RDS Aurora EC2 On-Premise Wrap-up/Recommendation

More information

Cloud Revenue Streams

Cloud Revenue Streams Cloud Revenue Streams Not All Cloud Platforms Are Created Equal Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction Cloud computing is creating new ways for businesses to outsource

More information