NoSQL-- Hadoop/ HBase/ MongoDB Database Management System II Computer Science Department


NoSQL-- Hadoop/ HBase/ MongoDB Database Management System II Computer Science Department Dr. Karne

Description:

In the past decades, the volume of data has increased explosively. Contemporary data is often described by the 5 V characteristics [1]. With the advent of advanced electronic equipment, more data is collected through smartphones and sensors. This data cannot be handled by a relational database management system (RDBMS) because of its scale and structure. An RDBMS requires users to design a well-formed, strict schema to store the data. However, most data nowadays is semi-structured or even unstructured and therefore cannot follow such a schema. This is a challenge for the RDBMS and makes it a poor fit for data analysts and data miners.

To overcome this difficulty, people turn to other architectures such as Hadoop or MongoDB. Both belong to the NoSQL (Not only SQL) family of database designs (for Hadoop, we specifically mean HBase, which runs on top of HDFS). A NoSQL database is schema-less and is therefore more flexible than an RDBMS in handling various types of data. In this paper, we introduce two well-known DBMSs, Hadoop (HBase) and MongoDB, and compare their architecture and performance with those of an RDBMS. Neither of them is designed to replace the RDBMS; the choice depends on the user's requirements. The RDBMS is a mature product that is robust for online transaction processing (OLTP) and for online analytical processing (OLAP) on small data sets. Hadoop and MongoDB are better suited to OLAP on big data because of their scalability.

Outline:

Introduction to Hadoop/HBase/MongoDB.
Comparison of infrastructure, pros and cons.
Example of a company database in these different databases.
Performance comparison in terms of querying and loading data.
How to migrate from SQL to NoSQL.

Introduction to Hadoop (HBase) and MongoDB:

Hadoop:

Hadoop is a framework for big data analysis. The core of Hadoop is composed of two parts: HDFS and MapReduce [6]. HDFS provides reliable, scalable and fault-tolerant data storage. The idea of MapReduce originates from divide-and-conquer. First, the data is mapped onto a cluster of nodes, and each node is responsible for part of the computation. Finally, the results of all nodes are aggregated and merged into a single result, which is returned to the client. This allows a parallel implementation and hence reduces the time needed for analysis.

The architecture of Hadoop is shown below. It contains a number of data nodes that store and process data, and one name node that monitors all the data nodes. Every data node has a task tracker that keeps track of its own schedule and reports the result once it has finished. One job tracker monitors all the task trackers.
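The map-then-merge flow described above can be sketched in a few lines of Python. This is only an illustration of the divide-and-conquer idea, not the Hadoop API: the threads stand in for data nodes, and the names `map_chunk`, `merge_counts` and `word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: one worker counts the words in its own chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, workers=3):
    """Split the input into chunks, map them in parallel, then reduce."""
    # Assumption: a simple round-robin split stands in for HDFS blocks.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


sample = ["big data needs big storage", "hadoop stores big data"]
totals = word_count(sample)
print(totals.most_common(2))
```

Each worker only ever sees its own chunk, so adding workers (or nodes, in a real cluster) shortens the map step without changing the merged result.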

Figure: Hadoop architecture
Figure: Map-Reduce paradigm

HBase:

HBase is a NoSQL database that runs on top of HDFS [5][7]. It is a column-oriented key-value store. It does not require every row to have the same number of columns. Moreover, columns are grouped into column families, and each family can hold several column members. Each record carries a timestamp so that different versions can be kept. The data looks like tables in an RDBMS; these tables are split into regions and distributed across nodes.

Similar to HDFS, HBase has three kinds of servers in its architecture. The region server stores data and accepts requests from clients. The master server assigns regions to region servers and handles creating, deleting and updating tables. ZooKeeper keeps track of which servers are alive and which are unavailable. HBase lets the user define a row key, much as we define a primary key in an RDBMS, so a specific record can be accessed quickly through its key. With this feature, HBase can handle real-time data processing, but it still cannot replace an RDBMS for complex transactions.

Figure: HBase architecture (ZooKeeper monitors all nodes with heartbeats)

MongoDB:

MongoDB is a free and open-source, cross-platform, document-oriented database program developed by MongoDB, Inc. It is classified as a NoSQL database program and uses JSON-like documents whose schemas may vary [8]. The aim of MongoDB is to provide a high-performance data storage solution for web applications [9]. MongoDB is a database product whose character lies between SQL and NoSQL. It is one of the popular NoSQL databases and offers abundant functions. It stores JSON-like documents in BSON format, which can hold complex data types. One of its advanced characteristics is strong support for a powerful query language. This language resembles an object-oriented query language, can achieve most of the functions available in an RDBMS, and supports data indexing. When using MongoDB, we can create records without defining their structure first; that is, we can change the format or structure simply by adding new fields or deleting existing ones. MongoDB gives us a much easier way to represent hierarchical relationships and to store arrays and other more complex structures.

Figure: MongoDB features
Figure: MongoDB architecture

Comparison of infrastructure, Cons and Pros with RDBMS:

HDFS/HBase V.S. RDBMS:

Characteristics | HDFS | HBase | RDBMS | MongoDB
Property | Distributed file system | Distributed NoSQL database | Relational database | Index
Schema | N/A | Flexible | Required & fixed | Flexible
Data Size | Petabytes | Petabytes | Terabytes | Byte
Data Storage | Files | Key-value, column-oriented | Row-oriented | BSON files
Data Type | Semi/Unstructured | Semi/Unstructured | Structured | JSON/BSON
Scalability | Scale-out | Scale-out | Scale-up | Scale-up
Scale Cost | Low | Low | High | High
Normalization | No need | No need | Required | Doesn't support joins
Index | N/A | Supports primary index; no secondary indexing | Supports various indexing | Various indexing
Random access | No support | Support | Support | Support
Update | Write once, read many times | Read/write many times | Read/write many times | Read/write many times
Best fit | OLAP; batch processing; massive data storage | Handling massive data that also needs random read/write | OLTP; complex transactions; acceptable OLAP | NoSQL; NewSQL; Big Data

MongoDB V.S. RDBMS:

Compared with an RDBMS (SQL), MongoDB is schema-less: one collection can hold documents with different structures. The structure of a single object is clear and easy to understand. There are no complex joins. MongoDB also supports rich, dynamic queries on documents through a JSON document-based query language that is almost as powerful as SQL. MongoDB keeps the working set in internal memory, which makes data access faster.

Company database example:

RDBMS:

When using an RDBMS, we need a well-designed schema and normalized tables like the ones below. Employee and department information are stored in different tables. If we want to know the name of the department to which an employee belongs, Dno is used as a foreign key to join the two tables.

EMPLOYEE
Fname | Minit | Lname | Ssn | Bdate | Address | Sex | Salary | Super_ssn | Dno
John | B | Smith | | | Fondren, Houston, TX | M | | |
Franklin | T | Wong | | | Voss, Houston, TX | M | | |
Jennifer | S | Wallace | | | Berry, Bellaire, TX | F | | |
James | E | Borg | | | 450 Stone, Houston, TX | M | | null | 1

DEPARTMENT
Dname | Dnumber | Mgr_ssn | Mgr_start_date
Research | | |
Administration | | |
Headquarters | | |

HBase:

As mentioned above, HBase is a column-oriented key-value database. Each row has a row key, just like a primary key in an RDBMS, and a collection of column families and column members. We therefore do not need to normalize our table; in fact, it is usually de-normalized. HBase allows the user to add arbitrary column members under each column family. The company example in HBase looks like the table below. Notice that a cell may hold no value, such as Mgr_ssn in the row with key 88866; in other words, HBase stores a sparse table.

Row key | Name:F | Name:M | Name:L | Address:street | Address:city | Address:State | Dept.:name | Dept.:Dno | Dept.:Mgr_ssn | Personal Info.:Sex | Personal Info.:Salary | Personal Info.:Bdate
 | John | B | Smith | | Houston | TX | Rd | 5 | | M | |
 | Franklin | T | Wong | | Houston | TX | Rd | 5 | | M | |
 | Jennifer | S | Wallace | | Houston | TX | Admin | 4 | | F | |
 | James | E | Borg | | Houston | TX | Head | 1 | | M | |

When we read data from the table, each cell is returned as a key and a value, where the key consists of the row key, family, member and timestamp:

Row Key | Family | Member | Timestamp | Value
 | Name | F | T1 | John

When a client reads data from HBase for the first time, it obtains from ZooKeeper the location of the META table, which holds the region server locations. It then looks up the location of the records it needs and contacts the corresponding region server. This information is cached in the client. When a client issues a write operation, the data is first appended to a write-ahead log (WAL) and then placed in the MemStore, which is a write cache. The WAL can be used for recovery. There is one MemStore per column family per region. When a MemStore has accumulated enough data, it is written to files on disk. HBase stores data in HFiles, which are indexed in a B+ tree-like style. Compared with plain HDFS, this allows clients to read specific data without having to scan all the files.

MongoDB:

Nowadays, NoSQL databases such as MongoDB are widely used in many companies and organizations, for example Google, eBay, the Federal Communications Commission, Saks Fifth Avenue and Citigroup, across areas such as government, high-tech and financial services. Retail stores can use MongoDB to keep what customers see in sync and create a seamless, endless window-shopping experience. For government, MongoDB makes it easier to store and analyze fast-moving data, and it provides rich querying. For high-tech companies such as Expedia, MongoDB's flexible document model accommodates arbitrary attributes, which helps in analyzing applications or networks.

Performance comparison:

To understand how NoSQL databases benefit large-scale data analysis, we studied several performance-related papers that compare an RDBMS with Hadoop [2][3]. The performance metrics include data loading time and query execution time.
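The two metrics can be measured with a very small harness. The sketch below is an assumption-laden illustration, not the setup used in the cited papers: Python's built-in `sqlite3` stands in for the database under test, and the `sales` table, the helper names and the row count are all hypothetical.

```python
import sqlite3
import time


def timed(action):
    """Run a callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def load_rows(conn, rows):
    """Loading phase: create the table and insert every row."""
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def run_query(conn):
    """Query phase: a simple aggregation over the loaded data."""
    sql = ("SELECT region, SUM(amount) FROM sales "
           "GROUP BY region ORDER BY region")
    return conn.execute(sql).fetchall()


# Hypothetical data set: odd amounts go to "east", even ones to "west".
rows = [("east" if i % 2 else "west", i) for i in range(10_000)]
conn = sqlite3.connect(":memory:")
loaded, load_seconds = timed(lambda: load_rows(conn, rows))
result, query_seconds = timed(lambda: run_query(conn))
print(f"load: {load_seconds:.4f}s, query: {query_seconds:.4f}s")
```

Keeping the loading phase and the query phase in separate timed calls makes it possible to report them independently, which is how the comparison in this section is organized.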
MySQL is compared with Hive, a data warehouse that works on top of Hadoop and can analyze, query and extract data. Hive provides an SQL-like syntax and transforms queries into MapReduce jobs automatically. The experimental data sets are of three different sizes. The difference in loading time grows as the data set size increases. This is because an RDBMS needs to check data integrity when loading data, whereas a NoSQL database does not follow a strict schema and can postpone this step until the data is read.

The query comparison is based on TPC-DS (Transaction Processing Performance Council - Decision Support), a benchmark for evaluating decision support systems. A complex query that joins 5 tables and contains 4 aggregation operations, 1 group-by operation and 1 order-by operation was selected for the experiment. The result shows that Hive outperforms MySQL, especially when the data set grows to 390 GB. This results from MapReduce and the parallelism of Hadoop.

Concurrent query performance is also evaluated with the same configuration. The graphs compare the execution time of concurrent queries with that of consecutive queries. Table 7 contains 4 queries, and each query is run 5 times concurrently, so 20 queries execute concurrently in total. The last column represents each individual query run 5 times concurrently. The total-concurrency performance is clearly worse than the solo-concurrency performance. This is because Hive was not originally designed for this purpose. Its transaction scheduling and locking mechanisms over a distributed system may not be as optimized as those of an RDBMS, which causes poor performance for concurrent queries.

Migrate SQL to NoSQL:

Many enterprises that currently use an RDBMS face the problem of explosively increasing data. As mentioned before, NoSQL databases are not designed to replace the RDBMS. Instead, we can take advantage of NoSQL and combine it with an RDBMS. The former can be used as back-end data storage or for off-line data analysis, and the latter can handle small and medium data sizes and complex transactions. Thus, the problem is how to integrate them into a hybrid database [4]. The idea is to use middleware to solve this problem. One paper proposed a data adapter system to integrate heterogeneous databases. The architecture is shown below.
Traditionally, the user behind an application interacts with the RDB directly. In the hybrid database scenario, however, we need a DB adapter to parse and translate SQL commands so that they also work on NoSQL, and a DB converter to transform data from the RDB to NoSQL. The DB converter is a crucial part because it needs to keep the data consistent at all times. When a user sends a query request, the query executes on both the RDB and NoSQL. At the same time, the DB converter may be transforming data from the RDB to NoSQL, so the user may get different results from the two sides. The authors borrow the idea of synchronization from operating systems: some queries on the NoSQL side are blocked while a transformation is in progress, and the blocking time is kept as short as possible.

The RDBMS is a mature product that can handle very complex real-time transactions, and many companies still use it for management, so it is not practical to develop a NoSQL system to replace it. However, with middleware, people can take advantage of the features of both types of database and reduce the learning curve for employees. How to integrate them well in real-world use is still an interesting topic to explore.
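The synchronization between the DB converter and NoSQL-side queries described in this section can be illustrated with a toy sketch. It is not the data adapter from [4]: the class `HybridStore`, its method names and the use of plain dictionaries as stand-ins for the two databases are all assumptions made for illustration.

```python
import threading


class HybridStore:
    """Toy hybrid database: a relational side plus a NoSQL copy."""

    def __init__(self):
        self.rdb = {}      # stand-in for the relational database
        self.nosql = {}    # stand-in for the NoSQL database
        self._sync = threading.Lock()  # held while the converter runs

    def write(self, key, value):
        """Writes go to the relational side first."""
        self.rdb[key] = value

    def convert(self):
        """DB converter: copy relational rows to NoSQL under the lock."""
        with self._sync:
            self.nosql = dict(self.rdb)

    def read_nosql(self, key):
        """NoSQL-side read: waits while a conversion is in progress."""
        with self._sync:
            return self.nosql.get(key)


store = HybridStore()
store.write("emp:1", {"Fname": "John", "Dno": 5})
before = store.read_nosql("emp:1")  # converter has not run yet
store.convert()
after = store.read_nosql("emp:1")   # now matches the relational side
print(before, after)
```

The read before `convert()` shows the inconsistency window the section warns about; holding the lock only for the duration of the copy is one simple way to keep the blocking time short.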

Reference:

[1] Ishwarappa and J. Anuradha, "A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology", Procedia Computer Science, vol. 48.
[2] K. Gadiraju, M. Verma, K. Davis and P. Talaga, "Benchmarking performance for migrating a relational application to a parallel implementation", Future Generation Computer Systems, vol. 63.
[3] A. Alekseev, V. Osipova, M. Ivanov, A. Klimentov, N. Grigorieva and H. Nalamwar, "Efficient data management tools for the heterogeneous big data warehouse", Physics of Particles and Nuclei Letters, vol. 13, no. 5.
[4] Y. Liao, J. Zhou, C. Lu, S. Chen, C. Hsu, W. Chen, M. Jiang and Y. Chung, "Data adapter for querying and transformation between SQL and NoSQL database", Future Generation Computer Systems, vol. 65.
[5] "HBase and MapR-DB: Designed for Distribution, Scale, and Speed MapR", Mapr.com. [Online]. [Accessed: 06-May-2017].
[6] B. Hedlund, "Understanding Hadoop Clusters and the Network", Bradhedlund.com. [Online]. [Accessed: 06-May-2017].
[7] "HBase Tutorial". [Online]. [Accessed: 06-May-2017].
[8] Banker, Kyle (March 28, 2011), MongoDB in Action (1st ed.), Manning, p. 375.
[9] Hawkins, Tim; Plugge, Eelco; Membrey, Peter (September 26, 2010), The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing (1st ed.).

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe 2017 163 Performance / Benchmarks Traditional database benchmarks

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

Ghislain Fourny. Big Data 5. Column stores

Ghislain Fourny. Big Data 5. Column stores Ghislain Fourny Big Data 5. Column stores 1 Introduction 2 Relational model 3 Relational model Schema 4 Issues with relational databases (RDBMS) Small scale Single machine 5 Can we fix a RDBMS? Scale up

More information

Ghislain Fourny. Big Data 5. Wide column stores

Ghislain Fourny. Big Data 5. Wide column stores Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces

More information

Data Informatics. Seon Ho Kim, Ph.D.

Data Informatics. Seon Ho Kim, Ph.D. Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Chapter 25 Distributed Databases and Client-Server Architectures Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 25 Outline

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530 Translation of ER-diagram into Relational Schema Dr. Sunnie S. Chung CIS430/530 Learning Objectives Define each of the following database terms 9.2 Relation Primary key Foreign key Referential integrity

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Data Storage Infrastructure at Facebook

Data Storage Infrastructure at Facebook Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow

More information

A Glimpse of the Hadoop Echosystem

A Glimpse of the Hadoop Echosystem A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster

More information

Guides for Installing MS SQL Server and Creating Your First Database. Please see more guidelines on installing procedure on the class webpage

Guides for Installing MS SQL Server and Creating Your First Database. Please see more guidelines on installing procedure on the class webpage Guides for Installing MS SQL Server and Creating Your First Database Installing process Please see more guidelines on installing procedure on the class webpage 1. Make sure that you install a server with

More information

Typical size of data you deal with on a daily basis

Typical size of data you deal with on a daily basis Typical size of data you deal with on a daily basis Processes More than 161 Petabytes of raw data a day https://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minuteinfographic/ On average, 1MB-2MB

More information

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014 NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

Part 1 on Table Function

Part 1 on Table Function CIS611 Lab Assignment 1 SS Chung 1. Write Table Functions 2. Automatic Creation and Maintenance of Database from Web Interface 3. Transforming a SQL Query into an Execution Plan in Relational Algebra for

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Hadoop. copyright 2011 Trainologic LTD

Hadoop. copyright 2011 Trainologic LTD Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT

More information

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between

More information

HBase... And Lewis Carroll! Twi:er,

HBase... And Lewis Carroll! Twi:er, HBase... And Lewis Carroll! jw4ean@cloudera.com Twi:er, LinkedIn: @jw4ean 1 Introduc@on 2010: Cloudera Solu@ons Architect 2011: Cloudera TAM/DSE 2012-2013: Cloudera Training focusing on Partners and Newbies

More information

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan COSC 416 NoSQL Databases NoSQL Databases Overview Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Databases Brought Back to Life!!! Image copyright: www.dragoart.com Image

More information

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10 Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*

More information

Module - 17 Lecture - 23 SQL and NoSQL systems. (Refer Slide Time: 00:04)

Module - 17 Lecture - 23 SQL and NoSQL systems. (Refer Slide Time: 00:04) Introduction to Morden Application Development Dr. Gaurav Raina Prof. Tanmai Gopal Department of Computer Science and Engineering Indian Institute of Technology, Madras Module - 17 Lecture - 23 SQL and

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES 1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

What is database? Types and Examples

What is database? Types and Examples What is database? Types and Examples Visit our site for more information: www.examplanning.com Facebook Page: https://www.facebook.com/examplanning10/ Twitter: https://twitter.com/examplanning10 TABLE

More information

CIS611 Lab Assignment 1 SS Chung

CIS611 Lab Assignment 1 SS Chung CIS611 Lab Assignment 1 SS Chung 1. Creating a Relational Database Schema from ER Diagram, Populating the Database and Querying Over the database with SQL 2. Automatic Creation and Maintenance of Database

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Using space-filling curves for multidimensional

Using space-filling curves for multidimensional Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store

More information

Non-Relational Databases. Pelle Jakovits

Non-Relational Databases. Pelle Jakovits Non-Relational Databases Pelle Jakovits 25 October 2017 Outline Background Relational model Database scaling The NoSQL Movement CAP Theorem Non-relational data models Key-value Document-oriented Column

More information

BigTable: A Distributed Storage System for Structured Data

BigTable: A Distributed Storage System for Structured Data BigTable: A Distributed Storage System for Structured Data Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) BigTable 1393/7/26

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
