SQL++ Query Language 9/9/2015. Semistructured Databases have Arrived. in the form of JSON. and its use in the FORWARD framework

Size: px
Start display at page:

Download "SQL++ Query Language 9/9/2015. Semistructured Databases have Arrived. in the form of JSON. and its use in the FORWARD framework"

Transcription

1 Query Language and its use in the FORWARD framework with the support of National Science Foundation, Google, Informatica, Couchbase Semistructured Databases have Arrived SQL-on-Hadoop NoSQL PIG CQL SQL-on-Hadoop Jaql Others N1QL AQL in the form of JSON location: 'Alpine', readings: time: timestamp(' t20:00:00'), ozone: 0.035, no2: , time: timestamp(' t22:00:00'), ozone: 'm', co: 0.4 1

2 Semistructured Query Languages come along SQL: Expressive Power and Adoption JSON: Semistructured Data but in a Tower of Babel SELECT AVG(temp) AS tavg FROM readings GROUP BY sid SQL db.readings.aggregate( $group: _id: "$sid", tavg: $avg:"$temp") MongoDB readings -> group by sid = $.sid into tavg: avg($.temp) ; Jaql for $r in collection("readings") group by $r.sid return tavg: avg($r.temp) JSONiq a = LOAD 'readings' AS (sid:int, temp:float); b = GROUP a BY sid; c = FOREACH b GENERATE AVG(temp); DUMP c; Pig with limited query abilities Growing out of key-value databases or SQL with special type UDFs Far from the full-fledged power of prior non-relational query languages OQL, Xquery *academic projects being the exception (eg, ASTERIX, FORWARD) 2

3 in lieu of formal semantics Query languages were not meant for Ninjas Outline What is? How we use in UCSD s FORWARD? Introduction to the formal survey of NoSQL, NewSQL and SQL-on-Hadoop : Data model + query language for querying both structured data (SQL) and semistructured data (native JSON) Formal Schemaless Semantics: adjusting tuple calculus to the richness of JSON Configurable : Formal configuration options towards choosing semantics, understanding repercussions Full fledged, declarative, aggressively backwardscompatible with SQL Unifying: 11 popular databases/qls captured by appropriate settings of Configurable 3

4 How is it used in the Big Data industry ASTERIXDB CouchDB heading towards How do we use it in UCSD Automatic Incremental Maintenance Middleware Query Optimization & Execution: FORWARD virtual database query plans based on algebra based on a schemaless algebra Data Model (also JSON++ data model) A superset of SQL and JSON Extend SQL with arrays + nesting + heterogeneity location: 'Alpine', readings: time: timestamp(' t20:00:00'), ozone: 0.035, no2: , time: timestamp(' t22:00:00'), ozone: 'm', co: 0.4 Array nested inside a tuple Heterogeneous tuples in collections Arbitrary compositions of array, bag, tuple 4

5 Data Model (also JSON++ data model) A superset of SQL and JSON Extend SQL with arrays + nesting + heterogeneity + permissive structures 5, 10, 3, 21, 2, 15, 6, 4, lo: 3, exp: 4, hi: 7, 2, 13, 6 Arbitrary compositions of array, bag, tuple Data Model (also JSON++ data model) A superset of SQL and JSON Extend SQL with arrays + nesting + heterogeneity + permissive structures Extend JSON with bags and enriched types location: 'Alpine', readings: time: timestamp(' t20:00:00'), ozone: 0.035, no2: , time: timestamp(' t22:00:00'), ozone: 'm', co: 0.4 Bags Enriched types Exercise: Why the data model is a superset of the SQL data model? a SQL table corresponds to a JSON bag that has homogeneous tuples 5

6 Data Model (also JSON++ data model) A superset of SQL and JSON Extend SQL with arrays + nesting + heterogeneity + permissive structures + permissive schema Extend JSON with bags and enriched types location: 'Alpine', readings: time: timestamp(' t20:00:00'), time: timestamp, ozone: 0.035, ozone: any, no2: *, time: timestamp(' t22:00:00'), ozone: 'm', co: 0.4 location: string, readings: With schema Or, schemaless Or, partial schema From SQL to Goal: Queries that input/output any JSON type, yet backwards compatible with SQL Backwards Compatibility with SQL readings : sid: 2, temp: 70.1, sid: 2, temp: 49.2, sid: 1, temp: null SELECT DISTINCT r.sid FROM readings AS r WHERE r.temp < 50 sid : 2 sid temp null Find sensors that recorded a temperature below 50 sid 2 6

7 From SQL to Goal: Queries that input/output any JSON type, yet backwards compatible with SQL Adjust SQL syntax/semantics for heterogeneity, complex input/output values Achieved with few extensions and mostly by SQL limitation removals From Tuple Variables to Element Variables readings : 1.3, 0.7, 0.3, 0.8 SELECT r AS co FROM readings AS r WHERE r < 1.0 ORDER BY r DESC LIMIT 2 Find the highest two sensor readings that are below 1.0 co: 0.8, co: 0.7 Element Variables readings : 1.3, 0.7, 0.3, 0.8 FROM readings AS r B out FROM = B in WHERE = r : 1.3, r : 0.7, r : 0.3, r : 0.8 WHERE r < 1.0 B out WHERE = B in ORDERBY = r : 0.7, r : 0.3, r : 0.8 ORDER BY r DESC LIMIT 2 B out ORDERBY = B in LIMIT = r : 0.8, r : 0.7, r : 0.3 B out LIMIT = B in SELECT = r : 0.8 r : 0.7 SELECT r AS co co: 0.8, co: 0.7 7

8 Correlated Subqueries Everywhere sensors : 1.3, 2 0.7, 0.7, , 0.8, , 1.4 SELECT r AS co FROM sensors AS s, s AS r WHERE r < 1.0 ORDER BY r DESC LIMIT 2 Find the highest two sensor readings that are below 1.0 co: 0.9, co: 0.8 Correlated Subqueries Everywhere FROM sensors AS s, s AS r sensors : 1.3, 2 0.7, 0.7, , 0.8, , 1.4 B out FROM = B in WHERE = s : 1.3, 2, r : 1.3, s : 1.3, 2, r : 2, s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 0.7, 0.9, r : 0.9, s : 0.3, 0.8, 1.1, r : 0.3, s : 0.3, 0.8, 1.1, r : 0.8, s : 0.3, 0.8, 1.1, r : 1.1, s : 0.7, 1.4, r : 0.7, s : 0.7, 1.4, r : 1.4 WHERE r < 1.0 ORDER BY r DESC LIMIT 2 SELECT r AS co Correlated Subqueries Everywhere sensors : 1.3, 2 0.7, 0.7, , 0.8, , 1.4 FROM sensors AS s, s AS r WHERE r < 1.0 B out WHERE = B in ORDERBY = s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 0.7, 0.9, r : 0.9, s : 0.3, 0.8, 1.1, r : 0.3, s : 0.3, 0.8, 1.1, r : 0.8, s : 0.7, 1.4, r : 0.7 ORDER BY r DESC LIMIT 2 SELECT r AS co 8

9 Correlated Subqueries Everywhere sensors : 1.3, 2 0.7, 0.7, , 0.8, , 1.4 FROM sensors AS s, s AS r WHERE r < 1.0 ORDER BY r DESC LIMIT 2 B out ORDERBY = B in LIMIT = s : 0.7, 0.7, 0.9, r : 0.9, s : 0.3, 0.8, 1.1, r : 0.8, s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 0.7, 0.9, r : 0.7, s : 0.7, 1.4, r : 0.7, s : 0.3, 0.8, 1.1, r : 0.3 B out LIMIT = B in SELECT = s : 0.7, 0.7, 0.9, r : 0.9, s : 0.3, 0.8, 1.1, r : 0.8 SELECT r AS co co: 0.9, co: 0.8 Utilization of Input Order y x sensors : 1.3, 2 0.7, 0.7, , 0.8, , 1.4 SELECT r AS co, rp AS x, sp AS y FROM sensors AS s AT sp, s AS r AT rp WHERE r < 1.0 ORDER BY r DESC LIMIT 2 Find the highest two sensor readings that are below 1.0 co: 0.9, x: 3, y: 2, co: 0.8, x: 2, y: 3 Constructing Nested Results Subqueries Again Γ 0 = sensors : sensor: 1, sensor: 2, logs: sensor: 1, co: 0.4, sensor: 1, co: 0.2, sensor: 2, co: 0.3, FROM sensors AS s B out FROM = B in SELECT = s : sensor: 1, s : sensor: 2 SELECT s.sensor AS sensor, ( SELECT l.co AS co FROM logs AS l WHERE l.sensor = s.sensor ) AS readings sensor : 1, readings: co:0.4, co:0.2, sensor : 2, readings: co:0.3 9

10 Constructing Nested Results Γ 1 = s : sensor : 1, sensors : sensor: 1, sensor: 2, logs: sensor: 1, co: 0.4, sensor: 1, co: 0.2, sensor: 2, co: 0.3, FROM logs AS l B out FROM = B in SELECT = l : sensor: 1, co: 0.4, l : sensor: 1, co: 0.2, l : sensor: 2, co: 0.3 WHERE l.sensor = s.sensor B out FROM = B in SELECT = l : sensor: 1, co: 0.4, l : sensor: 1, co: 0.2 SELECT l.co AS co co:0.4, co:0.2 Iterating over Attribute-Value pairs INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN Γ = readings : co :, no2: 0.7, so2: 0.5, 2 FROM readings AS g:v, v AS n g : "no2", v : 0.7, n : 0.7, g : "so2", v : 0.5, 2, n : 0.5, g : "co", v : 0.5, 2, n : 2 SELECT ELEMENT gas: g, val: v gas: "co", num: null, gas: "no2", num: 0.7, gas: "so2", num: 0.5,, gas: "so2", num: 2 10

11 INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN Γ = readings : co :, no2: 0.7, so2: 0.5, 2 FROM readings AS g:v INNER CORRELATE v AS n g : "no2", v : 0.7, n : 0.7, g : "so2", v : 0.5, 2, n : 0.5, g : "co", v : 0.5, 2, n : 2 SELECT ELEMENT gas: g, val: v gas: "co", num: null, gas: "no2", num: 0.7, gas: "so2", num: 0.5,, gas: "so2", num: 2 INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN Γ = readings : co :, no2: 0.7, so2: 0.5, 2 FROM readings AS g:v OUTER CORRELATE v AS n g : "co", v :, n : missing, g : "no2", v : 0.7, n : 0.7, g : "so2", v : 0.5, 2, n : 0.5, g : "co", v : 0.5, 2, n : 2 SELECT ELEMENT gas: g, val: v gas: "co", num: null, gas: "no2", num: 0.7, gas: "so2", num: 0.5,, gas: "so2", num: 2 The Extensions to SQL Goal: Queries that input/output any JSON type, yet backwards compatible with SQL Adjust SQL syntax/semantics for heterogeneity, complex values Achieved with few extensions and SQL limitation removals Element Variables Correlated Subqueries (in FROM and SELECT clauses) Optional utilization of Input Order Grouping indendent of Aggregation Functions... and a few more details Configurability: making a unifying and configurable language 11

12 Algebras do not need schemaful inputs Γ 1 = s : sensor : 1, sensors : sensor: 1, sensor: sensor 2 co, logs: sensor: 1, co: 2 0.4, 0.3 sensor: 1, co: 0.2, sensor: 2, co: 0.3, FROM logs AS l B out FROM = B in SELECT = l : sensor: 1, co: 0.4, l : sensor: 1, co: 0.2, l : sensor: 2, co: 0.3 WHERE l.sensor = s.sensor B out FROM = B in SELECT = l : sensor: 1, co: 0.4, l : sensor: 1, co: 0.2 SELECT l.co AS co co:0.4, co:0.2 Outline What is? How we use in UCSD s FORWARD? uses in integration Introduction to the formal survey of NoSQL, NewSQL and SQL-on-Hadoop based Virtual Database Client Federated Queries Results FORWARD Virtual Database Query Processor Virtual s of Sources Virtual SQL Wrapper Virtual NewSQL Wrapper Native queries Virtual NoSQL Wrapper Native results Virtual SQL-on- Hadoop Wrapper Virtual Java Objects Wrapper SQL Database NewSQL Database NoSQL Database SQL-on- Hadoop Database Java In-Memory Objects 12

13 Virtual and/or Materialized Client s Queries Results FORWARD Integration Database Definitions Integrated Query Processor Integrated s Integrated Added Value Virtual Virtual Virtual Virtual Virtual SQL Database NewSQL Database NoSQL Database SQL-on- Hadoop Database Java In-Memory Objects Use Case 1: Hide the ++ source features behind SQL views BI Tool SELECT / Report Writer m.sid, t AS temp FROM measurements AS m, SQL query SQL result m.temperatures AS t FORWARD Virtual Database measurements: SQL sid temp or its equivalent measurements: sid: 2, temp: 70.1, sid: 2, temp: 70.2, sid: 1, temp: 71.0 Couchbase query Couchbase results measurements: sid: 2, temperatures: 70.1, 70.2, sid: 1, temperatures: 71.0 Couchbase Use Case 2: Semantics-Aware Pushdown "Sensors that recorded a temperature below 50" SELECT DISTINCT m.sid FROM measurements AS m WHERE m.temp < 50 semantics for m.temp < $match: temp: $lt: 50, $match: temp: $not: null... MongoDB semantics for temp $lt 50 query MongoDB query result MongoDB Virtual Database Virtual (Identical) MongoDB Wrapper MongoDB results MongoDB measurements: sid: 2, temp: 70.1, sid: 2, temp: 49.2, sid: 1, temp: null 13

14 Push-Down Challenges: Limited capabilities & semantic variations How to simulate incompatible semantics/missing features? Sources like SQL and MongoDB cannot execute the entirety of How to efficiently push down computation? Issues automatically handled by the query processor. Plenty of query rewriting problems, including novel ones on semi-structured operators captures Semantics Variations Semantics of "less-than" comparison are different across sources: < sql, < mongodb, etc. Config Parameters to capture these MongoDB ( x < y complex : deep, type_mismatch: false, null_lt_null : false, null_lt_value: false,... ( x < y ) Semantics Variations In NoSQL, NewSQL, SQL-on-Hadoop variation of semantics for: Paths Equality Comparisons And all the operators that use them: Selections Grouping Ordering Set operations Each of these features has a set of config parameters in 14

15 Don t give a standard Teach how to reach a standard Fast evolving space Standard would be premature Configuration options itemize and explain the space options Rationalize discussion Advise on consistent settings of the configuration options Not every configuration option setting is sensible Configuration options have to subscribe to common sense iff x = y then x = y if x = y and y = z then x = z which enables rewritings Have to be consistent with each other Eg, if true AND missing is missing, then false OR missing is missing FORWARD: Incremental Maintenance and Application Visualization layer The Incremental Maintenance functionality: (Materialized) Definition Eg, Couchbase has a JSON web log showing user, list of displayed products and FORWARD produces materialized view product category, count, products: product, count Stream of inserts, deletes, updates on base data Before DB Before IVM Stream Stream The Incremental Maintenance module Automatically and efficiently updates the materialized view to reflect the stream of changes can also enable automatic Incremental Maintenance! With attention to replication of data in views Opportunities by keys After DB After 15

16 FORWARD: Incremental Maintenance and Application Visualization layer Custom dashboards, interactive pages & apps The data models of visualization components (e.g. Google Maps) can be nicely captured with JSON models The pages are (JSON) views! Mashups of the components views feeds and incrementally updates the page views Use case From data to visualization with just & markup Ajax/Javascript visuals with no Ajax/Javascript mess How to easily connect to today s JS libraries Custom Ajax visualizations & interfaces for IT personnel FORWARD: Incremental Maintenance and Application Visualization layer (part of) the Google Map model <% unit google.maps.maps %> markers: position: latitude : number, longitude: number... <% end unit %> Outline What is? How we use in UCSD s FORWARD? uses in integration and (live) analytics applications Introduction to the formal survey of NoSQL, NewSQL and SQLon-Hadoop 16

17 Surveyed Databases SQL-on-Hadoop NoSQL Hive Pig Jaql SQL AQL MongoJDBC CQL MongoDB JSONiq N1QL BigQuery (IBM) (AsterixDB) (Cassandra) (Couchbase) (Google) Algebraic FLWOR SQL-like Created JSON-like Based Previously standard on syntax at XQuery "Dremel" Facebook Hadoop Also Research MongoDB Most 28msec No SQL-like production comprises used clusters cluster(s) syntax database via NoSQL JDBC release NewSQL database (UCI) driver PIG CQL SQL-on-Hadoop Jaql Others N1QL SQL & NewSQL AQL MongoDB driver Removes Superficial Differences covers SQL, N1QL and QL research prototypes (e.g., UCI s ASTERIX) Removing the current Tower of Babel effect Providing formal syntax and semantics SELECT AVG(temp) AS tavg FROM readings GROUP BY sid SQL readings -> group by sid = $.sid into tavg: avg($.temp) ; Jaql for $r in collection("readings") group by $r.sid return tavg: avg($r.temp) JSONiq db.readings.aggregate( $group: _id: "$sid", tavg: $avg:"$temp") MongoDB a = LOAD 'readings' AS (sid:int, temp:float); b = GROUP a BY sid; c = FOREACH b GENERATE AVG(temp); DUMP c; Pig Surveyed features 15 feature matrices (1-11 dimensions each) classifying: Data values Schemas Access and construct nested data Missing information Equality semantics Ordering semantics Aggregation Joins Set operators Extensibility 17

18 Outline What is? How we use in UCSD s FORWARD? Introduction to the formal survey of NoSQL, NewSQL and SQLon-Hadoop Methodology Example 1: data model (data values) Example 2: query language (SELECT clause) Example 3: semantics (path) Example 4: semantics (equality function) Methodology For each feature: 1. A formal definition of the feature in 2. A example 3. A feature matrix that classifies each dimension of the feature 4. A discussion of the results, partial support and unexpected behaviors All the results are empirically validated Example: Data values 1. example: location: 'Alpine', readings: time: timestamp(' t20:00:00'), ozone: 0.035, no2: , time: timestamp(' t22:00:00'), ozone: 'm', co:

19 Example: Data values 2. BNF for values: Example: Data values 3. Feature matrix: Composability (top-level values) Heterogeneity Arrays Bags Sets Maps Tuples Primitives Bag of tuples No Yes No No Partial Yes Yes Hive Jaql Any Value Yes Yes No No No Yes Yes Pig Bag of tuples Partial No Partial No Partial Yes Yes CQL Bag of tuples No Partial No Partial Partial No Yes JSONiq Any Value Yes Yes No No No Yes Yes MongoDB Bag of tuples Yes Yes No No No Yes Yes N1QL Bag of tuples Yes Yes No No No Yes Yes SQL Bag of tuples No No No No No No Yes AQL Any Value Yes Yes Yes No No Yes Yes BigQuery Bag of tuples No No No No No Yes Yes MongoJDBC Bag of tuples Yes Yes No No No Yes Yes Any Value Yes Yes Yes Partial Yes Yes Yes Example: Data values 4. Discussion of the results: Column-by-column comparison Partial support (65k scalar elements) Identify clusters (who supports JSON?) Composability (top-level values) Heterogeneity Arrays Bags Sets Maps Tuples Primitives Bag of tuples No Yes No No Partial Yes Yes Hive Jaql Any Value Yes Yes No No No Yes Yes Pig Bag of tuples Partial No Partial No Partial Yes Yes CQL Bag of tuples No Partial No Partial Partial No Yes JSONiq Any Value Yes Yes No No No Yes Yes MongoDB Bag of tuples Yes Yes No No No Yes Yes N1QL Bag of tuples Yes Yes No No No Yes Yes SQL Bag of tuples No No No No No No Yes AQL Any Value Yes Yes Yes No No Yes Yes BigQuery Bag of tuples No No No No No Yes Yes MongoJDBC Bag of tuples Yes Yes No No No Yes Yes Any Value Yes Yes Yes Partial Yes Yes Yes 19

20 Example: SELECT clause 1. example: Projecting nested collections: SELECT TUPLE s.lat, s.long, (SELECT r.ozone FROM readings AS r WHERE r.location = s.location) FROM sensors AS s "Position and (nested) ozone readings of each sensor?" Projecting non-tuples: SELECT ELEMENT ozone FROM readings "Bag of all the (scalar) ozone readings?" Example: SELECT clause 3. Feature matrix: Projecting tuples containing nested collections Projecting non-tuples Hive Partial No Jaql Yes Yes Pig Partial No CQL No No JSONiq Yes Yes MongoDB Partial Partial N1QL Partial Partial SQL No No AQL Yes Yes BigQuery No No MongoJDBC No No Yes Yes Example: SELECT clause 4. Discussion of the results: Not well supported features 3 languages support them entirely (same cluster as for data values) Projecting tuples containing nested collections Projecting non-tuples Hive Partial No Jaql Yes Yes Pig Partial No CQL No No JSONiq Yes Yes MongoDB Partial Partial N1QL Partial Partial SQL No No AQL Yes Yes BigQuery No No MongoJDBC No No Yes Yes 20

21 Config Parameters We use config parameters to encompass and compare various semantics of a feature: Minimal number of independent dimensions 1 dimension = 1 config parameter formalism parametrized by the config parameters Feature matrix classifies the values of each config parameter Example: Paths Config parameters for tuple absent : missing, type_mismatch: error, ( x.y ) Example: Paths The feature matrix classifies, for each language, the value of each config parameter: Missing Type mismatch Hive Error Error Jaql Null Error# Pig Error Error CQL Error Error JSONiq Missing# Missing# MongoDB Missing Missing N1QL Missing Missing SQL Error Error AQL Null Error BigQuery Error Error MongoJDBC 21

22 Example: Equality Function Config parameters for complex : error, type_mismatch : false, null_eq_null : null, null_eq_value : null, missing_eq_missing: missing, missing_eq_value : missing, missing_eq_null : missing ( x = y ) Example: Equality Function The feature matrix classifies, for each language, the value of each config parameter: No real cluster Some languages have multiple (incompatible) equality functions Some edge cases cannot happen due to other limitations (SQL has no complex values) Complex Type mismatch Null = Null Null = Value Missing = Missing Missing = Value Missing = Null Hive (=, <=>) Err, Err Err, Err Null, True Null, False N/A N/A N/A Jaql Boolean Null Null Null N/A N/A N/A Pig Boolean partial Err Null Null N/A N/A N/A CQL Err Err Err False/Null N/A N/A N/A JSONiq Boolean Err, False True, True False, False Missing, True Missing, False Missing, False (=, deep-equal()) Err, MongoDB Boolean False True False True False False N1QL Boolean False Null Null Missing Missing Missing SQL N/A Err Null Null N/A N/A N/A AQL Err Err Null Null N/A N/A N/A BigQuery Err Err Null Null N/A N/A N/A MongoJDBC Boolean False True False N/A @equal How to use this survey? As a database user: Understand the semantics of a (often underspecified) query language / be aware of the limitation of a database As a designer/architect of a database Produce formal specification of your query language Align semantics with SQL's As a database researcher The results might change, but the survey methodology stays As a designer/architect of database middleware Understand what capability variations need to be encapsulated and simulated 22

23 The survey shows: The marketing clusters do not correspond to real capabilities Limited capabilities: matrices are sparse and fragmented (more pressure on source-specific rewriters and distributor) The Future is Semi-Structured and Declarative Scalability Flexibility of Semistructured Data Power of Declarative: Automatic Optimization, Maintenance The primary operational and the secondary analytics application out of semistructured, declarative platforms 23

The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases

The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases Noname manuscript No. (will be inserted by the editor) The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases Kian Win Ong Yannis

More information

arxiv: v2 [cs.db] 5 Jun 2014

arxiv: v2 [cs.db] 5 Jun 2014 Noname manuscript No. (will be inserted by the editor) The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases Kian Win Ong Yannis

More information

arxiv: v3 [cs.db] 15 Aug 2014

arxiv: v3 [cs.db] 15 Aug 2014 Noname manuscript No. (will be inserted by the editor) The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases Kian Win Ong Yannis

More information

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 13: Json and SQL++ Announcements HW5 + WQ5 will be out tomorrow Both due in 1 week Midterm in class on Friday, 5/4 Covers everything (HW, WQ, lectures,

More information

NOSQL DATABASE SYSTEMS: DATA MODELING. Big Data Technologies: NoSQL DBMS (Data Modeling) - SoSe

NOSQL DATABASE SYSTEMS: DATA MODELING. Big Data Technologies: NoSQL DBMS (Data Modeling) - SoSe NOSQL DATABASE SYSTEMS: DATA MODELING Big Data Technologies: NoSQL DBMS (Data Modeling) - SoSe 2017 24 Data Modeling Object-relational impedance mismatch Example: orders, order lines, customers (with different

More information

JSON - Overview JSon Terminology

JSON - Overview JSon Terminology Announcements Introduction to Database Systems CSE 414 Lecture 12: Json and SQL++ Office hours changes this week Check schedule HW 4 due next Tuesday Start early WQ 4 due tomorrow 1 2 JSON - Overview JSon

More information

Compile-Time Code Generation for Embedded Data-Intensive Query Languages

Compile-Time Code Generation for Embedded Data-Intensive Query Languages Compile-Time Code Generation for Embedded Data-Intensive Query Languages Leonidas Fegaras University of Texas at Arlington http://lambda.uta.edu/ Outline Emerging DISC (Data-Intensive Scalable Computing)

More information

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA ADMINISTRATIVE MINUTIAE HW3 due Wednesday OQ4 due Wednesday HW4 out Wednesday (Datalog) Exam May 9th 9:30-10:20 WHERE WE ARE So far we have studied the relational

More information

The Pig Experience. A. Gates et al., VLDB 2009

The Pig Experience. A. Gates et al., VLDB 2009 The Pig Experience A. Gates et al., VLDB 2009 Why not Map-Reduce? Does not directly support complex N-Step dataflows All operations have to be expressed using MR primitives Lacks explicit support for processing

More information

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS Questions & Answers- DBMS https://career.guru99.com/top-50-database-interview-questions/ 1) Define Database. A prearranged collection of figures known as data is called database. 2) What is DBMS? Database

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Beyond Hive Pig and Python

Beyond Hive Pig and Python Beyond Hive Pig and Python What is Pig? Pig performs a series of transformations to data relations based on Pig Latin statements Relations are loaded using schema on read semantics to project table structure

More information

CSE 344 MAY 7 TH EXAM REVIEW

CSE 344 MAY 7 TH EXAM REVIEW CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice

More information

Data Formats and APIs

Data Formats and APIs Data Formats and APIs Mike Carey mjcarey@ics.uci.edu 0 Announcements Keep watching the course wiki page (especially its attachments): https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018 Ditto for

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online

More information

Uniform Query Framework for Relational and NoSQL Databases

Uniform Query Framework for Relational and NoSQL Databases Copyright 2017 Tech Science Press CMES, vol.113, no.2, pp.171-181, 2017 Uniform Query Framework for Relational and NoSQL Databases J.B. Karanjekar 1 and M.B. Chandak 2 Abstract: As the data managed by

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley

Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley Big Data and NoSQL Sources P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley Very short history of DBMSs The seventies: IMS end of the sixties, built for the Apollo program (today: Version 15)

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Jaql. Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata. IBM Almaden Research Center

Jaql. Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata. IBM Almaden Research Center Jaql Running Pipes in the Clouds Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata IBM Almaden Research Center http://code.google.com/p/jaql/ 2009 IBM Corporation Motivating Scenarios

More information

Unifying Big Data Workloads in Apache Spark

Unifying Big Data Workloads in Apache Spark Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache

More information

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course: Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course: 20762C Developing SQL 2016 Databases Module 1: An Introduction to Database Development Introduction to the

More information

Querying Data with Transact SQL

Querying Data with Transact SQL Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including

More information

Hadoop Development Introduction

Hadoop Development Introduction Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

More information

Writing Analytical Queries for Business Intelligence

Writing Analytical Queries for Business Intelligence MOC-55232 Writing Analytical Queries for Business Intelligence 3 Days Overview About this Microsoft SQL Server 2016 Training Course This three-day instructor led Microsoft SQL Server 2016 Training Course

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:

More information

QUERYING MICROSOFT SQL SERVER COURSE OUTLINE. Course: 20461C; Duration: 5 Days; Instructor-led

QUERYING MICROSOFT SQL SERVER COURSE OUTLINE. Course: 20461C; Duration: 5 Days; Instructor-led CENTER OF KNOWLEDGE, PATH TO SUCCESS Website: QUERYING MICROSOFT SQL SERVER Course: 20461C; Duration: 5 Days; Instructor-led WHAT YOU WILL LEARN This 5-day instructor led course provides students with

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls Today s topics CompSci 516 Data Intensive Computing Systems Lecture 4 Relational Algebra and Relational Calculus Instructor: Sudeepa Roy Finish NULLs and Views in SQL from Lecture 3 Relational Algebra

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

Research challenges in data-intensive computing The Stratosphere Project Apache Flink

Research challenges in data-intensive computing The Stratosphere Project Apache Flink Research challenges in data-intensive computing The Stratosphere Project Apache Flink Seif Haridi KTH/SICS haridi@kth.se e2e-clouds.org Presented by: Seif Haridi May 2014 Research Areas Data-intensive

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 16: NoSQL and JSon CSE 414 - Spring 2016 1 Announcements Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5] Today s lecture:

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Pig A language for data processing in Hadoop

Pig A language for data processing in Hadoop Pig A language for data processing in Hadoop Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Apache Pig: Introduction Tool for querying data on Hadoop

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

DATABASE DESIGN II - 1DL400

DATABASE DESIGN II - 1DL400 DATABASE DESIGN II - 1DL400 Fall 2016 A second course in database systems http://www.it.uu.se/research/group/udbl/kurser/dbii_ht16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Document stores using CouchDB

Document stores using CouchDB 2018 Document stores using CouchDB ADVANCED DATABASE PROJECT APARNA KHIRE, MINGRUI DONG aparna.khire@vub.be, mingdong@ulb.ac.be 1 Table of Contents 1. Introduction... 3 2. Background... 3 2.1 NoSQL Database...

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera

More information

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan COSC 416 NoSQL Databases NoSQL Databases Overview Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Databases Brought Back to Life!!! Image copyright: www.dragoart.com Image

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

AsterixDB. A Scalable Open Source DBMS. This presentation is based on slides made by Michael J. Carey, Chen Li, and Vassilis Tsotras 11/28/2018 1

AsterixDB. A Scalable Open Source DBMS. This presentation is based on slides made by Michael J. Carey, Chen Li, and Vassilis Tsotras 11/28/2018 1 AsterixDB A Scalable Open Source DBMS This presentation is based on slides made by Michael J. Carey, Chen Li, and Vassilis Tsotras 1 Big Data / Web Warehousing So what went on and why? What s going on

More information

The power of PostgreSQL exposed with automatically generated API endpoints. Sylvain Verly Coderbunker 2016Postgres 中国用户大会 Postgres Conference China 20

The power of PostgreSQL exposed with automatically generated API endpoints. Sylvain Verly Coderbunker 2016Postgres 中国用户大会 Postgres Conference China 20 The power of PostgreSQL exposed with automatically generated API endpoints. Sylvain Verly Coderbunker Development actors Frontend developer Backend developer Database administrator System administrator

More information

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall Relational Algebra Study Chapter 4.1-4.2 Comp 521 Files and Databases Fall 2010 1 Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational model

More information

MapReduce and Friends

MapReduce and Friends MapReduce and Friends Craig C. Douglas University of Wyoming with thanks to Mookwon Seo Why was it invented? MapReduce is a mergesort for large distributed memory computers. It was the basis for a web

More information

Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration

Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration Howard Ho IBM Almaden Research Center Take Home Messages Clio (Schema Mapping for EII) has evolved to Clio 2010

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information

Querying Microsoft SQL Server

Querying Microsoft SQL Server Querying Microsoft SQL Server 20461D; 5 days, Instructor-led Course Description This 5-day instructor led course provides students with the technical skills required to write basic Transact SQL queries

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

Certified Big Data Hadoop and Spark Scala Course Curriculum

Certified Big Data Hadoop and Spark Scala Course Curriculum Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills

More information

MTA Database Administrator Fundamentals Course

MTA Database Administrator Fundamentals Course MTA Database Administrator Fundamentals Course Session 1 Section A: Database Tables Tables Representing Data with Tables SQL Server Management Studio Section B: Database Relationships Flat File Databases

More information

COSC 304 Introduction to Database Systems. NoSQL Databases. Dr. Ramon Lawrence University of British Columbia Okanagan

COSC 304 Introduction to Database Systems. NoSQL Databases. Dr. Ramon Lawrence University of British Columbia Okanagan COSC 304 Introduction to Database Systems NoSQL Databases Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Relational Databases Relational databases are the dominant form

More information

Introduction to NoSQL by William McKnight

Introduction to NoSQL by William McKnight Introduction to NoSQL by William McKnight All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their

More information

T-SQL Training: T-SQL for SQL Server for Developers

T-SQL Training: T-SQL for SQL Server for Developers Duration: 3 days T-SQL Training Overview T-SQL for SQL Server for Developers training teaches developers all the Transact-SQL skills they need to develop queries and views, and manipulate data in a SQL

More information

Hacking PostgreSQL Internals to Solve Data Access Problems

Hacking PostgreSQL Internals to Solve Data Access Problems Hacking PostgreSQL Internals to Solve Data Access Problems Sadayuki Furuhashi Treasure Data, Inc. Founder & Software Architect A little about me... > Sadayuki Furuhashi > github/twitter: @frsyuki > Treasure

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

RethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu

RethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu RethinkDB Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu Content Introduction System Features Data Model ReQL Applications Introduction Niharika Vithala What is a NoSQL Database Databases that

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

Column-Family Stores: Cassandra

Column-Family Stores: Cassandra NDBI040: Big Data Management and NoSQL Databases h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 10 Column-Family Stores: Cassandra Mar n Svoboda svoboda@ksi.mff.cuni.cz 13. 12. 2016

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc. PROFESSIONAL NoSQL Shashank Tiwari WILEY John Wiley & Sons, Inc. Examining CONTENTS INTRODUCTION xvil CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3 Definition and Introduction 4 Context and a Bit

More information

New Developments in Spark

New Developments in Spark New Developments in Spark And Rethinking APIs for Big Data Matei Zaharia and many others What is Spark? Unified computing engine for big data apps > Batch, streaming and interactive Collection of high-level

More information

Query Languages for Document Stores

Query Languages for Document Stores Query Languages for Document Stores NoSQL matters conference 2013-04-26 Jan Steemann me I'm a software developer working at triagens GmbH on and with Documents Documents documents are self-contained, aggregate

More information

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Data Stream Processing Topics Model Issues System Issues Distributed Processing Web-Scale Streaming 3 Data Streams Continuous

More information

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY, KIRUMAMPAKKAM-607 402 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Course 20761A: Querying Data with Transact-SQL Page 1 of 5 Querying Data with Transact-SQL Course 20761A: 2 days; Instructor-Led Introduction The main purpose of this 2 day instructor led course is to

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML) Since in the result relation each group is represented by exactly one tuple, in the select clause only aggregate functions can appear, or attributes that are used for grouping, i.e., that are also used

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL Course: 20761 Course Details Audience(s): IT Professional(s) Technology: Microsoft SQL Server 2016 Duration: 24 HRs. ABOUT THIS COURSE This course is designed to introduce

More information

CSE 344 JULY 9 TH NOSQL

CSE 344 JULY 9 TH NOSQL CSE 344 JULY 9 TH NOSQL ADMINISTRATIVE MINUTIAE HW3 due Wednesday tests released actual_time should have 0s not NULLs upload new data file or use UPDATE to change 0 ~> NULL Extra OOs on Mondays 5-7pm in

More information

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos Instituto Politécnico de Tomar Introduction to Big Data NoSQL Databases Ricardo Campos Mestrado EI-IC Análise e Processamento de Grandes Volumes de Dados Tomar, Portugal, 2016 Part of the slides used in

More information

Querying Microsoft SQL Server 2012/2014

Querying Microsoft SQL Server 2012/2014 Page 1 of 14 Overview This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL queries for Microsoft SQL Server 2014. This course is the foundation

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

CSE 344 FEBRUARY 2 ND DATA SEMI-STRUCTURED

CSE 344 FEBRUARY 2 ND DATA SEMI-STRUCTURED CSE 344 FEBRUARY 2 ND DATA SEMI-STRUCTURED ADMINISTRATIVE MINUTIAE HW3 due Tonight, 11:30 pm OQ4 Due Wednesday, 11:00 pm HW4 due Friday 11:30 pm Exam next Friday 3:30-5:00 WHERE WE ARE So far we have studied

More information

AVANTUS TRAINING PTE LTD

AVANTUS TRAINING PTE LTD [MS20461]: Querying Microsoft SQL Server 2014 Length : 5 Days Audience(s) : IT Professionals Level : 300 Technology : SQL Server Delivery Method : Instructor-led (Classroom) Course Overview This 5-day

More information

Certified Big Data and Hadoop Course Curriculum

Certified Big Data and Hadoop Course Curriculum Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation

More information

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe 2017 163 Performance / Benchmarks Traditional database benchmarks

More information

Using ElasticSearch to Enable Stronger Query Support in Cassandra

Using ElasticSearch to Enable Stronger Query Support in Cassandra Using ElasticSearch to Enable Stronger Query Support in Cassandra www.impetus.com Introduction Relational Databases have been in use for decades, but with the advent of big data, there is a need to use

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

SQL in the Hybrid World

SQL in the Hybrid World SQL in the Hybrid World Tanel Poder a long time computer performance geek 1 Tanel Põder Intro: About me Oracle Database Performance geek (18+ years) Exadata Performance geek Linux Performance geek Hadoop

More information

Querying Microsoft SQL Server

Querying Microsoft SQL Server Querying Microsoft SQL Server Course 20461D 5 Days Instructor-led, Hands-on Course Description This 5-day instructor led course is designed for customers who are interested in learning SQL Server 2012,

More information

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761) Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761) Course Length: 3 days Course Delivery: Traditional Classroom Online Live MOC on Demand Course Overview The main purpose of this

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

Teradata SQL Features Overview Version

Teradata SQL Features Overview Version Table of Contents Teradata SQL Features Overview Version 14.10.0 Module 0 - Introduction Course Objectives... 0-4 Course Description... 0-6 Course Content... 0-8 Module 1 - Teradata Studio Features Optimize

More information

Shark: Hive (SQL) on Spark

Shark: Hive (SQL) on Spark Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

Progress DataDirect For Business Intelligence And Analytics Vendors

Progress DataDirect For Business Intelligence And Analytics Vendors Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline

More information

Middleware Layer for Replication between Relational and Document Databases

Middleware Layer for Replication between Relational and Document Databases Middleware Layer for Replication between Relational and Document Databases Susantha Bathige and P.P.G Dinesh Asanka Abstract Effective and efficient data management is key to today s competitive business

More information

Yahoo! manages more than 42,000 Hadoop nodes holding 200 PBs. more than 22 organizations are running PB-scale Hadoop clusters

Yahoo! manages more than 42,000 Hadoop nodes holding 200 PBs. more than 22 organizations are running PB-scale Hadoop clusters An Optimization Framework for Map-Reduce Queries Leonidas Fegaras Chengkai Li Upa Gupta University of Texas at Arlington http://lambda.uta.edu/mrql/ Alternatives to MR Compared to a parallel RDB, the MR

More information

BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data IBM Corporation

BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data IBM Corporation BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data 2013 IBM Corporation A Big Data architecture evolves from a traditional BI architecture

More information

Physical Database Design and Tuning. Chapter 20

Physical Database Design and Tuning. Chapter 20 Physical Database Design and Tuning Chapter 20 Introduction We will be talking at length about database design Conceptual Schema: info to capture, tables, columns, views, etc. Physical Schema: indexes,

More information