Getting Started with Cassandra
- Buck Dawson
- 6 years ago
1 Getting Started with Cassandra A Tutorial Ben
2 Introduction
- Scaling Relational Databases is Easy!
- It's a solved problem!
- Just ask eBay in
4 Wow! Let's do that again!
5 Data Requirements
- Volume
- Velocity
- Variety
- Variability
- What is the word for Availability that starts with a V?
6 Whatever...!
- I need my data
- I need it quickly
- Always available
- Simple to operate
- Simple to maintain
7 Why Do I Need This?
- Data is Too Big
- Moves Too Fast
- Doesn't Fit
8 Who Else Uses Cassandra?
9 Cassandra Design
- Scale Linearly
- Continuous Availability
- High Performance
10 Apache Cassandra
- Google Big Table
- Amazon Dynamo
- Facebook Cassandra
11 Google Big Table
- Scalable Data Model
- Flexible Schema
12 Amazon Dynamo
- Scale-Out Architecture
- Key Distribution
- Automatic Partitioning
- Peer to Peer Architecture
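The key distribution and automatic partitioning ideas above can be sketched as a toy token ring. This is an illustration only, not Cassandra's partitioner: the node names, the `Ring` class, and the use of MD5 from Python's standard library are all assumptions made for the example.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 is used here purely for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy peer-to-peer ring: each node owns the token range ending at its token."""

    def __init__(self, nodes):
        # Each hypothetical node gets a token derived from its name.
        self._tokens = sorted((token_for(name), name) for name in nodes)

    def owner(self, key: str) -> str:
        # The first node whose token is >= the key's token owns the key;
        # past the last token we wrap around to the first node.
        tokens = [token for token, _ in self._tokens]
        index = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
        return self._tokens[index][1]


ring = Ring(["node-a", "node-b", "node-c"])
placement = {key: ring.owner(key) for key in ("bob", "alice", "chuck")}
print(placement)
```

Because placement depends only on the hash, any peer can compute where a key lives without asking a master, and adding a node only takes over part of one existing range.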
13 Installing Cassandra
14 Virtual Machine: USB Drive
- Cassandra is there
- Virtual Machine Image
  - username: notroot
  - password: notroot
- Install Cassandra on the Virtual Machine (tarball in the /home/notroot directory)
15 Starting the VM
- Create a new VM in VirtualBox
- Name the VM
- Linux/64Bit
- 2048 MB RAM (could be less)
- Select your VM
  - Copied from the USB Drive
16 Installing Cassandra
17 Installing Cassandra (Cont.)
18 Install Cassandra (Cont.)
- Unzip the tarball to your home directory
    $ tar -xzvf apache-cassandra bin.tar.gz
    $ rm *.tar.gz
19 Install Cassandra (Cont.)
- Rename the directory
    $ mv apache-cassandra cassandra
20 Install Cassandra (Cont.)
- Create the data directory
    $ cd cassandra
    $ mkdir cassandra-data
21 Install Cassandra (Cont.)
- Create saved_caches, data, and commitlog directories
    $ cd cassandra-data
    $ mkdir data
    $ mkdir saved_caches
    $ mkdir commitlog
22 Install Cassandra (Cont.)
- Point cassandra.yaml at the new directories
    $ cd ~/cassandra/conf
- Set the following values in cassandra.yaml
    initial_token: 0
    data_file_directories:
        - ~/cassandra/cassandra-data/data
    commitlog_directory: ~/cassandra/cassandra-data/commitlog
    saved_caches_directory: ~/cassandra/cassandra-data/saved_caches
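The directory and configuration steps above can also be scripted. The sketch below is a hedged illustration in Python: the helper names `prepare_data_dirs` and `yaml_fragment` are made up for this example, and a temporary directory stands in for ~/cassandra so it is safe to run anywhere. It writes absolute paths, on the assumption that you would rather not depend on whether `~` is expanded when the configuration file is read.

```python
import os
import tempfile


def prepare_data_dirs(cassandra_home: str) -> dict:
    """Create cassandra-data/{data,commitlog,saved_caches} and return their paths."""
    base = os.path.join(cassandra_home, "cassandra-data")
    paths = {
        name: os.path.join(base, name)
        for name in ("data", "commitlog", "saved_caches")
    }
    for path in paths.values():
        os.makedirs(path, exist_ok=True)
    return paths


def yaml_fragment(paths: dict) -> str:
    """Render the cassandra.yaml settings named on the slide for the given paths."""
    return "\n".join([
        "initial_token: 0",
        "data_file_directories:",
        "    - " + paths["data"],
        "commitlog_directory: " + paths["commitlog"],
        "saved_caches_directory: " + paths["saved_caches"],
    ])


# A temporary directory stands in for ~/cassandra (an assumption for this sketch).
home = tempfile.mkdtemp()
paths = prepare_data_dirs(home)
print(yaml_fragment(paths))
```

The printed fragment would still have to be merged into conf/cassandra.yaml by hand.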
23 Install Cassandra (Cont.)
- Change the default logging directory in log4j-server.properties
    log4j.appender.R.File=~/cassandra/cassandra-data/system.log
24 Install Cassandra (Cont.)
- Start Cassandra in the foreground
    $ cd ~/cassandra
    $ bin/cassandra -f
25 If You Couldn't Keep Up
- Or you just entered the room...
- You can just run as root
  - username: notroot
  - password: notroot
26 Data Fundamentals
27 Approaches: Data Modeling in Cassandra
- Logical data models are the SAME
  - Create logical models for understanding
- Physical data models have different goals
  - Physical models for storage and retrieval
  - Mostly for retrieval
28 Storage Concerns
- Disk space is not scarce
- IO is scarce
29 Data Modeling Goals
- Reduce IO
- Fewer Round Trips
30 The More You Know
- Cassandra has a DSL called CQL
- It's similar to SQL
- It's not SQL...
- It's not SQL
31 CQL Samples: Create Schema
    $ cqlsh
    cqlsh> CREATE KEYSPACE datastax
       ... WITH strategy_class = 'SimpleStrategy'
       ... AND strategy_options:replication_factor = 1;
    cqlsh> USE datastax;
    cqlsh:datastax> CREATE COLUMNFAMILY users
                ... (id text PRIMARY KEY,
                ... fname text,
                ... lname text,
                ... age int);
32 CQL Sample: Insert Data
    cqlsh:datastax> INSERT INTO users
                ... (id, fname, lname, age)
                ... VALUES ( 'bob', 'Robert', 'Done', 33 );
    cqlsh:datastax> INSERT INTO users
                ... (id, fname, lname, age)
                ... VALUES ( 'alice', 'Allison', 'Smith', 24 );
    cqlsh:datastax> INSERT INTO users
                ... (id, fname, lname, age)
                ... VALUES ( 'chuck', 'Charles', 'Smith', 22 );
33 Simple Range Queries
    cqlsh:datastax> SELECT *
                ... FROM users;
     id    | age | fname   | lname
    -------+-----+---------+-------
     chuck |  22 | Charles | Smith
     bob   |  33 | Robert  | Done
     alice |  24 | Allison | Smith
34 Inserts are Upserts
    cqlsh:datastax> INSERT INTO users
                ... (id, fname, lname, age)
                ... VALUES ( 'chuck', 'Charlie', 'Smithers', 50 );
    cqlsh:datastax> select * from users;
     id    | age | fname   | lname
    -------+-----+---------+----------
     chuck |  50 | Charlie | Smithers
     bob   |  33 | Robert  | Done
     alice |  24 | Allison | Smith
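The upsert behavior shown above can be modeled with a plain dictionary. This is a sketch of the semantics only, not of how Cassandra stores data; the `Table` class is a made-up name for the example.

```python
class Table:
    """A toy table keyed by primary key; every insert is an upsert."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, **columns):
        # No uniqueness check is made: an existing key is not an error.
        # The supplied columns simply overwrite whatever was stored before.
        self.rows.setdefault(key, {}).update(columns)


users = Table()
users.insert("bob", fname="Robert", lname="Done", age=33)
users.insert("alice", fname="Allison", lname="Smith", age=24)
users.insert("chuck", fname="Charles", lname="Smith", age=22)

# Inserting the same key again replaces the values instead of failing.
users.insert("chuck", fname="Charlie", lname="Smithers", age=50)
print(users.rows["chuck"])
```

The table still holds three rows after the fourth insert, which matches the query output on the slide.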
35 Primary Key Violation?
- Would require a read before a write.
- UUIDs are helpful
  - v1 (TimeUUID)
  - v3 (Faster)
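As a small illustration of generating keys on the client so that no read is needed to find a free one, Python's standard `uuid` module can produce both kinds of UUID named above. In that module, version 1 is time-based and version 3 is derived from a namespace and a name; the example.com URL is a placeholder invented for this sketch.

```python
import uuid

# Version 1: built from the current time and the host's node id.
time_based = uuid.uuid1()

# Version 3: derived from a namespace and a name, so the same input
# always yields the same UUID. The URL below is a placeholder.
name_based = uuid.uuid3(uuid.NAMESPACE_URL, "http://example.com/users/chuck")

print(time_based.version, name_based.version)
```

Either value can be used as a row key without first asking the database whether the key already exists.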
36 Schema is loose
    cqlsh:datastax> INSERT INTO users
                ... (id, address)
                ... VALUES ( 'charlie', '123 Apple Street' );
37 Indexing
- Primary Key
- Secondary Index
- Custom Index
38 Primary Key
- We find row data by key
- Cassandra is a Key Value Store
- Rows have columns
39 Primary Key Query
    cqlsh:datastax> SELECT *
                ... FROM users
                ... WHERE ID = 'chuck';
     id    | age | fname   | lname
    -------+-----+---------+----------
     chuck |  50 | Charlie | Smithers
40 Querying on a Column
    cqlsh:datastax> SELECT *
                ... FROM users
                ... where fname = 'Charlie';
    Bad Request: No indexed columns present in by-columns clause with Equal operator
41 Secondary Indexes
    cqlsh:datastax> CREATE INDEX ON users (fname);
    cqlsh:datastax> SELECT *
                ... FROM users
                ... WHERE fname = 'Charlie';
     id    | age | fname   | lname
    -------+-----+---------+----------
     chuck |  50 | Charlie | Smithers
42 Secondary Indexes (cont.)
- They are nice.
- But they have a cost.
- Require a read before a write.
- Must use the equals clause.
- Results should be low cardinality.
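Why a secondary index needs a read before a write can be seen in a toy model: to keep the index correct, the old value of the indexed column has to be looked up so its stale entry can be removed. The `IndexedTable` class below is a hypothetical sketch, not Cassandra's implementation.

```python
from collections import defaultdict


class IndexedTable:
    """A toy table with one secondary index on a single column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}
        self.index = defaultdict(set)  # column value -> set of row keys

    def insert(self, key, **columns):
        if self.indexed_column in columns:
            # Read before write: fetch the old value so the stale
            # index entry can be dropped before the new one is added.
            old = self.rows.get(key, {}).get(self.indexed_column)
            if old is not None:
                self.index[old].discard(key)
            self.index[columns[self.indexed_column]].add(key)
        self.rows.setdefault(key, {}).update(columns)

    def where_equals(self, value):
        # Only equality lookups are supported, mirroring the slide.
        keys = sorted(self.index.get(value, ()))
        return [dict(self.rows[key], id=key) for key in keys]


users = IndexedTable("fname")
users.insert("chuck", fname="Charles", lname="Smith", age=22)
users.insert("chuck", fname="Charlie", lname="Smithers", age=50)
print(users.where_equals("Charlie"))
```

Without the lookup of the old value, a query for 'Charles' would still return chuck after the update.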
43 Foreign Keys
- No Concept of Foreign Keys in Cassandra
- All relationships are managed by you
- No Triggers (yet)
- You can have Foreign Keys
  - It's up to you to keep the data and relationship current
  - You may need to write to many places on an update.
44 Data Modeling
45 Data Modeling
- Your relationships are in your data
- An RDBMS can model some of those well
- Cassandra can model others
46 Schema Evolution in Cassandra
- Structure closely follows Bigtable
- Grouping is related to columns
- Early versions were schemaless
47 Why Have Schema?
- Share Information
- Describe the Data
- Validation
48 Schema in Cassandra
- Introduced in 0.7
- Consists of DataTypes
- You can ignore it
- But you shouldn't
    cqlsh:datastax> CREATE COLUMNFAMILY users
                ... (id text PRIMARY KEY,
                ... fname text,
                ... lname text);
    cqlsh:datastax> ALTER TABLE users ADD age INT;
49 Looks like my RDBMS (again)
- No pre-allocation for rows that may be added
- No wasted space
- No limit on what can be added in the future
- No forced Data Migration (UPDATE TABLE)
- Can have thousands of columns
50 Composite Primary Keys
- Consider the CassandraFS
  - blocks -- May be on different partitions
  - subblocks -- kept on contiguous physical blocks
- Enforced by the data model
    CREATE TABLE sblocks (
        block_id uuid,
        subblock_id uuid,
        data blob,
        PRIMARY KEY (block_id, subblock_id)
    ) WITH COMPACT STORAGE;
- block_id: Partition Key
- subblock_id: Column Name (Clustered)
51 Clustering
- Accessing a Row requires Seeking
  - Bad, but unavoidable
- But getting a range of contiguous columns is fast
  - After the seek
- Goal: We want to minimize seeking
- If I can pack the same information into a single row
  - Rather than multiple rows
  - Then I can save on seeks
- Wide rows are good!
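The partition key plus clustered column layout can be sketched as a dictionary of sorted lists: one lookup finds the partition (the "seek"), and a contiguous slice within it is then cheap. The `WideRowStore` class, the string block ids, and the integer subblock ids are simplifications assumed for this example; the slide's table uses uuid columns.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy wide-row store: partition key -> entries sorted by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        row = self._partitions[partition_key]
        keys = [key for key, _ in row]
        position = bisect.bisect_left(keys, clustering_key)
        if position < len(row) and row[position][0] == clustering_key:
            row[position] = (clustering_key, value)  # same key: overwrite
        else:
            row.insert(position, (clustering_key, value))  # keep sorted order

    def slice(self, partition_key, start, end):
        """Return the contiguous entries with start <= clustering key <= end."""
        row = self._partitions.get(partition_key, [])
        keys = [key for key, _ in row]
        low = bisect.bisect_left(keys, start)
        high = bisect.bisect_right(keys, end)
        return row[low:high]


store = WideRowStore()
for subblock_id, data in [(3, b"c"), (1, b"a"), (2, b"b"), (4, b"d")]:
    store.insert("block-1", subblock_id, data)
print(store.slice("block-1", 1, 2))
```

All subblocks of a block sit next to each other in sorted order, so a range read touches one partition instead of many.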
52 Scenario: Real Time Feed
- Capture the data
- Non real-time feed here (if we need it): ( usagov_bitly_data )
53 Data Definition?
    {
      "a": USER_AGENT,
      "c": COUNTRY_CODE,  # 2-character iso code
      "nk": KNOWN_USER,  # 1 or 0.
      "g": GLOBAL_BITLY_HASH,
      "h": ENCODING_USER_BITLY_HASH,
      "l": ENCODING_USER_LOGIN,
      "hh": SHORT_URL_CNAME,
      "r": REFERRING_URL,
      "u": LONG_URL,
      "t": TIMESTAMP,
      "gr": GEO_REGION,
      "ll": [LATITUDE, LONGITUDE],
      "cy": GEO_CITY_NAME,
      "tz": TIMEZONE  #
      "hc": TIMESTAMP OF TIME HASH WAS CREATED,
      "al": ACCEPT_LANGUAGE
    }
54 Data
    {
      "a": "Mozilla\/5.0 (Windows NT 5.1)...",
      "c": "HK",
      "nk": 0,
      "tz": "Asia\/Hong_Kong",
      "gr": "00",
      "g": "NEQ8H9",
      "h": "P2GwTT",
      "l": "nasatwitter",
      "al": "zh-tw,zh;q=0.8,en-us;q=0.6,en;q=0.4",
      "hh": "go.nasa.gov",
      "r": "
      "u": "
      "t": ,
      "hc": ,
      "cy": "Central District",
      "ll": [ , ]
    }
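A record in this shape can be turned into the fields a data model needs with the standard `json` module. The sample line below is hypothetical: its values are invented for illustration and are not taken from the feed, and the sketch assumes that "t" holds Unix seconds. The `to_click` helper is likewise a made-up name.

```python
import json
from datetime import datetime, timezone

# Hypothetical record: every value here is invented for this sketch.
sample_line = (
    '{"a": "ExampleAgent/1.0", "c": "US", "g": "abc123", "h": "def456", '
    '"u": "http://example.com/page", "t": 1350000000, "hc": 1349990000}'
)


def to_click(line: str) -> dict:
    """Pick out the fields of one click record (assumes "t" is Unix seconds)."""
    record = json.loads(line)
    return {
        "global_hash": record["g"],
        "timestamp": record["t"],
        "country_code": record.get("c"),  # optional in this sketch
        "long_url": record["u"],
        "clicked_at": datetime.fromtimestamp(record["t"], tz=timezone.utc),
    }


click = to_click(sample_line)
print(click["global_hash"], click["clicked_at"].isoformat())
```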
55 What do we want to do?
- Step 1: Capture the data
- Step 2: Store It?
- Step 3: ???
- Step 4: Profit!
56 Step 3
- Daily Aggregates for each URL?
- Daily Rankings?
- Just a few examples...
57 Daily URL Aggregation (Model)
    $ cqlsh -3    (Version 3!)
    cqlsh> USE datastax;
    cqlsh:datastax> CREATE TABLE clicks_for_hash (
                ...     global_hash text,
                ...     "timestamp" double,
                ...     country_code text,
                ...     long_url text,
                ...     PRIMARY KEY (global_hash, "timestamp")
                ... );
58 Test the Data Model
    cqlsh:datastax> insert into clicks_for_hash
                ... (global_hash, "timestamp", country_code, long_url)
                ... VALUES ( 'badfood', 00001, 'US', ' badfood');
    cqlsh:datastax> insert into clicks_for_hash
                ... (global_hash, "timestamp", country_code, long_url)
                ... VALUES ( 'foobar', 00002, 'US', ' foobar');
    cqlsh:datastax> insert into clicks_for_hash
                ... (global_hash, "timestamp", country_code, long_url)
                ... VALUES ( 'cafebabe', 00001, 'US', ' cafebabe');
59 Get Counts
    cqlsh:datastax> select COUNT(*)
                ... from clicks_for_hash
                ... where global_hash = 'cafebabe';
     count
    cqlsh:datastax> select COUNT(*)
                ... from clicks_for_hash
                ... where global_hash = 'badfood';
     count
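The daily aggregates and rankings mentioned for Step 3 can be sketched on the client side with a counter keyed by (hash, day). This is an in-memory illustration, not a Cassandra query: the `daily_counts` helper, the sample timestamps, and the assumption that timestamps are Unix seconds are all made up for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_counts(clicks):
    """Count clicks per (global_hash, UTC day) from (hash, unix_seconds) pairs."""
    counts = Counter()
    for global_hash, timestamp in clicks:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        counts[(global_hash, day)] += 1
    return counts


# Invented sample clicks: two days of traffic for two hashes.
clicks = [
    ("badfood", 1350000000),
    ("badfood", 1350003600),
    ("cafebabe", 1350000000),
    ("badfood", 1350086400),
]
counts = daily_counts(clicks)

# A daily ranking is the same data sorted by count.
print(counts.most_common(1))
```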
60 Data Questions
- When were the link hashes created?
- How many were created today (range)?
- Store this data in a new column family.
- Sort columns by timestamp.
- Query By CQL
61 Data Model (Created?)
    cqlsh:datastax> CREATE COLUMNFAMILY sort_created
                ... (key text,
                ... created_time int,
                ... link_hash text,
                ... url text,
                ... PRIMARY KEY (key, created_time));
62 Simple Code
    $ ipython
    In [1]: import urllib
    In [2]: import cql
    In [3]: import json
    In [4]: connection = cql.connect('localhost', 9160, 'datastax', cql_version='3.0.0')
    In [5]: clicks = urllib.urlopen('
    In [6]: clicks.readline()
    In [7]: cursor = connection.cursor()
    In [8]: while True:
       ...:     line = clicks.readline()
       ...:     if not line:            # stop when the feed closes
       ...:         break
       ...:     if not line.strip():    # skip blank lines
       ...:         continue
       ...:     obj = json.loads(line)
       ...:     if 'h' in obj:          # only records that carry a link hash
       ...:         cursor.execute('''
       ...:             INSERT INTO sort_created ("key", created_time, link_hash, url)
       ...:             VALUES ('all', %s, '%s', '%s')''' % (obj['hc'], obj['h'], obj['u']))
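The reading half of the loop above can be separated from the database call and made tolerant of bad input. The sketch below is one possible shape, not the presenter's code: `created_links` is a hypothetical helper, the sample lines (including the heartbeat-style record) are invented, and the tuples it yields match the columns of `sort_created`.

```python
import io
import json

REQUIRED_FIELDS = ("h", "hc", "u")


def created_links(stream):
    """Yield (key, created_time, link_hash, url) tuples from a line-based JSON feed."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            record = json.loads(line)
        except ValueError:
            continue  # malformed JSON: skip rather than crash the loop
        if not isinstance(record, dict):
            continue
        if any(field not in record for field in REQUIRED_FIELDS):
            continue  # e.g. a record without a link hash
        yield ("all", int(record["hc"]), record["h"], record["u"])


# Invented sample feed standing in for the real stream.
feed = io.StringIO(
    "\n"
    '{"_heartbeat_": 1350000000}\n'
    "not json at all\n"
    '{"h": "def456", "hc": 1349990000, "u": "http://example.com/page"}\n'
)
rows = list(created_links(feed))
print(rows)
```

Each tuple could then be handed to whatever insert call the driver in use provides; passing values as bound parameters, where the driver supports them, avoids the quoting problems of building the statement with `%`.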
63 All The Links
- Sorted By Created Time
- Could create more rows..
  - By Region...
  - By Short Url CName
  - Etc..
- All sorted the same way
64 Some Queries
    cqlsh:datastax> SELECT *
                ... FROM sort_created
                ... WHERE created_time > 0
                ... LIMIT 10;
    cqlsh:datastax> SELECT *
                ... FROM sort_created
                ... WHERE key = 'all'
                ... AND created_time >
                ... LIMIT 10;
    cqlsh:datastax> SELECT *
                ... FROM sort_created
                ... WHERE key = 'all'
                ... AND created_time >
                ... ORDER BY created_time DESC
                ... LIMIT 10;
65 Overview
- This is just a taste
- Modeling OLTP Use Cases
- DataStax Enterprise provides integration with:
  - Apache Hadoop (Batch Analytics)
  - Solr (Real Time Search)
  - Mahout (Machine Learning)
66 Want more Info?
- Drop by our Booth!
Solutions for Netezza Performance Issues Vamsi Krishna Parvathaneni Tata Consultancy Services Netezza Architect Netherlands vamsi.parvathaneni@tcs.com Lata Walekar Tata Consultancy Services IBM SW ATU
More informationPrinciples of Data Management
Principles of Data Management Alvin Lin August 2018 - December 2018 Structured Query Language Structured Query Language (SQL) was created at IBM in the 80s: SQL-86 (first standard) SQL-89 SQL-92 (what
More informationDatabase Systems: Design, Implementation, and Management Tenth Edition. Chapter 7 Introduction to Structured Query Language (SQL)
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 7 Introduction to Structured Query Language (SQL) Objectives In this chapter, students will learn: The basic commands and
More informationGhislain Fourny. Big Data 5. Wide column stores
Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces
More information. International Journal of Advance Research in Engineering, Science & Technology. Identifying Vulnerabilities in Apache Cassandra
Impact Factor (SJIF): 4.542. International Journal of Advance Research in Engineering, Science & Technology e-issn: 2393-9877, p-issn: 2394-2444 Volume 4, Issue 4, April-2017 Identifying Vulnerabilities
More informationLecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka
Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More informationBig Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering
Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationRAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko
RAMCloud Scalable High-Performance Storage Entirely in DRAM 2009 by John Ousterhout et al. Stanford University presented by Slavik Derevyanko Outline RAMCloud project overview Motivation for RAMCloud storage:
More information5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414
Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online
More informationHadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop
Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce
More informationThis tutorial helps the professionals aspiring to make a career in Big Data and NoSQL databases, especially the documents store.
About the Tutorial This tutorial provides a brief knowledge about CouchDB, the procedures to set it up, and the ways to interact with CouchDB server using curl and Futon. It also tells how to create, update
More informationApache Cassandra 3.0 for DSE 5.0 (Earlier version)
Apache Cassandra 3. for DSE 5. (Earlier version) Updated: 218-9-8-7: 218 DataStax, Inc. All rights reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries
More informationNoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018
NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE DACE https://dace.unige.ch Data and Analysis Center for Exoplanets. Facility to store, exchange and analyse data
More informationHustle Documentation. Release 0.1. Tim Spurway
Hustle Documentation Release 0.1 Tim Spurway February 26, 2014 Contents 1 Features 3 2 Getting started 5 2.1 Installing Hustle............................................. 5 2.2 Hustle Tutorial..............................................
More information