Rails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011

Similar documents
Data Informatics. Seon Ho Kim, Ph.D.

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

Comparing SQL and NOSQL databases

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

Typical size of data you deal with on a daily basis

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Introduction to BigData, Hadoop:-

A Glimpse of the Hadoop Echosystem

Innovatus Technologies

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Hadoop Development Introduction

Big Data Architect.

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Big Data Hadoop Stack

Presented by Sunnie S Chung CIS 612

CIB Session 12th NoSQL Databases Structures

CISC 7610 Lecture 2b The beginnings of NoSQL

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

HBase: Overview. HBase is a distributed column-oriented data store built on top of HDFS

Prototyping Data Intensive Apps: TrendingTopics.org

Strategies for Incremental Updates on Hive

Chapter 5. The MapReduce Programming Model and Implementation

HBase... And Lewis Carroll! Twi:er,

Importing and Exporting Data Between Hadoop and MySQL

10 Million Smart Meter Data with Apache HBase

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Ghislain Fourny. Big Data 5. Column stores

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Certified Big Data and Hadoop Course Curriculum

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

Big Data Hadoop Course Content

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

Impala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam

ExamTorrent. Best exam torrent, excellent test torrent, valid exam dumps are here waiting for you

Hadoop ecosystem. Nikos Parlavantzas

Building A Better Test Platform:

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Big Data Analytics using Apache Hadoop and Spark with Scala

A Review Paper on Big data & Hadoop

A Fast and High Throughput SQL Query System for Big Data

Hadoop & Big Data Analytics Complete Practical & Real-time Training

The State of Apache HBase. Michael Stack

microsoft

Techno Expert Solutions An institute for specialized studies!

CS November 2018

CS November 2017

Column Stores and HBase. Rui LIU, Maksim Hrytsenia

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Big Data with Hadoop Ecosystem

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Exam Questions

Apache HBase Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel

Distributed File Systems II

Scaling. Marty Weiner Grayskull, Eternia. Yashh Nelapati Gotham City

Hadoop: The Definitive Guide

Hadoop An Overview. - Socrates CCDH

Hadoop. Introduction / Overview

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

Pig A language for data processing in Hadoop

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

CS / Cloud Computing. Recitation 11 November 5 th and Nov 8 th, 2013

Certified Big Data Hadoop and Spark Scala Course Curriculum

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara

EsgynDB Enterprise 2.0 Platform Reference Architecture

Impala Intro. MingLi xunzhang

BigTable: A Distributed Storage System for Structured Data

App Engine: Datastore Introduction

Welcome to the New Era of Cloud Computing

Performance Analysis of Hbase

HBase Solutions at Facebook

Ghislain Fourny. Big Data 5. Wide column stores

Apache Spark and Scala Certification Training

/ Cloud Computing. Recitation 10 March 22nd, 2016

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Introduction to NoSQL Databases

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench

Challenges for Data Driven Systems

TI2736-B Big Data Processing. Claudia Hauff

Hadoop. Introduction to BIGDATA and HADOOP

Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Scaling. Yashh Nelapati Gotham City. Marty Weiner Krypton. Friday, July 27, 12

HBASE INTERVIEW QUESTIONS

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

Rapid Automated Indication of Link-Loss

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

DIVING IN: INSIDE THE DATA CENTER

CSE-E5430 Scalable Cloud Computing Lecture 9

Oracle 1Z Oracle Big Data 2017 Implementation Essentials.

Transcription:

Rails on HBase Zachary Pinter and Tony Hillerson RailsConf 2011

What we will cover

What is it?

What are the tradeoffs that HBase makes?

Why HBase is probably the wrong choice for your app

Why HBase might be the right choice for your app

+ How to use HBase with Rails

What we won t cover

This is not a deep dive

This is not a guide to setting up a production HBase cluster *See http://cloudera.com

So... What is HBase?

HBase is based on Bigtable

HBase Grew out of Hadoop

Core Concepts

Distributed Column Oriented Database

Tables column key core:link row llb0ga http://google.com llb260 http://en.oreilly.com/rails2011/public/ schedule/detail/17886

Column Families core column family stats column family key core:link stats:hits llb0ga http://google.com 100504 llb260 http://en.oreilly.com/rails2011/ public/schedule/detail/17886 2

Column Families Must be defined at creation or added to schema later Should group data accessed and sized similarly

Columns Can be dynamically added given that the family exists Stored as a byte array - do what you need with it

Column Families: Data Locality

Regions master llb0ga... llbee9 lld92q... lq1rro Regionserver Regionserver

Main Operations Get Put Incr Delete Scan

JRuby based shell

hbase shell ruby-1.9.2-p180 :002 > status 1 servers, 0 dead, 1.0000 average load ruby-1.9.2-p180 :003 > (1..10).each{ status } 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load 1 servers, 0 dead, 3.0000 average load => 1..10 ruby-1.9.2-p180 :004 >

API REST Thrift Apache Avro Java API

What are the tradeoffs that HBase makes? NoSQL = Tradeoffs

Highly Distributed Pros Built-in sharding No single server holds all the data Different servers operate on different slices of the data Cons Makes joins difficult/impossible Limited real-time queries/sorting

Sorted Row Keys Bytearray keys Any kind of data you want Sorted in byte order All row access through the key

Sorted Row Keys Pros Real-time random lookups by key Real-time range queries Cons Conflates unique identifier with default sort Keys are an integral part of schema design, requiring extra consideration

Map / Reduce Pros Scales linearly: more servers = faster response times Operates on huge datasets (billions of rows) Works well with dynamic/non-uniform schemas Cons Scans every row per query * Not real-time (batch operation)

HBase vs. X http://nosql-database.org http://www.sevenweeks.org/ Seven Databases in Seven Weeks Eric Redmond Don t Miss His Talk!!

Why HBase is probably the wrong choice for your app

HBase Limitations

HBase Limitations Limited real-time queries (key lookup, range lookup) Limited sorting (one default sort per table) Limited transactions Manual indexing Joins/Normalization Storage of large binary data *

HBase Requirements At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc) Memory Intensive (2-4gb bar minimum per server, depending on server role) CPU Intensive Rough EC2 Cost: 5 c1.xlarge reserved for 1 year = over $850/month (not counting bandwidth!)

Why HBase might be the right choice for your app

Large data sets Ability to aggregate and analyze billions of rows Starts to shine when relational databases break down Is your db already sharded? Are you using SQL queries that take hours to run?

Hadoop Integration HDFS for storing files Native Map/Reduce Supports data locality Doesn t require data to be transferred

Flexible schema Supports non-heterogeneous data through column families with mixed key/value pairs Ability to refactor the data store via map/reduce (beats a multi-day ALTER TABLE command)

Rails on HBase

hbase-stargate https://github.com/greglu/hbase-stargate Uses Stargate, HBase REST server

Rhino https://github.com/sqs/rhino/ Thrift API Seems to be abandoned?

Massive Record https://github.com/companybook/massive_record It works! Uses Thrift API

Kwisatz: An URL Shortener

Demo: JRuby Scripts

Demo: Rails Front-End

Source Demo source code github.com/effectiveui/jruby-hbase-demo github.com/effectiveui/rails-hbase-massive-record

Getting Set Up (Mac) brew install hbase Cloudera VM http://bit.ly/dsfpa9

Recap

Where Would I Use HBase?

Where is HBase s Sweet Spot

Where Would I Use Rails With HBase?

Questions?

Thank You! Zachary: @zpinter Tony: @thillerson EffectiveUI.com