Introduction to Column Oriented Databases in PHP

Introduction to Column Oriented Databases in PHP By Slavey Karadzhov <slavey.karadzhov@zend.com>

About Me
- Name: Slavey Karadzhov (Славей Караджов)
- Programmer since my early days
- PHP programmer since 1999
- Open source fan and supporter
- Hobby (software) inventor
- Won the Best IT Innovation of the Year 2004 award from PC Magazine
- Working for Zend Technologies since 2008
  - In Stuttgart, Germany
  - As a programmer, consultant and trainer

Agenda
- How Do They Look?
- Why New Type Of Database?
- The Details
- Supported Operations
- Available Open Source CODB engines
- Putting Everything Together
- How To Use It From PHP
- Is CODB For Me?
- Questions

How Do They Look? and what is the difference to relational databases?

How do they look?
- Representation in Relational (Row Oriented) DB [figure not preserved in the transcript]
- Representation in Column Oriented DB (CODB) [figure not preserved in the transcript]

How do they look? (2)
- Table (see http://labs.google.com/papers/bigtable.html)
- Associative array that is
  - Sparse
  - Multidimensional
  - Distributed
  - Persistent
  - Sorted

How do they look? (3)
- Column Families
  - Family: columna, columnb
  - Qualifier: one, two, three
- Three dimensions + 1
  - Row, Family, Qualifier
  - +1 more dimension: time
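The four dimensions can be pictured as a nested PHP array. This is only an illustrative sketch: the timestamps, the extra cell versions and the helper name `latestValue` are made up for the example, and a real engine stores its data very differently.

```php
<?php
// Illustrative only: a CODB table pictured as a nested associative array.
// Dimensions: row key -> column family -> qualifier -> timestamp -> value.
$table = [
    'row1' => [
        'columna' => [
            'one' => [1300000002 => 'eins', 1300000001 => 'ein'],
            'two' => [1300000001 => 'zwei'],
        ],
        'columnb' => [
            '' => [1300000001 => 'de'],   // a family used without a qualifier
        ],
    ],
];

// Hypothetical helper: newest version of one cell, or null if the cell
// does not exist (the table is sparse, so most cells are simply absent).
function latestValue(array $table, string $row, string $family, string $qualifier): ?string
{
    $versions = $table[$row][$family][$qualifier] ?? [];
    if ($versions === []) {
        return null;
    }
    krsort($versions);           // newest timestamp first
    return reset($versions);
}

echo latestValue($table, 'row1', 'columna', 'one'), "\n";   // eins
var_dump(latestValue($table, 'row1', 'columnc', 'one'));    // NULL
```

The fourth dimension only shows up when a cell has more than one version; most reads ask for the newest one.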

Why New Type Of Database? and aren't the relational databases enough?

Why New Type Of Database?
- 2001: 1 server running PHP and MySQL, 200 users
- 2002: 2 PHP machines, 1 MySQL machine, 2k users
- 2003: 4+ PHP machines, 1 MySQL machine, 20k users

Why New Type Of Database? (2)
- Problem 1: The DB is not scalable
  - A machine that is 2x more powerful brings less than a 1.5x performance boost and costs a lot
- Solution 1: Sharding
  - Move different tables to different DBs
  - Split one table by row into two or more tables on different DB machines
  - Using commodity software

Why New Type Of Database? (3)
- Sharding has to be done on the client
  - MySQL and other relational DBs do not support sharding on the server side
- Problem 2: The PHP code becomes complex
  - Difficult to use and understand
  - Resizing the shards is manual, painful work
  - Sorting and merging of data becomes a real problem
  - For speed, more data needs to be redundant
  - Joins between shards are almost impossible

Why New Type Of Database? (4)
Example
- Prerequisites:
  - Social web site
  - Every user has friends; not all friends are in one shard
  - Every user has activity: adding a comment, blog post, image upload, etc.
  - Every user has an avatar icon and a name
- Task: give me the last 20 activities from my friends
- Sounds simple, right?

Why New Type Of Database? (5)
Example (cont'd): give me the last 20 activities from my friends
- Here is how it is done:
  - Get the list of friends for a user
  - Get information about the friends: name, avatar, shard
  - Group the friends by shard
  - For each shard, get the last 20 activities per friend
  - In PHP, merge all the incoming data and sort it by date so that the first 20 show up
- For these 20 results PHP needs (quite often)
  - 60 or more MB RAM
  - >10 seconds CPU time
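The client-side work listed above can be sketched roughly as follows. This is a simplified stand-in, not the code from the talk: the shards are plain PHP arrays instead of MySQL servers, the function and field names (`fetchFromShard`, `lastActivities`, `user_id`, `created`, `text`) are invented, and each shard returns its newest rows per shard rather than per friend.

```php
<?php
// Hypothetical stand-in for "query one shard": returns that shard's newest
// activities written by the given friends. In a real setup this would be
// one SQL query per shard.
function fetchFromShard(array $shardRows, array $friendIds, int $limit): array
{
    $rows = array_filter(
        $shardRows,
        fn(array $row): bool => in_array($row['user_id'], $friendIds, true)
    );
    usort($rows, fn(array $a, array $b): int => $b['created'] <=> $a['created']);
    return array_slice($rows, 0, $limit);
}

// Scatter to every shard that holds a friend, then merge and re-sort in PHP.
function lastActivities(array $shards, array $friendsByShard, int $limit = 20): array
{
    $merged = [];
    foreach ($friendsByShard as $shardId => $friendIds) {
        $merged = array_merge($merged, fetchFromShard($shards[$shardId], $friendIds, $limit));
    }
    usort($merged, fn(array $a, array $b): int => $b['created'] <=> $a['created']);
    return array_slice($merged, 0, $limit);
}

// Made-up sample data: two shards, three friends, one stranger.
$shards = [
    0 => [
        ['user_id' => 1, 'created' => 100, 'text' => 'comment'],
        ['user_id' => 2, 'created' => 300, 'text' => 'blog post'],
        ['user_id' => 9, 'created' => 400, 'text' => 'not a friend'],
    ],
    1 => [
        ['user_id' => 3, 'created' => 200, 'text' => 'image upload'],
    ],
];
$friendsByShard = [0 => [1, 2], 1 => [3]];

$result = lastActivities($shards, $friendsByShard, 2);
print_r(array_column($result, 'text'));   // blog post, image upload
```

Every shard has to hand back up to the full limit because PHP cannot know in advance which shard holds the newest rows, so most of the fetched data is thrown away after the final sort.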

Why New Type Of Database? (6)
- Activity table as a CODB table [table figure not preserved in the transcript]
- And in CODB you have to do
  SCAN activity, {COLUMNS='friends:0000001', LIMIT=20}
- Be patient, the details are in the next slides

The Details must-know when designing a table

The Details
- There are no joins between tables
- The row keys are always sorted in alphabetical order
- The type of the column value is always string, unless it is a counter
- The generation of the row key is very important
  - There is no auto-increment of the key
  - No key constraints (no primary, no unique, etc.)
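Because the keys are compared as strings and there is no auto-increment, the application has to build keys that sort the way it wants to read them. A minimal sketch, assuming a 64-bit PHP build; the key layout (zero-padded user id plus a reversed timestamp) and the name `activityRowKey` are illustrations, not the schema used in the talk.

```php
<?php
// Pitfall: as strings, 'row10' sorts before 'row9'.
var_dump(strcmp('row10', 'row9') < 0);   // bool(true)

// Hypothetical key layout: zero-padded user id, then a reversed timestamp,
// so that a user's newest activity gets the smallest key and sorts first.
function activityRowKey(int $userId, int $timestamp): string
{
    $reversed = 9999999999 - $timestamp;
    return sprintf('%07d-%010d', $userId, $reversed);
}

$keys = [
    activityRowKey(1, 1300000000),
    activityRowKey(1, 1300000500),    // newer activity of the same user
    activityRowKey(12, 1300000100),
];
sort($keys, SORT_STRING);
print_r($keys);
// 0000001-8699999499  (user 1, newest)
// 0000001-8699999999  (user 1, older)
// 0000012-8699999899  (user 12)
```

With a layout like this, a scan that starts at a user's prefix and uses a LIMIT reads that user's most recent rows first without any sorting in PHP.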

The Details (2)
- The generation of the row key is very important (2)
  - The fastest search is by using the row key
  - Secondary indexes are barely supported
- One cannot search in the value of the columns
  - Nothing like columnb='de'
  - But one can search for rows in which a column exists, e.g. columna:one

Supported Operations list of the most important ones

Supported Operations
- CREATE: creates a table
  CREATE tablename, columna, columnb, columnc
- Plus options for the table's column families:
  - Versions: how many versions of the cell value to keep
  - In Memory: whether the cell value should be kept in memory for faster search
  - Counter: whether the column family will be used for atomic increment and decrement
    - The value of the cell is an integer, and increasing or decreasing it is thread-safe (no race conditions)
  - Compression: good for storing, bad for searching

Supported Operations (2)
- PUT: adds a cell
  PUT table rowkey, columnfamilykey, value
  PUT table 'row1', 'columna:one', 'eins'
  PUT table 'row1', 'columna:two', 'zwei'
  PUT table 'row1', 'columna:three', 'drei'
  PUT table 'row1', 'columnb', 'de'
  PUT table 'row1', 'columnc', 'German'
- GET: gets a row by its row key value
  GET table, rowkey, {COLUMNS, VERSIONS}
  GET table, 'row2', {'columnb:one'}
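To make the PUT and GET semantics concrete, here is a toy in-memory model in PHP. It is not a client for HBase or Hypertable; the class `ToyTable` and its methods are invented for illustration and only mimic the behaviour described on the slides (cells addressed by row key and column, several versions kept per cell).

```php
<?php
// Toy in-memory model of a CODB table, for illustration only.
final class ToyTable
{
    /** @var array row key => column => list of values, newest first */
    private array $rows = [];
    private int $maxVersions;

    public function __construct(int $maxVersions = 3)
    {
        $this->maxVersions = $maxVersions;
    }

    // PUT: writing to an existing cell adds a newer version and drops
    // the oldest one once the version limit is reached.
    public function put(string $rowKey, string $column, string $value): void
    {
        $versions = $this->rows[$rowKey][$column] ?? [];
        array_unshift($versions, $value);
        $this->rows[$rowKey][$column] = array_slice($versions, 0, $this->maxVersions);
    }

    // GET: newest version of every column of a row, or only of the
    // requested columns. An unknown row yields an empty array.
    public function get(string $rowKey, array $columns = []): array
    {
        $result = [];
        foreach ($this->rows[$rowKey] ?? [] as $column => $versions) {
            if ($columns === [] || in_array($column, $columns, true)) {
                $result[$column] = $versions[0];
            }
        }
        return $result;
    }
}

$table = new ToyTable(2);
$table->put('row1', 'columna:one', 'eins');
$table->put('row1', 'columna:two', 'zwei');
$table->put('row1', 'columnb', 'de');
$table->put('row1', 'columnb', 'DE');             // second version of the same cell

print_r($table->get('row1', ['columnb']));        // columnb => DE
print_r($table->get('row2'));                     // empty array
```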

Supported Operations (3)
- There is no UPDATE!
- DELETE: deletes a cell or a complete row
  DELETE table, rowkey, columnfamilykey   (for a cell)
  DELETE table, rowkey                    (for a complete row)
- SCAN: finds rows
  SCAN table, {COLUMNS=array(), LIMIT=<int>, STARTROW=string, STOPROW
  SCAN table, {COLUMNS=['columnb:one']}
  - Returns all rows that have a cell in column family columnb with qualifier one
  SCAN table, {STARTROW='ro', STOPROW='row3'}
  - Returns all rows whose row key falls in the sorted range that starts at ro and stops at row3. This is a range over the sorted keys, not a prefix or suffix match (in HBase the stop row itself is excluded)
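A range scan is easiest to understand as a walk over the sorted row keys. The sketch below is a toy model in PHP, not an engine API: the function `scanRows` is hypothetical, and it assumes HBase-like semantics (start row inclusive, stop row exclusive, keys compared as byte strings).

```php
<?php
// Toy range scan over an array indexed by row key.
function scanRows(array $rows, string $startRow = '', ?string $stopRow = null, int $limit = PHP_INT_MAX): array
{
    ksort($rows, SORT_STRING);                    // the engine keeps rows sorted by key
    $result = [];
    foreach ($rows as $key => $cells) {
        $key = (string) $key;                     // PHP turns numeric-looking keys into ints
        if (strcmp($key, $startRow) < 0) {
            continue;                             // before the start row
        }
        if ($stopRow !== null && strcmp($key, $stopRow) >= 0) {
            break;                                // reached the stop row (exclusive)
        }
        if (count($result) >= $limit) {
            break;
        }
        $result[$key] = $cells;
    }
    return $result;
}

// Made-up rows, inserted in no particular order.
$rows = [
    'row3'  => ['columnb' => 'fr'],
    'row1'  => ['columnb' => 'de'],
    'rp'    => ['columnb' => 'it'],
    'row2'  => ['columnb' => 'en'],
    'apple' => ['columnb' => 'es'],
];

print_r(array_keys(scanRows($rows, 'ro', 'row3')));   // row1, row2
print_r(array_keys(scanRows($rows, '', null, 2)));    // apple, row1
```

Under these assumed semantics the key 'row3' itself is not returned, and 'rp' is skipped because it sorts after the stop row even though it starts with the letter r.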

Available Open Source CODB engines as of today

Available Open Source CODB engines
- HBase
  - Uses Hadoop as distributed storage backend
  - Written in Java
  - Large community base
  - No single point of failure (ZooKeeper)
- Hypertable
  - Also uses Hadoop
  - Written in C++
  - May be faster and less memory/CPU intensive
  - Smaller community base
  - No single point of failure (own manager for the Zoo)

Putting Everything Together without PHP

Putting Everything Together
- Activity table [table figure not preserved in the transcript]
  SCAN activity, {COLUMNS='friends:0000001', LIMIT=20}
- The result
  - Sorted by key
  - Limited to 20 results
  - Taken, as if, from one server (no extra sharding logic)

Putting Everything Together (2)
  SCAN activity, {COLUMNS='friends:0000001', LIMIT=20}
- What needs to be done additionally:
  - Get the user data for your friends and merge it in PHP

How To Use It From PHP some code examples

How To Use It From PHP
- Activity as an ActiveRecord [code listing not preserved in the transcript]

How To Use It From PHP (2)
- ActiveRecord [code listing not preserved in the transcript]
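The original listings are not part of this transcript, so the following is only a guess at what an Activity ActiveRecord could look like, not the author's code. Everything here is hypothetical: the `CodbConnection` interface, the in-memory backend that stands in for a real engine, the `info:text` column, and the assumption that each activity row carries one `friends:<user id>` cell per friend who should see it (suggested by the SCAN example in the slides).

```php
<?php
// Hypothetical storage interface; a real adapter would talk to the CODB engine.
interface CodbConnection
{
    public function put(string $table, string $rowKey, string $column, string $value): void;
    public function scan(string $table, array $options): array;
}

// In-memory stand-in supporting only the COLUMNS and LIMIT scan options.
final class InMemoryConnection implements CodbConnection
{
    private array $tables = [];

    public function put(string $table, string $rowKey, string $column, string $value): void
    {
        $this->tables[$table][$rowKey][$column] = $value;
    }

    public function scan(string $table, array $options): array
    {
        $rows = $this->tables[$table] ?? [];
        ksort($rows, SORT_STRING);                       // rows come back sorted by key
        $limit = $options['LIMIT'] ?? PHP_INT_MAX;
        $matches = [];
        foreach ($rows as $rowKey => $cells) {
            if (count($matches) >= $limit) {
                break;
            }
            if (isset($options['COLUMNS']) && !isset($cells[$options['COLUMNS']])) {
                continue;                                // row lacks the requested column
            }
            $matches[$rowKey] = $cells;
        }
        return $matches;
    }
}

final class Activity
{
    private static CodbConnection $connection;
    public string $rowKey;
    public string $text;

    public function __construct(string $rowKey, string $text)
    {
        $this->rowKey = $rowKey;
        $this->text = $text;
    }

    public static function useConnection(CodbConnection $connection): void
    {
        self::$connection = $connection;
    }

    // Stores the activity plus one marker cell per friend who should see it.
    public function save(array $friendIds): void
    {
        self::$connection->put('activity', $this->rowKey, 'info:text', $this->text);
        foreach ($friendIds as $friendId) {
            self::$connection->put('activity', $this->rowKey, sprintf('friends:%07d', $friendId), '1');
        }
    }

    // Mirrors: SCAN activity, {COLUMNS='friends:<id>', LIMIT=<limit>}
    public static function findForUser(int $userId, int $limit = 20): array
    {
        $rows = self::$connection->scan('activity', [
            'COLUMNS' => sprintf('friends:%07d', $userId),
            'LIMIT'   => $limit,
        ]);
        $found = [];
        foreach ($rows as $rowKey => $cells) {
            $found[] = new self((string) $rowKey, $cells['info:text']);
        }
        return $found;
    }
}

Activity::useConnection(new InMemoryConnection());
(new Activity('0002', 'blog post'))->save([1, 2]);
(new Activity('0001', 'image upload'))->save([1]);
(new Activity('0003', 'comment'))->save([2]);

$texts = array_map(fn(Activity $a): string => $a->text, Activity::findForUser(1));
print_r($texts);   // image upload, blog post
```

The calling code never mentions shards; which server holds which rows is left to the storage layer behind the connection.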

How To Use It From PHP (3)
- The PHP code is much simpler
- Resizing the shards is done automatically and transparently
  - If the load gets high, just add a new CODB server
- Failover, no single point of failure
- Sorting is fast
- Merging of the data, if the table is well-designed, is also fast

Is CODB For Me? who should use it

Is CODB For Me?
- Requires a lot of memory and hard disk space
  - More redundant, more de-normalized
- Relational DBs have decades of optimizations behind them
- Does not make sense to use it for small websites
  - Although this may change if CODB becomes mainstream
- Not for a bank system
  - No transaction safety
  - Can be partially/eventually consistent
- Works fine if the final result is a relatively small set of data

Thank you very much. Questions, anyone?

References
- BigTable paper: http://labs.google.com/papers/bigtable.html
- Hypertable: http://hypertable.org/
- Apache HBase: http://hbase.apache.org/
- Apache Hadoop: http://hadoop.apache.org/
- Sharding: http://en.wikipedia.org/wiki/shard_(database_architecture)
- Active Record Pattern: http://en.wikipedia.org/wiki/active_record_pattern