PROFESSIONAL NoSQL. Shashank Tiwari. WILEY, John Wiley & Sons, Inc.



CONTENTS

INTRODUCTION

CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT
  Definition and Introduction
  Context and a Bit of History
  Big Data
  Scalability
  Definition and Introduction
  Sorted Ordered Column-Oriented Stores
  Key/Value Stores
  Document Databases
  Graph Databases
  Summary

CHAPTER 2: HELLO NOSQL: GETTING INITIAL HANDS-ON EXPERIENCE
  First Impressions: Examining Two Simple Examples
  A Simple Set of Persistent Preferences Data
  Storing Car Make and Model Data
  Working with Language Bindings
  MongoDB's Drivers
  A First Look at Thrift
  Summary

CHAPTER 3: INTERFACING AND INTERACTING WITH NOSQL
  If No SQL, Then What?
  Storing and Accessing Data
  Storing Data In and Accessing Data from MongoDB
  Querying MongoDB
  Storing Data In and Accessing Data from Redis
  Querying Redis
  Storing Data In and Accessing Data from HBase
  Querying HBase
  Storing Data In and Accessing Data from Apache Cassandra
  Querying Apache Cassandra
  Language Bindings for NoSQL Data Stores
  Being Agnostic with Thrift
  Language Bindings for Java
  Language Bindings for Python
  Language Bindings for Ruby
  Language Bindings for PHP
  Summary

CHAPTER 4: UNDERSTANDING THE STORAGE ARCHITECTURE
  Working with Column-Oriented Databases
  Using Tables and Columns in Relational Databases
  Contrasting Column Databases with RDBMS
  Column Databases as Nested Maps of Key/Value Pairs
  Laying out the Webtable
  HBase Distributed Storage Architecture
  Document Store Internals
  Storing Data in Memory-Mapped Files
  Guidelines for Using Collections and Indexes in MongoDB
  MongoDB Reliability and Durability
  Horizontal Scaling
  Understanding Key/Value Stores in Memcached and Redis
  Under the Hood of Memcached
  Redis Internals
  Eventually Consistent Non-relational Databases
  Consistent Hashing
  Object Versioning
  Gossip-Based Membership and Hinted Handoff
  Summary

CHAPTER 5: PERFORMING CRUD OPERATIONS
  Creating Records
  Creating Records in a Document-Centric Database
  Using the Create Operation in Column-Oriented Databases
  Using the Create Operation in Key/Value Maps
  Accessing Data
  Accessing Documents from MongoDB
  Accessing Data from HBase
  Querying Redis
  Updating and Deleting Data
  Updating and Modifying Data in MongoDB, HBase, and Redis
  Limited Atomicity and Transactional Integrity
  Summary

CHAPTER 6: QUERYING NOSQL STORES
  Similarities Between SQL and MongoDB Query Features
  Loading the MovieLens Data
  MapReduce in MongoDB
  Accessing Data from Column-Oriented Databases Like HBase
  The Historical Daily Market Data
  Querying Redis Data Stores
  Summary

CHAPTER 7: MODIFYING DATA STORES AND MANAGING EVOLUTION
  Changing Document Databases
  Schema-less Flexibility
  Exporting and Importing Data from and into MongoDB
  Schema Evolution in Column-Oriented Databases
  HBase Data Import and Export
  Data Evolution in Key/Value Stores
  Summary

CHAPTER 8: INDEXING AND ORDERING DATA SETS
  Essential Concepts Behind a Database Index
  Indexing and Ordering in MongoDB
  Creating and Using Indexes in MongoDB
  Compound and Embedded Keys
  Creating Unique and Sparse Indexes
  Keyword-based Search and MultiKeys
  Indexing and Ordering in CouchDB
  The B-tree Index in CouchDB
  Indexing in Apache Cassandra
  Summary

CHAPTER 9: MANAGING TRANSACTIONS AND DATA INTEGRITY
  RDBMS and ACID
  Isolation Levels and Isolation Strategies
  Distributed ACID Systems
  Consistency
  Availability
  Partition Tolerance
  Upholding CAP
  Compromising on Availability
  Compromising on Partition Tolerance
  Compromising on Consistency
  Consistency Implementations in a Few NoSQL Products
  Distributed Consistency in MongoDB
  Eventual Consistency in CouchDB
  Eventual Consistency in Apache Cassandra
  Consistency in Membase
  Summary

CHAPTER 10: USING NOSQL IN THE CLOUD
  Google App Engine Data Store
  GAE Python SDK: Installation, Setup, and Getting Started
  Essentials of Data Modeling for GAE in Python
  Queries and Indexes
  Allowed Filters and Result Ordering
  Tersely Exploring the Java App Engine SDK
  Amazon SimpleDB
  Getting Started with SimpleDB
  Using the REST API
  Accessing SimpleDB Using Java
  Using SimpleDB with Ruby and Python
  Summary

CHAPTER 11: SCALABLE PARALLEL PROCESSING WITH MAPREDUCE
  Understanding MapReduce
  Finding the Highest Stock Price for Each Stock
  Uploading Historical NYSE Market Data into CouchDB
  MapReduce with HBase
  MapReduce Possibilities and Apache Mahout
  Summary

CHAPTER 12: ANALYZING BIG DATA WITH HIVE
  Hive Basics
  Back to Movie Ratings
  Good Old SQL
  JOIN(s) in Hive QL
  Explain Plan
  Partitioned Table
  Summary

CHAPTER 13: SURVEYING DATABASE INTERNALS
  MongoDB Internals
  MongoDB Wire Protocol
  Inserting a Document
  Querying a Collection
  MongoDB Database Files
  Membase Architecture
  Hypertable Under the Hood
  Regular Expression Support
  Bloom Filter
  Apache Cassandra
  Peer-to-Peer Model
  Based on Gossip and Anti-entropy
  Fast Writes
  Hinted Handoff
  Berkeley DB
  Storage Configuration
  Summary

CHAPTER 14: CHOOSING AMONG NOSQL FLAVORS
  Comparing NoSQL Products
  Scalability
  Transactional Integrity and Consistency
  Data Modeling
  Querying Support
  Access and Interface Availability
  Benchmarking Performance
  50/50 Read and Update
  95/5 Read and Update
  Scans
  Scalability Test
  Hypertable Tests
  Contextual Comparison
  Summary

CHAPTER 15: COEXISTENCE
  Using MySQL as a NoSQL Solution
  Mostly Immutable Data Stores
  Polyglot Persistence at Facebook
  Data Warehousing and Business Intelligence
  Web Frameworks and NoSQL
  Using Rails with NoSQL
  Using Django with NoSQL
  Using Spring Data
  Migrating from RDBMS to NoSQL
  Summary

CHAPTER 16: PERFORMANCE TUNING
  Goals of Parallel Algorithms
  The Implications of Reducing Latency
  How to Increase Throughput
  Linear Scalability
  Influencing Equations
  Amdahl's Law
  Little's Law
  Message Cost Model
  Partitioning
  Scheduling in Heterogeneous Environments
  Additional Map-Reduce Tuning
  Communication Overheads
  Compression
  File Block Size
  Parallel Copying
  HBase Coprocessors
  Leveraging Bloom Filters
  Summary

CHAPTER 17: TOOLS AND UTILITIES
  RRDTool
  Nagios
  Scribe
  Flume
  Chukwa
  Pig
  Interfacing with Pig
  Pig Latin Basics
  Nodetool
  OpenTSDB
  Solandra
  Hummingbird and C5t
  GeoCouch
  Alchemy Database
  Webdis
  Summary

APPENDIX: INSTALLATION AND SETUP INSTRUCTIONS
  Installing and Setting Up Hadoop
  Installing Hadoop
  Configuring a Single-node Hadoop Setup
  Configuring a Pseudo-distributed Mode Setup
  Installing and Setting Up HBase
  Installing and Setting Up Hive
  Configuring Hive
  Overlaying Hadoop Configuration
  Installing and Setting Up Hypertable
  Making the Hypertable Distribution FHS-Compliant
  Configuring Hadoop with Hypertable
  Installing and Setting Up MongoDB
  Configuring MongoDB
  Installing and Configuring CouchDB
  Installing CouchDB from Source on Ubuntu 10.04
  Installing and Setting Up Redis
  Installing and Setting Up Cassandra
  Configuring Cassandra
  Configuring log4j for Cassandra
  Installing Cassandra from Source
  Installing and Setting Up Membase Server and Memcached
  Installing and Setting Up Nagios
  Downloading and Building Nagios
  Configuring Nagios
  Compiling and Installing Nagios Plugins
  Installing and Setting Up RRDtool
  Installing Handler Socket for MySQL

INDEX