A NoSQL Introduction for Relational Database Developers
  Andrew Karcher
  Las Vegas SQL Saturday
  September 12th, 2015
About Me
  http://www.andrewkarcher.com
  Twitter: @akarcher
  LinkedIn, Twitter
  Email: akarcher@gmail.com
  Learning just like you.
  Data Engineer for Pluralsight
    Provider of 4000+ training courses for Developer, IT and Creative learners
  Local Connection
    Digital Tutors (OKC Company) is a part of the Pluralsight family.
Why are you here?
  Who here is a SQL Developer/DBA?
  Who has a NoSQL Product in their environment?
  Who has played with a NoSQL Database?
    Which ones?
My goals for this presentation
  With lack of knowledge comes fear. If nothing else, I aim to give you enough knowledge to remove that fear.
  In technology, change is inevitable, and those who continue to learn will find themselves in the best positions to succeed.
  Even Microsoft is embracing NoSQL, so it is coming whether you think it is or not.
Agenda
  NoSQL (in general)
  Types of NoSQL Databases
  Future Directions
What is the definition of NoSQL?
  Can someone give me a definition of NoSQL?
My Definition
  Not Only SQL
    No, Not Relational
  Schema-Less
    No, Schema Later
Big Data
  Three/Four/Five Vs
    Volume: Size of Data
    Variety: Different Sources of Data
    Velocity: Speed of Incoming Data
    Veracity: Uncertainty/Accuracy of Data
    Value: Extracting Value from Data
Types of NoSQL Databases
  Key/Value Store
  Column Family
  Document Databases
  Graph Databases
  Large Scale Analytics
Key/Value Database
  Hash Table with a Unique Key and a pointer to a particular item of data
  Stores Data in Key/Value Pairs
  The best way to think about it is as a Table with a Primary Key and one other column
  Built for accessing items only by the key
  The Value can be anything from a simple string to something like a document or serialized object
  Built for Fast, Scalable Reads/Writes
Key                    Value
User1234_name          John
User5678_name          Betty
User1234_age           32
User5678_age           45
User1234_profile       <json>
User5678_preferences   <xml>
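The table above can be mimicked with a small in-memory sketch. This is only an illustration of the access pattern, not the API of Redis, Riak, or any other product: the `KeyValueStore` class, its method names, and the contents of the profile value are all hypothetical.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store: every lookup goes through the key."""

    def __init__(self):
        self._data = {}  # a hash table mapping key -> opaque value

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("User1234_name", "John")
store.put("User1234_age", 32)
# The value can be a serialized object; the store never looks inside it.
# (The profile fields below are made up for illustration.)
store.put("User1234_profile", json.dumps({"city": "Las Vegas", "langs": ["SQL"]}))

name = store.get("User1234_name")
profile = json.loads(store.get("User1234_profile"))
```

Note that there is no way to ask "which users are 32?" without knowing the key up front, which is the trade-off that makes this model so easy to scale.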
Benefits of Key/Value
  Ability to scale to very large installations
  Good for accessing data by Key
  Very simple access patterns and APIs
  Used a lot as caching databases
  Players: Redis, Amazon SimpleDB, Riak
Column Family Database
  Stores data at the Row Level
  Within the row it stores sets of columns grouped into families
  Each change is stored along with a timestamp as a new version of the row, containing only the columns that changed
  An update just stores new values for the columns that changed
  Products in this space offer a very wide range of features
  Built for Fast, Scalable Reads/Writes
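A minimal sketch of the versioning idea, assuming a simplified model in which every cell keeps a list of timestamped values. The `ColumnFamilyStore` class and its methods are hypothetical and do not correspond to the Cassandra or HBase APIs; a logical counter stands in for real timestamps so the example is deterministic.

```python
from collections import defaultdict
from itertools import count


class ColumnFamilyStore:
    """Toy store: row key -> column family -> column -> versions."""

    def __init__(self):
        # Each column holds a list of (timestamp, value) pairs, oldest first.
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
        self._clock = count(1)  # logical timestamps instead of wall-clock time

    def put(self, row_key, family, columns):
        """Write new versions for only the columns passed in."""
        ts = next(self._clock)
        for column, value in columns.items():
            self._rows[row_key][family][column].append((ts, value))
        return ts

    def get(self, row_key, family):
        """Return the newest value of every column in one family."""
        cells = self._rows[row_key][family]
        return {column: versions[-1][1] for column, versions in cells.items()}

    def history(self, row_key, family, column):
        """Return every stored version of a single column."""
        return list(self._rows[row_key][family][column])


cf = ColumnFamilyStore()
cf.put("user1234", "profile", {"name": "John", "age": 32})
cf.put("user1234", "profile", {"age": 33})  # only the changed column is written

latest = cf.get("user1234", "profile")
age_versions = cf.history("user1234", "profile", "age")
```

Reads go by row key and column family, which is why those lookups are fast while lookups by an arbitrary column value are not.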
Benefits of Column Family
  Ability to scale to very large installations
    Facebook Scale
  Good for accessing data by Key and pulling data for a particular column family
  Not as performant for accessing data by secondary columns
    Need to explicitly define secondary indexes
  Flexible Data Structures
  Players: Cassandra, HBase, SimpleDB, DynamoDB
Document Databases
  Similar to Key/Value, but the value is a full document
  Documents are generally stored in JSON or some other document format
  Very flexible schema, as two documents can have completely different structures
  Documents are self-contained and can contain nested and multi-valued elements
  The database does not enforce references between documents (no foreign keys)
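A rough sketch of what "the value is a full document" means in practice, using a plain dictionary of JSON strings. The helper names (`save`, `load`, `find_by_field`) and the document contents are hypothetical and are not the API of MongoDB, CouchDB, or any other product.

```python
import json

documents = {}  # document key -> JSON text


def save(doc_id, doc):
    """Store a whole document under its key."""
    documents[doc_id] = json.dumps(doc)


def load(doc_id):
    """Fetch a whole document by its key."""
    return json.loads(documents[doc_id])


def find_by_field(field, value):
    """Without a secondary index, a field lookup has to scan every document."""
    return sorted(
        doc_id
        for doc_id, raw in documents.items()
        if json.loads(raw).get(field) == value
    )


# Two documents in the same store with completely different structures.
save("user1234", {
    "name": "John",
    "age": 32,
    "addresses": [{"type": "home", "city": "Las Vegas"}],  # nested, multi-valued
})
save("user5678", {
    "name": "Betty",
    "preferences": {"theme": "dark"},
    "tags": ["dba", "sql"],
})

john = load("user1234")
matches = find_by_field("name", "Betty")
```

Loading by key is a single lookup, while `find_by_field` touches every document, which is the cost a secondary index is meant to avoid.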
Benefits of Document Databases
  Can support a wide variety of Document Types
  Fits well with applications built around an object model
  Good for access by Document and the Document Key
    Can define secondary indexes
  Not as good for accessing a set of documents by a particular field
  Examples: MongoDB, CouchDB, RavenDB, DocumentDB
Graph Databases
  Stores data as nodes and relationships between those nodes
  Nodes and Relationships can have attributes that describe them
  Relationships are a first-class object within the database
  Can add any type of node or relationship at any time
  Optimized for traversing relationships
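The node-and-relationship model can be sketched with two dictionaries. This is an assumption-laden toy, not how Neo4j or any other graph engine stores data: the function names, the `FRIEND` and `WATCHED` relationship types, and the sample people are all made up, and relationships are treated as directed.

```python
from collections import defaultdict

nodes = {}                 # node id -> attributes
edges = defaultdict(list)  # node id -> list of (relationship type, target id, attributes)


def add_node(node_id, **attrs):
    nodes[node_id] = attrs


def add_relationship(source, rel_type, target, **attrs):
    """Relationships are stored as their own records and can carry attributes."""
    edges[source].append((rel_type, target, attrs))


def neighbors(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return {target for rtype, target, _ in edges[node_id] if rtype == rel_type}


def friends_of_friends(node_id):
    """Two-hop traversal, excluding the starting node and its direct friends."""
    direct = neighbors(node_id, "FRIEND")
    second = set()
    for friend in direct:
        second |= neighbors(friend, "FRIEND")
    return second - direct - {node_id}


for person in ("john", "betty", "carlos", "dana"):
    add_node(person, label="Person")

add_relationship("john", "FRIEND", "betty", since=2010)
add_relationship("betty", "FRIEND", "john", since=2010)
add_relationship("betty", "FRIEND", "carlos", since=2013)
add_relationship("john", "FRIEND", "dana", since=2014)
add_relationship("dana", "FRIEND", "carlos", since=2015)

# A new kind of node and relationship can be added at any time.
add_node("course1", label="Course", title="Intro to NoSQL")
add_relationship("john", "WATCHED", "course1")

suggestions = friends_of_friends("john")
```

The traversal only walks relationships reachable from the starting node, so its cost depends on how connected that node is rather than on the total size of the data set.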
Benefits of Graph Databases
  Highly Flexible Schema
  Optimized for navigating relationships
  Good for highly connected data (Recommendations, Friend of a Friend)
  Examples: Neo4j, AllegroGraph, InfiniteGraph
Large Scale Analytics
  Big player here is Hadoop
  Storage and Processing of Large-Scale Data
  Scaled to a large number of nodes running on commodity hardware
  Can also be called MPP (Massively Parallel Processing)
  Used to answer the hard questions over Petabytes of Data
  Other Examples: Teradata, Vertica, Parallel Data Warehouse (APS)
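The core idea of spreading work across many nodes can be illustrated with a split, process, and combine sketch. It runs in a single Python process and is not Hadoop's API; the function names, the three "nodes", and the event records are hypothetical.

```python
from collections import Counter
from functools import reduce


def partition(records, n):
    """Split the records into n chunks, one per hypothetical node."""
    return [records[i::n] for i in range(n)]


def map_phase(chunk):
    """Each node counts events per type using only its own chunk."""
    return Counter(event for event, _ in chunk)


def reduce_phase(partials):
    """Combine the per-node partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# (event type, event id) pairs standing in for a very large data set.
records = [
    ("view", 1), ("click", 2), ("view", 3),
    ("view", 4), ("click", 5), ("purchase", 6),
]

partials = [map_phase(chunk) for chunk in partition(records, 3)]
totals = reduce_phase(partials)
```

Because each chunk is processed independently, adding nodes lets the same question be answered over more data; only the small partial results need to be brought together at the end.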
Benefits of Large Scale Analytics
  Able to handle immense amounts of data
  Provides the ability to query across large scopes of data with varied types
Future Thoughts
  Although they are NoSQL, they are all building SQL-like query engines or interfaces
  Databases are blurring the lines between the types of database engines
  Scaling is hard no matter which technology you use
  Traditional relational database vendors are adding NoSQL database engines
    Azure DocumentDB, Oracle NoSQL, etc.
  Find the right technology for your solution
  The decision on which one to use is complex
Questions?
  akarcher@gmail.com
  @akarcher
  http://www.andrewkarcher.com