OPEN SOURCE DB SYSTEMS
Anna Topol

TYPES OF DBMS
- Relational
- Key-Value
- Document-oriented
- Graph
DBMS SELECTION
Selection criteria for the systems covered:
- Multi-platform or platform-agnostic
- Offers persistent storage
- Fairly well known
- Actively maintained/developed
RELATIONAL
- A database is a collection of relations (i.e., tables)
- Data is accessed by specifying queries in SQL
- SQL was originally based on relational algebra and is now guided by the ANSI Database Language SQL standard
- Each tuple of a relation must be uniquely identifiable by some combination of its attribute values

RELATIONAL: FIREBIRD
www.firebirdsql.org
- Written in C and C++
- Original source code released by Borland Software Corp. in July 2000
- Now community-developed
- Free to use, modify, and distribute, including with commercially licensed software
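The relational ideas above (relations as tables, access through SQL, uniquely identifiable tuples) can be sketched with Python's built-in sqlite3 module. SQLite is only a convenient stand-in engine here, not Firebird, and the `book` table with its columns and rows is hypothetical.

```python
import sqlite3


def run_demo() -> tuple[list[tuple[str, int]], bool]:
    # In-memory database; SQLite stands in for any relational engine.
    conn = sqlite3.connect(":memory:")
    # A relation (table). The primary key makes each tuple uniquely identifiable.
    conn.execute(
        "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO book VALUES (?, ?, ?)",
        [("111", "Databases", 1995), ("222", "Graphs", 2010)],
    )
    # Data is accessed declaratively with an SQL query.
    rows = conn.execute(
        "SELECT title, year FROM book WHERE year > ? ORDER BY year", (2000,)
    ).fetchall()
    # A second tuple with the same key value is rejected by the engine.
    try:
        conn.execute("INSERT INTO book VALUES ('111', 'Duplicate', 2020)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return rows, duplicate_rejected
```

Calling `run_demo()` returns the single matching row `("Graphs", 2010)` together with `True`, showing that the key constraint enforced uniqueness.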
RELATIONAL: HYPERSQL
http://hsqldb.org
- Written in Java
- Developed by a closed group; others can pay annual subscription fees to participate in development
- Free to use and distribute under a (modified) BSD license

RELATIONAL: LUCIDDB
www.luciddb.org
- Written in C++, wrapped in Java
- Built specifically for data warehousing and business intelligence
- Currently owned by The Eigenbase Project and Dynamo BI
- Individual developers must sign an agreement to contribute
- Free to use and distribute under a GPL license
KEY-VALUE
- Each record is a pair of a key and a value
- Data is written out to files
- No concept of data types
- Easily scalable and accommodates structural changes to the data
- Distributed via replication
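Python's standard `dbm` module is itself a small file-backed key-value store, so it can illustrate the model described above: records are bare key/value pairs, they are written out to files, and there are no data types (everything comes back as bytes). This is a minimal sketch; the file name `kvstore` and the keys are hypothetical.

```python
import dbm
import os
import tempfile


def store_and_fetch() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kvstore")
        # Flag "c" creates the store; the backend is whichever dbm
        # implementation this Python build provides.
        with dbm.open(path, "c") as db:
            # Each record is just a key paired with a value. There is no
            # schema: str keys and values are encoded and stored as bytes.
            db["user:42"] = "Anna"
            db["user:43"] = "no fixed structure, any bytes will do"
        # Reopen read-only to show the records were written out to a file.
        with dbm.open(path, "r") as db:
            return db["user:42"]
```

The value stored as the string `"Anna"` is returned as the raw bytes `b"Anna"`; interpreting it is left entirely to the application.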
KEY-VALUE: TOKYO CABINET
http://fallabs.com/tokyocabinet/
- Written in C, with API wrappers for other languages
- Records are organized in a hash table, a B+ tree, or a fixed-length array

KEY-VALUE: REDIS
http://code.google.com/p/redis
- Written in C, with bindings for multiple other languages
- Pseudo data types: strings, lists, sets, sorted sets
- Provides set operations on data (union, intersection, etc.)
- Has its own custom set of commands
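The Redis set operations listed above can be illustrated without a server. The class below is a hypothetical in-memory toy that mimics the semantics of the real Redis commands SADD, SUNION, and SINTER (a missing key behaves as an empty set); it is not the API of any Redis client library.

```python
from collections import defaultdict


class ToySetStore:
    """Hypothetical stand-in for Redis set commands, for illustration only."""

    def __init__(self) -> None:
        self._sets = defaultdict(set)  # key -> set of string members

    def sadd(self, key: str, *members: str) -> int:
        # Like SADD: add members and return how many were newly added.
        before = len(self._sets[key])
        self._sets[key].update(members)
        return len(self._sets[key]) - before

    def _get(self, key: str) -> set[str]:
        # Missing keys behave as empty sets, without being created.
        return self._sets.get(key, set())

    def sunion(self, *keys: str) -> set[str]:
        # Like SUNION: members present in any of the given sets.
        return set().union(*(self._get(k) for k in keys))

    def sinter(self, *keys: str) -> set[str]:
        # Like SINTER: members present in all of the given sets.
        sets = [self._get(k) for k in keys]
        return set.intersection(*sets) if sets else set()
```

For example, after `sadd("a", "x", "y")` and `sadd("b", "y", "w")`, `sinter("a", "b")` yields `{"y"}` and `sunion("a", "b")` yields all three members.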
KEY-VALUE: HAMSTER DB
http://hamsterdb.com
- Written in C, with wrappers for C++, Python, .NET, and Java
- Data stored in a sorted B+ tree
- Implements database cursors
- Can save multiple databases to a single physical file
- Can be accessed remotely, but is not distributed

KEY-VALUE: OTHERS
- Project Voldemort
- Riak
- Dynomite
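A cursor over sorted records, as HamsterDB provides, can be sketched as follows. This is a deliberately simplified, hypothetical illustration: a sorted Python list maintained with `bisect` stands in for the B+ tree (it gives the same key ordering, not the same on-disk structure), and none of these class or method names are HamsterDB's API.

```python
import bisect
from typing import Optional, Tuple


class SortedStore:
    """Sorted key-value records; a stand-in for a B+ tree's key ordering."""

    def __init__(self) -> None:
        self.keys: list[str] = []
        self.values: dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        # Keep keys in sorted order so a cursor can walk them in sequence.
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def cursor(self, start: str = "") -> "Cursor":
        return Cursor(self, start)


class Cursor:
    """Hypothetical cursor: a position in key order that moves forward.

    Inserting while a cursor is open is not handled in this sketch.
    """

    def __init__(self, store: SortedStore, start: str) -> None:
        self._store = store
        # Seek to the first key >= start.
        self._pos = bisect.bisect_left(store.keys, start)

    def move_next(self) -> Optional[Tuple[str, str]]:
        # Return the current record and advance; None once past the end.
        if self._pos >= len(self._store.keys):
            return None
        key = self._store.keys[self._pos]
        self._pos += 1
        return key, self._store.values[key]
```

With keys `"apple"`, `"banana"`, and `"pear"` inserted in any order, a cursor started at `"b"` returns `("banana", ...)`, then `("pear", ...)`, then `None`.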
DOCUMENT-ORIENTED
- Documents are stored as XML or JSON objects
- The querying API varies from one DBMS to another
- Distribution is achieved through some replication model
- The amount of data stored for the same type of object can vary from one record to another, whereas in a table every record has the same number of attributes, though not all attributes hold non-null values
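The contrast between documents and table rows drawn above can be shown with plain Python and the standard `json` module. The two "book" documents below are hypothetical: they carry different sets of fields, while the equivalent table rows must all share one set of columns, with `None` (SQL NULL) filling the gaps.

```python
import json

# Two documents of the same type ("book") with different fields.
documents = [
    {"type": "book", "title": "Title A", "authors": ["X", "Y"]},
    {"type": "book", "title": "Title B", "isbn": "123", "pages": 320},
]


def to_rows(docs: list[dict]) -> tuple[list[str], list[tuple]]:
    # A table needs one fixed set of columns: the union of all fields.
    columns = sorted({field for doc in docs for field in doc})
    # Fields a document lacks become None (NULL) in its row.
    rows = [tuple(doc.get(col) for col in columns) for doc in docs]
    return columns, rows


# Each document serializes on its own, with only the fields it actually has.
serialized = [json.dumps(doc, sort_keys=True) for doc in documents]
columns, rows = to_rows(documents)
```

Here `columns` is `["authors", "isbn", "pages", "title", "type"]`, and the first row holds `None` for `isbn` and `pages`, which its document simply omits.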
DOCUMENT-ORIENTED (2)
[Figure: example documents in JSON and in XML; not recoverable]

DOCUMENT-ORIENTED: MONGODB
www.mongodb.org
- Written in C++
- Can scale across multiple database servers using sharding
- API (in C++) that models SQL data-access calls
- Provides a map/reduce mechanism for data aggregation and batch operations
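The map/reduce mechanism mentioned above follows the general MapReduce pattern: a map function emits key/value pairs for each document, and a reduce function combines all values emitted under the same key. The plain-Python sketch below illustrates only that pattern on hypothetical order documents; it is not MongoDB's API, whose mapReduce command takes map and reduce functions written in JavaScript.

```python
from collections import defaultdict


def map_reduce(docs, mapper, reducer):
    # Map phase: each document may emit any number of (key, value) pairs.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    # Reduce phase: combine all values emitted under the same key.
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 15},
    {"customer": "ann", "total": 12},
]


def emit_total(doc):
    # Emit one (customer, total) pair per order document.
    yield doc["customer"], doc["total"]


def sum_totals(key, values):
    # Aggregate every total emitted for one customer.
    return sum(values)


totals = map_reduce(orders, emit_total, sum_totals)
```

The result groups the orders per customer: 30 + 12 for `"ann"` and 15 for `"bob"`.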
DOCUMENT-ORIENTED: COUCHDB
http://couchdb.apache.org
- Written in Erlang
- JSON API
- Uses JavaScript for aggregation and reporting
- Interacts with the database through HTTP requests

DOCUMENT-ORIENTED: OTHERS
- Terrastore
- OrientDB
- ThruDB
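CouchDB's HTTP interface can be sketched with the standard library's `urllib.request`. The code only builds the requests and does not send them, so it runs without a server. The address `http://localhost:5984` (CouchDB's default port), the database name `library`, and the document id are assumptions; a real CouchDB 3.x server would also require credentials.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumed server address (default CouchDB port)


def create_db(name: str) -> urllib.request.Request:
    # PUT /{db} creates a database.
    return urllib.request.Request(f"{BASE}/{name}", method="PUT")


def put_doc(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    # PUT /{db}/{docid} with a JSON body creates a document
    # (updating an existing one also requires its current "_rev").
    return urllib.request.Request(
        f"{BASE}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_doc(db: str, doc_id: str) -> urllib.request.Request:
    # GET /{db}/{docid} fetches the document as JSON.
    return urllib.request.Request(f"{BASE}/{db}/{doc_id}", method="GET")


reqs = [
    create_db("library"),
    put_doc("library", "book-1", {"title": "Title A", "year": 2010}),
    get_doc("library", "book-1"),
]
# Against a running server, each request could be sent with urllib.request.urlopen(req).
```

Because every operation is an ordinary HTTP request, any HTTP client (or a tool such as curl) can talk to the database the same way.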
GRAPH
- Entities (a person, a book, etc.) are represented as nodes
- Relationships (a friendship, employment, a related book) are represented as edges
- Both nodes and edges have properties
- Highly scalable; well suited to data whose structure changes frequently
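The graph model described above can be sketched in plain Python: nodes and edges both carry property dictionaries, and a breadth-first traversal answers a typical graph question such as "friends within two hops". The node ids, properties, and the `FRIEND` and `READ` edge types are hypothetical.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.edges = []  # (source, target, edge type, property dict)
        self._adj = {}   # node id -> list of (neighbor, edge type)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self._adj.setdefault(node_id, [])

    def add_edge(self, src, dst, edge_type, **props):
        # Edges are first-class records with their own properties.
        self.edges.append((src, dst, edge_type, props))
        # Traversal in this sketch treats edges as undirected.
        self._adj[src].append((dst, edge_type))
        self._adj[dst].append((src, edge_type))

    def within(self, start, edge_type, max_hops):
        # Breadth-first search following only edges of the given type.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue
            for nbr, etype in self._adj[node]:
                if etype == edge_type and nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        # Every node reached, excluding the start node itself.
        return {n for n, d in dist.items() if d > 0}
```

Following relationships is a direct walk over adjacency lists, which is the kind of query that would take repeated joins in a relational schema.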
GRAPH: NEO4J
http://neo4j.org
- Developed in Java by a team at Neo Technology, with contributions from outside developers
- Can handle graphs of several billion nodes, relationships, and properties
- Object-oriented API
- Data stored in a binary on-disk format
- Allows user-specified data indexing

GRAPH: INFOGRID
http://infogrid.org
- Developed in Java; released as open source by NetMesh
- In-memory cache store, with persistent storage specified by the user through a common API
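InfoGrid's design, an in-memory cache whose persistent storage the user chooses through a common API, is an instance of a pluggable-storage pattern. The sketch below is hypothetical and is not InfoGrid's actual Java API: a `Store` protocol plays the role of the common API, `DictStore` is one trivial backend, and `CachedStore` serves reads from memory and writes through to whichever backend it was given.

```python
from typing import Optional, Protocol


class Store(Protocol):
    """Hypothetical common API that any persistent backend must provide."""

    def load(self, key: str) -> Optional[dict]: ...

    def save(self, key: str, value: dict) -> None: ...


class DictStore:
    """Trivial backend for illustration; a real one might write to disk or a DB."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}
        self.loads = 0  # counts backend reads, to show the cache at work

    def load(self, key: str) -> Optional[dict]:
        self.loads += 1
        return self.data.get(key)

    def save(self, key: str, value: dict) -> None:
        self.data[key] = dict(value)


class CachedStore:
    """In-memory cache in front of a user-supplied Store (write-through)."""

    def __init__(self, backend: Store) -> None:
        self._backend = backend
        self._cache: dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        # Serve from memory when possible; otherwise load once and keep it.
        if key not in self._cache:
            value = self._backend.load(key)
            if value is None:
                return None
            self._cache[key] = value
        return self._cache[key]

    def put(self, key: str, value: dict) -> None:
        # Update memory and persist immediately through the common API.
        self._cache[key] = value
        self._backend.save(key, value)
```

Swapping `DictStore` for another class with the same `load`/`save` methods changes where data persists without touching `CachedStore`, which is the point of routing storage through one common interface.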
GRAPH: OTHERS
- AllegroGraph
- HyperGraphDB
- Bigdata

CONCLUSION
- Alternatives to RDBMSs are actively being explored and developed
- NoSQL DBs are best suited to unstructured and semi-structured data that requires high scalability, which makes them a good fit for general-purpose online applications
- RDBMSs are likely to keep their established position for a while
THANK YOU