SQLite vs. MongoDB for Big Data

In my latest tutorial I walked readers through a Python script designed to download tweets by a set of Twitter users and insert them into an SQLite database. In this post I will provide my own thoughts on the pros and cons of using a relational database such as SQLite vs. a NoSQL database such as MongoDB. These are my two go-to databases for downloading and managing Big Data, and there are definite advantages and disadvantages to each. The caveat is that this discussion is for researchers. Businesses will almost certainly not want to use SQLite for anything but simple applications.

The Pros and Cons of SQLite

SQLite has a lot going for it. I much prefer SQLite over, say, MySQL. SQLite is the easiest of all relational databases to work with. Accordingly, for someone gathering data for research, SQLite is a great option. For one thing, it is pre-installed when you install Anaconda Python (my recommended installation). There is none of the typical set-up that comes with a MySQL installation, such as setting up users and passwords. With Anaconda Python you're good to go. Moreover, SQLite is portable. Everything is contained in a single file that can be moved around your own computer or shared with others. There's nothing complicated about it: your SQLite database is just a regular file. Not so with MySQL, for instance, which needs to be installed separately and have user permissions set up, and is not nearly as portable.
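To make the portability point concrete, here is a minimal sketch using Python's built-in sqlite3 module, which is part of the standard library (so it ships with Anaconda too). The file names tweets.db and tweets_copy.db and the one-column toy table are hypothetical; the point is only that the whole database is one ordinary file that can be copied and reopened without any server, users, or passwords.

```python
import os
import shutil
import sqlite3
import tempfile


def make_and_copy_db(folder):
    """Create a small SQLite database file, copy it like any other file,
    and read the copy back. File names are hypothetical examples."""
    original = os.path.join(folder, "tweets.db")
    copy = os.path.join(folder, "tweets_copy.db")

    # connect() creates the file if it does not exist yet.
    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("INSERT INTO tweets (id, text) VALUES (?, ?)", (1, "hello"))
    conn.commit()
    conn.close()  # close before copying so the file is in a consistent state

    # Moving or sharing the database is just a file copy.
    shutil.copyfile(original, copy)

    # The copy opens and queries exactly like the original.
    conn = sqlite3.connect(copy)
    rows = conn.execute("SELECT id, text FROM tweets").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(make_and_copy_db(d))
```

The copy behaves exactly like the original, which is what makes sharing an SQLite database with a collaborator as simple as sending a file.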

So, what's the downside? Two things. The first is the set-up. To get the most out of your SQLite database, you need to predefine every column (variable) you're going to use in the database. Every tweet, for instance, will need to have exactly the same variables or else your code will break. For an example of this, see my recent tutorial on downloading tweets into an SQLite database. The second shortcoming flows from the predefining process. Some social media platforms, such as Twitter, have relatively stable APIs, which means you access the same variables the same way year in and year out. Other platforms, though (that's you, Facebook), seem to change their APIs constantly, which means your code for inserting Facebook posts into your SQLite database will also constantly break.

[Screenshot: what your SQLite database might look like]

As you can see, it's set up like a typical flat database, such as an Excel spreadsheet or a PANDAS or R dataframe. The columns are all predefined.
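A minimal sketch of the predefined-schema problem, using the standard-library sqlite3 module with an in-memory database. The three columns and the sample tweet dictionaries are hypothetical stand-ins for the much larger set of Twitter API fields. The strict insert fails with a KeyError as soon as a tweet lacks one of the predefined fields; the tolerant version stores NULL instead, but it still silently drops any new fields the API adds, so schema changes have to be handled by hand.

```python
import sqlite3

# Every column has to be declared up front (hypothetical subset of fields).
COLUMNS = ("id", "text", "quoted_status_id")


def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id INTEGER PRIMARY KEY, text TEXT, quoted_status_id INTEGER)"
    )


def insert_strict(conn, tweet):
    # tweet[col] raises KeyError if the API omitted a predefined field.
    values = tuple(tweet[col] for col in COLUMNS)
    conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", values)


def insert_tolerant(conn, tweet):
    # .get() stores NULL for missing fields; fields not in COLUMNS are dropped.
    values = tuple(tweet.get(col) for col in COLUMNS)
    conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", values)


conn = sqlite3.connect(":memory:")
create_table(conn)
insert_strict(conn, {"id": 1, "text": "a quote tweet", "quoted_status_id": 42})

try:
    # This tweet has no quoted_status_id, so the strict insert breaks.
    insert_strict(conn, {"id": 2, "text": "a plain tweet"})
except KeyError as exc:
    print("strict insert failed on missing field:", exc)

# The extra "lang" field is silently ignored by the fixed schema.
insert_tolerant(conn, {"id": 2, "text": "a plain tweet", "lang": "en"})
print(conn.execute("SELECT * FROM tweets ORDER BY id").fetchall())
```

Either way, every field you care about must be listed in advance, which is exactly why a changing API means changing this code.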

The Pros and Cons of MongoDB

The SQLite approach contrasts starkly with the NoSQL approach represented by MongoDB. A primary benefit is that MongoDB is tailor-made for inserting the types of data returned by a social media platform's API, particularly JSON. For instance, the Twitter API returns a JSON object for each tweet. In a prior tutorial I provide an overview of this. Below are the first five lines of JSON (one line per variable) for a typical tweet object returned by the Twitter API, as stored in MongoDB:

    {
        "_id" : ObjectId("595a71173ffc5a01d8f27de7"),
        "contributors" : null,
        "quoted_status_id" : NumberLong(880805966375202816),
        "text" : "RT @FL_Bar_Found: Thank you for your support, Stephanie! https://t.co/2vxxe3vntu",
        "time_date_inserted" : "12:30:15_03/07/2017",
        ...
    }

The full tweet object runs to 416 lines of JSON.

Here is where MongoDB excels. All we need to do is grab the tweet object and tell MongoDB to insert it into our database. Do you have different columns in each tweet? MongoDB doesn't care; it will take whatever JSON you throw at it and insert it into your database. So if you are working with JSON objects that have different variables or different numbers of columns, or if Facebook changes its API again, you will not need to update your code, and your script will not break because of it.

[Screenshot: the first 40 objects (tweets) in a MongoDB database]

The number of fields (variables) is not the same for each tweet: some have 29, some have 30, 31, or 32.

[Screenshot: the first tweet after expanding the first object]

Expanded, a stored tweet looks just like the JSON object returned by the Twitter API.

In effect, MongoDB is great in situations where you would like to quickly grab all the data available and throw it into a database. The downside of this approach is that you will have to define your data later, before you can analyze it. I find this to be less and less problematic, however, since PANDAS has come around. I would much rather extract my data from MongoDB (one line of code) and do my data and variable manipulations in PANDAS than mess around with SQLAlchemy before even downloading the data into an SQLite database.

A final benefit of MongoDB is its scalability. You have 10 million tweets to download? What about 100 million? No issues with MongoDB. With SQLite, in contrast, let's say 1 million tweets would be a good upper limit before performance drags considerably.

MongoDB does have its downsides, though. Much like MySQL, the MongoDB server needs to be running before you insert data into it. If your server is running 24/7 that is no issue. Otherwise you'll have to remember to restart your MongoDB server each time you want to either insert data into your database or extract data you've already inserted. MongoDB also has higher start-up costs; it is not as easy to install as SQLite, and you may run into disk permission issues, username and password issues, etc. Cross your fingers, and it will only take you half an hour, once, and then you're good to go from then on. Finally, a MongoDB database is not a single file like an SQLite database. This makes moving or sharing your database more troublesome: not terribly onerous, but a few extra steps. Again, if you are importing your MongoDB data into PANDAS and then using PANDAS for your data manipulations, this should not be an issue. You can easily share or move your PANDAS dataframes or export them to CSV or Excel.
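The "define your data later" step can be sketched with the standard library alone, so it runs without a MongoDB server. Documents read back from MongoDB arrive in Python as dictionaries; with pymongo and pandas the one-line extraction is typically something like pd.DataFrame(list(collection.find())), where collection is a hypothetical pymongo collection, and pandas lines up the differing fields as columns, filling gaps with NaN. The sketch below does that alignment by hand on hypothetical tweets (plain integer _id values stand in for MongoDB's ObjectId) that each carry a different set of fields.

```python
import json


def to_table(docs):
    """Align documents with differing fields into rows over the union of
    all field names, using None wherever a document lacks a field."""
    columns = []
    seen = set()
    for doc in docs:
        for key in doc:
            if key not in seen:  # keep first-seen order of field names
                seen.add(key)
                columns.append(key)
    rows = [[doc.get(col) for col in columns] for doc in docs]
    return columns, rows


# Hypothetical raw API responses: each tweet has a different set of fields.
raw_tweets = [
    '{"_id": 1, "text": "first tweet", "contributors": null}',
    '{"_id": 2, "text": "a quote tweet", "quoted_status_id": 880805966375202816}',
    '{"_id": 3, "text": "third tweet"}',
]
tweets = [json.loads(s) for s in raw_tweets]  # JSON null becomes None

columns, rows = to_table(tweets)
print(columns)
for row in rows:
    print(row)
```

Nothing had to be declared before the tweets were stored; the column set is discovered at read time, after which you would select, rename, and convert only the columns you actually need.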

Summary

Here is a summary of the pros and cons of SQLite and MongoDB for use as a Big Data-downloading database.

Portability
  SQLite: Easy to share/move an SQLite database.
  MongoDB: Considerably more complicated. May not be an issue for you if your work process is to export your data into PANDAS.

Ease of use
  SQLite: SQLite is simple. The database is just a single file that does not need to be 'running' 24/7.
  MongoDB: More complicated than SQLite. The MongoDB server needs to be running before your Python script can insert the data.

Ease of setup
  SQLite: Very easy. If you have installed Anaconda Python you are good to go.
  MongoDB: Considerably more complicated set-up, but it is a one-time process. If you are lucky or are comfortable with the Terminal, this one-time set-up process should not take more than an hour.

Scalability
  SQLite: Beyond a certain limit your SQLite database will become unwieldy. I've had up to a million tweets without too much difficulty, however.
  MongoDB: Can be as small or as big as you'd like.

Setting up code to insert tweets
  SQLite: Needs to be detailed. Every column needs to be defined in your code and accounted for in each tweet.
  MongoDB: Easy. MongoDB will take whatever JSON Twitter throws at it and insert it into the database.

Robustness to API changes
  SQLite: Not robust. The Facebook API, for instance, changes almost constantly. Your database code will have to be updated each time, or it will break when it tries to insert into SQLite.
  MongoDB: Extremely robust. MongoDB will take whatever JSON you throw at it and insert it into the database.

If you've found this post helpful, please share it on your favorite social media site. In my next post I will provide a tutorial on how to download tweets into a MongoDB database. Until then, happy coding!