Why I Use Python for Academic Research

Academics and other researchers have to choose from a variety of research skills. Most social scientists do not add computer programming to their skill set. As a strong proponent of the value of learning a programming language, I will lay out how this has proven useful for me. A budding programmer could choose from a number of good options, including Perl, C++, Java, PHP, or others, but Python has a reputation as one of the most accessible and intuitive. I obviously like it. No matter your choice of language, there are a variety of ways in which learning programming will be useful for social scientists and other data scientists. The most important areas are data gathering, data manipulation, and data visualization and analysis.

Data Gathering

When I started learning Python four years ago, I kept a catalogue of the various scripts I wrote. Going over these scripts, I have personally written Python code to gather the following data:

- Download lender and borrower information for thousands of donation transactions on kiva.org.
- Download tweets from a list of 100 large nonprofit organizations.
- Download Twitter profile information from 150 advocacy nonprofits.
- Scrape the Walls from 65 organizations' Facebook accounts.
- Download @messages sent to 38 community foundations.
- Traverse and download HTML files for thousands of webpages on large accounting firms' websites.
- Scrape data from thousands of organizational profiles on a charity rating site.
- Scrape data from several thousand organizations raising money on the crowdfunding site Indiegogo.
- Download hundreds of YouTube videos used in Indiegogo fundraising campaigns.
- Gather data available through the InfoChimps API.
- Scrape pinning and re-pinning data from health care organizations' Pinterest accounts.
- Tap into the Facebook Graph API to download status updates and the number of likes, comments, and shares for 100 charities.

This is just a sample. The point is that you can use a programming language like Python to get just about any data from the Web. When the website or social media platform makes an API (application programming interface) available, accessing the data is easy. Twitter is fantastic for this very reason. In other cases, including most websites, you will have to scrape the data through creative use of programming. Either way, you can gain access to valuable data.

There's no need to be an expert to obtain real-world benefits from programming. I started learning Python four years ago (I now consider myself an intermediate-level programmer) and gained substantive benefits right from the start.

Data Manipulation

Budding researchers often seem to underestimate how much time they will spend manipulating, reshaping, and processing their data. Python excels at data munging. I have recently used Python code for the following:

- Loop over hundreds of thousands of tweets and modify characters, convert date formats, etc.
- Identify and delete duplicate entries in an SQL database.
- Loop over 74 nonprofit organizations' Twitter friend-follower lists to create a 74 x 74 friendship network.
- Read in and write text and CSV data.
- Countless grouping, merging, and aggregation functions.
- Automatically count the number of negative words in thousands of online donation appeals.
- Loop over hundreds of thousands of tweets to create an edge list for a retweet network.
- Compute word counts for a word-document matrix from thousands of crowdfunding appeals.
- Create text files combining all of an organization's tweets for use in creating word clouds.
- Download images included in a set of tweets.
- Merging text files.
- Count the number of Facebook statuses per organization.
- Loop over hundreds of thousands of rows of tweets in an SQLite database and create additional variables for future analysis.
- Dealing with missing data.
- Creating dummy variables.
- Find the oldest entry for each organization in a Twitter database.
- Use pandas (Python Data Analysis Library) to aggregate Twitter data to the daily, weekly, and monthly level.
- Create a text file of all hashtags in a Twitter database.

Data Visualization and Analysis

With the proliferation of scientific computing modules such as pandas, statsmodels, and scikit-learn, Python's data analysis capabilities have become much more powerful over the past few years. With such tools, Python can now compete in many areas with dedicated statistical programs such as R or Stata, which I have traditionally used for most of my data analysis and visualization. Lately I'm doing more and more of this work directly in Python. Here are some of the analyses I have run recently using Python:

- Implement a naive Bayesian classifier to classify the sentiment in hundreds of thousands of tweets.
- Linguistic analysis of donation appeals and tweets using Python's Natural Language Toolkit (NLTK).
- Create plots of the number of tweets, retweets, and public reply messages per day, week, and month.
- Run descriptive statistics and multiple regressions.

Summary

Learning a programming language is a challenge. Of that there is little doubt. Yet the payoff in improved productivity alone can be substantial. Add to that the powerful analytical and data visualization capabilities that open up to the researcher who is skilled in a programming language. Lastly, leaving aside the buzzword Big Data, programming opens up a world of new data found on websites, social media platforms, and online data repositories. I would thus go so far as to say that any researcher interested in social media is doing themselves a great disservice by not learning some programming. For this very reason, one of my goals on this site is to provide guidance to those who are interested in getting up and running on Python for conducting academic and social media research.
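
To make two of the data manipulation tasks mentioned earlier more concrete (aggregating Twitter data to the daily, weekly, and monthly level with pandas, and collecting the hashtags in a set of tweets), here is a minimal sketch. It is not the code used for the projects described in this post. It assumes pandas is installed, and the sample tweets, organization names, column names, and the helper functions `tweets_per_period` and `hashtag_counts` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data: a handful of tweets from two made-up organizations.
tweets = pd.DataFrame(
    {
        "org": ["OrgA", "OrgA", "OrgB", "OrgA", "OrgB"],
        "created_at": [
            "2013-01-01 09:15:00",
            "2013-01-01 17:40:00",
            "2013-01-02 08:05:00",
            "2013-01-09 12:30:00",
            "2013-02-03 21:10:00",
        ],
        "text": [
            "Join our #fundraising drive today! #nonprofit",
            "Thank you to all our donors #nonprofit",
            "New report on #health outcomes",
            "Volunteers needed this weekend #Nonprofit",
            "Our annual gala is coming up #fundraising",
        ],
    }
)

# Convert the timestamp strings to real datetimes so they can be grouped.
tweets["created_at"] = pd.to_datetime(tweets["created_at"])


def tweets_per_period(df, freq):
    """Count tweets per organization per period ('D', 'W', or 'M')."""
    period = df["created_at"].dt.to_period(freq)
    return df.groupby(["org", period]).size()


# Simple pattern for hashtags: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count hashtags across an iterable of tweet texts, ignoring case."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


daily = tweets_per_period(tweets, "D")
weekly = tweets_per_period(tweets, "W")
monthly = tweets_per_period(tweets, "M")
tags = hashtag_counts(tweets["text"])

print(monthly)
print(tags.most_common())
```

Each call to `tweets_per_period` returns a pandas Series indexed by organization and period, which can be written to CSV or plotted directly. With real data, the DataFrame would more likely be loaded from a CSV file or an SQLite database than typed in by hand.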