Coding Tools for Research


Jack Baker

Good Coding Practice in One Slide
- Modular: write code in small functions that each do one thing.
- Indent!
- Self-documenting: variable and function names should describe themselves as much as possible. Names tell the story, comments say why.
- Think before you copy and paste: write functions that are as abstract as possible.
- Use a decent editor.
- Don't write too much per line.
- Use style guides!
- Books: Code Complete, Clean Code.
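As a rough illustration of the "small functions, descriptive names" advice, here is a minimal Python sketch. The function names, the data, and the threshold of 1.5 are all made up for this example and do not come from the slides.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(value - mean) / spread for value in values]


def count_outliers(values, threshold=1.5):
    """Count standardized values whose magnitude exceeds the threshold."""
    # The default threshold is an arbitrary illustrative choice.
    return sum(1 for z in standardize(values) if abs(z) > threshold)


# Hypothetical measurements with one obviously unusual value
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]
print(count_outliers(measurements))
```

Each function does one thing, so `standardize` can be reused or tested on its own, and the names say what happens while the comment says why the default was chosen.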

Managing Simulations
- Main aim: allow mistakes to be made without having to rerun 600 experiments!
- Separate out procedures: I normally have 4 directories: models, methods, assess and plot.
- Store everything: create a pipeline and store the output of each procedure, including tuning parameters, so you only need to rerun what is needed. A good way of doing this is to store lists or objects to file using e.g. R's save or Python's pickle.
- Version control is useful!
- Makefiles can be a good way of running things only when needed (see Jamie Fbrot presentation on Sharepoint).
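The "store the output of each procedure" idea can be sketched in Python with the standard `pickle` module. This is only one possible arrangement: the helper `run_cached`, the stage name `methods_fit`, and the stored values are hypothetical, and a temporary directory stands in for a real project folder.

```python
import pickle
import tempfile
from pathlib import Path


def run_cached(stage_name, compute, cache_dir):
    """Return the stored result for a stage, computing and saving it if absent."""
    cache_file = Path(cache_dir) / f"{stage_name}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as handle:
            return pickle.load(handle)
    result = compute()
    with cache_file.open("wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []  # records which stages actually ran


def fit_model():
    calls.append("fit")
    # Keep the tuning parameters together with the output they produced
    return {"step_size": 0.01, "estimate": [1.0, 2.0, 3.0]}


with tempfile.TemporaryDirectory() as cache_dir:
    first = run_cached("methods_fit", fit_model, cache_dir)
    second = run_cached("methods_fit", fit_model, cache_dir)  # read from file

print(first == second, calls)
```

The second call never reruns `fit_model`, which is the point: a mistake in a later stage only costs a rerun of that stage. Pickle files should only be loaded if you created them yourself, since unpickling untrusted data can execute arbitrary code.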

Presentations/Reports
- Animations: the animate package in LaTeX allows you to make animations from a list of numbered pictures (e.g. plots you've created in R). You can also do this using any GIF creator and PowerPoint.
- Figures: you can convert your R plots to LaTeX code and then edit them using TikZ.
- Inkscape is like a better Paint for creating diagrams, and it can also work with LaTeX.

Programming Languages
- Interpreted: uses the code as is. Quicker to code, slower to run: Python, R, ...
- Compiled: translates the code to instructions your machine can read. Slower to code, quicker to run: C++, C, ...
- Blurring lines: there are things such as just-in-time compilers, and there are compilers for interpreted languages: Julia, Cython, ...
- Readability before speed: only speed up the bottleneck!
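To find the bottleneck before speeding anything up, one option in Python is the standard `cProfile` and `pstats` modules. The sketch below uses two made-up functions, one deliberately slow, purely to show what the profiler reports.

```python
import cProfile
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow pure-Python loop (the bottleneck in this toy example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_remainder(n):
    """Cheap step that is not worth optimizing."""
    return n % 7


def analysis():
    return slow_sum_of_squares(200_000) + fast_remainder(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Show the three entries with the largest time spent inside the function itself
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(3)
```

In the printed table, the function with the largest `tottime` is the candidate for a speed-up; everything else can stay readable.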

Programming Languages
- R: excellent for stats, lots of packages, slow (especially for linear algebra), not general purpose.
- Python: plenty of packages, general purpose (e.g. scraping), slow, but has sophisticated options for speed-ups (Cython). Decent linear algebra.
- Julia: fast, especially for loops; new, so fewer packages and less information online; excellent for optimization; better than R for general-purpose work; the language changes a lot.
- C/C++: very fast but hard to write well and slow to develop in; I would only recommend it for speed-ups.

Easier Speed Ups
- Vectorize in Python/R!
- Parallelize: there are easy packages for R, Python and Julia that can run code on multiple cores.
- STORM: run a large amount of code in batches. More memory. The CPUs have the same(ish) specs, but there are lots of them. Session on this to follow.
- Cython: minimally change Python code and compile it.
- PyTorch/TensorFlow: excellent linear algebra and autodiff (exact) packages for Python/R. Fast.
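A small sketch of what "vectorize" means in Python, assuming NumPy is installed (the slides do not name a specific package for this). Both functions below are hypothetical and compute the same squared distance, once with an explicit loop and once with array operations.

```python
import numpy as np


def squared_distance_loop(x, y):
    """Element-by-element Python loop."""
    total = 0.0
    for a, b in zip(x, y):
        total += (a - b) ** 2
    return total


def squared_distance_vectorized(x, y):
    """Same quantity, computed on whole arrays at once."""
    diff = np.asarray(x) - np.asarray(y)
    return float(np.dot(diff, diff))


# Made-up data for the comparison
rng = np.random.default_rng(seed=0)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)

loop_value = squared_distance_loop(x, y)
vectorized_value = squared_distance_vectorized(x, y)
print(np.isclose(loop_value, vectorized_value))
```

The vectorized version pushes the loop into compiled code inside NumPy, which is usually where the speed-up comes from; timing both with `timeit` on your own data is the way to confirm it.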

Text editors
- Why: good text editors make your life a lot easier! You can use one for everything (LaTeX, C, Julia, R) and be more productive.
- Emacs: excellent option, provides a full environment for R/LaTeX/Python, etc. Bulky.
- Vim: more streamlined than Emacs and allows you to chain commands; closer to the terminal, but with a steeper learning curve.
- Other options: Sublime.

Linux Terminal
- Why use/learn: powerful for coding, searching and manipulating files; fast; increases productivity. Needed for STORM!
- Example: you have 100 data frames in a directory and want to stack the ones that have similar names after adding a column. This is a one-liner in the terminal and much faster than R.
- My slides from STORC: http://lancs.ac.uk/ bakerj1/pdfs/tutorials/linux.pdf
- Cheat sheet: http://cli.learncodethehardway.org/bash_cheat_sheet.pdf

Object Oriented Programming
- Collects data and its associated functions together.
- No longer need 600-argument functions!
- Very useful for big projects.
- Can easily store complex structures of data.
- Supported in R, Python and Julia.
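A minimal Python sketch of grouping data with the functions that act on it, using the standard `dataclasses` module. The `Experiment` class, its fields, and the random-walk simulation are invented for illustration only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Groups the settings and results of one simulation run."""
    name: str
    n_samples: int
    step_size: float
    seed: int = 0
    results: list = field(default_factory=list)

    def run(self):
        """Simulate a simple random walk and store every position."""
        rng = random.Random(self.seed)
        position = 0.0
        for _ in range(self.n_samples):
            position += self.step_size * rng.gauss(0.0, 1.0)
            self.results.append(position)
        return self

    def final_position(self):
        return self.results[-1]


experiment = Experiment(name="random_walk", n_samples=100, step_size=0.1).run()
print(experiment.name, len(experiment.results))
```

Functions that need the settings can take the single `experiment` object instead of a long argument list, and the object can be stored to file as one unit.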

Misc
- Mendeley: organiser for your papers.
- Version control: invaluable for organising, backing up and collaborating on code. Look at Jamie's slides that I sent around.
- Algorithms: you might need this knowledge for jobs.
- Hadoop: scalable framework for distributed data storage and processing, which can be queried with SQL-like tools such as Hive; might be useful for jobs.
- Functional programming: a programming paradigm that is gaining some traction; easier to scale/parallelize.