Coding Tools for Research
Jack Baker
Good Coding Practice in One Slide
Modular: write code in small functions that each do one thing.
Indent!
Self-documenting: variable and function names should describe themselves as much as possible. Names tell the story; comments say why.
Think before you copy and paste: write functions to be as abstract as possible.
Use a decent editor.
Don't write too much per line.
Follow a style guide!
Books: Code Complete, Clean Code.
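A minimal sketch of the "modular" and "self-documenting" points, in Python. The function names and the outlier rule are hypothetical, chosen only for illustration; they are not from the talk.

```python
from statistics import mean, stdev


def standardise(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


def count_outliers(standardised_values, threshold=3.0):
    """Count points more than `threshold` standard deviations from the mean."""
    return sum(abs(value) > threshold for value in standardised_values)


def summarise_sample(values, threshold=3.0):
    """Combine the two small steps above into one summary."""
    # Standardise first so the threshold is in standard-deviation units,
    # which keeps it comparable across data sets (the comment says why).
    standardised = standardise(values)
    return {
        "mean": mean(values),
        "n_outliers": count_outliers(standardised, threshold),
    }
```

Each function does one thing and its name says what, so the comment is left free to explain the reasoning rather than restate the code.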
Managing Simulations
Main aim: allow mistakes to be made without having to rerun 600 experiments!
Separate out procedures: I normally have four directories: models, methods, assess and plot.
Store everything: create a pipeline and store the output of each procedure, including tuning parameters, so you only need to rerun what's needed.
A good way of doing this is to store lists or objects to file using e.g. R's save or Python's pickle.
Version control is useful!
Makefiles can be a good way of running things only when needed (see Jamie Fbrot's presentation on SharePoint).
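One way the "store everything" idea might look with Python's pickle. This is a sketch under assumptions: `run_or_load` and `fit_method` are hypothetical names, and the file layout is only an example.

```python
import pickle
from pathlib import Path


def run_or_load(path, compute, *args, **kwargs):
    """Return the result stored at `path`; compute and store it if missing."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)
    result = compute(*args, **kwargs)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(result, handle)
    return result


def fit_method(data, step_size):
    """Hypothetical stand-in for an expensive method run."""
    estimate = sum(data) / len(data)
    # Keep the tuning parameter next to the output it produced.
    return {"estimate": estimate, "step_size": step_size}
```

A call such as `run_or_load("methods/fit.pkl", fit_method, data, step_size=0.1)` only runs the method the first time; later stages (assess, plot) can then be rerun without repeating it. Note that this sketch does not detect changed inputs, so a stale file has to be deleted by hand. Only unpickle files you created yourself.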
Presentations/Reports
Animations: the animate package in LaTeX lets you make animations from a list of numbered pictures (e.g. plots you've created in R). You can also do this using any GIF creator and PowerPoint.
Figures: you can convert your R plots to LaTeX code, then edit them using TikZ.
Inkscape is like a better Paint for creating diagrams, and can also work with LaTeX.
Programming Languages
Interpreted: runs the code as is; quicker to code, slower to run: Python, R, ...
Compiled: translates the code into instructions your machine can read; slower to code, quicker to run: C++, C, ...
Blurring lines: there are just-in-time compilers (e.g. Julia) and compilers for interpreted languages (e.g. Cython).
Readability before speed: only speed up the bottleneck!
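To find the bottleneck before speeding anything up, a profiler helps. Below is a rough sketch using Python's built-in cProfile and pstats modules; `simulate`, `summarise` and `run_experiment` are hypothetical placeholders for your own code.

```python
import cProfile
import io
import pstats


def simulate(n):
    """Hypothetical slow step: a pure-Python double loop."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (i * j) % 7
    return total


def summarise(value):
    """Hypothetical cheap step."""
    return round(value, 2)


def run_experiment(n=200):
    return summarise(simulate(n))


def profile_report(func, *args, top=5):
    """Run func(*args) under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive calls come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Printing the report from `profile_report(run_experiment, 200)` should show most of the time inside `simulate`, which would be the only part worth rewriting (for example in Cython).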
Programming Languages
R: excellent for stats, lots of packages; slow (especially for linear algebra); not general purpose.
Python: plenty of packages, general purpose (e.g. scraping); slow, but sophisticated options for speed-ups (Cython). Decent linear algebra.
Julia: fast, especially for loops; new, so fewer packages and less web info; excellent for optimization; better than R for general-purpose work; the language changes a lot.
C/C++: very fast but hard to write well and slow to develop in; would only recommend for speed-ups.
Easier Speed-Ups
Vectorize in Python/R!
Parallelize: there are easy-to-use packages for R, Python and Julia that run code on multiple cores.
STORM: run large amounts of code in batches. More memory; CPUs have the same(ish) specs, but there are lots of them. Session on this to follow.
Cython: minimally change Python code and compile it.
PyTorch/TensorFlow: excellent linear algebra and (exact) autodiff packages for Python/R. Fast.
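As an illustration of the "parallelize" point, here is a sketch using Python's standard-library concurrent.futures. The simulation itself (`simulate_one`) is a hypothetical toy, and the talk does not say which parallel package was meant.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_one(seed, n_draws=10_000):
    """Hypothetical single run: Monte Carlo estimate of E[U^2], U ~ Uniform(0, 1)."""
    rng = random.Random(seed)  # one seeded generator per run keeps results reproducible
    return sum(rng.random() ** 2 for _ in range(n_draws)) / n_draws


def run_all(seeds, workers=1):
    """Run one simulation per seed, serially or across `workers` processes."""
    if workers <= 1:
        return [simulate_one(seed) for seed in seeds]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `seeds`.
        return list(pool.map(simulate_one, seeds))
```

Calling `run_all(range(100), workers=4)` spreads the runs over four processes. On platforms that start workers by spawning a fresh interpreter (Windows, macOS), make that call from under an `if __name__ == "__main__":` guard.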
Text Editors
Why: good text editors make your life a lot easier! You can use one for everything (LaTeX, C, Julia, R) and be more productive.
Emacs: excellent option; provides a full environment for R, LaTeX, Python, etc. Bulky.
Vim: more streamlined than Emacs and lets you chain commands; closer to the terminal, but a steeper learning curve.
Other options: Sublime Text.
Linux Terminal
Why use/learn: powerful for coding, searching and manipulating files; fast; increases productivity. Needed for STORM!
Example: you have 100 data frames in a directory and want to stack the ones with similar names after adding a column. This is a one-liner in the terminal and much faster than R.
My slides from STORC: http://lancs.ac.uk/ bakerj1/pdfs/tutorials/linux.pdf
Cheat sheet: http://cli.learncodethehardway.org/bash_cheat_sheet.pdf
Object-Oriented Programming
Collects data and associated functions together.
No longer need 600-argument functions!
Very useful for big projects.
Can easily store complex structures of data.
Supported in R, Python and Julia.
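A small sketch of how an object can replace a long argument list, here using a Python dataclass. `RandomWalkSampler` and its fields are hypothetical, made up for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RandomWalkSampler:
    """Hypothetical sampler: tuning parameters and output live in one object."""

    step_size: float
    n_iterations: int
    seed: int = 0
    samples: list = field(default_factory=list)

    def run(self, start=0.0):
        """Simulate a Gaussian random walk and store every position visited."""
        rng = random.Random(self.seed)
        position = start
        self.samples = [position]
        for _ in range(self.n_iterations):
            position += rng.gauss(0.0, self.step_size)
            self.samples.append(position)
        return self

    def mean(self):
        """Average of the stored positions."""
        return sum(self.samples) / len(self.samples)
```

Functions that assess or plot the run can then take the single `sampler` object instead of separate arguments for the step size, iteration count, seed and samples.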
Misc
Mendeley: organiser for your papers.
Version control: invaluable for organising, backing up and collaborating on code. Look at Jamie's slides I sent around.
Algorithms: might need knowledge of these for jobs.
Hadoop: framework for distributed storage and processing of large datasets, which can be queried with SQL-like tools (e.g. Hive); might be useful for jobs.
Functional programming: programming paradigm gaining some traction; easier to scale/parallelize.