Practical Bioinformatics

5/12/2015

Gotchas
Strings are quoted; names of things are not: mystring = "mystring"
Case matters for variable names: mystring ≠ MyString
Case matters for string comparison: "atg" ≠ "ATG"
Normalize sequence comparisons to uppercase: "ATGCTGTA".upper() == "ATgcTgTA".upper()
(And treat RNA as cDNA.)
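A minimal sketch of the normalization advice above. The function names normalize_sequence and same_sequence are hypothetical, and reading "treat RNA as cDNA" as "replace U with T" is an assumption about what the slide intends.

```python
def normalize_sequence(seq):
    """Return seq in uppercase with U replaced by T (assumed meaning of 'treat RNA as cDNA')."""
    return seq.upper().replace("U", "T")


def same_sequence(a, b):
    """Compare two nucleotide sequences, ignoring case and the U/T difference."""
    return normalize_sequence(a) == normalize_sequence(b)


plain_comparison = ("atg" == "ATG")                         # case-sensitive
normalized_comparison = same_sequence("atg", "ATG")         # case-insensitive
rna_vs_dna = same_sequence("AUGCUGUA", "ATgcTgTA")          # also ignores U vs T

print(plain_comparison, normalized_comparison, rna_vs_dna)
```

Normalizing once, in one function, avoids scattering .upper() calls through the rest of a script.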

Gotchas
Statements that precede code blocks (if, def, for, while, ...) end with a colon:

    def mean(x):
        s = 0.0
        for i in x:
            s += i
        return s / len(x)

You can use Tab and Shift-Tab in IPython to indent and unindent blocks of code.
Loop variables retain their state after the loop is finished, so if you want to reuse a variable, you need to reinitialize it.
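The last gotcha can be illustrated with a short sketch. The lists and the names last_seen, stale_total, and fresh_total are made up for illustration; the example assumes the slide refers both to the loop variable itself and to accumulators updated inside the loop.

```python
values = [3, 1, 4]

total = 0.0
for i in values:
    total += i
last_seen = i            # the loop is over, but i still holds the last element

# Reusing the accumulator without resetting it carries the old sum along.
for i in [10, 20]:
    total += i
stale_total = total      # 8.0 from the first loop plus 30.0 from the second

# Reinitialize before reuse to get the sum of the second list only.
total = 0.0
for i in [10, 20]:
    total += i
fresh_total = total

print(last_seen, stale_total, fresh_total)
```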

Mean

    def mean(x):
        s = 0.0
        for i in x:
            s += i
        return s / len(x)

    # Equivalent, using the built-in sum()
    def mean(x):
        return sum(x) / float(len(x))

Standard Deviation

$$\sigma_x = \sqrt{\frac{\sum_{i}^{N} (x_i - \bar{x})^2}{N - 1}}$$

    from math import sqrt

    def stdev(x):
        m = mean(x)
        s = 0.0
        for i in x:
            s += (i - m) ** 2
        return sqrt(s / (len(x) - 1))
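As a sanity check, the sketch below runs the stdev function on a small made-up data set and compares it with statistics.stdev from the Python 3 standard library, which also divides by N - 1. The data values and the names ours and reference are illustrative only. For this data the mean is 5 and the squared deviations sum to 32, so the expected result is sqrt(32/7).

```python
import statistics
from math import sqrt


def mean(x):
    return sum(x) / float(len(x))


def stdev(x):
    # Sample standard deviation: divide by N - 1, as in the formula above.
    m = mean(x)
    s = 0.0
    for i in x:
        s += (i - m) ** 2
    return sqrt(s / (len(x) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up values
ours = stdev(data)
reference = statistics.stdev(data)     # standard library, also uses N - 1

print(ours, reference)
```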

Pearson's Correlation Coefficient

$$r(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

    from math import sqrt

    def pearson(x, y):
        mx = mean(x)
        my = mean(y)
        sxy = 0.0
        ssx = 0.0
        ssy = 0.0
        for i in range(len(x)):
            dx = x[i] - mx
            dy = y[i] - my
            sxy += dx * dy
            ssx += dx ** 2
            ssy += dy ** 2
        return sxy / sqrt(ssx * ssy)
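One possible variation on the function above, sketched with zip() instead of index arithmetic. The explicit length check is an addition not present on the slide, and the test vectors are made up: a perfectly linear increasing relationship should give r = 1 and a perfectly linear decreasing one r = -1.

```python
from math import sqrt


def pearson(x, y):
    # zip() silently stops at the shorter input, so check lengths first.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx = sum(x) / float(len(x))
    my = sum(y) / float(len(y))
    sxy = ssx = ssy = 0.0
    for xi, yi in zip(x, y):
        dx = xi - mx
        dy = yi - my
        sxy += dx * dy
        ssx += dx ** 2
        ssy += dy ** 2
    return sxy / sqrt(ssx * ssy)


x = [1, 2, 3, 4]
r_up = pearson(x, [2, 4, 6, 8])      # y rises linearly with x
r_down = pearson(x, [8, 6, 4, 2])    # y falls linearly with x

print(r_up, r_down)
```

Note that the function divides by zero if either input is constant, because the corresponding sum of squares is 0.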

Subject, verb that noun!
return value = object.function(parameter, ...)
"Object, do function to parameter."
file = open("myfile.txt")
file.read()
file.readlines()
for line in file:
string.split() and string.join()
file.write()
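The methods listed above can be combined roughly as follows. This is a sketch, not the lecture's own code: the file is created in a temporary directory so the example is self-contained, the gene names and numbers are made-up sample data, and the with statement is an addition that closes the file automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# file.write() does not add newlines for you, so include "\n" explicitly.
with open(path, "w") as f:
    f.write("YAL001C\t0.12\t-0.30\n")
    f.write("YAL002W\t1.05\t0.44\n")

rows = []
with open(path) as f:
    for line in f:                               # one line per iteration
        fields = line.rstrip("\n").split("\t")   # string -> list of strings
        rows.append(fields)

rejoined = ",".join(rows[0])                     # list of strings -> string

print(rows)
print(rejoined)
```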

Binary files are like genomic DNA

    hexdump -C computers.png

    fp = open("computers.png", "rb")  # "rb": read raw bytes, with no newline translation
    fp.read(50)
    fp.close()
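A rough sketch of how the first bytes of a file can hint at whether it is binary. The helper names (read_head, looks_binary, is_png) are hypothetical, and "contains a NUL byte" is only a heuristic chosen for this example: it misclassifies UTF-16 text, for instance. The two sample files are written to a temporary directory; the fake PNG contains only the 8-byte PNG signature and the start of the first chunk.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # the 8 bytes every PNG file starts with


def read_head(path, nbytes=50):
    """Return the first nbytes of a file as raw bytes."""
    with open(path, "rb") as fp:
        return fp.read(nbytes)


def looks_binary(path):
    """Heuristic (an assumption): a NUL byte near the start suggests binary data."""
    return b"\x00" in read_head(path)


def is_png(path):
    return read_head(path).startswith(PNG_SIGNATURE)


tmpdir = tempfile.mkdtemp()
png_path = os.path.join(tmpdir, "fake.png")
txt_path = os.path.join(tmpdir, "notes.txt")
with open(png_path, "wb") as fp:
    fp.write(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")
with open(txt_path, "wb") as fp:
    fp.write(b"gene\tvalue\nYAL001C\t0.12\n")

print(looks_binary(png_path), is_png(png_path), looks_binary(txt_path))
```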

Text files are like ORFs

    hexdump -C 3 4 2010.txt

OS X sometimes uses CR newlines

    hexdump -C macfile.txt
    tr '\r' '\n' < macfile.txt > unixfile.txt

Windows uses CRLF newlines

    hexdump -C dosfile.txt
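The same newline cleanup can be sketched in Python. The function name to_unix_newlines is hypothetical. The order of the two replacements matters: CRLF must be handled before bare CR, otherwise each CRLF would become two newlines.

```python
def to_unix_newlines(data):
    """Convert CRLF (Windows) and bare CR (old Mac) line endings in bytes to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos line\r\nmac line\runix line\n"
cleaned = to_unix_newlines(mixed)

# str.splitlines() also recognizes all three conventions.
lines = "dos line\r\nmac line\runix line\n".splitlines()

print(cleaned)
print(lines)
```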

From a CSV file to Python lists, one layer at a time:

    supp2data.csv                                           CSV file
    open("supp2data.csv")                                   file object
    open("supp2data.csv").next()                            single line
    open("supp2data.csv").read()                            whole file
    csv.reader(open("supp2data.csv")).next()                list (one parsed row)
    csv.reader(urlopen("http://example.com/csv")).next()    list, read from a web service through a urllib object

(The .next() method is Python 2 syntax. In Python 3, write next(f) instead of f.next().)
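A small Python 3 sketch of the csv.reader layer. To stay self-contained it reads from an in-memory io.StringIO object rather than supp2data.csv, and the column names and values are made up. csv.reader accepts any iterable of text lines, so an open file would work the same way.

```python
import csv
import io

# Stand-in for an open file; the contents are invented sample data.
text = "gene,cond1,cond2\nYAL001C,0.12,\nYAL002W,1.05,0.44\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)        # first row, already split into a list of strings
first_row = next(reader)     # an empty cell comes back as an empty string
remaining = list(reader)     # whatever rows are left

print(header)
print(first_row)
print(remaining)
```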

The CDT file format
Minimal CLUSTER input
Cluster3 CDT output
Tab delimited (\t)
UNIX newlines (\n)
Missing values are empty cells
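One way to handle the empty cells described above, sketched for a single line of numeric values. The function name parse_ratios and the sample line are hypothetical, and real CDT rows also carry identifier and annotation columns that this sketch ignores.

```python
def parse_ratios(line):
    """Split a tab-delimited line of numbers; empty cells become None."""
    # Strip only the newline: line.strip() would also remove trailing tabs
    # and silently drop missing values at the end of the row.
    fields = line.rstrip("\r\n").split("\t")
    return [float(f) if f.strip() else None for f in fields]


row = parse_ratios("0.5\t\t-1.25\t\n")

print(row)
```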

Homework
1. Try reading the first few bytes of different files on your computer. Can you distinguish binary files from text files?
2. Create a simple data table in your favorite spreadsheet program and save it in a text format (e.g., save as CSV or tab-delimited text from Excel [1]). Practice reading the data from Python.
3. Write a function to dissect supp2data.cdt into three lists of strings (gene names, gene annotations, and experimental conditions) and one matrix (list of lists) of log ratio values (as floats, using None or 0.0 to represent missing values).
4. If you are familiar with Python classes, write a CDT class based on the parser from the previous exercise. Provide methods for looking up annotations and log ratios by gene name.

[1] Note for Mac users: Excel will offer you Macintosh and DOS/Windows text formats. Choose DOS/Windows; otherwise, Python will think that the entire file is a single line.