eleven Documentation Release 0.1 Tim Smith

Similar documents
Package EasyqpcR. January 23, 2018

Documentation for running Normfinder in R

EasyqpcR: low-throughput real-time quantitative PCR data analysis

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

What Every Programmer Should Know About Floating-Point Arithmetic

CIS192 Python Programming

HANDS ON DATA MINING. By Amit Somech. Workshop in Data-science, March 2016

ARTIFICIAL INTELLIGENCE AND PYTHON

CIS192 Python Programming

hp calculators HP 17bII+ Registers / Memory Banks The Stack Registers The Storage Registers

QPCR User Guide. Version 1.8. Create Date Sep 25, 2007 Last modified Jan 26, 2010 by Stephan Pabinger

DataAssist v2.0 Software User Instructions

Package pcr. November 20, 2017

Scientific Computing with Python. Quick Introduction

Automation of qpcr Analysis

CIS192 Python Programming

INTRODUCTION TO DATABASES IN PYTHON. Filtering and Targeting Data

Building a Multi-Language Interpreter Engine

CSC Advanced Scientific Computing, Fall Numpy

When using computers, it should have a minimum number of easily identifiable states.

Functions and Decomposition

Decisions, Decisions. Testing, testing C H A P T E R 7

TESTING, DEBUGGING, EXCEPTIONS, ASSERTIONS

If Statements, For Loops, Functions

IEEE Floating Point Numbers Overview

Abstract Data Types. CS 234, Fall Types, Data Types Abstraction Abstract Data Types Preconditions, Postconditions ADT Examples

NormqPCR: Functions for normalisation of RT-qPCR data

DATA STRUCTURE AND ALGORITHM USING PYTHON

Intermediate/Advanced Python. Michael Weinstein (Day 1)

CIS192 Python Programming

Chapter 1 : Informatics Practices. Class XII ( As per CBSE Board) Advance operations on dataframes (pivoting, sorting & aggregation/descriptive

APPLICATION INFORMATION

qpcr Hand Calculations - Multiple Reference Genes

Logical operators: R provides an extensive list of logical operators. These include

Our First Programs. Programs. Hello World 10/7/2013

OREKIT IN PYTHON ACCESS THE PYTHON SCIENTIFIC ECOSYSTEM. Petrus Hyvönen

Physics 234: Lab 7 Tuesday, March 2, 2010 / Thursday, March 4, 2010

CS1 Lecture 2 Jan. 16, 2019

Data Science with Python Course Catalog

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

How to import text files to Microsoft Excel 2016:

Pandas UDF Scalable Analysis with Python and PySpark. Li Jin, Two Sigma Investments

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

CSE030 Fall 2012 Final Exam Friday, December 14, PM

We ll be using data on loans. The website also has data on lenders.

CS 1110, LAB 1: PYTHON EXPRESSIONS.

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #43. Multidimensional Arrays

MA 1128: Lecture 02 1/22/2018

Introduction to Python: The Multi-Purpose Programming Language. Robert M. Porsch June 14, 2017

SQLite vs. MongoDB for Big Data

Programming for Data Science Syllabus

Touchstone Syntax for Versions 1.0 and 2.0

Fall 2017 December 4, Scheme. Instructions

Fall 2017 December 4, 2017

Package SC3. November 27, 2017

Today. Parity. General Polygons? Non-Zero Winding Rule. Winding Numbers. CS559 Lecture 11 Polygons, Transformations

Lecture 5 8/24/18. Writing larger programs. Comments. What are we going to cover today? Using Comments. Comments in Python. Writing larger programs

Processing SAS Data Sets

COMP1730/COMP6730 Programming for Scientists. Exceptions and exception handling

Intro. Scheme Basics. scm> 5 5. scm>

Package SC3. September 29, 2018

Introduction to Computer Programming for Non-Majors

2.3 Algorithms Using Map-Reduce

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

CS1 Lecture 5 Jan. 25, 2019

Lab 8 - Subset Selection in Python

CATEGORY SKILL SET REF. TASK ITEM. 1.1 Working with Spreadsheets Open, close a spreadsheet application. Open, close spreadsheets.

Applying Data-Driven Normalization Strategies for qpcr Data Using Bioconductor

Toolboxes for Data Scientists

(Refer Slide Time 01:41 min)

GEO 425: SPRING 2012 LAB 9: Introduction to Postgresql and SQL

Scientific Computing. Error Analysis

Connecting ArcGIS with R and Conda. Shaun Walbridge

Lab 9 - Linear Model Selection in Python

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi

Scientific Computing using Python

Top of Minds Report series Removing duplicates using Informatica PowerCenter

Ch.1 Introduction. Why Machine Learning (ML)?

Data Analysis and Hypothesis Testing Using the Python ecosystem

UTILITY FUNCTIONS IN R

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

The following is the Syllabus for Module 4, Spreadsheets, which provides the basis for the practice-based test in this module.

: Intro Programming for Scientists and Engineers Final Exam

MUTABLE LISTS AND DICTIONARIES 4

What is OneNote? The first time you start OneNote, it asks you to sign in. Sign in with your personal Microsoft account.

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller

DAY 7: EXCEL CHAPTER 5. Divya Ganesan February 5, 2013

Python for Scientists

CME 193: Introduction to Scientific Python Lecture 1: Introduction

CSEP 573: Artificial Intelligence

Introduction to Python for Scientific Computing

Hyperion Essbase Audit Logs Turning Off Without Notification

SAS and Python: The Perfect Partners in Crime

Problem Based Learning 2018

INTERMEDIATE PYTHON FOR DATA SCIENCE. Comparison Operators

DSC 201: Data Analysis & Visualization

News Headlines Monitor

Project Report Number Plate Recognition

Introduction to PeopleSoft Query. The University of British Columbia

Transcription:

eleven Documentation Release 0.1 Tim Smith February 21, 2014

Contents 1 eleven package 3 1.1 Module contents............................................. 3 Python Module Index 7 i

ii

eleven Documentation, Release 0.1 Tim D. Smith, @biotimylated. Eleven is a Python library for performing multi-gene RT-qPCR gene expression normalization. It is a free, open-source implementation of the GeNorm algorithm described by Vandesompele et al. in 2002. Eleven requires Python 2.7. Earlier versions will not be supported. Python 3.x support is on the roadmap. You will need a Scientific Python stack, including pandas and scipy. If you don t have these, you can install the free version of the Anaconda environment, which has everything you need. A sample analysis session looks like this: # Read PCR data into a pandas DataFrame. You want a data file where each # row corresponds to a separate well, with columns for the sample name, # target name, and Cq value. NTC wells should have the sample name set to # a value like NTC. >> df = pd.read_csv( my_data.csv ) # If your Sample, Target, and Cq columns are called other things, they # should be renamed to Sample, Target, and Cq. >> df = df.rename(columns={ Gene : Target, Ct : Cq }) # Drop the wells that are too close to the NTC for that target. >> censored = eleven.censor_background(df) # Rank your candidate reference genes. >> ranked = eleven.rank_targets(censored, [ Gapdh, Rn18s, Hprt, Ubc, Actb ], Control ) # Normalize your data by your most stable genes and compute normalization # factors (NFs). >> nf = eleven.calculate_nf(censored, ranked.ix[ Target, 0:3], Control ) # Now, normalize all of your expression data. >> censored[ RelExp ] = eleven.expression_nf(censored, nf, Control ) Wasn t that easy? This adds the relative expression of each well as a column of the data frame. Now you can use regular pandas tools for handling the data, so censored.groupby([ Sample, Target ])[ RelExp ].aggregate([ mean, std ]) gives you a nice table of means and standard deviations for each target in each sample. Contents 1

eleven Documentation, Release 0.1 2 Contents

CHAPTER 1 eleven package 1.1 Module contents eleven.average_cq(seq, efficiency=1.0) Given a set of Cq values, return the Cq value that represents the average expression level of the input. The intent is to average the expression levels of the samples, since the average of Cq values is not biologically meaningful. seq (iterable) A sequence (e.g. list, array, or Series) of Cq values. efficiency (float) The fractional efficiency of the PCR reaction; i.e. 1.0 is 100% efficiency, producing 2 copies per amplicon per cycle. Returns Cq value representing average expression level Return type float eleven.calculate_all_nfs(sample_frame, ranked_targets, ref_sample) For a set of n ranked_genes, calculates normalization factors NF_1, NF_2,..., NF_n. NF_i represents the normalization factor generated by considering the first i targets in ranked_targets. calculate_nf (which returns only NF_n) is probably more useful for routine analysis. ranked_targets (iterable) A list or Series of target names, in order of descending stability (ascending M). ref_sample (string) The name of the sample to normalize against. Returns a DataFrame with columns 1, 2,..., n containing normalization factors NF_1,..., NF_n for each sample, indexed by sample name. Return type DataFrame eleven.calculate_nf(sample_frame, ref_targets, ref_sample) Calculates a normalization factor from the geometric mean of the expression of all ref_targets, normalized to a reference sample. 3

eleven Documentation, Release 0.1 ref_targets (iterable) A list or Series of target names. ref_sample (string) The name of the sample to normalize against. Returns a Series indexed by sample name containing normalization factors for each sample. eleven.calculate_v(nfs) Calculates V(n+1/n) values. Useful for establishing the quality of your normalization regime. See Vandesompele 2002 for advice on interpretation. nfs (DataFrame) A matrix of all normalization factors, produced by calculate_all_nfs. Returns a Series of values [V(2/1), V(3/2), V(4/3),...]. eleven.censor_background(sample_frame, ntc_samples=[ NTC ], margin=none) Selects rows from the sample data frame that fall margin or greater cycles earlier than the NTC for that target. NTC wells are recognized by string matching against the Sample column. ntc_samples (iterable) A sequence of strings giving the sample names of your NTC wells, i.e. [ NTC ] margin (float) The number of cycles earlier than the NTC for a good sample, i.e. log2(10) Returns a view of the sample data frame containing only non-background rows Return type DataFrame eleven.collect_expression(sample_frame, ref_targets, ref_sample) Calculates the expression of all rows in the sample_frame relative to each of the ref_targets. rank_targets. ref_targets (iterable) A sequence of targets from the Target column of the sample frame. ref_sample (string) The name of the sample to which expression should be referenced. Returns a DataFrame of relative expression; rows represent rows of the sample_frame and columns represent each of the ref_targets. Return type DataFrame Used in eleven.expression_ddcq(sample_frame, ref_target, ref_sample) Calculates expression of samples in a sample data frame relative to a single reference gene and reference sample using the Cq method. For best results, the ref_sample should be defined for all targets and the ref_target should be defined for all samples, or else the series you get back will have lots of NaNs. ref_target (string) A string matching an entry of the Target column; the target to use as the reference target (e.g. Gapdh ) ref_sample (string) A string matching an entry of the Sample column. Returns a Series of expression values for each row of the sample data frame. 4 Chapter 1. eleven package

eleven Documentation, Release 0.1 Return type Series eleven.expression_nf(sample_frame, nf_n, ref_sample) Calculates expression of samples in a sample data frame relative to pre-computed normalization factors. ref_sample should be defined for all targets or the result will contain many NaNs. eleven.log2(x) nf_n (Series) A Series of normalization factors indexed by sample. You probably got this from compute_nf. ref_sample (string) The name of the sample to normalize against, which should match a value in the sample_frame Sample column. Returns a Series of expression values for each row in the sample data frame. Return type Series eleven.rank_targets(sample_frame, ref_targets, ref_sample) Uses the genorm algorithm to determine the most stably expressed genes from amongst ref_targets in your sample. See Vandesompele et al. s 2002 Genome Biology paper for information about the algorithm: http://dx.doi.org/10.1186/gb-2002-3-7-research0034 ref_targets (iterable) A sequence of targets from the Target column of sample_frame to consider for ranking. ref_sample (string) The name of a sample from the Sample column of sample_frame. It doesn t really matter what it is but it should exist for every target. Returns a sorted DataFrame with two columns, Target and M (the relative stability; lower means more stable). Return type DataFrame eleven.validate_sample_frame(sample_frame) Makes sure that sample_frame has the columns we expect. sample_frame (DataFrame) A sample data frame. Returns True (or raises an exception) Raises TypeError if sample_frame is not a pandas DataFrame ValueError if columns are missing or the wrong type 1.1. Module contents 5

eleven Documentation, Release 0.1 6 Chapter 1. eleven package

Python Module Index e eleven, 3 7