from scratch A primer for scientists working with Next-Generation- Sequencing data Chapter 1 Text output and manipulation


Chapter 1: text output and manipulation
In this unit you will learn about:
- how to write text output to the shell
- storing values in variables
- the notion of a string
- different ways of manipulating and searching in strings
- basic data types supported by Python

Text output
- the print() function writes output to the console
- the argument to print() is a string
Terms:
- function: a collection of one or more individual commands, identified by a name. Can return one or more values.
- argument: a value passed to a function to be used within the function. A function can have zero to many arguments.
example:
    print("hello, python world!")

Data types: string
- a string is a sequence of characters
- different kinds of quotes are possible:
    'Single quotes define strings.'
    "So do double quotes."
    '''Even triple quotes can be used.
    These can even contain multiple lines!'''
You will work with strings A LOT! (e.g. DNA/protein sequences, input/output of programs, ...)
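As a small illustration of the quoting styles above, the following sketch shows that the choice of quotes does not change the value of a string, and that a triple-quoted string keeps its line break. The variable names and the sequence are made up for this example.

```python
# Illustrative values only; the names below are not part of the original slides.
single = 'ACGT'
double = "ACGT"
multi = '''first line
second line'''

print(single == double)  # True: the quote style does not change the value
print(multi)             # printed on two lines
print("\n" in multi)     # True: the line break is stored as a newline character
```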

Creating your first script
Typing commands into the command line interface will get annoying quickly, so let's create a script file:
1. open your text editor
2. type these lines:
    #! /usr/bin/env python3
    print("hello, python world!")
3. save the file as hello.py

Running your first script
1. open up a terminal
2. navigate to the location of the file you just created
3. type
    $ python3 hello.py
4. admire the output
5. go play! (modify your script, see what works and what doesn't)

Comments
Comments are a very important means of documenting your work. See them as guides that help future you or a colleague understand your code. Everything after a # on a line is ignored by Python. In the Linux/Unix world (including Mac), there is a special kind of comment, which we have already used. Recall our script:
    #! /usr/bin/env python3
    print("hello, python world!")  # print msg
The first line, the "shebang", tells the shell which program to use to execute the script.

Printing special characters
Useful for formatting:
- '\n': newline character
- '\t': tabulator (TAB) character
examples:
    >>> print("hello\npython world!")
    hello
    python world!
    >>> print("hello\n\tpython world!")
    hello
            python world!
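A short sketch of how these two special characters can lay out a small table. Each escape sequence counts as a single character, even though it is typed as two. The table contents are invented for illustration.

```python
# Hypothetical two-column table: columns separated by TAB, rows by newline.
table = "name\tlength\nseq1\t6\nseq2\t4"
print(table)

# An escape sequence is one character, not two.
print(len("\n"))    # 1
print(len("a\tb"))  # 3
```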

Variables
- a variable contains a value of a certain type (e.g. a string)
- variable names are arbitrary, everything is allowed except:
  - names that start with a digit
  - reserved keywords (e.g. "if", "for", "while")
  - special characters other than the underscore (e.g. spaces, "-", "$")
- variable names are case-sensitive ("dna" is not the same as "Dna")
- avoid reusing the names of built-in functions such as print: it is allowed, but it hides the function
- choose clear, self-documenting names!
examples:
    x = "ATTagg"       # bad name
    my_dna = "ATTagg"  # better
    seq_count = 3      # numeric value
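The naming rules can be checked from Python itself. This is a minimal sketch that uses the standard-library keyword module and the str.isidentifier() method; neither appears in the slides, and the variable names are only examples.

```python
import keyword

# Names are case-sensitive: these are two different variables.
my_dna = "ATTagg"
My_dna = "ccGGaa"
print(my_dna == My_dna)            # False

# Reserved keywords cannot be used as variable names.
print(keyword.iskeyword("for"))    # True

# isidentifier() reports whether a string has the form of a valid name.
print("my_dna".isidentifier())     # True
print("2dna".isidentifier())       # False: starts with a digit
print("my-dna".isidentifier())     # False: contains a special character
```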

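String manipulation
- concatenation:
    print("hello" + " " + "world")
- length: len("hello world") returns an int (integer number)
- changing case:
    my_dna = "ATTagg"
    my_dna.upper()   # all upper case
    my_dna.lower()   # all lower case
- replacement (case-sensitive, so "T" and "t" are different characters):
    my_rna = my_dna.replace("T", "U")
These methods return a new string; the original value of my_dna is not changed.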
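The operations above can be tried together in one short script. This sketch reuses the my_dna value from the slide; the other variable names are introduced here for illustration. Note in particular that replace() matches case exactly.

```python
my_dna = "ATTagg"

greeting = "hello" + " " + "world"    # concatenation
print(greeting, len(greeting))        # hello world 11

upper_dna = my_dna.upper()            # "ATTAGG"
lower_dna = my_dna.lower()            # "attagg"

# replace() is case-sensitive: only the upper-case T's are replaced here.
my_rna = my_dna.replace("T", "U")     # "AUUagg"
# There is no lower-case "t" in my_dna, so this returns the string unchanged.
unchanged = my_dna.replace("t", "U")  # "ATTagg"

print(upper_dna, lower_dna, my_rna, unchanged)
print(my_dna)                         # still "ATTagg": the methods return new strings
```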

Getting parts of a string
- strings support indexing: "AUUGC"[1:3]
- syntax: string[<start>:<end>]
- positions are counted up from 0, not 1!
- <start> is inclusive
- <end> is exclusive
- a single index gives one character
examples:
    >>> "AUUGC"[1:3]
    'UU'
    >>> "AUUGC"[0]
    'A'
This notation is very important! We will make extensive use of it.
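A few more slices on the same example string may help fix the start-inclusive, end-exclusive rule. Leaving out <start> or <end> and using negative indices are standard Python features that the slide does not mention; the variable names are chosen for this sketch.

```python
rna = "AUUGC"

middle = rna[1:3]       # positions 1 and 2 -> "UU"
first = rna[0]          # a single index gives one character -> "A"
head = rna[:2]          # omitted start means "from the beginning" -> "AU"
tail = rna[2:]          # omitted end means "to the end" -> "UGC"
last = rna[-1]          # negative indices count from the end -> "C"

print(middle, first, head, tail, last)
print(len(rna[1:3]))    # 2: a slice has <end> - <start> characters
print(head + tail == rna)  # True: the two slices share position 2 as boundary
```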

Counting and searching
Count occurrences of a substring:
    >>> "TATATCGC".count("T")
    3
    >>> "TATATCGC".count("AT")
    2
You can also find the location of the first occurrence (-1 means "not found"):
    >>> "TATATCGC".find("TA")
    0
    >>> "TATATCGC".find("X")
    -1
For more useful string functions see: http://docs.python.org/3/library/stdtypes.html#string-methods
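Two details of count() and find() are easy to miss, so here is a hedged sketch on the same sequence: count() only counts non-overlapping matches, and find() accepts an optional start position. The variable names are illustrative.

```python
dna = "TATATCGC"

t_count = dna.count("T")       # 3
at_count = dna.count("AT")     # 2

# "TAT" occurs at positions 0 and 2, but the two matches overlap,
# so count() reports only one of them.
tat_count = dna.count("TAT")   # 1

first_ta = dna.find("TA")      # 0
# The optional second argument tells find() where to start searching.
second_ta = dna.find("TA", first_ta + 1)  # 2
missing = dna.find("X")        # -1 signals "not found"

print(t_count, at_count, tat_count, first_ta, second_ta, missing)
```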

Data types
Variables can store values of certain types:

    type    explanation                                          example
    str     strings are sequences of characters, textual data    "ACGT"
    int     integral numbers                                     42
    float   floating point number                                3.14, 42.0
    bool    a logical value                                      True, False

You can convert a value/variable to a different type by using the type name as a function:
    >>> str(42)
    '42'
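The conversion rule can be sketched in both directions. A typical reason to convert is that a number must become a string before it can be concatenated with other text. The values and names below are only examples.

```python
seq_count = 3

# A number must be converted to str before concatenation with text.
message = "Found " + str(seq_count) + " sequences"
print(message)

# Text read from a file or typed by a user arrives as str and
# must be converted before doing arithmetic with it.
answer = int("42") + 1          # 43
ratio = float("3.14")           # 3.14

# Converting a float to int drops the fractional part (no rounding).
truncated = int(3.9)            # 3

print(type(message), type(answer), type(ratio))
```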

Recap (one)
So far we have seen:
- the difference between functions, statements and arguments
- the importance of comments and how to use them
- how to store values in variables
- the way that data types work, and the importance of understanding them
- the difference between functions and methods, and how to use them

Recap (two)
We've encountered some tools that are specifically for working with strings:
- different types of quotes and how to use them
- special characters
- concatenation
- changing the case of a string
- finding and counting substrings
- replacing bits of a string
- extracting bits of a string to make a new string

Getting help
- Python's interactive help is right there at your fingertips:
    >>> help()
    Welcome to Python 3.4! This is the interactive help utility.
    [...]
    >>> help(str)
- the online tutorial is also very helpful: http://docs.python.org/3.4/tutorial/

Exercise 1-1: GC-content
You are given the following DNA sequence:
    ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT
Write a program that will print out the GC content of this DNA sequence.
Hints:
- Use the len function and the count method.
- You can use normal mathematical symbols like add (+), subtract (-), multiply (*), divide (/) and parentheses to carry out calculations on numbers in Python.
- When using Python 2, include this line at the top of your script:
    from __future__ import division
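To illustrate how the hints fit together without giving away the exercise, here is a sketch that computes the fraction of a single base in a made-up toy sequence. The sequence and all names are invented for this example, and Python 3 is assumed, where / always performs true division.

```python
# Toy sequence, not the one from the exercise.
toy_dna = "AACGTA"

a_count = toy_dna.count("A")   # 3
length = len(toy_dna)          # 6

# In Python 3, / returns a float even when both operands are ints.
a_fraction = a_count / length  # 0.5

# Parentheses control the order of the calculation.
a_percent = (a_count / length) * 100

print("A fraction: " + str(a_fraction))
print("A percent: " + str(a_percent))
```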

Exercise 1-2: complementing DNA
You are given the following DNA sequence:
    ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT
Write a program that will print out the complement of this DNA sequence.
Hint: the replace method is your friend here.

Exercise 1-3: restriction fragment lengths
Here's a short DNA sequence:
    ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT
The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif G*AATTC (the position of the cut is indicated by an asterisk). Write a program which will calculate the size of the two fragments that will be produced when the DNA sequence is digested with EcoRI.
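One possible way to approach this kind of problem is to combine find(), slicing and len(). The sketch below uses a made-up toy sequence and a purely hypothetical motif AC*GT rather than the exercise data, so all names and values here are assumptions for illustration.

```python
# Toy data: hypothetical motif "ACGT" that is cut after its first base (AC*GT -> A|CGT).
toy_seq = "TTTTACGTAA"
motif = "ACGT"
offset_in_motif = 1                    # number of motif bases left of the cut

motif_start = toy_seq.find(motif)      # 4: index of the first motif base
cut_pos = motif_start + offset_in_motif

# Slicing at the cut position splits the sequence into two fragments.
fragment_one = toy_seq[:cut_pos]       # "TTTTA"
fragment_two = toy_seq[cut_pos:]       # "CGTAA"

print(len(fragment_one), len(fragment_two))
```

Because the end of the first slice and the start of the second are the same position, no base is lost or counted twice: the two lengths always add up to len(toy_seq).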

Exercise 1-4: Splicing out introns
Here's a short section of genomic DNA:
    ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT
It comprises two exons and an intron. The first exon runs from the start of the sequence to character 63, and the second exon runs from character 91 to the end of the sequence. Write a program that will print just the coding regions of the DNA sequence.

Exercise 1-4: Splicing out introns (pt. 2)
(a) Using the data from part one, write a program that will calculate what percentage of the DNA sequence is coding.
(b) Using the data from part one, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase.