Strings. Genome 373 Genomic Informatics Elhanan Borenstein

Similar documents
Strings. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Genome 373: Intro to Python I. Doug Fowler

Introduction to Python. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

COMP1730/COMP6730 Programming for Scientists. Strings

Introduction to Python. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Python Intro GIS Week 1. Jake K. Carr

Python Class-Lesson1 Instructor: Yao

Introduction to Python - Python I

Lecture 3. Strings, Functions, & Modules

Getting Started Values, Expressions, and Statements CS GMU

MEIN 50010: Python Strings

Sequence of Characters. Non-printing Characters. And Then There Is """ """ Subset of UTF-8. String Representation 6/5/2018.

Genome 373: Intro to Python II. Doug Fowler

Sequence types. str and bytes are sequence types Sequence types have several operations defined for them. Sequence Types. Python

18.1. CS 102 Unit 18. Python. Mark Redekopp

Working with Strings. Husni. "The Practice of Computing Using Python", Punch & Enbody, Copyright 2013 Pearson Education, Inc.

File input and output and conditionals. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Python I. Some material adapted from Upenn cmpe391 slides and other sources

from scratch A primer for scientists working with Next-Generation- Sequencing data Chapter 1 Text output and manipulation

for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

The Practice of Computing Using PYTHON. Chapter 4. Working with Strings. Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

>>> * *(25**0.16) *10*(25**0.16)

Introduction to Python

Numbers, lists and tuples. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Class extension and. Exception handling. Genome 559

Python Strings. Stéphane Vialette. LIGM, Université Paris-Est Marne-la-Vallée. September 25, 2012

File input and output if-then-else. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Algorithms and Programming I. Lecture#12 Spring 2015

Computer Science 121. Scientific Computing Winter 2016 Chapter 3 Simple Types: Numbers, Text, Booleans

Scripting Languages. Python basics

for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Advanced Python. Executive Summary, Session 1

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

'...' "..." escaping \u hhhh hhhh '''...''' """...""" raw string Example: r"abc\txyz\n" in code;

DEBUGGING TIPS. 1 Introduction COMPUTER SCIENCE 61A

Exception Handling. Genome 559

Python Day 3 11/28/16

Language Reference Manual

Practical Programming, 2nd Edition

Programming in Python

Sequences and Loops. Indices: accessing characters in a string. Old friend: isvowel. Motivation: How to count the number of vowels in a word?

Computing with Strings. Learning Outcomes. Python s String Type 9/23/2012

10/9/07. < Moo? > \ ^ ^ \ (oo)\ ( )\ )\/\ ----w

Programming to Python

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

Monty Python and the Holy Grail (1975) BBM 101. Introduction to Programming I. Lecture #03 Introduction to Python and Programming, Control Flow

Lesson 4: Type Conversion, Mutability, Sequence Indexing. Fundamentals of Text Processing for Linguists Na-Rae Han

Sequences and Loops. Indices: accessing characters in a string. Old friend: isvowel. Motivation: How to count the number of vowels in a word?

Introduction to Computer Programming in Python Dr. William C. Bulko. Data Types

Python for C programmers

Programming in Python 3

Python. Karin Lagesen.

Variables, Constants, and Data Types

Class extension and. Exception handling. Genome 559

Lists, loops and decisions

Strings are actually 'objects' Strings

Python for Non-programmers

Lists and the for loop

Introduction to Python

COMP 364: Functions II

COMP519 Web Programming Lecture 17: Python (Part 1) Handouts

Not-So-Mini-Lecture 6. Modules & Scripts

Getting started with Java

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

CMSC201 Computer Science I for Majors

Lecture 4: Basic I/O

Table of Contents Date(s) Title/Topic Page #s. Abstraction

String Computation Program

CMSC201 Computer Science I for Majors

Python 1: Intro! Max Dougherty Andrew Schmitt

Worksheet 6: Basic Methods Methods The Format Method Formatting Floats Formatting Different Types Formatting Keywords

Python and Bioinformatics. Pierre Parutto

Compound Data Types 1

Exceptions CS GMU

What did we talk about last time? Examples switch statements

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

Python: common syntax

1. What type of error produces incorrect results but does not prevent the program from running? a. syntax b. logic c. grammatical d.

Lists How lists are like strings

CSCI 121: Anatomy of a Python Script

EE 355 Unit 17. Python. Mark Redekopp

COMP1730/COMP6730 Programming for Scientists. Data: Values, types and expressions.

ITC213: STRUCTURED PROGRAMMING. Bhaskar Shrestha National College of Computer Studies Tribhuvan University

A Little Python Part 1. Introducing Programming with Python

First Java Program - Output to the Screen

Python Tutorial. Day 1

CMPT 120 Basics of Python. Summer 2012 Instructor: Hassan Khosravi

Python. Executive Summary

Python for Bioinformatics Fall 2014

Basic Scripting, Syntax, and Data Types in Python. Mteor 227 Fall 2017

Strings and Testing string methods, formatting testing approaches CS GMU

Practical Bioinformatics

MEIN 50010: Python Flow Control

MUTABLE LISTS AND DICTIONARIES 4

"Hello" " This " + "is String " + "concatenation"

STSCI Python Introduction. Class URL

CIS192 Python Programming. Robert Rand. August 27, 2015

Chapter 8: More About Strings. COSC 1436, Summer 2018 Dr. Zhang 7/10/2018

Script language: Python Data and files

Java Bytecode (binary file)

Transcription:

Strings Genome 373 Genomic Informatics Elhanan Borenstein

print hello, world pi = 3.14159 pi = -7.2 yet_another_var = pi + 10 print pi import math log10 = math.log(10) import sys arg1 = sys.argv[1] arg2 = sys.argv[2] print arg1, ":", arg2

Programs vs. Interpreter Writing and running a program: Running code in the interpreter: print hello, world! >>> print hello, world! hello, world! >python hello.py hello, world!

Strings A string type object is a sequence of characters. In Python, strings start and end with single or double quotes (they are equivalent but they have to match). >>> s = "foo" >>> print s foo >>> s = 'Foo' >>> print s Foo >>> s = "foo' SyntaxError: EOL while scanning string literal (EOL means end-of-line; to the Python interpreter there was no closing double quote before the end of line)

Defining strings Each string is stored in computer memory as an array of characters. >>> mystring = "GATTACA" mystring computer memory (7 bytes) In effect, the variable mystring consists of a pointer to the position in computer memory (the address) of the 0 th byte above. Every byte in your computer memory has a unique integer address. How many bytes are needed to store the human genome? (3 billion nucleotides)

Accessing single characters You can access individual characters by using indices in square brackets. >>> mystring = "GATTACA" >>> mystring[0] 'G' >>> mystring[2] 'T' >>> mystring[-1] 'A' Negative indices start at the >>> mystring[-2] end of the string and move left. 'C' >>> mystring[7] Traceback (most recent call last): File "<stdin>", line 1, in? IndexError: string index out of range FYI - when you request mystring[n] Python adds n to the memory address of the string and returns that byte from memory.

Accessing substrings ("slicing") >>> mystring = "GATTACA" >>> mystring[1:3] 'AT' >>> mystring[:3] 'GAT' >>> mystring[4:] 'ACA' >>> mystring[3:5] 'TA' >>> mystring[:] 'GATTACA' shorthand for beginning or end of string notice that the length of the returned string [x:y] is y - x

Special characters >>> print "He said "Wow!"" SyntaxError: invalid syntax The backslash is used to introduce a special character. >>> print "He said \"Wow!\"" He said "Wow!" >>> print "He said:\nwow!" He said: Wow! Escape sequence Meaning \ Single quote \ Double quote \n Newline \t Tab \\ Backslash

More string functionality >>> len("gattaca") 7 >>> print "GAT" + "TACA" GATTACA >>> print "A" * 10 AAAAAAAAAA >>> "GAT" in "GATTACA" True >>> "AGT" in "GATTACA" False >>> temp = "GATTACA" >>> temp2 = temp[1:4] >>> print temp2 ATT >>> print temp GATTACA Length Concatenation Repeat (you can read this as is GAT in GATTACA? ) Substring tests Assign a string slice to a variable name

String methods In Python, a method is a function that is defined with respect to a particular object. The syntax is: object.method(arguments) or object.method() - no arguments >>> dna = "ACGT" >>> dna.find("t") 3 the first position where T appears object (in this case a string object) string method method argument

String methods >>> s = "GATTACA" >>> s.find("att") 1 >>> s.count("t") 2 >>> s.lower() 'gattaca' >>> s.upper() 'GATTACA' >>> s.replace("g", "U") 'UATTACA' >>> s.replace("c", "U") 'GATTAUA' >>> s.replace("at", "**") 'G**TACA' >>> s.startswith("g") True >>> s.startswith("g") False Function with no arguments Function with two arguments

Strings are immutable Strings cannot be modified; instead, create a new string from the old one using assignment. >>> s = "GATTACA" >>> s[0] = "R" Traceback (most recent call last): File "<stdin>", line 1, in? TypeError: 'str' object doesn't support item assignment >>> w = "R" + s[1:] >>> print w RATTACA >>> print s GATTACA >>> s = R + s[1:] # THIS WILL WORK! >>> print s RATTACA

Strings are immutable String methods do not modify the string; they return a new string. >>> seq = "ACGT" >>> print seq.replace("a", "G") GCGT >>> print seq ACGT assign the result from the right to a variable name >>> new_seq = seq.replace("a", "G") >>> print new_seq GCGT >>> print seq ACGT

String summary Basic string operations: S = "AATTGG" # literal assignment - or use single quotes ' ' s1 + s2 # concatenate S * 3 # repeat string S[i] # get character at position 'i' S[x:y] # get a substring len(s) # get length of string int(s) # turn a string into an integer float(s) # turn a string into a floating point number Methods: S.upper() S.lower() S.count(substring) S.replace(old,new) S.find(substring) S.startswith(substring) S.endswith(substring) Printing: print var1,var2,var3 print "text",var1,"text" # is a special character everything after it is a comment, which the program will ignore USE LIBERALLY!! # print multiple variables # print a combination of text and vars

Class problem #1 Write a program called dna2rna.py that reads a DNA sequence from the first command line argument and prints it as an RNA sequence. Make sure it retains the case of the input. > python dna2rna.py ACTCAGT ACUCAGU > python dna2rna.py actcagt acucagu > python dna2rna.py ACTCagt ACUCagu

OR Two solutions import sys seq = sys.argv[1] new_seq = seq.replace("t", "U") newer_seq = new_seq.replace("t", "u") print newer_seq import sys print sys.argv[1] (to be continued)

OR Two solutions import sys seq = sys.argv[1] new_seq = seq.replace("t", "U") newer_seq = new_seq.replace("t", "u") print newer_seq import sys print sys.argv[1].replace("t","u") (to be continued)

Two solutions import sys seq = sys.argv[1] new_seq = seq.replace("t", "U") newer_seq = new_seq.replace("t", "u") print newer_seq OR import sys print sys.argv[1].replace("t","u").replace("t","u") It is legal (but not always desirable) to chain together multiple methods on a single line. Think through what the second program does until you understand why it works.

Tips: Reduce coding errors - get in the habit of always being aware what type of object each of your variables refers to. Use informative variable names. Build your program bit by bit and check that it functions at each step by running it.