

1 Representing Data Digitally
Anne Condon
September 6, 2007

warm-up exercise
pick two examples of data in your everyday life*
  in what media is the data represented?
  is the data converted from one representation to another as you use it? how?
  how does a particular representation of the data influence what you can do with it?
  might any errors arise when you use the data?
*one example should be computer-related, one not... try for examples no one else might think of

goals for today
  be able to define a representation scheme
  understand, and start working on, course learning goals on representation
let's start with two examples

example from nature: proteins
  proteins are critical to life
  organisms need ways to store and transmit descriptions of proteins
  proteins: beads on a necklace, with 20 different bead types (amino acids)
  what medium, and what representation scheme, does nature use?
news/science_ed/structlife/

2 DNA
storage medium in the cell
a (double-stranded) bead necklace with four different kinds of beads (bases, nucleotides): A, C, G, T

genes
  to keep our body functioning, proteins are constantly manufactured in our cells
  genes (segments of our DNA) store descriptions for proteins
  the genetic code specifies how a protein (sequence of amino acids) can be represented as DNA (sequence of bases)

the genetic code (partial table needed for example)
  TTC phenylalanine   TTA leucine         TTG leucine
  CTT leucine         CTC leucine         CTA leucine         CTG leucine
  ATT isoleucine      ATA isoleucine
  GTT valine          GTA valine
  TCC serine          TCA serine          TCG serine
  CCT proline         CCC proline         CCA proline         CCG proline
  ACT threonine       ACA threonine
  GCT alanine         GCA alanine
  TAC tyrosine        TAA stop            TAG stop
  CAT histidine       CAC histidine       CAA glutamine       CAG glutamine
  AAT asparagine      AAA lysine
  GAT aspartic acid   GAA glutamic acid
  TGC cysteine        TGA stop            TGG tryptophan
  CGT arginine        CGC arginine        CGA arginine        CGG arginine
  AGT serine          AGA arginine
  GGT glycine         GGA glycine

the genetic code: example
one code for
  methionine isoleucine phenylalanine aspartic acid glycine
is...

the genetic code: example
one code for
  methionine isoleucine phenylalanine aspartic acid glycine
is...
  ATGATCTTTGACGGG (ATG ATC TTT GAC GGG, one triple of bases per amino acid)
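The protein-to-DNA example above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the names `CODON_TO_AMINO_ACID`, `encode` and `decode` are made up here, and the table holds just the five triples that the slide's answer implies (they agree with the standard genetic code).

```python
# Partial codon table: only the five triples needed for the slide's example.
# The dictionary and function names are illustrative assumptions.
CODON_TO_AMINO_ACID = {
    "ATG": "methionine",
    "ATC": "isoleucine",
    "TTT": "phenylalanine",
    "GAC": "aspartic acid",
    "GGG": "glycine",
}


def decode(dna):
    """Split a DNA string into consecutive triples and look each one up."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return [CODON_TO_AMINO_ACID[codon] for codon in codons]


def encode(protein):
    """Choose one codon per amino acid (this small table has exactly one each)."""
    amino_acid_to_codon = {aa: codon for codon, aa in CODON_TO_AMINO_ACID.items()}
    return "".join(amino_acid_to_codon[aa] for aa in protein)


protein = ["methionine", "isoleucine", "phenylalanine", "aspartic acid", "glycine"]
dna = encode(protein)
print(dna)          # ATGATCTTTGACGGG
print(decode(dna))  # the original list of amino acids
```

Because most amino acids have several codons in the full table, a real encoder has a choice at each step; that is why the slide says "one code for" rather than "the code for".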

3 suppose you have no paper, and need to describe a connect-the-dots representation using your voice. how could you do it?
[figure: grid with five labelled dots (first dot to fifth dot) and the dimension of the grid (in cm)]

a continuous-line drawing can be represented as
  a sequence of dots drawn on a page
  a sequence of numbers, which lists the coordinates of the dots on a 2D grid (preceded by the dimension of the grid in cm)

activity
make a continuous-line drawing on a piece of paper, and represent it as best you can using a sequence of dots
  how did you decide on the number of dots to use? where to position the dots?
  what principles would you suggest, in general, for selecting the dots to represent a picture?
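As a rough sketch of the "sequence of numbers" idea above, the following Python turns a list of dots into one flat list of numbers (grid dimension first, then coordinates) and back again. The function names and the sample drawing are invented for illustration; the slides do not give concrete coordinates.

```python
def encode_drawing(grid_cm, dots):
    """Flatten a drawing into numbers: the grid dimension, then x, y for each dot."""
    numbers = [grid_cm]
    for x, y in dots:
        numbers.extend([x, y])
    return numbers


def decode_drawing(numbers):
    """Recover the grid dimension and the ordered list of dots."""
    grid_cm = numbers[0]
    coords = numbers[1:]
    dots = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
    return grid_cm, dots


# Hypothetical drawing: five dots on a 10 cm grid (values made up).
dots = [(1, 1), (5, 9), (9, 1), (1, 6), (9, 6)]
spoken = encode_drawing(10, dots)
print(spoken)                  # [10, 1, 1, 5, 9, 9, 1, 1, 6, 9, 6]
print(decode_drawing(spoken))  # (10, [(1, 1), (5, 9), (9, 1), (1, 6), (9, 6)])
```

The list of numbers is something you could read aloud. The dots themselves come back exactly, but the continuous line between them does not, so the choice of dots decides how much of the original drawing survives.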

4 a representation scheme
a description of how data of one type (source data) can be represented using data of another type (encoded data)
source data -> representation scheme -> encoded data

why study representation schemes?
  central to CS: designing representation schemes to balance engineering (e.g. errors) with usability (e.g. aesthetic) considerations is a major activity
  useful: facility with data helps in other fields, and in everyday life
  fun: designing and critiquing representation schemes is creative, and gives you a new lens through which to view your world

example: representation schemes in CS
polygonal representation of surfaces (see work of Alla Sheffer in CS)
this is the 3D version of connect-the-dots!

course learning goals on representation
  describe properties of representation schemes that can be found in many contexts of the world around you
  critique properties of representation schemes, from the standpoint of usability and engineering considerations, given information about the context in which the scheme is used

course learning goals on representation
  engage in design of schemes, for example by proposing modifications that address shortcomings of given representation schemes
  put your knowledge to practical use, for example in making decisions about representing your own data, or deciding you want to go into CS, or applying the knowledge in your own field

5 scheme property 1: digital vs not digital
  digital scheme: encoded data is digital (source data may be digital or analog)
  digital data: comprised of symbols over a finite alphabet
  analog data: not digital, e.g. continuous line

scheme property 2: lossless vs lossy
  lossless: source data can be reconstructed exactly from encoded data
  lossy: not lossless

scheme property 3: robustness in the face of errors
this one is easiest to explain in context... let's go back to one example

the genetic code: robustness in the face of errors
  what if a DNA base is copied incorrectly?
    ATGATCTTTGACGGG -> ATGATCTCTGACGGG
  what if a DNA base is deleted?
    ATGATCTTTGACGGG -> ATGATCTTGACGGG
  critique: what might this tell us about the cell's translation machinery?

summary: some properties of schemes
  digital or not
  lossless or lossy
  robustness in the face of errors

recall: goals for today
  be able to define a representation scheme
  understand, and start working on, course learning goals on representation
do you remember the goals?
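To make the two error cases from the genetic-code robustness slide concrete, here is a small Python sketch that decodes the original, the miscopied, and the shortened DNA strings. It is an illustration under assumptions: the names `CODONS` and `decode` are made up, and the table contains only the eight triples these three strings need (taken from the standard genetic code).

```python
# Only the codons that occur in the three strings below; names are illustrative.
CODONS = {
    "ATG": "methionine",
    "ATC": "isoleucine",
    "TTT": "phenylalanine",
    "GAC": "aspartic acid",
    "GGG": "glycine",
    "TCT": "serine",
    "TTG": "leucine",
    "ACG": "threonine",
}


def decode(dna):
    """Read complete triples from the left; return (amino acids, leftover bases)."""
    usable = len(dna) - len(dna) % 3
    protein = [CODONS[dna[i:i + 3]] for i in range(0, usable, 3)]
    return protein, dna[usable:]


original = "ATGATCTTTGACGGG"
miscopied = "ATGATCTCTGACGGG"  # one base copied incorrectly (T -> C)
shortened = "ATGATCTTGACGGG"   # one base deleted

for label, dna in [("original", original), ("miscopied", miscopied), ("shortened", shortened)]:
    print(label, decode(dna))
```

In this sketch the miscopied base changes a single amino acid (phenylalanine becomes serine), while the deleted base shifts every later triple: the third and fourth amino acids change and two bases are left over with no complete triple to read.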

6 a representation scheme
a description of how data of one type (source data) can be represented using data of another type (encoded data)
source data -> representation scheme -> encoded data

course learning goals on representation
  describe properties of representation schemes that can be found in many contexts of the world around you
  critique properties of representation schemes, from the standpoint of usability and engineering considerations, given information about the context in which the scheme is used

course learning goals on representation
  engage in design of schemes, for example by proposing modifications that address shortcomings of given representation schemes
  put your knowledge to practical use, for example in making decisions about representing your own data, or deciding you want to go into CS, or applying the knowledge in your own field

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Outline I. Problem II. Two Historical Detours III.Example IV.The Mathematics of DNA Sequencing V.Complications

More information

by the Genevestigator program (www.genevestigator.com). Darker blue color indicates higher gene expression.

by the Genevestigator program (www.genevestigator.com). Darker blue color indicates higher gene expression. Figure S1. Tissue-specific expression profile of the genes that were screened through the RHEPatmatch and root-specific microarray filters. The gene expression profile (heat map) was drawn by the Genevestigator

More information

HP22.1 Roth Random Primer Kit A für die RAPD-PCR

HP22.1 Roth Random Primer Kit A für die RAPD-PCR HP22.1 Roth Random Kit A für die RAPD-PCR Kit besteht aus 20 Einzelprimern, jeweils aufgeteilt auf 2 Reaktionsgefäße zu je 1,0 OD Achtung: Angaben beziehen sich jeweils auf ein Reaktionsgefäß! Sequenz

More information

Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds

Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds February 27, 2009 Alexander Mastroianni, Shelley Claridge, A. Paul Alivisatos Department of Chemistry, University of California,

More information

Appendix A. Example code output. Chapter 1. Chapter 3

Appendix A. Example code output. Chapter 1. Chapter 3 Appendix A Example code output This is a compilation of output from selected examples. Some of these examples requires exernal input from e.g. STDIN, for such examples the interaction with the program

More information

6 Anhang. 6.1 Transgene Su(var)3-9-Linien. P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,III 2 5 I,II,III,IV 3

6 Anhang. 6.1 Transgene Su(var)3-9-Linien. P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,III 2 5 I,II,III,IV 3 6.1 Transgene Su(var)3-9-n P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,II 5 I,II,III,IV 3 6 7 I,II,II 8 I,II,II 10 I,II 3 P{GS.ry + UAS(Su(var)3-9)EGFP} A AII 3 B P{GS.ry + (10.5kbSu(var)3-9EGFP)}

More information

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department http://cbd.cmu.edu Genome Reconstruction: A Puzzle with a Billion Pieces Phillip Compeau Carnegie Mellon University Computational Biology Department Eternity II: The Highest-Stakes Puzzle in History Courtesy:

More information

Degenerate Coding and Sequence Compacting

Degenerate Coding and Sequence Compacting ESI The Erwin Schrödinger International Boltzmanngasse 9 Institute for Mathematical Physics A-1090 Wien, Austria Degenerate Coding and Sequence Compacting Maya Gorel Kirzhner V.M. Vienna, Preprint ESI

More information

TCGR: A Novel DNA/RNA Visualization Technique

TCGR: A Novel DNA/RNA Visualization Technique TCGR: A Novel DNA/RNA Visualization Technique Donya Quick and Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275 dquick@mail.smu.edu, mhd@engr.smu.edu

More information

SUPPLEMENTARY INFORMATION. Systematic evaluation of CRISPR-Cas systems reveals design principles for genome editing in human cells

SUPPLEMENTARY INFORMATION. Systematic evaluation of CRISPR-Cas systems reveals design principles for genome editing in human cells SUPPLEMENTARY INFORMATION Systematic evaluation of CRISPR-Cas systems reveals design principles for genome editing in human cells Yuanming Wang 1,2,7, Kaiwen Ivy Liu 2,7, Norfala-Aliah Binte Sutrisnoh

More information

Supplementary Table 1. Data collection and refinement statistics

Supplementary Table 1. Data collection and refinement statistics Supplementary Table 1. Data collection and refinement statistics APY-EphA4 APY-βAla8.am-EphA4 Crystal Space group P2 1 P2 1 Cell dimensions a, b, c (Å) 36.27, 127.7, 84.57 37.22, 127.2, 84.6 α, β, γ (

More information

Amino Acid Graph Representation for Efficient Safe Transfer of Multiple DNA Sequence as Pre Order Trees

Amino Acid Graph Representation for Efficient Safe Transfer of Multiple DNA Sequence as Pre Order Trees International Journal of Bioinformatics and Biomedical Engineering Vol. 1, No. 3, 2015, pp. 292-299 http://www.aiscience.org/journal/ijbbe Amino Acid Graph Representation for Efficient Safe Transfer of

More information

Biostatistics and Bioinformatics Molecular Sequence Databases

Biostatistics and Bioinformatics Molecular Sequence Databases . 1 Description of Module Subject Name Paper Name Module Name/Title 13 03 Dr. Vijaya Khader Dr. MC Varadaraj 2 1. Objectives: In the present module, the students will learn about 1. Encoding linear sequences

More information

Assignment 4. the three-dimensional positions of every single atom in the le,

Assignment 4. the three-dimensional positions of every single atom in the le, Assignment 4 1 Overview and Background Many of the assignments in this course will introduce you to topics in computational biology. You do not need to know anything about biology to do these assignments

More information

Building and Animating Amino Acids and DNA Nucleotides in ShockWave Using 3ds max

Building and Animating Amino Acids and DNA Nucleotides in ShockWave Using 3ds max 1 Building and Animating Amino Acids and DNA Nucleotides in ShockWave Using 3ds max MIT Center for Educational Computing Initiatives THIS PDF DOCUMENT HAS BOOKMARKS FOR NAVIGATION CLICK ON THE TAB TO THE

More information

A relation between trinucleotide comma-free codes and trinucleotide circular codes

A relation between trinucleotide comma-free codes and trinucleotide circular codes Theoretical Computer Science 401 (2008) 17 26 www.elsevier.com/locate/tcs A relation between trinucleotide comma-free codes and trinucleotide circular codes Christian J. Michel a,, Giuseppe Pirillo b,c,

More information

MLiB - Mandatory Project 2. Gene finding using HMMs

MLiB - Mandatory Project 2. Gene finding using HMMs MLiB - Mandatory Project 2 Gene finding using HMMs Viterbi decoding >NC_002737.1 Streptococcus pyogenes M1 GAS TTGTTGATATTCTGTTTTTTCTTTTTTAGTTTTCCACATGAAAAATAGTTGAAAACAATA GCGGTGTCCCCTTAAAATGGCTTTTCCACAGGTTGTGGAGAACCCAAATTAACAGTGTTA

More information

Digging into acceptor splice site prediction: an iterative feature selection approach

Digging into acceptor splice site prediction: an iterative feature selection approach Digging into acceptor splice site prediction: an iterative feature selection approach Yvan Saeys, Sven Degroeve, and Yves Van de Peer Department of Plant Systems Biology, Ghent University, Flanders Interuniversity

More information

TMRPres2D High quality visual representation of transmembrane protein models. User's manual

TMRPres2D High quality visual representation of transmembrane protein models. User's manual TMRPres2D High quality visual representation of transmembrane protein models Version 0.91 User's manual Ioannis C. Spyropoulos, Theodore D. Liakopoulos, Pantelis G. Bagos and Stavros J. Hamodrakas Department

More information

Machine Learning Classifiers

Machine Learning Classifiers Machine Learning Classifiers Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve Bayes Perceptrons, Multi-layer Neural Networks

More information

Crick s Hypothesis Revisited: The Existence of a Universal Coding Frame

Crick s Hypothesis Revisited: The Existence of a Universal Coding Frame Crick s Hypothesis Revisited: The Existence of a Universal Coding Frame Jean-Louis Lassez*, Ryan A. Rossi Computer Science Department, Coastal Carolina University jlassez@coastal.edu, raross@coastal.edu

More information

Supplementary Materials:

Supplementary Materials: Supplementary Materials: Amino acid codo n Numb er Table S1. Codon usage in all the protein coding genes. RSC U Proportion (%) Amino acid codo n Numb er RSC U Proportion (%) Phe UUU 861 1.31 5.71 Ser UCU

More information

(DNA#): Molecular Biology Computation Language Proposal

(DNA#): Molecular Biology Computation Language Proposal (DNA#): Molecular Biology Computation Language Proposal Aalhad Patankar, Min Fan, Nan Yu, Oriana Fuentes, Stan Peceny {ap3536, mf3084, ny2263, oif2102, skp2140} @columbia.edu Motivation Inspired by the

More information

Supplementary Data. Image Processing Workflow Diagram A - Preprocessing. B - Hough Transform. C - Angle Histogram (Rose Plot)

Supplementary Data. Image Processing Workflow Diagram A - Preprocessing. B - Hough Transform. C - Angle Histogram (Rose Plot) Supplementary Data Image Processing Workflow Diagram A - Preprocessing B - Hough Transform C - Angle Histogram (Rose Plot) D - Determination of holes Description of Image Processing Workflow The key steps

More information

2 41L Tag- AA GAA AAA ATA AAA GCA TTA RYA GAA ATT TGT RMW GAR C K65 Tag- A AAT CCA TAC AAT ACT CCA GTA TTT GCY ATA AAG AA

2 41L Tag- AA GAA AAA ATA AAA GCA TTA RYA GAA ATT TGT RMW GAR C K65 Tag- A AAT CCA TAC AAT ACT CCA GTA TTT GCY ATA AAG AA 176 SUPPLEMENTAL TABLES 177 Table S1. ASPE Primers for HIV-1 group M subtype B Primer no Type a Sequence (5'-3') Tag ID b Position c 1 M41 Tag- AA GAA AAA ATA AAA GCA TTA RYA GAA ATT TGT RMW GAR A d 45

More information

Positional Amino Acid Frequency Patterns for Automatic Protein Annotation

Positional Amino Acid Frequency Patterns for Automatic Protein Annotation UNIVERSIDADE DE LISBOA FACULDADE DE CIÊNCIAS DEPARTAMENTO DE INFORMÁTICA Positional Amino Acid Frequency Patterns for Automatic Protein Annotation Mestrado em Bioinformática e Biologia Computacional Bioinformática

More information

高通量生物序列比對平台 : myblast

高通量生物序列比對平台 : myblast 高通量生物序列比對平台 : myblast A Customized BLAST Platform For Genomics, Transcriptomis And Proteomics With Paralleled Computing On Your Desktop 呂怡萱 Linda Lu 2013.09.12. What s BLAST Sequence in FASTA format FASTA

More information

Feed Check Sample No Meat and Bone Meal (Pork) Association of American Feed Control Officials

Feed Check Sample No Meat and Bone Meal (Pork) Association of American Feed Control Officials Feed Check Sample No. - 200997 Meat and Bone Meal (Pork) Association of American Feed Control Officials - Pass 1 Results for 193 Labs - - Pass 2 Results for 192 Labs - No. Average No. Average AOAC Method

More information

LABORATORY STANDARD OPERATING PROCEDURE FOR PULSENET CODE: PNL28 MLVA OF SHIGA TOXIN-PRODUCING ESCHERICHIA COLI

LABORATORY STANDARD OPERATING PROCEDURE FOR PULSENET CODE: PNL28 MLVA OF SHIGA TOXIN-PRODUCING ESCHERICHIA COLI 1. PURPOSE: to describe the standardized laboratory protocol for molecular subtyping of Shiga toxin-producing Escherichia coli O157 (STEC O157) and Salmonella enterica serotypes Typhimurium and Enteritidis.

More information

Feed Check Sample No Preconditioning/Receiving Chow, Med Association of American Feed Control Officials

Feed Check Sample No Preconditioning/Receiving Chow, Med Association of American Feed Control Officials Feed Check Sample No. - 200929 Preconditioning/Receiving Chow, Med Association of American Feed Control Officials - Pass 1 Results for 212 Labs - - Pass 2 Results for 211 Labs - No. Average No. Average

More information

Feed Check Sample No Foundation Cattle Mineral, Medicated Association of American Feed Control Officials

Feed Check Sample No Foundation Cattle Mineral, Medicated Association of American Feed Control Officials Feed Check Sample No. - 200927 Foundation Cattle Mineral, Medicated Association of American Feed Control Officials - Pass 1 Results for 170 Labs - - Pass 2 Results for 168 Labs - No. Average No. Average

More information

DNA Sequencing. Overview

DNA Sequencing. Overview BINF 3350, Genomics and Bioinformatics DNA Sequencing Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Eulerian Cycles Problem Hamiltonian Cycles

More information

Feed Check Sample No Chicken Starter/Grower, Medicated Association of American Feed Control Officials

Feed Check Sample No Chicken Starter/Grower, Medicated Association of American Feed Control Officials Feed Check Sample No. - 200926 Chicken Starter/Grower, Medicated Association of American Feed Control Officials - Pass 1 Results for 207 Labs - - Pass 2 Results for 206 Labs - No. Average No. Average AOAC

More information

Programming Applications. What is Computer Programming?

Programming Applications. What is Computer Programming? Programming Applications What is Computer Programming? An algorithm is a series of steps for solving a problem A programming language is a way to express our algorithm to a computer Programming is the

More information

3D-Dock. incorporating FTDock (version 2.0), RPScore, and Multidock. March Introduction Key to font usage Requirements...

3D-Dock. incorporating FTDock (version 2.0), RPScore, and Multidock. March Introduction Key to font usage Requirements... 3D-Dock incorporating FTDock (version 2.0), RPScore, and Multidock Gidon Moont, Graham R. Smith and Michael J. E. Sternberg March 2001 Contents 1 Introduction 3 1.1 Key to font usage.................................

More information

Efficient Selection of Unique and Popular Oligos for Large EST Databases. Stefano Lonardi. University of California, Riverside

Efficient Selection of Unique and Popular Oligos for Large EST Databases. Stefano Lonardi. University of California, Riverside Efficient Selection of Unique and Popular Oligos for Large EST Databases Stefano Lonardi University of California, Riverside joint work with Jie Zheng, Timothy Close, Tao Jiang University of California,

More information

GPRO 1.0 THE PROFESSIONAL TOOL FOR SEQUENCE ANALYSIS/ANNOTATION AND MANAGEMENT OF OMIC DATABASES. (February 2011)

GPRO 1.0 THE PROFESSIONAL TOOL FOR SEQUENCE ANALYSIS/ANNOTATION AND MANAGEMENT OF OMIC DATABASES. (February 2011) The user guide you are about to check may not be thoroughly updated with regard to the last downloadable version of the software. GPRO software is under continuous development as an ongoing effort to improve

More information

Molecular Evolutionary Genetics Analysis version Sudhir Kumar, Koichiro Tamura and Masatoshi Nei

Molecular Evolutionary Genetics Analysis version Sudhir Kumar, Koichiro Tamura and Masatoshi Nei CP P and MEGA manual Molecular Evolutionary Genetics Analysis version 1.01 Sudhir Kumar, Koichiro Tamura and Masatoshi Nei MEGA is distributed with a nominal fee to defray the cost of producing the user

More information

A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving on DNA-encoded Data

A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving on DNA-encoded Data International Journal of Electrical and Computer Engineering (IJECE) Vol. 4, No. 1, Feburary 2014, pp. 93~100 ISSN: 2088-8708 93 A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving

More information

Structural analysis and haplotype diversity in swine LEP and MC4R genes

Structural analysis and haplotype diversity in swine LEP and MC4R genes J. Anim. Breed. Genet. ISSN - OIGINAL ATICLE Structural analysis and haplotype diversity in swine LEP and MC genes M. D Andrea, F. Pilla, E. Giuffra, D. Waddington & A.L. Archibald University of Molise,

More information

Supporting Information

Supporting Information Copyright WILEY VCH Verlag GmbH & Co. KGaA, 69469 Weinheim, Germany, 2015. Supporting Information for Small, DOI: 10.1002/smll.201501370 A Compact DNA Cube with Side Length 10 nm Max B. Scheible, Luvena

More information

Understanding the content of HyPhy s JSON output files

Understanding the content of HyPhy s JSON output files Understanding the content of HyPhy s JSON output files Stephanie J. Spielman July 2018 Most standard analyses in HyPhy output results in JSON format, essentially a nested dictionary. This page describes

More information

Due Thursday, July 18 at 11:00AM

Due Thursday, July 18 at 11:00AM CS106B Summer 2013 Handout #10 July 10, 2013 Assignment 3: Recursion! Parts of this handout were written by Julie Zelenski, Jerry Cain, and Eric Roberts. This assignment consists of four recursive functions

More information

Classification of biological sequences with kernel methods

Classification of biological sequences with kernel methods Classification of biological sequences with kernel methods Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Centre for Computational Biology Ecole des Mines de Paris, ParisTech International Conference on

More information

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics 27626 - Next Generation Sequencing Analysis Generalized NGS analysis Data size Application Assembly: Compare

More information

Genomic Perl. From Bioinformatics Basics to Working Code REX A. DWYER. Genomic Perl Consultancy, Inc.

Genomic Perl. From Bioinformatics Basics to Working Code REX A. DWYER. Genomic Perl Consultancy, Inc. Genomic Perl From Bioinformatics Basics to Working Code REX A. DWYER Genomic Perl Consultancy, Inc. published by the press syndicate of the university of cambridge The Pitt Building, Trumpington Street,

More information

de Bruijn graphs for sequencing data

de Bruijn graphs for sequencing data de Bruijn graphs for sequencing data Rayan Chikhi CNRS Bonsai team, CRIStAL/INRIA, Univ. Lille 1 SMPGD 2016 1 MOTIVATION - de Bruijn graphs are instrumental for reference-free sequencing data analysis:

More information

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly Ben Raphael Sept. 22, 2009 http://cs.brown.edu/courses/csci2950-c/ l-mer composition Def: Given string s, the Spectrum ( s, l ) is unordered multiset

More information

An Efficient Mining for Approximate Frequent Items in Protein Sequence Database

An Efficient Mining for Approximate Frequent Items in Protein Sequence Database An Efficient Mining for Approximate Frequent Items in Protein Sequence Database J. Jeyabharathi 1, Dr.D. Shanthi 2 1 Associate Professor, Department of Computer Science and Engineering, C.R. Engineering

More information

OFFICE OF RESEARCH AND SPONSORED PROGRAMS

OFFICE OF RESEARCH AND SPONSORED PROGRAMS OFFICE OF RESEARCH AND SPONSORED PROGRAMS June 9, 2016 Mr. Satoshi Harada Department of Innovation Research Japan Science and Technology Agency (JST) K s Gobancho, 7, Gobancho, Chiyoda-ku, Tokyo, 102-0076

More information

DNA Fragment Assembly

DNA Fragment Assembly Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly Overlap

More information

LOAD SCHEDULING FOR BIOINFORMATICS APPLICATIONS IN LARGE SCALE NETWORKS SUDHA GUNTURU. Bachelor of Technology in Computer Science

LOAD SCHEDULING FOR BIOINFORMATICS APPLICATIONS IN LARGE SCALE NETWORKS SUDHA GUNTURU. Bachelor of Technology in Computer Science LOAD SCHEDULING FOR BIOINFORMATICS APPLICATIONS IN LARGE SCALE NETWORKS By SUDHA GUNTURU Bachelor of Technology in Computer Science Jawaharlal Nehru Technological University Hyderabad, Andhra Pradesh 2005

More information

Figure 2.1: Simple model of a communication system

Figure 2.1: Simple model of a communication system Chapter 2 Codes In the previous chapter we examined the fundamental unit of information, the bit, and its various abstract representations: the mathematical bit, the control bit, the classical bit, and

More information

Channel. Figure 2.1: Simple model of a communication system

Channel. Figure 2.1: Simple model of a communication system Chapter 2 Codes In the previous chapter we examined the fundamental unit of information, the bit, and its various abstract representations: the Boolean bit (with its associated Boolean algebra and realization

More information

Sequence Assembly. BMI/CS 576 Mark Craven Some sequencing successes

Sequence Assembly. BMI/CS 576  Mark Craven Some sequencing successes Sequence Assembly BMI/CS 576 www.biostat.wisc.edu/bmi576/ Mark Craven craven@biostat.wisc.edu Some sequencing successes Yersinia pestis Cannabis sativa The sequencing problem We want to determine the identity

More information

Figure 2.1: Simple model of a communication system

Figure 2.1: Simple model of a communication system Chapter 2 Codes In the previous chapter we examined the fundamental unit of information, the bit, and its physical forms (the quantum bit and the classical bit), its classical mathematical model (the Boolean

More information

Axiom Patterns. COMP60421 Robert Stevens University of Manchester

Axiom Patterns. COMP60421 Robert Stevens University of Manchester Axiom Patterns COMP60421 Robert Stevens University of Manchester robert.stevens@manchester.ac.uk 1 Patterns of axioms An axiom pattern is a recurring regularity in how axioms are used or appear within

More information

Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data

Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data 1/39 Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data Rayan Chikhi ENS Cachan Brittany / IRISA (Genscale team) Advisor : Dominique Lavenier 2/39 INTRODUCTION, YEAR 2000

More information

Dynamic Programming: Sequence alignment. CS 466 Saurabh Sinha

Dynamic Programming: Sequence alignment. CS 466 Saurabh Sinha Dynamic Programming: Sequence alignment CS 466 Saurabh Sinha DNA Sequence Comparison: First Success Story Finding sequence similarities with genes of known function is a common approach to infer a newly

More information

Matriarch User's Guide

Matriarch User's Guide Matriarch User's Guide Written by: David I. Spivak, Tristan Giesa, Ravi Jagadeesan, and Markus J. Buehler Programmed by: Ravi Jagadeesan Laboratory for Atomistic and Molecular Mechanics, Department of

More information

Finding Selection in All the Right Places TA Notes and Key Lab 9

Finding Selection in All the Right Places TA Notes and Key Lab 9 Objectives: Finding Selection in All the Right Places TA Notes and Key Lab 9 1. Use published genome data to look for evidence of selection in individual genes. 2. Understand the need for DNA sequence

More information

Application of Nearest Neighbour Search techniques to Peptide identification from Mass Spectrometry

Application of Nearest Neighbour Search techniques to Peptide identification from Mass Spectrometry Escuela de Ingeniería en Computación Programa de Maestría en Computación Application of Nearest Neighbour Search techniques to Peptide identification from Mass Spectrometry A thesis submitted in partial

More information

Scalable Solutions for DNA Sequence Analysis

Scalable Solutions for DNA Sequence Analysis Scalable Solutions for DNA Sequence Analysis Michael Schatz Dec 4, 2009 JHU/UMD Joint Sequencing Meeting The Evolution of DNA Sequencing Year Genome Technology Cost 2001 Venter et al. Sanger (ABI) $300,000,000

More information

Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties

Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties From LCS to Alignment: Change the Scoring The Longest Common Subsequence (LCS) problem the simplest form of sequence

More information
