Public Repositories Tutorial: Bulk Downloads


Almost all of the public databases, genome browsers, and other tools you have explored so far offer some way to rapidly download all, or large chunks, of their raw data. This access is usually provided via two routes: manual downloads and programmatic downloads. This tutorial covers manual downloads of bulk data from public repositories, which are performed with a Web browser such as Firefox, IE, Chrome or Safari. Because programmatic downloads require some computer programming experience, we will mention them where appropriate but will not provide worked examples.

Why would you ever want to do bulk downloads? Sometimes you will need to gather many records. Rather than collecting everything manually, one piece at a time, you can save a great deal of time by first downloading all or a large chunk of the raw data and then sorting out what you actually need. Once you have the data in hand, there are a number of ways to manipulate it and extract the pieces you want. These include using a program that reads and displays tabulated data (e.g., Microsoft Excel), using Unix commands, loading the data into a database and running SQL commands (e.g., MySQL), or writing your own computer programs to parse and manipulate the data (e.g., in Perl or Python).

Some background information about data formats

Raw data that are stored as one or more files (i.e., not in a relational database such as MySQL or Oracle) are commonly referred to as flat file data. The bioinformatics data in these files can be represented in many different formats. The most common bioinformatics flat file formats contain text characters (and are therefore viewable with a text viewer such as Apple TextEdit or Microsoft Word) and have a relatively simple and regular internal structure.
These (with some of their file extensions) include the following:

FASTA (*fasta, *fa, *ffa, *fna)
GenBank and GenPept (*gb, *gbk, *gp)
Sequence alignment (*aln, *fa, *ffa)
Tabulated (*txt, *tsv, *csv)

Tabulated data deserve special consideration because the format is not as well defined as the others. Tabulated data can have any number of columns, which may or may not be carefully defined. Always check the file format specifications, examine the data in the files in detail, and/or ask the authors before working with these files. In addition, before attempting to use Microsoft Excel to manipulate tabulated data, realize that Excel has built-in limits of 256 columns and 65,536 rows (in Excel 2003 and earlier; later versions raise these limits to 16,384 columns and 1,048,576 rows), and will occasionally re-interpret cell content. For example, the human gene symbol DEC1 is usually converted to a date representation of December 1, such as 1-Dec, and a leading = is interpreted as the start of a formula.

Raw data can also be represented in files with a more complex internal structure, which is usually hierarchical. There are two common hierarchical formats that are used to

represent bioinformatics data: ASN.1 and XML. ASN.1 predates XML, and is used almost exclusively by NCBI. XML is a relatively common format in bioinformatics, but it is similar to tabulated data in that knowing a file is XML tells you nothing about what kind of data, or how much, it contains. (The internal structure of an XML file is defined in an additional document called a DTD, but this usually tells you nothing about what the data represent or how much of them there are.) Like the simpler formats listed above, ASN.1 and XML contain text character data, but they were designed to be generated and read by computer programs; they are generally very difficult for our eyes to read and our brains to make sense of.

Several years ago, there was a lot of hype about XML in bioinformatics circles. Ignore the residual hype. There is nothing inherently magic about XML (in fact, it looks a lot like ASN.1), and since it tends to inflate file sizes two- to threefold, it should be avoided if there is another, more compact format available that contains all the data you need (e.g., tabulated or FASTA).

Here are some brief comments about three additional file formats that you might come across when dealing with bulk bioinformatics data.

Archived and/or compressed text data files (file extensions *gz, *tgz, *gzip, *zip, *tar) are common when data files are large or numerous, because they take up less disk space and download faster. Compressed files generally have to be uncompressed before they are viewed or manipulated, although some programs read compressed files directly and uncompress them on the fly.

Sanger sequencing data are output from sequencing machines as sequence chromatograms or trace files (with file extensions *ab1, *scf).
It is rarely necessary to deal with these files directly, unless you are using an existing software package to display the traces (to look for evidence of SNPs or mutations, for example) or to perform relatively sophisticated sequence mapping or assembly tasks (with software such as Phred, Phrap or Consed).

Binary data simply means any non-text character data, such as image data or proprietary format data. As with trace data, you will rarely need to deal with this sort of data unless you are doing some highly specialized task or are dealing with data output directly from a piece of lab equipment. The format of binary data is never clear from visual inspection (using a text editor, for example). Therefore, either a special program written by someone familiar with the format or a clear and detailed specification is usually needed.

Worked example #1: NCBI Entrez Gene FTP site

In this worked example, you will download two compressed tabulated files from the NCBI Entrez Gene FTP site: one with information about each human gene, and the other with GO annotations.

1. Open a Web browser on an Internet-connected computer.
2. Go to NCBI's listing of their FTP sites by entering the following URL into your browser: <
3. Click on the link to Gene.

4. Click on the file named README. Readme files are common on File Transfer Protocol (FTP) sites, and usually contain important information about file contents and structure.
5. Scroll down to the description of a file called gene_info, about halfway down the page. Note that the file is tabulated (tab-delimited) and that essential details are given about what each row represents (a gene) and what the different columns contain (different pieces of information about each gene).
6. Click the BACK button on your browser.
7. Click on the directory named DATA, followed by GENE_INFO and Mammalia.
8. Click on the file named Homo_sapiens.gene_info.gz to download the file containing information on human genes only. You may be asked to confirm the download, and you should. The *gz extension indicates that the file is compressed using the GNU Zip (gzip) program.
9. Find your local copy of the downloaded file and double-click on it. Your computer should recognize that it is compressed and open a decompression or archive program (which one will vary depending on your operating system). If it doesn't, you'll have to find or install a program that is able to decompress GNU Zip files. If you were successful, a decompressed tabulated file called Homo_sapiens.gene_info should have been created.
10. Launch Microsoft Excel and open the decompressed file. In order to load the file, you may have to set Enable to All Documents in the Open file browser.
11. The Excel Text Import Wizard will open (this is where you can specify some parser settings, such as the delimiting character), but for this example it is safe to use the default settings and click the Finish button.
12. Scroll to the right of the spreadsheet to verify that columns A-O were loaded. Scroll to the bottom to confirm that all 39,856 rows in the file were loaded. Notice that the first column is filled only with 9606. This is the Taxonomy id for human.
If we had downloaded the gene_info file for all organisms (well, all organisms in which gene records have been defined), this column would have many different ids.
13. Go back to <ftp://ftp.ncbi.nlm.nih.gov/gene/data/>.
14. Click on the file named gene2go.gz to download it as you did for the file above.
15. As before, find your local copy of this file and double-click it to decompress. A file named gene2go should have been created.
16. Also as before, launch Excel and open the file gene2go.
17. Since this file contains over a million lines, Excel will not be able to load it completely and will indicate this with a dialog stating: File not loaded completely. Click OK on this dialog. Notice that there are multiple entries for each Gene ID, indicating that each row represents a GO annotation-gene combination.
18. Close this file.

Remember that older versions of Excel can load a maximum of 65,536 rows. If you really need to deal with files of this size, the only way to do it is with special software that can handle many rows of data, with Unix commands, or by doing your own computer programming.

If you would like to find out more information about Unix, go to < and download the course materials. Examples of useful Unix commands and programs that can manipulate tabulated data, and which bioinformaticians use every day, include grep (filter rows), cut (filter columns), paste (paste columns), cat (paste rows), sort (sort rows), uniq (filter unique rows), join (intersect tables), sed

(replace text), wc (count rows), head (extract top of table), tail (extract bottom of table), more and less (browse text) and vi (edit text). Unix I/O piping and redirection are also exceedingly useful.

Advanced homework (worth one cookie): use a terminal and Unix commands to add the GO annotations in gene2go to the human gene information in Homo_sapiens.gene_info. Hint: open a Unix terminal and navigate to the directory where the gene2go and Homo_sapiens.gene_info files are located. Use grep to select the rows in gene2go whose first column contains 9606 (i.e., human genes), cut the first column off both the filtered gene2go file and Homo_sapiens.gene_info, sort both by Gene ID, and then join the two sorted files.

Extra credit (worth an extra cookie): when doing the previous join, retain the genes in Homo_sapiens.gene_info that don't appear in gene2go (i.e., the genes without GO annotations).

If you would like to find out more about computer programming, there are many languages to choose from and many good books available. For bioinformatics applications, we recommend Python or Perl, both established and powerful languages that are relatively easy to learn.

If you already know how to program and would like programmatic access to NCBI data in the Entrez suite of databases (such as Entrez Gene), NCBI offers the e-utilities web services. Both the Bioperl module and the Biopython package have data structures, parsers and query functions that handle Entrez data. More information about the e-utilities is available at <

Super advanced homework (worth three cookies): write a Perl or Python program that takes a database name, a report type and an Entrez query string, connects to the NCBI e-utilities server, and performs the given query on the given database.
Hint #1: use the Bioperl module or Biopython package.
Hint #2: Python code that does just this is located at < You can look at the source code using any text editor, but in order to run the code, you'll have to use a command terminal and make sure you have Python and Biopython installed.

Worked example #2: Ensembl human genome FTP site

In this worked example, you will download a single FASTA file containing all human protein sequences from the Ensembl FTP site.

1. Open a browser and go to <
2. Note that there are many methods for doing bulk data downloads from the Ensembl site. Scroll down to the section entitled FTP, and click on the link Table of links to Ensembl FTP files.
3. Notice that the sentence near the top of the page indicates that all data have been compressed using the GNU Zip program (file extension *gz).
4. Scroll down to the table of species and FTP links at the bottom of the page, specifically to the row marked Homo sapiens (human), and click on the FTP link under the Peptides column. This will take you to an FTP directory.

5. Click on the README file, and note the information about Ensembl file naming conventions, and that all files are in FASTA format (*fa) and compressed with GNU Zip (*gz).
6. Click the browser's BACK button.
7. For this demonstration, we want peptide translations with some additional evidence, and not just gene predictions on the genomic assembly alone, so click on the file named Homo_sapiens.NCBI36.50.pep.all.fa.gz to download it to your local computer.
8. As in the previous worked example, find your local copy of the downloaded file and double-click on it. Your computer should recognize that it is compressed and open a decompression or archive program. If it doesn't, you'll have to find or install a program that is able to decompress GNU Zip files. If you were successful, a decompressed FASTA file called Homo_sapiens.NCBI36.50.pep.all.fa should have been created.
9. Launch a text editor such as Apple TextEdit or Microsoft Word, and then open this downloaded and uncompressed file. As expected, the protein sequence data are represented in FASTA format.

What is FASTA format, and why is it everywhere I look?

FASTA format is a compact way to represent sequence data that is also easy for humans and computers to scan through. It is named for the FASTA alignment program (a predecessor to BLAST), which originally used this file format. It is a very simple format with two types of lines: definition lines (or deflines) and sequence lines. A defline always begins with a > symbol and is followed by some limited information, such as, in this example, unique peptide identifier; peptide type; chromosome build, number, start and stop coordinates and strand; gene id; and transcript id. The unique id is usually the first piece of information given on a defline. Any line that's not a defline (doesn't begin with >) is sequence data.
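The two line types described above are simple enough to parse with a few lines of code. The following is a minimal Python sketch, not part of the original tutorial: the function name read_fasta and the two example records are made up for illustration, and the code assumes only that deflines start with > and that every other non-empty line is sequence.

```python
import io


def read_fasta(handle):
    """Yield (defline, sequence) pairs from an open FASTA text handle."""
    defline, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new defline closes the previous record, if there was one.
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif line and defline is not None:
            # Any non-empty line that is not a defline is sequence data.
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


# Hypothetical two-record example held in memory instead of a file.
example = io.StringIO(
    ">pep1 example peptide\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">pep2 another example\n"
    "MSDNE\n"
)
records = list(read_fasta(example))
for defline, sequence in records:
    unique_id = defline.split()[0]  # the id is usually the first item on the defline
    print(unique_id, len(sequence))
```

To run the same function on a downloaded file, pass it an open file handle, for example the result of open("Homo_sapiens.NCBI36.50.pep.all.fa"). Because it is a generator, it never holds more than one record in memory at a time.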
FASTA is such a popular and durable format that it even has its own Wikipedia page at < Because so many programs use and generate files in this format, get used to it; it is here to stay.

Advanced homework (worth two cookies): build a local BLAST database from the FASTA file of human proteins that you just downloaded and uncompressed, and then retrieve one of the NCBI protein isoform sequences for APP and run it against the database. Hints: if you don't already have NCBI's blastall and formatdb programs installed, go to < and download the BLAST program suite appropriate for your computer. You'll then have to build a BLAST database with the included program formatdb. Documentation on formatdb is available at < and blastall is at <

Worked example #3: Ensembl BioMart tool

In this worked example, you will use the Ensembl BioMart tool to download the 3' UTRs for all human transcripts for which there is a Drosophila gene ortholog (using protein sequence

homology as a surrogate), and do a spot check to confirm that human 3' UTR sequence was downloaded.

1. Open a browser and go to <
2. Scroll down to the section labeled BioMart, and click on the link BioMart data mining tool.
3. Under the menu labeled CHOOSE DATABASE, select Ensembl 50, and under CHOOSE DATASET, select Homo sapiens genes (NCBI36).
4. Click on the Filters link on the left sidebar, expand the MULTI SPECIES COMPARISONS section by clicking on the + icon to its immediate left, click the checkbox labeled Homolog filters, and select Orthologous Drosophila Genes.
5. Click on the Attributes link on the left sidebar, select the radio button labeled Sequences, expand the SEQUENCES section, and select the radio button labeled 3' UTR.
6. Scroll down and expand the HEADER INFORMATION section in order to verify that Ensembl Gene ID and Ensembl Transcript ID are checked.
7. Click on the Count tab, and notice that the gene record count next to the Dataset label on the left sidebar appears, or is updated.
8. Click on the Results tab, and notice that your results appear in the main tool panel and that they are in FASTA format.
9. Select Compressed web file (notify by email), enter your email address, and click on the Go button to cause a gzipped FASTA-formatted file to be created on the Ensembl website. You will shortly get an email with a URL link to the file.
10. When you get the email (after a couple of minutes, at least), click on the link and download the file.
11. As before, find the local downloaded file and double-click it to uncompress.
12. For a spot check of the data, open this downloaded and uncompressed file with a text editor, and highlight and copy one of the FASTA records.
13. Go to the UCSC Genome Browser at < and click on the BLAT link on the header bar.
14. Confirm that the Genome and Assembly are Human and Mar. 2006, respectively, paste the FASTA record into the text box, and click the I'm feeling lucky button.
15.
Zoom out 3X in order to confirm that the sequence aligns very well with the 3' UTR of a gene.

Just as the e-utilities provide programmatic access to NCBI Entrez databases, programmatic access to the Ensembl database is provided through a DAS server. DAS stands for Distributed Sequence Annotation System, and is a community standard for a web service that represents genomic annotations on a reference genome. All annotations are indexed with exact start and stop positions on the reference genome. There is more information about the Ensembl DAS server instance at < There is more information about the DAS specification at <

Advanced homework (worth half a cookie): use Ensembl BioMart to retrieve the HGNC gene symbols, chromosome, start and end positions, and Affymetrix U95 expression annotations for all genes on Chromosome 1.

Worked example #4: UCSC Genome Browser Table downloads

In this worked example, you will download UCSC Genome Browser gene annotations from a particular region of the human genome.

1. Open a browser and go to <
2. Copy and paste the following coordinates into the text field labeled position/search: chr21:26,074,733-26,965,003. After clicking on the jump button, you should see exon, intron, UTR and coding annotation tracks for the APP and CYYR1 genes at the top of the image. The browser has control buttons for zooming, panning and display of data tracks. In the language of the UCSC Genome Browser, a track is a collection of related data, each datum of which is position-indexed by chromosome, and by start and stop nucleotide.
3. In order to download tables of data, click the Tables link on the blue header bar. On the new page, note that the region we were viewing is now preset in the position field, and confirm that clade=Vertebrate, genome=Human and assembly=Mar. 2006.
4. Make the following selections in the drop-down menus: group=Gene and Prediction Tracks, track=UCSC Genes, table=knownGene, and output format=selected fields from primary and related tables. Click on the get output button.
5. In the hg18.knownGene section, click on the check all button. In the Linked Tables section, check the box by kgXref. Click Allow Selection From Checked Tables at the bottom.
6. From the hg18.kgXref section, check the following boxes: geneSymbol, refseq, protAcc, and description. Scroll back up to the hg18.knownGene section, and click the get output button.
7. On the tabulated output, notice that all of the rows contain genomic coordinates (this will be true of any table downloaded from UCSC) and that, as expected, isoforms from two genes are represented: amyloid beta A4, and cysteine and tyrosine-rich 1.

The UCSC Genome Browser offers many genomes other than human, and has many other functions that can be explored, including table filtering and DNA sequence downloads (in FASTA format, of course).
You will be able to find more information at <

Unlike NCBI Entrez and Ensembl, there is no programmatic interface for the UCSC Genome Browser, but if you know how to create relational databases, you can recreate their table structure locally and use SQL to run queries.

Advanced homework (worth half a cookie): for all genes on human chromosome X, use UCSC Genome Browser Tables to grab the name, chromosome, strand, start, end, HGNC gene symbol, gene description, and Ensembl ID. Hint: you'll have to select data from an additional table.

The importance of checking bulk downloads

When downloading data in bulk, it is always prudent to remember the old carpenter's adage: measure twice, cut once. You will save a lot of time by performing several spot checks of

the data using a different database and method (if possible) to confirm that you actually got what you expected to get. To spot-check sequence data, use a text editor, and then run BLAST < or BLAT < against an appropriate database. To spot-check tabulated data, use a text editor, Excel and, if you know them, Unix commands (to do things like count the number of rows).
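For readers who prefer a script to Unix commands, the row-count spot check for tabulated data can be sketched in Python as follows. This is an illustrative sketch rather than part of the original tutorial: the helper names open_text and summarize_table and the three sample rows are assumptions made for the example. It counts the rows, tallies how many columns each row has (a quick way to catch truncated or malformed lines), and reads *gz files directly so they do not have to be uncompressed first.

```python
import gzip
import io
from collections import Counter

# Row limit of older Excel versions, as quoted earlier in this tutorial.
EXCEL_LEGACY_MAX_ROWS = 65536


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def summarize_table(handle, delimiter="\t"):
    """Return (row_count, Counter mapping columns-per-row to number of rows)."""
    widths = Counter()
    rows = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip completely empty lines
        rows += 1
        widths[len(line.split(delimiter))] += 1
    return rows, widths


# Hypothetical in-memory sample: the third row has one column too many.
sample = io.StringIO("9606\t1\tA1BG\n9606\t2\tA2M\n9606\t9\tNAT1\textra\n")
rows, widths = summarize_table(sample)
print(rows, dict(widths))
if rows > EXCEL_LEGACY_MAX_ROWS:
    print("Too many rows for older versions of Excel")
```

For a real download, open the file with open_text (for example, open_text("gene2go.gz")) and pass the resulting handle to summarize_table. Any header line is counted as a row, and more than one distinct column count in the result is a sign that the file deserves a closer look.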


More information

Introduction to UNIX command-line II

Introduction to UNIX command-line II Introduction to UNIX command-line II Boyce Thompson Institute 2017 Prashant Hosmani Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions Compression

More information

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1-

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1- 1. PURPOSE To provide instructions for finding rs Numbers (SNP database ID numbers) and increasing sequence length by utilizing the UCSC Genome Bioinformatics Database. 2. MATERIALS 2.1. Sequence Information

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

Tutorial: chloroplast genomes

Tutorial: chloroplast genomes Tutorial: chloroplast genomes Stacia Wyman Department of Computer Sciences Williams College Williamstown, MA 01267 March 10, 2005 ASSUMPTIONS: You are using Internet Explorer under OS X on the Mac. You

More information

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017 RNA-Seq Analysis of Breast Cancer Data November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information
