Chromatin immunoprecipitation sequencing (ChIP-Seq) on the SOLiD system Nature Methods 6, (2009)

Judith McDowell

1 ChIP-seq

2 Chromatin immunoprecipitation (ChIP) is a technique for identifying and characterizing elements in protein-DNA interactions involved in gene regulation or chromatin organization.
Chromatin immunoprecipitation sequencing (ChIP-Seq) on the SOLiD system Nature Methods 6, (2009)

3 Chromatin immunoprecipitation sequencing (ChIP-Seq) on the SOLiD system Nature Methods 6, (2009)

4 Select a ~200-base region around each interpolated peak to capture the sequence surrounding the protein binding site on DNA across the genome: peak_1, peak_2, peak_3, peak_4.
The goal is to find a consensus DNA sequence among the sequences at each peak, which gives us the DNA sequence that the protein recognizes and binds.

5 chipseq_unaligned_seqs.fa
>peak_1_75bp
GCAAGTTACCACCACAGGCTCAACGTCGCTGCAGGCCGCAACGCTTGGAGCGTCGCCGCCATGCGTTCATGGTTA
>peak_2_75bp
TCGGAGCTTTGTTCCGAGTTGCCCCGGACTTTTCGTCGTTCCCGCGCCGCGTTGGAGTCTGAGATCTTGATTTTC
>peak_3_75bp
CGCTACCTGCGGTTGGTCTCAGCTGCATGACTGGACGCATGCGTTGGAGGGTTTTGTGTAGCGTTTCATGGTTAT
>peak_4_75bp
CGTGCGTACGACGAATCTTGTTCGCTGGCCTACTTCCCGCGCATGCGTTACTGTGAATCGGCATACCCTATCCTC

ClustalW

chipseq_aligned_seqs.fasta
>peak_1_75bp
GCAAGTTACCACCACAGGCTCAACGTCGCTGCAGGCCGCAACGCTTGGAGCGTCGCCGCCATGCGTTCATGGTTA
>peak_2_75bp
TCGGAGCTTTGTTCCGAGTTGCCCCGGACTTTTCGTCGTTCCCGCGCCGCGTTGGAGTCTGAGATCTTGATTTTC
>peak_3_75bp
CGCTACCTGCGGTTGGTCTCAGCTGCATGACTGGACGCATGCGTTGGAGGGTTTTGTGTAGCGTTTCATGGTTAT
>peak_4_75bp
CGTGCGTACGACGAATCTTGTTCGCTGGCCTACTTCCCGCGCATGCGTTACTGTGAATCGGCATACCCTATCCTC

6 Building a Position Weight Matrix File (.pwm) and Sequence Logo image
[Figure: sequences.fasta, then sequences.pwm with rows A, C, G, T, then sequence_logo.jpg]
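The step from aligned sequences to a position weight matrix can be sketched in Perl as below. This is a minimal sketch, not the course's reference solution: the subroutine names build_pwm and pwm_as_text and the toy input sequences are hypothetical, and it assumes equal-length sequences with no gaps, storing the fraction of each nucleotide at each position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a position weight matrix from equal-length, gap-free sequences.
# Returns a hash ref: nucleotide => array ref of per-position frequencies.
# (build_pwm is a hypothetical helper name used for illustration.)
sub build_pwm {
    my @seqs = map { uc $_ } @_;
    my $length = length $seqs[0];
    my %pwm;
    $pwm{$_} = [ (0) x $length ] for qw(A C G T);
    for my $seq (@seqs) {
        die "sequences must all be $length bases long\n" if length($seq) != $length;
        for my $pos (0 .. $length - 1) {
            my $base = substr($seq, $pos, 1);
            die "unexpected character '$base'\n" unless exists $pwm{$base};
            $pwm{$base}[$pos]++;
        }
    }
    # Convert counts to frequencies so that each column sums to 1.
    for my $base (keys %pwm) {
        $_ /= @seqs for @{ $pwm{$base} };
    }
    return \%pwm;
}

# Format the matrix as four space-separated rows in A, C, G, T order,
# one column per sequence position.
sub pwm_as_text {
    my ($pwm) = @_;
    return join '', map { join(' ', @{ $pwm->{$_} }) . "\n" } qw(A C G T);
}

# Toy input (hypothetical, not taken from the ChIP-seq peaks).
my $pwm = build_pwm(qw(GCGTT GCGTA GCATT ACGTT));
print pwm_as_text($pwm);
```

Writing the rows in A, C, G, T order with one column per position matches the layout that the sequence logo slide describes for the .pwm file.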

9 Create a sequence logo using the seqLogo Bioconductor package in R
library(seqLogo)
chipseq <- read.table("chipseq.pwm")
my_pwm <- makePWM(chipseq) #formats values, e.g. 0 to 0.0 and .4 to 0.4
seqLogo(my_pwm) #creates the logo image
A position weight matrix file for seqLogo has one column per DNA position and one row per nucleotide, in alphabetical order: A, C, G, T.
#the makePWM() formatting is optional, not needed if your PWM is already formatted that way

12 Finding Files on a UNIX System
find <starting_directory> -mtime <modified_days> -name <name_of_script(s)>
#find will start from the <starting_directory> and recursively search all subdirectories
Examples:
find ./ -name "*pl" #find all Perl scripts starting from current directory
find ./ -mtime -7 #find all files modified within the last 7 days
find ./ -mtime 7 #find all files modified 7 days ago
find ./ -mtime +7 #find all files modified over 7 days ago
find ./ -mtime -7 -type f #only search for files, not directories
find ./ -mtime -7 -name "*pl" #all Perl scripts modified within last 7 days
find ./ -mtime 0 -type f #all files modified within last 24 hrs

13 Programming Assignment
Create a Perl pipeline script that does the following (choose either basic or advanced):
1. Align the sequences in chipseq_unaligned_seqs.fa using ClustalW with FASTA output:
clustalw -infile=chipseq_unaligned_seqs.fa -gapopen=1000 -output=fasta
2. Read in the ClustalW output file chipseq_seqs.fasta and find (see next slide) the substr start and end by finding the longest start and end gaps in the alignment file
advanced
3. Trim the sequences with substr based on the longest start and end gaps and save them to a FASTA file
4. Create a position weight matrix (pwm) file from your trimmed FASTA sequence file: chipseq_aligned_trimmed_seqs.fa
basic
5. Create a file with the R commands to create a sequence logo from your pwm file.
6. Run your R commands file through R within your script using backticks and create the sequence logo jpg image
Submit your sequence file to MEME and compare the MEME sequence logo to yours.
Send me your Perl script, R commands file and jpg sequence logo images before next class.
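Steps 5 and 6 of the assignment could be approached along the following lines. This is only a sketch under stated assumptions: the subroutine names r_commands and write_r_file and the file name make_logo.R are hypothetical, the jpeg() and dev.off() calls are one common way to save an R plot to a file, and Rscript is assumed to be installed and on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the R commands that draw a sequence logo from a PWM file.
# (r_commands is a hypothetical helper name used for illustration.)
sub r_commands {
    my ($pwm_file, $jpg_file) = @_;
    return <<"END_R";
library(seqLogo)
chipseq <- read.table("$pwm_file")
my_pwm <- makePWM(chipseq)
jpeg("$jpg_file")
seqLogo(my_pwm)
dev.off()
END_R
}

# Write the R commands to a file and return the file name.
sub write_r_file {
    my ($r_file, $pwm_file, $jpg_file) = @_;
    open(my $out, '>', $r_file) or die "Cannot write $r_file: $!\n";
    print {$out} r_commands($pwm_file, $jpg_file);
    close($out) or die "Cannot close $r_file: $!\n";
    return $r_file;
}

# Only run R when the script is called with a PWM file name, so the
# subroutines above can be tried without R installed.
if (@ARGV) {
    my $r_file = write_r_file('make_logo.R', $ARGV[0], 'sequence_logo.jpg');
    # Backticks run the command and capture whatever R prints.
    my $r_output = `Rscript --vanilla $r_file 2>&1`;
    print $r_output;
}
```

Called as `perl make_logo.pl chipseq.pwm` (the script name is again hypothetical), this would write make_logo.R and ask R to produce sequence_logo.jpg.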

14 Using Perl Regular Expressions to find start of sequence from alignment
(for use with the advanced version of the Perl script assignment)

#!/usr/bin/perl -w
use strict;
#this script trims aligned seqs so there are no end gaps; assumes no internal gaps
while (my $header = <DATA>) {
    my $sequence = <DATA>;
    chomp($header, $sequence);
    #defaults cover sequences that have no end gaps
    my ($sequence_start, $sequence_end) = (0, length $sequence);
    if ($sequence =~ /-[GATC]/) { #a gap followed by any one of the characters in [ ]
        # $-[0] is the offset where the match begins: the gap right before the first base
        my $match_minus1 = $-[0];
        # $+[0] is the offset just past the match: the position right after the first base
        my $match_plus1 = $+[0];
        $sequence_start = $match_minus1 + 1;
    }
    if ($sequence =~ /[GATC]-/) { #any one of the characters in [ ] followed by a gap
        #we want the position right after the matching base
        $sequence_end = $-[0] + 1;
    }
    print "$header\n", substr($sequence, $sequence_start, $sequence_end - $sequence_start), "\n";
}
#the gap characters of the slide's example were lost in this copy;
#two gaps on each side are assumed here for illustration
__DATA__
>peak_1
--GATCT--
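For the advanced version, the same @- and @+ offsets can be used to find the longest start and end gaps over the whole alignment and then cut every sequence with one substr window. The following is a rough sketch, assuming gaps are written as '-' and occur only at the ends; the subroutine name trim_alignment and the three toy sequences are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim every aligned sequence to the window left after removing the longest
# leading gap and the longest trailing gap found in any sequence.
# (trim_alignment is a hypothetical helper name used for illustration.)
sub trim_alignment {
    my @aligned = @_;
    my ($start, $end) = (0, length $aligned[0]);
    for my $seq (@aligned) {
        # Leading run of '-' characters: $+[0] is the offset of the first base.
        if ($seq =~ /^-+/) {
            $start = $+[0] if $+[0] > $start;
        }
        # Trailing run of '-' characters: $-[0] is the offset just past the last base.
        if ($seq =~ /-+$/) {
            $end = $-[0] if $-[0] < $end;
        }
    }
    die "no ungapped window is shared by all sequences\n" if $start >= $end;
    return map { substr($_, $start, $end - $start) } @aligned;
}

# Toy alignment (hypothetical): each sequence has different end gaps.
my @trimmed = trim_alignment('--GATCTAC', 'AGGATCT--', '-CGATCTA-');
print "$_\n" for @trimmed;
```

Using one shared window keeps all trimmed sequences the same length, which the position weight matrix step needs.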
