Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds


Supplemental Information
February 27, 2009
Alexander Mastroianni, Shelley Claridge, A. Paul Alivisatos
Department of Chemistry, University of California, Berkeley, California, and Division of Materials Science, Lawrence Berkeley National Laboratory, Berkeley, California

Additional TEM Images

Figure 1: a) Bare gold. b) The contents of the band running ahead of the pyramid band. c, d) Additional images of pyramids.

Sequences

G1
Each strand bears a hexyl-thiol linker at its 5' end.
Strand 1: TTT GCC TGG AGA TAC ATG CAC ATT ACG GCT TTC CCT ATT AGA AGG TCT CAG GTG CGC GTT TCG GTA AGT AGA CGG GAC CAG TTC GCC
Strand 2: TTT CGC GCA CCT GAG ACC TTC TAA TAG GGT TTG CGA CAG TCG TTC AAC TAG AAT GCC CTT TGG GCT GTT CCG GGT GTG GCT CGT CGG
Strand 3: TTT GGC CGA GGA CTC CTG CTC CGC TGC GGT TTG GCG AAC TGG TCC CGT CTA CTT ACC GTT TCC GAC GAG CCA CAC CCG GAA CAG CCC
Strand 4: TTT GCC GTA ATG TGC ATG TAT CTC CAG GCT TTC CGC AGC GGA GCA GGA GTC CTC GGC CTT TGG GCA TTC TAG TTG AAC GAC TGT CGC

G2
Each strand is divided into two pieces (x.1, x.2). The first piece has the thiol linker.
Strand 1.1: TTT TTC CCT GTA CTG GCT AGG AAT TCA CGT TTT AAT CTG GGC TTT GGG TTA AGA AAC TCC CCG
Strand 1.2: CGC TGG AGG CGC ATC ACC GTT TGC GTA TGT GTT CTG TGC GGC CTG CCG TCC CGT GTG GG
Strand 2.1: TTT TTC GGT GAT GCG CCT CCA GCG CGG GGA GTT TCT TAA CCC TTT CCG ACT TAC AAG AGC CGG
Strand 2.2: GCG AGA CTC AGG TGG TGC CTT TGG CAT TCG ACC AGG AGA TAT CGC GTT CAG CTA TGC CC
Strand 3.1: TTT TTC CCA TGA GAA TAA TAC CGC CGA TTT ACG TCA GTC CGG TTT CCC ACA CGG GAC GGC AGG C
Strand 3.2: CGC ACA GAA CAC ATA CGC TTT GGG CAT AGC TGA ACG CGA TAT CTC CTG GTC GAA TGC C
Strand 4.1: TTT TTG CCC AGA TTA AAA CGT GAA TTC CTA GCC AGT ACA GGG TTT CCG GAC TGA CGT AAA TCG G
Strand 4.2: CGG TAT TAT TCT CAT GGG TTT GGC ACC ACC TGA GTC TCG CCC GGC TCT TGT AAG TCG G
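The design rule described in the program notes (strand 1 traces ABC, strand 2 CBD, strand 3 DAC, strand 4 BAD) implies that every 26-nt side of one G1 strand is the reverse complement of a side of another strand. The C++ sketch below checks this for the G1 sequences listed above. It assumes each 87-nt strand is laid out as TTT + side + TTT + side + TTT + side, as in the design program. The helper names (compact, reverseComplement, side, allEdgesPaired) are hypothetical and are not part of the original supplement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for checking the G1 strands (not from the original SI).
// Assumed layout: TTT + side 0 + TTT + side 1 + TTT + side 2 = 87 nt.
const std::size_t kSpacer = 3;
const std::size_t kSide = 26;

// Remove the spaces used to group the printed sequences into triplets.
std::string compact(const std::string& s) {
    std::string out;
    for (char c : s)
        if (c != ' ') out += c;
    return out;
}

// Watson-Crick complement of one base.
char complementBase(char b) {
    switch (b) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'G': return 'C';
        case 'C': return 'G';
    }
    assert(false && "unexpected base");
    return 'N';
}

// Reverse complement, so a 5'->3' side can be compared with its partner.
std::string reverseComplement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& b : rc) b = complementBase(b);
    return rc;
}

// Side k (0, 1 or 2) of an 87-nt strand, skipping the T spacers.
std::string side(const std::string& strand, std::size_t k) {
    assert(strand.size() == 3 * (kSpacer + kSide) && k < 3);
    return strand.substr(kSpacer + k * (kSpacer + kSide), kSide);
}

// G1 strands as listed above.
const std::string S1 = compact("TTT GCC TGG AGA TAC ATG CAC ATT ACG GCT TTC CCT ATT AGA AGG TCT CAG GTG CGC GTT TCG GTA AGT AGA CGG GAC CAG TTC GCC");
const std::string S2 = compact("TTT CGC GCA CCT GAG ACC TTC TAA TAG GGT TTG CGA CAG TCG TTC AAC TAG AAT GCC CTT TGG GCT GTT CCG GGT GTG GCT CGT CGG");
const std::string S3 = compact("TTT GGC CGA GGA CTC CTG CTC CGC TGC GGT TTG GCG AAC TGG TCC CGT CTA CTT ACC GTT TCC GAC GAG CCA CAC CCG GAA CAG CCC");
const std::string S4 = compact("TTT GCC GTA ATG TGC ATG TAT CTC CAG GCT TTC CGC AGC GGA GCA GGA GTC CTC GGC CTT TGG GCA TTC TAG TTG AAC GAC TGT CGC");

// True if each of the six pyramid edges pairs a side with its reverse complement.
bool allEdgesPaired() {
    struct Edge { const std::string* a; std::size_t sa; const std::string* b; std::size_t sb; };
    const Edge edges[] = {
        {&S1, 0, &S4, 0},  // AB (strand 1) vs BA (strand 4)
        {&S1, 1, &S2, 0},  // BC (strand 1) vs CB (strand 2)
        {&S1, 2, &S3, 1},  // CA (strand 1) vs AC (strand 3)
        {&S2, 1, &S4, 2},  // BD (strand 2) vs DB (strand 4)
        {&S2, 2, &S3, 2},  // DC (strand 2) vs CD (strand 3)
        {&S3, 0, &S4, 1},  // DA (strand 3) vs AD (strand 4)
    };
    for (const Edge& e : edges)
        if (reverseComplement(side(*e.a, e.sa)) != side(*e.b, e.sb)) return false;
    return true;
}
```

With these definitions, allEdgesPaired() returns true. For example, reverseComplement(side(S1, 0)) equals side(S4, 0), the BA side of strand 4.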

Sequence Design Program

The following is the C++ code listing for designing pyramids. In this version, it generates several sets of strands for pyramids with 26-bp sides. Each 87-base strand consists of three sides, and each side is preceded by a three-T spacer (3 × (26 + 3) = 87). The program can be modified slightly to produce larger pyramids by changing strandlength and modifying the makestrandX() functions.

/*
Sequence Design
Written by Alexander Mastroianni in 2007

This program is used to design sets of ssDNA strands for constructing
pyramids out of DNA. A sequence is constructed using functions that add
a specific base, a guanine or cytosine only, or any random base. As each
base is included, it and the previous four bases define a snippet. The
snippet and its complement are added to the 1-D array snippets. As more
bases are added, snippets is searched and existing subsequences are
avoided.

A pyramid is defined as having vertices A, B, C, and D.
Strand 1 traces ABC. Strand 2 traces CBD. Strand 3 traces DAC.
Strand 4 traces BAD. The appropriate complementary relationships are
accounted for in the program. In addition, GC clamps are added at the
ends of the sides, and thymine residues are added at the corners to
relieve tension in the system.

Sets of sequences with fewer than a chosen threshold of repeated
snippets are output to the screen.
*/

// Standard libraries
#include <cstdio>   // printf, scanf
#include <cstdlib>  // rand, srand
#include <ctime>    // time

// A snippet is a 5-bp sequence. We're attempting not to repeat
// snippets. This definition sets the size of the list used to hold
// the snippets being created. Its value is not legible in the source
// transcription; 1000 is an assumed value, comfortably above the
// 2 * 3 * strandlength = 522 snippets that strands 1-3 can record.
#define maxsnippets 1000
// The length of our DNA
#define strandlength 87
// The number of times the program attempts to add a random base
// while avoiding duplicating snippets. This is crude, but virtually
// ensures that all bases get tried.
#define randomattempts 100

// The four bases are represented as integers:
// G=0, A=1, C=2, T=3

int snippets[maxsnippets]; // A list of all the snippets
int snippetindex;          // Where we are in the snippet list
//int maxrepeatlength;
int overmax;               // The number of repeated snippets in a sequence set
int attempts;              // The number of sets of sequences to generate

// These 1-D arrays hold the sequences, with each base
// represented as an integer.
int O1[strandlength];
int O2[strandlength];
int O3[strandlength];
int O4[strandlength];

// These functions set up and initialize the sequences.
void get_parameters();
void initialize();
// makestrands() is the function that actually generates the sequences,
// calling the four subfunctions makestrand1(), etc.
void makestrands();
void makestrand1();
void makestrand2();
void makestrand3();
void makestrand4();
// These functions add any random base, a G or C only,
// or a specific base, with the appropriate accounting of
// the new snippet.
int attemptaddany(int testsnippet);
int attemptaddgc(int testsnippet);
int attemptaddspecific(int testsnippet, int base);
// A test to see if a particular snippet x is in the list already.
bool snippetfound(int x);
// This returns the complement of a given base.
int complement(int x);
// These functions return a random base, or a G or C only.
// They simply return bases, but don't check for snippet repetition.
// They are used internally by attemptaddany() and attemptaddgc().
int anybase();
int GorC();
// This function converts the numerical representation of a base
// to a letter, for printing.
char numtobase(int x);
// Snippet bookkeeping shared by makestrand1()-makestrand3():
// prefixsnippet() encodes the four bases before position x, and
// addsnippets() records the snippet ending at x and its complement.
int prefixsnippet(const int strand[], int x);
void addsnippets(const int strand[], int x);

int main(){
    int t;
    get_parameters();
    initialize();
    // attempts is set in get_parameters()
    for(t=0;t<attempts;t++)
        makestrands();
    return 0;
} // main()

void get_parameters(){
    // This just sets the number of attempts to make
    // a set of strands.
    printf("enter the number of attempts: ");
    if(scanf("%d",&attempts)!=1)
        attempts=0;
} // get_parameters()

void initialize(){
    int x;
    // Initialize the random number generator.
    srand((unsigned)time(NULL));
    // Initialize the list of snippets.
    snippetindex=0;
    for(x=0;x<maxsnippets;x++)
        snippets[x]=0;
    // Initialize each strand.
    // 5 represents a null base.
    for(x=0;x<strandlength;x++){
        O1[x]=5;
        O2[x]=5;
        O3[x]=5;
        O4[x]=5;
    }
} // initialize()

int attemptaddany(int testsnippet){
    int x,test,temp=0;
    bool found=false;
    // Attempt to add a random base that doesn't complete
    // a snippet on the list.
    for(x=0;x<randomattempts;x++){
        if(found==false){
            test=anybase();
            if(!snippetfound(testsnippet*10+test)){
                temp=test;
                found=true;
            }
        } // if found == false
    } // for loop
    // If a snippet not on the list can't be found, give up and
    // return a random base.
    if(found==false){
        overmax++;
        return anybase();
    }
    else
        return temp;
} // attemptaddany()

int attemptaddgc(int testsnippet){
    int x,test,temp=0;
    bool found=false;
    // Attempt to add a G or C base that doesn't complete
    // a snippet on the list.
    for(x=0;x<randomattempts;x++){
        if(found==false){
            test=GorC();
            if(!snippetfound(testsnippet*10+test)){
                temp=test;
                found=true;
            }
        } // if found == false
    } // for loop
    // If a snippet not on the list can't be found, give up and
    // return a G or C base.
    if(found==false){
        overmax++;
        return GorC();
    }
    else
        return temp;
} // attemptaddgc()

int attemptaddspecific(int testsnippet, int base){
    // A specific base is added (and its snippet put on the
    // snippet list) anyway; a repeated snippet is only counted.
    if(snippetfound(testsnippet*10+base))
        overmax++;
    return base;
} // attemptaddspecific()

bool snippetfound(int x){
    // Search the whole list of snippets for snippet x. Unused
    // entries hold 0, which is also the code for GGGGG.
    int y;
    bool temp=false;
    for(y=0;y<maxsnippets;y++)
        if(snippets[y]==x)
            temp=true;
    return temp;
} // snippetfound()

int complement(int x){
    // Returns the complement of base x (G<->C, A<->T).
    return (x+2)%4;
} // complement()

int anybase(){
    // Pick a random base.
    return rand()%4;
} // anybase()

int GorC(){
    // Pick a G or C.
    if((anybase()-2)<0)
        return 0;
    else
        return 2;
} // GorC()

char numtobase(int x){
    // Convert the numerical representation of a base to
    // the letter. Recall that 5 is the number used for
    // initialization, so 'x' represents an untouched position
    // in the strand.
    char temp='?';
    switch(x){
        case 0: temp='G'; break;
        case 1: temp='A'; break;
        case 2: temp='C'; break;
        case 3: temp='T'; break;
        case 5: temp='x'; break;
    } // switch statement
    return temp;
} // numtobase()

int prefixsnippet(const int strand[], int x){
    // The four bases before position x, oldest base as the highest digit.
    return strand[x-1]+strand[x-2]*10+strand[x-3]*100+strand[x-4]*1000;
} // prefixsnippet()

void addsnippets(const int strand[], int x){
    // The snippet ending at position x ...
    snippets[snippetindex++]=strand[x]+strand[x-1]*10+strand[x-2]*100
                             +strand[x-3]*1000+strand[x-4]*10000;
    // ... and its complement, read in the opposite direction.
    snippets[snippetindex++]=complement(strand[x])*10000+complement(strand[x-1])*1000
                             +complement(strand[x-2])*100+complement(strand[x-3])*10
                             +complement(strand[x-4]);
} // addsnippets()

void makestrands(){
    // This function makes a set of strands.
    int x;
    // Reset the number of repeated snippets and the list of snippets.
    overmax=0;
    snippetindex=0;
    for(x=0;x<maxsnippets;x++)
        snippets[x]=0;
    // Generate each strand.
    makestrand1();
    makestrand2();
    makestrand3();
    makestrand4();
    // Print the strands. Here, for example, 10 snippet repeats is the
    // threshold for printing a set of strands.
    if(overmax<10){
        // This is just a scale to identify base positions in the output.
        for(x=0;x<strandlength;x++)
            printf("%d",x%10);
        printf("\n");
        // Each strand is now printed, preceded by an optional marker
        // that lays out the positions of the sides.
        printf("strand 1:\n");
        printf(" [ ]...AB...[ ] [ ]");
        printf("...BC...[ ] [ ]...CA...[ ]\n");
        for(x=0;x<strandlength;x++)
            printf("%c",numtobase(O1[x]));
        printf("\n");
        printf("strand 2:\n");
        printf(" [ ]...CB...[ ] [ ]");
        printf("...BD...[ ] [ ]...DC...[ ]\n");
        for(x=0;x<strandlength;x++)
            printf("%c",numtobase(O2[x]));
        printf("\n");
        printf("strand 3:\n");
        printf(" [ ]...DA...[ ] [ ]");
        printf("...AC...[ ] [ ]...CD...[ ]\n");
        for(x=0;x<strandlength;x++)
            printf("%c",numtobase(O3[x]));
        printf("\n");
        printf("strand 4:\n");
        printf(" [ ]...BA...[ ] [ ]");
        printf("...AD...[ ] [ ]...DB...[ ]\n");
        for(x=0;x<strandlength;x++)
            printf("%c",numtobase(O4[x]));
        printf("\n");
        // Report how many snippets were duplicated.
        printf("overmax = %d\n",overmax);
    }
} // makestrands()

// These functions make each strand.
void makestrand1(){
    int x;
    // The T spacer at the beginning of strand 1
    for(x=0;x<3;x++)
        O1[x]=3;
    // The start of the 5' GC clamp of AB. The window ending at
    // position 4 is the first that lies entirely inside the strand.
    for(x=3;x<5;x++)
        O1[x]=GorC();
    addsnippets(O1,4);
    // The rest of the 5' GC clamp of AB
    x=5;
    O1[x]=attemptaddgc(prefixsnippet(O1,x));
    addsnippets(O1,x);
    // The middle of AB
    for(x=6;x<26;x++){ O1[x]=attemptaddany(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The 3' GC clamp of AB
    for(x=26;x<29;x++){ O1[x]=attemptaddgc(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The T spacer between AB and BC
    for(x=29;x<32;x++){ O1[x]=attemptaddspecific(prefixsnippet(O1,x),3); addsnippets(O1,x); }
    // The 5' GC clamp of BC
    for(x=32;x<35;x++){ O1[x]=attemptaddgc(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The middle of BC
    for(x=35;x<55;x++){ O1[x]=attemptaddany(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The 3' GC clamp of BC
    for(x=55;x<58;x++){ O1[x]=attemptaddgc(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The T spacer between BC and CA
    for(x=58;x<61;x++){ O1[x]=attemptaddspecific(prefixsnippet(O1,x),3); addsnippets(O1,x); }
    // The 5' GC clamp of CA
    for(x=61;x<64;x++){ O1[x]=attemptaddgc(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The middle of CA
    for(x=64;x<84;x++){ O1[x]=attemptaddany(prefixsnippet(O1,x)); addsnippets(O1,x); }
    // The 3' GC clamp of CA
    for(x=84;x<87;x++){ O1[x]=attemptaddgc(prefixsnippet(O1,x)); addsnippets(O1,x); }
} // makestrand1()

void makestrand2(){
    int x;
    // The tail of strand 2
    for(x=0;x<3;x++)
        O2[x]=3;
    // CB (complement of BC in strand 1)
    for(x=3;x<29;x++)
        O2[x]=complement(O1[60-x]);
    // Add snippets for TTT plus the beginning of CB. Windows lying
    // entirely inside CB are already on the list from strand 1.
    for(x=4;x<7;x++)
        addsnippets(O2,x);
    // The T spacer between CB and BD
    for(x=29;x<32;x++){ O2[x]=attemptaddspecific(prefixsnippet(O2,x),3); addsnippets(O2,x); }
    // The 5' GC clamp of BD
    for(x=32;x<35;x++){ O2[x]=attemptaddgc(prefixsnippet(O2,x)); addsnippets(O2,x); }
    // The middle of BD
    for(x=35;x<55;x++){ O2[x]=attemptaddany(prefixsnippet(O2,x)); addsnippets(O2,x); }
    // The 3' GC clamp of BD
    for(x=55;x<58;x++){ O2[x]=attemptaddgc(prefixsnippet(O2,x)); addsnippets(O2,x); }
    // The T spacer between BD and DC
    for(x=58;x<61;x++){ O2[x]=attemptaddspecific(prefixsnippet(O2,x),3); addsnippets(O2,x); }
    // The 5' GC clamp of DC
    for(x=61;x<64;x++){ O2[x]=attemptaddgc(prefixsnippet(O2,x)); addsnippets(O2,x); }
    // The middle of DC
    for(x=64;x<84;x++){ O2[x]=attemptaddany(prefixsnippet(O2,x)); addsnippets(O2,x); }
    // The 3' GC clamp of DC
    for(x=84;x<87;x++){ O2[x]=attemptaddgc(prefixsnippet(O2,x)); addsnippets(O2,x); }
} // makestrand2()

void makestrand3(){
    int x;
    // The T spacer at the beginning of strand 3
    for(x=0;x<3;x++)
        O3[x]=3;
    // The start of the 5' GC clamp of DA (window ending at 4 recorded)
    for(x=3;x<5;x++)
        O3[x]=GorC();
    addsnippets(O3,4);
    // The rest of the 5' GC clamp of DA
    x=5;
    O3[x]=attemptaddgc(prefixsnippet(O3,x));
    addsnippets(O3,x);
    // The middle of DA
    for(x=6;x<26;x++){ O3[x]=attemptaddany(prefixsnippet(O3,x)); addsnippets(O3,x); }
    // The 3' GC clamp of DA
    for(x=26;x<29;x++){ O3[x]=attemptaddgc(prefixsnippet(O3,x)); addsnippets(O3,x); }
    // The T spacer between DA and AC
    for(x=29;x<32;x++){ O3[x]=attemptaddspecific(prefixsnippet(O3,x),3); addsnippets(O3,x); }
    // AC (complement of CA in strand 1)
    for(x=32;x<58;x++)
        O3[x]=complement(O1[118-x]);
    // Add snippets for TTT plus the beginning of AC: the windows
    // ending at 32-35 each contain at least one spacer T.
    for(x=32;x<36;x++)
        addsnippets(O3,x);
    // The T spacer between AC and CD
    for(x=58;x<61;x++){ O3[x]=attemptaddspecific(prefixsnippet(O3,x),3); addsnippets(O3,x); }
    // CD (complement of DC in strand 2)
    for(x=61;x<87;x++)
        O3[x]=complement(O2[147-x]);
    // Add snippets for TTT plus the beginning of CD: the windows
    // ending at 61-64 each contain at least one spacer T.
    for(x=61;x<65;x++)
        addsnippets(O3,x);
} // makestrand3()

void makestrand4(){
    int x;
    // Strand 4 is fully determined by strands 1-3, so no snippets
    // are checked or recorded.
    // The 5' tail of strand 4
    for(x=0;x<3;x++)
        O4[x]=3;
    // BA (complement of AB in strand 1)
    for(x=3;x<29;x++)
        O4[x]=complement(O1[31-x]);
    // The spacer between BA and AD
    for(x=29;x<32;x++)
        O4[x]=3;
    // AD (complement of DA in strand 3)
    for(x=32;x<58;x++)
        O4[x]=complement(O3[60-x]);
    // The spacer between AD and DB
    for(x=58;x<61;x++)
        O4[x]=3;
    // DB (complement of BD in strand 2)
    for(x=61;x<87;x++)
        O4[x]=complement(O2[118-x]);
} // makestrand4()
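The listing stores each 5-base snippet as a decimal number with one digit per base (G=0, A=1, C=2, T=3). The 5'-most base of the window is the 10^4 digit, and the complement snippet, read in the opposite direction, is recorded alongside it. The hypothetical C++ sketch below reproduces that encoding for strings. As an assumed alternative to the linear scan in snippetfound(), it keeps recorded snippets in a std::unordered_set. countRepeats() is an illustrative after-the-fact check and is not a function from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical sketch mirroring the listing's snippet encoding.
// Base codes as in the listing: G=0, A=1, C=2, T=3; complement = (x+2)%4.
int baseCode(char b) {
    switch (b) {
        case 'G': return 0;
        case 'A': return 1;
        case 'C': return 2;
        case 'T': return 3;
    }
    assert(false && "unexpected base");
    return -1;
}

// First (5'-most) base becomes the 10^4 digit, as in
// O[x] + O[x-1]*10 + O[x-2]*100 + O[x-3]*1000 + O[x-4]*10000.
int encodeSnippet(const std::string& five) {
    assert(five.size() == 5);
    int code = 0;
    for (char b : five) code = code * 10 + baseCode(b);
    return code;
}

// Reverse-complement code, as in
// complement(O[x])*10000 + ... + complement(O[x-4]).
int encodeReverseComplement(const std::string& five) {
    assert(five.size() == 5);
    int code = 0;
    for (auto it = five.rbegin(); it != five.rend(); ++it)
        code = code * 10 + (baseCode(*it) + 2) % 4;
    return code;
}

// Scan a sequence and count 5-base windows that were already recorded,
// either directly or as the reverse complement of an earlier window.
int countRepeats(const std::string& seq) {
    std::unordered_set<int> seen;
    int repeats = 0;
    for (std::size_t end = 4; end < seq.size(); ++end) {
        const std::string window = seq.substr(end - 4, 5);
        const int code = encodeSnippet(window);
        if (seen.count(code) != 0) ++repeats;
        seen.insert(code);
        seen.insert(encodeReverseComplement(window));
    }
    return repeats;
}
```

For example, encodeSnippet("GACTT") is 1233, and encodeReverseComplement("GACTT") is 11032, the code of AAGTC. countRepeats("GACTTAAGTC") is 3, because TTAAG, TAAGT and AAGTC are reverse complements of earlier windows. A set also avoids a quirk of the fixed array, whose unused entries are 0, which is the code for GGGGG.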


More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Sorting beyond Value Comparisons Marius Kloft Content of this Lecture Radix Exchange Sort Sorting bitstrings in linear time (almost) Bucket Sort Marius Kloft: Alg&DS, Summer

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 58670 Algorithms for Bioinformatics Lecture 5: Graph Algorithms

More information

Sequence Assembly Required!

Sequence Assembly Required! Sequence Assembly Required! 1 October 3, ISMB 20172007 1 Sequence Assembly Genome Sequenced Fragments (reads) Assembled Contigs Finished Genome 2 Greedy solution is bounded 3 Typical assembly strategy

More information

Assembly in the Clouds

Assembly in the Clouds Assembly in the Clouds Michael Schatz October 13, 2010 Beyond the Genome Shredded Book Reconstruction Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools

More information

A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving on DNA-encoded Data

A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving on DNA-encoded Data International Journal of Electrical and Computer Engineering (IJECE) Vol. 4, No. 1, Feburary 2014, pp. 93~100 ISSN: 2088-8708 93 A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving

More information

Purpose of sequence assembly

Purpose of sequence assembly Sequence Assembly Purpose of sequence assembly Reconstruct long DNA/RNA sequences from short sequence reads Genome sequencing RNA sequencing for gene discovery Amplicon sequencing But not for transcript

More information

de novo assembly Rayan Chikhi Pennsylvania State University Workshop On Genomics - Cesky Krumlov - January /73

de novo assembly Rayan Chikhi Pennsylvania State University Workshop On Genomics - Cesky Krumlov - January /73 1/73 de novo assembly Rayan Chikhi Pennsylvania State University Workshop On Genomics - Cesky Krumlov - January 2014 2/73 YOUR INSTRUCTOR IS.. - Postdoc at Penn State, USA - PhD at INRIA / ENS Cachan,

More information

Problem statement. CS267 Assignment 3: Parallelize Graph Algorithms for de Novo Genome Assembly. Spring Example.

Problem statement. CS267 Assignment 3: Parallelize Graph Algorithms for de Novo Genome Assembly. Spring Example. CS267 Assignment 3: Problem statement 2 Parallelize Graph Algorithms for de Novo Genome Assembly k-mers are sequences of length k (alphabet is A/C/G/T). An extension is a simple symbol (A/C/G/T/F). The

More information

Detecting Superbubbles in Assembly Graphs. Taku Onodera (U. Tokyo)! Kunihiko Sadakane (NII)! Tetsuo Shibuya (U. Tokyo)!

Detecting Superbubbles in Assembly Graphs. Taku Onodera (U. Tokyo)! Kunihiko Sadakane (NII)! Tetsuo Shibuya (U. Tokyo)! Detecting Superbubbles in Assembly Graphs Taku Onodera (U. Tokyo)! Kunihiko Sadakane (NII)! Tetsuo Shibuya (U. Tokyo)! de Bruijn Graph-based Assembly Reads (substrings of original DNA sequence) de Bruijn

More information

3. The object system(s)

3. The object system(s) 3. The object system(s) Thomas Lumley Ken Rice Universities of Washington and Auckland Seattle, June 2011 Generics and methods Many functions in R are generic. This means that the function itself (eg plot,

More information

Genome 373: Genome Assembly. Doug Fowler

Genome 373: Genome Assembly. Doug Fowler Genome 373: Genome Assembly Doug Fowler What are some of the things we ve seen we can do with HTS data? We ve seen that HTS can enable a wide variety of analyses ranging from ID ing variants to genome-

More information

QuasiAlign: Position Sensitive P-Mer Frequency Clustering with Applications to Genomic Classification and Differentiation

QuasiAlign: Position Sensitive P-Mer Frequency Clustering with Applications to Genomic Classification and Differentiation QuasiAlign: Position Sensitive P-Mer Frequency Clustering with Applications to Genomic Classification and Differentiation Anurag Nagar Southern Methodist University Michael Hahsler Southern Methodist University

More information

1. PURPOSE: to describe the standardized laboratory protocol for molecular subtyping of Salmonella enterica serotype Enteritidis.

1. PURPOSE: to describe the standardized laboratory protocol for molecular subtyping of Salmonella enterica serotype Enteritidis. 1. PURPOSE: to describe the standardized laboratory protocol for molecular subtyping of Salmonella enterica serotype Enteritidis. 2. SCOPE: to provide the PulseNet participants with a single protocol for

More information

WSSP-10 Chapter 7 BLASTN: DNA vs DNA searches

WSSP-10 Chapter 7 BLASTN: DNA vs DNA searches WSSP-10 Chapter 7 BLASTN: DNA vs DNA searches 4-3 DSAP: BLASTn Page p. 7-1 NCBI BLAST Home Page p. 7-1 NCBI BLASTN search page p. 7-2 Copy sequence from DSAP or wave form program p. 7-2 Choose a database

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 582670 Algorithms for Bioinformatics Lecture 3: Graph Algorithms

More information

Programming Applications. What is Computer Programming?

Programming Applications. What is Computer Programming? Programming Applications What is Computer Programming? An algorithm is a series of steps for solving a problem A programming language is a way to express our algorithm to a computer Programming is the

More information

DNA Fragment Assembly

DNA Fragment Assembly SIGCSE 009 Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly

More information

GATB programming day

GATB programming day GATB programming day G.Rizk, R.Chikhi Genscale, Rennes 15/06/2016 G.Rizk, R.Chikhi (Genscale, Rennes) GATB workshop 15/06/2016 1 / 41 GATB INTRODUCTION NGS technologies produce terabytes of data Efficient

More information

How to Run NCBI BLAST on zcluster at GACRC

How to Run NCBI BLAST on zcluster at GACRC How to Run NCBI BLAST on zcluster at GACRC BLAST: Basic Local Alignment Search Tool Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu 1 OVERVIEW What is BLAST?

More information

Solutions Exercise Set 3 Author: Charmi Panchal

Solutions Exercise Set 3 Author: Charmi Panchal Solutions Exercise Set 3 Author: Charmi Panchal Problem 1: Suppose we have following fragments: f1 = ATCCTTAACCCC f2 = TTAACTCA f3 = TTAATACTCCC f4 = ATCTTTC f5 = CACTCCCACACA f6 = CACAATCCTTAACCC f7 =

More information

debgr: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph Prashant Pandey Stony Brook University, NY, USA

debgr: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph Prashant Pandey Stony Brook University, NY, USA debgr: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph Prashant Pandey Stony Brook University, NY, USA De Bruijn graphs are ubiquitous [Pevzner et al. 2001, Zerbino and Birney,

More information

PERFORMANCE ANALYSIS OF DATAMINIG TECHNIQUE IN RBC, WBC and PLATELET CANCER DATASETS

PERFORMANCE ANALYSIS OF DATAMINIG TECHNIQUE IN RBC, WBC and PLATELET CANCER DATASETS PERFORMANCE ANALYSIS OF DATAMINIG TECHNIQUE IN RBC, WBC and PLATELET CANCER DATASETS Mayilvaganan M 1 and Hemalatha 2 1 Associate Professor, Department of Computer Science, PSG College of arts and science,

More information

Graph Algorithms in Bioinformatics

Graph Algorithms in Bioinformatics Graph Algorithms in Bioinformatics Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 13 Lopresti Fall 2007 Lecture 13-1 - Outline Introduction to graph theory Eulerian & Hamiltonian Cycle

More information

Global Alignment. Algorithms in BioInformatics Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) Daimi, University of Aarhus September 2004

Global Alignment. Algorithms in BioInformatics Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) Daimi, University of Aarhus September 2004 1 Introduction Global Alignment Algorithms in BioInformatics Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) Daimi, University of Aarhus September 2004 The purpose of this report is to verify attendance

More information

csci 210: Data Structures Stacks and Queues in Solution Searching

csci 210: Data Structures Stacks and Queues in Solution Searching csci 210: Data Structures Stacks and Queues in Solution Searching 1 Summary Topics Using Stacks and Queues in searching Applications: In-class problem: missionary and cannibals In-class problem: finding

More information

Sequencing. Short Read Alignment. Sequencing. Paired-End Sequencing 6/10/2010. Tobias Rausch 7 th June 2010 WGS. ChIP-Seq. Applied Biosystems.

Sequencing. Short Read Alignment. Sequencing. Paired-End Sequencing 6/10/2010. Tobias Rausch 7 th June 2010 WGS. ChIP-Seq. Applied Biosystems. Sequencing Short Alignment Tobias Rausch 7 th June 2010 WGS RNA-Seq Exon Capture ChIP-Seq Sequencing Paired-End Sequencing Target genome Fragments Roche GS FLX Titanium Illumina Applied Biosystems SOLiD

More information

Parallel de novo Assembly of Complex (Meta) Genomes via HipMer

Parallel de novo Assembly of Complex (Meta) Genomes via HipMer Parallel de novo Assembly of Complex (Meta) Genomes via HipMer Aydın Buluç Computational Research Division, LBNL May 23, 2016 Invited Talk at HiCOMB 2016 Outline and Acknowledgments Joint work (alphabetical)

More information

Working with files. File Reading and Writing. Reading and writing. Opening a file

Working with files. File Reading and Writing. Reading and writing. Opening a file Working with files File Reading and Writing Reading get info into your program Parsing processing file contents Writing get info out of your program MBV-INFx410 Fall 2015 Reading and writing Three-step

More information

Strings. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Strings. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas Strings Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas Run a program by typing at a terminal prompt (which may be > or $ or something else depending on your computer;

More information

Working with files. File Reading and Writing. Reading and writing. Opening a file

Working with files. File Reading and Writing. Reading and writing. Opening a file Working with files File Reading and Writing Reading get info into your program Parsing processing file contents Writing get info out of your program MBV-INFx410 Fall 2014 Reading and writing Three-step

More information

Computational Theory MAT542 (Computational Methods in Genomics) - Part 2 & 3 -

Computational Theory MAT542 (Computational Methods in Genomics) - Part 2 & 3 - Computational Theory MAT542 (Computational Methods in Genomics) - Part 2 & 3 - Benjamin King Mount Desert Island Biological Laboratory bking@mdibl.org Overview of 4 Lectures Introduction to Computation

More information

DELAMANID SUSCEPTIBILITY TESTING IN AN AUTOMATED LIQUID CULTURE SYSTEM

DELAMANID SUSCEPTIBILITY TESTING IN AN AUTOMATED LIQUID CULTURE SYSTEM DELAMANID SUSCEPTIBILITY TESTING IN AN AUTOMATED LIQUID CULTURE SYSTEM Daniela Maria Cirillo San Raffaele Scientific Institute, Milan COI/CA OSR as signed MTA with Janssen and Otzuka as SRL and is involved

More information

Theory of Computations III. WMU CS-6800 Spring -2014

Theory of Computations III. WMU CS-6800 Spring -2014 Theory of Computations III WMU CS-6800 Spring -2014 Markov Algorithm (MA) By Ahmed Al-Gburi & Barzan Shekh 2014 Outline Introduction How to execute a MA Schema Examples Formal Definition Formal Algorithm

More information

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2014 Genome assembly Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Contents Genome assembly problem Approaches Comparative assembly The string

More information

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats Ching Li San Jose State University

More information

RESEARCH TOPIC IN BIOINFORMANTIC

RESEARCH TOPIC IN BIOINFORMANTIC RESEARCH TOPIC IN BIOINFORMANTIC GENOME ASSEMBLY Instructor: Dr. Yufeng Wu Noted by: February 25, 2012 Genome Assembly is a kind of string sequencing problems. As we all know, the human genome is very

More information

ENGI 4421 Counting Techniques for Probability Page Example 3.01 [Navidi Section 2.2; Devore Section 2.3]

ENGI 4421 Counting Techniques for Probability Page Example 3.01 [Navidi Section 2.2; Devore Section 2.3] ENGI 4421 Coutig Techiques fo Pobability Page 3-01 Example 3.01 [Navidi Sectio 2.2; Devoe Sectio 2.3] Fou cads, labelled A, B, C ad D, ae i a u. I how may ways ca thee cads be daw (a) with eplacemet? (b)

More information

Mining more complex patterns: frequent subsequences and subgraphs. Department of Computers, Czech Technical University in Prague

Mining more complex patterns: frequent subsequences and subgraphs. Department of Computers, Czech Technical University in Prague Mining more complex patterns: frequent subsequences and subgraphs Jiří Kléma Department of Computers, Czech Technical University in Prague http://cw.felk.cvut.cz/wiki/courses/a4m33sad/start poutline Motivation

More information

In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition

In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition D. Bošnački 1, T.E. Pronk 2, and E.P. de Vink 3, 1 Dept. of Biomedical Engineering, Eindhoven University of Technology 2 Swammerdam

More information

Object Oriented Programming Using C++ Mathematics & Computing IET, Katunayake

Object Oriented Programming Using C++ Mathematics & Computing IET, Katunayake Assigning Values // Example 2.3(Mathematical operations in C++) float a; cout > a; cout

More information

TEST BDA24202 / BTI10202 COMPUTER PROGRAMMING May 2013

TEST BDA24202 / BTI10202 COMPUTER PROGRAMMING May 2013 DEPARTMENT OF MATERIAL AND ENGINEERING DESIGN FACULTY OF MECHANICAL AND MANUFACTURING ENGINEERING UNIVERSITI TUN HUSSEIN ONN MALAYSIA (UTHM), JOHOR TEST BDA24202 / BTI10202 COMPUTER PROGRAMMING May 2013

More information

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015 Mapping de Novo Assembly Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #2 WS 2014/2015 Today Genome assembly: the basics Hamiltonian and Eulerian

More information

Euclid. Father of Geometry Euclidean Geometry Euclid s Elements

Euclid. Father of Geometry Euclidean Geometry Euclid s Elements Euclid Father of Geometry Euclidean Geometry Euclid s Elements Point Description Indicates a location and has no size. How to Name it You can represent a point by a dot and name it by a capital letter.

More information

Lecture 02 C FUNDAMENTALS

Lecture 02 C FUNDAMENTALS Lecture 02 C FUNDAMENTALS 1 Keywords C Fundamentals auto double int struct break else long switch case enum register typedef char extern return union const float short unsigned continue for signed void

More information

Lab 5 Pointers and Arrays

Lab 5 Pointers and Arrays Lab 5 Pointers and Arrays The purpose of this lab is to practice using pointers to manipulate the data in arrays in this case, in arrays of characters. We will be building functions that add entries to,

More information

Computational Architecture of Cloud Environments Michael Schatz. April 1, 2010 NHGRI Cloud Computing Workshop

Computational Architecture of Cloud Environments Michael Schatz. April 1, 2010 NHGRI Cloud Computing Workshop Computational Architecture of Cloud Environments Michael Schatz April 1, 2010 NHGRI Cloud Computing Workshop Cloud Architecture Computation Input Output Nebulous question: Cloud computing = Utility computing

More information

Supporting Information

Supporting Information Supprting Inrmatin Ultraspeciic Multiplexed Detectin Lw-Abundance Single-Nucletide Variants by Cmbining a Masking Tactic with Flurescent Nanparticle Cunting Xiajing Pei, Tiancheng Lai, Guangyu Ta, Hu Hng,

More information

CSCI 220: Computer Architecture I Instructor: Pranava K. Jha. Simplification of Boolean Functions using a Karnaugh Map

CSCI 220: Computer Architecture I Instructor: Pranava K. Jha. Simplification of Boolean Functions using a Karnaugh Map CSCI 22: Computer Architecture I Instructor: Pranava K. Jha Simplification of Boolean Functions using a Karnaugh Map Q.. Plot the following Boolean function on a Karnaugh map: f(a, b, c, d) = m(, 2, 4,

More information

Week 1 Questions Question Options Answer & Explanation A. 10 B. 20 C. 21 D. 11. A. 97 B. 98 C. 99 D. a

Week 1 Questions Question Options Answer & Explanation A. 10 B. 20 C. 21 D. 11. A. 97 B. 98 C. 99 D. a Sr. no. Week 1 Questions Question Options Answer & Explanation 1 Find the output: int x=10; int y; y=x++; printf("%d",x); A. 10 B. 20 C. 21 D. 11 Answer: D x++ increments the value to 11. So printf statement

More information

Genome 373: Intro to Python I. Doug Fowler

Genome 373: Intro to Python I. Doug Fowler Genome 373: Intro to Python I Doug Fowler Outline Intro to Python I What is a program? Dealing with data Strings in Python Numbers in Python What is a program? What is a program? A series of instruc2ons,

More information

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang) Bioinformatics Programming EE, NCKU Tien-Hao Chang (Darby Chang) 1 Regular Expression 2 http://rp1.monday.vip.tw1.yahoo.net/res/gdsale/st_pic/0469/st-469571-1.jpg 3 Text patterns and matches A regular

More information

./pharo Pharo.image eval 3 + 4 ./pharo Pharo.image my_script.st BioSequence newdna: atcggtcggctta. BioSequence newambiguousdna: AAGTCAGTGTACTATTAGCATGCATGTGCAACACATTAGCTG. BioSequence newunambiguousdna:

More information

Towards a de novo short read assembler for large genomes using cloud computing

Towards a de novo short read assembler for large genomes using cloud computing Towards a de novo short read assembler for large genomes using cloud computing Michael Schatz April 21, 2009 AMSC664 Advanced Scientific Computing Outline 1.! Genome assembly by analogy 2.! DNA sequencing

More information

NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints

NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints 1 NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints Youxi Wu, Yao Tong, Xingquan Zhu, Senior, IEEE, and Xindong Wu, Fellow, IEEE Abstract Sequence pattern mining aims to discover frequent

More information

The Beauty of the Symmetric Sierpinski Relatives

The Beauty of the Symmetric Sierpinski Relatives Bridges 2018 Conference Proceedings The Beauty of the Symmetric Sierpinski Relatives Tara Taylor Department of Mathematics, Statistics and Computer Science, St. Francis Xavier University, Antigonish, Nova

More information

Hybrid Parallel Programming

Hybrid Parallel Programming Hybrid Parallel Programming for Massive Graph Analysis KameshMdd Madduri KMadduri@lbl.gov ComputationalResearch Division Lawrence Berkeley National Laboratory SIAM Annual Meeting 2010 July 12, 2010 Hybrid

More information

Arrays/Slices Store Lists of Variables

Arrays/Slices Store Lists of Variables Maps 02-201 Arrays/Slices Store Lists of Variables H i T h e r e! 0 1 2 3 4 5 6 7 8 1 1 2 3 5 8 13 21 34 55 89 0 1 2 3 4 5 6 7 8 9 10 ACG TTA GAG CCT TAA GGG CAT 0 1 2 3 4 5 6 What if Indices Aren t Integers?

More information

Assignment 2. Summary. Some Important bash Instructions. CSci132 Practical UNIX and Programming Assignment 2, Fall Prof.

Assignment 2. Summary. Some Important bash Instructions. CSci132 Practical UNIX and Programming Assignment 2, Fall Prof. Assignment 2 Summary The purpose of this assignment is to give you some practice in bash scripting. When you write a bash script, you are really writing a program in the bash programming language. In class

More information

Michał Kierzynka et al. Poznan University of Technology. 17 March 2015, San Jose

Michał Kierzynka et al. Poznan University of Technology. 17 March 2015, San Jose Michał Kierzynka et al. Poznan University of Technology 17 March 2015, San Jose The research has been supported by grant No. 2012/05/B/ST6/03026 from the National Science Centre, Poland. DNA de novo assembly

More information

Built-in functions. You ve used several functions already. >>> len("atggtca") 7 >>> abs(-6) 6 >>> float("3.1415") >>>

Built-in functions. You ve used several functions already. >>> len(atggtca) 7 >>> abs(-6) 6 >>> float(3.1415) >>> Functions Built-in functions You ve used several functions already len("atggtca") 7 abs(-6) 6 float("3.1415") 3.1415000000000002 What are functions? A function is a code block with a name def hello():

More information