CS170 Section 5 HW #3
Due Friday, March 20 at 11:59 p.m.

Write and submit one Java program, Sequence.java, as described below. Submit the assignment on the Math/CS system (from any lab computer or by accessing the computers remotely) using the following command:

    /home/cs170005/turnin Sequence.java hw3

Late turn-ins will be accepted until Monday, March 23 at 11:59 p.m. (you must submit before 11:59 p.m.) for a 10% penalty. If you are submitting late, use the following command:

    /home/cs170005/turnin Sequence.java hw3_late

Style Guidelines:

- Only one statement or code structure per line. (The class declaration can't be on the same line as the main method declaration, for example.)
- No lines (including comments) over 80 characters, except for single-website citations.
- Write a block comment after the honor statement describing the program's purpose.
- Indent blocks of code in loops, methods, switch statements, and classes. See any class code for examples of this. Each level starting with a { or that could start with a { should be indented.
- Use descriptive variable names. No one-letter or one-symbol variable names, (New!) except in the initial action of a for loop.
- New! Each method must have a block comment preceding it that describes the method's purpose, parameters, and any return values.
- New! Every method, including the main method, longer than 7 lines must have inline comments describing what is going on in the code.

Breaking these guidelines will result in a grade penalty.

Honor Code:

You must abide by the Emory honor code as well as the course-specific honor requirements for assignments when completing this homework. Make sure to include the statement of collaboration and adherence to the honor code (this can be found on the course syllabus at http://mathcs.emory.edu/~ccgarve/cs170) acknowledging that you understand and have abided by the honor code. Make sure to cite websites you use, and include your name/email address.
[100 pts] Sequence.java

Write a program, Sequence.java, that interprets DNA chromatograms, finds DNA subsequences, and prints information about DNA sequences.

Background:

Disclaimer: I am not a biologist, and this information may not be 100% accurate as a result. I suggest http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html and https://code.google.com/p/seqtrace/ as references.

There are many processes for sequencing DNA, which consists of four bases, denoted A, T, C, and G. Sequencing tells you the order of those bases in the DNA. Regardless of the particular method used, sequencing DNA typically involves using some sort of dye to color the different bases, and then reading the colors of the DNA in order using a machine. This results in a chromatogram, which shows the intensities of the different colors at different points in the DNA sequence. If, at a particular spot in the sequence, the color used to dye A is the brightest of the four possible colors, then we know there should be an A at that point in the DNA sequence. Reading the color intensity patterns tells us the whole DNA sequence. Sometimes there are mistakes or unclear parts, though, where two colors have high intensity or no color has high intensity. Typical color choices are red, green, blue, and black, corresponding to T, A, C, and G respectively.

Test Code:

You can retrieve a program and file for testing your implementation of Sequence.java by running the following command in your cs170 homework 3 directory:

    cp ~cs170005/share/hw3/* ./

You must have both of these files in the same directory as your Sequence.java file to be able to run the test program. TestSequence.java has a comment at the top explaining how to use it, which you should read. I do not recommend trying to read and understand the code itself. You can instead compile and run it to test your implementation.

Program Specifications:

Unlike previous homework assignments, this homework does not necessarily involve writing a runnable Java program (see the extra credit information). You do not need a main method in your submission. You must instead write the methods listed below. All methods should be public and static, and have method signatures exactly matching what is listed below. I suggest implementing them in the order listed. You may not use any String methods other than length, charAt, and substring to implement these methods.

countchar, which takes in a String followed by a character. The method should count and return the integer number of occurrences of the character in the String.

    countchar("hello", 'l') returns 2
    countchar("hello", 'c') returns 0
    countchar("", 'h') returns 0
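
The counting loop described for countchar can be sketched as follows. This is one possible approach, not the required one. The class name CountCharSketch is made up for illustration, and the method name follows the spelling printed in this handout, so check TestSequence.java for the exact capitalization it expects.

```java
// Illustrative sketch only; the class name CountCharSketch is hypothetical.
// The method name follows the spelling printed in this handout.
class CountCharSketch {
    /*
     * Counts how many times target occurs in text.
     * Parameters: text - the String to scan; target - the char to count.
     * Returns: the number of positions in text that hold target.
     */
    public static int countchar(String text, char target) {
        int count = 0;
        // Visit each index once, using only length and charAt.
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }
}
```

Because the loop condition is checked before the first charAt call, the empty String needs no special case.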

parsechromatogram, which takes in a String chromatogram sequence and returns a String which is the corresponding DNA sequence. The corresponding DNA sequence should have the same number of leading and trailing spaces as the chromatogram, but all occurrences of r should be replaced by T, g by A, b by C, and k by G. Any other character, except the leading and trailing spaces, should be replaced by ?.

    parsechromatogram(" rgbkakbgr ") returns " TACG?GCAT "
    parsechromatogram(" r gbkakbgr ") returns " T?ACG?GCAT "
    parsechromatogram("rgbkakbgr ") returns "TACG?GCAT "
    parsechromatogram("RGB") returns "???"

printinfo, which takes in a String DNA sequence, returns nothing, and prints out the sequence, its length, and labelled counts of the number of occurrences of A, T, C, G, and ? in the sequence. You must call countchar in this method. The format does not need to exactly match the example output. Example output to terminal for calling printinfo("  TACG?GCAT  ") (which has two leading and two trailing spaces, not just one):

      TACG?GCAT  
    Length: 13
    A: 2, T: 2, C: 2, G: 2, ?: 1

findindex, which takes a String to be searched, followed by a String to search for within the first String. The method should return an integer, which is the first index at which the second String can be found within the first String (see examples). If the second String is not a substring of the first String, the method should return -1. The second String is guaranteed not to be the empty String.

    findindex("how are you?", "o") returns 1
    findindex("how are you?", "are") returns 4
    findindex("how are you?", "cat") returns -1
    findindex("how are you?", "how are you? more text here") returns -1
    findindex("", "o") returns -1
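
The character-by-character translation in parsechromatogram and the search in findindex can be sketched as follows, using only length and charAt. This is a minimal sketch under some assumptions: the class name ParseAndFindSketch and all local variable names are made up for illustration, the method names follow the spelling printed in this handout, and a String made up entirely of spaces is treated as all leading spaces (the handout does not cover that case).

```java
// Illustrative sketch only; the class name ParseAndFindSketch is
// hypothetical. Method names follow the spelling printed in this handout.
class ParseAndFindSketch {
    /*
     * Translates a chromatogram into a DNA sequence.
     * Parameter: chromatogram - color letters with optional outer spaces.
     * Returns: a String of the same length; outer spaces are kept,
     * r/g/b/k become T/A/C/G, and anything else becomes '?'.
     */
    public static String parsechromatogram(String chromatogram) {
        // Index of the first character that is not a leading space.
        int firstBase = 0;
        while (firstBase < chromatogram.length()
                && chromatogram.charAt(firstBase) == ' ') {
            firstBase++;
        }
        // Index of the last character that is not a trailing space.
        int lastBase = chromatogram.length() - 1;
        while (lastBase >= firstBase
                && chromatogram.charAt(lastBase) == ' ') {
            lastBase--;
        }
        String sequence = "";
        for (int i = 0; i < chromatogram.length(); i++) {
            char color = chromatogram.charAt(i);
            if (i < firstBase || i > lastBase) {
                // Leading or trailing space: copy it unchanged.
                sequence = sequence + ' ';
            } else if (color == 'r') {
                sequence = sequence + 'T';
            } else if (color == 'g') {
                sequence = sequence + 'A';
            } else if (color == 'b') {
                sequence = sequence + 'C';
            } else if (color == 'k') {
                sequence = sequence + 'G';
            } else {
                // Unknown character, including a space between bases.
                sequence = sequence + '?';
            }
        }
        return sequence;
    }

    /*
     * Finds the first index at which target occurs in text.
     * Parameters: text - the String to search; target - a non-empty
     * String to look for.
     * Returns: the starting index of the first match, or -1 if none.
     */
    public static int findindex(String text, String target) {
        // A match cannot start after this index without running off text.
        int lastStart = text.length() - target.length();
        for (int start = 0; start <= lastStart; start++) {
            // Count how many characters agree from this starting point.
            int matched = 0;
            while (matched < target.length()
                    && text.charAt(start + matched)
                        == target.charAt(matched)) {
                matched++;
            }
            if (matched == target.length()) {
                return start;
            }
        }
        return -1;
    }
}
```

Comparing characters one at a time with charAt avoids the disallowed equals and indexOf methods. When the target is longer than the text, lastStart is negative, so the loop never runs and -1 is returned.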

findsubsequence, which takes in three Strings and returns a String. You must call findindex and the String method substring in this method. The first input is the String being searched (which I'll refer to as the "base string"), and a substring of it will be returned. The second and third Strings are markers to look for in the base string; I'll call them the "start marker" and the "end marker", respectively. If the first occurrence of the end marker in the base string begins after the first occurrence of the start marker ends, the method should return the substring of the base string that is between the two markers, inclusive. This looks like:

    findsubsequence("cats are cute", "cats", "are") returns "cats are"
    findsubsequence("cats are cute", "cats", "cute") returns "cats are cute"

However, if this is not possible in the base string, the empty String "" should be returned. This can be because one of the markers isn't in the base string:

    findsubsequence("cats are cute", "dogs", "are") returns ""
    findsubsequence("cats are cute", "cats", "ARE") returns ""

Or because the first occurrences of the markers overlap:

    findsubsequence("cats are cute", "cats", "ats") returns ""
    findsubsequence("cats are cute", "c", "c") returns ""
    findsubsequence("cats are cute", "cats are cute", "c") returns ""

Or because the first marker occurs after the second marker:

    findsubsequence("cats are cute", "cute", "cats") returns ""

The marker Strings are guaranteed to contain at least one character, but the base string is not.

[up to 10 points extra] Extra Credit:

Write a main method which prints a menu, allowing a user to choose between parsing a chromatogram and finding a subsequence of a DNA sequence. Based on the menu option chosen, the program should take input for either a chromatogram, or a DNA sequence and two markers. In either case, the program should print information about the resulting sequence using the printinfo method. You do not need to loop this procedure.
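
The marker rules given for findsubsequence can be sketched as a single comparison of two indices. This is a minimal sketch, not the required solution. The class name SubsequenceSketch and the local variable names are made up for illustration, and the method names follow the spelling printed in this handout. It also assumes that an end marker starting immediately after the start marker (with nothing in between) counts as valid, since the handout has no example of that case.

```java
// Illustrative sketch only; the class name SubsequenceSketch is
// hypothetical. Method names follow the spelling printed in this handout.
class SubsequenceSketch {
    /*
     * Finds the first index at which target occurs in text.
     * Parameters: text - the String to search; target - a non-empty
     * String to look for.
     * Returns: the starting index of the first match, or -1 if none.
     */
    public static int findindex(String text, String target) {
        int lastStart = text.length() - target.length();
        for (int start = 0; start <= lastStart; start++) {
            // Count how many characters agree from this starting point.
            int matched = 0;
            while (matched < target.length()
                    && text.charAt(start + matched)
                        == target.charAt(matched)) {
                matched++;
            }
            if (matched == target.length()) {
                return start;
            }
        }
        return -1;
    }

    /*
     * Extracts the part of base from the start marker through the
     * end marker, inclusive.
     * Parameters: base - the String to search; startMarker and
     * endMarker - non-empty Strings to look for in base.
     * Returns: the matching substring, or "" if either marker is
     * missing, the markers overlap, or the end marker comes first.
     */
    public static String findsubsequence(String base, String startMarker,
            String endMarker) {
        int startIndex = findindex(base, startMarker);
        int endIndex = findindex(base, endMarker);
        if (startIndex == -1 || endIndex == -1) {
            return "";
        }
        // First index past the start marker. ASSUMPTION: an end marker
        // beginning exactly here (adjacent, no overlap) is accepted.
        int afterStart = startIndex + startMarker.length();
        if (endIndex < afterStart) {
            return "";
        }
        return base.substring(startIndex, endIndex + endMarker.length());
    }
}
```

The single test endIndex < afterStart covers both the overlapping case and the case where the end marker comes first, because in both the end marker begins before the start marker has finished.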

When you are finished, check each of the following. This non-exhaustive list of requirements will form the basis for the grading rubric.

- Make sure you have a collaboration statement that contains your name and email address. Also make sure your collaboration statement truthfully documents what you used to complete the homework.
- The style guidelines are listed at the start of this handout. Go through them one by one, making sure that your code adheres to them.
- Make sure you haven't used the String methods equals, indexOf, contains, or any other methods besides charAt, length, or substring. You will not get credit for methods you write that call these methods.
- Compile and run TestSequence.java and make sure you pass all test cases.