Do project 6.2 from our textbook (pp. 316-320).
Before you begin, right-click to download the skeleton code for this
project:
You will write your code in genefinder.py.
Also download the following data files, which contain prefixes of various sizes of the genome of a particular strain of E. coli:
You will begin with the smaller files.
Follow the instructions in the textbook to do both part 1 and part 2.
In part 1, you will complete the definition of the function
orf1(dna, rf, tortoise). In part 2, you will complete the definition
of the function gcFreq(dna, window, tortoise).
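As a warm-up for part 2, the sketch below shows one way to compute the GC fraction in each sliding window of a DNA string. It is only an illustration under stated assumptions: the names gc_fraction and gc_fractions are hypothetical, the tortoise parameter is left out because its role is defined in the textbook rather than here, and the textbook's gcFreq may define the window or the returned values differently.

```python
def gc_fraction(segment):
    """Return the fraction of characters in segment that are 'g' or 'c'."""
    if len(segment) == 0:
        return 0.0
    gc_count = segment.count('g') + segment.count('c')
    return gc_count / len(segment)


def gc_fractions(dna, window):
    """Return the GC fraction of every length-`window` slice of dna.

    Hypothetical helper, not the textbook's gcFreq: it returns a list
    of numbers and does no drawing.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    dna = dna.lower()  # assumption: treat upper and lower case alike
    fractions = []
    # One window starts at each index where a full window still fits.
    for start in range(len(dna) - window + 1):
        fractions.append(gc_fraction(dna[start:start + window]))
    return fractions
```

For example, gc_fractions("ggat", 2) looks at the windows "gg", "ga", and "at" in turn. Recomputing every window from scratch is simple but slow on the larger genome files; updating a running count as the window slides would likely be faster.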
Hungry for more? Try some of the more challenging exercises from
section 6.7, which concern the identification of microsatellites or simple sequence repeats (SSRs).
Create a file named ssr.py and do exercises 6.7.8, 6.7.9, and 6.7.10
(p. 310). Rather than naming all the functions ssr(dna, repeat), as the
textbook suggests, name your functions as follows:
Exercise | Function name
6.7.8 | firstSSR(dna, repeat)
6.7.9 | longestSSR(dna, repeat)
6.7.10 | longestDinucleotideSSR(dna)
Here is my test code:
if __name__ == "__main__":
    inputFile = open('eco536-500.txt', 'r')
    testSequence = inputFile.read()
    print(testSequence)
    print("First 'caga' SSR:", firstSSR(testSequence, "caga"))
    print("Longest 't' SSR:", longestSSR(testSequence, "t"))
    print("The most repeated dinucleotide is", longestDinucleotideSSR(testSequence))
And here is the sample output:
agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaaaaagagtgtctgatagcagcttctgaactggttacctgccgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcatagcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccaccattaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgcgggcttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcggtacatcagtggcaaatgcagaacgttttctgcgggttgccgatattctggaaagcaatgccaggcaggggcaggtggccaccgtcctctctgcccccgccaaaatcaccaaccatctggtagcgatgattgaaaaaaccat
First 'caga' SSR: 2
Longest 't' SSR: 8
The most repeated dinucleotide is tt
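The sample output suggests that firstSSR and longestSSR return a number of back-to-back copies of the repeat. The following is a minimal sketch of that interpretation, not the textbook's solution: count_repeats is a hypothetical helper, and returning 0 when the repeat never occurs is an assumption.

```python
def count_repeats(dna, repeat, start):
    """Count consecutive copies of repeat in dna beginning at index start.

    Hypothetical helper introduced for this sketch.
    """
    if not repeat:
        return 0  # assumption: an empty repeat never matches
    count = 0
    step = len(repeat)
    while dna[start:start + step] == repeat:
        count += 1
        start += step
    return count


def firstSSR(dna, repeat):
    """Return the number of copies at the first occurrence of repeat."""
    index = dna.find(repeat)
    if index == -1:
        return 0  # assumption: report 0 when repeat is absent
    return count_repeats(dna, repeat, index)


def longestSSR(dna, repeat):
    """Return the largest number of consecutive copies of repeat in dna."""
    longest = 0
    for index in range(len(dna)):
        longest = max(longest, count_repeats(dna, repeat, index))
    return longest
```

Checking every starting index keeps longestSSR short at the cost of some repeated work; skipping past a run once it has been counted is one possible refinement.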
Please style your code per section 3.4 of the textbook.
Submit one file, genefinder.py, through the online turnin form. You may
optionally submit a second file, ssr.py, for the Above & Beyond problems.
Part 1 - orf1 - 5 points
Part 2 - gcFreq - 5 points
ssr.py - Up to 3 additional points
Created October 17, 2016
Last revised October 17, 2016, 03:52:33 PM PDT
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.