Learning Labs: Bioinformatics

Find the STR... the STR sequence challenge

An excerpt of the sequence returned by the BLAST search with the D16S539 primer, "Homo sapiens chromosome 16 clone RP11-511G21, complete sequence Length = 104680", is pasted below.

Your challenge is to find the STR.

Forward Primer: gggggtctaagagcttgtaaaaag begins the sequence at base number 10548.

Find the repeat: it is 4 bp long and repeated more than 5 times in a row. Try different reading frames to be sure you find the longest run of consecutive repeats.

Roughly 1000 nucleotides (bases 10548 to 11580), starting at the primer site:

    10548 ggg ggtctaagag
    10561 cttgtaaaaa gtgtacaagt gccagatgct cgttgtgcac aaatctaaat gcagaaaagc
    10621 actgaaagaa gaatcccgaa aaccacagtt cccattttta tatgggagca aacaaaggca
    10681 gatcccaagc tcttcctctt ccctagatca atacagacag acagacaggt ggatagatag
    10741 atagatagat agatagatag atagatagat agatatcatt gaaagacaaa acagagatgg
    10801 atgatagata catgcttaca gatgcacaca caaacgctaa atggtataaa aatggaatca
    10861 ctctgtaggc tgttttacca cctactttac taaattaatg agttattgag tataatttaa
    10921 ttttatatac taatttgaaa ctgtgtcatt aggtttttaa gtctatggca tcactttcgc
    10981 ttgtattttt ctattgattt cttttctttt cttttctttt tttgagacag agtctcactc
    11041 tcacccaggc tggagtaccg tggcacgatc ttggctcatt gcaaccacca cctcccgggt
    11101 tcaagtgatt atcctgcctc agcctcccaa atagctggga ttacaggtgc ccagcaccat
    11161 gcctggctaa ttttttgtat ttttactaca gatgggtttt caccatgttg tccgggctgg
    11221 tctcgaactc ctggcctcaa gtgatccacc cgccttggcc ttccaaagtg ctgggattac
    11281 aggagcgagc caccgtgccc agccctattg atttggaaat tgtaaggaga gtcgtgctct
    11341 ctatgaattc acacagtagg ggggtgagtc aagtgagcag ggagccacac tcggcatcac
    11401 tcatccccag ctgcaccctg cttgctcaac agtgcctgtg tgtcctgcct tgcctactgt
    11461 tttattcata cggaaagaca cccgcacggt atttatttga ccagaaatgg tgtcacatca
    11521 tgcatgtatt tcctgcaact tcctttttcc cttgccgagt cctgcatgaa catcttttct
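
The listing above mixes base coordinates and spaces with the sequence itself. Before searching it with a program, it helps to strip those out. The sketch below is one way to do that, assuming each line starts with the coordinate of its first base, as in the listing; the function name `parse_numbered_sequence` is made up for this illustration.

```python
def parse_numbered_sequence(text):
    """Return (start_position, sequence) from coordinate-prefixed sequence lines."""
    start = None
    blocks = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Assumption: the first token on a line is the coordinate of its first base.
        if tokens[0].isdigit():
            if start is None:
                start = int(tokens[0])
            tokens = tokens[1:]
        blocks.extend(tokens)
    return start, "".join(blocks).lower()


# First two lines of the listing, copied as-is.
sample = """
    10548 ggg ggtctaagag
    10561 cttgtaaaaa gtgtacaagt gccagatgct cgttgtgcac aaatctaaat gcagaaaagc
"""

start, seq = parse_numbered_sequence(sample)
print(start)      # coordinate of the first base
print(seq[:24])   # first 24 bases, which should match the forward primer
```

With the start coordinate in hand, the base at string index `i` sits at position `start + i` in the clone.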

[To see the answer click on the forward primer -
it links to the answer page and the second part of the challenge.]

Find the repeat using "bioinformatics": searching by pen and paper is slow and error-prone, so molecular biologists use computer algorithms to find repeats. Finding repeats is also important in mapping genes, since exons are not likely to contain repeats.
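
To give a feel for what such an algorithm does, here is a minimal sketch that scans for exact, back-to-back copies of a 4 bp unit and tries every starting position, which covers all reading frames. It is a simplified illustration, not the method used by any particular tool, and it misses repeats with mismatches. The function name, its parameters and the toy sequence are assumptions made for this example, so the challenge answer is not given away.

```python
def longest_tandem_repeat(seq, unit_len=4, min_copies=6):
    """Find the longest run of an exact unit_len-bp motif repeated back to back.

    Returns (start_index, motif, copies) with a 0-based start index,
    or None if no run reaches min_copies (6 means "more than 5 times").
    """
    seq = seq.lower()
    best = None
    # Starting at every index tries all reading frames.
    for start in range(len(seq) - unit_len + 1):
        motif = seq[start:start + unit_len]
        copies = 1
        pos = start + unit_len
        # Count how many times the same motif follows immediately.
        while seq[pos:pos + unit_len] == motif:
            copies += 1
            pos += unit_len
        if copies >= min_copies and (best is None or copies > best[2]):
            best = (start, motif, copies)
    return best


# Toy sequence (not from the clone): 3 flanking bases, "aatg" x 6, 4 flanking bases.
toy = "ggc" + "aatg" * 6 + "ccgt"
print(longest_tandem_repeat(toy))
```

In the toy sequence the frame starting one base later reads "atga" only five times in a row, which is why checking every frame matters when counting repeats.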

  1. Copy about half the sequence above to the clipboard.
  2. Go to the Tandem Repeats Finder at Boston University: http://tandem.bu.edu/trf/trf.submit.options.html
  3. Click the "Submit Page" button, then the "Basic" button. Select the "cut and paste sequence" button, type the ">" symbol in the space provided, press the "Enter" key, and then paste your sequence on the next line (this is FASTA format: a header line starting with ">", followed by the sequence).
  4. Click the "submit sequence" button; you will be taken to a page with the analysis of your sequence.
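
The FASTA formatting in step 3 can also be prepared ahead of time. The sketch below builds a FASTA record from a raw sequence: a header line starting with ">", then the sequence on the following lines. The function name `to_fasta`, the record name and the line width are assumptions for illustration; the steps above use a header containing only ">".

```python
def to_fasta(sequence, name="", width=60):
    """Format a raw sequence as FASTA text: a '>' header line, then wrapped sequence lines."""
    # Keep letters only, dropping spaces, digits and line breaks from a copied listing.
    cleaned = "".join(ch for ch in sequence if ch.isalpha())
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical record name; a short width is used here only to show the wrapping.
record = to_fasta("10548 ggg ggtctaagag cttgtaaaaa", name="example", width=10)
print(record)
```

Because digits and spaces are dropped, text copied straight from the numbered listing can be passed in unchanged.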
