Bacillus megaterium Genome Sequencing Workshop
Hands-on Computer Exercise
Many of the links on this page require a password. To get one, e-mail Rick Johns (rjohns@niu.edu).
Link to my PowerPoint presentation from this morning.
A basic principle in all scientific investigations is that you need to write down what you have done and what the results were. The easiest way to do this is to open a Word document and record what you are doing as you go through the steps on this page. This is going to be your lab notebook for the day. To do this, go to the "Novell-delivered Application" at the bottom of the desktop, and scroll down until you find "WordXP". Start a new file and be sure to save it from time to time.
Go to the ORF page using the button below. Scan all 6 reading frames for long regions with no stop codons. Be sure to scan the reverse frames from right to left! You are looking for the longest ORFs that don't overlap more than 50 bp, so don't pick smaller ones that are inside larger ones. The minimum allowable size is 100 bp, but most will be much longer than that.
As a hint, this image contains 4 long ORFs, each at least 1000 bp long.
How many ORFs did you find? Which strand are they on? How long are they?
Get the coordinates of the downstream stop codon and the farthest upstream start codon for each long ORF. Also, write down the base sequence of both the start codon and the stop codon.
To the Open Reading Frame page!
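For readers who want to double-check their manual scan, here is a rough Python sketch of the same six-frame search. It assumes the genome fragment is a plain string of A/C/G/T and treats ATG, GTG and TTG as possible start codons (all three are used by bacteria, but the ORF page may use a different set, so this is an assumption). Coordinates are reported as 1-based (strand, lower, higher, length), matching the left/right convention used in the next step. All function names are hypothetical and not part of the workshop pages.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: bacterial start codons; ATG is most common, GTG and TTG also occur.
START_CODONS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(s, min_len):
    """Yield (begin, end) as 0-based half-open slices of s, running from the
    farthest-upstream start codon to the first in-frame stop codon (inclusive)."""
    for frame in range(3):
        first_start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if codon in STOP_CODONS:
                if first_start is not None and i + 3 - first_start >= min_len:
                    yield first_start, i + 3
                first_start = None  # a stop resets the search for a start
            elif first_start is None and codon in START_CODONS:
                first_start = i


def find_orfs(genome, min_len=100):
    """Return (strand, lower, higher, length) tuples, 1-based inclusive, longest first."""
    genome = genome.upper()
    n = len(genome)
    hits = [("+", b + 1, e, e - b) for b, e in orfs_on_strand(genome, min_len)]
    # Scanning the reverse complement left to right is the same as reading the
    # reverse frames right to left; slice [b, e) of it covers forward
    # positions n - e + 1 .. n - b (1-based).
    hits += [("-", n - e + 1, n - b, e - b)
             for b, e in orfs_on_strand(reverse_complement(genome), min_len)]
    return sorted(hits, key=lambda h: h[3], reverse=True)


def pick_orfs(hits, max_overlap=50):
    """Greedily keep the longest ORFs, skipping any that overlap a kept one by > max_overlap bp."""
    kept = []
    for h in hits:
        if all(min(h[2], k[2]) - max(h[1], k[1]) + 1 <= max_overlap for k in kept):
            kept.append(h)
    return kept
```

Calling `pick_orfs(find_orfs(genome))` on your fragment should roughly reproduce the long ORFs you found by eye. The greedy filter keeps the longest ORF first and then drops any candidate that overlaps an already kept ORF by more than 50 bp, which is how smaller ORFs nested inside larger ones get discarded.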
You now want to get the DNA sequence for each of the potential genes you found in the open reading frames in the previous step. You want the sequence from the first base of the start codon to the last base of the stop codon. Enter the lower (left-hand) coordinate in the first box and the higher (right-hand) coordinate in the second box. If the gene is on the reverse strand, you will have to reverse-complement it (Step 2A).
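The coordinates in this step are 1-based and inclusive on both ends: the first base of the genome is position 1, and both the first base of the start codon and the last base of the stop codon are kept. A minimal Python sketch of the same slice, assuming the genome is held in a plain string; `extract_region` is a hypothetical helper, not the tool on this page.

```python
def extract_region(genome, lower, higher):
    """Return bases lower..higher (1-based, both ends included), like the two boxes on the page."""
    if not 1 <= lower <= higher <= len(genome):
        raise ValueError(f"coordinates {lower}-{higher} fall outside 1-{len(genome)}")
    # Python slices are 0-based and exclude the end, so only the start is shifted.
    return genome[lower - 1:higher]


print(extract_region("ACGTACGTAC", 2, 5))  # prints CGTA
```

A quick sanity check on your own coordinates: a complete ORF running from start codon to stop codon has a length, `higher - lower + 1`, that is divisible by 3.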
Paste your sequences into your lab notebook file!
If your gene was on the reverse strand, you need to turn it around so that the gene's beginning (5' end) is at the beginning of your sequence. You also need to complement the bases: convert A to T, T to A, G to C, and C to G. Paste your sequence in the window below to do this.
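Reverse-complementing comes down to two string operations: swap each base for its partner, then reverse the order. A Python sketch, assuming the pasted text may contain line breaks, spaces or position numbers (which are discarded) and that `N` marks an unknown base; `reverse_complement` is a hypothetical name.

```python
PAIRS = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement each base (A<->T, G<->C) and reverse, so the gene's 5' end comes first."""
    bases = "".join(ch for ch in seq.upper() if ch.isalpha())  # drop spaces, digits, newlines
    return bases.translate(PAIRS)[::-1]


print(reverse_complement("ttatttcat"))  # prints ATGAAATAA
```

Applying the function twice returns the original (cleaned, upper-case) sequence, which is a handy check that nothing was lost.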
Each codon (group of 3 bases) needs to be translated into a single amino acid, using the genetic code. The code is degenerate: several different codons code for the same amino acid (in most cases). The amino acids are given in the one-letter code system (which is shown in the genetic code table).
This program requires that:
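Translation itself is a table lookup applied one codon at a time. The Python sketch below builds the standard genetic code (NCBI translation table 1, whose amino-acid assignments bacteria share) from a 64-letter string ordered TTT, TTC, TTA, TTG, TCT, and so on, and writes stop codons as `*`. It assumes the input already begins at the first base of the start codon. The option that reads a GTG or TTG start as methionine reflects how bacterial initiator codons are read; whether your translation tool does the same is an assumption. All names are hypothetical.

```python
BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG (standard code, '*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed bacterial start set


def translate(dna, initiator=True):
    """Translate codon by codon; codons containing unknown bases become 'X'.

    With initiator=True, a GTG or TTG in the first position is read as Met (M).
    """
    dna = dna.upper()
    protein = [CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3)]
    if initiator and protein and dna[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


print(translate("ATGGCTTAA"))  # prints MA*
print(translate("GTGAAATAA"))  # prints MK*
```

A correctly extracted and oriented ORF should translate with exactly one `*`, at the very end; an internal `*` usually points to a coordinate or reading-frame mistake.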
Below is a link to UniProt, which contains an up-to-date set of all known protein sequences. This service is used by a lot of people, so please don't abuse it. Click on the BLAST tab, paste your sequence into the box, and hit the BLAST button. It takes a little while (usually less than a minute) to get the results, so be patient.
The results appear with the best hits on top. We want to pay attention to the best hit, and also to the top 5 hits. Questions to answer for each of these hits:
If the answers to all of the above questions are yes for the best hit, we can be confident that our gene is homologous to it. If most of the answers are no, this gene is probably not a homologue. In between is a gray area, which means that we would put the word "putative" in front of the gene name.
If the top hits are all high scoring and their names match, we can confidently assign this name to the gene.
Bacteria is the domain, Firmicutes is the phylum, Bacillales is the order, Bacillaceae is the family, and Bacillus is the genus. Any gene whose top hits aren't at least from the Firmicutes is almost certainly an example of horizontal gene transfer. Any gene from the Bacillaceae family is almost certainly an example of vertical gene transfer. In between is a gray area.
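The rule of thumb above can be written as a small decision function. This Python sketch assumes you have copied the taxonomic lineage of each top BLAST hit (for example from the hit's UniProt entry) into a list of taxon names, and it reads "top hits" as "all of the top hits". It uses only the two cut-offs stated in the text. Newer database releases may list the phylum as Bacillota rather than Firmicutes, so both names are accepted here. `transfer_call` and its return labels are hypothetical.

```python
# Assumption: the phylum may appear under either its older or its newer name.
PHYLUM_NAMES = ("Firmicutes", "Bacillota")


def transfer_call(top_hit_lineages):
    """Apply the workshop rule of thumb to the lineages of the top BLAST hits.

    Each lineage is a list of taxon names, e.g. ["Bacteria", "Firmicutes", ...].
    """
    if not top_hit_lineages:
        raise ValueError("need at least one hit")
    if all("Bacillaceae" in lineage for lineage in top_hit_lineages):
        return "likely vertical"  # every top hit is from the same family
    if not any(name in lineage for lineage in top_hit_lineages for name in PHYLUM_NAMES):
        return "likely horizontal"  # no top hit is even from the same phylum
    return "gray area"


bacillus = ["Bacteria", "Firmicutes", "Bacillales", "Bacillaceae", "Bacillus"]
other = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
print(transfer_call([bacillus, bacillus]))  # prints likely vertical
print(transfer_call([other, other]))  # prints likely horizontal
print(transfer_call([bacillus, other]))  # prints gray area
```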
Usually, the protein sequence is much better conserved across species lines than the sequence just upstream. However, the beginning of a protein is often the least well conserved area, so choosing start sites on the basis of sequence conservation works best with very similar sequences.