Paper: Difference between revisions
<font size = "4">'''Results'''</font>
<font size = "4">'''Query Sequence'''</font>
The amino acid query sequence of the 2gfh protein (Figure 3) from ''Mus musculus'' was obtained from GenBank.
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;"
|1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii<br>
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm<br>
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps<br>
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs<br>
241 svlelpallq sidckvsmsv<br>
<br>
|}
[[Image:Results_-_dali_01.png]]
'''Figure 3.''' The 260-amino-acid sequence of the 2gfh protein.
<font size = "4">'''Sequence Homology'''</font>
The BlastP similarity search yielded a total of 500 proteins.
Only 38 of these proteins were used for comparison, as they showed higher homology to the query sequence than the remainder of the search results.
These 38 proteins were chosen according to their bit scores and E-values. Two outlier partial sequences that contributed to poor overall alignment (large deletion gaps) were subsequently removed. The remaining 36 sequences were used to generate the phylogenetic tree and the bootstrapped tree.
<font size = "4">'''Multiple Sequence Alignment'''</font>
The following multiple sequence alignment (MSA) was obtained (Figure 4).
In the alignment, gi<nowiki>|</nowiki>10888xy and gi<nowiki>|</nowiki>10888yz represent gi<nowiki>|</nowiki>108881764 and gi<nowiki>|</nowiki>108881765 respectively. Both of these hypothetical proteins belong to the mosquito ''Aedes aegypti''.
The identifier numbers of these two proteins were initially changed to alphanumeric ones because Phylip could not generate a tree from the original identifiers. The programme read only the first five numeric digits (10888), which produced an error prompt listing both proteins as duplicates. Both identifiers were subsequently renamed for the final phylogenetic tree.
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;"
| <br>
|}
[[Image:Results_-_dali_02.png]]
'''Figure 4.''' MSA of query (top-most sequence, No. 1) and related sequences.
The MSA shows generally slight domain conservation throughout the protein sequences. Small insertion and deletion gaps were also noticeable along the alignment. A particularly large insertion gap was observed between amino acids 91 and 114.
The organisms with the large insertion gap are listed below:
''Bacillus licheniformis''
''Bacillus subtilis''
''Bacillus halodurans''
''Bacillus clausii''
''Symbiobacterium thermophilum''
A highly conserved section of amino acids containing invariant residues, (LV)-(LVA)-(LIV)-(LIV)-T-N-G, was observed in all the sequences at positions 211 to 217 of the alignment. Downstream of this conserved portion of the sequences are 5 more invariant positions (1 or 2 amino acids in length).
These short conserved regions suggest that the function, or even the structure, of the encoded proteins could be significant in their evolutionary pattern.
<font size = "4">'''Phylogenetic Tree'''</font>
The tree was plotted to obtain the phylogenetic lineage (Figure 5).
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;"
|B<br>
|}
[[Image:Results_-_dali_04.png]]
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;"
|A<br>
|}
[[Image:Results_-_dali_03.png]]
[[Image:Results_-_dali_05.png|framed|none]]
[[Image:Results_-_dali_06.png|framed|none]]
'''Figure 5.''' '''(A)''' Phylogenetic tree showing organisms with related protein sequence homology in Radial Tree view. '''(B)''' Rectangular Cladogram view with related protein sequence homology.
The Rectangular Cladogram view shows four distinct groups: fishes, mammals (where the query protein is also mapped), bacteria and insects.
<font size = "4">'''Bootstrapping'''</font>
The bootstrap values obtained were analysed. Branches with values below 75% are indicated by an asterisk (*), as shown in Figure 6.
[[Image:Results_-_dali_07.png|framed|none]]
'''Figure 6.''' Branch bootstrap values in Rectangular Cladogram view. Branches with bootstrap values <nowiki><</nowiki>75% are indicated with asterisks (*).
[[category:uncategorized]]
Revision as of 10:24, 9 June 2007
Results
Query Sequence
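The amino acid query sequence of the 2gfh protein (Figure 3) from Mus musculus was obtained from GenBank.
  1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
 61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv
Figure 3. The 260-amino-acid sequence of the 2gfh protein.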
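As a quick consistency check, a GenBank-style numbered block can be reduced to a plain residue string and its length compared with the stated 260 amino acids. The following is a minimal sketch; the function name `parse_sequence_block` is an illustrative assumption, not part of any tool used in this work.

```python
import re

# Numbered sequence block for the 2gfh query, as listed in the revision.
GENBANK_BLOCK = """
  1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
 61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv
"""


def parse_sequence_block(block: str) -> str:
    """Drop position numbers and whitespace; return upper-case residues."""
    return re.sub(r"[^a-z]", "", block.lower()).upper()


sequence = parse_sequence_block(GENBANK_BLOCK)
print(len(sequence))
```

The resulting string can be written to a FASTA file or pasted into a similarity search form.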
Sequence Homology
The BlastP similarity search yielded a total of 500 proteins.
Only 38 of these proteins were used for comparison, as they showed higher homology to the query sequence than the remainder of the search results.
These 38 proteins were chosen according to their bit scores and E-values. Two outlier partial sequences that contributed to poor overall alignment (large deletion gaps) were subsequently removed. The remaining 36 sequences were used to generate the phylogenetic tree and the bootstrapped tree.
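Selection by bit score and E-value can be sketched as a simple filter. The cut-off values and hit records below are hypothetical, since the text does not state the thresholds that were actually applied; only the idea of keeping high-scoring, low-E-value hits is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str    # hypothetical identifier
    bit_score: float
    e_value: float


def select_hits(hits, min_bit_score, max_e_value):
    """Keep hits with a high enough bit score and a low enough E-value."""
    return [
        h for h in hits
        if h.bit_score >= min_bit_score and h.e_value <= max_e_value
    ]


# Made-up hits for illustration only.
hits = [
    BlastHit("hit_a", 250.0, 1e-60),
    BlastHit("hit_b", 48.0, 2e-4),
    BlastHit("hit_c", 120.0, 1e-25),
]

# Assumed thresholds; the real analysis may have used different ones.
selected = select_hits(hits, min_bit_score=100.0, max_e_value=1e-10)
print([h.accession for h in selected])
```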
Multiple Sequence Alignment
The following multiple sequence alignment (MSA) was obtained (Figure 4).
In the alignment, gi|10888xy and gi|10888yz represent gi|108881764 and gi|108881765 respectively. Both of these hypothetical proteins belong to the mosquito Aedes aegypti.
The identifier numbers of these two proteins were initially changed to alphanumeric ones because Phylip could not generate a tree from the original identifiers. The programme read only the first five numeric digits (10888), which produced an error prompt listing both proteins as duplicates. Both identifiers were subsequently renamed for the final phylogenetic tree.
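The renaming step can be illustrated with a short script that finds identifiers sharing the same leading characters and gives each a distinct alphabetic suffix. This is a sketch under assumptions: the five-character prefix follows the behaviour described above, the suffix scheme (aa, ab, ...) is hypothetical and differs from the xy/yz labels used in this work, and the third identifier is made up.

```python
from collections import defaultdict
from itertools import product
from string import ascii_lowercase


def rename_colliding_ids(identifiers, prefix_len=5):
    """Map each identifier to a label that stays unique after truncation.

    Identifiers that share their first `prefix_len` characters are renamed
    to prefix + two-letter suffix; all others are left unchanged.
    """
    groups = defaultdict(list)
    for ident in identifiers:
        groups[ident[:prefix_len]].append(ident)

    labels = {}
    for prefix, members in groups.items():
        if len(members) == 1:
            labels[members[0]] = members[0]
            continue
        suffixes = product(ascii_lowercase, repeat=2)  # aa, ab, ac, ...
        for ident, suffix in zip(members, suffixes):
            labels[ident] = prefix + "".join(suffix)
    return labels


# The two Aedes aegypti identifiers plus one made-up identifier.
ids = ["108881764", "108881765", "12345678"]
labels = rename_colliding_ids(ids)
print(labels)
```

Keeping the mapping from original to temporary label makes it straightforward to restore the original identifiers on the final tree.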
Figure 4. MSA of query (top-most sequence – No.1) and related sequences.
The MSA shows generally slight domain conservation throughout the protein sequences. Small insertion and deletion gaps were also noticeable along the alignment. A particularly large insertion gap was observed between amino acids 91 and 114.
The organisms with the large insertion gap are listed below:
Bacillus licheniformis
Bacillus subtilis
Bacillus halodurans
Bacillus clausii
Symbiobacterium thermophilum
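One way to locate sequences that differ in such a region is to measure the fraction of gap characters each aligned sequence has within a column window. The sketch below uses a toy two-sequence alignment and a toy window; the sequence names and residues are invented, and for the alignment described here the window would be columns 91 to 114.

```python
def gap_fraction(aligned: str, start: int, end: int, gap_char: str = "-") -> float:
    """Fraction of gap characters in 1-based, inclusive columns start..end."""
    window = aligned[start - 1:end]
    return window.count(gap_char) / len(window)


# Toy alignment (made-up residues), 12 columns wide.
toy_alignment = {
    "seq_with_insert": "MKTAYIAKQRLV",
    "seq_with_gap": "MKT------RLV",
}

fractions = {
    name: gap_fraction(seq, start=4, end=9)
    for name, seq in toy_alignment.items()
}
print(fractions)
```

Sequences with a fraction near 0 carry residues across the window, while those near 1 are gapped there.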
A highly conserved section of amino acids containing invariant residues, (LV)-(LVA)-(LIV)-(LIV)-T-N-G, was observed in all the sequences at positions 211 to 217 of the alignment. Downstream of this conserved portion of the sequences are 5 more invariant positions (1 or 2 amino acids in length).
These short conserved regions suggest that the function, or even the structure, of the encoded proteins could be significant in their evolutionary pattern.
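The conserved pattern can be written as a regular expression and located in an ungapped sequence. The sketch below searches residues 121 to 150 of the query sequence listed earlier; the variable names are illustrative. Note that positions in the ungapped query differ from the alignment columns 211 to 217, because the alignment contains gaps.

```python
import re

# (LV)-(LVA)-(LIV)-(LIV)-T-N-G expressed as character classes.
MOTIF = re.compile(r"[LV][LVA][LIV][LIV]TNG")

# Residues 121-150 of the 2gfh query sequence, upper-cased.
QUERY_FRAGMENT = "ILADDVKAMLTELRKEVRLLLLTNGDRQTQ"
FRAGMENT_START = 121  # 1-based position of the first residue in the fragment

match = MOTIF.search(QUERY_FRAGMENT)
motif_start = FRAGMENT_START + match.start()    # 1-based, inclusive
motif_end = FRAGMENT_START + match.end() - 1    # 1-based, inclusive
print(match.group(), motif_start, motif_end)
```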
Phylogenetic Tree
The tree was plotted to obtain the phylogenetic lineage (Figure 5).
Figure 5. (A) Phylogenetic tree showing organisms with related protein sequence homology in Radial Tree view. (B) Rectangular Cladogram view with related protein sequence homology.
The Rectangular Cladogram view shows four distinct groups: fishes, mammals (where the query protein is also mapped), bacteria and insects.
Bootstrapping
The bootstrap values obtained were analysed. Branches with values below 75% are indicated by an asterisk (*), as shown in Figure 6.
Figure 6. Branch bootstrap values in Rectangular Cladogram view. Branches with bootstrap values <75% are indicated with asterisks (*).
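The marking rule amounts to a threshold test on each branch's bootstrap percentage. A minimal sketch follows; the support values are made up, and the function name `support_label` is an assumption rather than part of any tree-drawing tool.

```python
def support_label(percent: float, threshold: float = 75.0) -> str:
    """Return the bootstrap value as text, with '*' if it is below threshold."""
    text = f"{percent:g}"
    return text + "*" if percent < threshold else text


# Hypothetical bootstrap percentages for four branches.
supports = [100.0, 74.9, 75.0, 52.0]
branch_labels = [support_label(p) for p in supports]
print(branch_labels)
```

A value of exactly 75% is not marked, since the rule applies only to values strictly below the threshold.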