Paper

From MDWiki
Revision as of 10:24, 9 June 2007 by Junxian

Results

Query Sequence

The amino acid sequence of the 2gfh query protein from Mus musculus (Figure 3) was obtained from GenBank.

1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv

Figure 3. The 260-amino-acid sequence of the 2gfh protein.
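The listing in Figure 3 follows the GenBank layout: each line opens with the position of its first residue, followed by blocks of ten residues. As a minimal sketch (not part of the original workflow), the numbering and spacing can be stripped, and the residue count checked against the stated length of 260, like this. The function name `parse_numbered_sequence` is hypothetical.

```python
# Figure 3 listing, copied as printed (position number, then blocks of 10 residues).
LISTING = """\
1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv
"""


def parse_numbered_sequence(text: str) -> str:
    """Return the residues of a numbered listing as one upper-case string."""
    residues: list[str] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        start = int(tokens[0])  # 1-based position of the first residue on this line
        if start != len(residues) + 1:
            raise ValueError(f"line starts at {start}, expected {len(residues) + 1}")
        for block in tokens[1:]:
            residues.extend(block)  # add every residue of the block
    return "".join(residues).upper()


query = parse_numbered_sequence(LISTING)
print(len(query))  # 260
```

The position check guards against a line being dropped or duplicated when the sequence is copied by hand.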

Sequence Homology

The BlastP sequence similarity search yielded a total of 500 proteins. Only 38 of these were used for comparison, as they showed higher homology to the query sequence than the remainder of the search results.

These 38 proteins were chosen according to their bit scores and E-values. Two further outliers, partial sequences that degraded the overall alignment with very large deletion gaps, were subsequently removed. The remaining 36 sequences were used to generate the phylogenetic tree and the bootstrapped tree.
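The exact cut-offs used for the selection are not stated. As a hedged sketch, assuming the BlastP hits were saved in BLAST+ tabular format (`-outfmt 6`, default twelve columns), selection by E-value and bit score could look like the following. The threshold `max_evalue=1e-10` is a hypothetical value, `top_n=38` only mirrors the count reported above, and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Parse BLAST+ tabular output (the default 12 columns of -outfmt 6)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
        fields = line.rstrip("\n").split("\t")
        hits.append(Hit(fields[1], float(fields[10]), float(fields[11])))
    return hits


def select_hits(hits, max_evalue=1e-10, top_n=38):
    """Keep hits under the E-value cut-off, best bit score first, one per subject.

    max_evalue is a hypothetical threshold; top_n=38 mirrors the text.
    """
    ranked = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: (-h.bitscore, h.evalue),
    )
    chosen, seen = [], set()
    for hit in ranked:
        if hit.subject not in seen:  # keep only the best-scoring HSP per subject
            seen.add(hit.subject)
            chosen.append(hit)
    return chosen[:top_n]


# Toy rows with made-up subjects and scores, for illustration only.
rows = [
    "query\tA\t90.0\t250\t25\t0\t1\t250\t1\t250\t1e-50\t300.0",
    "query\tB\t40.0\t200\t120\t3\t10\t210\t5\t205\t1e-5\t80.0",
    "query\tC\t60.0\t240\t96\t1\t1\t240\t1\t240\t1e-30\t200.0",
    "query\tA\t55.0\t60\t27\t0\t190\t250\t190\t250\t1e-12\t90.0",
]
print([h.subject for h in select_hits(parse_outfmt6(rows))])  # ['A', 'C']
```

Ranking on bit score and breaking ties on E-value keeps the choice independent of database size, since bit scores do not depend on it while E-values do.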

Multiple Sequence Alignment

The following multiple sequence alignment (MSA) was obtained (Figure 4).

In the alignment, gi|10888xy and gi|10888yz stand for gi|108881764 and gi|108881765 respectively. Both are hypothetical proteins from the mosquito Aedes aegypti.

The identifiers of these two proteins were initially changed to alphanumeric ones because Phylip could not generate a tree from the original identifiers. Phylip truncates sequence names (its input format allows 10 characters per name), so the two original identifiers, which share the prefix gi|10888, were read as the same name and an error prompt listed both proteins as duplicates. Both identifiers were subsequently renamed for the final phylogenetic tree.
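The name clash described above can be checked before running Phylip. The sketch below assumes the classic PHYLIP limit of 10 characters per sequence name; the alias scheme (`S0001`, `S0002`, ...) and the function names are hypothetical and differ from the gi|10888xy style used in this work. Keeping the alias-to-identifier mapping makes it straightforward to put the original identifiers back on the final tree.

```python
def phylip_name_clashes(ids, width=10):
    """Group identifiers that collide once truncated to `width` characters."""
    groups = {}
    for ident in ids:
        groups.setdefault(ident[:width], []).append(ident)
    return {short: full for short, full in groups.items() if len(full) > 1}


def make_aliases(ids):
    """Map short unique aliases (hypothetical S0001, S0002, ... scheme) to original IDs."""
    return {f"S{i:04d}": ident for i, ident in enumerate(ids, start=1)}


ids = ["gi|108881764", "gi|108881765"]
print(phylip_name_clashes(ids))  # {'gi|1088817': ['gi|108881764', 'gi|108881765']}

aliases = make_aliases(ids)  # run Phylip on the aliases, then map tree labels back
print(phylip_name_clashes(list(aliases)))  # {}
```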


Figure 4. MSA of the query (top-most sequence, No. 1) and related sequences.

The MSA shows generally only slight conservation of domains across the protein sequences, with small insertion and deletion gaps along the alignment. A particularly large insertion gap was observed between positions 91 and 114 of the alignment.

The organisms with the large insertion gap were:

Bacillus licheniformis

Bacillus subtilis

Bacillus halodurans

Bacillus clausii

Symbiobacterium thermophilum
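Sequences of this kind can also be picked out programmatically rather than by eye. The sketch below assumes the alignment is held as a dictionary of equal-length gapped strings; applied to the real alignment it would be called with `start=91, end=114`. The function name, the `min_fraction` cut-off and the toy rows are all hypothetical.

```python
def insertion_carriers(alignment, start, end, min_fraction=0.5):
    """Return names whose rows have residues (not '-') in at least
    min_fraction of alignment columns start..end (1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    carriers = []
    for name, row in alignment.items():
        window = row[start - 1:end]
        if not window:
            continue  # row shorter than the requested window
        filled = sum(1 for residue in window if residue != "-")
        if filled / len(window) >= min_fraction:
            carriers.append(name)
    return carriers


# Toy alignment with made-up rows; columns 4-6 hold an insertion in two sequences.
toy = {
    "query": "MKV---LLT",
    "bsub": "MKVAGHLLT",
    "blic": "MKIAGNLLT",
    "fish": "MRV---LLS",
}
print(insertion_carriers(toy, 4, 6))  # ['bsub', 'blic']
```

The remaining rows, those gapped across the window, are the sequences that show the gap opposite the insertion.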

A highly conserved stretch of amino acids, (LV)-(LVA)-(LIV)-(LIV)-T-N-G, in which T-N-G is invariant, was observed in all the sequences at alignment positions 211 to 217. Downstream of this conserved stretch are five further invariant positions, each one or two amino acids long.
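The conserved stretch can be written in PROSITE pattern syntax as [LV]-[LVA]-[LIV]-[LIV]-T-N-G and searched for in any sequence. The sketch below handles only a simple subset of that syntax (single residues, bracketed residue sets and `x`); the function names are hypothetical. Positions are reported on the ungapped query sequence, so they differ from the alignment columns 211 to 217.

```python
import re

# 2gfh query sequence from Figure 3 (all whitespace is removed below).
QUERY = "".join("""
mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
svlelpallq sidckvsmsv
""".split()).upper()


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern (residues, [..] sets, x) to a regex."""
    parts = []
    for element in pattern.split("-"):
        if element.startswith("[") and element.endswith("]"):
            parts.append(element)  # residue set: same syntax in a regex
        elif element.lower() == "x":
            parts.append(".")  # any residue
        else:
            parts.append(re.escape(element))  # one fixed residue
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


print(find_motif(QUERY, "[LV]-[LVA]-[LIV]-[LIV]-T-N-G"))  # [(139, 'LLLLTNG')]
```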

These short conserved regions suggest that the function, or even the structure, of the encoded proteins may be significant in their evolutionary pattern.

Phylogenetic Tree

The phylogenetic tree was plotted to infer the lineage of the sequences (Figure 5).

Figure 5. (A) Phylogenetic tree of the organisms with homologous protein sequences, in radial tree view. (B) The same tree in rectangular cladogram view.

The rectangular cladogram shows four distinct groups: fishes, mammals (which include the query protein), bacteria and insects.

Bootstrapping

The bootstrap values obtained were analysed. Branches with bootstrap values below 75% are marked with an asterisk (*), as shown in Figure 6.

Figure 6. Branch bootstrap values in the rectangular cladogram view. Branches with bootstrap values below 75% are marked with asterisks (*).
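The section does not describe how the bootstrap values were computed. In the usual procedure (PHYLIP provides the SEQBOOT and CONSENSE programs for these steps), alignment columns are resampled with replacement, a tree is built from each replicate, and a branch's support is the percentage of replicate trees that contain it. The hedged sketch below covers only the resampling step and the asterisk labelling at the 75% cut-off; tree building and clade counting are omitted, and the function names, clade labels and counts are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def label_support(clade_counts, replicates, threshold=75.0):
    """Express clade counts as percentages; append '*' when below threshold."""
    labels = {}
    for clade, count in clade_counts.items():
        percent = 100.0 * count / replicates
        labels[clade] = f"{percent:.0f}" + ("*" if percent < threshold else "")
    return labels


rng = random.Random(1)  # fixed seed so the replicate is reproducible
replicate = bootstrap_replicate({"a": "MKVLLT", "b": "MRVLLS"}, rng)

# Hypothetical counts out of 100 replicate trees, for illustration only.
print(label_support({"fishes": 98, "insects": 60}, replicates=100))
# {'fishes': '98', 'insects': '60*'}
```

Whole columns are resampled, never individual residues, so every replicate stays a valid alignment in which each site keeps its residues across all sequences.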