ATP binding domain 4 Evolution


METHODS

The protein used in this study is a putative N-type ATP pyrophosphatase from Pyrococcus furiosus, which is predicted to be similar to ATP binding domain 4. Starting from the ‘Target Blast and Symatlas Table’, Blastp searches against the NCBI non-redundant protein database were conducted for two query sequences: the Pyrococcus furiosus ATP pyrophosphatase and the Macaca mulatta ATP binding domain 4. Sequences with small E-values were selected and retrieved in FASTA format. A multiple sequence alignment of the selected sequences was built with ClustalX, and a phylogenetic tree was constructed from the best multiple sequence alignment so that the evolutionary relationships between species could be observed. Bootstrapping with 100 replicates was then performed on the tree to obtain bootstrap values, which indicate how well supported each branch of the tree is. N-type ATP pyrophosphatase was also searched in the STRING database, which shows the occurrence of the protein superfamily in other species.

RESULTS

From the Blastp searches, the E-values for both query sequences, i.e. 1ru8A and the Macaca mulatta sequence, were very low (Figure 1 and Figure 2). The highest E-values for 1ru8A and Macaca mulatta were 2e-25 and 1e-49 respectively. Therefore, all 200 sequences were initially aligned using ClustalX. This multiple sequence alignment contained only one conserved region, so only sequences with extremely low E-values were re-selected, and unrelated sequences were deleted in order to achieve the best multiple alignment. In the alignment of the re-selected sequences (88 sequences), seven residues are conserved, but only three of them are considered significant because they form part of the PP-loop motif (Figure 3 and Figure 4).
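The idea of a conserved residue can be made concrete with a small sketch. The function below is a hypothetical helper, not part of ClustalX, and the three aligned sequences are invented toy data rather than sequences from this study. It simply reports the alignment columns in which every sequence carries the same residue.

```python
from typing import List


def conserved_columns(alignment: List[str]) -> List[int]:
    """Return 0-based indices of columns where all sequences share one residue.

    Columns containing a gap character ('-') are never reported as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}  # distinct residues in column i
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Toy alignment, invented for illustration only.
toy_alignment = [
    "MSGGKDSA",
    "LTGGKDSV",
    "-SGGKDGA",
]
print(conserved_columns(toy_alignment))
```

Adding more divergent sequences to such an alignment can only shrink the set of conserved columns, which is consistent with the cleaner result obtained after unrelated sequences were removed.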

A phylogram showing the relationships between the taxa was drawn with the Treeview software (Figure 5). Macaca mulatta is closely related to Homo sapiens, based on the high sequence similarity of their ATP binding domain 4 proteins; the Macaca mulatta and Homo sapiens proteins are therefore homologues. An unrooted (radial) tree was also drawn with Treeview (Figure 6). This tree shows the evolutionary relationships between the taxa, and the members of Archaea and Eukarya are clearly separated. The domain Bacteria is absent from this tree.

Bootstrapping was performed to assess the reliability of the branching order, i.e. as a measure of the quality of the phylogenetic tree. A bootstrap value indicates the confidence level of a branch. Values of less than 50% are meaningless. In this project, some species from the phylogenetic tree were deleted when constructing the bootstrap tree, since most of the bootstrap values were 100%. In the resulting bootstrap tree (Figure 7), most of the values are lower than 75%.

Since the domain Bacteria was not found in the phylogenetic tree, a STRING search for N-type ATP pyrophosphatase was performed, and it showed that some bacterial species do have proteins belonging to the N-type ATP pyrophosphatase family (Figure 8). However, when these sequences are aligned with those of Pyrococcus furiosus and Macaca mulatta, the sequence similarity is low and the E-values are comparatively high.

BlastP

Blastp search-1ru8.png

Figure 1: Result of the BlastP search using N-type ATP pyrophosphatase from Pyrococcus furiosus (1RU8A) as the query sequence.

Blastp-Macaca mulatta.png
Figure 2: Result of the BlastP search using, as the query sequence, the protein sequence from Macaca mulatta that is predicted to be similar to ATP binding domain 4.


Multiple sequence alignment-ClustalX

Clustalx taken from residue 98-350.png

Figure 3: Multiple sequence alignment obtained from ClustalX. This section of the alignment reveals three important residues which are conserved across species in Archaea and Eukarya.


Clustalx taken from residue 98-350 Part2.png

Figure 4: Multiple sequence alignment obtained from ClustalX. This section of the alignment reveals three important residues which are conserved across species in Archaea and Eukarya. gi 40889720 denotes the N-type ATP pyrophosphatase in Pyrococcus furiosus.


Phylogeny tree


Phylogram.png

Figure 5:

Unrooted tree ATP Binding Domain 4.PNG

Figure 6: Unrooted phylogenetic tree of species with low E-values from BlastP searches of N-type ATP pyrophosphatase from Pyrococcus furiosus and ATP Binding Domain 4 from Macaca mulatta. Members of Eukarya and Archaea are clearly separated, with each domain clustering together. No members of Bacteria are present in this phylogenetic tree.

Bootstrap

Boostrap.png

Figure 7: Phylogenetic tree with bootstrap values. The tree was constructed using Mega4. The following are the gi numbers and their respective species and proteins:

STRING

String ATP pyrophosphatase.png

Figure 8


DISCUSSION

Multiple sequence alignment

In the multiple sequence alignment, seven conserved residues are found in all species from the domains Eukarya and Archaea (based on the Blastp search), but only three of them are considered significant for the function and structure of the protein, because they belong to a conserved amino acid sequence motif related to the P-loop of nucleotide binding domains and may therefore be important in phosphate binding. These residues are Serine-103, Glycine-104 and Glycine-105. Since this motif is present in an uncharacterized ATP pyrophosphatase domain, it is called the PP motif and can be written as S-G(2)-K-D-[GS]. The PP-loop motif is a modified version of the P-loop of nucleotide binding domains that is involved in phosphate binding. However, according to the multiple sequence alignment, the only residues among these three that are conserved across all species in Eukarya and Archaea are Glycine-104 and Glycine-105. At position 103, Serine is replaced by Threonine in Pyrobaculum arsenaticum, Pyrobaculum calidifontis, Thermoproteus neutrophilus, Pyrobaculum islandicum and Staphylothermus marinus. This is a conservative substitution, since both amino acids are polar and uncharged, which suggests that this residue is also important. The residue is, however, missing in ATPBD4 from Homo sapiens, whose sequence is only 259 amino acids long and starts at Glycine-104. Other residues conserved across all species are Lysine-106 and Aspartic acid-107, where Aspartic acid-107 is also part of the PP motif. Residue 108 is another important position in the PP motif: it is conserved across all species except Theileria parva and Theileria annulata, where Serine is replaced by Glycine.
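As an illustration of how the motif notation above could be applied to a sequence, the sketch below translates S-G(2)-K-D-[GS] into a regular expression. Reading the notation in a PROSITE-like way (G(2) as two glycines, [GS] as glycine or serine) is an assumption, the helper name find_pp_motif is hypothetical, and the test sequence is invented rather than taken from the alignment.

```python
import re
from typing import List, Tuple

# Assumed translation of S-G(2)-K-D-[GS]: S, G, G, K, D, then G or S.
PP_MOTIF = re.compile(r"SGGKD[GS]")


def find_pp_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in PP_MOTIF.finditer(sequence.upper())]


# Invented toy sequence, for illustration only.
toy_sequence = "MKVAVLFSGGKDSTLALY"
print(find_pp_motif(toy_sequence))

# A strict pattern does not match the Ser -> Thr variant described above.
print(find_pp_motif("MKVAVLFTGGKDSTLALY"))
```

Because the strict pattern misses the threonine variant, a search meant to recover all of the archaeal sequences discussed here would presumably need a relaxed first position such as [ST].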

The Blastp search returned very low E-values for many sequences from different species. These sequences appear to encode different groups of proteins, such as ATP pyrophosphatases, ATP-binding proteins, ATPases and endoribonucleases. Although these proteins have different functions, the sequence similarity between them across species is very high (as indicated by the E-values), and the conserved residues form part of the PP motif. This suggests that these proteins descend from a common ancestral sequence and are therefore paralogs. Such a pattern could be the result of gene duplication within a genome.


Phylogeny tree and Bootstrap


The phylogenetic tree and bootstrap analysis show that the PP-loop motif from ATP binding domain 4 and other related proteins is found in species of Archaea and Eukarya, suggesting that this motif is highly conserved throughout evolution. However, Bacteria appear to lack these conserved sequences, since none of the species in the tree belongs to the domain Bacteria. Nevertheless, according to STRING (functional protein association networks), some bacterial species still have protein sequences that belong to the N-type ATP pyrophosphatase superfamily. BlastP was therefore used to compare the protein sequence of the ATPase from Pyrococcus furiosus with those of the bacterial species listed under the N-type ATP pyrophosphatase clan. The E-values obtained do not pass the E-value cut-off used for sequence selection. The bacterial species are Fusobacterium nucleatum (1e-22), Caldicellulosiruptor saccharolyticus (1e-13), Campylobacter jejuni (4e-19), Chromobacterium violaceum (1e-21) and Polynucleobacter sp. (1e-24). These E-values are higher, i.e. less significant, than the cut-off E-values from the Blastp searches for Pyrococcus furiosus and Macaca mulatta, i.e. 5e-50 and 2e-25 respectively. These bacterial species were not picked up by BlastP in the first place because their E-values are comparatively high, which indicates that the sequence similarity between the bacterial proteins and those of Pyrococcus furiosus and Macaca mulatta is low. This suggests that although some bacterial species have protein sequences belonging to the N-type ATP pyrophosphatase superfamily, they are only distantly related to those of both Archaea and Eukarya. Based on the high protein sequence similarity between Archaea and Eukarya, it can be suggested that Archaea and Eukarya are more closely related to each other than either is to Bacteria.
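The comparison against the cut-off can be checked directly. The sketch below uses only the E-values quoted in the paragraph above; the dictionary layout and the helper name fails_cutoff are illustrative choices, not part of any BLAST tool.

```python
# E-values quoted in the text for the bacterial sequences.
bacterial_evalues = {
    "Fusobacterium nucleatum": 1e-22,
    "Caldicellulosiruptor saccharolyticus": 1e-13,
    "Campylobacter jejuni": 4e-19,
    "Chromobacterium violaceum": 1e-21,
    "Polynucleobacter sp.": 1e-24,
}

# Cut-off E-values quoted in the text for the two Blastp searches.
cutoffs = {
    "Pyrococcus furiosus": 5e-50,
    "Macaca mulatta": 2e-25,
}


def fails_cutoff(evalue: float, cutoff: float) -> bool:
    """A hit fails the cut-off when its E-value is larger (less significant)."""
    return evalue > cutoff


for species, evalue in bacterial_evalues.items():
    failed = [query for query, cutoff in cutoffs.items() if fails_cutoff(evalue, cutoff)]
    print(f"{species}: E = {evalue:g}, fails cut-off for: {', '.join(failed)}")
```

A smaller E-value means a more significant match, so every bacterial hit listed here is weaker than both cut-offs, including Polynucleobacter sp., whose 1e-24 is still five times larger than 2e-25.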

An unrooted tree illustrates the relatedness of the taxa, but it does not show the last common ancestor or the order in which the species evolved. In Figure 6, the taxa are clearly grouped into the domains Archaea and Eukarya. Moreover, within Eukarya, the taxa cluster neatly into parasites, vertebrates and invertebrates. The domain Bacteria is missing from this tree, suggesting that Bacteria are distantly related to Archaea and Eukarya based on the sequence of ATP Binding Domain 4. A phylogram is another way to illustrate the evolution of the taxa, in which both the relationships between the taxa and the time or rate of evolution can be observed. Figure 5 appears to suggest that Macaca mulatta evolved first and Pyrococcus furiosus later, although this interpretation is uncertain.

Bootstrapping was conducted to test the reliability of the branching order of the phylogenetic tree. Well-supported bootstrap values give confidence in the branching order and thus in where speciation events occurred. Bootstrapping works by generating 'pseudoreplicates' of the multiple sequence alignment; in this project, 100 replicates were generated. A tree is built from each pseudoreplicate, and the branching orders are compared with those of the original phylogenetic tree. The bootstrap value of each branch is given as a percentage, which indicates the confidence that the branch is correct. If the bootstrap value is less than 75%, the branching order is not considered reliable. If the value is 90% or above, we can be confident that the branching order is correct. In the bootstrap result, most of the values are lower than 75%, suggesting that the branching pattern is not reliable. Therefore, another phylogenetic tree needs to be built in order to increase the reliability of the tree.
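The pseudoreplicate step can be sketched as follows. This is a minimal illustration of column resampling with replacement, assuming the standard bootstrap procedure; it is not the Mega4 implementation, the function names are hypothetical, and the alignment is invented toy data. Tree building and the counting of shared branches are left out.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Build one pseudoreplicate by sampling alignment columns with replacement."""
    n_cols = len(alignment[0])
    # Draw as many column indices as the alignment has columns.
    sampled = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in sampled) for seq in alignment]


def bootstrap_replicates(alignment: List[str], n_replicates: int = 100,
                         seed: int = 0) -> List[List[str]]:
    """Generate n_replicates pseudoreplicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


# Toy alignment, invented for illustration only.
toy_alignment = ["MSGGKDSA", "LTGGKDSV", "-SGGKDGA"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=100)
print(len(replicates))
print(replicates[0])
```

Each pseudoreplicate has the same number of sequences and columns as the original alignment. The bootstrap value of a branch would then be the percentage of the 100 replicate trees in which that branch reappears.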

In the unrooted tree, the members of the domain Eukarya cluster together on one side without any species from another domain among them, i.e. members of different domains are well separated. Therefore, it can be suggested that the evolutionary model for the PP-loop motif holds, since there is no evidence of lateral gene transfer.

