Paper

Evolution, Structure and Function of N-acetylneuraminic Acid Phosphatase

Jason Cheong Wen Leong (s41235935), Yau Heen wai (s41286272), Lim Junxian (s41313011)

Abstract

N-acetylneuraminic acid phosphatase is a novel protein investigated by our group. With its structure and sequence known, it was assumed to be a member of the enormous family of haloacid dehalogenase-like hydrolases. This family represents the predicted small-molecule phosphatases, related by sequence, cleavage sites and reactions, that are encoded in the genomes of bacteria, archaea, and eukaryotes. Many members have evolved to serve specific biological functions within individual organisms.

Introduction

The novel protein investigated by our group is N-acetylneuraminic acid (Neu5Ac) phosphatase. Its structure was first released in the Protein Data Bank (PDB) on 18 April 2006 under the identifier 2gfh. Mus musculus (mouse) was the source of the gene, and Escherichia coli was the host used to express the protein. In Homo sapiens (human), the protein is known as N-acetylneuraminate 9-phosphate (Neu5Ac-9-P) phosphatase or haloacid dehalogenase (HAD)-like hydrolase domain-containing protein 4. Other aliases include C20orf147, NANP and HDHD4. The gene encoding the protein is located on chromosome 20, at 20p11.1.

Neu5Ac-9-P phosphatase belongs to a large family of haloacid dehalogenase (HAD)-like hydrolases. The enzymes within this classification possess varied types of cleavage activity. Although many members are related by sequence, cleavage sites and reactions, many have evolved to serve specific biological functions within individual organisms.

These small-molecule phosphatases are found in all three domains of life: Bacteria, Archaea, and Eucarya. The number of genes encoding them varies between organisms, from bacteria to eukaryotes. Bacterial Neu5Ac synthase and mammalian Neu5Ac-9-P synthase are homologous proteins, sharing about 35% sequence identity ¹. Neu5Ac-9-P phosphatase dephosphorylates Neu5Ac-9-P to form Neu5Ac, the main form of sialic acid.

Figure 1. Dephosphorylation of Neu5Ac-9-P is a reversible reaction with an end product of Neu5Ac (sialic acid) and a free phosphate.

Sialic acids are nine-carbon sugars with a carboxylate group that are found as components of many glycoproteins, glycolipids, and polysaccharides in animals, viruses, and bacteria. The main form of sialic acid, Neu5Ac, is often present as the terminal sugar of N-glycans on glycoproteins and glycolipids and plays an important role in protein–protein and cell–cell recognition ², ³.

Figure 2. Chemical structure of sialic acid. (http://en.wikipedia.org/wiki/Sialic_acid)

Sialic acids are widely distributed in animal tissues and in bacteria, especially in glycoproteins and gangliosides. The amino group bears either an acetyl or a glycolyl group. The sialic acids form a large family of more than 50 members, comprising acetylated, sulfated, methylated, and lactylated derivatives ⁴.

Results

Query Sequence

The amino acid query sequence of the 2gfh protein (Figure 3) from Mus musculus was obtained from GenBank.

1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii

61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm

121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps

181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs

241 svlelpallq sidckvsmsv

Figure 3. The 260 amino acid sequence of 2gfh protein.
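The stated length of the query can be checked directly from the sequence block in Figure 3. The following is a minimal sketch, assuming the GenBank-style layout shown above (a position index followed by groups of ten residues); the function name `parse_genbank_block` is illustrative and not part of any library.

```python
import re

# Sequence block copied from Figure 3 (position index, then groups of ten residues).
GENBANK_BLOCK = """
1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv
"""


def parse_genbank_block(block: str) -> str:
    """Drop position indices and whitespace; return the residues in upper case."""
    return re.sub(r"[^a-z]", "", block.lower()).upper()


sequence = parse_genbank_block(GENBANK_BLOCK)
print(len(sequence))  # 260
```

The result agrees with the 260 residues quoted in the caption of Figure 3.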

Sequence Homology

The BlastP sequence similarity search returned a total of 500 proteins. Only 38 of these proteins were used for comparison, as they showed higher homology to the query sequence than the remainder of the search results. These proteins were chosen according to their bit scores and E-values. Two outlier partial sequences that contributed to poor overall alignment (huge deletion gaps) were subsequently removed. The remaining 36 sequences were used to generate the phylogenetic tree (and the bootstrapped tree).

Multiple Sequence Alignment

The following multiple sequence alignment (MSA) was obtained (Figure 4). In the alignment, gi|10888xy and gi|10888yz represent gi|108881764 and gi|108881765 respectively. Both of these hypothetical proteins belong to the mosquito Aedes aegypti.

The identifiers of these two proteins were initially changed to alphanumeric ones because Phylip could not generate a tree from the original identifiers. The programme read only the first five numeric digits (10888) of each identifier and therefore reported an error listing the two proteins as duplicates. Both identifiers were subsequently renamed for the final phylogenetic tree.
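The duplicate-name problem described above can be reproduced with a short sketch. It assumes a fixed-width name field; the default width of 10 characters is an assumption taken from the classic Phylip name field, whereas the text reports that only five digits were kept, so `width` should be adjusted to match the programme actually used. The helper names `find_collisions` and `short_labels` are hypothetical.

```python
def find_collisions(names, width=10):
    """Group names that become identical once cut to `width` characters."""
    groups = {}
    for name in names:
        groups.setdefault(name[:width], []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


def short_labels(names, width=10):
    """Map each name to a label of at most `width` characters that stays unique.

    Assumes `width` is comfortably larger than the numeric suffix that is added.
    """
    labels, used = {}, set()
    for name in names:
        label, counter = name[:width], 0
        while label in used:
            counter += 1
            suffix = f"_{counter}"
            label = name[: width - len(suffix)] + suffix
        used.add(label)
        labels[name] = label
    return labels


names = ["gi|108881764", "gi|108881765"]
print(find_collisions(names))  # {'gi|1088817': ['gi|108881764', 'gi|108881765']}
print(short_labels(names))     # {'gi|108881764': 'gi|1088817', 'gi|108881765': 'gi|10888_1'}
```

Keeping the mapping from original identifier to short label makes it straightforward to restore the full names on the final tree.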

Figure 4. MSA of query (top-most sequence – No.1) and related sequences.

The MSA shows generally slight domain conservation throughout the protein sequences. Small insertion and deletion gaps are noticeable along the alignment. A particularly large insertion gap was observed between amino acid positions 91 and 114. The organisms with the large insertion gap are listed below:

Bacillus licheniformis

Bacillus subtilis

Bacillus halodurans

Bacillus clausii

Symbiobacterium thermophilum

A highly conserved section of amino acids, (LV)–(LVA)–(LIV)–(LIV)–T–N–G, part of which is invariant, was observed in all the sequences at positions 211 to 217 of the alignment. Downstream of this conserved section are five more invariant positions (each one or two amino acids in length). These short conserved regions suggest that the function, or even the structure, of the encoded proteins may have been maintained throughout their evolution.
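As a rough illustration of how such a motif can be located, the sketch below writes the conserved pattern as a regular expression and searches the query sequence from Figure 3. The variable names are illustrative. Positions are reported in query numbering, which differs from the alignment columns (211 to 217) because the alignment contains gaps.

```python
import re

# Query sequence from Figure 3, with the position indices removed.
QUERY = "".join("""
mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
svlelpallq sidckvsmsv
""".split()).upper()

# (LV)-(LVA)-(LIV)-(LIV)-T-N-G written as a regular expression.
MOTIF = re.compile(r"[LV][LVA][LIV][LIV]TNG")

# Each hit is (first residue, last residue, matched text), using 1-based positions.
hits = [(m.start() + 1, m.end(), m.group()) for m in MOTIF.finditer(QUERY)]
print(hits)  # [(139, 145, 'LLLLTNG')]
```

The single hit falls within the Pfam hydrolase domain alignment shown in Figure 10, where the query reads VRLLLLTNGDR.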

Phylogenetic Tree

The tree was plotted to obtain the phylogenetic lineage (Figure 5).

Figure 5. (A) Phylogenetic tree showing organisms with related protein sequence homology in Radial Tree view. (B) Rectangular

Cladogram view with related protein sequence homology.

The Rectangular Cladogram view shows four distinct groups: fishes, mammals (where the query protein also maps), bacteria and insects.

Bootstrapping

The bootstrap values obtained were analysed. Branches with values below 75% are marked with an asterisk (*), as shown in Figure 6.

Figure 6. Branch bootstrap values in Rectangular Cladogram view. Branches with bootstrap values <75% are marked with asterisks (*).

DALI Searching

SUMMARY: PDB/chain identifiers and structural alignment statistics

NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN

 1: 3033-A 2gfh-A 41.1  0.0  246   246  100      0      0     1 S    HYDROLASE        haloacid dehalogenase-like hydrolase domain
 2: 3033-A 1fez-A 18.1  3.5  178   256   22      0      0    13 S    HYDROLASE        phosphonoacetaldehyde hydrolase         (bacillus c 
 3: 3033-A 2hsz-A 17.9  3.3  168   222   23      0      0    13 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    novel predicted
 4: 3033-A 1qq5-A 17.3  3.1  198   245   19      0      0    12 S    HYDROLASE        l-2-haloacid dehalogenase       (xanthobacter aut 
 5: 3033-A 1o03-A 17.0  5.0  188   221   20      0      0    11 S    ISOMERASE        beta-phosphoglucomutase         (lactococcus lactis
 6: 3033-A 2b0c-A 16.4  2.6  184   199   20      0      0    13 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    putative phospha 
 7: 3033-A 2fdr-A 15.8  4.4  190   214   19      0      0    15 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    conserved hypoth
 8: 3033-A 2p11-A 15.7  2.9  194   211   16      0      0    20 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    hypothetical pro 
 9: 3033-A 1te2-A 15.7  3.6  170   211   19      0      0    15 S    HYDROLASE        putative phosphatase    (escherichia coli o157
10: 3033-A 1yns-A 15.3  4.0  169   254   11      0      0    13 S    HYDROLASE        e-1 enzyme (enolase-phosphatase e1)     (homo s 
11: 3033-A 1qyi-A 15.0  3.5  198   375   19      0      0    17 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    hypothetical pro
12: 3033-A 2i6x-A 14.9  3.1  176   199   19      0      0    18 S    HYDROLASE        hydrolase, haloacid dehalogenase-like family 
13: 3033-A 1u7p-A 14.3  2.9  144   164   18      0      0    14 S    HYDROLASE        magnesium-dependent phosphatase-1 (mdp-1)       (
14: 3033-A 1ymq-A 14.1  2.3  130   260   16      0      0    14 S    TRANSFERASE      sugar-phosphate phosphatase bt4131      (bacte 
15: 3033-A 1j8d-A 13.1  2.5  141   180   11      0      0    12 S     HYDROLASE       deoxy-d-mannose-octulosonate 8-phosphate ph
16: 3033-A 2ho4-A 12.9  2.4  131   246   19      0      0    14 S    HYDROLASE        haloacid dehalogenase-like hydrolase domain 
17: 3033-A 1pw5-A 12.7  2.3  136   246   21      0      0    12 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    nagd protein, pu
18: 3033-A 1nf2-A 12.7  2.6  127   267   13      0      0    11 S    STRUCTURAL GENOMICS/UNKNOWN FUNCTION     phosphatase     (the 
19: 3033-A 1rlm-A 12.4  2.8  131   269   13      0      0    14 S    HYDROLASE        phosphatase Mutant      (escherichia coli) bacte
20: 3033-A 1f5s-A 12.1  3.5  159   210   14      0      0    15 S     HYDROLASE       phosphoserine phosphatase (psp)         (methanoco 
21: 3033-A 1cr6-B 12.0  3.8  177   541   18      0      0    18 S    HYDROLASE        epoxide hydrolase       (mus musculus) mouse expr
22: 3033-A 1rku-A 11.9  3.6  172   206   11      0      0    18 S    TRANSFERASE      homoserine kinase       (pseudomonas aeruginosa 
23: 3033-A 2b30-A 11.8  2.7  134   284   16      0      0    12 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    pvivax hypotheti
24: 3033-A 1kyt-A 10.5  2.5  122   216   13      0      0    15 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    hypothetical pro 
25: 3033-A 2o2x-A 10.3  3.6  139   204   17      0      0    14 S    STRUCTURAL GENOMICS, UNKNOWN FUNCTION    hypothetical pro
26: 3033-A 1u02-A 10.1  2.7  128   222   16      0      0    12 S    STRUCTURAL GENOMICS      trehalose-6-phosphate phosphatase 
27: 3033-A 2fea-A 10.0  3.5  167   219    7      0      0    21 S    HYDROLASE        2-hydroxy-3-keto-5-methylthiopentenyl-1- pho
28: 3033-A 2hx1-A  9.6  3.2  130   275   24      0      0    19 S    HYDROLASE        predicted sugar phosphatases of the had supe 
29: 3033-A 1mh9-A  9.2  3.2  146   194   15      0      0    15 S    HYDROLASE        deoxyribonucleotidase (mitochondrial 5'(3')-

Figure 7. DALI search results, returned by e-mail. The first entry (2gfh) is the query protein itself, with a Z value of 41.1, a root-mean-square deviation (RMSD) of 0.0 and a %IDE of 100; it is annotated as a HAD-like hydrolase domain protein. The 2nd, 9th, 16th, 19th and 28th hits show significant structural similarity between the query protein and hydrolases or phosphatases: their Z values are well above 1, their RMSD values remain low, and their %IDE values range from 13 to 24.

The DALI search (Figure 7) indicates that Neu5Ac phosphatase is a haloacid dehalogenase-like hydrolase. This family is structurally different from the alpha/beta hydrolase family and includes L-2-haloacid dehalogenases, epoxide hydrolases and phosphatases. Its structure consists of two domains. One is an inserted four-helix bundle, which is the least well conserved region of the alignment and lies between residues 16 and 96 of (S)-2-haloacid dehalogenase I. The remainder of the fold forms the core alpha/beta domain. The 2gfh protein is classified as a hydrolase from mouse. Its bound chemical components are a phosphate ion, a sodium ion, 1,2-ethanediol and a chloride ion: PO₄ and EDO are ligands, while Na and Cl are bound ions.

Protein Structure

Figure 8. Secondary structure of 2gfh protein with residue interaction and the catalytic residues marked out in red boxes. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl)

Figure 9. (A) Main, bottom and right views of the 2gfh protein; the spheres represent the bound chemical components. (B) 2gfh protein viewed using KiNG. (C) Topology diagram of 2gfh showing the beta strands and alpha helices. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl)

The 2gfh protein was determined to be an L-polypeptide of 260 residues. Its secondary structure (Figure 8) comprises 56% helix (13 helices; 146 residues) and 11% beta sheet (8 strands; 31 residues).

Protein Folding

Table 1. Matching folds detected by SSM and Dali, with score values between Neu5Ac-9-P phosphatase and other proteins. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/profunc/GetResults.pl?source=profunc&user_id=bb32&code=143144)

Hit	Z-score	No. SSE	RMSD (Å)	Sequence Id	PDB entry	Name
1	16.6	16	0.00	100.0%	2gfhA	Crystal structure of protein c20orf147 homolog (17391249) from Mus musculus at 1.90 a resolution
2	9.4	16	2.34	23.0%	1x42A	Crystal structure of a haloacid dehalogenase family protein (ph0459) from Pyrococcus horikoshii ot3
3	9.2	10	1.63	26.0%	1swwA	Crystal structure of the phosphonoacetaldehyde hydrolase d12a mutant complexed with magnesium and substrate phosphonoacetaldehyde
4	9.3	10	1.66	26.0%	1swvA	Crystal structure of the d12a mutant of phosphonoacetaldehyde hydrolase complexed with magnesium
5	7.1	11	1.75	24.4%	1fezA	The crystal structure of Bacillus cereus phosphonoacetaldehyde hydrolase complexed with tungstate, a product analog
6	6.3	12	2.34	20.5%	2p11A	Crystal structure of hypothetical protein (yp_553970.1) from Burkholderia xenovorans lb400 at 2.20 a resolution
8	7.5	11	1.96	26.8%	1rqlA	Crystal structure of phosponoacetaldehyde hydrolase complexed with magnesium and the inhibitor vinyl sulfonate
9	7.3	11	1.96	26.8%	1rqnA	Phosphonoacetaldehyde hydrolase complexed with magnesium
10	6.7	13	2.44	22.1%	2b0cA	The crystal structure of the putative phosphatase from Escherichia coli

The high scores between Neu5Ac phosphatase and the other proteins (Table 1) indicate that the folds of these proteins match. The Z-score measures the statistical significance of a match in terms of standard Gaussian statistics. It is based on the quality of the match between the query and target structures and assumes that a Gaussian distribution of quality scores would be obtained from a large enough database of protein structures. The higher the Z-score, the higher the statistical significance of the match. No. SSE is the number of matched secondary structure elements (for example helices and strands) between the two structures.
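The idea behind a Z-score can be sketched in a few lines: a raw match score is expressed as the number of standard deviations it lies above the mean of a background distribution of scores. This is only a generic illustration under that Gaussian assumption and is not the scoring procedure actually used by DALI or SSM; the background values below are made up for the example.

```python
from statistics import mean, pstdev


def z_score(score, background):
    """Number of standard deviations by which `score` exceeds the background mean."""
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation of the background
    if sigma == 0:
        raise ValueError("background scores have no spread")
    return (score - mu) / sigma


# Hypothetical background of match-quality scores (mean 5.0, standard deviation 2.0).
background = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(11.0, background))  # 3.0
```

A match scoring 11.0 against this background lies three standard deviations above the mean, which is why larger Z-scores in Table 1 correspond to more significant matches.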

Sequence Similarity

Hydrolase: domain 1 of 1, from 18 to 224: score 96.2, E = 1e-25

                  *->ikavvFDkDGTLtdgkeppiaeaiveaaaelgl.........lplee
                     ++av+FD+D+TL+d+ + + ++ + e+ ++l  + + +++ ++  + 
      query    18    VRAVFFDLDNTLIDT-AGASRRGMLEVIKLLQSkyhykeeaeIICDK 63

                  vekllgrgl.g.erilleggltaell...................d.evl
                  v   l +++ ++    ++   t ++ +   +++++ ++++  ++    ++
      query    64 VQVKLSKECfHpYSTCITDVRTSHWEeaiqetkggadnrklaeecYfLWK 113

                  glial.dklypgarealkaLkrrGikvailTggdr.naeallealgla.l
                   ++ ++  l +++++ l +L++  +++ +lT+gdr++++++ ea+++ ++
      query   114 STRLQhMILADDVKAMLTELRKE-VRLLLLTNGDRqTQREKIEACACQsY 162

                  fdviidsdevggvgpivvgKPkpeifllalerlgvkpeevgpevlmVGDg
                  fd+i++++e +        KP+p if + ++ lgv+p ++    +mVGD+
      query   163 FDAIVIGGEQK------EEKPAPSIFYHCCDLLGVQPGDC----VMVGDT 202

                  vnDapalaa.AGv.gvamgngg<-*
                  + +++ +  +AG+++++++n +   
      query   203 LETDIQGGLnAGLkATVWINKS    224

Figure 10. The alignments of the top-scoring domains of 2gfh protein (query) using Pfam 21.0 (Janelia Farm). (http://pfam.janelia.org)

A Pfam search (Figure 10) matched the query sequence, in this case Neu5Ac-9-P phosphatase, to the hydrolase family. The E-value of 1e-25 is highly significant, indicating that the match to a hydrolase is very unlikely to have arisen by chance.

Surface Properties

Figure 11. Molecular surface of 2gfh colored by electrostatic potential, shown using PyMOL.

Using the PDB entry 2gfh, a model showing the electrostatic potential of the molecular surface was constructed in PyMOL. As shown in Figure 11, the red regions are negatively charged while the blue regions are positively charged. The charge scale ranges from -63.539 to 63.539.

Figure 12. (A) Molecular structure of 2gfh showing the possible binding sites, with the different colors representing classes of amino acids. (B) Results from Profunc show that 2gfh contains 2 ligands: phosphate ion (PO₄) and ethylene glycol (EDO).

Profunc helps to identify the likely biochemical function of a protein from its three-dimensional (3D) structure. It uses fold matching, residue conservation, surface cleft analysis, and functional 3D templates to identify both the protein’s likely active site and possible homologues in the PDB. The search provided information on the possible binding sites and identified the potential ligands PO₄ and EDO. EDO (Figure 14) is most likely a remnant of crystallization, as it is a compound widely used when crystallizing proteins (it is also used as automotive antifreeze). Finding the PO₄ ligand (Figure 13) was important, as it most likely marks the active site. As Neu5Ac-9-P phosphatase is a hydrolase, the PO₄ could well be involved in the mechanism and function of the protein.

Figure 13. (A) Molecular structure of 2gfh with the ligand PO₄. (B) Molecular and chemical structure of PO₄. (C) Ligand interaction involving PO₄.

Figure 14. (A) Molecular structure of 2gfh with the ligand EDO. (B) Molecular and chemical structure of EDO. (C) Ligand interaction involving EDO.

Figure 15. Molecular structure of Neu5Ac-9-P phosphatase viewed using RasMol, showing the conserved region of asparagine, threonine and leucine, with the EDO molecule in grey and PO₄ in yellow.

Table 2. Nest number 4 shows a significant score, implying possibly conserved residues in N-acetylneuraminic acid phosphatase.

No	Score	Number of residues	Cleft	Average accessibility	Average conservation	Residues
1	3.770	3	3	2	0.437	Ser212(A), Gly213(A), Arg214(A)
2	3.579	3	3	-	0.913	Ala201(A), Gly202(A), Leu203(A)
3	3.483	3	3	-	0.816	Leu177(A), Gly178(A), Val179(A)
4	3.000	3	3	2	1.000	Asn15(A), Thr16(A), Leu17(A)
5	0.646	4	4	-	0.646	Cys145(A), Ala146(A), Cys147(A), Gln148(A)

Profunc also provided information on the conserved residues in Neu5Ac-9-P phosphatase through nest analysis. Nests are structural motifs that are often found in functionally important regions of protein structures, and each nest is given a score. A score above 2.0 implies that the nest is functionally significant. The results (Table 2) list the residues making up each nest. A residue conservation score is assigned to each nest residue; it ranges from 0.0 (not at all conserved) to 1.0 (perfectly conserved) and is determined from a multiple sequence alignment of the protein’s sequence against BLAST hits from the UniProt sequence database. The results (Figure 15) show a highly conserved region of asparagine, threonine and leucine, with a residue conservation score of 1.0.

Functional analysis

The MSA (Figure 16) of the query sequence and the other 35 sequences shows several conserved motifs. The first conserved motif consists of an almost invariant aspartic acid (D) region; only the 33rd protein (gi|45552117) shows a gap. The second motif shows conserved and invariant leucine (L), threonine (T), asparagine (N) and glycine (G) residues. The third motif shows two invariant residues among lysine (K), proline (P), valine (V), glycine (G), aspartic acid (D) and isoleucine (I). This agrees with the study by Maliekal et al. and strongly suggests that the query protein is a phosphatase.

Figure 16. MSA of the query protein Neu5Ac phosphatase with 35 other proteins. Only alignment positions 60–70 and 210–300 are shown, to illustrate the conserved and invariant regions. The 3 boxed sequences are either conserved or invariant regions.

Discussion

Multiple Sequence Alignment

In the MSA, the organisms with the large gap insertion were mainly Bacillus species, with the exception of Symbiobacterium thermophilum. Symbiobacterium is an uncultivable thermophile isolated from compost, and its survival depends mainly on microbial commensalism ⁵. This bacterium can grow in vitro only if it is co-cultured with Bacillus species ⁵. This could explain its genetic association with Bacillus, as observed in the sequence alignment. Interestingly, however, Bacillus is classified as Gram-positive, while Symbiobacterium is a Gram-negative bacterium. The protein sequences of the other Gram-negative bacteria in the alignment (Vibrio species) do not contain the large gap insertion at positions 91 to 114. Hence, more genetic (and even functional) analysis may be necessary to determine the relationship between the hydrolase proteins of the Gram-positive Bacillus and the Gram-negative Symbiobacterium.

Phylogenetic Tree

The Rectangular Cladogram view of the tree shows two main groups, prokaryotes and eukaryotes. This division is also the root and first branching point of the phylogenetic tree.

The invertebrates (of Phylum Arthropoda) form the first branching point for the eukaryotes in this tree.

From there, further branching leads to the vertebrates (of Phylum Chordata), which then divide into the Superclasses Osteichthyes (bony fish) and Tetrapoda (four-limbed vertebrates).

In the prokaryotic group, branching occurs mainly between Gram-positive (Bacillus spp.) and Gram-negative (Vibrio spp.) bacteria.

Hence, it can be generally deduced that the Neu5Ac-9-P phosphatase (hydrolase) protein is not specific to one evolutionary lineage, as it is present in almost all the main phyla and classes of organisms from both prokaryotes and eukaryotes. Its functional significance would therefore be a general one.

Bootstrapping

Tree bootstrapping is necessary to test the reliability of the branching patterns and distances on the phylogenetic tree. This was done by generating 100 "pseudoreplicates" of the multiple sequence alignment. The distance matrices were recalculated from these replicate alignments to generate a bootstrap tree, whose branching patterns and distances can be compared with those of the original phylogenetic tree. The bootstrap value (in percent) on each branch signifies the confidence in that branch. A bootstrap value of 95% equates to full branching confidence; a value of 75% equates to 95% branching confidence; a value of 60% equates to much lower branching confidence; and a value of 50% gives no branching confidence.
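The pseudoreplicate step can be sketched as follows: each replicate is built by sampling alignment columns with replacement until the replicate has as many columns as the original alignment. This is a minimal sketch of the general bootstrap idea, not the Seqboot implementation itself; the toy alignment, the function names and the seed are illustrative assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping every sequence the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def pseudoreplicates(alignment, n_replicates=100, seed=1):
    """Return `n_replicates` resampled copies of the alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical sequences, all of equal length).
toy = {"seqA": "MKLV-TNG", "seqB": "MRLVATNG", "seqC": "MKIV-SNG"}
replicates = pseudoreplicates(toy, n_replicates=100, seed=1)
print(len(replicates), len(replicates[0]["seqA"]))  # 100 8
```

Because whole columns are sampled, the residues of different sequences stay aligned with one another in every replicate, so a distance matrix and tree can be computed for each one.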

Functional Analysis

Figure 17. (A) List of all matched protein name terms for 2gfh. (B) List of all matched Gene Ontology terms for 2gfh. The score in

red is a measure of how strongly the term is predicted from the hits obtained by the different methods. The scores in blue show each

method’s contribution to the total score (with the number of relevant sequences/structures shown in brackets in grey).

(http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/)

The function predicted from evolution and structure is that 2gfh is a hydrolase. Profunc searches (Figure 17) on 2gfh also show that it possesses hydrolase activity. The highest-scoring Gene Ontology terms (Figure 17) indicate that it is involved in metabolism and possesses phosphoglycolate phosphatase activity. A hydrolase is an enzyme that catalyzes a hydrolysis reaction (Figure 18): the addition of the hydrogen and hydroxyl ions of water to a molecule, which is consequently split into two or more simpler molecules. Hydrolase is the systematic name for any enzyme of EC class 3.

Figure 18. A hydrolase catalyzes the hydrolysis of the chemical bond between A and B, resulting in 2 simpler molecules.

Neu5Ac phosphatase belongs to the HAD family. HAD is a vast superfamily of largely uncharacterized enzymes, with a few members shown to possess phosphatase, phosphoglucomutase, phosphonatase, and dehalogenase activities ⁶. HAD-like hydrolases represent the largest family of predicted small-molecule phosphatases encoded in the genomes of bacteria, archaea, and eukaryotes, with 6,805 proteins in databases ⁷. HADs share little overall sequence similarity (15–30% identity), but they can be identified by the presence of three short conserved sequence motifs ⁷. Most of the characterized HADs have phosphatase activity (CO–P bond hydrolysis); others catalyze dehalogenase (C–halogen bond hydrolysis), phosphonatase (C–P bond hydrolysis), and phosphoglucomutase (CO–P bond hydrolysis and intramolecular phosphoryl transfer) reactions ⁶.

In their study, Maliekal et al. aligned rat and human Neu5Ac-9-P phosphatase with other homologous sequences, covering roughly the first 280 amino acids (Figure 19).

Figure 19. Alignment of rat and human Neu5Ac-9-P phosphatase with homologous sequences. The following sequences are aligned: Rattus norvegicus (Rnor, gi-34859431), Homo sapiens (Hsap, gi-23308749), Xenopus laevis (Xlae, gi-46250196), Danio rerio (Drer, gi-63101958), and Drosophila melanogaster (Dmel, gi-28381565). Only the first 280 residues of the latter sequence are shown. Completely conserved residues are shown in boldface type. Asterisks indicate the extremely conserved residues in phosphatases of the HAD family ⁸.

The MSA by Maliekal et al. shows that the Neu5Ac-9-P phosphatase orthologs share the three motifs found in phosphatases of the HAD family: a first motif comprising two extremely conserved aspartates (D), a second motif comprising a conserved serine (S) or threonine (T), and a third motif comprising a conserved lysine (K) and two conserved aspartates (D) ⁸. The first aspartate in the first motif forms a phosphoaspartate during the catalytic cycle ⁹, ¹⁰. These findings therefore suggested that the HDHD4 protein was a phosphatase. In our MSA (Figure 16), several conserved motifs closely resemble those reported by Maliekal et al., which suggests that Neu5Ac-9-P phosphatase is indeed a phosphatase.

Phosphatases of the HAD family depend on the presence of Mg²⁺. Ca²⁺ inhibits their activity by replacing Mg²⁺ and preventing the nucleophilic attack by the aspartate that covalently binds the phosphate group ⁸. Phosphatases that form a phosphoenzyme during the catalytic cycle are inhibited by vanadate ¹¹. Vanadate (VO₄³⁻), formed when V₂O₅ is dissolved in water at alkaline pH, appears to inhibit enzymes that process phosphate.

The presence of a protein sharing at least about 50% sequence identity with rat or human Neu5Ac-9-P phosphatase in the genomes of mammals, chicken, Xenopus, and fishes indicates that sialic acid synthesis proceeds via the 9-phosphate intermediate in these species ⁸. This is consistent with the finding that vertebrate genomes contain a gene encoding the bifunctional enzyme UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase ⁸.

In bacteria, the E. coli genome encodes five membrane-bound and 23 soluble HAD-like hydrolases, representing about 40% of the E. coli proteins with known or predicted small-molecule phosphatase activity ¹². The metabolites hydrolyzed by HADs are intermediates of various metabolic pathways and reactions (glycolysis, pentose phosphate pathway, gluconeogenesis, and intermediary sugar and nucleotide metabolism).

E. coli HADs hydrolyze a wide range of phosphorylated metabolites, including carbohydrates, nucleotides, organic acids, and coenzymes. Studies have shown that the most common substrates are intermediates of metabolic pathways such as glycolysis and the pentose phosphate pathway (Figure 20). These substrates were fructose-1-phosphate, glucose-6-phosphate, mannose-6-phosphate, 2-deoxyglucose-6-phosphate, fructose-6-phosphate, ribose-5-phosphate, and erythrose-4-phosphate ¹³.

Figure 20. Schematic diagrams of the glycolysis and pentose phosphate metabolic pathways. The green arrows show the substrates that are hydrolyzed by HADs. (A) Glycolysis pathway with substrates that are hydrolyzed by HADs: glucose 6-phosphate, fructose 6-phosphate and dihydroxyacetone phosphate. (B) Pentose phosphate pathway with substrates that are hydrolyzed by HADs: glucose-6-phosphate, fructose-6-phosphate, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, gluconate 6-phosphate and erythrose-4-phosphate.

(http://www.steve.gb.com/science/core_metabolism.html)

Methods and Materials

Query Sequence

The sequence of N-acetylneuraminic acid phosphatase from the house mouse (Mus musculus) was obtained from the GenBank protein database under accession number 2GFH_A.

Sequence Homology

The query sequence was matched to related proteins (by amino acid sequence similarity) using BlastP. The search was run against a fixed database stored on a DVD, rather than against the online BlastP database on the World Wide Web.

Multiple Sequence Alignment

All the related proteins (from the BlastP search) were aligned using ClustalX. The ClustalX programme was likewise obtained from the DVD rather than from the website.

Phylogenetic Tree

The Phylip package was used to obtain a phylogenetic tree showing the relationships of the proteins from the individual organisms. The programmes used were again obtained from the DVD. Protdist (within Phylip) was used to calculate the distance matrix, with the PAM-Dayhoff method selected. Neighbor (also within Phylip) was then used to build the phylogenetic tree from the distance matrix. The "Input order of species" option was set to "Random" when generating the tree, and a random odd number was supplied as the seed. The Treeview programme was used to view the final tree.
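To show what a distance matrix contains, the sketch below computes the simplest possible pairwise distance: the uncorrected proportion of differing residues (p-distance) over the columns where neither sequence has a gap. This is deliberately a simpler stand-in and is not the PAM-Dayhoff model used in the analysis above; the toy alignment and function names are illustrative assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in compared) / len(compared)


def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances, stored as nested dictionaries."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names} for a in names}


# Toy alignment (hypothetical sequences, all of equal length).
toy_alignment = {"seqA": "MKLV-TNG", "seqB": "MRLVATNG", "seqC": "MKIV-SNG"}
matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in matrix])
```

A tree-building method such as neighbor joining then takes a matrix of this shape as its input; model-based distances like PAM-Dayhoff additionally correct for multiple substitutions at the same site.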

Bootstrapping

Seqboot (within Phylip) was used to generate 100 replicate samples of the sequence alignment.

The outfile (.aln) was then used to calculate the bootstrap distance matrices with Protdist. The parameter settings were the same as for the initial distance matrix calculation (PAM-Dayhoff method), with the addition of the multiple data sets option, set to 100 replicates.

This outfile (.dis) was run through Neighbor. The parameter settings were again the same as for the earlier phylogenetic tree, with the multiple data sets option again set to 100 replicates.

The treefile (.ph) was run through Consense (within Phylip) to obtain the final bootstrapped phylogenetic tree. Bootstrap branch values were also obtained to determine the reliability of the tree branches.

Replacing organism identifiers on phylogenetic tree

An online programme, the Kenegdo server, was used to convert the organism identifiers in the tree to their species names.

Protein Folding

First, a DALI search was done to compare the 3D structure with those in the Protein Data Bank. It revealed that Neu5Ac-9-P phosphatase is a haloacid dehalogenase-like hydrolase. The PDB was then searched for the structures of biological macromolecules and their relationships to sequence, function, and disease. CE, a database and tool for 3D protein structure comparison and alignment, was used to compare the alignments between the query protein and its neighbours.

Sequence Similarity

InterProScan was then used to analyze the sequence, as it annotates predicted proteins from genome sequencing projects. To analyze the protein further, Pfam, a large collection of multiple sequence alignments and hidden Markov models, was used to find Pfam family matches for the protein, in this case N-acetylneuraminic acid phosphatase.

The aim of using the ProFunc server is to help identify the likely biochemical function of a protein from its three-dimensional structure. It

uses a series of methods, including fold matching, residue conservation, surface cleft analysis, and functional 3D templates, to identify both

the protein’s likely active site and possible homologues in the PDB.

Surface Properties

RasMol, a molecular graphics program, was used for the visualisation of proteins, nucleic acids and small molecules. PyMOL, a molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality molecular graphics images and animations, was also used to assist the research.

References

1. Lawrence, S. M., Huddleston, K. A., Pitts, L. R., Nguyen, N., Lee, Y. C., Vann, W. F., Coleman, T. A. & Betenbaugh, M. J. (2000). Cloning and expression of the human N-acetylneuraminic acid phosphate synthase gene with 2-keto-3-deoxy-D-glycero-D-galactonononic acid biosynthetic ability. J. Biol Chem 275, 17869–17877.

2. Schauer, R. (2000). Achievements and challenges of sialic acid research. Glycoconj. J 17, 485-499.

3. Varki, A. (1997). Sialic acids as ligands in recognition phenomena. FASEB J. 11, 248-255.

4. Angata, T. & Varki, A. (2002). Chemical diversity in the sialic acids and related alpha-keto acids: an evolutionary perspective. Chem Rev 102, 439-469.

5. European Bioinformatics Institute (2007). http://www.ebi.ac.uk/2can/genomes/bacteria/Symbiobacterium_thermophilum.html

6. Calderone, V., Forleo, C., Benvenuti, M., Thaller, M. C., Rossolini, G. M. & Mangani, S. (2004). The First Structure of a Bacterial Class B Acid Phosphatase Reveals Further Structural Heterogeneity Among Phosphatases of the Haloacid Dehalogenase Fold. J. Mol. Biol 335, 761–773.

7. Koonin, E. V. & Tatusov, R. L. (1994). A genomic perspective on protein families. J. Mol. Biol. 244, 125-132.

8. Maliekal, P., Vertommen, D., Delpierre, G. & Schaftingen, E. V. (2006). Identification of the sequence encoding N-acetylneuraminate-9-phosphate phosphatase. Glycobiology 16, 165–172.

9. Collet, J.-F., Stroobant, V. & Van Schaftingen, E. (1999). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107–14112.

10. Collet, J.-F., Stroobant, V., Pirard, M., Delpierre, G. & Van Schaftingen, E. (1998). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107-14112.

11. Macara, I. G. (1980). Vanadium, an element in search of a role. Trends Biochem Sci 5, 92-94.

12. Keseler, I. M., Collado-Vides, J., Gama-Castro, S., Ingraham, J., Paley, S., Paulsen, I. T., Peralta-Gil, M. & Karp, P. D. (2005). EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33, D334–D337.

13. Kuznetsova, E., Proudfoot, M., Gonzalez, C. F., Brown, G., Omelchenko, M. V., Borozan, I., Carmel, L., Wolf, Y. I., Mori, H. & Yakunin, A. F. (2006). Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family. J. Biol Chem 281, 36149–36161.
