A sequence similarity search was performed using the human orthologue of the protein 2nxf, the hypothetical protein LOC56985. The search was run with the BLAST program BLASTP (Altschul, Gish et al, 1990), using a copy that had been saved onto a CD in late April. Results could therefore only come from data present in the database before that date, so they should be very similar to, but may not exactly match, the results of a BLAST search on the NCBI (National Center for Biotechnology Information) website. Only the top 60 hits (ranked by E value) from the initial sequence search were used. For the purpose of creating a phylogenetic tree, duplicate entries for the same species were removed unless considered significant.
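The hit selection described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the record layout (dictionaries with `id`, `species` and `evalue` keys) and the function name `select_hits` are hypothetical, and the sketch assumes that a lower E value means a better hit. The manual exception for duplicates "considered significant" is not modelled.

```python
def select_hits(hits, max_hits=60):
    """Keep the best-scoring hits, then at most one hit per species.

    `hits` is a list of dicts with hypothetical keys
    'id', 'species' and 'evalue' (assumed layout).
    """
    # Rank by E value: smaller E value is treated as a better hit.
    ranked = sorted(hits, key=lambda h: h["evalue"])[:max_hits]

    seen_species = set()
    kept = []
    for hit in ranked:
        # Skip any later (weaker) hit from a species already kept.
        if hit["species"] in seen_species:
            continue
        seen_species.add(hit["species"])
        kept.append(hit)
    return kept
```

Because the hits are ranked before duplicates are dropped, the entry retained for each species is its best-scoring one.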
The selected sequences were aligned using the ClustalX program (Larkin, et al., 2007), which is a global alignment program. Any sequences that had extremely large gaps were then removed. Once a reasonable alignment was produced, the aligned sequences were used to build a phylogenetic tree. The tree was bootstrapped, and the trees and bootstrap values were visualised using the program TreeView (Page, 1996).
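The removal of gap-heavy sequences could be approximated programmatically as below. This is a minimal sketch under stated assumptions: the alignment is held as a dictionary of name to aligned sequence with `-` as the gap character, and the cut-off `max_gap_run` is a hypothetical value, since no numerical threshold was defined for "extremely large".

```python
import re


def longest_gap(aligned_seq):
    """Length of the longest uninterrupted run of '-' in an aligned sequence."""
    runs = re.findall(r"-+", aligned_seq)
    return max((len(run) for run in runs), default=0)


def drop_gappy(alignment, max_gap_run=50):
    """Return only the sequences whose longest gap run is within the cut-off.

    `alignment` maps sequence names to aligned sequences (assumed layout);
    `max_gap_run` is an illustrative threshold, not a value from the text.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if longest_gap(seq) <= max_gap_run
    }
```

After filtering, the remaining sequences would need to be realigned, since removing a sequence can leave all-gap columns in the alignment.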
The 2nxf structure was submitted to the DALI server (Holm & Sander, 1996) to search for structurally similar proteins among all solved structures deposited in the Protein Data Bank (PDB). These results were then analysed using ClustalX (Larkin, et al., 2007) for any conserved regions in the sequence that may provide vital information about the binding sites or sequences essential to the fold of the protein. CASTp (Dundas, Ouyang et al, 2006) and PyMOL (DeLano Scientific LLC, 2007) were used for visualising the protein’s secondary and tertiary structure as well as the possible binding sites. Both the CATH (Orengo, Michie, Jones, Jones, Swindells, & Thornton, 1997) and SCOP (Murzin, Brenner et al, 1995) databases were used to identify and characterise the structure of the protein; LIGPLOT (Wallace, Laskowski, & Thornton, 1995) was used to generate schematic diagrams of the ligand-protein interactions at the binding site.
Domain and superfamily/family identification
The Superfamily database (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/hmm.html) and the Pfam domain database (http://pfam.sanger.ac.uk/) were searched using the LOC56985 sequence (default settings).
Multiple sequence alignment and identification of the metallophosphoesterase signature sequence
The multiple sequence alignments described above were analysed for the metallophosphoesterase signature motif DX[H/X]-(X)n-GDXX[D/X]-(X)n-GNH[D/E]-(X)n-[G/X]H-(X)n-GHX[H/X] and any other conserved regions.
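A motif of this form can be screened for automatically by translating it into a regular expression. The sketch below is illustrative only, and the alignments may equally have been inspected by eye. It assumes that X stands for any residue, that a bracketed position containing X (for example [H/X]) therefore accepts any residue, that (X)n is a spacer of one or more residues, and that gap characters are stripped from aligned sequences before searching. The function names are hypothetical.

```python
import re

MOTIF = "DX[H/X]-(X)n-GDXX[D/X]-(X)n-GNH[D/E]-(X)n-[G/X]H-(X)n-GHX[H/X]"


def motif_to_regex(motif):
    """Translate the block notation above into a compiled regular expression."""
    parts = []
    for block in motif.split("-"):
        if block == "(X)n":
            parts.append(".+?")  # assumption: spacer of one or more residues
            continue
        # Tokens are either a bracketed choice such as [D/E] or a single letter.
        for token in re.findall(r"\[[^\]]+\]|[A-Z]", block):
            if token.startswith("["):
                options = token[1:-1].split("/")
                # A choice that includes X accepts any residue.
                parts.append("." if "X" in options else "[" + "".join(options) + "]")
            else:
                parts.append("." if token == "X" else token)
    return re.compile("".join(parts))


def find_motif(sequence, pattern=None):
    """Search an (optionally aligned) sequence; return a match object or None."""
    pattern = pattern or motif_to_regex(MOTIF)
    ungapped = sequence.replace("-", "").upper()  # drop alignment gaps
    return pattern.search(ungapped)
```

Under these assumptions the only strictly constrained positions are the invariant residues (D, GD, GNH[D/E], H, GH), so a match is a necessary rather than sufficient indication of a metallophosphoesterase, and hits should still be checked against the alignment.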
Structure and function analysis
The active site of 2nxf was identified and analysed using macromolecular data available at its PDBsum page. PyMOL (DeLano Scientific LLC, 2007) was used to visualise ligand-binding sites, fold organization, and topology, and to align the structural homolog 2dxl with 2nxf (structures obtained from the Protein Data Bank).
The ProFunc Structure database (http://www.ebi.ac.uk/thornton-srv/databases/profunc/index.html) was searched using 2nxf to infer biochemical function information from the three-dimensional structure.
Evolution and function analysis
A TBlastN search was conducted using the LOC56985 sequence to identify orthologs in all genomes. The human PAP P13686 was also subjected to TBlastN to compare the phylogenetic distribution of its orthologs with that of the LOC56985 orthologs.
Database searching for high-throughput gene expression data
The Genomics Institute of the Novartis Research Foundation (GNF) database (Su et al., 2002) was searched using the LOC56985 sequence to obtain data on tissue-specific expression.