Materials and Methods (1zkd)
Determining the Structure
A protein search on PDB and NCBI Entrez was performed to identify the structure of the given protein. Next, proteins with similar structures were identified using the DALI server, a network service that compares 3D protein structures against those in the Protein Data Bank and generates multiple alignments of structural neighbours. Comparing 3D structures can reveal biologically interesting similarities that are not detectable by comparing sequences. A CE (Combinatorial Extension) comparison was then performed among the identified structures; CE calculates a pairwise structural alignment between two chains. Surface properties of the given proteins were obtained using PyMOL and RasMol. Pfam and InterPro were also used to identify the domains of the protein.
Determining the Function
The protein function prediction tool ProFunc (Laskowski et al, 2005) was used to identify the most likely biochemical functions from the submitted PDB file of 1zkd. ProFunc combines several approaches, including Secondary Structure Matching (Krissinel & Henrick, 2004), Ligand Template Matching, Reverse Template Matching (Laskowski et al, 2005) and SUPERFAMILY searches against libraries of hidden Markov models (HMMs) (Gough et al, 2001; Madera et al, 2004) derived from SCOP families.
The genomic context of the 1zkd gene in the genome of Rhodopseudomonas palustris was obtained from the NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=g).
The Nuclear Protein Localisation Prediction tool "Nucleo" (Hawkins et al, 2006) was used to determine the likelihood that the human and mouse orthologs are located in the nucleus.
Data from the LOCATE database (Fink et al, 2006) were used to obtain information about the localisation of the proteins in the cell.
Counts of Expressed Sequence Tags (ESTs), taken from the NCBI UniGene database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene), were used to obtain expression profiles for the mouse and human orthologs.
Surface charges of 1zkd were modelled using the Adaptive Poisson-Boltzmann Solver (APBS) (Baker et al, 2001), and the results were visualised in PyMOL (http://www.pymol.org).
Determining the Evolution
1. BLASTP was first used to collect homologous sequences for a multiple sequence alignment (MSA), using [Sequence 3], the bacterial ortholog of 1ZKD from Rhodopseudomonas palustris, as the query.
2. The BLAST output (in FASTA sequence file format) was then loaded into ClustalX to determine how well suited the alignment was for constructing a phylogenetic tree. Sequences that caused gaps in the MSA were omitted, and ClustalX was run again to produce an ideal MSA.
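The manual pruning in step 2 can be approximated programmatically. The sketch below is not part of the original workflow: it assumes the alignment has been exported as aligned FASTA, and it uses the fraction of gap characters per sequence as one possible heuristic for spotting problem sequences. The function names and the 0.5 threshold are hypothetical choices made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}, keeping input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gap_fraction(seq):
    """Fraction of alignment positions in one sequence that are gaps ('-')."""
    return seq.count("-") / len(seq) if seq else 0.0


def flag_gappy(records, threshold=0.5):
    """Names of sequences whose gap fraction exceeds the (assumed) threshold."""
    return [name for name, seq in records.items()
            if gap_fraction(seq) > threshold]


# Toy aligned FASTA, invented for illustration only.
example = """>seqA
MKT-LLV
>seqB
M-----V
>seqC
MKTALLV
"""
records = parse_fasta(example)
flagged = flag_gappy(records)
print(flagged)
```

Flagged sequences would still need to be inspected by eye before removal, since a long insertion in one sequence shows up as gaps in all the others rather than in the sequence itself.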
3. The ideal MSA from ClustalX was saved in .phy (PHYLIP) format so that it could be used in Protdist to calculate a distance matrix using the Dayhoff PAM matrix, which scales the probabilities of change from one amino acid to another.
>> Input file for Protdist eg:(target_align.phy)
>> Output file for Protdist eg:(target_align.dis)
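To illustrate what a pairwise distance matrix over an alignment looks like, the sketch below computes a simple p-distance (the proportion of differing residues) for each pair of sequences. This is a deliberately simplified stand-in and not the Dayhoff PAM model that Protdist applies; the function names and the toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing residues, ignoring columns where either has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def distance_matrix(seqs):
    """Return (names, square matrix) of pairwise p-distances."""
    names = list(seqs)
    matrix = [[p_distance(seqs[i], seqs[j]) for j in names] for i in names]
    return names, matrix


# Toy alignment, invented for illustration only.
toy = {"A": "MKTAL", "B": "MKSAL", "C": "M-SGL"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

A p-distance underestimates divergence between distant sequences because it does not correct for multiple substitutions at the same site, which is one reason a model-based matrix such as Dayhoff PAM is used in the actual workflow.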
4. The Neighbor program (neighbor-joining method) was used to construct a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree produced here is unrooted.
>> Input file for Neighbor eg:(target_align.dis)
>> Output file for Neighbor eg:(target_align.nei)
>> Outtree file for Neighbor eg:(target_align.ph)
Similarly, critical inspection of the outtree file in the TreeView program allows further elimination of unwanted sequences that distort the evolutionary tree.
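The successive clustering performed in step 4 can be sketched as follows. This is a minimal textbook implementation of neighbor-joining written for illustration, not the code of the PHYLIP Neighbor program; the function name, the four-decimal branch-length formatting and the toy distance matrix are assumptions.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted Newick tree from a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances are kept in a dict of dicts keyed by node label; an internal
    # node is labelled with the Newick text of the subtree below it.
    dist = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other active nodes.
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = dist[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dist[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new node to every remaining node.
        dist[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
                dist[new][c] = d
                dist[c][new] = d
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes at a central trifurcation (unrooted tree).
    a, b, c = nodes
    len_a = (dist[a][b] + dist[a][c] - dist[b][c]) / 2
    len_b = (dist[a][b] + dist[b][c] - dist[a][c]) / 2
    len_c = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
    return f"({a}:{len_a:.4f},{b}:{len_b:.4f},{c}:{len_c:.4f});"


# Toy additive distance matrix, invented for illustration only.
taxa = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, toy_matrix)
print(tree)
```

The final trifurcation reflects the fact that neighbor-joining yields an unrooted tree; any root shown by a tree viewer is a display choice unless an outgroup is specified.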
5. Finally, to determine the reliability of the lineages in the evolutionary tree produced, bootstrapping is required; it is done using the following programs: Seqboot, Protdist, Neighbor and Consense.
Seqboot is a general bootstrapping and data set translation tool that generates multiple data sets, each a resampled version of the input data set. In this case, 100 bootstrap samples are produced.
>> Input file for Seqboot eg:(target_align.phy)- from ClustalX
>> Output file for Seqboot eg:(target_align.aln)
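The resampling idea behind this step can be sketched as below: each replicate is built by drawing alignment columns at random with replacement, so every replicate has the same length as the original alignment. This is an illustration of the principle only, not Seqboot itself; the function names, the fixed seed and the toy alignment are assumptions.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement; all sequences share the draw."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}


def bootstrap_replicates(seqs, n=100, seed=1):
    """Generate n resampled alignments; the seed makes the run repeatable."""
    rng = random.Random(seed)
    return [bootstrap_alignment(seqs, rng) for _ in range(n)]


# Toy alignment, invented for illustration only.
alignment = {"A": "MKTAL", "B": "MKSAL", "C": "MRSGL"}
replicates = bootstrap_replicates(alignment, n=100)
print(len(replicates), replicates[0])
```

Because the same column indices are applied to every sequence, each resampled column is still a genuine column of the original alignment, which is what keeps the replicates comparable.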
Protdist is run again using the Dayhoff PAM matrix - use the same parameters as Step 3, but change replicates to 100
>> Input file for Protdist eg:(target_align.aln)
>> Output file for Protdist eg:(target_align.dis)
The Neighbor program is run again - use the same parameters as Step 4, but change M to multiple and 100 data sets
>> Input file for Neighbor eg:(target_align.dis)
>> Output file for Neighbor eg:(target_align.nei)
>> Outtree file for Neighbor eg:(target_boot.ph)
Consense reads a file of computer-readable trees and prints out a consensus tree - make sure the output file and the outtree file for Consense have different names
>> Input file for Consense eg:(target_boot.ph) - the outtree from the bootstrap run of Neighbor
>> Output file for Consense eg:(consense_boot.ph) - To view bootstrap values
>> Outtree file for Consense eg:(consense_tree.ph) - To view consensus tree
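The bootstrap values reported for a consensus tree are essentially the fraction of replicate trees in which a given grouping appears. The sketch below counts this for a list of Newick strings by reducing each tree to its non-trivial splits (bipartitions of the leaves). It is a simplified illustration and not a reimplementation of Consense; the function names, the small Newick tokenizer and the toy trees are assumptions, and the tokenizer handles only plain unquoted labels.

```python
import re
from collections import Counter


def clades(newick):
    """Return one frozenset of leaf names per parenthesised group in the tree."""
    text = re.sub(r"\s+", "", newick)
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    stack, found, prev = [], set(), None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.add(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok not in ",;" and prev != ")":
            # A leaf label, optionally followed by ":branch_length".
            stack[-1].add(tok.split(":")[0])
        # Tokens directly after ")" are support values or branch lengths.
        prev = tok
    return found


def splits(newick):
    """Non-trivial splits, each stored as the side without the reference leaf."""
    groups = clades(newick)
    leaves = max(groups, key=len)  # the outermost group holds every leaf
    ref = min(leaves)
    result = set()
    for g in groups:
        side = leaves - g if ref in g else g
        if 1 < len(side) < len(leaves) - 1:
            result.add(frozenset(side))
    return result


def split_support(trees):
    """Fraction of trees containing each split."""
    counts = Counter()
    for t in trees:
        counts.update(splits(t))
    return {s: c / len(trees) for s, c in counts.items()}


# Toy replicate trees, invented for illustration only.
toy_trees = [
    "((a,b),c,(d,e));",
    "(a,b,(c,(d,e)));",
    "((a,c),b,(d,e));",
]
support = split_support(toy_trees)
for split, frac in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(split), round(frac, 2))
```

Storing each split as the side that excludes a fixed reference leaf makes the count independent of where an unrooted tree happens to be drawn as rooted, so the first two toy trees are recognised as the same topology.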
6. To obtain a tree labelled with organism names instead of GI numbers, go to (http://foo.maths.uq.edu.au/~huber/BIOL3004/gi2name.pl).
>> Upload reference: the FASTA-format file of all BLASTP search results
>> Upload consensus tree from Consense eg:(consense_tree.ph)
A phylum output file is produced; this is then edited into the TreeView file of the consensus tree.
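The relabelling in step 6 can be pictured with the sketch below. The internals of gi2name.pl are not described here, so this is only a guess at the general idea: it assumes NCBI-style FASTA headers of the form ">gi|NUMBER|...| description [Organism name]" and a Newick tree whose leaves are bare GI numbers. The function names and the toy data are hypothetical.

```python
import re


def gi_to_organism(fasta_text):
    """Map each GI number to the organism named in the last [...] of its header."""
    mapping = {}
    for line in fasta_text.splitlines():
        m = re.match(r">gi\|(\d+)\|.*\[([^\]]+)\]\s*$", line)
        if m:
            # Newick labels cannot contain unquoted spaces, so use underscores.
            mapping[m.group(1)] = m.group(2).replace(" ", "_")
    return mapping


def rename_leaves(newick, mapping):
    """Replace numeric leaf labels found in mapping; leave everything else as is."""
    def swap(match):
        return match.group(1) + mapping.get(match.group(2), match.group(2))
    # A leaf label follows "(" or "," and is followed by ":", "," or ")".
    # Numbers after ")" (support values) or ":" (branch lengths) do not match.
    return re.sub(r"([(,])(\d+)(?=[:,)])", swap, newick)


# Toy FASTA headers and tree, invented for illustration only.
toy_fasta = """>gi|111|ref|XP_1.1| protein x [Homo sapiens]
MKT
>gi|222|ref|NP_2.1| protein y [Mus musculus]
MKS
"""
toy_tree = "((111:0.1,222:0.2):0.05,333:0.3);"
names_by_gi = gi_to_organism(toy_fasta)
renamed = rename_leaves(toy_tree, names_by_gi)
print(renamed)
```

If several sequences come from the same organism, this approach produces duplicate leaf labels, so a suffix such as the GI number may need to be kept alongside the organism name.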
[Results]