2gnx Methods & Materials
The target sequence 2gnxA was searched with BLAST against a non-redundant database and the NCBI databases to find homologous sequences. All sequences with significant E-values were then checked to see whether they also had high identity values. The accession numbers of those with both significant E-values and high identities were used to obtain the sequences in FASTA format. All splice variants were removed. The remaining sequences were loaded into ClustalX to produce a multiple sequence alignment (Figure 1). The resulting alignment was then examined and any sequence that was too disparate from the rest of the alignment was removed.
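The two-step filter described above (significant E-value first, then high identity) can be sketched as follows. This is only an illustration: it assumes the hits were saved in the standard 12-column tabular BLAST output, and the cut-offs (`max_evalue`, `min_identity`) and the accession names in the example table are hypothetical, since the thresholds actually used are not recorded here.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str
    percent_identity: float
    evalue: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse 12-column tabular BLAST output (subject id, % identity, E-value)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Column 2 = subject id, column 3 = % identity, column 11 = E-value
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits


def select_hits(hits: List[Hit],
                max_evalue: float = 1e-5,    # assumed cut-off, not from the report
                min_identity: float = 40.0,  # assumed cut-off, not from the report
                ) -> List[Hit]:
    """Keep hits that pass both the E-value and the identity threshold."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.percent_identity >= min_identity]


# Hypothetical hits, for illustration only.
EXAMPLE_TABLE = "\n".join([
    "2gnxA\thypo_A\t92.5\t300\t22\t1\t1\t300\t1\t300\t1e-150\t520",
    "2gnxA\thypo_B\t55.0\t280\t120\t6\t5\t284\t3\t282\t3e-40\t160",
    "2gnxA\thypo_C\t28.0\t90\t60\t5\t10\t99\t40\t129\t2e-6\t45",
    "2gnxA\thypo_D\t75.0\t40\t10\t0\t1\t40\t1\t40\t0.5\t30",
])

selected = select_hits(parse_tabular_hits(EXAMPLE_TABLE))
accessions = [h.accession for h in selected]
print(accessions)
```

In this example hypo_C is dropped for low identity despite a significant E-value, and hypo_D is dropped for a poor E-value despite high identity, which is why both criteria are checked.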
To produce a phylogenetic tree, the programme phylip-3.63 was used. This programme uses algorithms to calculate the position of each organism relative to the others from the data provided by the multiple sequence alignment. Bootstrap values were calculated and all branching events that occurred more than 75% of the time were marked (Figure 2).
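The idea behind the 75% bootstrap cut-off can be illustrated with a small sketch. It is not how PHYLIP itself is implemented: it assumes each bootstrap replicate tree has already been reduced to the set of groupings (clades) it contains, and the organism names and replicate data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_support(replicates: List[Iterable[Clade]]) -> Dict[Clade, float]:
    """Percentage of bootstrap replicates in which each clade appears."""
    counts: Counter = Counter()
    for clades in replicates:
        counts.update(set(clades))  # count each clade once per replicate
    total = len(replicates)
    return {clade: 100.0 * n / total for clade, n in counts.items()}


def well_supported(support: Dict[Clade, float],
                   threshold: float = 75.0) -> Set[Clade]:
    """Clades recovered in strictly more than `threshold` percent of replicates."""
    return {clade for clade, pct in support.items() if pct > threshold}


# Four hypothetical bootstrap replicates, for illustration only.
hm = frozenset({"human", "mouse"})
hmc = frozenset({"human", "mouse", "chicken"})
cf = frozenset({"chicken", "frog"})
replicates = [[hm, hmc], [hm, hmc], [hm, hmc], [hm, cf]]

support = clade_support(replicates)
marked = well_supported(support)
print(sorted(sorted(clade) for clade in marked))
```

Here the human and mouse grouping appears in every replicate and is marked, while the grouping that appears in exactly 75% of replicates is not, because the criterion is "more than 75%".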
Structural analysis of our protein was carried out by submitting the PDB (Protein Data Bank) file of the 2GNX protein to a number of online tools and analysing their output. Firstly, the PDB file was submitted to Dali, a server that finds the closest structural matches to a submitted protein; it was used extensively throughout the structural analysis. The next tool used was CE, which compares two submitted proteins in terms of both sequence similarity and 3-D structure. CE was used in combination with Dali to compare our protein with the structurally related proteins from the Dali results. Another tool used was Q-SiteFinder, a website that predicts the ligand binding sites on a submitted protein. Two programs, RasMol and PyMOL, were also used in the structural analysis. With these programs it was possible to compare the 3-D structure of the protein with the closest structurally related proteins from the Dali search, in order to draw conclusions as to which residues of the hypothetical protein would be involved in ligand binding.
Functional analysis was undertaken in two phases – the first relying on the protein sequence comparisons, and the second relying on structural analysis.
Deriving Function from Sequence Data
This phase involved submitting the BC048403 FASTA sequence to several predictive databases. These included STRING, a database providing the ability to predict functional associations between proteins; Locate, a database containing data describing the membrane organization and sub-cellular location of proteins; and CDART, a tool that displays the functional domains that make up the protein and other proteins with similar domain architectures. The default search variables were used in most cases. In the case of the protein 2GNX, these tools returned no results and thus there was no further analysis concerning them.
It is important to note that relying solely on data from sequence analysis is unreliable because of the way the sequence databases work. These databases infer function from similarity to homologs; however, homologous genes can in fact be paralogs with divergent functions. Sequence analysis should therefore form part of the research but not be the only foundation for the results.
Deriving Function from Structural Data
The sequence was submitted to the following tools, which can be used to determine functional information from structural data: ProFunc – a tool used to analyse the 3D structure of a protein to help identify its likely biochemical function; ProKnow – a tool used to achieve the same ends as ProFunc but with a slightly different process; InterPro – a tool used to predict protein domains; and Pfam – which identifies protein families. These tools provided limited or no results, so a different approach was investigated.
A search was conducted using SymAtlas, a gene atlas of mouse and human expression patterns across diverse tissue sets. This tool can also find correlated microarray expression data, allowing the discovery of proteins with similar expression patterns.
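As a rough illustration of what "correlated expression" means, the sketch below ranks genes by the Pearson correlation of their expression profiles with a query profile. The use of Pearson correlation is an assumption made for this example (the measure used by the atlas is not stated here), and the gene names and expression values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(query: Sequence[float],
                        profiles: Dict[str, Sequence[float]],
                        ) -> List[Tuple[str, float]]:
    """Sort genes from most to least correlated with the query profile."""
    scores = {name: pearson(query, values) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression levels across four tissues, for illustration only.
query_profile = [1.0, 2.0, 3.0, 4.0]
other_profiles = {
    "gene_x": [2.0, 4.0, 6.0, 8.0],  # same pattern, scaled
    "gene_y": [4.0, 3.0, 2.0, 1.0],  # opposite pattern
    "gene_z": [1.0, 3.0, 2.0, 4.0],  # broadly similar pattern
}

ranked = rank_by_correlation(query_profile, other_profiles)
for name, r in ranked:
    print(f"{name}\t{r:.2f}")
```

Genes at the top of such a ranking share the query's tissue pattern and are candidates for a related function.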
Once structural analysis had progressed and some basic data were known about the structure of the protein, this information was used to further the functional analysis. Specifically, a link was sought between the results of the microarray analysis and the structural domains identified through structural analysis.
cisRED was used to find the motifs inherent in the protein, and these results were investigated for conservation between species and for any correlation with previous results.