DAP method


BlastP


  • FASTA SEQUENCE FROM NCBI ENTREZ protein = 2IJZ_A
  • Origin of query sequence = Pseudomonas aeruginosa


>gi|119390187|pdb|2IJZ|A Chain A, Crystal Structure Of Aminopeptidase
RAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTETGGRYYVTRNDSSLIAIRLGRRSP
LESGFRLVGAHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESR
LVDFRKAIAVIPNLNIHLNRAANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYE
LSFYDTQSAAVVGLNDEFIAGARLDNLLSCHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLE
QVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADRHDANHGPALNGGPVIKINSNQRYATNSETA
GFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAMHSIRELAGSHDLAHLVKVLGA
FYASSELP


  • Performed a blastp search against the non-redundant (nr) database supplied on the CD. The query sequence was chain A of the crystal structure of Pseudomonas aeruginosa aspartyl aminopeptidase.
  • An initial sequence alignment was performed using ClustalX and edited to reduce gapping; the final multiple sequence alignment was then performed with 38 sequences.
  • A bootstrapped neighbour-joining (N-J) tree was produced using ClustalX to indicate the reliability of the branches, and Treeview32 was used to view the phylogenetic tree produced from the multiple sequence alignment.


The blastp search was run from the command line, as described in the methods and on the website:

C:\blast\blastall -p blastp -d C:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.html

The FASTA-format files were then obtained:

C:\blast\fastacmd -d C:\blast\databases\nr -i filewith_img_numbers -o C:\newsequences.fasta
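The two calls above can also be assembled from a script. The following is a minimal sketch, assuming the legacy NCBI BLAST executables sit in C:\blast as in the commands shown; the function names (blastp_command, fastacmd_command, run) are hypothetical, and only the flags that appear in the text are used.

```python
import subprocess

BLAST_DIR = r"C:\blast"             # install location taken from the text
NR_DB = r"C:\blast\databases\nr"    # nr database path taken from the text


def blastp_command(query_fasta, output_file):
    """Argument list equivalent to the blastall call shown above."""
    return [BLAST_DIR + r"\blastall", "-p", "blastp",
            "-d", NR_DB, "-i", query_fasta, "-o", output_file]


def fastacmd_command(id_list_file, output_fasta):
    """Argument list equivalent to the fastacmd call shown above."""
    return [BLAST_DIR + r"\fastacmd", "-d", NR_DB,
            "-i", id_list_file, "-o", output_fasta]


def run(command):
    """Run one command; raises CalledProcessError if the tool fails.

    Only works where the legacy BLAST executables are installed.
    """
    subprocess.run(command, check=True)
```

Building the commands as lists rather than as one string avoids quoting problems if a file name contains spaces.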

Inputs used for obtaining the FASTA-format files:

pdb|2IJZ|A 
ref|YP_789908.1|
ref|YP_261475.1|  
ref|ZP_00416764.1|
ref|NP_743887.1|
ref|NP_793647.1|
ref|YP_607123.1|
ref|YP_958321.1|
ref|ZP_01894798.1|
ref|ZP_01166960.1|
ref|ZP_01738318.1|
ref|YP_436072.1| 
ref|ZP_01462550.1|
ref|YP_630602.1|
ref|YP_001615044.1|
ref|YP_747571.1|
ref|YP_113441.1|
ref|XP_001751765.1|
ref|XP_001641062.1|
ref|XP_713998.1|
gb|AAM61631.1|
ref|XP_365906.1|
ref|XP_843934.1|
ref|NP_001045513.1|
ref|XP_001566576.1|
ref|XP_001877081.1| 
gb|ACC64563.1| 
ref|XP_001492028.1|
ref|NP_001039417.1|
ref|YP_833603.1|
ref|NP_036232.2|
ref|NP_001012937.1|
ref|NP_001104301.1|
gb|EDL75426.1|
ref|NP_001085525.1|
ref|XP_462175.1|
ref|NP_956447.1|
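Several of the identifiers above carry trailing spaces, which can matter when the list is fed to another tool. As a rough sketch (the helper names parse_identifier and read_identifiers are hypothetical), the list could be cleaned and split into its database tag and accession before use:

```python
def parse_identifier(line):
    """Split an identifier such as 'ref|YP_789908.1|' into
    (database tag, accession, chain); chain is None unless present,
    as in 'pdb|2IJZ|A'."""
    fields = [f for f in line.strip().split("|") if f]
    if len(fields) < 2:
        raise ValueError(f"unrecognised identifier: {line!r}")
    tag, accession = fields[0], fields[1]
    chain = fields[2] if len(fields) > 2 else None
    return tag, accession, chain


def read_identifiers(lines):
    """Parse all non-blank lines, dropping duplicates but keeping order."""
    seen, result = set(), []
    for line in lines:
        if not line.strip():
            continue
        parsed = parse_identifier(line)
        if parsed not in seen:
            seen.add(parsed)
            result.append(parsed)
    return result
```

For example, read_identifiers(open("filewith_img_numbers")) would return one tuple per entry in the input file used above.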

The heading of every FASTA sequence obtained was changed to the organism name only, e.g.:

From

>gi|116051260|ref|YP_789908.1| putative aminopeptidase 2 [Pseudomonas aeruginosa UCBPP-PA14]

MRAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTEAGGRYYVTRNDSSLIAIRLGRRSPLESGFRLVG

AHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESRLVDFRKAIAVIPNLAIHLNRA

ANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYELSFYDTQSAAVVGLNDEFIAGARLDNLLS

CHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLEQVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADK

DANHGPALNGGPVIKINSNQRYATNSETAGFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAM

HSIRELAGSHDLAHLVKVLGAFYASSELP


To

>Pseudomonas_aeruginosa

MRAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTEAGGRYYVTRNDSSLIAIRLGRRSPLESGFRLVG

AHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESRLVDFRKAIAVIPNLAIHLNRA

ANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYELSFYDTQSAAVVGLNDEFIAGARLDNLLS

CHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLEQVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADK

DANHGPALNGGPVIKINSNQRYATNSETAGFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAM

HSIRELAGSHDLAHLVKVLGAFYASSELP


The renamed sequences were saved into a new file, organismnames.fasta.
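The renaming was done by editing the file, but the same change can be sketched in a few lines of code. The example below assumes each header ends with the organism in square brackets, as in the header shown above, and keeps only the genus and species; the function names are hypothetical.

```python
import re


def organism_label(header):
    """Return 'Genus_species' from the last [...] group of a FASTA header."""
    groups = re.findall(r"\[([^\[\]]+)\]", header)
    if not groups:
        raise ValueError(f"no organism name in header: {header!r}")
    words = groups[-1].split()
    return "_".join(words[:2])  # drops strain designations such as UCBPP-PA14


def rename_headers(fasta_text):
    """Replace every '>' header line with '>Genus_species'."""
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            renamed.append(">" + organism_label(line))
        else:
            renamed.append(line)
    return "\n".join(renamed)
```

Two caveats apply: headers without a bracketed organism (for example the PDB entry for 2IJZ) raise an error and need a name supplied by hand, and two sequences from the same species would receive the same label, which must be made unique before alignment.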


ClustalX


Figure 1.1 Multiple Sequence Alignment example

Used the ClustalX 1.83 multiple alignment tool to align C:\3rdplaceoutnames.fasta. The output format option was changed to NODE before bootstrapping so that the reliability of the branches could be seen in Treeview.


Conserved regions (*) of >gi|119390187|pdb|2IJZ|A Chain A, Crystal Structure Of Aminopeptidase were noted for structural analysis.

Output obtained : .aln file (alignment) and .dnd file (output guide tree)

Bootstrapping : .phb file obtained
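The conserved regions were read off the alignment by eye. As an illustrative sketch only, the runs of fully conserved columns (marked * on the consensus line of the alignment) could be located as follows; the function name conserved_runs and its min_length parameter are hypothetical, and the consensus line is assumed to have been joined into a single string first.

```python
def conserved_runs(consensus, min_length=1):
    """Return (start, end) alignment-column ranges, 1-based and inclusive,
    where the consensus line shows '*' (fully conserved columns)."""
    runs, start = [], None
    for column, symbol in enumerate(consensus, start=1):
        if symbol == "*":
            if start is None:
                start = column          # a new run begins here
        elif start is not None:
            if column - start >= min_length:
                runs.append((start, column - 1))
            start = None
    # close a run that reaches the end of the line
    if start is not None and len(consensus) - start + 1 >= min_length:
        runs.append((start, len(consensus)))
    return runs
```

The positions returned are alignment columns, not residue numbers of 2IJZ chain A; gaps in the 2IJZ row have to be subtracted before mapping them onto the structure.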


Treeview

Used Treeview to visualize the phylogenetic tree as:

  1. Radial Tree
  2. Rectangular Cladogram

The results from the blast search were screened and a selection of these results was used for a multiple sequence alignment using ClustalX. The alignment was bootstrapped, the bootstrap values were checked, and more sequences were added to improve the resolution of specific branches. A bootstrapped phylogram was produced, as well as a radial tree.
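ClustalX performs the bootstrapping internally. To illustrate the idea rather than ClustalX's own code, the sketch below draws one bootstrap replicate by sampling alignment columns with replacement; a tree is then rebuilt from each replicate, and the bootstrap value of a branch is the number of replicates in which it reappears. The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: sample alignment columns with replacement.

    alignment maps sequence name -> aligned sequence (all of equal length).
    The same sampled columns are used for every sequence.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_columns = lengths.pop()
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy example: three short aligned sequences, fixed seed for repeatability.
toy = {"seq_a": "MRAEL-", "seq_b": "MKAELN", "seq_c": "MRSE-N"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

Each replicate has the same number of columns as the original alignment, but some columns appear more than once and others not at all.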


Protein Folding


First, a DALI search was done to compare the 3D structure with those in the Protein Data Bank. It revealed that Aspartyl Aminopeptidase is a mol1A molecule: Probable M18-Family Aminopeptidase 2. The PDB was then searched for the structures of biological macromolecules and their relationships to sequence, function, and disease. CE, a database and tool for 3-D protein structure comparison and alignment, was used to compare the alignments between the query protein and its neighbours.


Sequence Similarity


InterProScan, which is designed to annotate newly determined sequences such as predicted proteins from genome sequencing projects, was then used to analyze the sequence. To analyze the protein further, Pfam, a large collection of multiple sequence alignments and hidden Markov models, was used to find Pfam family matches for the protein, in this case aspartyl aminopeptidase. The ProFunc server was used to help identify the likely biochemical function of the protein from its three-dimensional structure. It uses a series of methods, including fold matching, residue conservation, surface cleft analysis, and functional 3D templates, to identify both the protein’s likely active site and possible homologues in the PDB.


MOTIF identification


MOTIFs were identified using the PROSITE motif search service (Bairoch, Bucher, & Hofmann, 1997) on the Aspartyl Aminopeptidase Chain A residue sequence. The identified MOTIF patterns can be seen below.


Figure 1.2 PROSITE Motif identification of 2ijz Chain A. (Continued in Figure 1.3)



Figure 1.3 PROSITE Motif Identification of 2ijz Chain A.
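The motif search itself was run on the PROSITE service. To show what such a search does, the sketch below translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It covers only the common syntax (x, [..], {..}, repeat counts, and < or > anchors at the ends of the pattern); the function names and the toy sequence are hypothetical, and the pattern used is the general N-glycosylation pattern N-{P}-[ST]-{P}, not a result reported for 2IJZ.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):           # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):             # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # separate an optional repeat count, e.g. x(2) or x(2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"empty element in pattern: {pattern!r}")
        core, repeat = match.group(1), match.group(2)
        if core == "x":                        # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


hits = scan("AANGSAANPSA", "N-{P}-[ST]-{P}")
```

A lookahead is used in scan so that overlapping occurrences of a motif are all reported, which a plain left-to-right search would skip.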



Structural Alignment

PyMOL was used to superimpose two different structures to see, in three dimensions, how closely related they are.


EBI-EMBL

This site is a great resource for finding information on genomics. It can analyse a sequence and has links to many other databases, tools, and journals.

CluSTr

CluSTr provided a link to the UniProt database and a structural alignment of the protein to mouse.

ExPASy

Prosite performed a scan using ProRule.

Prosite predicted possible active sites with a high probability of occurrence based on sequence data. The output did not take into account enough of the predicted Asp, Glu or His residues to be considered reliable.


Figure 1.4 Prosite prediction of possible active sites based on sequence data.



UniProt

UniProt was used to identify the function based on the sequence in FASTA format, and to confirm possible active site residues.


Figure 1.5 UniProt output



Figure 1.6 UniProt output



Figure 1.7 UniProt output



Other

Other useful resources used were ProFunc, Pfam, SymAtlas (which gives expression data), and MEROPS.

Return to Aspartyl Aminopeptidase