DAP method: Difference between revisions

Latest revision as of 15:13, 9 June 2008

BlastP

FASTA SEQUENCE FROM NCBI ENTREZ protein = 2IJZ_A

Origin of query sequence = Pseudomonas aeruginosa

>gi|119390187|pdb|2IJZ|A Chain A, Crystal Structure Of Aminopeptidase
RAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTETGGRYYVTRNDSSLIAIRLGRRSP
LESGFRLVGAHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESR
LVDFRKAIAVIPNLNIHLNRAANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYE
LSFYDTQSAAVVGLNDEFIAGARLDNLLSCHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLE
QVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADRHDANHGPALNGGPVIKINSNQRYATNSETA
GFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAMHSIRELAGSHDLAHLVKVLGA
FYASSELP

A blastp search was performed against the non-redundant (nr) database supplied on the CD. The query sequence was chain A of the crystal structure of aspartyl aminopeptidase from Pseudomonas aeruginosa.

An initial sequence alignment was performed using ClustalX and edited to reduce gapping. The final multiple sequence alignment was then performed again with 38 sequences.

Treeview32 was used to view the phylogenetic tree produced from the multiple sequence alignment, and a bootstrapped neighbour-joining (N-J) tree was produced using ClustalX to indicate the reliability of the branches.

The blastp search was run as described in the methods and on the website:

C:\blast\blastall -p blastp -d C:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.html

FASTA-format files were then obtained with fastacmd:

C:\blast\fastacmd -d C:\blast\databases\nr -i filewith_img_numbers -o C:\newsequences.fasta
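The two commands above can also be assembled from a script. The following is a minimal sketch, assuming the same installation paths as shown; the helper names `blastp_command` and `fastacmd_command` are hypothetical, and only the flags that appear in the commands above are used. The commands are built as argument lists and are not executed here.

```python
def blastp_command(query, output,
                   database=r"C:\blast\databases\nr",
                   exe=r"C:\blast\blastall"):
    """Build the blastall argument list for a blastp search (hypothetical helper)."""
    return [exe, "-p", "blastp", "-d", database, "-i", query, "-o", output]


def fastacmd_command(id_file, output,
                     database=r"C:\blast\databases\nr",
                     exe=r"C:\blast\fastacmd"):
    """Build the fastacmd argument list that fetches sequences by identifier."""
    return [exe, "-d", database, "-i", id_file, "-o", output]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(blastp_command("yourfile.fasta", "usefuloutputname.html")))
```

On a machine where the legacy BLAST executables are installed, either list could be passed to `subprocess.run(cmd, check=True)`.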

Identifiers used as input for obtaining the FASTA-format files:

pdb|2IJZ|A 
ref|YP_789908.1|
ref|YP_261475.1|  
ref|ZP_00416764.1|
ref|NP_743887.1|
ref|NP_793647.1|
ref|YP_607123.1|
ref|YP_958321.1|
ref|ZP_01894798.1|
ref|ZP_01166960.1|
ref|ZP_01738318.1|
ref|YP_436072.1| 
ref|ZP_01462550.1|
ref|YP_630602.1|
ref|YP_001615044.1|
ref|YP_747571.1|
ref|YP_113441.1|
ref|XP_001751765.1|
ref|XP_001641062.1|
ref|XP_713998.1|
gb|AAM61631.1|
ref|XP_365906.1|
ref|XP_843934.1|
ref|NP_001045513.1|
ref|XP_001566576.1|
ref|XP_001877081.1| 
gb|ACC64563.1| 
ref|XP_001492028.1|
ref|NP_001039417.1|
ref|YP_833603.1|
ref|NP_036232.2|
ref|NP_001012937.1|
ref|NP_001104301.1|
gb|EDL75426.1|
ref|NP_001085525.1|
ref|XP_462175.1|
ref|NP_956447.1|
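Identifiers in this form can be split into a database tag and an accession before further processing. This is a minimal sketch under stated assumptions: `parse_identifier` is a hypothetical helper, and joining a PDB chain to its entry with an underscore (as in 2IJZ_A) is an illustrative convention.

```python
def parse_identifier(line):
    """Split e.g. 'ref|YP_789908.1|' into ('ref', 'YP_789908.1').

    PDB entries such as 'pdb|2IJZ|A' carry the chain as a third field;
    here it is appended to the entry code with an underscore (assumption).
    """
    fields = [f for f in line.strip().split("|") if f]
    if len(fields) < 2:
        raise ValueError(f"unrecognised identifier: {line!r}")
    database, accession = fields[0], fields[1]
    if database == "pdb" and len(fields) > 2:
        accession = f"{accession}_{fields[2]}"
    return database, accession


def parse_identifier_list(text):
    """Parse one identifier per line, skipping blank lines."""
    return [parse_identifier(l) for l in text.splitlines() if l.strip()]
```

Trailing spaces, which occur on some of the lines above, are stripped before splitting.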

The header of every FASTA sequence obtained was changed to the organism name only, e.g.:

From

>gi|116051260|ref|YP_789908.1| putative aminopeptidase 2 [Pseudomonas aeruginosa UCBPP-PA14]
MRAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTEAGGRYYVTRNDSSLIAIRLGRRSPLESGFRLVG
AHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESRLVDFRKAIAVIPNLAIHLNRA
ANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYELSFYDTQSAAVVGLNDEFIAGARLDNLLS
CHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLEQVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADK
DANHGPALNGGPVIKINSNQRYATNSETAGFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAM
HSIRELAGSHDLAHLVKVLGAFYASSELP

To

>Pseudomonas_aeruginosa
MRAELNQGLIDFLKASPTPFHATASLARRLEAAGYRRLDERDAWHTEAGGRYYVTRNDSSLIAIRLGRRSPLESGFRLVG
AHTDSPCLRVKPNPEIARNGFLQLGVEVYGGALFAPWFDRDLSLAGRVTFRANGKLESRLVDFRKAIAVIPNLAIHLNRA
ANEGWPINAQNELPPIIAQLAPGEAADFRLLLDEQLLREHGITADVVLDYELSFYDTQSAAVVGLNDEFIAGARLDNLLS
CHAGLEALLNAEGDENCILVCTDHEEVGSCSHCGADGPFLEQVLRRLLPEGDAFSRAIQRSLLVSADNAHGVHPNYADK
DANHGPALNGGPVIKINSNQRYATNSETAGFFRHLCQDSEVPVQSFVTRSDMGCGSTIGPITASQVGVRTVDIGLPTFAM
HSIRELAGSHDLAHLVKVLGAFYASSELP

The renamed sequences were saved into a new file, organismnames.fasta.
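The renaming step could be automated along the following lines. This is a minimal sketch rather than the procedure actually used: the function names are hypothetical, and it assumes the organism appears in square brackets at the end of the header and that the first two words (genus and species) are the name to keep.

```python
import re


def organism_header(header):
    """Turn an NCBI-style FASTA header into '>Genus_species'."""
    match = re.search(r"\[([^\]]+)\]\s*$", header)
    if match is None:
        return header  # no organism tag found: leave the header unchanged
    genus_species = match.group(1).split()[:2]
    return ">" + "_".join(genus_species)


def rename_headers(lines):
    """Rewrite header lines only; sequence lines pass through untouched."""
    return [organism_header(l) if l.startswith(">") else l for l in lines]
```

Two sequences from the same species would receive the same name under this rule, so the result should be checked for duplicate headers before alignment.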

ClustalX

Figure 1.1 Multiple Sequence Alignment example

The ClustalX 1.83 multiple alignment tool was used to align C:\3rdplaceoutnames.fasta. Before bootstrapping, the output format option was changed to NODE so that the reliability of the branches can be seen in Treeview.

Conserved regions (*) of >gi|119390187|pdb|2IJZ|A Chain A, Crystal Structure Of Aminopeptidase were noted for structural analysis.
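The '*' marker flags alignment columns in which every sequence has the same residue. The idea can be sketched as follows; this is an illustration with a hypothetical function name, not ClustalX's own code, and it reproduces only the '*' marker (ClustalX also prints ':' and '.' for partially conserved columns).

```python
def conservation_line(aligned):
    """Return a string with '*' under every fully conserved column."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # A column counts as conserved only if it holds one residue and no gap.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)
```

The input is a list of equal-length aligned rows, for example the sequence rows taken from the .aln file.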

Output obtained: .aln file (alignment) and .dnd file (output guide tree)

Bootstrapping : .phb file obtained

Treeview

Treeview was used to visualize the phylogenetic tree as:

Radial Tree
Rectangular Cladogram

The results from the BLAST search were then screened, and a selection of these results was used for a multiple sequence alignment in ClustalX. The resulting tree was bootstrapped and the bootstrap values were checked; more sequences were then added to improve the resolution of specific branches. A bootstrapped phylogram was produced, as well as a radial tree.
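Bootstrapping resamples the columns of the alignment with replacement, rebuilds a tree from each resampled alignment, and reports how often each branch recurs. ClustalX does this internally; the sketch below illustrates only the resampling step, with a hypothetical function name, and is not ClustalX's implementation.

```python
import random


def bootstrap_alignment(aligned, rng=None):
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same column picks are
    applied to every sequence so that columns stay intact.
    """
    rng = rng or random.Random()
    n_cols = len(aligned[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in aligned]
```

Each replicate has the same dimensions as the original alignment. A branch that appears in most replicate trees is considered well supported, which is why low bootstrap values prompted the addition of more sequences.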

Protein Folding

A DALI search was first run to compare the 3D structure with those in the Protein Data Bank. It identified Aspartyl Aminopeptidase as a mol1A molecule: Probable M18-Family Aminopeptidase 2. The PDB was then searched for structures of biological macromolecules and their relationships to sequence, function, and disease. CE, a database and tool for 3-D protein structure comparison and alignment, was used to compare the alignments between the query protein and its neighbours.

Sequence Similarity

InterProScan, which annotates newly determined sequences and predicted proteins from genome sequencing projects, was then used to analyze the sequence. To analyze the protein further, Pfam, a large collection of multiple sequence alignments and hidden Markov models, was used to find Pfam family matches for the protein, in this case aspartyl aminopeptidase. The ProFunc server was used to help identify the likely biochemical function of the protein from its three-dimensional structure. It uses a series of methods, including fold matching, residue conservation, surface cleft analysis, and functional 3D templates, to identify both the protein’s likely active site and possible homologues in the PDB.

MOTIF identification

MOTIFs were identified using the PROSITE motif search service (Bairoch, Bucher, & Hofmann, 1997) on the Aspartyl Aminopeptidase Chain A residue sequence. The identified MOTIF patterns can be seen below.

Figure 1.2 PROSITE Motif identification of 2ijz Chain A. (Continued in Figure 1.3)

Figure 1.3 PROSITE Motif Identification of 2ijz Chain A.
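PROSITE patterns are written in a compact syntax that maps closely onto regular expressions. The sketch below shows how such a pattern could be scanned against a sequence locally. It is an illustration with hypothetical function names, not the PROSITE service's own scanner, and it handles only the common syntax elements. The N-glycosylation pattern N-{P}-[ST]-{P} appears only as a familiar example, not as a result reported for 2IJZ.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex."""
    regex = pattern.rstrip(".").replace("-", "")
    # {ABC} means "any residue except A, B or C".
    regex = regex.replace("{", "[^").replace("}", "]")
    # x(2,4) means two to four repeats of the preceding element.
    regex = regex.replace("(", "{").replace(")", "}")
    # Lowercase x is the wildcard; residues are uppercase.
    regex = regex.replace("x", ".")
    # < and > anchor the pattern to the N- and C-terminus.
    return regex.replace("<", "^").replace(">", "$")


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
```

The lookahead wrapper lets overlapping hits be reported, which a plain `re.finditer` over the pattern would skip.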

Structural Alignment

PyMOL was used to superimpose two different structures in three dimensions to see how closely related they are.

EBI-EMBL

This site is a great resource for finding information on genomics. It can analyse a sequence and has links to many other databases, tools, and journals.

CluSTr

CluSTr provided a link to the UniProt database and a structural alignment of the protein to mouse.

ExPASy

A Prosite scan was performed using ProRule.

Prosite predicted possible active sites with a high probability of occurrence based on sequence data. The output did not take into account enough of the predicted Asp, Glu or His residues to be considered reliable.

Figure 1.4 Prosite predicted possible active sites with a high probability of occurrence based on sequence data. The output did not take into account enough of the predicted Asp, Glu or His residues to be considered reliable.

UniProt

UniProt was used to identify the function from the sequence in FASTA format and to confirm possible active site residues.

Figure 1.5 UniProt output

Figure 1.6 UniProt output

Figure 1.7 UniProt output

Other

Other useful resources used were ProFunc, Pfam, SymAtlas (which gives expression data), and MEROPS.

[1]Return to Aspartyl Aminopeptidase

