Methods and Websites: Difference between revisions

From MDWiki
 
=== Wiki stuff ===
[http://en.wikipedia.org/wiki/Wikipedia:How_to_edit_a_page How to edit a page on Wikipedia]


[http://meta.wikimedia.org/wiki/Help:Editing Editing help] (since our help is not yet functional)




=== Websites with useful information or software ===


* RCSB Protein Database: [http://www.rcsb.org/pdb/Welcome.do PDB]


* Structural Classification of Proteins: [http://scop.mrc-lmb.cam.ac.uk/scop/ SCOP]
* Structural Comparison of Proteins: [http://www.ebi.ac.uk/dali/ Dali], [http://cl.sdsc.edu/ CE]
* Multiple Structure Alignment: [http://bioinformatics.albany.edu/~cemc/ CEMC]


* Protein families database: [http://pfam.wustl.edu Pfam at St Louis] [http://www.sanger.ac.uk/Software/Pfam/ Pfam at Sanger]


* Clusters of Orthologous groups: [http://www.ncbi.nlm.nih.gov/COG/ COG]


* Protein Domain prediction: [http://www.ebi.ac.uk/interpro/ InterPro]


* Inference of Protein Function from Protein Structure: [http://www.doe-mbi.ucla.edu/Services/ProKnow/ ProKnow] [http://www.ebi.ac.uk/thornton-srv/databases/ProFunc Profunc]


* Protein-protein interaction:<br>
[http://www.bork.embl-heidelberg.de/STRING/ STRING]: a database of predicted functional associations between proteins<br>
[http://mips.gsf.de/proj/ppi/ PPI]: The MIPS Mammalian Protein-Protein Interaction Database<br>
[http://www.ihop-net.org/UniPub/iHOP/ iHOP] (Information Hyperlinked over Proteins). Protein association network built by literature mining


* A program to score residue conservation in a multiple sequence alignment [http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl Scorecons]


* Computed atlas of surface topography of proteins [http://sts.bioengr.uic.edu/castp/calculation.php CASTp]


* Ligand Binding Site Prediction [http://www.bioinformatics.leeds.ac.uk/qsitefinder/ Q-siteFinder]


* [http://foo.maths.uq.edu.au/~huber/BIOL3004/gi2name.pl Webserver] that converts sequence identifiers into species names.


The website replaces sequence identifiers with organism taxonomies. Upload the original FASTA file containing all sequences together with a second file (e.g. a Newick tree file, though any other text format works). The file returned by the web page is a copy of the second file in which identifiers have been replaced by taxonomies wherever possible.


* [http://reactome.org Reactome] a curated knowledgebase of biological pathways


Additional tips for [[Deep Evolutionary Analysis]]




*  [http://zope.bioinfo.cnio.es/bionlp_tools/all_bionlp_tools Collection of bio-nlp tools] Tools for biological natural language processing (literature mining)


[[ Tools/Papers for function analysis ]]


----


=== CD/DVD software, basic how to use ===


blast  clustalx  muscle  seaview  phylip-3.36 treeview rasmol  pymol






==== Blast ====




Open a command prompt (under Accessories); it should start in H: (your student directory).




'' '''D:\'''blast\bin\blastall -p blastp -d D:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.blast''


'''NOTE:''' '''D''' is the letter of your CD-ROM.


'''NOTE:''' You need to specify an input file (e.g. one on H:\, your student directory).


'''NOTE:''' You need to be able to '''write''' to the output directory, e.g. H:\ (your student directory).


e.g. '' '''D:\'''blast\bin\blastall -p blastp -d '''D:'''\blast\databases\nr -i '''H:'''\yourfile.fasta -o '''H:\'''usefuloutputname.blast''
 




-p the blast program to use: blastp, blastn


-d the database to use:


On the DVD we provide a non-redundant sequence database (''nr''), sequences of the proteins in the PDB (''pdb'') and a selected subset of the RefSeq database (''fungi invertebrate plant protozoa vertebrate_mammalian vertebrate_other''). You can search several databases at once by listing them inside quotes: ''-d "vertebrate_mammalian vertebrate_other"''




-i input, query sequence (in FastaFormat)


-o output file to write blast results to.


==== Psi-blast ====




Psi-blast is run in much the same way, but you need to use "blastpgp" and be aware of the -j and -h options.  Also remember that psi-blast will generally be slower, because it first runs a normal blast search, then builds a profile from the hits and repeats the search with that profile in later rounds.




''E:\blast\blastpgp -d e:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.blast -j 3 -h 0.000001''
-j maximum number of rounds to do  (it will stop earlier once a round finds no new matches)


-h E-value cut-off for including a hit in the profile used in the next round






The documentation with all possible options is on the CD/DVD under blastdoc.








==== Obtaining FastaFormat files of the sequences found with blast: ====






Call up a command prompt.




''E:\blast\bin\fastacmd -d E:\blast\databases\nr -i filewith_img_numbers -o H:\newsequences.fasta ''




-i  the input file should list the "accession numbers" of the sequences you want, one per line, taken from the same database you used in the blast search, for example:


1234567


1234589


1456789






[[Media:ExtractIDs.doc | ExtractIDs.doc]] shows a fast, painless way to prepare the input file from your blast result.






The complete fastacmd documentation is [http://biowulf.nih.gov/apps/blast/doc/fastacmd.html here].


==== Clustal ====






Click on the clustalx.exe icon in the clustal folder. Load sequences (you can use "browse" to go to your student area files) in FastaFormat.






Select options from the various clustal menu items.






Alignment output defaults to ''.aln'' (which can be loaded back into clustal later); select phylip output format also (''.phy'') for phylip analysis.


Before bootstrapping in clustal, change the bootstrap label setting in the output format options from BRANCH to NODE. Otherwise you will not be able to see the reliability of the branches in treeview (shown as internal edge labels).




==== Phylip ====






Click on the icon for the appropriate program in the phylip '''exe''' folder.  Type in the input file name, e.g. ''H:\BIOL3004\mydata.phy''.  Most phylip programs take ''.phy'' input files; '''neighbor''' takes a distance matrix produced by '''protdist''', '''dnadist''' or similar.




For more instructions on how to use phylip to construct a phylogenetic tree, see the [[Phylogenetic tree]] page






Complete phylip documentation is also on the DVD: click on the phylip.html document in the phylip folder; it has links to documentation for specific programs. Or, on the web, you can find it [http://evolution.genetics.washington.edu/phylip/phylip.html here].










==== Treeview ====






Click on treev32.exe in the Treeview folder.  The input file should be a tree in NewHampshire format.






From clustal:  ''filename.ph  filename.phb''






From phylip: ''outtree'' (renamed appropriately)










==== Pymol ====


PyMOL is a powerful molecular visualiser.
Click on PyMOL.exe in the Pymol folder.  The input is a protein structure file, such as a PDB file.


'''The best place to learn about PyMOL is the PyMOL wiki: http://www.pymolwiki.org/index.php/Main_Page'''


Some brief notes on using PyMOL to do structural alignment: [[PyMOL alignment]]


How to use APBS within PyMOL is described here.


How to visualise surface topographies computed by [http://sts.bioengr.uic.edu/castp/calculation.php CASTp] is described [http://sts.bioengr.uic.edu/castp/pymol.php here]




More information on PyMOL will be uploaded soon. The [http://pymol.sourceforge.net/newman/ref/toc.html PyMOL Reference Manual] and the [http://pymol.sourceforge.net/newman/user/toc.html PyMOL User Manual] may also be useful.


a brief [http://137.189.50.96/kbwong/teaching/pymol/pymol_tutorial.html PyMOL tutorial]


==== Rasmol ====






Click on rasmol.exe in the Rasmol folder.  The input is a protein structure file, such as a PDB file.






==== Perhaps useful links ====


http://www.soe.ucsc.edu/~karplus/compbio_pages.html






=LAST MINUTE HELP=


*If you need to upload files with an extension other than 'png', 'gif', 'jpg', 'jpeg', 'pdf', 'doc', 'txt', 'aln', 'dnd' or 'ppt', please email s4026869@student.uq.edu.au


*A tool to convert ppt/word docs into MediaWiki-friendly format: http://openwetware.org/wiki/Converting_documents_to_mediawiki_markup


*PyMOL does not appear to work from the DVD provided. Linux/Windows/Mac versions can be downloaded to your home PC from: http://pymol.sourceforge.net/


*If you still have no luck with PyMOL, there are other molecular viewers, such as Swiss-PdbViewer (http://www.expasy.ch/spdbv/) and VMD (http://www.ks.uiuc.edu/Research/vmd/).
-----------------------------------------------------------------






--[[User:ThomasHuber|ThomasHuber]] 13:43, 24 April 2007 (EST)

Latest revision as of 08:05, 19 May 2009

Wiki stuff

How to edit a page on Wikipedia

Editing help (since our help is not yet functional)


Websites with useful information or software

  • RCSB Protein Database: PDB
  • Structural Classification of Proteins: SCOP
  • Structural Comparison of Proteins: Dali, CE
  • Multiple Structure Alignment: CEMC
  • Clusters of Orthologous groups: COG
  • Protein-protein interaction:

STRING: a database of predicted functional associations between proteins
PPI: The MIPS Mammalian Protein-Protein Interaction Database
iHOP (Information Hyperlinked over Proteins). Protein association network built by literature mining

  • Scorecons: a program to score residue conservation in a multiple sequence alignment
  • CASTp: computed atlas of surface topography of proteins
  • Webserver that converts sequence identifiers into species names.

The website replaces sequence identifiers with organism taxonomies. Upload the original FASTA file containing all sequences together with a second file (e.g. a Newick tree file, though any other text format works). The file returned by the web page is a copy of the second file in which identifiers have been replaced by taxonomies wherever possible.

  • Reactome a curated knowledgebase of biological pathways
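
The identifier replacement performed by the webserver mentioned above can be illustrated with a short Python sketch. This is not the webserver's actual code: it assumes NCBI-style FASTA headers of the form `>identifier description [Organism name]`, and the function names `species_by_identifier` and `replace_identifiers` are made up for this example.

```python
import re


def species_by_identifier(fasta_text):
    """Map each sequence identifier to the organism named in its header.

    Assumption: the organism is given in square brackets at the end of
    the header line, as in NCBI-style FASTA files.
    """
    mapping = {}
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        organism = re.search(r"\[([^\]]+)\]\s*$", line)
        if organism:
            # Unquoted Newick labels cannot contain spaces, so use underscores.
            mapping[fields[0]] = organism.group(1).replace(" ", "_")
    return mapping


def replace_identifiers(text, mapping):
    """Replace every known identifier in text; unknown ones are left as-is."""
    if not mapping:
        return text
    # Try longer identifiers first, and refuse matches that are only part
    # of a longer word (e.g. "seq1" inside "seq10").
    alternatives = "|".join(
        re.escape(key) for key in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


fasta = ">seq1 protein A [Homo sapiens]\nMKV\n>seq2 protein B [Mus musculus]\nMKL\n"
tree = "(seq1:0.1,seq2:0.2,seq10:0.3);"
print(replace_identifiers(tree, species_by_identifier(fasta)))
```

Identifiers with no recognisable organism (here `seq10`) are left unchanged, which mirrors the "wherever possible" behaviour described above.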

Additional tips for Deep Evolutionary Analysis


Tools/Papers for function analysis


CD/DVD software, basic how to use

blast clustalx muscle seaview phylip-3.36 treeview rasmol pymol


Blast

Open a command prompt (under Accessories); it should start in H: (your student directory).


D:\blast\bin\blastall -p blastp -d D:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.blast

NOTE: D is the letter of your CD-ROM drive.

NOTE: You need to specify an input file (e.g. one on H:\, your student directory).

NOTE: You need to be able to write to the output directory, e.g. H:\ (your student directory).

e.g. D:\blast\bin\blastall -p blastp -d D:\blast\databases\nr -i H:\yourfile.fasta -o H:\usefuloutputname.blast


-p the blast program to use: blastp, blastn

-d the database to use:

On the DVD we provide a non-redundant sequence database (nr), sequences of the proteins in the PDB (pdb) and a selected subset of the RefSeq database (fungi invertebrate plant protozoa vertebrate_mammalian vertebrate_other). You can search several databases at once by listing them inside quotes: -d "vertebrate_mammalian vertebrate_other"


-i input, query sequence (in FastaFormat)

-o output file to write blast results to.
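
If you prefer to start blast from a script, the four options above map directly onto an argument list. The following Python sketch only assembles the command; the helper name `build_blastall_command` and its default path are assumptions for illustration, using the example paths from this page.

```python
def build_blastall_command(query, output, database, program="blastp",
                           blastall=r"D:\blast\bin\blastall"):
    """Assemble a blastall argument list from the options described above.

    The default blastall path is an assumption: replace D: with the
    letter of your CD/DVD drive. Raw strings (r"...") keep the
    backslashes in Windows paths literal.
    """
    return [
        blastall,
        "-p", program,   # blast program, e.g. blastp or blastn
        "-d", database,  # database (or several, separated by spaces)
        "-i", query,     # query sequence in FASTA format
        "-o", output,    # file the blast results are written to
    ]


cmd = build_blastall_command(
    query=r"H:\yourfile.fasta",
    output=r"H:\usefuloutputname.blast",
    database=r"D:\blast\databases\nr",
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the search; because each option value is a separate list element, a multi-database value such as `"vertebrate_mammalian vertebrate_other"` reaches blastall as a single argument without manual quoting.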

Psi-blast

Psi-blast is run in much the same way, but you need to use "blastpgp" and be aware of the -j and -h options. Also remember that psi-blast will generally be slower, because it first runs a normal blast search, then builds a profile from the hits and repeats the search with that profile in later rounds.


E:\blast\blastpgp -d e:\blast\databases\nr -i yourfile.fasta -o usefuloutputname.blast -j 3 -h 0.000001

-j maximum number of rounds to do (it will stop earlier once a round finds no new matches)

-h E-value cut-off for including a hit in the profile used in the next round


The documentation with all possible options is on the CD/DVD under blastdoc.



Obtaining FastaFormat files of the sequences found with blast:

Call up a command prompt.


E:\blast\bin\fastacmd -d E:\blast\databases\nr -i filewith_img_numbers -o H:\newsequences.fasta


-i the input file should list the "accession numbers" of the sequences you want, one per line, taken from the same database you used in the blast search, for example:

1234567

1234589

1456789


ExtractIDs.doc shows a fast, painless way to prepare the input file from your blast result.


The complete fastacmd documentation is at http://biowulf.nih.gov/apps/blast/doc/fastacmd.html
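
As an alternative to preparing the fastacmd input file by hand, the identifiers can be pulled out of the blast result with a small script. The Python sketch below is an illustration under one loud assumption: that hits in your blast output are labelled in the form `gi|1234567|...`, as is typical for NCBI databases such as nr. If your identifiers look different, the pattern must be adapted. The function names are made up for this example.

```python
import re

# Assumption: hit identifiers appear as "gi|<digits>|" in the blast output.
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def extract_ids(blast_text):
    """Return the gi numbers found in blast_text, without duplicates,
    in the order in which they first appear."""
    ids = []
    for match in GI_PATTERN.finditer(blast_text):
        gi = match.group(1)
        if gi not in ids:
            ids.append(gi)
    return ids


def write_id_file(ids, path):
    """Write one identifier per line, the layout fastacmd -i expects."""
    with open(path, "w") as handle:
        for gi in ids:
            handle.write(gi + "\n")


example = (
    "gi|1234567|ref|NP_000001.1| hypothetical protein\n"
    ">gi|1234567|ref|NP_000001.1| hypothetical protein\n"
    "gi|1234589|ref|NP_000002.1| another protein\n"
)
print(extract_ids(example))
```

Each hit usually appears twice in a blast report (once in the summary list and once above its alignment), which is why duplicates are dropped.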

Clustal

Click on the clustalx.exe icon in the clustal folder. Load sequences (you can use "browse" to go to your student area files) in FastaFormat.


Select options from the various clustal menu items.


Alignment output defaults to .aln (which can be loaded back into clustal later); select phylip output format also (.phy) for phylip analysis.


Before bootstrapping in clustal, change the bootstrap label setting in the output format options from BRANCH to NODE. Otherwise you will not be able to see the reliability of the branches in treeview (shown as internal edge labels).


Phylip

Click on the icon for the appropriate program in the phylip exe folder. Type in the input file name, e.g. H:\BIOL3004\mydata.phy. Most phylip programs take .phy input files; neighbor takes a distance matrix produced by protdist, dnadist or similar.
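
For orientation, a .phy file is plain text: a first line with the number of sequences and the alignment length, then one entry per sequence whose name occupies exactly ten characters. Clustal writes this format for you; the Python sketch below only illustrates the layout (simple sequential form, one sequence per line), and the function name `to_phylip` is an assumption for this example.

```python
def to_phylip(alignment):
    """Format (name, aligned_sequence) pairs as sequential PHYLIP text.

    Names are truncated or space-padded to exactly ten characters,
    as the strict PHYLIP format requires.
    """
    lengths = {len(sequence) for _, sequence in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for name, sequence in alignment:
        lines.append(f"{name[:10]:<10}{sequence}")
    return "\n".join(lines) + "\n"


print(to_phylip([("human", "MKV-L"), ("mouse", "MKVAL")]))
```

Because names are cut to ten characters, sequences whose names share the same first ten characters become indistinguishable, so it is worth checking names before running phylip.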


For more instructions on how to use phylip to construct a phylogenetic tree, see the Phylogenetic tree page.


Complete phylip documentation is also on the DVD: click on the phylip.html document in the phylip folder; it has links to documentation for specific programs. It is also on the web at http://evolution.genetics.washington.edu/phylip/phylip.html



Treeview

Click on treev32.exe in the Treeview folder. The input file should be a tree in New Hampshire (Newick) format.


From clustal: filename.ph filename.phb


From phylip: outtree (renamed appropriately)
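
If Treeview refuses to open a file, it can help to check that the file really contains a tree in New Hampshire (Newick) format: a parenthesised expression ending in a semicolon, for example `((A:0.1,B:0.2)95:0.05,C:0.3);`, where `95` is a bootstrap value stored as an internal node label. The Python sketch below is a rough sanity check only, not a full parser (it ignores quoted labels and comments), and the function name is made up for this example.

```python
def looks_like_newick(text):
    """Rough check: the text ends with ';' and its parentheses balance."""
    tree = text.strip()
    if not tree.endswith(";"):
        return False
    depth = 0
    for char in tree:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket appeared before its opening bracket.
                return False
    return depth == 0


print(looks_like_newick("((A:0.1,B:0.2)95:0.05,C:0.3);"))
```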



Pymol

PyMOL is a powerful molecular visualiser. Click on PyMOL.exe in the Pymol folder. The input is a protein structure file, such as a PDB file.

The best place to learn about PyMOL is the PyMOL wiki: http://www.pymolwiki.org/index.php/Main_Page

Some brief notes on using PyMOL to do structural alignment: PyMOL alignment

How to use APBS within PyMOL is described here.

How to visualise surface topographies computed by CASTp is described at http://sts.bioengr.uic.edu/castp/pymol.php


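More information on PyMOL will be uploaded soon. The PyMOL Reference Manual (http://pymol.sourceforge.net/newman/ref/toc.html) and the PyMOL User Manual (http://pymol.sourceforge.net/newman/user/toc.html) may also be useful.

A brief PyMOL tutorial: http://137.189.50.96/kbwong/teaching/pymol/pymol_tutorial.html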

Rasmol

Click on rasmol.exe in the Rasmol folder. The input is a protein structure file, such as a PDB file.


Perhaps useful links

http://www.soe.ucsc.edu/~karplus/compbio_pages.html


LAST MINUTE HELP

  • If you need to upload files with an extension other than 'png', 'gif', 'jpg', 'jpeg', 'pdf', 'doc', 'txt', 'aln', 'dnd' or 'ppt', please email s4026869@student.uq.edu.au


--ThomasHuber 13:43, 24 April 2007 (EST)