Sequence searches

From MDWiki

The main purpose of performing sequence searches is to find sequences related (homologous) to our protein, which will later allow us to discuss our target protein in the context of the whole family of related proteins.

To a small degree this has already been done by NCBI with their HomoloGene clusters. The HomoloGene web site for your protein is linked from the Target PDB Table. However, to get a much broader (and better) picture of these relationships, a more thorough sequence analysis is necessary.

To find more complete sets of related sequences we will use Blast, as you have already done in the pracs. However, you will run the Blast program locally and against local databases, since this is much faster and also guarantees that the databases don't change while we carry out our studies. Details on how to run the Blast software from the DVD are described on the Methods and Websites page.
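As a rough illustration of a local search, the sketch below assembles and runs a protein Blast search from Python. It assumes the NCBI BLAST+ `blastp` program is installed and on your PATH and that the DVD databases are formatted for it; if the DVD provides the older `blastall` program instead, the flags differ, so follow the Methods and Websites page. The function names, file names and the E-value cut-off are hypothetical choices for illustration only.

```python
import subprocess
from pathlib import Path


def build_blastp_command(query_fasta, db, out_path, evalue=1e-3):
    """Assemble a BLAST+ blastp command line for a local protein database.

    query_fasta: FASTA file holding the target protein sequence (hypothetical name).
    db: name or path of a local Blast database, e.g. "nr" or "pdb".
    out_path: file where blastp writes its results.
    evalue: E-value cut-off; 1e-3 is an illustrative value, not a recommendation.
    """
    return [
        "blastp",
        "-query", str(query_fasta),
        "-db", str(db),
        "-out", str(out_path),
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output: one hit per line, 12 columns
    ]


def run_blastp(query_fasta, db, out_path, evalue=1e-3):
    """Run blastp locally; raises CalledProcessError if blastp exits with an error."""
    cmd = build_blastp_command(query_fasta, db, out_path, evalue)
    subprocess.run(cmd, check=True)
    return Path(out_path)
```

For example, `run_blastp("target.fasta", "pdb", "target_vs_pdb.tsv")` would search a hypothetical `target.fasta` against the local pdb database. Passing the command as a list rather than a single shell string avoids quoting problems with file names.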


The DVD we provide contains several Blast databases for you to use.

  • general databases:
    • nr = a non-redundant sequence database. This is a good starting point for collating a diverse set of sequences.
    • pdb = all sequences of proteins whose structures are available in the Protein Data Bank (PDB).


We are particularly interested in the potential function of the target proteins in humans. The following databases are also on your DVD. These specialised RefSeq databases cover eukaryotes only and are split according to taxonomy (the names are self-explanatory):

  • specialised databases (by taxonomy):
    • fungi
    • invertebrate
    • plant
    • vertebrate_mammalian
    • vertebrate_other
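Once the same query has been searched against each taxonomy-specific database, the hits can be pooled per taxon to see where homologs occur. The sketch below assumes each search wrote Blast's standard 12-column tabular output (`-outfmt 6` in BLAST+, `-m 8` in the legacy blastall), where column 2 is the subject id and column 11 the E-value. The function names and the 1e-3 threshold are hypothetical, illustrative choices, not part of the course setup.

```python
import csv
import io

# Database names as listed on the DVD.
TAXONOMIC_DBS = ["fungi", "invertebrate", "plant",
                 "vertebrate_mammalian", "vertebrate_other"]


def parse_tabular_hits(text, max_evalue=1e-3):
    """Return {subject_id: best E-value} from Blast tabular output.

    Assumes the 12-column tabular format; hits with an E-value above
    max_evalue (an illustrative threshold) are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, evalue = row[1], float(row[10])
        # Keep only the best (lowest) E-value seen for each subject.
        if evalue <= max_evalue and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


def merge_by_taxonomy(outputs, max_evalue=1e-3):
    """Combine per-database results into {database: sorted subject ids}.

    outputs maps a database name (e.g. "plant") to the text of its
    tabular result file.
    """
    return {db: sorted(parse_tabular_hits(text, max_evalue))
            for db, text in outputs.items()}
```

Reading each result file (hypothetically named `target_vs_<db>.tsv`, one per entry in `TAXONOMIC_DBS`) into a dict keyed by database name and passing it to `merge_by_taxonomy` gives the homolog ids found in fungi, invertebrates, plants and vertebrates side by side.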


--ThomasHuber 16:39, 25 April 2007 (EST)