Talk:Protein Function

From MDWiki
Revision as of 06:10, 5 June 2007 by S4079195 (talk | contribs)

Information From 8th May 2007

MIF4G is the middle domain of eukaryotic initiation factor 4G (eIF4G). It also occurs in NMD2p (a nonsense-mediated mRNA decay protein, involved in the decay of mRNAs containing premature stop codons) and in CBP80 (cap-binding protein).

The protein binds eIF4A, eIF3, RNA and DNA, so part of its function is to bind RNA.

Possibly located in the cytoplasm (see link to LOCATE): a mouse protein of similar sequence is found in this location.

MIF4G domain: starts at residue 28, ends at residue 240 (mouse).

It is soluble and non-secreted.

PA74324.2 Riken cDNA 2310075612 Rik Protein - AAH26740, AAH55812(mouse), AAH33759(human)

AAH55812 - Rik Protein Mouse. Present in the cerebellum, striatum, eye, whole brain, liver, hippocampus, hematopoietic stem cells and kidney. Accession No: BC055812.1

Performed a MultiLoc prediction, which determines the location of the protein from its amino acid sequence and features such as the presence of an N-terminal targeting sequence. There is a 0.93 probability that the protein is cytoplasmic. Now I have to find the specific location, what the protein binds to, and the structure of what it binds to. If I can identify the structure of the binding domain, then I can predict to some extent the structure, or a very small piece of it (i.e. the active site), and can use this to perform function-based analysis?

Binding Site Analysis From ProFunc Using Human Sequence


ProFunc Analysis:

http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/profunc/GetResults.pl?source=profunc&user_id=ay65&code=075103

Showed that the domain contains an ARM repeat. Further research into this will be done. Eliza found the same thing.

This shows that there are many binding sites. To get to this image follow the link under the Cleft Sites analysis on the ProFunc results page.

Still need to identify the significance of all the results uncovered by ProKnow.

Will go into this more next week

But it is interesting to know for the time being that both Eliza and I have found that the function has something to do with the methylated cap on RNA, and that this process takes place within the cytoplasm (as opposed to in the nucleus).

Site: showed that the Danio rerio structure 2i2O is 99.5% likely to be a structural match. We can then infer that, since both proteins are in the same region (double-check this on LOCATE), have the same structure and share x% sequence similarity, they are likely related in function.

Danio rerio likely structure similarity with 2i2O

http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=ay65&pdb_type=PROFUNC&code=075103&template=sitehit.html&profunc=TRUE&u=&l=1.1&o=SITE

Same alignment results for 1hu3 (eIF4GII): http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=ay65&pdb_type=PROFUNC&code=075103&template=sitehit.html&profunc=TRUE&u=&l=2.1&o=SITE

NEST Analysis: found 3 nests within the structure. These provide possible functional residues.


Alignment picture.gif

Figure 1.0: Alignment obtained from ProFunc NEST

SUPERFAMILY results showed 1 sequence motif in the sequence provided: http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=48371 This revealed the presence of ARM repeats.

Superfamily analysis.gif

Figure 1.1: Superfamily analysis revealed 1 sequence motif in the sequence.

http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=ay65&pdb_type=PROFUNC&code=075103&template=nestanal.html&profunc=TRUE&u=&l=1


ProKnow Analysis

The table shows that the most likely molecular function for our protein is RNA binding; this is inferred from genetic interaction. Most of the biological processes identified are from a traceable author statement. The numbers of clues are 6 and 4 respectively. 1-2 is considered weak, so 4 and 6 probably aren't greatly significant, but are perhaps high enough to make some inferences.

When looking at the Master Table from results - note the following:

 *Clue 1  Frequency of the ontologies obtained from BLAST hits
 *Clue 2  Score for the ontology from BLAST E-values. The best E-value available for the ontology is taken (only 4
          digits after the decimal point are shown).
 *Clue 3  Frequency of ontologies from 3D motifs 
 *Clue 4  Score of ontologies from 3D motifs based on conservation. It is the average of scores from the motifs 
          associated with the ontology. 
 *Clue 5  Score of ontologies from 3D folds. The best Z_score available for the function is taken. 
 *Clue 6  Frequency of ontologies from 3D folds 
 *Clue 7  Frequency of ontologies from DIP search 
 *Clue 8  Score of ontologies from PROSITE search based on conservation. It is the average of scores from the motifs 
          associated with the ontology. 
 *Clue 9  Frequency of ontologies from PROSITE search 
 *Clue 10 Frequency of ontologies from PROLINKS search
MIDDLE DOMAIN OF HUMAN EIF4GII from Danio rerio, obtained from PDB search - see below

Our results had a very high reading in clue 8. Does this mean that the sequence is highly conserved?


Article about eIF4GIII Protein - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11172724

Most similar structurally to the zebrafish protein, even though the sequences within the domain are not similar: across the different proteins containing the domain, the structures are similar but the sequences differ. Does this make all the phylogeny and sequence analysis kind of redundant?
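One way to put a number on how different two sequences are is percent identity over an alignment. The following is only a minimal sketch: it assumes the two sequences have already been aligned by an external tool (gaps written as "-"), the function name percent_identity is hypothetical, and the example uses just the first 27 residues of the human and mouse sequences recorded on this page, which line up without gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# First 27 residues of the human and mouse proteins listed on this page.
human_27 = "MGEPSREEYKIQSFDAETQQLLKTALK"
mouse_27 = "MSEASRDDYKIQSFDAETQQLLKTALK"

# 23 of the 27 positions are identical (positions 2, 4, 7 and 8 differ).
print(round(percent_identity(human_27, mouse_27), 1))
```

Skipping gapped columns is one convention among several; other tools divide by the full alignment length or by the shorter sequence, so the value is only comparable between runs that use the same definition.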


PDB Search on Danio rerio 1hu3

http://www.rcsb.org/pdb/explore/explore.do?structureId=1HU3


Papers on RNA/DNA Binding Proteins

http://www.wormbook.org/chapters/www_RNAbindingproteins/RNAbindingproteins.html

http://www.molecular-cancer.com/content/3/1/24


Obtained Sequences

Human - Protein Sequence

mgepsreeyk iqsfdaetqq llktalkvac fetedgeysv cqrsysncsr lmpsrcntqy

rdpgavdlek vanvivdhsl qdcvfskeag rmcyaiiqae skqagqsvfr rgllnrlqqe

yqareqlrar slqgwvcyvt ficnifdylr vnnmpmmalv npvydclfrl aqpdslskee

evdclvlqlh rvgeqlekmn gqrmdelfvl irdgfllptg lsslaqllll eiiefraagw

kttpaahkyy ysevsd


>AAH26740 ARM repeat, position: 13-208 (Mouse)

SFDAQTQQLLKTALKDPGAVDLERVANVIVDHSLQDCVFSKEAGRMCYAIIQAESKQAGQSVFRRGLLNRLQKEYDAREQ

LRACSLQGWVCYVTFICNIFDYLRVNNMPMMALVNPVYDCLFQLAQPESLSREEEVDCLVLQLHRVGEQLEKMNGQRMDE

LFILIRDGFLLPTDLSSLARLLLLEMIEFRAAGWK
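Domain annotations such as "position: 13-208" use 1-based, inclusive residue numbers, which are easy to get off by one when slicing a sequence in code. A minimal sketch of the conversion follows; the function name extract_region is hypothetical, and the example uses only the first 20 residues of the mouse protein recorded on this page.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end, where both are 1-based and inclusive."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    if end > len(sequence):
        # Coordinates reported against one record may not fit another record.
        raise ValueError("region ends beyond the sequence")
    return sequence[start - 1:end]


# First 20 residues of the mouse protein listed on this page.
mouse_20 = "MSEASRDDYKIQSFDAETQQ"

# Residues 13-16 are the start of the annotated region.
print(extract_region(mouse_20, 13, 16))  # SFDA
```

The bounds check is deliberate: a silent short slice would hide the case where the coordinates come from a different record than the sequence being sliced.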


Mouse - Protein

mseasrddyk iqsfdaetqq llktalkdps avdlervanv ivdhslqdcv fskeagrmcy

aiiqaeskqa gqsvfrrgll nrlqkeydar eqlracslqg wvcyvtficn ifdylrvnnm

pmmalvnpvy dclfqlaqpe slsreeevdc lvlqlhrvge qlekmngqrm delfilirdg

fllptdlssl arllllemie fraagwkttp aahkyyysev sd
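The human and mouse sequences above are in GenBank-style layout (lowercase, blocks of ten residues), while most prediction servers expect FASTA. A minimal sketch of the conversion follows; the function names and the header text are hypothetical, and only the first 70 residues of the human sequence are used as input.

```python
import re


def genbank_to_plain(block: str) -> str:
    """Drop spaces, line breaks and position numbers, then uppercase the residues."""
    return re.sub(r"[^A-Za-z]", "", block).upper()


def to_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Format a sequence as a FASTA record with lines of at most `width` residues."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines)


# First 70 residues of the human sequence, copied in GenBank-style layout.
human_start = """mgepsreeyk iqsfdaetqq llktalkvac fetedgeysv cqrsysncsr lmpsrcntqy
rdpgavdlek"""

print(to_fasta("human_first_70 (hypothetical header)", genbank_to_plain(human_start)))
```

The sequence line printed for this input should match the first line of the human FASTA record, which gives a quick check that nothing was lost in the conversion.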


FASTA - Human

>gi|21707112|gb|AAH33759.1| MIF4G domain containing [Homo sapiens]

MGEPSREEYKIQSFDAETQQLLKTALKVACFETEDGEYSVCQRSYSNCSRLMPSRCNTQYRDPGAVDLEK

VANVIVDHSLQDCVFSKEAGRMCYAIIQAESKQAGQSVFRRGLLNRLQQEYQAREQLRARSLQGWVCYVT

FICNIFDYLRVNNMPMMALVNPVYDCLFRLAQPDSLSKEEEVDCLVLQLHRVGEQLEKMNGQRMDELFVL

IRDGFLLPTGLSSLAQLLLLEIIEFRAAGWKTTPAAHKYYYSEVSD