2gnx Results
Revision as of 12:09, 11 June 2007
Structural analysis
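A Dali search with the full 2GNX structure (Table 1) was inconclusive: apart from the self-match, there were no significant structural matches to this hypothetical protein, and the remaining hits (Z = 4.4 to 5.7) come from a heterogeneous set of protein families.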
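Table 1: Dali analysis of the 2GNX protein (Z = Dali Z-score; RMSD = root-mean-square deviation in Å; LALI = number of aligned residues; LSEQ2 = length of the matched chain; %IDE = percentage sequence identity; NFRAG = number of aligned fragments)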
NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN
1: 3023-A 2gnx-A 42.9 0.0 280 280 100 0 0 1 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
2: 3023-A 2cmr-A 5.7 3.5 114 192 11 0 0 11 S IMMUNOGLOBULIN COMPLEX d5 (fab heavy chain) d5 (fab li
3: 3023-A 1j3w-A 5.7 3.2 99 134 12 0 0 9 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION giding protein-m
4: 3023-A 1jmr-A 5.5 3.0 94 246 9 0 0 12 S
5: 3023-A 1f5m-B 5.5 5.0 107 177 9 0 0 13 S SIGNALING PROTEIN gaf (saccharomyces cerevisiae) yeas
6: 3023-A 1vcs-A 5.0 4.7 82 102 9 0 0 8 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION vesicle transpor
7: 3023-A 1kt0-A 4.9 2.8 81 357 6 0 0 7 S ISOMERASE 51 kda fk506-binding protein (fkbp51) Mutant
8: 3023-A 1e2a-A 4.9 4.5 80 102 9 0 0 6 S TRANSFERASE enzyme iia (enzyme iii, lactose-specific i
9: 3023-A 2d2s-A 4.8 3.1 75 217 11 0 0 5 S ENDOCYTOSIS/EXOCYTOSIS exocyst complex component exo84
10: 3023-A 2oew-A 4.7 2.8 119 358 8 0 0 12 S PROTEIN TRANSPORT programmed cell death 6-interacting
11: 3023-A 1h3q-A 4.7 4.2 92 140 4 0 0 11 S TRANSPORT sedlin (sedl) (mus musculus) mouse S.B.Jan
12: 3023-A 2oev-A 4.5 36.5 151 697 7 0 0 14 S PROTEIN TRANSPORT programmed cell death 6-interacting
13: 3023-A 2cwy-A 4.5 2.4 82 92 20 0 0 7 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
14: 3023-A 2c5i-T 4.5 2.8 75 93 11 0 0 5 S PROTEIN TRANSPORT/COMPLEX t-snare affecting a late gol
15: 3023-A 3nul 4.4 3.4 93 130 5 0 0 11 S ACTIN-BINDING PROTEIN profilin i (arabidopsis thalian
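Dali summary rows like those in Table 1 can be ranked mechanically. The following is a minimal sketch, assuming the whitespace-separated column layout shown in the table header; `parse_dali_row` and `hits_above` are hypothetical helper names, and the Z-score cut-off is a parameter you choose, since this page does not state the one used.

```python
from dataclasses import dataclass


@dataclass
class DaliHit:
    rank: int
    hit: str           # STRID2: PDB code plus chain of the matched structure
    z: float           # Dali Z-score
    rmsd: float        # RMSD in Å
    lali: int          # number of aligned residues
    lseq2: int         # length of the matched chain
    pct_identity: int  # %IDE
    description: str


def parse_dali_row(line: str) -> DaliHit:
    """Parse one row of a Dali summary table (columns as in the table header)."""
    # NR, STRID1, STRID2, Z, RMSD, LALI, LSEQ2, %IDE, REVERS, PERMUT, NFRAG, TOPO, PROTEIN
    fields = line.split(maxsplit=12)
    if len(fields) < 12:
        raise ValueError(f"not a Dali summary row: {line!r}")
    return DaliHit(
        rank=int(fields[0].rstrip(":")),
        hit=fields[2],
        z=float(fields[3]),
        rmsd=float(fields[4]),
        lali=int(fields[5]),
        lseq2=int(fields[6]),
        pct_identity=int(fields[7]),
        # Some rows (e.g. 1jmr-A) carry no PROTEIN description at all.
        description=fields[12] if len(fields) > 12 else "",
    )


def hits_above(rows, z_min: float, query_id: str = "2gnx"):
    """Drop the self-match and return hits with Z >= z_min, highest Z first."""
    hits = [parse_dali_row(row) for row in rows]
    kept = [h for h in hits if not h.hit.lower().startswith(query_id) and h.z >= z_min]
    return sorted(kept, key=lambda h: h.z, reverse=True)


# First five rows of Table 1.
TABLE1_TOP = [
    "1: 3023-A 2gnx-A 42.9 0.0 280 280 100 0 0 1 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro",
    "2: 3023-A 2cmr-A 5.7 3.5 114 192 11 0 0 11 S IMMUNOGLOBULIN COMPLEX d5 (fab heavy chain) d5 (fab li",
    "3: 3023-A 1j3w-A 5.7 3.2 99 134 12 0 0 9 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION giding protein-m",
    "4: 3023-A 1jmr-A 5.5 3.0 94 246 9 0 0 12 S",
    "5: 3023-A 1f5m-B 5.5 5.0 107 177 9 0 0 13 S SIGNALING PROTEIN gaf (saccharomyces cerevisiae) yeas",
]

if __name__ == "__main__":
    for h in hits_above(TABLE1_TOP, z_min=5.5):
        print(h.rank, h.hit, h.z, h.description)
```

The self-match (2gnx-A) is dropped before filtering because it always scores highest and says nothing about possible homologues.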
A separate Dali search using only the N-terminal domain of the protein (Table 2) also produced no significant structural matches.
Table 2: Dali analysis of N-terminal domain
NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN
1: 3256-A 2gnx-A 23.2 0.0 173 280 100 0 0 1 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
2: 3256-A 1e2a-A 7.5 4.5 80 102 9 0 0 6 S TRANSFERASE enzyme iia (enzyme iii, lactose-specific i
3: 3256-A 1kt0-A 7.4 2.8 81 357 6 0 0 7 S ISOMERASE 51 kda fk506-binding protein (fkbp51) Mutant
4: 3256-A 2d2s-A 7.3 3.1 75 217 11 0 0 5 S ENDOCYTOSIS/EXOCYTOSIS exocyst complex component exo84
5: 3256-A 1vcs-A 7.3 4.7 78 102 9 0 0 7 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION vesicle transpor
6: 3256-A 2cmr-A 6.9 3.2 104 192 11 0 0 9 S IMMUNOGLOBULIN COMPLEX d5 (fab heavy chain) d5 (fab li
7: 3256-A 2c5i-T 6.9 2.8 75 93 11 0 0 5 S PROTEIN TRANSPORT/COMPLEX t-snare affecting a late gol
8: 3256-A 2h7o-A 6.8 3.0 81 270 5 0 0 7 S SIGNALING PROTEIN protein kinase ypka fragment (protei
9: 3256-A 2h7v-C 6.6 4.2 76 269 13 0 0 5 S SIGNALING PROTEIN migration-inducing protein 5 (ras-re
10: 3256-A 2dnx-A 6.5 4.9 80 130 6 0 0 6 S TRANSPORT PROTEIN syntaxin-12 fragment (homo sapiens)
11: 3256-A 1hg5-A 6.5 3.2 85 263 9 0 0 6 S ENDOCYTOSIS clathrin assembly protein short form frag
12: 3256-A 1a17 6.4 2.5 71 159 3 0 0 5 S HYDROLASE serine/THREONINE PROTEIN PHOSPHATASE 5 fragme
13: 3256-A 2if4-A 6.3 2.5 82 258 7 0 0 7 S SIGNALING PROTEIN atfkbp42 fragment (twd1 (twisted dwa
14: 3256-A 1owa-A 6.2 3.3 76 156 12 0 0 6 S CYTOKINE spectrin alpha chain, erythrocyte fragment (e
15: 3256-A 2oew-A 6.1 2.8 119 358 8 0 0 12 S PROTEIN TRANSPORT programmed cell death 6-interacting
In contrast, a Dali search using the C-terminal domain (Table 3) produced one significant structural match: the yeast GAF signalling protein 1f5m-B (Z = 6.8), the 4th hit in the list.
Table 3: Dali analysis of C-terminal domain
NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN
1: 3257-A 2gnx-A 24.3 0.0 118 280 100 0 0 1 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
2: 3257-A 1jmr-A 7.6 3.0 94 246 9 0 0 12 S
3: 3257-A 1j3w-A 7.5 2.9 91 134 13 0 0 7 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION giding protein-m
4: 3257-A 1f5m-B 6.8 2.9 95 177 9 0 0 10 S SIGNALING PROTEIN gaf (saccharomyces cerevisiae) yeas
5: 3257-A 1h3q-A 6.6 4.2 92 140 4 0 0 11 S TRANSPORT sedlin (sedl) (mus musculus) mouse S.B.Jan
6: 3257-A 3nul 6.3 3.4 93 130 5 0 0 11 S ACTIN-BINDING PROTEIN profilin i (arabidopsis thalian
7: 3257-A 1mc0-A 5.8 4.1 99 341 8 0 0 11 S HYDROLASE 3',5'-cyclic nucleotide phosphodiesterase 2a
8: 3257-A 2h28-A 5.4 2.8 75 106 8 0 0 10 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
9: 3257-A 2p7j-A 5.0 2.9 79 262 13 0 0 11 S TRANSCRIPTION putative sensory box/GGDEF FAMILY PROTEIN
10: 3257-A 2dmw-A 5.0 3.3 85 131 7 0 0 11 S MEMBRANE PROTEIN synaptobrevin-like 1 variant fragment
11: 3257-A 2avx-A 4.8 3.6 93 171 5 0 0 10 S TRANSCRIPTION regulatory protein sdia Mutant (escheri
12: 3257-A 2j3t-C 4.7 5.2 83 141 7 0 0 8 S PROTEIN TRANSPORT trafficking protein particle complex
13: 3257-A 2hj9-C 4.7 3.3 76 210 5 0 0 9 S SIGNALING PROTEIN autoinducer 2-binding periplasmic pr
14: 3257-A 2hje-A 4.6 3.0 75 210 5 0 0 9 S SIGNALING PROTEIN autoinducer 2 sensor kinase/PHOSPHATA
15: 3257-A 2uv0-E 4.5 3.5 93 159 9 0 0 12 S TRANSCRIPTION transcriptional activator protein lasr
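Comparing which hits appear in the full-length, N-terminal and C-terminal searches helps attribute each full-length hit to one domain. A minimal sketch using plain Python sets, with the STRID2 identifiers copied from Tables 1 to 3 (`partition_hits` is a hypothetical name):

```python
# STRID2 identifiers copied from Tables 1, 2 and 3.
FULL_LENGTH = ["2gnx-A", "2cmr-A", "1j3w-A", "1jmr-A", "1f5m-B", "1vcs-A", "1kt0-A", "1e2a-A",
               "2d2s-A", "2oew-A", "1h3q-A", "2oev-A", "2cwy-A", "2c5i-T", "3nul"]
N_TERMINAL = ["2gnx-A", "1e2a-A", "1kt0-A", "2d2s-A", "1vcs-A", "2cmr-A", "2c5i-T", "2h7o-A",
              "2h7v-C", "2dnx-A", "1hg5-A", "1a17", "2if4-A", "1owa-A", "2oew-A"]
C_TERMINAL = ["2gnx-A", "1jmr-A", "1j3w-A", "1f5m-B", "1h3q-A", "3nul", "1mc0-A", "2h28-A",
              "2p7j-A", "2dmw-A", "2avx-A", "2j3t-C", "2hj9-C", "2hje-A", "2uv0-E"]


def partition_hits(full, n_term, c_term, self_id="2gnx-A"):
    """Attribute full-length hits to the domain search(es) that also found them."""
    # Remove the self-match from every list before comparing.
    full_set, n_set, c_set = (set(ids) - {self_id} for ids in (full, n_term, c_term))
    return {
        "shared_by_n_and_c": n_set & c_set,
        "full_and_n_only": (full_set & n_set) - c_set,
        "full_and_c_only": (full_set & c_set) - n_set,
        "full_only": full_set - n_set - c_set,
    }


if __name__ == "__main__":
    for group, ids in partition_hits(FULL_LENGTH, N_TERMINAL, C_TERMINAL).items():
        print(group, sorted(ids))
```

With these lists, no hit is shared between the N- and C-terminal searches; seven full-length hits (including 2cmr-A) recur only in the N-terminal search, five (including the GAF protein 1f5m-B) recur only in the C-terminal search, and 2oev-A and 2cwy-A appear in the full-length search alone.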
An analysis of the protein's secondary structure from its amino acid sequence (Figure 1) shows how secondary-structure elements are arranged across the different regions of the protein.
Figure 1: Secondary structure analysis of the 2GNX protein from the Protein Data Bank
Figure 2: Dotlet Analysis for 2GNX
The Dotlet analysis (Figure 2) showed no internally homologous repeats in 2GNX.
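A dot plot compares a sequence with itself in sliding windows; an internal repeat appears as a line of dots parallel to the main diagonal. Dotlet scores windows with a substitution matrix, whereas the minimal sketch below swaps in plain identity counting, and its window size and thresholds are illustrative assumptions rather than Dotlet's settings (the function names are hypothetical).

```python
from collections import Counter


def dot_plot_hits(seq_a: str, seq_b: str, window: int = 10, min_identity: int = 6) -> set:
    """Return (i, j) window start pairs whose windows share >= min_identity identical residues."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            identical = sum(a == b for a, b in zip(win_a, seq_b[j:j + window]))
            if identical >= min_identity:
                hits.add((i, j))
    return hits


def repeat_offsets(seq: str, window: int = 10, min_identity: int = 6, min_hits: int = 5) -> dict:
    """Self-comparison: count hits on each diagonal above the main one (offset j - i > 0).

    An internal repeat shows up as an offset with many hits, i.e. a line of dots
    parallel to the main diagonal; the main diagonal itself (j == i) is ignored.
    """
    hits = dot_plot_hits(seq, seq, window, min_identity)
    per_offset = Counter(j - i for i, j in hits if j > i)
    return {offset: n for offset, n in per_offset.items() if n >= min_hits}


if __name__ == "__main__":
    unit = "ACDEFGHIKLMNPQRSTVWY"        # toy 20-residue unit with no internal similarity
    print(repeat_offsets(unit))          # no repeat expected
    print(repeat_offsets(unit + unit))   # tandem repeat at offset 20
```

An empty result for a real sequence corresponds to the absence of off-diagonal lines in a Dotlet plot.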
USR1:A 185/392 QVAKNLFTH---LDDVSVLLQEIITEARNLSNAEICSVFLLDQ-----------------
USR2:A 181/283 TASEXKALTAKANPDLFGKISSFIRKY------DAANVSLIFDNRGSESFQGHGYHHPHS

USR1:A 225/432 ----------NELVAKVFDGGVVDDESYEIRIPADQGIAGHVATTG----------QILN
USR2:A 235/#44 YREAPKGVDQYPAVVSLP----------SDRPVXHWPNVIXIXTDRASDLNSLEKVVHFY

USR1:A 265/472 IPDAYAHPLFYRGVDDSTGFRTRNILCFPIKNENQEVIGVAELVNKINGPWFSKFDEDLA
USR2:A 285/387 DDKV-------------------QSTYFLTRPEP-HFTIVVIFESK---------KSERD

USR1:A 325/532 TAFSIYCGISIAHSLL
USR2:A 316/418 SHFISFLNELSLALKN
The conserved ligand-binding-site residues of 1MC0 did not match the aligned residues in 2GNX.
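Checking whether conserved residues are preserved means reading off, for each residue of 1MC0, the 2GNX residue in the same alignment column. A minimal sketch, assuming the residue number at the start of each CE row is followed by consecutively numbered residues within that row; the helper name and the residue numbers queried in the example are hypothetical, not the actual 1MC0 binding-site residues.

```python
def aligned_partners(aln1: str, start1: int, aln2: str, start2: int) -> dict:
    """Map each residue number of sequence 1 to (its residue, partner number, partner residue).

    The partner number is None where sequence 2 has a gap ('-') in that column.
    Assumes residues are numbered consecutively from start1/start2 within the row.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have the same length")
    partners = {}
    n1, n2 = start1, start2
    for a, b in zip(aln1, aln2):
        if a != "-":
            # Record the partner before advancing the sequence-2 counter.
            partners[n1] = (a, None if b == "-" else n2, b)
            n1 += 1
        if b != "-":
            n2 += 1
    return partners


# First block of the CE alignment (USR1 = 1MC0, USR2 = 2GNX).
USR1_BLOCK1 = "QVAKNLFTH---LDDVSVLLQEIITEARNLSNAEICSVFLLDQ-----------------"
USR2_BLOCK1 = "TASEXKALTAKANPDLFGKISSFIRKY------DAANVSLIFDNRGSESFQGHGYHHPHS"

if __name__ == "__main__":
    partners = aligned_partners(USR1_BLOCK1, 185, USR2_BLOCK1, 181)
    for resnum in (194, 209):  # hypothetical query positions, not the real 1MC0 binding site
        residue, partner_num, partner_res = partners[resnum]
        shown = partner_num if partner_num is not None else ""
        print(f"1MC0 {residue}{resnum} -> 2GNX {partner_res}{shown}")
```

Repeating this for each block and each conserved 1MC0 position gives a residue-by-residue version of the comparison above.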
Conserved pattern reported by Zoraghi R. et al., followed by its mapping onto the 2GNX chain A sequence:
SX(13-18)FDX(18-22)IAX(21)[Y/N]X(2)VDX(2)TX(3)TX(19)[E/Q]
>2GNX:A|PDBID|CHAIN|SEQUENCE
KTMLLAKFSFYFHEALSRQTTASEMKALTAKANPDLFGKISSFIRKYDAANVSLIFDNRGSESFQGHGYHHPHSYREAPK
S.....(13)....FD......x(19)........IA
GVDQYPAVVSLPSDRPVMHWPNVIMIMTDRASDLNSLEKVVHFYDDKVQSTYFLTRPEPHFTIVVIFESKKSERDSHFIS
YVD TxxxT.....x(17).......E
FLNELSLALKNPKVFASLKPGSKG
Figure 3: CE predicted structural alignment. USR1 = 1MC0 (PDB code), regulatory segment of mouse 3',5'-cyclic nucleotide phosphodiesterase 2A containing the GAF A and GAF B domains; USR2 = 2GNX.
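The Zoraghi R. et al. pattern uses a PROSITE-like notation in which X(n) stands for exactly n arbitrary residues, X(a-b) for a to b residues and [Y/N] for either residue. Assuming that reading, the minimal sketch below converts the pattern into a Python regular expression and searches the 2GNX chain A sequence listed above; `motif_to_regex` is a hypothetical name.

```python
import re

# One token of the notation: X(n) or X(a-b), a bracketed choice such as [Y/N], or a single residue.
TOKEN = re.compile(r"X\((\d+)(?:-(\d+))?\)|\[([A-Z/]+)\]|([A-Z])")


def motif_to_regex(pattern: str) -> str:
    """Translate the motif notation used above into a Python regular expression."""
    parts = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unrecognised token at position {pos}: {pattern[pos:]!r}")
        low, high, choice, residue = m.groups()
        if low is not None:          # X(n) or X(a-b): a run of arbitrary residues
            parts.append(f".{{{low},{high}}}" if high else f".{{{low}}}")
        elif choice is not None:     # [Y/N]: one residue from the set
            parts.append("[" + choice.replace("/", "") + "]")
        elif residue == "X":         # bare X: any single residue
            parts.append(".")
        else:                        # literal residue
            parts.append(residue)
        pos = m.end()
    return "".join(parts)


ZORAGHI_PATTERN = "SX(13-18)FDX(18-22)IAX(21)[Y/N]X(2)VDX(2)TX(3)TX(19)[E/Q]"
SEQ_2GNX_A = (
    "KTMLLAKFSFYFHEALSRQTTASEMKALTAKANPDLFGKISSFIRKYDAANVSLIFDNRGSESFQGHGYHHPHSYREAPK"
    "GVDQYPAVVSLPSDRPVMHWPNVIMIMTDRASDLNSLEKVVHFYDDKVQSTYFLTRPEPHFTIVVIFESKKSERDSHFIS"
    "FLNELSLALKNPKVFASLKPGSKG"
)

if __name__ == "__main__":
    regex = motif_to_regex(ZORAGHI_PATTERN)
    print(regex)
    print(re.search(regex, SEQ_2GNX_A))
```

For the 2GNX sequence the exact search returns None, since the sequence contains no IA dipeptide; the annotation lines under the sequence therefore mark an approximate, not exact, correspondence.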
Functional Analysis
STRING and CDART returned no results for the submitted protein. BLASTP did return hits, but they were limited to hypothetical and otherwise uncharacterised proteins that added no information.
Database | Accession | Description | Score (bits) | E-value |
ref | XP_001163972.1 | PREDICTED: similar to FLJ32549 protein [Pan | 850 | 0.0 |
ref | XP_001116860.1 | PREDICTED: hypothetical protein isoform 1 [M | 848 | 0.0 |
ref | NP_689653.3 | hypothetical protein LOC144577 [Homo sapiens... | 847 | 0.0 |
gb | AAH36246.1 | FLJ32549 protein [Homo sapiens] | 846 | 0.0 |
ref | XP_001116875.1 | PREDICTED: hypothetical protein isoform 3 [M | 843 | 0.0 |
ref | XP_531657.2 | PREDICTED: hypothetical protein XP_531657 [Cani | 827 | 0.0 |
ref | XP_615557.3 | PREDICTED: hypothetical protein [Bos taurus] | 823 | 0.0 |
gb | EDL24424.1 | cDNA sequence BC048403, isoform CRA_a [Mus muscul | 803 | 0.0 |
ref | NP_766610.2 | hypothetical protein LOC270802 [Mus musculus... | 803 | 0.0 |
ref | XP_576234.2 | PREDICTED: hypothetical protein [Rattus norv... | 802 | 0.0 |
ref | XP_001364942.1 | PREDICTED: hypothetical protein [Monodelphis | 797 | 0.0 |
ref | XP_416063.1 | PREDICTED: hypothetical protein [Gallus gallus] | 796 | 0.0 |
dbj | BAC39804.1 | unnamed protein product [Mus musculus] | 760 | 0.0 |
ref | XP_001116868.1 | PREDICTED: hypothetical protein isoform 2 [M | 743 | 0.0 |
ref | NP_001085035.1 | hypothetical protein LOC432102 [Xenopus l... | 697 | 0.0 |
ref | NP_001025261.1 | hypothetical protein LOC555715 [Danio rer... | 665 | 0.0 |
ref | NP_001076454.1 | hypothetical protein LOC100005809 [Danio ... | 661 | 0.0 |
ref | XP_001331282.1 | PREDICTED: hypothetical protein [Danio rerio | 598 | 2e-169 |
emb | CAG12393.1 | unnamed protein product [Tetraodon nigroviridis] | 593 | 8e-168 |
pdb | 2GNX | A Chain A, X-Ray Structure Of A Hypothetical Protein... | 554 | 3e-156 |
dbj | BAE41440.1 | unnamed protein product [Mus musculus] | 508 | 2e-142 |
ref | NP_001038719.1 | hypothetical protein LOC692281 [Danio rer... | 357 | 1e-96 |
ref | XP_624797.1 | PREDICTED: hypothetical protein [Apis mellifera | 235 | 3e-60 |
ref | XP_974676.1 | PREDICTED: hypothetical protein [Tribolium cast | 232 | 5e-59 |
ref | XP_001193974.1 | PREDICTED: hypothetical protein [Strongyloce | 208 | 5e-52 |
ref | XP_797380.2 | PREDICTED: hypothetical protein, partial [St... | 207 | 2e-51 |
dbj | BAE37112.1 | unnamed protein product [Mus musculus] >dbj B... | 134 | 2e-29 |
gb | EDL24425.1 | cDNA sequence BC048403, isoform CRA_b [Mus muscul | 132 | 6e-29 |
ref | XP_642387.1 | hypothetical protein DDBDRAFT_0205477 [Dicty... | 87.8 | 1e-15 |
emb | CAJ08583.1 | hypothetical protein, conserved [Leishmania majo | 36.6 | 3.5 |
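The judgement that the BLASTP hits add no information can be made reproducible with a simple keyword filter. This is a minimal sketch, assuming the pipe-separated row layout of the table above; the keyword list and the E-value cut-off are hypothetical choices, not BLAST features.

```python
# Hypothetical keyword list: a description containing any of these is treated as uninformative.
UNINFORMATIVE_KEYWORDS = ("hypothetical", "unnamed protein product", "cdna sequence", "flj")


def parse_blast_row(row: str) -> dict:
    """Split a 'db | accession | description | score | E-value |' row of the table above."""
    cells = [cell.strip() for cell in row.split("|") if cell.strip()]
    if len(cells) != 5:
        raise ValueError(f"expected 5 cells, got {len(cells)}: {row!r}")
    db, accession, description, score, evalue = cells
    return {"db": db, "accession": accession, "description": description,
            "score": float(score), "evalue": float(evalue)}


def is_uninformative(description: str) -> bool:
    text = description.lower()
    return any(keyword in text for keyword in UNINFORMATIVE_KEYWORDS)


def informative_hits(rows, evalue_max: float = 1e-5):
    """Hits at or below the (hypothetical) E-value cut-off whose description names a function."""
    hits = [parse_blast_row(row) for row in rows]
    return [h for h in hits if h["evalue"] <= evalue_max and not is_uninformative(h["description"])]


# A few rows copied from the table above.
ROWS = [
    "ref | NP_689653.3 | hypothetical protein LOC144577 [Homo sapiens... | 847 | 0.0 |",
    "gb | AAH36246.1 | FLJ32549 protein [Homo sapiens] | 846 | 0.0 |",
    "gb | EDL24424.1 | cDNA sequence BC048403, isoform CRA_a [Mus muscul | 803 | 0.0 |",
    "ref | XP_001331282.1 | PREDICTED: hypothetical protein [Danio rerio | 598 | 2e-169 |",
    "emb | CAJ08583.1 | hypothetical protein, conserved [Leishmania majo | 36.6 | 3.5 |",
]

if __name__ == "__main__":
    print(informative_hits(ROWS))
```

With this keyword list, every description in the table above is classed as uninformative.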
LOCATE analysis predicted that the protein is soluble and not secreted. The subcellular-localisation predictions were diverse:
Method | Location | Score |
CELLO | Mitochondrion | 1.34 |
CELLO | Extracellular region | 1.08 |
pTarget | Endoplasmic reticulum | 93.90 |
Proteome Analyst | No prediction | 0.00 |
WoLFPSORT | Cytoplasm | 13.00 |
WoLFPSORT | Nucleus | 12.00 |
WoLFPSORT | Golgi apparatus | 3.00 |
MultiLoc | Peroxisome | 0.49 |
MultiLoc | Mitochondrion | 0.23 |
MultiLoc | Extracellular region | 0.09 |
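The Score column mixes scales (compare pTarget's 93.90 with CELLO's 1.34), so scores can only be compared within a method. A minimal sketch that keeps the highest-scoring location per method from the table above, skipping "No prediction", makes the spread of predictions explicit; `top_location_per_method` is a hypothetical name.

```python
# Rows copied from the localisation table above: (method, location, score).
PREDICTIONS = [
    ("CELLO", "Mitochondrion", 1.34),
    ("CELLO", "Extracellular region", 1.08),
    ("pTarget", "Endoplasmic reticulum", 93.90),
    ("Proteome Analyst", "No prediction", 0.00),
    ("WoLFPSORT", "Cytoplasm", 13.00),
    ("WoLFPSORT", "Nucleus", 12.00),
    ("WoLFPSORT", "Golgi apparatus", 3.00),
    ("MultiLoc", "Peroxisome", 0.49),
    ("MultiLoc", "Mitochondrion", 0.23),
    ("MultiLoc", "Extracellular region", 0.09),
]


def top_location_per_method(predictions):
    """Best-scoring location for each method; scores are only compared within a method."""
    best = {}
    for method, location, score in predictions:
        if location == "No prediction":
            continue
        if method not in best or score > best[method][1]:
            best[method] = (location, score)
    return {method: location for method, (location, _) in best.items()}


if __name__ == "__main__":
    for method, location in top_location_per_method(PREDICTIONS).items():
        print(f"{method}: {location}")
```

The four methods that made a prediction each favour a different compartment.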
Pfam, ProFunc, ProKnow and InterPro all returned no results for the protein 2gnxA. SymAtlas, however, provided an interesting lead. The expression data are presented in the following diagram, but the most significant result was the number of olfactory receptors with correlated expression profiles.
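Finding genes with correlated expression profiles amounts to correlating the query gene's expression across tissues with that of every other gene and keeping the strongest correlations. The minimal sketch below assumes Pearson correlation and a hypothetical cut-off of r >= 0.75; this page does not state which measure or threshold SymAtlas applied, and the gene names and profiles in the example are hypothetical placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; None if either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(query, profiles: dict, r_min: float = 0.75):
    """Genes whose profile correlates with the query at r >= r_min (hypothetical cut-off), best first."""
    scored = []
    for gene, profile in profiles.items():
        r = pearson(query, profile)
        if r is not None and r >= r_min:
            scored.append((gene, r))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [1.0, 2.0, 3.0, 4.0]  # hypothetical expression values in four tissues
    profiles = {"OR_a": [2.0, 4.0, 6.0, 8.0], "OR_b": [4.0, 3.0, 2.0, 1.0], "OR_c": [5.0, 5.0, 5.0, 5.0]}
    print(correlated_genes(query, profiles))
```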
Olfactory receptors also appeared when the protein was submitted to cisRED to retrieve its cis-regulatory motif patterns. All fourteen motif patterns (modules) associated with the BC048403 protein are also found in many different olfactory receptors. cisRED motifs are predicted with p-values < 0.005.
In total, the fourteen motifs corresponded to 120 different olfactory receptors. The following table lists the olfactory receptors with three or more co-occurring motifs; its header row lists the fourteen modules. The olfactory receptors sharing the most modules with the BC048403 protein are highlighted in orange (nine co-occurring modules) and green (seven co-occurring modules).
The following graph represents the number of co-occurring motifs across the entire range of 120 corresponding olfactory receptors.
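The counts behind the table and graph described above (how many of the fourteen modules each olfactory receptor shares with BC048403, which receptors share three or more, and how often each count occurs) can be sketched with Python sets and `collections.Counter`. This assumes that a co-occurring motif means the same cisRED module ID appears for both genes; the module and receptor IDs below are hypothetical placeholders, not cisRED data.

```python
from collections import Counter


def significant_modules(motif_hits, p_max: float = 0.005) -> set:
    """Module IDs whose motif p-value is below p_max (the cisRED cut-off quoted above)."""
    return {module for module, p_value in motif_hits if p_value < p_max}


def shared_module_counts(query_modules: set, receptor_modules: dict) -> dict:
    """Number of query modules also found for each receptor (receptors sharing none are omitted)."""
    counts = {}
    for receptor, modules in receptor_modules.items():
        shared = len(query_modules & modules)
        if shared:
            counts[receptor] = shared
    return counts


def receptors_with_min_shared(counts: dict, min_shared: int = 3) -> dict:
    """Receptors sharing at least min_shared modules, as in the table described above."""
    return {receptor: n for receptor, n in counts.items() if n >= min_shared}


def shared_count_histogram(counts: dict) -> Counter:
    """How many receptors share exactly k modules, for each k (the data behind the graph)."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Hypothetical placeholder data, not cisRED output.
    query = significant_modules([("m1", 0.001), ("m2", 0.004), ("m3", 0.0001), ("m4", 0.002), ("m5", 0.01)])
    receptors = {"OR_x": {"m1", "m2", "m3"}, "OR_y": {"m1"}, "OR_z": {"m1", "m2", "m3", "m4"}, "OR_w": {"m9"}}
    counts = shared_module_counts(query, receptors)
    print(receptors_with_min_shared(counts))
    print(shared_count_histogram(counts))
```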
These motifs were also sought in the cisRED databases for other species, but could not be found because cisRED has no inter-species search tool. Unfortunately, microarray expression data for the olfactory receptors with the most co-occurring motifs were unavailable.
The following microarray data were found by searching GEO Profiles for the human ortholog.
Other notable motifs in the BC048403 protein corresponded to the cadherin family.