Bioinformatic approaches to finding cis-acting regulatory mRNA elements in eukaryotic mRNA - focusing on human 3' UTR analysis
This is a brief introduction related to the TransTerm databases (transterm.otago.ac.nz, mRNA.otago.ac.nz). If you find this useful in your research, please cite our publication in the database issue of Nucleic Acids Research. Other methods are described in the references [1-4] and related web sites [5-9]. The book RNA Motifs and Regulatory Elements, edited by T. Dandekar, provides a comprehensive introduction [10]. We provide a selection of tools via the web interface to access the TransTerm data [11]. A list of related tools and resources is available here.
What are cis-regulatory mRNA elements?
These are defined elements in the mRNA that regulate expression of the transcript. In general, they respond to the cellular environment.
Classification based on location in the mRNA
Many motifs are primarily located in the untranslated regions of mRNA, the 5' UTR or 3' UTR of mRNA sequences. They have been reported less commonly in coding sequences (see below).
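As a minimal sketch of this location-based classification, the Python below labels the start position of a motif hit as 5' UTR, CDS or 3' UTR. It assumes the coding region coordinates are already known (for example from a database annotation); the function name, the toy sequence and the coordinates are all hypothetical and are not part of TransTerm.

```python
def classify_region(position, cds_start, cds_end):
    """Label a 0-based mRNA position by region.

    cds_start is the index of the first base of the start codon and
    cds_end is one past the last base of the stop codon (half-open).
    """
    if position < cds_start:
        return "5' UTR"
    if position < cds_end:
        return "CDS"
    return "3' UTR"


# Toy mRNA: 6 nt 5' UTR, a 9 nt coding region (AUG GCU UAA), then a 3' UTR.
mrna = "GGCACC" + "AUGGCUUAA" + "UUAUUUAUUGC"
cds_start, cds_end = 6, 15

# Start positions of the ARE core pentamer AUUUA in the toy mRNA.
hits = [i for i in range(len(mrna) - 4) if mrna[i:i + 5] == "AUUUA"]
regions = [classify_region(i, cds_start, cds_end) for i in hits]
```

A hit that spans a region boundary is labelled here by its first base only; a real analysis might prefer to report such hits separately.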
What do motifs do? Classification based on function
Motifs in particular mRNAs and translated viral RNAs have been shown to mediate many functions and post-transcriptional controls in cells. These include (with selected recent references):
* Localising the mRNA (zip code motifs [12-17])
* Stabilising or destabilising mRNA (stability elements, SEs [18-22])
* Repressing translation (translational repressors, TRs [23-25])
* Enhancing translation (translational enhancers, TEs [26, 27])
* Affecting polyadenylation (polyadenylation elements, PEs [28-30]) or 3' UTR maturation [31]
* Controlling the efficiency of translation initiation, or promoting initiation in abnormal contexts. For instance, internal initiation within eukaryotic mRNAs can be achieved via internal ribosome entry sites (IRESs) [32-35]; these may be bound by IRES-interacting proteins, or ITAFs.
* Promoting alternative reading, or recoding, of the genetic code, such as frameshifting (frameshifting elements, FSEs [36-41]), readthrough (readthrough elements, RTEs [42-44]) and selenocysteine incorporation (SECIS motif [45-47]). These are particularly prominent in positive-stranded RNA viruses [27].
* Acting as targets of small regulatory RNAs, e.g. microRNA (miRNA) or antisense RNA [53-57]
* Acting as targets of small molecules, e.g. riboswitches
* Splicing enhancers or silencers may be present in the mature mRNA sequence [48-52]. In addition, sequences corresponding to transcription factor binding sites in the DNA may be present in the mRNA, but do not function there.
Note:
Some RNAs with typical mRNA structures (cap and poly(A) tail) may not encode large proteins; these form a class of non-coding mRNAs [58].
Detailed examples of some of these motifs can be found by selecting "Describe TransTerm motifs" in the pull-down menu and choosing a pattern.
A sequence and structural classification:
Motifs can be classified into three broad classes based on structure:
1. Sequence alone
2. Structure alone
3. A combination of sequence and structure
How can I find these types of motifs computationally?
These classes of motifs reflect the different ways in which they interact with other RNAs, RNA-binding proteins or ribosomes. The different classes require different methods for computational recognition. Two types of questions are often asked: "How can I find known motifs in my sequence?" or, "Given a group of related sequences how can I find common motifs?"
1. Sequence alone. Motifs can vary greatly in size, although many are small (~4-8 bases long) and may repeat in the sequence, e.g. AU-rich elements (AREs).
* A single mRNA. These may be recognised by RNA-binding proteins or by other RNAs. Known motifs can be identified computationally using consensus sequences, consensus matrices and statistical models of motifs. The first two are provided at this site. More sophisticated methods are available, but these will usually require implementing the programs at your site. Examples are the common AU-Rich Elements (ARE, repeating core motifs of AUUUA) or the rare Nanos Response Elements (NRE, repeating motifs of UUGU). Although superficially similar, these motifs are recognised by different classes of proteins. Furthermore, the function of such motifs may be determined by the binding of secondary ligand(s). Thus, an element that destabilises an mRNA in one cell may stabilise it in another.
* Aligned or unaligned related sequences. Methods involving local alignments are usually utilised to find small motifs or structures [57]. Methods that attempt to find global alignments, e.g. ClustalW or pileup, are less successful, although they will find longer motifs.
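Searching a single mRNA with a consensus sequence can be sketched in a few lines of Python. This is an illustration only, not the TransTerm implementation: the consensus is written with IUPAC ambiguity codes, translated to a regular expression, and matched with a lookahead so that overlapping repeats (common in AREs) are all reported. The function names and the toy 3' UTR are hypothetical, and the degenerate nonamer UUAUUUAWW is used purely as an example of an ambiguous consensus.

```python
import re

# IUPAC nucleotide codes expressed as RNA character classes.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that reports overlapping hits."""
    body = "".join(IUPAC_RNA[code] for code in consensus.upper())
    # The lookahead consumes no characters, so hits may overlap.
    return re.compile("(?=(" + body + "))")


def find_consensus(sequence, consensus):
    """Return (0-based start, matched text) for every hit in the sequence."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input too
    regex = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in regex.finditer(rna)]


# Toy 3' UTR with three overlapping ARE cores and two NRE-like cores.
utr = "GCAUUUAUUUAUUUAGCUUGUAAUUGUC"
are_hits = find_consensus(utr, "AUUUA")
nre_hits = find_consensus(utr, "UUGU")
nonamer_hits = find_consensus(utr, "UUAUUUAWW")
```

Short consensus sequences such as these will match many times by chance in any large database, so the hit list is only a starting point for assessing significance.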
2. Structure alone.
* A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules. Methods involving energy minimisation, utilising thermodynamic parameters, are available [59, 60]. However, the theoretically most stable structure may not be the physiological motif, as other proteins, RNAs and complexes binding the mRNA will affect structure. Induced fit has been demonstrated in RNA-protein recognition [58, 61]. In some cases simplification of the structure may assist analysis [62].
It should also be recognised that unusual base pairing may, in some cases, contribute to unusual structures, for example A-G base pairs in the SECIS element [63]. These unusual base pairs, and the more common U-U and G-G base pairs, will not be favoured by thermodynamic computational approaches. Unusual base pairs or pinched-out bases may provide discrimination between similar structural motifs.
* Aligned or unaligned related sequences. By definition, it is difficult to make a multiple alignment of sequences with only conservation in structure. However, new methods for recognition of structural motifs in unaligned sequences have recently become available.
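A search for structure alone can be sketched as a scan for simple hairpins under user-defined pairing rules. The Python below is a deliberately minimal illustration and makes several assumptions: the stem is perfectly paired with no bulges, no energy calculation is done, and the function name, parameters and test sequences are hypothetical rather than taken from TransTerm. Passing a different set of allowed pairs shows how a non-canonical pair such as A-G can be admitted.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def find_hairpins(sequence, stem_len, min_loop, max_loop,
                  pairs=WATSON_CRICK | WOBBLE):
    """Return (start, end, loop_len) for each perfectly paired hairpin.

    start is 0-based, end is one past the last base, and 'pairs' is the
    user-defined set of allowed (5' base, 3' base) combinations.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for start in range(len(rna)):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len
            if end > len(rna):
                break
            five_prime = rna[start:start + stem_len]
            three_prime = rna[end - stem_len:end]
            # The first base of the 5' arm pairs with the last base of the 3' arm.
            if all((a, b) in pairs
                   for a, b in zip(five_prime, reversed(three_prime))):
                hits.append((start, end, loop_len))
    return hits


# A 6 bp stem closed by a 4 nt loop, flanked by two unpaired bases each side.
canonical = find_hairpins("AAGGGCGCUUCGGCGCCCAA", stem_len=6,
                          min_loop=4, max_loop=4)

# The same shape with one A-G pair in the stem: missed by the default
# rules, found once A-G pairing is explicitly allowed.
with_ag = "GGACGCUUCGGCGGCC"
strict = find_hairpins(with_ag, 6, 4, 4)
relaxed = find_hairpins(with_ag, 6, 4, 4,
                        pairs=WATSON_CRICK | {("A", "G"), ("G", "A")})
```

Because pairing is tested only against the rule set, this kind of descriptor finds structures that energy minimisation might not favour, at the cost of many more chance matches.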
3. A combination of sequence and structure
* A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules and consensus methods. An example is the well-characterised Iron Response Element (IRE) [64-66].
* Aligned or unaligned related sequences. Few methods currently exist to combine the approaches described above. Utilisation of both sequence and structural recognition elements may allow the discovery of such motifs [67, 68].
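Combining a sequence constraint with a pairing constraint can be sketched for an IRE-like element. The descriptor below is a simplified assumption, not the pattern used by TransTerm: an unpaired C, a five base pair upper stem (G-U pairs allowed), and a six base loop matching CAGUGN. The lower stem and any mismatches are ignored, and the test sequence is an illustrative IRE-like sequence rather than a curated database entry.

```python
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
         ("G", "U"), ("U", "G")}
LOOP = re.compile(r"CAGUG[ACGU]")  # sequence constraint on the apical loop
STEM = 5                           # length of the upper stem in base pairs
SPAN = 1 + STEM + 6 + STEM         # bulged C + 5' arm + loop + 3' arm


def find_ire_like(sequence):
    """Return (0-based start, matched text) for each IRE-like element."""
    rna = sequence.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - SPAN + 1):
        bulge = rna[i]
        five_prime = rna[i + 1:i + 1 + STEM]
        loop = rna[i + 1 + STEM:i + 7 + STEM]
        three_prime = rna[i + 7 + STEM:i + SPAN]
        # Sequence constraints first: they are cheap and reject most windows.
        if bulge != "C" or not LOOP.fullmatch(loop):
            continue
        # Structure constraint: the two arms must pair along their length.
        if all((a, b) in PAIRS
               for a, b in zip(five_prime, reversed(three_prime))):
            hits.append((i, rna[i:i + SPAN]))
    return hits


ire_like = "GGGUUUCCUGCUUCAACAGUGCUUGGACGGAACCC"
hits = find_ire_like(ire_like)

# Changing one loop base removes the hit although the stem is untouched.
loop_mutant_hits = find_ire_like(ire_like.replace("CAGUGC", "CAGAGC"))
```

Checking the sequence constraint before the pairing constraint is only an efficiency choice; the set of hits does not depend on the order.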
How do I know if this match is significant?
This is perhaps the most difficult question. It is possible to apply statistical methods to determine how often a sequence motif is expected to occur by chance in a particular database. Small motifs will give many false positives. When ascertaining significance it is essential to take into account the expected composition of the bases in similar regions of the genome in question. Usually at least dinucleotide bias is taken into account.
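The effect of base composition on the expected number of chance matches can be illustrated with two simple background models. This is a sketch under stated assumptions, not the statistics used by TransTerm: the first function treats bases as independent, the second uses a first-order Markov model so that dinucleotide bias is taken into account. The function names and the artificial background sequence are hypothetical.

```python
from collections import Counter


def expected_hits_independent(motif, background):
    """Expected matches in a sequence the length of the background,
    assuming each base occurs independently at its observed frequency."""
    counts = Counter(background)
    prob = 1.0
    for base in motif:
        prob *= counts[base] / len(background)
    return prob * (len(background) - len(motif) + 1)


def expected_hits_markov(motif, background):
    """As above, but with a first-order Markov model, so each base is
    conditioned on the previous one (dinucleotide bias)."""
    dinucs = Counter(background[i:i + 2] for i in range(len(background) - 1))
    leading = Counter(background[:-1])  # bases that have a successor
    prob = Counter(background)[motif[0]] / len(background)
    for prev, base in zip(motif, motif[1:]):
        if leading[prev] == 0:
            return 0.0
        prob *= dinucs[prev + base] / leading[prev]
    return prob * (len(background) - len(motif) + 1)


# An extreme, artificial background: strictly alternating A and U.
background = "AU" * 50
observed = sum(1 for i in range(len(background) - 3)
               if background[i:i + 4] == "AUAU")

independent = expected_hits_independent("AUAU", background)
markov = expected_hits_markov("AUAU", background)
never = expected_hits_markov("AAAA", background)
```

In this contrived case the independent model expects about 6 matches to AUAU where 49 are observed, while the dinucleotide model expects 48.5; it also correctly expects none for AAAA. Real UTRs are far less extreme, but the same bias applies to AU-rich motifs.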
In addition, searching for motifs in regions of similar composition where they are known not to function can give an estimate of the false positive rate. For most patterns described in TransTerm we give an estimate of the number of hits in a typical mRNA database.
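An empirical comparison of this kind can be sketched by counting hits per kilobase in the sequences of interest and in a control set. The Python below is a toy illustration: the function names are hypothetical, both sequence sets are invented, and in practice the control set would be a large collection of regions of similar composition where the motif is believed not to function.

```python
import re


def count_overlapping(sequence, motif):
    """Count occurrences of an exact motif, including overlapping ones."""
    return sum(1 for _ in re.finditer("(?=" + re.escape(motif) + ")", sequence))


def hits_per_kb(sequences, motif):
    """Motif density, in hits per 1000 bases, over a set of sequences."""
    total_hits = sum(count_overlapping(s, motif) for s in sequences)
    total_length = sum(len(s) for s in sequences)
    return 1000.0 * total_hits / total_length


# Invented example data: candidate 3' UTRs and control regions.
candidate_utrs = ["GCAUUUAUUUAUUUAGC", "CCAUUUAGGAUUUACC"]
control_regions = ["GCAUGCAUGCAUUUAGCGC", "GGCCGGCCAUGGCCGGCC"]

candidate_rate = hits_per_kb(candidate_utrs, "AUUUA")
control_rate = hits_per_kb(control_regions, "AUUUA")  # false positive estimate
enrichment = candidate_rate / control_rate
```

The control density is the quantity of interest here: it estimates how often the pattern matches where it has no function, which puts the candidate density into context.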
Motifs in coding sequences.
Much of an mRNA sequence encodes protein and is thus constrained [58], so motifs in the 5' or 3' UTRs have been easier to identify [69, 70]. However, coding region motifs have previously been discovered experimentally [71, 72]. Computational methods to discover regulatory elements within coding regions are now becoming feasible, following the sequencing of many genomes [43, 73].
References
1. Gorodkin, J., S.L. Stricklin, and G. Stormo, (2001) Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Res, 29:2135-2144.
2. Pyronnet, S. and N. Sonenberg, (2001) Cell-cycle-dependent translational control. Curr Opin Genet Dev, 11:13-8.
3. Cooperstock, R.L. and H.D. Lipshitz, (2001) RNA localization and translational regulation during axis specification in the Drosophila oocyte. Int Rev Cytol, 203:541-66.
4. Ohler, U. and H. Niemann, (2001) Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet, 17:56-60.
5. Mignone, F., et al., (2005) UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res, 33:D141-6.
6. Castrignano, T., et al., (2004) CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison. Nucleic Acids Res, 32:W624-W627.
7. Griffiths-Jones, S., et al., (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res, 33:D121-4.
8. Siebert, S. and R. Backofen, (2005) MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics, 21:3352-9.
9. Bindewald, E. and B.A. Shapiro, (2006) RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers. RNA, 12:342-52.
10. Dandekar, T., ed. RNA Motifs and Regulatory Elements. 2002, Springer-Verlag: Berlin.
11. Jacobs, G.H., P.A. Stockwell, W.P. Tate, and C.M. Brown, (2006) Transterm: extended search facilities and improved integration with other databases. Nucleic Acids Res, 34:D37-40.
12. Nury, D., H. Chabanon, M. Levadoux-Martin, and J. Hesketh, (2005) An eleven nucleotide section of the 3'-untranslated region is required for perinuclear localization of rat metallothionein-1 mRNA. Biochem J, 387:419-28.
13. Kloc, M. and L.D. Etkin, (2005) RNA localization mechanisms in oocytes. J Cell Sci, 118:269-82.
14. Islam, S., R.K. Montgomery, J.J. Fialkovich, and R.J. Grand, (2005) Developmental and regional expression and localization of mRNAs encoding proteins involved in RNA translocation. J Histochem Cytochem, 53:1501-9.
15. Darnell, J.C., O. Mostovetsky, and R.B. Darnell, (2005) FMRP RNA targets: identification and validation. Genes Brain Behav, 4:341-9.
16. Colegrove-Otero, L.J., N. Minshall, and N. Standart, (2005) RNA-binding proteins in early development. Crit Rev Biochem Mol Biol, 40:21-73.
17. Chabanon, H., I. Mickleburgh, B. Burtle, C. Pedder, and J. Hesketh, (2005) An AU-rich stem-loop structure is a critical feature of the perinuclear localization signal of c-myc mRNA. Biochem J, 392:475-83.
18. Bakheet, T., M. Frevel, B.R. Williams, W. Greer, and K.S. Khabar, (2001) ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res, 29:246-54.
19. Wang, J., M. Pitarque, and M. Ingelman-Sundberg, (2006) 3'-UTR polymorphism in the human CYP2A6 gene affects mRNA stability and enzyme expression. Biochem Biophys Res Commun, 340:491-7.
20. Xu, Y.Z., S. Di Marco, I. Gallouzi, M. Rola-Pleszczynski, and D. Radzioch, (2005) RNA-binding protein HuR is required for stabilization of SLC11A1 mRNA and SLC11A1 protein expression. Mol Cell Biol, 25:8139-49.
21. Barreau, C., L. Paillard, and H.B. Osborne, (2005) AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Res, 33:7138-50.
22. Jing, Q., et al., (2005) Involvement of microRNA in AU-rich element-mediated mRNA instability. Cell, 120:623-34.
23. Coller, J. and R. Parker, (2005) General translational repression by activators of mRNA decapping. Cell, 122:875-86.
24. Ostareck, D.H., A. Ostareck-Lederer, I.N. Shatsky, and M.W. Hentze, (2001) Lipoxygenase mRNA silencing in erythroid differentiation: The 3' UTR regulatory complex controls 60S ribosomal subunit joining. Cell, 104:281-290.
25. Dean, K.A., A.K. Aggarwal, and R.P. Wharton, (2002) Translational repressors in Drosophila. Trends Genet, 18:572-7.
26. Cok, S.J. and A.R. Morrison, (2001) The 3'-untranslated region (3' UTR) of murine cyclooxygenase-2 contains multiple regulatory elements that alter message stability and translational efficiency. J Biol Chem, 9:9.
27. Dreher, T.W. and W.A. Miller, (2006) Translational control in positive strand RNA plant viruses. Virology, 344:185-97.
28. Oh, B., S.Y. Hwang, J. McLaughlin, D. Solter, and B.B. Knowles, (2000) Timely translation during the mouse oocyte-to-embryo transition. Development, 127:3795-3803.
29. Crucs, S., S. Chatterjee, and E.R. Gavis, (2000) Overlapping but distinct RNA elements control repression and activation of nanos translation. Mol Cell, 5:457-467.
30. Milligan, L., C. Torchet, C. Allmang, T. Shipman, and D. Tollervey, (2005) A nuclear surveillance pathway for mRNAs with defective polyadenylation. Mol Cell Biol, 25:9996-10004.
31. Prasanth, K.V., et al., (2005) Regulating gene expression through RNA nuclear retention. Cell, 123:249-63.
32. Jackson, R.J., (2005) Alternative mechanisms of initiating translation of mammalian mRNAs. Biochem Soc Trans, 33:1231-41.
33. Pilipenko, E.V., E.G. Viktorova, S.T. Guest, V.I. Agol, and R.P. Roos, (2001) Cell-specific proteins regulate viral RNA translation and virus-induced disease. EMBO J, 20:6899-908.
34. Kwok, L.W., et al., (2006) Concordant exploration of the kinetics of RNA folding from global and local perspectives. J Mol Biol, 355:282-93.
35. Mokrejs, M., et al., (2006) IRESite: the database of experimentally verified IRES structures (www.iresite.org). Nucleic Acids Res, 34:D125-30.
36. Herr, A.J., J.F. Atkins, and R.F. Gesteland, (2000) Coupling of open reading frames by translational bypassing. Annu Rev Biochem, 69:343-72.
37. Yu, E.T., Q. Zhang, and D. Fabris, (2005) Untying the FIV frameshifting pseudoknot structure by MS3D. J Mol Biol, 345:69-80.
38. Baranov, P.V., O.L. Gurvich, A.W. Hammer, R.F. Gesteland, and J.F. Atkins, (2003) Recode 2003. Nucleic Acids Res, 31:87-9.
39. Baranov, P.V., et al., (2001) RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression. Nucleic Acids Res, 29:264-7.
40. Baranov, P.V., O. Fayet, R.W. Hendrix, and J.F. Atkins, (2006) Recoding in bacteriophages and bacterial IS elements. Trends Genet, 22:174-81.
41. Ivanov, I.P., R.F. Gesteland, and J.F. Atkins, (2006) Evolutionary specialization of recoding: Frameshifting in the expression of S. cerevisiae antizyme mRNA is via an atypical antizyme shift site but is still +1. RNA, 12:332-7.
42. Tork, S., I. Hatin, J.P. Rousset, and C. Fabret, (2004) The major 5' determinant in stop codon read-through involves two adjacent adenines. Nucleic Acids Res, 32:415-21.
43. Firth, A.E. and C.M. Brown, (2006) Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics, 7:75.
44. Harrell, L., U. Melcher, and J.F. Atkins, (2002) Predominance of six different hexanucleotide recoding signals 3' of read-through stop codons. Nucleic Acids Res, 30:2011-7.
45. Castellano, S., et al., (2001) In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep, 2:697-702.
46. Small-Howard, A., et al., (2006) Supramolecular complexes mediate selenocysteine incorporation in vivo. Mol Cell Biol, 26:2337-46.
47. Allamand, V., et al., (2006) A single homozygous point mutation in a 3'untranslated region motif of selenoprotein N mRNA causes SEPN1-related myopathy. EMBO Rep.
48. Cartegni, L., J. Wang, Z. Zhu, M.Q. Zhang, and A.R. Krainer, (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res, 31:3568-71.
49. Xu, D.Q. and W. Mattox, (2006) Identification of a splicing enhancer in MLH1 using COMPARE, a new assay for determination of relative RNA splicing efficiencies. Hum Mol Genet, 15:329-36.
50. Wu, Y., Y. Zhang, and J. Zhang, (2005) Distribution of exonic splicing enhancer elements in human genes. Genomics, 86:329-36.
51. Zatkova, A., et al., (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat, 24:491-501.
52. Wang, J., P.J. Smith, A.R. Krainer, and M.Q. Zhang, (2005) Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes. Nucleic Acids Res, 33:5053-62.
53. Perkins, D.O., C. Jeffries, and P. Sullivan, (2005) Expanding the 'central dogma': the regulatory role of nonprotein coding genes and implications for the genetic liability to schizophrenia. Mol Psychiatry, 10:69-78.
54. Rusinov, V., V. Baev, I.N. Minkov, and M. Tabler, (2005) MicroInspector: a web tool for detection of miRNA binding sites in an RNA sequence. Nucleic Acids Res, 33:W696-700.
55. Zhang, Y., (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res, 33:W701-4.
56. Hsu, P.W., et al., (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Res, 34:D135-9.
57. Xie, X., et al., (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434:338-45.
58. Meyer, I.M. and I. Miklos, (2005) Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res, 33:6338-48.
59. Zuker, M. and A.B. Jacobson, (1998) Using reliability information to annotate RNA secondary structures. RNA, 4:669-79.
60. Hofacker, I.L., (2003) Vienna RNA secondary structure server. Nucleic Acids Res, 31:3429-31.
61. Williamson, J.R., (2001) Proteins that bind RNA and the labs who love them. Nat Struct Biol, 8:390-1.
62. Reeder, J. and R. Giegerich, (2005) Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction. Bioinformatics, 21:3516-23.
63. Busch, A., S. Will, and R. Backofen, (2005) SECISDesign: a server to design SECIS-elements within the coding sequence. Bioinformatics, 21:3312-3.
64. Paraskeva, E., N.K. Gray, B. Schlager, K. Wehr, and M.W. Hentze, (1999) Ribosomal pausing and scanning arrest as mechanisms of translational regulation from cap-distal iron-responsive elements. Mol Cell Biol, 19:807-16.
65. Tsuji, Y., et al., (2000) Coordinate transcriptional and translational regulation of ferritin in response to oxidative stress. Mol Cell Biol, 20:5818-5827.
66. Pavesi, G., G. Mauri, M. Stefani, and G. Pesole, (2004) RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res, 32:3258-69.
67. Washietl, S., I.L. Hofacker, M. Lukasser, A. Huttenhofer, and P.F. Stadler, (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol, 23:1383-90.
68. Hofacker, I.L., M. Fekete, and P.F. Stadler, (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol, 319:1059-66.
69. Winkler, W.C. and R.R. Breaker, (2005) Regulation of bacterial gene expression by riboswitches. Annu Rev Microbiol, 59:487-517.
70. Shabalina, S.A., A.Y. Ogurtsov, I.B. Rogozin, E.V. Koonin, and D.J. Lipman, (2004) Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res, 32:1774-82.
71. Lemm, I. and J. Ross, (2002) Regulation of c-myc mRNA decay by translational pausing in a coding region instability determinant. Mol Cell Biol, 22:3959-69.
72. Ioannidis, P., et al., (2003) CRD-BP: a c-Myc mRNA stabilizing protein with an oncofetal pattern of expression. Anticancer Res, 23:2179-83.
73. Firth, A.E. and C.M. Brown, (2005) Detecting overlapping coding sequences with pairwise alignments. Bioinformatics, 21:282-92.
Chris Brown and TransTerm team
Biochemistry Department, University of Otago, Dunedin, New Zealand
Last updated 10/06.