Bioinformatic approaches to finding cis acting regulatory mRNA elements in eukaryotic mRNA

- focusing on human 3' UTR analysis

 

 

This is a brief introduction related to the TransTerm databases (transterm.otago.ac.nz, mRNA.otago.ac.nz). If you find this useful in your research, please cite our publication in the database issue of Nucleic Acids Research. Other methods are described in the references [1-4] and on related web sites [5-9]. The book RNA Motifs and Regulatory Elements, edited by T. Dandekar, provides a comprehensive introduction [10]. We provide a selection of tools via the web interface to access the TransTerm data [11]. A list of related tools and resources is available here.

 

What are cis regulatory mRNA elements?

 

Cis regulatory mRNA elements are defined elements within the mRNA that regulate expression of the transcript. In general, they respond to the cellular environment.

 

Classification based on location in the mRNA

 

Many motifs are located primarily in the untranslated regions of mRNA sequences, the 5' UTR or the 3' UTR. They have been reported less commonly in coding sequences (see below).

 

 

What do motifs do? Classification based on function

 

Motifs in particular mRNAs and translated viral RNAs have been shown to mediate many functions and post-transcriptional controls in cells. These include (with recent selected references):

 

*               Localising the mRNA (zip code motifs, [12-17])

 

*               Stabilising or destabilising mRNA (stability elements, SEs, [18-22])

 

*               Repressing translation (translational repressors, TRs, [23-25])

 

*               Enhancing translation (translational enhancers, TEs, [26, 27])

 

*               Affecting polyadenylation (polyadenylation elements, PEs, [28-30]) or 3' UTR maturation [31]

 

*               Controlling the efficiency of translation initiation, or promoting initiation in abnormal contexts. For instance, internal initiation within eukaryotic mRNAs can be achieved via internal ribosome entry sites (IRESs) [32-35]; these may be bound by IRES-interacting proteins (ITAFs).

 

*               Promoting alternative reading, or recoding, of the genetic code, such as frameshifting (frameshifting elements, FSEs, [36-41]), readthrough (readthrough elements, RTEs, [42-44]) and selenocysteine incorporation (SECIS motif, [45-47]). These are particularly prominent in positive-stranded RNA viruses [27].

 

 

*               Acting as targets of small regulatory RNAs, e.g. microRNA (miRNA) or antisense RNA [53-57]

 

*               Acting as targets of small molecules, e.g. riboswitches

 

*               Splicing enhancers or silencers may be present in the mature mRNA sequence [48-52]. In addition, sequences corresponding to transcription factor binding sites in the DNA may be present in the mRNA, but do not function there.

 

Note:

          Some RNAs with typical mRNA structures (cap and polyA tail) may not encode large proteins; these form a class of non-coding mRNAs [58].

 

Detailed examples of some of these motifs can be found by selecting "Describe TransTerm motifs" in the pull-down menu and choosing a pattern.

 

A sequence and structural classification:

 

Motifs can be classified into three broad classes, based on whether they are defined by sequence, by structure, or by both:

 

*          1. Sequence alone

 

*          2. Structure alone

 

*          3. A combination of sequence and structure

 

 

How can I find these types of motifs computationally?

 

These classes of motifs reflect the different ways in which they interact with other RNAs, RNA-binding proteins or ribosomes. The different classes require different methods for computational recognition. Two types of questions are often asked: "How can I find known motifs in my sequence?" or, "Given a group of related sequences how can I find common motifs?"

 

1. Sequence alone. Motifs can vary greatly in size, although many are small (~4-8 bases long) and may be repeated in the sequence, e.g. AU Rich Elements (AREs).

 

* A single mRNA. These motifs may be recognised by RNA-binding proteins or by other RNAs. Known motifs can be identified computationally using consensus sequences, consensus matrices and statistical models of motifs. The first two are provided at this site. More sophisticated methods are available, but these will usually require implementing the programs at your site. Examples are the common AU Rich Elements (ARE, repeating core motifs of AUUUA) or the rare Nanos Response Elements (NRE, repeating motifs of UUGU). Although superficially similar, these motifs are recognised by different classes of proteins. Furthermore, the function of such motifs may be determined by the binding of one or more secondary ligands. Thus, an element that destabilises an mRNA in one cell may stabilise it in another.

 

* Aligned or unaligned related sequences. Methods involving local alignments are usually used to find small motifs or structures [57]. Methods that attempt to find global alignments, e.g. ClustalW or pileup, are less successful for small motifs, although they will find longer motifs.
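
Searching a single mRNA with a consensus sequence can be sketched in a few lines of Python. This is a minimal illustration, not the TransTerm implementation: the function names `consensus_to_regex` and `find_consensus` are hypothetical, and the consensus is assumed to be written with standard IUPAC ambiguity codes. DNA-style input (T) is converted to U before searching.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex.

    The pattern is wrapped in a lookahead so that overlapping matches,
    such as the repeats AUUUAUUUA, are all reported.
    """
    body = "".join(IUPAC[base] for base in consensus.upper().replace("T", "U"))
    return re.compile("(?=(" + body + "))")


def find_consensus(sequence, consensus):
    """Return (0-based start, matched text) for every match, overlaps included."""
    rna = sequence.upper().replace("T", "U")
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]
```

Calling `find_consensus(utr, "AUUUA")` on a 3' UTR string lists every ARE core pentamer, and `find_consensus(utr, "UUGU")` does the same for the NRE core. Short patterns like these match very often by chance, so hits need to be assessed for significance as discussed further down.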

 

2. Structure alone.

 

* A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules. Methods involving energy minimisation, which use thermodynamic parameters, are available [59, 60]. However, the theoretically most stable structure may not be the physiological motif, as other proteins, RNAs and complexes binding the mRNA will affect structure. Induced fit has been demonstrated in RNA-protein recognition [58, 61]. In some cases simplification of the structure may assist analysis [62].

 

It should also be recognised that unusual base pairing may, in some cases, contribute to unusual structures, for example A-G base pairs in the SECIS element [63]. These unusual base pairs, and the more common U-U and G-G base pairs, will not be favoured by thermodynamic computational approaches. Unusual base pairs or pinched-out bases may provide discrimination between similar structural motifs.

 

* Aligned or unaligned related sequences. By definition, it is difficult to make a multiple alignment of sequences with only conservation in structure. However, new methods for recognition of structural motifs in unaligned sequences have recently become available.
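
The idea of searching with user-defined base pairing rules can be illustrated with a small Python sketch. It is an assumption-laden simplification rather than the method used at this site: the function name `find_hairpins` and its parameters are hypothetical, the stem must be fully paired with no bulges or internal loops, and no thermodynamic scoring is attempted. The set of allowed pairs is passed in explicitly, so unusual pairs can be permitted when a motif requires them.

```python
# Allowed base pairs; adding or removing entries changes the pairing rules.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def find_hairpins(sequence, stem_len=4, min_loop=3, max_loop=8, pairs=None):
    """Find simple stem-loops: a fully paired stem of stem_len bases
    enclosing an unconstrained loop of min_loop..max_loop bases.

    Returns a list of (0-based start, 5' arm, loop, 3' arm) tuples.
    """
    if pairs is None:
        pairs = WATSON_CRICK | WOBBLE
    rna = sequence.upper().replace("T", "U")
    hits = []
    for start in range(len(rna)):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len  # one past the 3' arm
            if end > len(rna):
                break
            five = rna[start:start + stem_len]
            three = rna[end - stem_len:end]
            # Base i of the 5' arm pairs with base (stem_len - 1 - i)
            # of the 3' arm, because the two arms are antiparallel.
            if all((five[i], three[stem_len - 1 - i]) in pairs
                   for i in range(stem_len)):
                loop = rna[start + stem_len:end - stem_len]
                hits.append((start, five, loop, three))
    return hits
```

To allow an unusual pair such as A-G, the rule set can be extended, for example `pairs=WATSON_CRICK | WOBBLE | {("A", "G"), ("G", "A")}`. Restricting the search to `pairs=WATSON_CRICK` gives the strictest matching.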

 

3. A combination of sequence and structure

 

* A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules and consensus methods. An example is the well-characterised Iron Response Element (IRE) [64-66].

 

* Aligned or unaligned related sequences. Few methods currently exist to combine the approaches described above. Utilisation of both sequence and structural recognition elements may allow the discovery of such motifs [67, 68].  
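
A combined sequence and structure search can be sketched as a sequence pattern followed by a base pairing check. The descriptor below is a deliberately simplified, IRE-like pattern chosen for illustration and is not the pattern used by TransTerm: it assumes an unpaired C, a 5-base 5' arm, a six-base loop CAGUGN and a 5-base 3' arm that pairs with the 5' arm (Watson-Crick or G-U). The lower stem and known loop variants are ignored, and the names `IRE_LIKE`, `arms_pair` and `find_ire_like` are hypothetical.

```python
import re

# Pairs accepted in the stem: Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
         ("G", "U"), ("U", "G")}

# Sequence part of the simplified descriptor: unpaired C, 5' arm,
# loop CAGUGN, 3' arm. The lookahead allows overlapping candidates.
IRE_LIKE = re.compile(r"(?=(C)([ACGU]{5})(CAGUG[ACGU])([ACGU]{5}))")


def arms_pair(five, three, pairs=PAIRS):
    """True if the two arms can form a continuous antiparallel stem."""
    return len(five) == len(three) and all(
        (a, b) in pairs for a, b in zip(five, reversed(three)))


def find_ire_like(sequence):
    """Return candidates that satisfy both the sequence and pairing rules."""
    rna = sequence.upper().replace("T", "U")
    hits = []
    for m in IRE_LIKE.finditer(rna):
        _bulge, five, loop, three = m.groups()
        if arms_pair(five, three):  # structural filter on sequence matches
            hits.append({"start": m.start(), "five_arm": five,
                         "loop": loop, "three_arm": three})
    return hits
```

Applying the sequence constraint first and the structural filter second keeps the search fast, since the loop consensus removes most positions before any pairing is tested.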

 

How do I know if this match is significant?

 

This is perhaps the most difficult question. It is possible to apply statistical methods to determine how often a sequence motif is expected to occur by chance in a particular database. Small motifs will give many false positives. When ascertaining significance it is essential to take into account the expected composition of the bases in similar regions of the genome in question. Usually at least dinucleotide bias is taken into account.
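
One way to make the dinucleotide-aware expectation concrete is a first-order Markov model estimated from background sequences of similar composition. The sketch below is illustrative only: the function names are hypothetical, the background is assumed to be a list of RNA strings such as other 3' UTRs from the same genome, and the estimate ignores complications such as overlapping occurrences of self-similar motifs. Under this model the probability of a motif m_1...m_k at one position is P(m_1) multiplied by the product of P(m_{i+1} | m_i), and the expected count is that probability times the number of positions searched.

```python
from collections import Counter


def markov1_probability(motif, background):
    """Probability of motif at one position under a first-order Markov
    model whose parameters are estimated from the background sequences."""
    mono = Counter()
    di = Counter()
    for seq in background:
        mono.update(seq)
        di.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(mono.values())
    prob = mono[motif[0]] / total
    for a, b in zip(motif, motif[1:]):
        # P(b | a) = count(ab) / count(a followed by any base)
        from_a = sum(n for pair, n in di.items() if pair[0] == a)
        prob *= di[a + b] / from_a if from_a else 0.0
    return prob


def expected_hits(motif, background, search_lengths):
    """Expected number of chance matches in sequences of the given lengths."""
    p = markov1_probability(motif, background)
    positions = sum(max(n - len(motif) + 1, 0) for n in search_lengths)
    return p * positions
```

Comparing the observed number of matches with `expected_hits(...)` gives a first indication of whether a motif is over-represented. For a 5-base motif such as AUUUA in AU-rich 3' UTRs the expected count is large, which is why small motifs produce many false positives.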

 

In addition, searching for motifs in regions of similar composition where they are known not to function can give an estimate of the false positive rate. For most patterns described in TransTerm we give an estimate of the number of hits in a typical mRNA database.
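
The control-region comparison can be sketched as a hit density calculation. This is a minimal illustration under stated assumptions: the function name `hits_per_kb` is hypothetical, the motif is given as a regular expression over the RNA alphabet, and the choice of control sequences (regions of similar composition where the motif is not expected to function) is left to the user.

```python
import re


def hits_per_kb(sequences, pattern):
    """Overlapping matches of a regex pattern per 1000 bases searched."""
    # A zero-width lookahead reports every start position, so
    # overlapping occurrences are each counted once.
    regex = re.compile("(?=(?:" + pattern + "))")
    total_hits = sum(len(regex.findall(seq)) for seq in sequences)
    total_len = sum(len(seq) for seq in sequences)
    return 1000.0 * total_hits / total_len if total_len else 0.0
```

Computing `hits_per_kb(test_utrs, "AUUUA")` and `hits_per_kb(control_regions, "AUUUA")` on two user-supplied lists and comparing the densities gives a rough empirical estimate of how many matches in the test set are likely to be false positives.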

 

Motifs in coding sequences.

Much of an mRNA sequence encodes protein and is thus constrained [58], so motifs in the 5' or 3' UTRs have been easier to identify [69, 70]. However, coding region motifs have previously been discovered experimentally [71, 72]. Computational methods to discover regulatory elements within coding regions are now becoming feasible, following the sequencing of many genomes [43, 73].

 

 

1.                  Gorodkin, J., S.L. Stricklin, and G. Stormo, (2001) Discovering common-stem loop motifs in unaligned RNA sequences. Nucleic Acids Res., 29:2135-2144.

2.                  Pyronnet, S. and N. Sonenberg, (2001) Cell-cycle-dependent translational control. Curr Opin Genet Dev, 11:13-8.

3.                  Cooperstock, R.L. and H.D. Lipshitz, (2001) RNA localization and translational regulation during axis specification in the Drosophila oocyte. Int Rev Cytol, 203:541-66.

4.                  Ohler, U. and H. Niemann, (2001) Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet, 17:56-60.

5.                  Mignone, F., et al., (2005) UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res, 33:D141-6.

6.                  Castrignano, T., et al., (2004) CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison. Nucleic Acids Research, 32:W624-W627.

7.                  Griffiths-Jones, S., et al., (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res, 33:D121-4.

8.                  Siebert, S. and R. Backofen, (2005) MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics, 21:3352-9.

9.                  Bindewald, E. and B.A. Shapiro, (2006) RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers. RNA, 12:342-52.

10.              Dandekar, T., ed. (2002) RNA Motifs and Regulatory Elements. Springer-Verlag: Berlin.

11.              Jacobs, G.H., P.A. Stockwell, W.P. Tate, and C.M. Brown, (2006) Transterm--extended search facilities and improved integration with other databases. Nucleic Acids Res, 34:D37-40.

12.              Nury, D., H. Chabanon, M. Levadoux-Martin, and J. Hesketh, (2005) An eleven nucleotide section of the 3'-untranslated region is required for perinuclear localization of rat metallothionein-1 mRNA. Biochem J, 387:419-28.

13.              Kloc, M. and L.D. Etkin, (2005) RNA localization mechanisms in oocytes. J Cell Sci, 118:269-82.

14.              Islam, S., R.K. Montgomery, J.J. Fialkovich, and R.J. Grand, (2005) Developmental and regional expression and localization of mRNAs encoding proteins involved in RNA translocation. J Histochem Cytochem, 53:1501-9.

15.              Darnell, J.C., O. Mostovetsky, and R.B. Darnell, (2005) FMRP RNA targets: identification and validation. Genes Brain Behav, 4:341-9.

16.              Colegrove-Otero, L.J., N. Minshall, and N. Standart, (2005) RNA-binding proteins in early development. Crit Rev Biochem Mol Biol, 40:21-73.

17.              Chabanon, H., I. Mickleburgh, B. Burtle, C. Pedder, and J. Hesketh, (2005) An AU-rich stem-loop structure is a critical feature of the perinuclear localization signal of c-myc mRNA. Biochem J, 392:475-83.

18.              Bakheet, T., M. Frevel, B.R. Williams, W. Greer, and K.S. Khabar, (2001) ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res, 29:246-54.

19.              Wang, J., M. Pitarque, and M. Ingelman-Sundberg, (2006) 3'-UTR polymorphism in the human CYP2A6 gene affects mRNA stability and enzyme expression. Biochem Biophys Res Commun, 340:491-7.

20.              Xu, Y.Z., S. Di Marco, I. Gallouzi, M. Rola-Pleszczynski, and D. Radzioch, (2005) RNA-binding protein HuR is required for stabilization of SLC11A1 mRNA and SLC11A1 protein expression. Mol Cell Biol, 25:8139-49.

21.              Barreau, C., L. Paillard, and H.B. Osborne, (2005) AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Res, 33:7138-50.

22.              Jing, Q., et al., (2005) Involvement of microRNA in AU-rich element-mediated mRNA instability. Cell, 120:623-34.

23.              Coller, J. and R. Parker, (2005) General translational repression by activators of mRNA decapping. Cell, 122:875-86.

24.              Ostareck, D.H., A. Ostareck-Lederer, I.N. Shatsky, and M.W. Hentze, (2001) Lipoxygenase mRNA silencing in erythroid differentiation: The 3' UTR regulatory complex controls 60S ribosomal subunit joining. Cell, 104:281-290.

25.              Dean, K.A., A.K. Aggarwal, and R.P. Wharton, (2002) Translational repressors in Drosophila. Trends Genet, 18:572-7.

26.              Cok, S.J. and A.R. Morrison, (2001) The 3'-untranslated region (3' UTR) of murine cyclooxygenase-2 contains multiple regulatory elements that alter message stability and translational efficiency. J Biol Chem, 9:9.

27.              Dreher, T.W. and W.A. Miller, (2006) Translational control in positive strand RNA plant viruses. Virology, 344:185-97.

28.              Oh, B., S.Y. Hwang, J. McLaughlin, D. Solter, and B.B. Knowles, (2000) Timely translation during the mouse oocyte-to-embryo transition. Development, 127:3795-3803.

29.              Crucs, S., S. Chatterjee, and E.R. Gavis, (2000) Overlapping but distinct RNA elements control repression and activation of nanos translation. Molecular Cell, 5:457-467.

30.              Milligan, L., C. Torchet, C. Allmang, T. Shipman, and D. Tollervey, (2005) A nuclear surveillance pathway for mRNAs with defective polyadenylation. Mol Cell Biol, 25:9996-10004.

31.              Prasanth, K.V., et al., (2005) Regulating gene expression through RNA nuclear retention. Cell, 123:249-63.

32.              Jackson, R.J., (2005) Alternative mechanisms of initiating translation of mammalian mRNAs. Biochem Soc Trans, 33:1231-41.

33.              Pilipenko, E.V., E.G. Viktorova, S.T. Guest, V.I. Agol, and R.P. Roos, (2001) Cell-specific proteins regulate viral RNA translation and virus-induced disease. Embo J, 20:6899-908.

34.              Kwok, L.W., et al., (2006) Concordant exploration of the kinetics of RNA folding from global and local perspectives. J Mol Biol, 355:282-93.

35.              Mokrejs, M., et al., (2006) IRESite: the database of experimentally verified IRES structures (www.iresite.org). Nucleic Acids Res, 34:D125-30.

36.              Herr, A.J., J.F. Atkins, and R.F. Gesteland, (2000) Coupling of open reading frames by translational bypassing. Annu Rev Biochem, 69:343-72.

37.              Yu, E.T., Q. Zhang, and D. Fabris, (2005) Untying the FIV frameshifting pseudoknot structure by MS3D. J Mol Biol, 345:69-80.

38.              Baranov, P.V., O.L. Gurvich, A.W. Hammer, R.F. Gesteland, and J.F. Atkins, (2003) Recode 2003. Nucleic Acids Res, 31:87-9.

39.              Baranov, P.V., et al., (2001) RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression. Nucleic Acids Res, 29:264-7.

40.              Baranov, P.V., O. Fayet, R.W. Hendrix, and J.F. Atkins, (2006) Recoding in bacteriophages and bacterial IS elements. Trends Genet, 22:174-81.

41.              Ivanov, I.P., R.F. Gesteland, and J.F. Atkins, (2006) Evolutionary specialization of recoding: Frameshifting in the expression of S. cerevisiae antizyme mRNA is via an atypical antizyme shift site but is still +1. RNA, 12:332-7.

42.              Tork, S., I. Hatin, J.P. Rousset, and C. Fabret, (2004) The major 5' determinant in stop codon read-through involves two adjacent adenines. Nucleic Acids Res, 32:415-21.

43.              Firth, A.E. and C.M. Brown, (2006) Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics, 7:75.

44.              Harrell, L., U. Melcher, and J.F. Atkins, (2002) Predominance of six different hexanucleotide recoding signals 3' of read-through stop codons. Nucleic Acids Res, 30:2011-7.

45.              Castellano, S., et al., (2001) In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep, 2:697-702.

46.              Small-Howard, A., et al., (2006) Supramolecular complexes mediate selenocysteine incorporation in vivo. Mol Cell Biol, 26:2337-46.

47.              Allamand, V., et al., (2006) A single homozygous point mutation in a 3'untranslated region motif of selenoprotein N mRNA causes SEPN1-related myopathy. EMBO Rep.

48.              Cartegni, L., J. Wang, Z. Zhu, M.Q. Zhang, and A.R. Krainer, (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res, 31:3568-71.

49.              Xu, D.Q. and W. Mattox, (2006) Identification of a splicing enhancer in MLH1 using COMPARE, a new assay for determination of relative RNA splicing efficiencies. Hum Mol Genet, 15:329-36.

50.              Wu, Y., Y. Zhang, and J. Zhang, (2005) Distribution of exonic splicing enhancer elements in human genes. Genomics, 86:329-36.

51.              Zatkova, A., et al., (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat, 24:491-501.

52.              Wang, J., P.J. Smith, A.R. Krainer, and M.Q. Zhang, (2005) Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes. Nucleic Acids Res, 33:5053-62.

53.              Perkins, D.O., C. Jeffries, and P. Sullivan, (2005) Expanding the 'central dogma': the regulatory role of nonprotein coding genes and implications for the genetic liability to schizophrenia. Mol Psychiatry, 10:69-78.

54.              Rusinov, V., V. Baev, I.N. Minkov, and M. Tabler, (2005) MicroInspector: a web tool for detection of miRNA binding sites in an RNA sequence. Nucleic Acids Res, 33:W696-700.

55.              Zhang, Y., (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res, 33:W701-4.

56.              Hsu, P.W., et al., (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Res, 34:D135-9.

57.              Xie, X., et al., (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434:338-45.

58.              Meyer, I.M. and I. Miklos, (2005) Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res, 33:6338-48.

59.              Zuker, M. and A.B. Jacobson, (1998) Using reliability information to annotate RNA secondary structures. RNA, 4:669-79.

60.              Hofacker, I.L., (2003) Vienna RNA secondary structure server. Nucleic Acids Res, 31:3429-31.

61.              Williamson, J.R., (2001) Proteins that bind RNA and the labs who love them. Nat Struct Biol, 8:390-1.

62.              Reeder, J. and R. Giegerich, (2005) Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction. Bioinformatics, 21:3516-23.

63.              Busch, A., S. Will, and R. Backofen, (2005) SECISDesign: a server to design SECIS-elements within the coding sequence. Bioinformatics, 21:3312-3.

64.              Paraskeva, E., N.K. Gray, B. Schlager, K. Wehr, and M.W. Hentze, (1999) Ribosomal pausing and scanning arrest as mechanisms of translational regulation from cap-distal iron-responsive elements. Mol Cell Biol, 19:807-16.

65.              Tsuji, Y., et al., (2000) Coordinate transcriptional and translational regulation of ferritin in response to oxidative stress. Molecular & Cellular Biology, 20:5818-5827.

66.              Pavesi, G., G. Mauri, M. Stefani, and G. Pesole, (2004) RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res, 32:3258-69.

67.              Washietl, S., I.L. Hofacker, M. Lukasser, A. Huttenhofer, and P.F. Stadler, (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol, 23:1383-90.

68.              Hofacker, I.L., M. Fekete, and P.F. Stadler, (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol, 319:1059-66.

69.              Winkler, W.C. and R.R. Breaker, (2005) Regulation of bacterial gene expression by riboswitches. Annu Rev Microbiol, 59:487-517.

70.              Shabalina, S.A., A.Y. Ogurtsov, I.B. Rogozin, E.V. Koonin, and D.J. Lipman, (2004) Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res, 32:1774-82.

71.              Lemm, I. and J. Ross, (2002) Regulation of c-myc mRNA decay by translational pausing in a coding region instability determinant. Mol Cell Biol, 22:3959-69.

72.              Ioannidis, P., et al., (2003) CRD-BP: a c-Myc mRNA stabilizing protein with an oncofetal pattern of expression. Anticancer Res, 23:2179-83.

73.              Firth, A.E. and C.M. Brown, (2005) Detecting overlapping coding sequences with pairwise alignments. Bioinformatics, 21:282-92.

 

Chris Brown and TransTerm team

Biochemistry Department, University of Otago, Dunedin, New Zealand

Last updated 10/06.