UTRdb

A curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs


The new version of UTRdb hosts sequences and annotations from more than 570 organisms for a total of 26,098,657 annotated UTRs.
All the UTRdb entries have been generated according to the sequence data and gene annotations available on Ensembl databases.


Searching UTRdb is straightforward: even users with no bioinformatics skills can run accurate searches across the database by:


Simple Retrieval

UTR annotations are stored according to their genomic context and can be retrieved by providing the UTR_type (5'/3' UTR, 5' UTR, 3' UTR):


The Organism:
Note. Select "All Organisms" if you are not interested in a particular species.


A genomic locus (Search by Gene symbol):


Or an Ensembl Gene Id (Search by Ensembl Gene Id):


Or an Ensembl Transcript Id (Search by Transcript Id):


A simple helper is available on the right of the search term box:



The "Search by Gene symbol" and "Search by Ensembl Gene Id" fields activate the autocomplete function,
which helps in selecting the right gene, Ensembl Gene Id or Ensembl Transcript Id.
Note. The autocomplete function is available for each of the organisms stored in UTRdb. Because its suggestions depend on the
gene set of each organism, it does not work with the "All Organisms" selection.

Advanced Retrieval

The basic functions provided by the Simple retrieval form can be extended by using the filters included with the Advanced retrieval form.




The advanced query form includes everything in the simple one and adds further search fields.
In particular, every UTRdb record is cataloged under a unique identifier with the following structure:


{UTR_type}_{genome assembly}_{transcript_id}.{transcript_level} ▶ (e.g. 3UTR_95_ENST00000565274.5)

Each record can be retrieved from the advanced query form by entering its unique identifier in the "Search by Entry id" field.
Using "Search by Entry id" disables all the other fields in the query.
A complete list (compressed in .gz format) of all the entry IDs stored in UTRdb can be obtained by clicking on "Entry id".
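
The identifier structure described above can be split into its parts with a few lines of Python. This is only a sketch: the regular expression is inferred from the example IDs shown on this page, not taken from UTRdb itself, and the names `EntryId` and `parse_entry_id` are hypothetical. The trailing `.{transcript_level}` is treated as optional because some example IDs on this page (e.g. 3UTR_42_FBtr0332873) do not carry one.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example IDs on this page (an assumption, not an
# official UTRdb grammar). A transcript id that itself ends in ".<digits>"
# would be read as having a transcript level.
ENTRY_ID = re.compile(
    r"^(?P<utr_type>[35]UTR)_(?P<assembly>\d+)_(?P<transcript_id>\S+?)"
    r"(?:\.(?P<level>\d+))?$"
)


class EntryId(NamedTuple):
    utr_type: str                     # "5UTR" or "3UTR"
    assembly: int                     # the {genome assembly} field
    transcript_id: str                # e.g. an Ensembl transcript id
    transcript_level: Optional[int]   # None when the id has no ".N" suffix


def parse_entry_id(entry_id: str) -> EntryId:
    """Split a UTRdb entry id into its four components."""
    match = ENTRY_ID.match(entry_id.strip())
    if match is None:
        raise ValueError(f"not a recognised UTRdb entry id: {entry_id!r}")
    level = match.group("level")
    return EntryId(
        match.group("utr_type"),
        int(match.group("assembly")),
        match.group("transcript_id"),
        int(level) if level is not None else None,
    )


if __name__ == "__main__":
    print(parse_entry_id("3UTR_95_ENST00000565274.5"))
```

The same function could be applied line by line to the downloadable list of entry IDs after decompressing it, assuming that file holds one ID per line.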




An important filter available through the advanced query form is the "Search by UTR length".
The user can search for UTRs according to a minimum and a maximum length (both expressed in base pairs).
The length in each box can be set manually by copy/paste and adjusted by the arrows ▲ ▼ present in each box.
The box on the left is the minimum length, while the box on the right is the maximum.
The number of boxes (2 boxes/4 boxes) changes according to the UTR_type selection (5'/3' UTR, 5' UTR, 3' UTR).
Note. "Search by UTR length" combined with "All Organisms" may take time, especially with very wide length ranges.




An additional filter present in the advanced query form is the "Search by UTR exons number".
This filter is very similar in operation to the search by length.
The user can search for UTRs according to a minimum and a maximum number of exons.
The numbers in each box can be set manually by copy/paste and adjusted by the arrows ▲ ▼ present in each box.
The box on the left is the minimum, while the box on the right is the maximum.
The number of boxes (2 boxes/4 boxes) changes according to the UTR_type selection (5'/3' UTR, 5' UTR, 3' UTR).
Note. "Search by UTR exons number" can be combined with "Search by UTR length".
This combination, together with "All Organisms", may take time, especially with very wide ranges of lengths and exon numbers.
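
The logic of combining the length and exon-number filters can be illustrated with a small Python sketch. It is not UTRdb's implementation: the `UtrRecord` layout, the function names, and the example records are hypothetical, and the bounds are assumed to be inclusive, which this page does not state.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class UtrRecord:
    """Hypothetical record layout, for illustration only."""
    entry_id: str
    length: int   # UTR length in base pairs
    exons: int    # number of UTR exons


def in_range(value: int, minimum: Optional[int], maximum: Optional[int]) -> bool:
    """True when value lies in [minimum, maximum]; None leaves a bound open."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def filter_utrs(
    records: Iterable[UtrRecord],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    min_exons: Optional[int] = None,
    max_exons: Optional[int] = None,
) -> List[UtrRecord]:
    """Keep records that satisfy both the length and the exon-number range."""
    return [
        r for r in records
        if in_range(r.length, min_len, max_len)
        and in_range(r.exons, min_exons, max_exons)
    ]


if __name__ == "__main__":
    # Made-up records; the ids and numbers are not real UTRdb data.
    records = [
        UtrRecord("example_a", 120, 1),
        UtrRecord("example_b", 950, 2),
        UtrRecord("example_c", 2800, 3),
    ]
    for hit in filter_utrs(records, min_len=100, max_len=1000, max_exons=2):
        print(hit.entry_id)
```

Both conditions must hold for a record to be returned, which is why wide ranges on both filters select many records and take longer.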




Upstream open reading frames (uORFs) are open reading frames (ORFs) located within 5' UTRs.
A checkbox in the advanced query form allows searching for all the UTRs containing annotated uORFs.
This filter can be combined with "UTR_type", "Organism" and "Search by Gene_name/Ensembl Gene Id/Ensembl Transcript Id".
Because uORFs are located in 5' UTRs, the uORF checkbox is available only with the 5'/3' UTR and 5' UTR selections.
Note. This filter can be used on its own or in combination with others (e.g. RNA editing, IREs...).




Internal ribosome entry sites (IREs) are RNA elements that allow translation initiation. They typically occur in 5' UTRs.
A checkbox in the advanced query form allows searching for all the UTRs containing annotated IREs (from IRESite).
This filter can be combined with "UTR_type", "Organism" and "Search by Gene_name/Ensembl Gene Id/Ensembl Transcript Id".
Because IREs typically occur in 5' UTRs, the IREs checkbox is available only with the 5'/3' UTR and 5' UTR selections.
Note. This filter can be used on its own or in combination with others (e.g. RNA editing, uORFs, CAGEs...).




CAGE (Cap Analysis of Gene Expression) is based on a series of full-length cDNA technologies previously developed at RIKEN.
The technology aims to map the vast majority of human transcription start sites, and hence their promoters.
A checkbox in the advanced query form allows searching for all the UTRs containing annotated CAGEs (from FANTOM).
This filter can be combined with "UTR_type", "Organism" and "Search by Gene_name/Ensembl Gene Id/Ensembl Transcript Id".
Because CAGE signals mark transcription start sites, the CAGEs checkbox is available only with the 5'/3' UTR and 5' UTR selections.
Note. This filter can be used on its own or in combination with others (e.g. RNA editing, IREs, m6A...).




RNA editing in untranslated regions (UTRs) can regulate mRNA stability and expression.
A checkbox in the advanced query form allows searching for all the UTRs containing annotated RNA editing events (A-to-I deamination from REDIportal).
This filter can be combined with "UTR_type", "Organism" and "Search by Gene_name/Ensembl Gene Id/Ensembl Transcript Id".
Note. This filter can be used on its own or in combination with others (e.g. uORFs, IREs...).




Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements.
Rfam-derived annotations falling within 5'/3' UTRs can be retrieved by using the Rfam motifs dropdown menu.
This filter works in combination with "UTR_type" and "Organism".






Note. For a complete list of all Rfam motifs and their descriptions, click on "motifs".
The link opens a page containing a table with the Rfam motif IDs, a short description of each motif, and a link (e.g. click on RFAM00001, RFAM00002, etc.) to the corresponding Rfam page.



MicroRNAs (miRNAs) typically bind to the 3' UTRs of their target mRNAs, interfering with translation.
UTRs in UTRdb whose coordinates are experimentally recognized as miRNA targets (according to TarBase) can be retrieved by using the Targeting miRNAs dropdown menu.
This filter works in combination with "UTR_type".



The advanced query form can be reset to its default parameters (5'/3' UTR, All Organisms) by clicking on the clear button.



The Gene view module

UTRs stored in UTRdb can be explored in their genic context through a custom "Gene View module" based on Python 3, the JavaScript Highcharts library and HTML5.
The "Gene View" page can be accessed through the main Website Menu.

The Gene View page contains a small form that allows searches by Organism, Gene symbol, Ensembl Gene Id or Ensembl Transcript Id.
As in all other UTRdb forms, search term autocompletion is active, except when "All Organisms" is selected.
Note. If "All Organisms" is selected, the search is performed across all the organisms in the database.

After filling in the required fields and clicking on submit, a Results page is returned.
Note. If "All Organisms" and "Search by Gene symbol" are selected, the query returns all the entries for that gene in the different organisms
stored in UTRdb.

The results page is in tabular format and contains the following columns:

  • The name of the gene (corresponding to what was entered in the search term field of the query form).
  • The organism(s) whose genome harbors that gene.
  • The coordinates of the gene in the Ensembl chromosome:start-end:strand format.
  • The number of isoforms of that gene.
  • The details of the UTRs associated with the different isoforms, contained in a hidden nested pop-up table.

To show the nested pop-up table of isoforms, click on the arrow at the end of each row.
The nested table contains the different isoforms of the gene (with their Ensembl identifiers) and the associated UTRdb entries.
If UTRs are reported for a given transcript, the names of the UTRdb entries appear within a green rectangle; otherwise a red rectangle with the words "no 5'UTR" or "no 3'UTR" is shown.
Clicking on a UTRdb entry redirects the user to the corresponding UTRdb entry record. See below in the page for further details.

By clicking on the gene_name (e.g. trim73), the user can see the structure of the transcripts reported in the results page.
According to the colour legend of the Transcripts Diagram:

  • ORANGE filled boxes are coding exons.
  • BLUE and RED filled boxes are the 5'UTR and 3'UTR (UnTranslated Regions), respectively.
  • Lines connecting the boxes are introns.
  • A vertical red line moving according to the mouse pointer is a spatial reference.
Commands:
  • ZOOM IN: click and hold the left mouse/pad button while dragging from one edge of the desired zoom area to the other.
  • ZOOM OUT: click the right mouse/pad button.
  • SCROLL: once zoomed in, drag the cursor to the left or right to move the view horizontally along the entire track.

Hovering the mouse over a block of the transcript diagram displays the corresponding information.
Clicking on a UTR block redirects the user to the corresponding UTRdb entry record. See below in the page for further details.

Entry structure

A typical UTRdb entry consists of 3 main parts:

    A HEADER containing the general information of the entry and composed of:
  • The organism from which the UTR entry derives (e.g. Homo sapiens)
  • The entry name, according to the unique structure:
    {UTR_type}_{genome assembly}_{transcript_id}.{transcript_level} ▶ (e.g. 3UTR_95_ENST00000412139.6)
  • The Gene symbol (e.g. DMTF1) containing the UTR annotation.
  • The corresponding Gene Id (e.g. ENSG00000135164) linked to its page on Ensembl database.
  • The corresponding Transcript Id (e.g. ENST00000412139) linked to its page on Ensembl database.
  • The Region (e.g. 3' UTR) to which the entry corresponds.
  • The Genomic assembly (e.g. 95) to which the annotations refer.
  • The number of Gene exons (e.g. 18), which are graphically displayed in the transcript diagram (see below).
  • The Transcript Length expressed in base pairs (e.g. 4460)
  • The Gene location in the Ensembl chromosome:start-end:strand format (e.g. 7:87152361-87196337:+).
    Note. Clicking on the Gene genomic coordinates opens a modal window containing the gene view module of the specific transcript referenced by the entry.
  • The total gene length expressed in base pairs (e.g. Total Gene length 43977)
  • The UTR genomic location in the ensembl chromosome:start-end:strand format (e.g. join{7:87182165-87182337:+, 7:87184397-8718...})
  • The UTR length expressed in base pairs (e.g. UTR length 2819)
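
The chromosome:start-end:strand format used in the header is easy to parse, and the gene length follows from the coordinates. The sketch below is an illustration only: `Locus`, `parse_locus` and `locus_length` are hypothetical names, and it assumes 1-based inclusive coordinates, which is consistent with the example above (7:87152361-87196337:+ and a total gene length of 43977).

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: str  # "+" or "-"


def parse_locus(text: str) -> Locus:
    """Parse 'chromosome:start-end:strand', e.g. '7:87152361-87196337:+'."""
    # rsplit from the right so that a chromosome name containing ':' survives.
    chromosome, span, strand = text.strip().rsplit(":", 2)
    start_text, end_text = span.split("-")
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand: {strand!r}")
    return Locus(chromosome, int(start_text), int(end_text), strand)


def locus_length(locus: Locus) -> int:
    """Length in base pairs, assuming 1-based inclusive coordinates."""
    return locus.end - locus.start + 1


if __name__ == "__main__":
    gene = parse_locus("7:87152361-87196337:+")
    print(gene, locus_length(gene))  # length: 43977
```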

    An intermediate part containing annotations related to the UTR specified by the entry, such as:
  • Gene Ontologies (e.g. taken from 5UTR_95_ENST00000425705.2)
  • Gene Orthologs (e.g. taken from 5UTR_95_ENST00000425705.2)
  • miRNAs targeting the UTR (e.g. taken from 5UTR_95_ENST00000425705.2)

  • Polyadenylation sites (in case of 3'UTRs, e.g. taken from 3UTR_95_ENST00000579850.5)
  • Conserved blocks (derived from PhastCons, e.g. 3UTR_95_ENST00000579850.5)

  • Conserved RNA motifs from Rfam (e.g. 5S ribosomal RNA in 3UTR_95_ENST00000552681.1)
  • Note. Clicking on an Rfam motif redirects the user to a page containing the Rfam Infernal tool output (from which the conserved RNA pattern was obtained).



  • 5'UTR upstream ORFs (e.g. in 5UTR_95_ENST00000318602.11), calculated from the UTR FASTA sequences by means of an ad-hoc Python script.


  • Variants from multiple sources (mainly from dbSNP) according to the Ensembl Variation database (e.g. from 5UTR_95_ENST00000425705.2).


  • RNA editing events (A-to-I deamination from REDIportal) (e.g. from 3UTR_95_ENST00000395489.6).


  • Annotated IREs (external link to IRESite) (e.g. from 3UTR_42_FBtr0332873).


  • For each annotation, a ready-to-use Excel table (.xls format) can be downloaded by pressing the associated button.

      A final part containing the sequence, in FASTA format, of the UTR related to the entry.
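
The entry sequence can be used to look for upstream ORFs of the kind listed among the annotations. The following Python sketch is not the ad-hoc script used by UTRdb, whose criteria are not described on this page. It assumes a simple definition: an ATG followed, in the same reading frame, by a stop codon that lies inside the 5' UTR sequence. The name `find_uorfs` and the `min_codons` parameter are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of ATG...stop ORFs.

    Only ORFs that start and stop inside the given sequence are reported.
    min_codons counts the start and stop codons as well.
    """
    seq = utr.upper().replace("U", "T")  # accept RNA or DNA letters
    spans = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, in frame with this ATG, until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    spans.append((start, pos + 3))
                break
    return spans


if __name__ == "__main__":
    # Toy sequence, not a real UTRdb entry.
    print(find_uorfs("CCATGAAATAGCC"))  # [(2, 11)]
```

Other definitions also count ORFs that start in the 5' UTR and extend into the coding sequence; those would need the downstream sequence as well and are not covered here.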