Uta
Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image
uta -- Universal Transcript Archive
bringing smiles to transcript users since 2013
The UTA (Universal Transcript Archive) stores transcripts aligned to sequence references (typically genome reference assemblies). It supports aligning the same transcript to multiple references using multiple alignment methods. Specifically, it facilitates the following:
- Querying for multiple transcript sources through a single interface
- Interpreting variants reported in literature against obsolete transcript records
- Identifying regions where transcript and reference genome sequence assemblies disagree
- Comparing transcripts from distinct sources
- Comparing transcript alignments generated by multiple methods
- Identifying ambiguities in transcript alignments
UTA is used by the hgvs package to map variants between genomic, transcript, and protein coordinates.
This code repository is primarily used for generating the UTA database. The primary interface for the database itself is via direct PostgreSQL access. (A REST interface is planned, but not yet available.)
Users can access a public instance of UTA or build their own instance of the database.
Accessing the Public UTA Instance
Invitae provides a public instance of UTA. The connection parameters are:
param | value
------------ | --------------------
host | uta.biocommons.org
port | 5432 (default)
database | uta
login | anonymous
password | anonymous
For example:
$ PGPASSWORD=anonymous psql -h uta.biocommons.org -U anonymous -d uta
Or, in Python (requires psycopg2):
> import psycopg2, psycopg2.extras
> conn = psycopg2.connect("host=uta.biocommons.org dbname=uta user=anonymous password=anonymous")
> cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
> cur.execute("select * from uta_20140210.tx_def_summary_v where hgnc='BRCA1'")
> row = cur.fetchone()
> dict(row)
{'hgnc': 'BRCA1',
'cds_md5': 'b3d16af258a759d0321d4f83b55dd51b',
'es_fingerprint': 'f91ab768a35c8db477fbf04dde6955e2',
'tx_ac': 'ENST00000357654',
'alt_ac': 'ENST00000357654',
'alt_aln_method': 'transcript',
'alt_strand': 1,
'exon_set_id': 7027,
'n_exons': 23,
'se_i': '0,100;100,199;199,253;253,331;331,420;420,560;560,666;666,712;712,789;789,4215;4215,4304;4304,4476;4476,4603;4603,4794;4794,5105;5105,5193;5193,5271;5271,5312;5312,5396;5396,5451;5451,5525;5525,5586;5586,7094',
'starts_i': [0,
100,
199,
253,
331,
420,
560,
666,
712,
789,
4215,
4304,
4476,
4603,
4794,
5105,
5193,
5271,
5312,
5396,
5451,
5525,
5586],
'ends_i': [100,
199,
253,
331,
420,
560,
666,
712,
789,
4215,
4304,
4476,
4603,
4794,
5105,
5193,
5271,
5312,
5396,
5451,
5525,
5586,
7094],
'lengths': [100,
99,
54,
78,
89,
140,
106,
46,
77,
3426,
89,
172,
127,
191,
311,
88,
78,
41,
84,
55,
74,
61,
1508],
'cds_start_i': 119,
'cds_end_i': 5711}
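The exon columns in the row above are redundant views of one structure: se_i packs the exon start/end pairs into a single string, while starts_i, ends_i, and lengths hold the same values as arrays. As a minimal sketch, the string can be unpacked and cross-checked as follows. The helper names `parse_se_i` and `exon_lengths` are illustrative and not part of UTA, and the code assumes 0-based, end-exclusive coordinates, which the lengths shown above are consistent with.

```python
def parse_se_i(se_i):
    """Split an se_i string such as '0,100;100,199' into (start, end) tuples."""
    exons = []
    for pair in se_i.split(";"):
        start, end = pair.split(",")
        exons.append((int(start), int(end)))
    return exons


def exon_lengths(exons):
    """Length of each exon, assuming 0-based, end-exclusive coordinates."""
    return [end - start for start, end in exons]


# First three exons of the BRCA1 row shown above (truncated for brevity).
se_i = "0,100;100,199;199,253"
exons = parse_se_i(se_i)
lengths = exon_lengths(exons)

print(exons)    # [(0, 100), (100, 199), (199, 253)]
print(lengths)  # [100, 99, 54]

# CDS length from the same row: cds_end_i - cds_start_i.
cds_start_i, cds_end_i = 119, 5711
cds_len = cds_end_i - cds_start_i
print(cds_len, cds_len % 3)  # 5592 0
```

The computed lengths match the first three entries of the lengths column, and the CDS length is a whole number of codons, as expected for a coding transcript.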
Installing UTA Locally
Installing with Docker (preferred)
Docker enables the distribution of lightweight, isolated packages that run on essentially any platform. When you use this approach, you will end up with a local UTA installation that runs as a local PostgreSQL process. The only requirement is Docker itself - you will not need to install PostgreSQL or any of its dependencies.
-
Define the UTA version to download. A list of available versions can be found here:
$ uta_v=uta_20241220
This variable is used only for consistency in the examples that follow. Defining this variable is not required for any other reason.
The UTA version string indicates the data release date. The tag is made at the time of loading and is used to derive the filename for the database dumps and docker images. Therefore, the public instances, database dumps, and docker images will always contain exactly the same content.
-
Fetch the UTA Docker image from Docker Hub:
$ docker pull biocommons/uta:$uta_v
This process will likely take 1-3 minutes.
-
To prevent bot scraping of the biocommons.org hosting site, you must navigate to the downloads site https://dl.biocommons.org/uta/ in a web browser and click to download the desired database snapshot file (ends in .pgd.gz). For example, for $uta_v=uta_20241220, that would be uta_20241220.pgd.gz.
You can choose where to download it to from your browser or move it after downloading. You will use this path in the next step to pass the file into the UTA docker container, which initializes the database in the $uta_v volume. Once the database is initialized, you can delete the snapshot file.
-
Run the image, using the path where you saved the snapshot, for example /path/to/snapshot/uta_20241220.pgd.gz. The command below expects it to be in the same directory that you run the command from. If you do not already have a docker volume called $uta_v, the docker run command below will create it automatically. If you want to bind the postgres server to a port other than the default 5432, you can change the -p option below to use some other number like 5555 (-p 127.0.0.1:5555:5432).
$ docker run \
    -d \
    -e POSTGRES_PASSWORD=some-password-that-you-make-up \
    -v $uta_v:/var/lib/postgresql/data \
    --mount type=bind,source="$(pwd)/$uta_v.pgd.gz",target="/tmp/$uta_v.pgd.gz",readonly \
    --name $uta_v \
    -p 127.0.0.1:5432:5432 \
    biocommons/uta:$uta_v
The first time you run this image, it will initialize a PostgreSQL database from the snapshot.
On subsequent runs, you can run the container by:
$ docker start $uta_v
The -d option of docker run starts the container in daemon (background) mode. To see progress:
$ docker logs -f $uta_v
You will see messages from several processes running in parallel. Near the end, you'll see:
== You may now connect to uta. No password is required.
...
2020-05-28 22:08:45.654 UTC [1] LOG: database system is ready to accept connections
Hit Ctrl-C to stop watching logs. The container will still be running.
-
Test your installation.
With the test command below, you should see a table dump with at least 4 lines showing schema_version, create date, license, and uta (code) version used to build the instance. If you mapped the container to a port other than 5432, change the -p option below (e.g. -p 5555).
$ psql -h localhost -U anonymous -d uta -p 5432 -c "select * from $uta_v.meta"
      key       |                               value
----------------+--------------------------------------------------------------------
 schema_version | 1.1
 created on     | 2015-08-21T10:53:50.666152
 license        | CC-BY-SA (http://creativecommons.org/licenses/by-sa/4.0/deed.en_US
 uta version    | 0.2.0a2.dev11+n52ed6e969cfc
(4 rows)
-
(Optional) To configure hgvs to use this local installation, consult the hgvs documentation.
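Because the version string drives every name used in the steps above (image tag, snapshot filename, volume, and container name), the whole sequence can be parameterized on it. The following is a sketch only: `uta_artifacts` and `docker_run_args` are hypothetical helper names, not part of UTA, and the code merely assembles the same docker run arguments shown above without invoking Docker. It also assumes that the eight digits in the version string form a YYYYMMDD release date.

```python
import os
from datetime import datetime


def uta_artifacts(uta_v):
    """Derive the names used in the Docker steps from a version such as 'uta_20241220'."""
    # Assumption: the suffix after 'uta_' is a YYYYMMDD release date.
    release_date = datetime.strptime(uta_v.split("_", 1)[1], "%Y%m%d").date()
    return {
        "release_date": release_date,
        "image": f"biocommons/uta:{uta_v}",
        "snapshot": f"{uta_v}.pgd.gz",
        "volume": uta_v,
    }


def docker_run_args(uta_v, snapshot_dir, password, host_port=5432):
    """Build the argument list for the 'docker run' command from the steps above."""
    names = uta_artifacts(uta_v)
    source = os.path.join(snapshot_dir, names["snapshot"])
    mount = f"type=bind,source={source},target=/tmp/{names['snapshot']},readonly"
    return [
        "docker", "run", "-d",
        "-e", f"POSTGRES_PASSWORD={password}",
        "-v", f"{names['volume']}:/var/lib/postgresql/data",
        "--mount", mount,
        "--name", uta_v,
        "-p", f"127.0.0.1:{host_port}:5432",  # host port is configurable
        names["image"],
    ]


args = docker_run_args(
    "uta_20241220",
    "/path/to/snapshot",
    "some-password-that-you-make-up",
    host_port=5555,
)
print(" ".join(args))
```

Passing the list to `subprocess.run(args, check=True)` would start the container; that call is left out here so the example has no side effects.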
Installing from database dumps
Users should prefer the public UTA instance (uta.biocommons.org) or the Docker installation wherever possible. When those options are not available, users may wish to create a local PostgreSQL database from database dumps. Users choosing this method of installation should be experienced with PostgreSQL administration.
The public site and Docker images are built from exactly the same dumps as provided below. Building a database from these should result in a local database that is essentially identical to those options.
Due to the heterogeneity of operating systems and PostgreSQL installations, installing from database dumps is unsupported.
The following commands will likely need modification appropriate for the installation environment.
-
Download an appropriate database dump from dl.biocommons.org.
-
Create a user and database.
You may choose any username and database name you like. uta and uta_admin are likely to ease installation.
$ createuser -U postgres uta_admin
$ createuser -U postgres anonymous
$ createdb -U postgres -O uta_admin uta
-
Restore the database.
$ uta_v=uta_20241220
$ gzip -cdq $uta_v.pgd.gz | psql -U uta_admin -1 -v ON_ERROR_STOP=1 -d uta -Eae
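Because the snapshot is downloaded through a browser, it can be worth confirming that the file is an intact gzip stream before piping it into psql. The sketch below shows one way to do that with the Python standard library. `check_dump` is a hypothetical helper, not part of UTA, and it assumes only what the restore command above implies, namely that the dump is gzip-compressed.

```python
import gzip
import os
import tempfile


def check_dump(path, chunk_size=1 << 20):
    """Decompress the whole file and return the uncompressed size in bytes.

    Raises OSError, EOFError, or zlib.error if the file is not a gzip
    stream, is truncated, or is corrupted.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demonstration on a small stand-in file, not a real UTA dump.
payload = b"-- stand-in for SQL dump contents\n"
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.pgd.gz")
    with gzip.open(demo, "wb") as fh:
        fh.write(payload)
    size = check_dump(demo)

print(size)
```

For a real snapshot, the call would be `check_dump("uta_20241220.pgd.gz")`. Reading in chunks keeps memory use flat regardless of the size of the dump.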
Developer Setup
Virtual Environment
To develop UTA, follow these steps.
-
Set up a virtual environment using your preferred method. For example:
$ python3 -m venv uta-venv
$ source uta-venv/bin/activate
-
Clone UTA and install:
$ git clone git@github.com:biocommons/uta.git
$ cd uta
$ pip install -e .[test]
-
Restore a database or load a new one using the instructions above.
-
To run the tests:
$ python3 -m unittest
Docker
-
Clone UTA and build docker image:
$ git clone git@github.com:biocommons/uta.git
$ cd uta
$ docker build -t uta .
-
Restore a database or load a new one using the instructions [ab
