Request: RNA transcript sequence as additional AnVIL reference data

Michael_Love · March 12, 2026, 3:21pm

In an AnVIL workspace, it’s very convenient to be able to pull in Reference Data by specifying a name, e.g. “hg38”, which then gives easy access to files such as “Homo_sapiens_assembly38.fasta”.

It would be great to also provide GENCODE RNA transcript sets, as these are widely used, versioned, and formatted for many downstream analyses.

Valeriya_Gaysinskaya · May 15, 2026, 5:05pm

Hi Michael. Thank you, it is a great idea to provide the widely used GENCODE RNA transcripts as a reference on AnVIL. While we figure out the best mechanism for adding additional reference data to AnVIL, could you clarify whether adding one file (or only a few), e.g. the latest gencode.v49.transcripts.fa.gz, is what you are after, or whether you would like to see all of the GENCODE transcript-associated data available as reference (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/)?

Thank you!

Michael_Love · June 1, 2026, 7:13pm

Hi Valeriya,

In long-read RNA-seq workflows, we typically use the FASTA of all human transcripts, i.e. the file you reference above. Being able to specify the version and then point to it in workflows would be super convenient.

The GTF files are also useful, but there are many versions, so there may be a little more fragmentation there.
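
To illustrate what "specify the version and point to it" could look like, here is a minimal sketch in Python. It assumes the release directory and file naming seen in the release 49 link above (release_49/gencode.v49.transcripts.fa.gz) also hold for other releases, which may not be true for older ones. The function name gencode_transcripts_url is hypothetical and not part of AnVIL or GENCODE.

```python
# Base URL taken from the GENCODE human release link quoted in this thread.
GENCODE_BASE = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human"


def gencode_transcripts_url(release: int) -> str:
    """Build the URL of the human transcript FASTA for one GENCODE release.

    Assumption: every release follows the release_<N>/gencode.v<N>.transcripts.fa.gz
    pattern seen for release 49; this is not checked against the server.
    """
    if release < 1:
        raise ValueError("release must be a positive integer")
    return f"{GENCODE_BASE}/release_{release}/gencode.v{release}.transcripts.fa.gz"


if __name__ == "__main__":
    # Pin the version once, then pass the resulting URL to a workflow input.
    print(gencode_transcripts_url(49))
```

Pinning the release number in one place is the point here: a workflow input would then change only when the number changes.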
