In an AnVIL workspace, it’s very convenient to be able to pull in Reference Data by specifying a name e.g. “hg38”, which then provides convenient access to files such as “Homo_sapiens_assembly38.fasta”.
It would be great to also provide GENCODE RNA transcript sets, as these are widely used, versioned, and formatted for many downstream analyses.
Hi Michael. Thank you - it is a great idea to provide the widely used GENCODE RNA transcripts as a reference on AnVIL. While we are figuring out the best way forward for adding more reference data to AnVIL, could you clarify whether adding one file (or only a few), e.g. the latest gencode.v49.transcripts.fa.gz, is what you are after, or whether you would like to see all of the GENCODE transcript-associated data available as reference (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/)?
Thank you!
Hi Valeriya,
In long-read RNA-seq workflows, we typically use the FASTA of all human transcripts, i.e. the file you reference above. Being able to specify the version and then point to that file in workflows would be super convenient.
The GTF files are also useful, but there are many versions, so there may be a little more fragmentation there.
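To make the "specify the version, then point to the file" idea concrete, here is a minimal sketch of how a release number could map to the transcripts FASTA. It assumes the directory and file naming seen for release 49 above also holds for other releases, which may not be true for older ones; the function name `gencode_transcripts_url` is hypothetical and is not part of AnVIL or any GENCODE tooling.

```python
# Base URL taken from the release_49 link in this thread.
GENCODE_HUMAN_BASE = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human"


def gencode_transcripts_url(release: int) -> str:
    """Build the URL of the human transcripts FASTA for a GENCODE release.

    Assumption: every release follows the release_49 naming pattern
    (release_<N>/gencode.v<N>.transcripts.fa.gz). Hypothetical helper.
    """
    if release < 1:
        raise ValueError(f"release must be a positive integer, got {release}")
    return (
        f"{GENCODE_HUMAN_BASE}/release_{release}/"
        f"gencode.v{release}.transcripts.fa.gz"
    )


if __name__ == "__main__":
    # Print the URL for the release discussed above.
    print(gencode_transcripts_url(49))
```

A workflow could then take just the release number as an input and resolve the file from it, which is roughly the convenience that specifying “hg38” gives today.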