AnVIL Demo: Variant Analysis using the VRS Toolkit in AnVIL on March 19, 2025

nakucher · March 12, 2025, 1:57pm

Topic: Variant Analysis using the VRS Toolkit in AnVIL

March 19, 2025 at 11:00 AM EDT on Zoom

11:00 AM - 11:30 AM EDT – Demo on AnVIL

In this demo, Quinn Wai Wong from OHSU will be covering how to maximize VRS tooling on AnVIL. From launching Terra workflows to running Jupyter analyses, learn more about using VRS in your variant analysis!

11:30 AM - 12:00 PM EDT – Q&A

We’ll open up the floor to answer questions about the demo or any general questions about AnVIL you might have!

Sign up: https://forms.gle/7CcaLE9AM7FrYqpP7

What are AnVIL Demos?

AnVIL Demos are a monthly, virtual meeting where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! AnVIL Demos will start out with a 30-minute demonstration on the platform followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

Watch our past Demos from our YouTube playlist!

Sign up for more demos: https://forms.gle/7CcaLE9AM7FrYqpP7

Resources

AnVIL Getting Started Guide. An opinionated guide on how to get set up and start your work on AnVIL. Get started here.
AnVIL YouTube Channel. Learn about AnVIL capabilities and features with our “AnVIL in 2 minutes” videos. Watch AnVIL in 2 minutes here or watch past AnVIL Demos here.
AnVIL Support Forum. Join the conversation at help.anvilproject.org.

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at bit.ly/anvil-mailing-list and learn about upcoming events at https://anvilproject.org/events!

nakucher · March 19, 2025, 3:00pm

Demo Workspace: Terra

Q: Will VCF to VRS variant conversion sometimes result in a difference in chromosome coordinates from the original VCF?

A: Yes. First, the coordinates used by VRS are inter-residue, so the VRS location start value is typically 1 lower than the VCF position value. Second, due to full-length normalization, there will be places where the interval is larger than it is in VCF.
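The first point can be illustrated with a minimal Python sketch. It assumes a 1-based VCF POS and a REF allele string and maps them to an inter-residue (0-based, half-open) interval. The function name is hypothetical and is not part of any VRS tool, and the sketch does not perform full-length normalization, which needs the reference sequence.

```python
def vcf_to_interresidue(pos: int, ref: str) -> tuple[int, int]:
    """Map a 1-based VCF POS and REF allele to an inter-residue (start, end) interval."""
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    start = pos - 1          # inter-residue start sits just before the first REF base
    end = start + len(ref)   # inter-residue end sits just after the last REF base
    return start, end


# A single-base REF at POS 100 covers the interval (99, 100).
print(vcf_to_interresidue(100, "A"))
# A three-base REF at POS 100 covers the interval (99, 102).
print(vcf_to_interresidue(100, "ACG"))
```

After normalization, the interval for an indel in a repetitive region can be wider than this direct mapping suggests.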

Q: After converting a VCF to VRS file, is there a summary of changes or differences?

A: There are no changes made to the VCF positions directly. The annotated VCF includes two new fields, one that describes the VRS Allele IDs, another that specifies the VRS coordinates.

Q: I see, so the output VCF file is the original VCF + new fields?

A: Yes. Specifically, new key-value pairs carrying this information are added to the VCF INFO field (the VRS_ID, VRS_Start, VRS_End, and VRS_State keys).
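As a rough illustration of reading those keys back out, here is a minimal Python sketch that parses a VCF INFO string with the standard library only. The key names are the ones quoted in the answer above, and the exact names written by a given annotator version may differ. The INFO string and its values are made-up placeholders, and the helper names are hypothetical.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into a key -> raw value mapping."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # empty INFO column or stray separator
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "true"  # flag entries carry no value
    return fields


def vrs_annotations(info: str) -> dict[str, list[str]]:
    """Keep only keys starting with 'VRS_', splitting comma-separated values."""
    return {
        key: value.split(",")
        for key, value in parse_info(info).items()
        if key.startswith("VRS_")
    }


# Placeholder INFO column; the identifier is not a real VRS ID.
example = "AC=1;DB;VRS_ID=ga4gh:VA.PLACEHOLDER;VRS_Start=99;VRS_End=100;VRS_State=T"
print(vrs_annotations(example))
```

Values are split on commas because INFO fields commonly hold one entry per allele; whether the VRS keys do so in a particular output file is an assumption to check against that file's header.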

Q: Can you share more on possible annotation sources that are available?

A: By default, VRS makes use of SeqRepo, a software package that allows high-throughput retrieval of sequence content from different sequence assemblies, including GRCh37 and GRCh38, as well as transcript and protein sequence collections. SeqRepo is less an annotation source than an underlying library used to perform full-length normalization and compute digests. Justified representation of variants requires knowledge of the surrounding sequence context.
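For readers curious about the computed digests, the GA4GH truncated digest (sha512t24u) takes the SHA-512 of a byte string, keeps the first 24 bytes, and base64url-encodes them. The sketch below uses only the Python standard library and is not taken from SeqRepo or any VRS tool. A full VRS computed identifier also requires serializing the object in a canonical form first, which is not shown here.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


# Digest of the empty byte string.
print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
# The same input always yields the same 32-character digest.
print(sha512t24u(b"ACGT"))
```

Because the digest depends only on content, two resources that hash the same sequence or variant arrive at the same identifier without coordinating.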

Major community resources that have adopted VRS include gnomAD and MaveDB. VRS was particularly helpful for defining experimental sequences on a consistent basis. There are many private implementations of VRS, too. VRS computed identifiers are well suited to aggregating and indexing.

Q: Are many annotations looking for where the variant has been observed? Or looking for additional annotations of the variant?

A: Computed identifiers make it possible to look consistently across resources for existing evidence or prior observations of a variant. As a data model, VRS is helpful for attaching many different evidence and knowledge statements to variant subjects. ClinVar has a mix of high-level VCV classifications and underlying SCV assertions. You can go further and extract the evidence lines used. These can point to the same variant or to related variants. VRS is used to create documents that describe them.

MetaKB sits on the somatic cancer knowledge side and allows representations that combine evidence across the different contexts in which they are observed.

Q: Are there still challenges with long reads and structural variants that VRS is tackling?

A: Some limitations have been addressed. One barrier to using full-length justification is that in very large repetitive regions you get records with very large alternate alleles, even when the actual difference could be 3 nucleotides. This is a challenge when the VCF and HGVS conventions are shorter and more convenient. The recent standard version update, VRS 2, adds a reference length encoding that points to the region, identifies the reference subsequence that is repeating, and expresses the change to the reference length caused by the variant as an integer.
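The idea behind a reference length encoding can be sketched in a few lines of Python. This is only an illustration of the concept described above, not the VRS 2 schema: the class and field names are hypothetical, and the sketch assumes the region begins with a complete repeat unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceLengthSketch:
    """Hypothetical stand-in for a reference length encoding."""
    length: int                 # length of the resulting allele sequence
    repeat_subunit_length: int  # length of the repeating reference unit


def expand(ref_region: str, expr: ReferenceLengthSketch) -> str:
    """Rebuild the literal allele by cycling through the reference repeat unit."""
    if expr.repeat_subunit_length < 1 or expr.length < 0:
        raise ValueError("subunit length must be >= 1 and length must be >= 0")
    unit = ref_region[: expr.repeat_subunit_length]
    repeats, remainder = divmod(expr.length, len(unit))
    return unit * repeats + unit[:remainder]


# A 30-base CAG repeat region that loses one repeat unit (3 nucleotides).
region = "CAG" * 10
deletion = ReferenceLengthSketch(length=27, repeat_subunit_length=3)
print(expand(region, deletion) == "CAG" * 9)  # True
```

The record stores two integers instead of a 27-base literal allele, and the saving grows with the size of the repeat region.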

VRS 2 also captures adjacencies under the same vocabulary. For SVs where entire regions move, with rotations and various orientations, VRS 2 handles this as well. It is cool to see the SV and small-variant worlds united through a shared vocabulary and shared structures.

A big thing on the horizon is the pangenome: graph and k-mer representations of pangenomes and the variants that exist on these graphs. The VRS group is working with the HPRC community to build compatibility so that the same data objects can represent these variants in a pangenome. They will discuss this at GA4GH Connect in April 2025.

Q: What if a researcher uses VRS Annotator and identifies a cool variant underlying a cool phenotype? If they publish, do they use a VRS ID? How do they communicate it to the wider community?

A: Many communities use their own conventions to describe a variant. VRS is not prescriptive about what to call it, only about how to use the data of a variant. In conjunction with the Categorical Variant Representation Specification (CatVRS), you can create JSON structures that represent variants in any context. Everything from EGFR loss of function to BRAF V600E can be described as a JSON object with an ID, which can be a VRS computed ID or a resource-specific ID.

You can add this JSON document as a supplementary file in a journal submission. In the main text, you just use your ID, which ties it to the representation specification, so you can still follow the norm for your community.
