AnVIL Demo: Open Discussion Forum on July 16, 2025

Join our next AnVIL Demo!

Topic: Open Discussion Forum

July 16, 2025 at 10:00 AM EST on Zoom

10:00 AM - 11:00 AM EST – Open Forum Discussion

In this meeting, we’ll have an open forum to chat about AnVIL, answer any questions you might have, and point you to any resources you might be looking for.

:pencil2: Sign up: https://forms.gle/7CcaLE9AM7FrYqpP7

What are AnVIL Demos?

AnVIL Demos is a monthly virtual meeting where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! Each session starts with a 30-minute demonstration on the platform, followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

:play_or_pause_button: Watch our past Demos on our YouTube playlist!

Resources

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at bit.ly/anvil-mailing-list and learn about upcoming events at https://anvilproject.org/events!

Q: How can we view which Google Cloud (GCP) Billing Account is associated with a Terra Billing Project (on AnVIL)?

A: Go to the Billing menu in Terra and look under the Terra Billing Project name.

If you can’t view the GCP Billing Account in Terra, you likely don’t have permission to see which GCP Billing Account is linked. If appropriate, you can look up the Terra Billing Project owners (on a separate tab) and contact them for more information.

We recommend that the GCP Billing Account and the Terra Billing Project not share the same name, since identical names make costs harder to track in the GCP Billing menu.

Note that, assuming you have permissions, you can use the pin feature in the GCP Console to follow billing a bit more easily.

Q: Can you change the Billing Account / Project names?

A: In the GCP Console, you can rename Billing Accounts if you have sufficient permissions. You cannot rename a Terra Billing Project through the AnVIL website. Note that the Terra Billing Project name is part of the URL for each Terra Workspace, so changing it could break existing links. It might be better to create a new Terra Billing Project associated with the same GCP Billing Account.

Q: I’m getting an error when trying to chain two pipelines across two workspaces. The second workflow reads BAM files that are created in the first workspace, but when I run the second workflow in the second workspace, I get what looks like a permission error.

A: The permission error seems to appear because you can’t see the log file. However, the logged error is shown in Terra. The team suggests running the workflow on the workflow test data, since the problem could lie with the workflow itself or with GATK. As long as both Terra Workspaces are owned by, or shared with, the same user, there shouldn’t be an access issue across workspaces.

Q: BigQuery has come up a couple of times now on the AnVIL Demos call. Does https://support.terra.bio/hc/en-us/articles/360051229072 indicate that BigQuery can in fact be used through AnVIL to, for example, load large amounts of VCF data and query for subsets?

A: Yes, similar to accessing external GCP buckets, you can set up a Google Billing Project (not the one created by Terra) and sync permissions using your Terra proxy group. If you create a BigQuery table of variants, then you can query and subset. This billing is all separate from Terra, so you would need to track and manage your own costs.
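As a rough illustration of "query and subset", here is a minimal Python sketch using the `google-cloud-bigquery` client. The table layout is an assumption, not something described in this thread: it imagines one row per variant and sample, with hypothetical columns `reference_name`, `start_position`, `end_position`, `sample_id`, and `genotype`. The function names are also made up for illustration; adapt them to however your variants were actually loaded.

```python
import re

# Assumed (hypothetical) table layout: one row per (variant, sample) with
# columns reference_name, start_position, end_position, sample_id, genotype.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_region_query(table: str) -> str:
    """Return parameterized SQL that selects one region for a set of samples."""
    # Table names cannot be passed as query parameters, so validate the
    # name before interpolating it into the SQL text.
    if not _TABLE_RE.match(table):
        raise ValueError(f"expected project.dataset.table, got {table!r}")
    return (
        "SELECT reference_name, start_position, end_position, sample_id, genotype\n"
        f"FROM `{table}`\n"
        "WHERE reference_name = @chrom\n"
        "  AND start_position >= @region_start\n"
        "  AND end_position <= @region_end\n"
        "  AND sample_id IN UNNEST(@samples)"
    )


def run_region_query(project, table, chrom, region_start, region_end, samples):
    """Run the query; charges go to `project`, not to Terra."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("chrom", "STRING", chrom),
            bigquery.ScalarQueryParameter("region_start", "INT64", region_start),
            bigquery.ScalarQueryParameter("region_end", "INT64", region_end),
            bigquery.ArrayQueryParameter("samples", "STRING", list(samples)),
        ]
    )
    job = client.query(build_region_query(table), job_config=job_config)
    return list(job.result())
```

Passing the region and sample list as query parameters keeps the SQL fixed while the subset changes between analyses. The `project` argument should be your own Google project, in line with the note above that this billing is separate from Terra.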

There are very few examples of using BigQuery to store genomic data, which is traditionally saved as .vcf.gz files, although I am aware that gnomAD uses BigQuery to process and store VCF data for its large cohort, and I keep wondering why. I think it would be quite feasible for querying a handful of variants, but I am not fully convinced it is the right tool for storing VCF data and efficiently querying genome-wide loci of interest. Maybe I just need to see more example code that uses BigQuery to query variants.

The alternative is to use a Hail MatrixTable. I am debating which tool gives me more flexibility and greater potential to integrate with other software for downstream analysis.

My analysis situation is the following. I have whole genome sequence data for a cohort of ~1500 samples. For those individuals, I also have gene-burden counts for rare variants at various locations throughout the genome. I may not use all of those samples in a single analysis, so I would need to subset my genomic data by region and by sample quite frequently. I am trying to avoid loading a 200 GB vcf.gz file into my VM each time I need to slice off a subset of samples and variants.

Some cost estimates for using BigQuery to hold genomic data would also be very helpful. Say, for 500 srWGS samples with 5-10 million variants, how large would the data stored in BigQuery be? And in addition to storage cost, how does Google charge for the BigQuery service?
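For comparison, the Hail route for this kind of repeated slicing might look like the sketch below. It assumes the VCF has already been converted once to a MatrixTable on disk (for example with `hl.import_vcf(...).write(...)`), that samples are keyed by the default column field `s`, and that the reference genome is GRCh38. The helper names and any paths are hypothetical.

```python
def region_string(chrom, start, end):
    """Format a closed interval string for hl.parse_locus_interval."""
    if start < 1 or end < start:
        raise ValueError(f"invalid region {chrom}:{start}-{end}")
    # Square brackets make both ends inclusive; Hail's default for an
    # unbracketed string is left-inclusive, right-exclusive.
    return f"[{chrom}:{start}-{end}]"


def subset_matrix_table(mt_path, regions, sample_ids, reference_genome="GRCh38"):
    """Return a MatrixTable restricted to the given regions and samples.

    regions is an iterable of (chrom, start, end) tuples.
    """
    import hail as hl  # pip install hail

    mt = hl.read_matrix_table(mt_path)
    intervals = [
        hl.parse_locus_interval(
            region_string(chrom, start, end), reference_genome=reference_genome
        )
        for chrom, start, end in regions
    ]
    # Filter rows by interval first, so the sample filter runs on less data.
    mt = hl.filter_intervals(mt, intervals)
    keep = hl.literal(set(sample_ids))
    return mt.filter_cols(keep.contains(mt.s))
```

A call such as `subset_matrix_table("gs://my-bucket/cohort.mt", [("chr1", 1000, 2000)], ["S1", "S2"])` (hypothetical bucket and sample IDs) returns a lazily evaluated subset that can then be exported or analyzed further.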
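On the charging model: with BigQuery on-demand pricing, cost generally has two parts, a monthly storage charge based on how much data is stored and a query charge based on how many bytes each query scans. The sketch below only encodes that arithmetic. It deliberately takes the rates as inputs, because the actual prices and the real table size for a given cohort are not known here and should be taken from Google's current pricing page and from your own loaded data.

```python
def bigquery_monthly_cost(stored_gib, scanned_tib_per_month,
                          storage_rate_per_gib_month, query_rate_per_tib):
    """Estimate a monthly BigQuery bill under on-demand pricing.

    All four arguments are caller-supplied assumptions:
      stored_gib                  data kept in BigQuery tables, in GiB
      scanned_tib_per_month       total bytes scanned by queries, in TiB
      storage_rate_per_gib_month  USD per GiB per month (from the pricing page)
      query_rate_per_tib          USD per TiB scanned (from the pricing page)
    """
    values = (stored_gib, scanned_tib_per_month,
              storage_rate_per_gib_month, query_rate_per_tib)
    if any(v < 0 for v in values):
        raise ValueError("sizes and rates must be non-negative")
    storage_usd = stored_gib * storage_rate_per_gib_month
    query_usd = scanned_tib_per_month * query_rate_per_tib
    return {
        "storage_usd": storage_usd,
        "query_usd": query_usd,
        "total_usd": storage_usd + query_usd,
    }
```

Because query cost scales with bytes scanned rather than rows returned, selecting only the columns you need, and partitioning or clustering the table by genomic position, tends to matter more for the bill than the number of variants a query returns.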