AnVIL Demo: Open Discussion Forum on April 16, 2025

Join our next AnVIL Demo!

Topic: Open Discussion Forum

April 16, 2025 at 10:00 AM ET on Zoom

10:00 AM - 11:00 AM ET – Open Forum Discussion

In this meeting, we’ll have an open forum to chat about AnVIL, answer any questions you might have, and point you to any resources you might be looking for.

:pencil2: Sign up: https://forms.gle/7CcaLE9AM7FrYqpP7

What are AnVIL Demos?

AnVIL Demos are a monthly, virtual meeting where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! AnVIL Demos will start out with a 30-minute demonstration on the platform followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

:play_or_pause_button: Watch our past Demos from our YouTube playlist!

Resources

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at bit.ly/anvil-mailing-list and learn about upcoming events at https://anvilproject.org/events!

Q: Interested in getting started on AnVIL, including uploading and accessing data. The target is mainly genomic analysis and functional genomics. Has a lot of wet lab sequencing experience, analyzing short repeats. Interested in gaining deeper experience in programming and genomic data science.

A: The 1000 Genomes Project has sequence data for 3,202 genomes, which are very well characterized. Another project, MAGE, collected RNA-Seq data from these samples and can be helpful as well. MAGE has Zenodo and Dropbox links and is available on SRA as well. It is being ingested into AnVIL.


Q: Still at the beginning of analysis and working on bringing more trainees into AnVIL. Comfortable with all the resources. Initially, the accounts could not be linked to cloud computing funding. It took some work with their IT department to fix issues with Google Artifact Registry.

Currently working with Kion to monitor their spending. At the last meeting, we discussed planning the spending and setting up budgets. They calculated the average spending per month and set budget alerts. These alerts assume a linear spending rate, but the project sometimes spends a lot more or a lot less over time. This relieves pressure on reporting to the funding office. Kion has been a great resource for tracking spending and providing spending reports, so they can focus on the science. We work on AnVIL, Terra, and All of Us, and Kion provides a dashboard where you can view spending on each platform and track spending at the user level.
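As a rough illustration of the linear spending assumption described above, here is a minimal Python sketch. It is not Kion's or Terra's API: the function name, the 20% tolerance, and the dollar figures are all made up for the example. It spreads a total budget evenly across months and flags any month where cumulative spending runs ahead of that straight line.

```python
def linear_budget_alerts(total_budget, monthly_spend, n_months, tolerance=0.2):
    """Flag months where cumulative spend exceeds a linear (even) budget.

    All names and defaults here are hypothetical, for illustration only.
    Returns a list of (month, cumulative_spend, expected_spend) tuples.
    """
    per_month = total_budget / n_months  # the "average per month" budget
    alerts = []
    cumulative = 0.0
    for month, spend in enumerate(monthly_spend, start=1):
        cumulative += spend
        expected = per_month * month  # where a linear spender would be by now
        if cumulative > expected * (1 + tolerance):
            alerts.append((month, cumulative, expected))
    return alerts


# Made-up numbers: a $12,000 budget over 12 months, with a spike in month 3.
alerts = linear_budget_alerts(12000, [500, 500, 3000, 500], n_months=12)
for month, spent, expected in alerts:
    print(f"Month {month}: spent ${spent:,.0f} vs ${expected:,.0f} expected")
```

A bursty project like the one described would trip an alert in the spike month even when it is under budget overall, which is why a linear alert is best read as a prompt to check in rather than a sign of overspending.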

A: Another new resource for managing costs in AnVIL can be found here: Tools to Manage Terra Costs - Terra.

Q: Recently applied for an opportunity in the NHGRI Office of Data Science Initiative, looking for more trainees to join.


Q: Want to know the order of operations when using All of Us VariantDataset (VDS) files. With a workflow, it’s often much simpler to use VCF files than VDS. For genome-wide analysis, VCF files have many accompanying tools. In the Hail forums, users post about filtering the intervals first. New data formats bring complications because little information is available on how to use them. It would be helpful to have a demo on working with All of Us VDS files.

There are challenges to using Hail, such as understanding how to localize files and mount the resources. Hail uses Spark, so its jobs are already distributed and don’t necessarily fit well in a WDL. Hail seems to be moving into new features, but the documentation seems to be lagging. There is a tradeoff between waiting for Hail advances to mature and documentation to catch up versus moving forward with existing VCF pipelines and moving them to the cloud.

There are also some issues with accessing large numbers of files (5000+): alerts report that the cloud traffic is too high and the request cannot be completed. The workaround is to copy sections to the environment. Want to use WDL, but hit a bottleneck.
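For readers curious what "filtering the intervals first" might look like, here is a minimal Python sketch using Hail's VDS functions (`hl.vds.read_vds`, `hl.vds.filter_intervals`, `hl.vds.to_dense_mt`). It is an illustration under assumptions, not the documented All of Us workflow: the helper names, the VDS path, and the example region are hypothetical, and GRCh38 is assumed as the reference genome.

```python
def format_region(contig: str, start: int, end: int) -> str:
    """Build an interval string such as 'chr1:100-200' (hypothetical helper)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {contig}:{start}-{end}")
    return f"{contig}:{start}-{end}"


def subset_vds(vds_path: str, regions: list[str], reference_genome: str = "GRCh38"):
    """Read a VDS, keep only the given regions, then densify that subset.

    Sketch only: vds_path and regions are placeholders supplied by the caller.
    """
    import hail as hl  # imported here so format_region works without Hail installed

    vds = hl.vds.read_vds(vds_path)
    intervals = [
        hl.parse_locus_interval(region, reference_genome=reference_genome)
        for region in regions
    ]
    # Filter first, while the data is still in the compact VDS representation.
    vds = hl.vds.filter_intervals(vds, intervals, split_reference_blocks=True)
    # Densify only the small filtered subset into a MatrixTable.
    return hl.vds.to_dense_mt(vds)


# Example region string; the coordinates are arbitrary.
region = format_region("chr1", 100000, 200000)
print(region)
# mt = subset_vds("gs://your-bucket/path/to.vds", [region])  # placeholder path
```

The idea is that densifying is the expensive step, so narrowing to the regions of interest beforehand keeps it small. Whether the resulting MatrixTable should then be exported to VCF for existing pipelines is something to confirm against the Hail and All of Us documentation.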

A: When working in the cloud, things are scattered across locations and technologies. It sounds like you’ve been able to connect with the right communities (All of Us, Hail) that are committing to the VDS file format and the underlying Hail technology.

We would love to hear your feature requests (including a Hail demo) at Feature Requests - AnVIL. This can help us communicate with the developers about what users need.

It would be great to have the Hail team present an AnVIL Demo. We’ve only seen Hail used in Jupyter, not in WDLs. It would be great to hear where Hail is in 2025.