AnVIL Demo: Open Discussion Forum on February 19, 2025

Join our next AnVIL Demo!

Topic: Open Discussion Forum

February 19, 2025 at 10:00 AM EST (your time zone) on Zoom

11:00 AM - 12:00 PM EST – Open Forum Discussion

In this meeting, we’ll have an open forum to chat about AnVIL, answer any questions you might have, and point you to any resources you might be looking for.

:pencil2: Sign up: https://forms.gle/7CcaLE9AM7FrYqpP7

What are AnVIL Demos?

AnVIL Demos are monthly virtual meetings where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! AnVIL Demos start with a 30-minute demonstration on the platform, followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

Watch our past Demos on our YouTube playlist!

:pencil2: Sign up for more demos: https://forms.gle/7CcaLE9AM7FrYqpP7

Resources

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at bit.ly/anvil-mailing-list and learn about upcoming events at https://anvilproject.org/events!

Q: AnVIL Team members are clarifying how to determine whether certain controlled access datasets registered with dbGaP are available on AnVIL, because some dbGaP data access is not showing up in Terra workspaces. Some guidance may be confusing: it can suggest that dbGaP-registered datasets are already deposited and available in AnVIL, when researchers may still be depositing some of those data to AnVIL or the data may still be under consortium-only access.

A: The AnVIL Data Explorer (Datasets - AnVIL Data Explorer) is currently the definitive place to find which dbGaP-registered controlled access data are in AnVIL. Researchers can search by phs ID in the filter.

Q: A researcher is planning to use GTEx data on AnVIL, has applied for a data library card, and has submitted a request to access the dataset. They have submitted a grant proposal and are currently setting up a Google billing account, which is required to use data storage and computing resources on AnVIL. They are looking for guidance on how to set this up.

A: One of the biggest hurdles users face is getting used to paying for cloud computing. Here are a few options for setting up billing, in order of our recommendation:

  • Your institution may have a relationship with Google Cloud Platform, so it is worth reaching out to your institution's research IT to find out whether this is available. Some institutions manage cloud billing accounts centrally, while others do not.
  • Another option is to set up an account with a credit card. This is only recommended for getting an account set up for exploring the platform and running minimal compute. Google provides $300 of free credits for new users, for which you must provide a credit card, so this is a possible temporary option.
  • The NIH STRIDES program (STRIDES Initiative | Data Science at NIH) aims to make cloud computing easier to pay for with existing grants. If your budget from NIH includes funds for cloud computing, you can reach out to STRIDES to establish an account which will leverage some discounts STRIDES has negotiated.
  • Cloud credit reseller companies such as Burwood and Carahsoft are also options to work with to create accounts and pay for cloud use with grant funds.

Q: A researcher would like to download some summary statistics and generated graphs from the AnVIL platform. What are the best practices to download these?

A: If your files are already in the Terra workspace, you can find them under Files in the Data tab and download them to your computer through the UI. To bulk download directories, you can use gsutil commands. You can also transfer the files to another bucket outside of Terra.
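As a rough illustration of the bulk download option, here is a minimal Python sketch that builds a recursive `gsutil cp` command and only runs it when asked to. The bucket path and local directory are hypothetical placeholders, not real AnVIL locations, and the helper names are made up for this example; it also assumes `gsutil` is installed and authenticated wherever the command is executed.

```python
import shlex
import subprocess


def build_bulk_download_command(bucket_path: str, local_dir: str) -> list[str]:
    """Build a gsutil command that recursively copies a bucket directory."""
    if not bucket_path.startswith("gs://"):
        raise ValueError("bucket_path must start with gs://")
    # -m runs transfers in parallel; cp -r copies the directory recursively.
    return ["gsutil", "-m", "cp", "-r", bucket_path, local_dir]


def download(bucket_path: str, local_dir: str, dry_run: bool = True) -> str:
    """Return the command as a string; execute it only when dry_run is False."""
    command = build_bulk_download_command(bucket_path, local_dir)
    if not dry_run:
        subprocess.run(command, check=True)
    return shlex.join(command)


if __name__ == "__main__":
    # Hypothetical bucket path, for illustration only.
    print(download("gs://example-workspace-bucket/results/plots", "./plots"))
```

The same `gsutil cp -r` form should also work for a bucket-to-bucket transfer if the destination is given as another `gs://` path instead of a local directory.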

Q: Are there any restrictions on downloading data from AnVIL? For example, All of Us has restrictions on downloading from its platform.

A: AnVIL doesn’t place any restrictions on transferring data to private storage. The researcher with access to the data is required to comply with the Data Use Agreement signed when acquiring access to controlled access datasets, and to follow NIH security best practices for storing controlled access datasets.

A: For Jupyter notebooks and RStudio, another option is to sync the notebooks with GitHub via a terminal in AnVIL.

Q: A researcher shared their experience with AnVIL, noting that they most often use notebooks in their research. With AnVIL, you have the option to develop scripts from notebooks into workflows. The researcher is planning to modify the input file names, then run notebooks and collect the output files with scripting.

A: Researchers on AnVIL may want to run analyses with workflows rather than notebooks for several reasons. With notebooks, the default environment often uses a single VM, but with WDL workflows in AnVIL you can more readily scale out across more VMs and run the analysis on more samples.
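For the notebook-scripting approach described in the question (varying input file names, running a notebook, and collecting outputs), one possible sketch is shown below. It assumes the third-party `papermill` command-line tool, which was not mentioned in the discussion and would need to be installed separately. It also assumes a hypothetical template notebook that reads a parameter named `input_file`. The code only plans the commands; it does not execute them.

```python
from pathlib import PurePosixPath


def plan_notebook_runs(
    template: str, input_files: list[str], output_dir: str
) -> list[list[str]]:
    """Return one papermill command per input file, without running anything."""
    commands = []
    for input_file in input_files:
        # Name each executed notebook after its input file (without extension).
        sample = PurePosixPath(input_file).stem
        output_notebook = str(PurePosixPath(output_dir) / f"{sample}.ipynb")
        # `papermill IN OUT -p NAME VALUE` injects NAME=VALUE into the notebook.
        commands.append(
            ["papermill", template, output_notebook, "-p", "input_file", input_file]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in plan_notebook_runs(
        "analysis.ipynb", ["data/sample_a.tsv", "data/sample_b.tsv"], "runs"
    ):
        print(" ".join(command))
```

Each planned command could then be passed to `subprocess.run`. Once the per-sample steps are stable, the same steps are natural candidates for a workflow task when the analysis needs to scale beyond a single VM.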

Q: The researcher is very familiar with scripting, and using workflows to scale up makes a lot of sense as a way to leverage the platform. The individual code chunks don’t have to change very much. You can experiment with small amounts of code and see that cloud computing can sometimes be even cheaper than local computing. There’s a bit of a transition to cloud computing. Slurm on HPC isn’t too complicated, and workflows are similar: the main bottleneck is figuring out the syntax, especially if you’re already familiar with scripting.

Q: A researcher will be working with a high schooler who is learning about bioinformatics. They will work to get the $300 free credits and do some training.

A: This is very exciting work! As a note, the Terra Terms of Service require users to be 18 years of age or older, or 16 years of age with authorization from a Non-Profit Entity: Terra.