AnVIL Office Hours 15DEC2022 @ 11 AM ET

The AnVIL Outreach Working Group is hosting virtual AnVIL Office Hours on Thursday, December 15, 2022 at 11:00 am ET. These Office Hours are an opportunity to get your questions about working on AnVIL answered live, whether you’re trying to set up a billing account, launch Galaxy or RStudio, find methods and featured workspaces, or something else. Members of the AnVIL team will be available to help users, including PIs, analysts, and data submitters, get unstuck, troubleshoot issues, and discover online resources that provide further information.

Please post your questions in this thread ahead of the session!

Register here to receive the meeting link: https://forms.gle/dNnBayBF4DdJZUGw8.

Q: I’m familiar with working in an interactive terminal locally. Is there a way to work interactively in AnVIL? How can I import my large genomic datasets from Google buckets into AnVIL to use in WDL workflows? I will start with small test sets of 1000 Genomes data and need to save the data for use in future runs. Eventually this will scale to a very large number of samples.

A: It would help to learn how to set up data tables in AnVIL, which streamline selecting datasets for workflows. A data table can serve as a reference table whose entries point to gs:// locations in Google buckets. In a workflow input, you can then reference the data table to select the files. The Terra documentation support site has a Getting Started section that covers data tables: https://support.terra.bio/hc/en-us/categories/360005881492-Getting-Started. One example is the 1000 Genomes Data Table, which is already publicly available in AnVIL: Terra.
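As a rough illustration of the "table of reference" idea, here is a minimal Python sketch that builds a tab-separated file of the kind Terra accepts as a data table upload, where the first column header takes the form entity:<table>_id. The table name (sample), the column name (cram_path), and the bucket paths are all hypothetical placeholders, not real AnVIL data.

```python
import csv
import io


def build_sample_table(samples):
    """Return TSV text for a 'sample' data table.

    `samples` maps a sample ID to the gs:// path of its file.
    The column name `cram_path` is a made-up example, not a required name.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # The first column header names the table: entity:<table>_id
    writer.writerow(["entity:sample_id", "cram_path"])
    for sample_id, gs_path in sorted(samples.items()):
        if not gs_path.startswith("gs://"):
            raise ValueError(f"{sample_id}: expected a gs:// path, got {gs_path!r}")
        writer.writerow([sample_id, gs_path])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    example = {
        "sample_A": "gs://my-example-bucket/crams/sample_A.cram",
        "sample_B": "gs://my-example-bucket/crams/sample_B.cram",
    }
    print(build_sample_table(example), end="")
```

With a table like this loaded, a workflow input would typically be set to something like this.cram_path so that each selected row supplies its own file. Note that the table only stores pointers; the files themselves stay in the bucket.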

We suggest working through the tutorials to get comfortable accessing the 1000 Genomes datasets. It will also be helpful to observe how Terra stores the outputs of workflows. AnVIL manages the storage of output data in the workspace Google bucket for you, and the outputs can be found in the Data tab.

Q: I have access to some workspaces in AnVIL, but my AnVIL workspace doesn’t have that data in it. How can I access it?

A: In AnVIL, you do not need to copy the data to your own workspace. As long as you have access to the data, you can reference it from your workspace in the location where it is stored. The Overview and Quickstart Workspaces in the Terra Documentation are a great place to start.

Q: How can I work interactively with a terminal in AnVIL?

A: You can open a terminal in AnVIL, but you cannot create, access, or provision the nodes used for analysis from that terminal. It can be used for data transfers, testing scripts, and running simple commands, but it differs from the HPC paradigm. Much of the development and testing of workflows is best done locally. The default terminal runs in a specific, preconfigured Docker environment with access to a single VM. With workflows, by contrast, you can specify any Docker image, and the workflow system provisions the resources needed to run it.

The terminal has its own persistent disk, so you would need to copy the data from its location to the persistent disk. Some users do use Jupyter notebooks or RStudio to do preliminary investigation/analysis or to examine workflow results.
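For the copy step mentioned above, here is a small Python sketch that assembles a gsutil cp command for moving a file from a bucket onto the persistent disk. It assumes gsutil is available in the terminal environment; the bucket path and destination directory are hypothetical, and the helper name build_copy_command is made up for this example.

```python
import shlex


def build_copy_command(gs_uri, dest_dir, parallel=True):
    """Build a `gsutil cp` command as an argument list.

    `gs_uri` is the source object; `dest_dir` is a directory on the
    persistent disk. Both values used below are placeholders.
    """
    if not gs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gs_uri!r}")
    cmd = ["gsutil"]
    if parallel:
        # -m is gsutil's top-level option for parallel transfers.
        cmd.append("-m")
    cmd += ["cp", gs_uri, dest_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_copy_command(
        "gs://my-example-bucket/crams/sample_A.cram",  # hypothetical source
        "/home/jupyter/data/",                         # hypothetical destination
    )
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Printing the command first, rather than running it immediately, makes it easy to check the source and destination before any large transfer starts.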