AnVIL Demo: WARP Pipelines in AnVIL/Terra on May 18

Upcoming AnVIL Demos

:pencil2: Sign up: https://forms.gle/7CcaLE9AM7FrYqpP7

What are AnVIL Demos?

AnVIL Demos is a monthly virtual meeting where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! Each AnVIL Demo starts with a 30-minute demonstration on the platform, followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu).

After the demo, we’ll open the floor to questions about the demo and any general questions you might have about AnVIL.

May Demo: WARP Pipelines in AnVIL/Terra

When: May 18, 2023 at 11:00 AM ET on Zoom

11:00 AM - 11:30 AM ET – Demo on AnVIL

Our partners from the Broad User Education and Pipeline Development teams will share:

  • An overview of the WDL Analysis Research Pipelines (WARP), a GitHub repository for cloud-based WDL workflows
  • How the WARP team develops, tests, and shares workflows in Dockstore and AnVIL/Terra
  • Where to find public WARP workspaces in AnVIL/Terra so you can try them at low cost

11:30 AM - 12:00 PM ET – Q&A

We’ll open the floor to questions about the demo, and AnVIL and Terra support will be on hand to answer any questions you might have about AnVIL!

:pencil2: Sign up for more demos: https://forms.gle/7CcaLE9AM7FrYqpP7

Upcoming Demos

May 18, 2023 – WARP Pipelines in AnVIL/Terra

June 22, 2023 – How to use data across multiple AnVIL workspaces

July 20, 2023 – Interactive Genomic Data Science with Bioconductor

August 24, 2023 – Galaxy on AnVIL

September 21, 2023 – Epigenetics in AnVIL

Resources

Upcoming Events

Learn about the upcoming AnVIL events at AnVIL Community Events!


Q: Many software projects have made efforts to get more of the community contributing alongside the core maintainers. How much of the community has been converted into contributors for WARP pipelines? Does that community currently meet or hold discussions anywhere, like Slack?

A: The initial drivers for developing WARP pipelines are primarily consortia that come to the pipelines team, and the team builds WDL-based workflows for them. The pipelines haven’t reached a critical mass of community contributors, partly because contributors don’t always have the flexibility to make all the changes they need. This is still a work in progress. There is more critical mass around individual tools and pipelines (e.g., GATK), and those communities have places where they meet. The WARP pipelines team has done as much as they can to make it easy for users to fork the repo so they can customize and contribute to the pipelines. Many people have been very active in contributing to the documentation, which is amazing.

Q: WARP pipelines might be a nice place to bring together the community of WDL developers. At production scale the pipelines seem rock solid, but there may be more developers working at community scale. Reaching out to general WDL developers could be a good way to bring people in, and Nextflow would be a good community to connect with.

A: The idea has come up to support other groups that want to make changes to the production versions, and Nextflow sounds like a good community to engage.

AnVIL is also in active development for deployment on Microsoft Azure, so there are other communities working in different ways that would be great to plug into.

Q: You described both testing of the plumbing and scientific testing of the workflows, primarily as pipeline updates move through the development process. Is the testing environment similar to the various execution environments where users might download and run the workflows, or run them on AnVIL?

A: Yes, the testing infrastructure has developed a lot. The workflows are written in WDL 1.0 syntax and run on Cromwell on a Google Cloud virtual machine. We want to move the testing infrastructure into Terra, and we see this as a first step toward testing on both GCP and Microsoft Azure. The pipelines are also in the process of being tested on Azure.

Q: Do workflows get updated automatically in AnVIL? If there is a new release, is it automatically pulled into a workspace, or do I as a user need to bring it into the workspace myself?

A: The version of the workflow in a workspace stays the same until a user updates it. For the publicly released example workspaces that feature WARP pipelines, there is an internal process where Kaylee makes a clone (a copy) of the workspace to make sure the latest released version runs properly before updating the official workspace.

If you clone a workspace, you can update the workflow yourself once the new version is publicly released in Dockstore. In the public workspaces, if a workflow is not at the latest release, that is because the new version has not been tested yet; changes in the new version might require new inputs or outputs, so it might not run properly.

You can pull in new releases from Dockstore through the Workflows tab in AnVIL, within the specific workflow. A dropdown menu shows all the workflows in the Dockstore organization, and you can scroll down to the relevant workflow; for the demo workspace, that would be Optimus. There is an active feature request to make that dropdown searchable. All of the workflow versions are branches in the repository, and you can make your own branch and do personal testing in the AnVIL environment, as long as it is in Dockstore and public.

Q: Is that only the case for released workflows? Can a user pull in a non-released workflow?

A: There is no “latest” tag, so you would have to intentionally go in and select a branch. You might run into some snags: users can choose among different branches, so you could end up with the wrong branch set up.

The best practice suggestion is to clone a workspace and use the analysis as-is.

Q: Are there any new WARP pipelines about to be announced?

A: The WARP pipelines GitHub repository is open access and public, so one benefit is that anyone can see what is in active development! Exciting pipelines on the way include a methylome pipeline combining methylome and Hi-C data. Another pipeline that is almost ready is Multiome for 10x data, combining ATAC-seq and gene expression analysis, with new features and validations around Cell Ranger. A lot of collaboration goes into these.

Q: Dockstore seems to show only a subset of the pipelines that are available on the documentation site.

A: There may be a few extra steps needed to add those pipelines to the WARP “collection” in the Broad “organization” on Dockstore. We will follow up to make sure they are all displayed.

There are also going to be more workspaces in AnVIL than there are pipelines. Some workspaces with the warp-pipelines tag have been used for workshops, and some consortia have asked for featured workspaces to be developed for their tools.

Have more questions for the WARP team? Email them at warp-pipelines-help@broadinstitute.org.