Q: There have been efforts in many software projects for more of the community to contribute in addition to the core maintainers. How much has the community been able to be converted to contributors for WARP pipelines? Does that community currently meet or discuss anywhere like in Slack?
A: The initial entry point for development of WARP pipelines is primarily consortia that come to the pipelines team, which then builds WDL-based workflows for them. The team hasn’t reached a critical mass of community contributors, because community members don’t always have the flexibility to make all the changes they need. This concept is still in progress. There is more critical mass around the individual pipelines (e.g., GATK), and that community has places where it meets. The WARP pipelines team has put as much as they can in place for users to fork the repo in order to customize and contribute to the pipelines. Many folks have been very active in contributing to the documentation, which is amazing.
Q: WARP pipelines might be a nice place to bring together the community of WDL developers. For production scale, the pipelines seem rock solid, but there may be more developers working at the community scale. Aiming for general WDL developers could be a good way to bring people in, and Nextflow would be a good community to connect with.
A: The idea has come up to support other groups that want to make changes to the production versions, and Nextflow sounds like a good community to engage.
AnVIL is in active development to be deployed on Microsoft Azure as well, so there are other communities that are working in different ways that would be great to plug into.
Q: You described testing of the plumbing and scientific testing of the workflows, primarily as pipeline updates move through the development process. Is the testing environment similar to the various execution environments where users might download and run the workflows or may run them on AnVIL?
A: Yes, the testing infrastructure has developed a lot. The workflows are written in WDL 1.0 syntax and run on Cromwell in a Google Cloud virtual machine. We want to move the testing infrastructure into Terra, as a first step toward opening up testing on both GCP and Microsoft Azure. The pipelines are also in the process of being tested in Azure.
Q: Do workflows get updated automatically in AnVIL? If there is a new release, is that automatically pulled into a workspace, or would I as a user need to bring it into a workspace myself?
A: The version of the workflow in the workspace stays the same until a user updates the version. In the publicly released example workspaces that feature WARP pipelines, there is an internal process where Kaylee makes a clone (a copy) of the workspace to make sure the latest released version is running properly before updating the official workspace.
If you make a clone, you can update it yourself once the new workflow version has been publicly released in Dockstore. With the public workspaces, if the workflow version is not already at the latest release, that is because it has not been tested yet; it is possible that modifications in the new version require new inputs or outputs and wouldn’t run properly.
From the AnVIL Workflows tab, within the specific workflow, Dockstore will pull in new releases for you. There is a dropdown menu where you can see all the workflows in the Dockstore organization, and you can scroll down to the relevant workflow, which for the demo workspace is Optimus. There is an active feature request to make that dropdown searchable. All of the workflow versions are branches in the repository, and you can make a branch and do personal testing in the AnVIL environment, as long as it is in Dockstore and public.
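The compatibility concern raised in this answer (a new workflow version may need inputs that the workspace does not yet supply) can be sketched as a simple set comparison. This is a minimal illustration only: the function `compare_inputs`, the `Example.*` input names, and the bucket paths are all hypothetical and are not part of AnVIL, Dockstore, or WARP.

```python
def compare_inputs(configured, required_by_new_version):
    """Compare a workspace's configured inputs with a new version's inputs.

    Returns (missing, unused): input names the new version needs but the
    workspace does not supply, and names the workspace supplies that the
    new version no longer uses. Both are sorted lists.
    """
    configured_names = set(configured)
    required_names = set(required_by_new_version)
    missing = sorted(required_names - configured_names)
    unused = sorted(configured_names - required_names)
    return missing, unused


# Hypothetical inputs currently configured in a cloned workspace.
current_config = {
    "Example.r1_fastq": "gs://example-bucket/r1.fastq.gz",
    "Example.r2_fastq": "gs://example-bucket/r2.fastq.gz",
    "Example.whitelist": "gs://example-bucket/whitelist.txt",
}

# Hypothetical inputs declared by the newer workflow version.
new_version_inputs = ["Example.r1_fastq", "Example.r2_fastq", "Example.chemistry"]

missing, unused = compare_inputs(current_config, new_version_inputs)
print("Missing inputs:", missing)
print("Unused inputs:", unused)
```

A non-empty `missing` list is the kind of mismatch that testing a clone before updating the official workspace is meant to catch.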
Q: Is that the case for only released workflows? Can a user pull in a non-released workflow?
A: There is no “latest” tag, so you would have to intentionally go in and grab a branch. There might be some snags. Users have the option to choose different branches, so you might end up with the wrong branch set up.
The best practice suggestion is to clone a workspace and use the analysis as-is.
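Because there is no “latest” tag, choosing a version is a deliberate step. As a rough sketch of what that selection could look like when done programmatically, the snippet below picks the newest release from a list of version names while ignoring development branches. The naming pattern `<Pipeline>_v<major>.<minor>.<patch>`, the function `latest_release`, and the version numbers shown are assumptions made for illustration, not a documented Dockstore or AnVIL interface.

```python
import re

# Assumed naming convention for release versions: <Pipeline>_v<major>.<minor>.<patch>
VERSION_PATTERN = re.compile(
    r"^(?P<name>\w+)_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def latest_release(version_names, pipeline):
    """Return the highest-numbered release name for `pipeline`, or None.

    Names that do not match the assumed pattern (for example development
    branches) or that belong to another pipeline are skipped.
    """
    releases = []
    for name in version_names:
        match = VERSION_PATTERN.match(name)
        if match and match.group("name") == pipeline:
            # Compare versions numerically so that 5.10.0 sorts above 5.9.1.
            key = (
                int(match.group("major")),
                int(match.group("minor")),
                int(match.group("patch")),
            )
            releases.append((key, name))
    if not releases:
        return None
    return max(releases)[1]


# Made-up version list mixing branches and releases.
versions = [
    "develop",
    "Optimus_v5.9.1",
    "Optimus_v5.10.0",
    "feature/new-input",
    "Other_v9.0.0",
]
chosen = latest_release(versions, "Optimus")
print(chosen)
```

Comparing the version parts as integers avoids the string-ordering trap in which `5.9.1` would sort above `5.10.0`.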
Q: Are there any new WARP pipelines about to be announced?
A: The WARP pipelines GitHub repository is open access and public, so the benefit of that is that anyone can see what is in active development! Exciting pipelines on the way include a methylome pipeline, combining methylome and Hi-C data. Another pipeline that is almost ready is multiome for 10x, combining ATAC-seq and expression analysis, with new features and validations around cell range. A lot of collaborations go into these.
Q: Dockstore only shows a subset of the pipelines that seem to be available on the documentation site.
A: There may be a few extra steps to add the pipelines to the WARP “collection” in the Broad “organization” on Dockstore. We will follow up on making sure those are all displayed.
There are also going to be more workspaces in AnVIL than there are pipelines. There are workspaces using the warp-pipelines tag that have been used for workshops, and some consortia have asked to develop a featured workspace for their tools.
Have more questions for the WARP team? Email them at warp-pipelines-help@broadinstitute.org.