AnVIL

Workflow execution status in Terra

We launched a workflow on Terra on Friday over 3202 samples. The workflow has about 8 major phases, each listed as an individual WDL task. Is there a way to summarize the overall status of the submission, i.e. how many workflows have completed step 1, step 2, etc.? I can see this for an individual workflow, but there doesn’t seem to be a way to see it for all of them other than clicking on each of the 3202 workflows individually. There are a few scatter/gather phases, but ideally we could measure how many of the 3202 have completely finished step 1, step 2, etc. Is there a webpage or API to get this workflow status?

Thanks!
Mike

From Terra Support 10:58am EST

Your request (122556) has been received and is being reviewed by our support staff.

From Terra Support 2:14pm EST

Thanks for writing in. Let me take a closer look at our offerings and get back to you.

From Terra Support 10:11am EST

I’m curious to know if the Workflow Dashboard (described in this article) gives you the information you need. This is a relatively new feature designed to give you high-level information on your workflow. If this doesn’t give you the information you desire, can you let us know how you would change the dashboard to be more ideal? The development team is looking to make continuous improvements to this module, so any feedback is greatly appreciated.

If the Workflow Dashboard doesn’t work for you, you can use jq and curl with the endpoints to get the information needed:
Swagger UI

Note that this isn’t a supported method of retrieving the data, so we wouldn’t be able to troubleshoot it.

If neither of the two options quite fit the bill, I’ll be happy to file a feature request for the development team!

The Workflow Dashboard presents a really nice display of the status for a single sample, but it would also be useful to aggregate the status across all workflows launched as part of the same job submission. Specifically, as part of job submission 88178824-e2b7-48f3-900c-570a4813f4bf in the t2t-nhgri/t2t-vc-processing workspace, we launched 3202 workflows on the 3202 individual samples in the data table. We can see very detailed progress on individual samples (e.g. workflow fa8777b5-9221-4ef0-a639-1c300f0518d0), but it would be helpful to monitor the progress of all of them together. It doesn’t seem like the Swagger API calls allow this either, except by individually querying the 3202 workflows that were launched. But maybe I’m missing something? For now I have been monitoring the workflow bucket so I can see when the various bam files and other stats files are written back to the bucket.

Thanks!

Mike

From Terra Support 4:51pm EST

Thanks for getting back to us, and for detailing your needs. At this time, the only way to get the information you’re after may be to script a query that pulls information on all 3202 workflows using the API options mentioned previously. However, I’m more than happy to file a feature request with the team to build something into the platform that gets you the information you desire.
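For readers following this thread, here is a rough sketch of what the aggregation step of such a script might look like. It is not an official or supported Terra tool. It assumes the metadata for each workflow has already been downloaded (for example with curl against the endpoints mentioned earlier) and that each record follows the Cromwell metadata layout: a "calls" mapping from task name to a list of entries carrying "shardIndex", "attempt" and "executionStatus". The function name `summarize_task_status` and the three summary states are illustrative choices, not part of any API.

```python
from collections import Counter, defaultdict


def summarize_task_status(workflows):
    """Count, per task, how many workflows are Done, Failed or InProgress.

    `workflows` is an iterable of metadata dicts (assumed Cromwell-style).
    A workflow that has not started a task has no entry for it and is
    therefore not counted for that task.
    """
    summary = defaultdict(Counter)
    for metadata in workflows:
        for task_name, entries in metadata.get("calls", {}).items():
            # Keep only the latest attempt of each shard, so that a retried
            # (e.g. preempted) attempt does not mask a later success.
            latest = {}
            for entry in entries:
                shard = entry.get("shardIndex", -1)
                if shard not in latest or entry.get("attempt", 1) > latest[shard].get("attempt", 1):
                    latest[shard] = entry
            statuses = {e.get("executionStatus", "Unknown") for e in latest.values()}
            if statuses == {"Done"}:
                state = "Done"
            elif "Failed" in statuses:
                state = "Failed"
            else:
                state = "InProgress"
            summary[task_name][state] += 1
    return {task: dict(counts) for task, counts in summary.items()}
```

Calling `summarize_task_status(list_of_metadata_dicts)` returns a dictionary keyed by task name, so a scattered task counts as "Done" for a workflow only once every shard has finished. Subtracting the counted workflows from the total (3202 here) gives the number that have not yet reached a given task.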

Can you provide us a few more details about the problem you’re ultimately looking to solve? This will be helpful for the team to know when considering how to build the feature.

Our scientific goal is to align 3202 samples to the new human reference genome created from the Telomere-to-Telomere project to characterize variants across the samples compared to variants identified using the standard GRCh38 reference genome. But the immediate need is to be able to report back to the consortium how much progress the pipeline has made, get some estimate of how much longer the pipeline will take to complete, and get some estimate of how much more it will cost to complete. I realize projecting time and costs is very complicated and a long term research project, although I would hope we could get a summary of the running tasks. Does that make sense? Happy to have a call to discuss more.

Mike

From Terra Support 10:19am EDT

Thanks so much for those details. This level of visibility isn’t currently built into the platform, but I’ll be raising a feature request for the development team to consider.

Would you like any assistance from one of our teams in pulling the information you need via API? Please let us know if so - I’ll be happy to set up an email thread to get the process started!

Thank you. I will not need this at this time. We have a workaround of polling the Google buckets directly to check on status, but it would definitely be helpful if you could raise a feature request to make this more accessible and robust.

Best

Mike
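For anyone wanting to try the bucket-polling workaround mentioned above, a minimal sketch of the counting step follows. It is not the script used in this thread. It assumes you already have a list of object names from the workspace bucket (for example from a recursive listing), and that each task's files sit under a path of the form `.../<workflow-id>/call-<task>/...`; that layout, the function name `count_workflows_with_output` and the example suffixes are all assumptions to adapt to your own bucket.

```python
from collections import defaultdict


def count_workflows_with_output(object_names, suffixes):
    """For each suffix, count distinct workflows that have a matching object.

    The workflow ID is taken to be the path segment immediately before the
    first "call-..." segment (an assumed directory layout).
    """
    seen = defaultdict(set)
    for name in object_names:
        parts = name.split("/")
        call_idx = next(
            (i for i, part in enumerate(parts) if part.startswith("call-")), None
        )
        if call_idx is None or call_idx == 0:
            continue  # not inside a task directory
        workflow_id = parts[call_idx - 1]
        for suffix in suffixes:
            if name.endswith(suffix):
                seen[suffix].add(workflow_id)
    return {suffix: len(seen[suffix]) for suffix in suffixes}
```

For example, `count_workflows_with_output(names, [".bam", ".stats.txt"])` reports how many workflows have written at least one file with each suffix, which gives a coarse per-step progress count out of the 3202 samples.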

From Terra Support 9:57am EDT

Will do! Thanks again for writing in and let us know if we can help in any other way.