Q: A user is putting together documentation on how to get onto AnVIL and use a cloud platform, how to pull together various types of data, and lessons learned. There is interest in sharing these protocols with other AnVIL users who might want to access their data in the same way. Is there support for this?
A: The AnVIL team is very supportive of sharing user-generated documentation! It would help the community understand the kinds of work analysts do on AnVIL and how they do it.
Q: One challenge was pulling together custom Docker images, since different users wanted access to different tools and possibly to put them into a WDL. There were some issues loading an image, depending on where it was hosted. If the image was hosted in a user's own Docker repository, the request would time out during retrieval. It would be helpful to have more and better documentation on hosting Docker images, retrieving images from Docker, and where they should be stored.
A: The team is aware that custom Docker images can take a long time to retrieve, and that if retrieval takes too long, Terra assumes something is wrong and stops the pull. The VM has to be spun up with the image, so this is likely more of an AnVIL Terra limitation than a Docker image repository problem; the time limit exists so that broken images do not get stuck spinning up indefinitely. One reason it takes so long is that a custom image has to be built from scratch, while existing images are cached. The Terra team is experimenting with solutions. One option might be to let users set their own limit on how long they are willing to wait, though a broken image would still be a problem. Another is to pare down the base image further so that it spins up faster. The team that manages the platform can work more closely with the user, using this scenario to work toward a solution.
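As a rough illustration of the "set your own time limit" idea, the sketch below times out a local `docker pull` after a chosen number of seconds. It is a local pre-check only and not how Terra itself retrieves images; the function name `pull_with_timeout`, the injectable `runner` parameter, and the image name in the usage comment are hypothetical, and it assumes the Docker CLI is installed where the check is run.

```python
import subprocess


def pull_with_timeout(image, timeout_s, runner=subprocess.run):
    """Return True if `docker pull IMAGE` succeeds within timeout_s seconds.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a Docker daemon (an assumption of
    this sketch, not part of any Terra or AnVIL API).
    """
    try:
        result = runner(
            ["docker", "pull", image],
            timeout=timeout_s,      # give up after this many seconds
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # The pull took longer than the caller was willing to wait.
        return False
    # A non-zero exit code means the pull failed (e.g. a missing image).
    return result.returncode == 0


# Example use (hypothetical image name):
#   ok = pull_with_timeout("quay.io/example-org/example-tool:1.0", 600)
```

A pull that is slow locally is a hint that the image may also be slow to retrieve elsewhere, and is a prompt to look at trimming the image.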
Some notes on Docker image repositories: Docker Hub has rate limits that differ by account type (see "Download rate limit" in the Docker Documentation). Docker also put out a policy that it will no longer host images that have not been used in 6 months (see "Docker Hub Image Retention Policy Delayed, Subscription Updates" from Docker). There are some improvements if the image is hosted in the Google Container Registry (GCR). Another option is to host your own images on Quay.io, which offers free accounts for all images and container scanning. Repositories can offer different levels of features. Terra and Dockstore support Google Container Registry and Quay.io for workflow submission, but not for interactive environments.
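To see which registry an image reference points at (and so whether it may be subject to Docker Hub rate limits), a small helper like the following can be used. This is a minimal sketch: it follows the usual Docker convention that the first path component is a registry host only if it contains a "." or ":" or is "localhost", and otherwise defaults to Docker Hub. The function names and all image names are hypothetical.

```python
def registry_host(image_ref):
    """Return the registry host implied by a Docker image reference."""
    first, sep, _rest = image_ref.partition("/")
    # Docker treats the first component as a registry host only when it
    # looks like a hostname; otherwise the image lives on Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_gcr_or_quay(image_ref):
    """True if the image is hosted on GCR (including regional hosts) or Quay.io."""
    host = registry_host(image_ref)
    return host == "quay.io" or host == "gcr.io" or host.endswith(".gcr.io")


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    for ref in [
        "ubuntu:20.04",
        "exampleuser/example-tool:latest",
        "us.gcr.io/example-project/example-tool:1.0",
        "quay.io/example-org/example-tool:1.0",
    ]:
        print(ref, "->", registry_host(ref), is_gcr_or_quay(ref))
```

Running a check like this over the images named in a WDL is one quick way to spot which ones still come from Docker Hub.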
Q: The aim is to have an image with permissions for certain pip installations, possibly a tiered system where users have different permissions to install tools. It would be helpful to understand the build and update process for custom Docker images, especially as new custom images are built and as the base images are periodically updated by Terra.
A: A Docker image used for a WDL has more permissions than one used in an interactive environment. In the terminal or in interactive environments, installations that require root privileges are not permitted. Instead, you can build the installations into your Docker image or put the install commands in a startup script. The startup script can be a solution in the meantime: ask the group to use the default image together with a startup script containing very specific installations, so that everyone has the necessary packages. Startup script documentation: https://support.terra.bio/hc/en-us/articles/360058193872-Pre-configure-a-Cloud-Environment-with-a-startup-script
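One way to keep a group on "very specific installations" is to generate the startup script from a single pinned package list, so everyone installs the same versions. The sketch below is only an illustration: the function name `render_startup_script`, the package names, and the version numbers are hypothetical, and it assumes the startup script is a bash script in which `pip` is available and allowed to install the packages (see the startup script documentation linked above for how the script is actually supplied and run).

```python
import shlex


def render_startup_script(packages):
    """Build the text of a bash startup script that pip-installs pinned packages.

    `packages` maps package name -> exact version string.
    """
    lines = [
        "#!/usr/bin/env bash",
        "set -e  # stop at the first failed install",
        "",
    ]
    # Sort so the generated script is identical for every user.
    for name, version in sorted(packages.items()):
        requirement = f"{name}=={version}"
        lines.append("pip install " + shlex.quote(requirement))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative pins only; replace with the group's agreed versions.
    print(render_startup_script({"pysam": "0.22.0", "pandas": "2.1.4"}))
```

The generated text can then be saved as a script file and shared with the group, so the default image plus this one script gives everyone the same packages.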