I have data in one of my workspaces that I’d like to make publicly accessible, so that users outside of AnVIL can reach it through an API. The data is summary-level variant interpretation evidence, to be integrated into ClinGen. I could download the data and set up an API endpoint elsewhere, but it would be nice to do this within AnVIL. How are other people approaching the dissemination of their derived data? I’ve looked into the following options so far, and each has its pros and cons:
Share the data on Terra as a featured Terra workspace. With the right settings, this workspace can be accessible by API to users outside Terra.
Pro: this can be set up quickly.
Pro: it can in principle be done today, without requiring any platform changes.
Con: there’s no way around the egress charges. In my case they should be modest because the data is so small, but covering them requires a long-term commitment. I don’t want to use requester pays, because it doesn’t make sense for my use case (for reasons I won’t go into here for brevity).
Publish the data on Gen3.
Con: these data don’t seem to fit the Gen3 schema: there’s no sequencing information, and while there is relevant phenotype information, there’s no patient information. It seems like this just breaks the Gen3 model.
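To put the egress concern from the first option in rough numbers, here is a minimal sketch of the arithmetic. The dataset size, download count, and per-GiB price below are placeholder assumptions, not figures from this thread or from any price list; substitute the current rate for your bucket's location and destination.

```python
def estimate_monthly_egress_usd(dataset_gib, downloads_per_month, price_per_gib_usd):
    """Rough monthly egress cost, assuming every download pulls the full dataset.

    All three inputs are supplied by the caller; no real price is hard-coded here.
    """
    if dataset_gib < 0 or downloads_per_month < 0 or price_per_gib_usd < 0:
        raise ValueError("inputs must be non-negative")
    return dataset_gib * downloads_per_month * price_per_gib_usd


# Hypothetical example values: a 0.5 GiB dataset fetched 200 times a month
# at an assumed placeholder rate of 0.12 USD per GiB.
monthly = estimate_monthly_egress_usd(0.5, 200, 0.12)
print(f"Estimated egress: {monthly:.2f} USD/month")
```

Storage is billed separately and is not included in this estimate.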
For our understanding, do you have a deadline by which you need to get your data set up somewhere?
I see that you also mentioned you don’t want to use requester pays, but are also concerned about long-term commitments covering charges. Would it make sense for you to host the data on Terra in a normal bucket for a period of time, then request that the bucket be changed to a requester pays bucket later down the line?
Ideally, I’d like to get the data hosted somewhere, with public API access, about a month from now.
It’s an interesting idea to cover the egress charges for a while, and then switch to requester pays at some point in the future. That would be an option to keep the data accessible without committing to covering the storage charges indefinitely. Let me think about that.
In the meantime, if there are other options within the platform, I’d still like to explore them, with or without the Terra bucket for the time being.
I think hosting the data in a workspace bucket will be the best option given your timeline and needs. You can definitely enable requester pays on the bucket at a later time if you want to stop paying the egress, network, and retrieval charges yourself. Note, however, that you would still have to cover storage costs! You can read more about how requester pays works here: Requester Pays | Cloud Storage | Google Cloud
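For outside users, the practical difference after a switch to requester pays is that each request has to name a project to bill. Below is a minimal sketch of how a download URL for the Cloud Storage JSON API could be built in both cases. The bucket, object, and project names are hypothetical, and the sketch assumes the caller is allowed to read the object at all; requests to a requester pays bucket also need to be authenticated, which is not shown here.

```python
from urllib.parse import quote, urlencode


def object_download_url(bucket, object_name, billing_project=None):
    """Build a Cloud Storage JSON API media-download URL for one object.

    billing_project is only needed when the bucket has requester pays enabled;
    it is sent as the userProject query parameter.
    """
    # Object names are percent-encoded as a single path segment, so "/" becomes %2F.
    path = (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    )
    params = {"alt": "media"}
    if billing_project:
        params["userProject"] = billing_project
    return f"{path}?{urlencode(params)}"


# Hypothetical names, for illustration only.
print(object_download_url("example-workspace-bucket", "evidence/summary.json"))
print(object_download_url("example-workspace-bucket", "evidence/summary.json",
                          billing_project="example-billing-project"))
```

Existing client scripts that only use the first form would need to add the billing project once requester pays is turned on.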