Public dissemination of derived data on AnVIL?

melissacline · September 9, 2021, 8:01pm

Hi folks,

I have data in one of my workspaces that I’d like to make publicly accessible, so that users outside AnVIL can retrieve it through an API. The data is summary-level variant interpretation evidence, to be integrated into ClinGen. I could download the data and set up an API endpoint elsewhere, but it would be nice to do this within AnVIL. How are other people approaching the dissemination of their derived data? So far I’ve looked into the following options, each with its pros and cons:

Share the data as a featured Terra workspace. With the right settings, the workspace data can be accessed through an API by users outside Terra.

Pro: this can be set up quickly.
Pro: it can in principle be done today, without requiring any platform changes.
Con: there’s no way to avoid the egress charges. In my case they should be modest because the data is so small, but covering them requires a long-term commitment. I don’t want to use requester pays, because it doesn’t make sense for my use case (for reasons I won’t go into here for brevity).
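As a rough illustration of what API access by users outside Terra can look like at the storage layer for the featured-workspace option: if the objects in the workspace’s Google Cloud Storage bucket are readable by allUsers (an assumption; Terra workspace buckets are private unless configured otherwise), an outside client can fetch them over plain HTTPS with no Google credentials. This is a minimal sketch, and the bucket and object names are hypothetical placeholders.

```python
from urllib.parse import quote
from urllib.request import urlopen


def public_object_url(bucket: str, object_name: str) -> str:
    """XML API URL for a publicly readable GCS object (no auth needed)."""
    # Keep "/" unescaped so nested object paths stay readable in the URL.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


def json_api_media_url(bucket: str, object_name: str) -> str:
    """JSON API download URL; here the object name must be fully escaped."""
    return (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(object_name, safe='')}?alt=media"
    )


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch an object anonymously; only works if the object is public."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    print(public_object_url("fc-example-bucket", "evidence/summary v1.tsv"))
    print(json_api_media_url("fc-example-bucket", "evidence/summary v1.tsv"))
```

Every anonymous download like this is billed to the bucket owner as egress, which is exactly the cost concern raised above.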

Publish the data on Gen3

Con: these data don’t seem to fit the Gen3 schema: there’s no sequencing information, and while there is relevant phenotype information, there’s no patient information. It seems like this just breaks the Gen3 model.

Other???

Thanks in advance!
Melissa

cutsort · September 9, 2021, 10:25pm

From Terra Support 6:24pm EDT

Your request (186282) has been received and is being reviewed by our support staff.

cutsort · September 10, 2021, 1:56pm

From Terra Support 9:48am EDT

Thanks for writing in about this. I’ll discuss options with my team and get back to you as soon as I can.

cutsort · September 10, 2021, 9:58pm

From Terra Support 4:41pm EDT

For our understanding, do you have a deadline by which you need to get your data set up somewhere?

I see that you don’t want to use requester pays, but you’re also concerned about a long-term commitment to covering charges. Would it make sense to host the data on Terra in a normal bucket for a period of time, and then request that the bucket be changed to a requester pays bucket later on?

melissacline · September 12, 2021, 5:53pm

Thanks!

Ideally, I’d like to get the data hosted somewhere, with public API access, about a month from now.

It’s an interesting idea to cover the egress charges for a while and then switch to requester pays sometime in the future. That would keep the data accessible without committing to covering the storage charges indefinitely. Let me think about that.

In the meantime, if there are other options within the platform, I’d still like to explore them, with or without the Terra bucket for the time being.

Thanks!
Melissa

cutsort · September 14, 2021, 5:17pm

From Terra Support 1:07pm EDT

I think hosting the data in a workspace bucket will be the best option given your timeline and needs. You can enable requester pays on the bucket at a later time if you want to stop paying egress/network/retrieval charges. Note, however, that you would still have to cover storage costs! You can read more about how requester pays works here: Requester Pays | Cloud Storage | Google Cloud
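Switching to requester pays also changes how outside users call the API: anonymous downloads no longer work, and each request must be authenticated and name a Google Cloud project to bill, which the JSON API takes as the userProject query parameter. Below is a minimal sketch of how a consumer’s request changes, assuming a hypothetical bucket and a hypothetical billing project "my-billing-project"; obtaining the OAuth 2.0 access token is out of scope here and is represented by a placeholder string.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def media_url(bucket: str, object_name: str, user_project: Optional[str] = None) -> str:
    """Build a JSON API download URL, optionally billing the requester's project."""
    params = {"alt": "media"}
    if user_project is not None:
        # Requester Pays buckets require the caller to name the project to bill.
        params["userProject"] = user_project
    return (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(object_name, safe='')}?{urlencode(params)}"
    )


def authorized_request(url: str, access_token: str) -> Request:
    """Attach a bearer token; Requester Pays requests must be authenticated."""
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


if __name__ == "__main__":
    # Hypothetical names; "ya29.PLACEHOLDER" stands in for a real access token.
    url = media_url("fc-example-bucket", "evidence/summary.tsv", "my-billing-project")
    req = authorized_request(url, "ya29.PLACEHOLDER")
    print(req.full_url)
```

The userProject parameter is added only when a billing project is supplied, so the same helper covers the bucket both before and after the switch. Storage costs remain with the bucket owner either way.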
