Gaining access to and processing dbGaP datasets which are unavailable on AnVIL Data Explorer

nuwan · January 6, 2026, 4:52pm

Is there a recommended process for accessing datasets that are not available on AnVIL? For example, let's say I want to process eMERGE PGx data on AnVIL and I have approval for the dataset. Currently, that data is not available on AnVIL, and only dbGaP is listed as a data storage and distribution platform.

Is there a way to request that it be hosted on AnVIL?
Assuming yes, how long would that process typically take?
Assuming no, what is the recommended way to get the data into an AnVIL workspace for processing? This post suggested using a WDL/Dockstore workflow that uses the information obtained via the dbGaP download guide.

Thanks.
/Nuwan


SMO · January 7, 2026, 4:27pm

Hi Nuwan, thank you for writing in with these important questions. Please see below for responses to your questions.

Yes, AnVIL does have a formal dataset onboarding application (https://forms.gle/pecCZmSXS4sgdeHK8). Prospective AnVIL data submitters should complete and submit this form, which will send the application to the AnVIL leadership committee for review.
Decisions on onboarding new datasets are typically made within one month.
If AnVIL does not host a dataset of interest but dbGaP or SRA does, you can use pre-existing workflows that fetch the data and store it in your own Google bucket or Terra workspace. Learn more at https://anvilproject.org/learn/find-data/importing-data-from-dbgap-and-sra. More generally, anything that can be transferred with gcloud storage can be analyzed on AnVIL.
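For anyone who wants to see what the "fetch, then copy into a bucket" pattern looks like when done by hand, here is a minimal sketch. It is not the AnVIL workflows linked above. It assumes the SRA Toolkit (`prefetch`) and the gcloud CLI are installed and authenticated. It also assumes you have downloaded the dbGaP repository key (`.ngc` file) for your approved project. The accession, key filename, local directory and bucket name are hypothetical placeholders. By default the script only prints the commands (dry run) so you can check them before anything is transferred.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(accession: str, ngc_key: Path, download_dir: Path, bucket: str) -> list[list[str]]:
    """Build (but do not run) the fetch and copy commands for one accession."""
    # Step 1 (assumes SRA Toolkit is installed): `prefetch` downloads the run;
    # --ngc points at the dbGaP repository key file for your approved project.
    fetch = ["prefetch", "--ngc", str(ngc_key), "--output-directory", str(download_dir), accession]
    # Step 2 (assumes gcloud is installed and authenticated): copy the whole
    # download directory into a "dbgap/" folder of the workspace bucket.
    copy = ["gcloud", "storage", "cp", "--recursive", str(download_dir), f"gs://{bucket}/dbgap/"]
    return [fetch, copy]


def run_all(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops at the first failing step (e.g. a rejected key).
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own accession, key file and bucket.
    cmds = build_commands(
        accession="SRR0000000",
        ngc_key=Path("prj_00000.ngc"),
        download_dir=Path("dbgap_download"),
        bucket="fc-00000000-0000-0000-0000-000000000000",
    )
    run_all(cmds, dry_run=True)
```

Keeping the fetch and copy as two separate commands means a failed download never leaves a partial copy in the bucket. Re-running with `dry_run=False` executes them in order once the printed commands look right.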
