dbGaP access - sequencing data

AKKB · April 3, 2025, 1:28pm

Hello

We are seeking access to sequencing data from a set of iPSCs provided by the California’s Stem Cell Agency (CIRM), which are available through your database.

Recently, we had our request for data access approved by dbGaP’s DAC. Unfortunately, as we were not aware that the data wasn’t managed by dbGaP, we did not request access to use cloud computing.

How do we proceed from here? We have not used AnVIL before, so I don’t know whether we need to revise our data request in order to access the data.

Otherwise, can we download the data directly without the use of cloud computing?

Thank you in advance.

Best wishes,
Anne Kirstine

avahoffman · April 11, 2025, 3:22pm

Hi @AKKB,

Thanks for your question.

If you’d like to use data locally, you will still need to egress (download) data from AnVIL. Whether you choose to (1) analyze data on AnVIL or (2) egress, you’ll need to:

- Ensure you can log in to AnVIL
- Link your eRA Commons ID in your AnVIL user settings [instructions here]
- Ensure you can view the data on AnVIL

Please let us know if you are encountering issues on any of the steps above.

Ava

AKKB · April 24, 2025, 10:04am

Hi Ava,

Thank you for your quick response.

We are only interested in looking at a limited set of genes to see whether they harbour disease-associated SNPs – we are trying to select, from a set of previously sequenced iPSCs, the lines best suited as a disease model for AMD (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002032.v2.p1).

Do you know if we can simply see the data on AnVIL without having requested permission to use cloud computing on our dbGaP access request? Also, do you know if we can see the genotypes on AnVIL without doing any analysis – and without the need for cloud computing?

In the case of requesting the use of cloud computing, we need to fill out the following. Perhaps you could help here as well?

Cloud Use Statement
State the name of the cloud service provider and/or third-party IT system, their security standard, and how they will be used to carry out the work described in your Research Use Statement. Also, if applicable, describe the role of any collaborators. Please limit your statement to 2000 characters.

Thank you very much in advance.

Best wishes,

Anne Kirstine

avahoffman · April 24, 2025, 1:37pm

Hi @AKKB,

It is possible to view/browse data on AnVIL without cloud computing costs, as long as you are logged in and have permission to view the data. If the Workspace containing the data already has genotypes quantified, you could view these as well. However, it depends on what has already been done in the Workspace.

We are locating resources on our end that can be used for the Cloud Use Statement.

Thanks!
Ava

AKKB · April 24, 2025, 4:34pm

Hi Ava,

Okay, thank you very much!

Can you say anything about the costs of using AnVIL for looking at these genotypes? I know it’s hard to predict, but could you give a general estimate? Also, which platforms/tools would you recommend we use?

For the dbGaP access request, can you briefly specify AnVIL’s security standard (please see below)?

State the name of the cloud service provider and/or third-party IT system, their security standard, and how they will be used to carry out the work described in your Research Use Statement. Also, if applicable, describe the role of any collaborators. Please limit your statement to 2000 characters.

Thanks in advance.

Best wishes,

Anne Kirstine

avahoffman · April 28, 2025, 1:46pm

Hi @AKKB,

Costs depend mainly on the number of samples and on which tools you use (e.g., Jupyter notebooks, Galaxy, Workflows). Interactive sessions are easier to estimate because the tool shows the cost per hour. For example, a Jupyter notebook in the base configuration costs $0.06 per hour. If you plan to run workflows, we recommend starting with a few samples to get a cost estimate.

You might be able to find more precise information in the Terra support docs here: https://support.terra.bio/hc/en-us/sections/360006459511-Managing-Cloud-costs
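As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes the $0.06 per hour base notebook rate mentioned in this thread and, for workflows, assumes that cost scales linearly with the number of samples, which may not hold in practice. The function names and the pilot figures are hypothetical and are not part of AnVIL or Terra.

```python
def estimate_session_cost(hours: float, rate_per_hour: float = 0.06) -> float:
    """Estimate the cost in USD of an interactive session.

    The default rate is the base Jupyter configuration quoted in this
    thread; the rate for your own configuration may differ.
    """
    if hours < 0 or rate_per_hour < 0:
        raise ValueError("hours and rate_per_hour must be non-negative")
    return round(hours * rate_per_hour, 2)


def extrapolate_workflow_cost(pilot_cost: float, pilot_samples: int,
                              total_samples: int) -> float:
    """Scale the cost of a small pilot run up to the full sample set.

    Assumption: cost grows linearly with the number of samples.
    """
    if pilot_samples <= 0:
        raise ValueError("pilot_samples must be positive")
    return round(pilot_cost / pilot_samples * total_samples, 2)


if __name__ == "__main__":
    # Hypothetical figures: 40 hours of notebook time, and a pilot run of
    # 3 samples that cost $1.50, extrapolated to 30 samples.
    print(estimate_session_cost(40))
    print(extrapolate_workflow_cost(1.50, 3, 30))
```

Treat the result as a lower bound: the sketch covers compute time only and leaves out other charges such as storage or data egress.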

You brought to our attention that our team should provide a general use Cloud Use Statement for AnVIL. We’re currently working on this. In the meantime, you can find security information here: Platform and Data Security - AnVIL Portal

Thanks!
Ava

avahoffman · June 5, 2025, 8:34pm

Hi @AKKB,

You previously raised a great point about providing a Cloud Use Statement. We’re pleased to share that it is now available on the AnVIL Portal: Platform and Data Security - AnVIL Portal

Thank you for your suggestion!
Ava

AKKB · July 5, 2025, 5:48pm

Dear Ava,

Thanks for your help.

Our dbGaP access request has now been approved, and we have created a Terra account.

Unfortunately, we get an error page when we try to link the NIH account (see screenshot below).

Do you know how to fix this?

Thanks in advance.

Best wishes,

Anne Kirstine

avahoffman · July 8, 2025, 3:19pm

Hi @AKKB,

A few things that could potentially cause issues:

- Are you using login.gov? You’ll need to make sure your eRA Commons ID is linked to your login.gov account.
- Are you using a Gmail account? As of earlier this year, Gmail accounts can no longer be connected to NIH credentials / protected data.

Please let us know if either of these was the issue.

Thanks!
Ava

AKKB · August 22, 2025, 12:54pm

Hi Ava,

Thanks for your reply.

We created the AnVIL account using a Google account that we set up for the sole purpose of using AnVIL (following the guide on your website). Does that mean we can’t use the Google account to access the protected data?

Is there an alternative?

Best wishes,

AKKB

avahoffman · August 25, 2025, 3:33pm

Hi @AKKB,

This is a fairly new change. Google accounts can be used for public data or personal use, but not protected data requiring dbGaP access. You will need an institutional account for these.

If your institutional account isn’t already a Google account, you can see how to set that up here: https://support.terra.bio/hc/en-us/articles/360029186611-How-to-set-up-a-Terra-account-with-a-non-Google-email

Thanks!
Ava
