dbGaP controlled data access

Hello,
I followed the instructions on this page to access a controlled dbGaP dataset via AnVIL, including getting my data access request approved by dbGaP and linking my eRA Commons account to my Terra account: Requesting Data Access - AnVIL Portal
According to that webpage, dbGaP will list my eRA Commons ID within that cohort’s telemetry file, and the dataset will appear as a workspace within the Terra environment. However, within Terra I get the message: "Your account was linked, but you are not authorized to view these controlled datasets. If you think you should have access, please verify your credentials here."
Please advise. How can I access the dbGaP data for which I have been approved? I indicated in my request that I will access the data through AnVIL.
Thanks.

Hello @Oyomoare,

Thank you for your message!

Do you know how long ago your request was approved by dbGaP? Synchronization typically occurs once a day, so users might not have access the moment they get the e-mail from dbGaP.

Javier

Thanks, Javier!
I received the email indicating dbGaP approval on 11/14/2024.
Do you have a sense of how long it might take to actually get access?
Besides waiting patiently, is there any other step that I need to take to facilitate my accessing the approved dataset via AnVIL?
Thank you.

Best,
Oyomoare

@Oyomoare

I think you should have gotten access by now. Which dataset are you trying to access specifically? Some dbGaP datasets are not yet released through AnVIL, and that could be the main issue here.

Also, you could try unlinking and re-linking the account on Terra. Sometimes that resolves issues with authorization links.

Javier

Sorry! I forgot to say that I followed your advice: I unlinked my Terra account and then relinked it to my NIH eRA Commons account, but I still have no access. I still get the same "you are not authorized" message.

Thanks Javier for your reply!
Here is the dataset that I got dbGaP approval for: phs000669.v1.p1
In my data access request, I indicated that I planned to access the data through AnVIL. My understanding, based on AnVIL’s instructions, was that after receiving dbGaP approval and linking my eRA Commons and Terra accounts, I would be able to access the data via telemetry files in AnVIL. Is this expectation correct? How can we move my access to the approved dataset forward?

Hi @Oyomoare

It turns out that the dbGaP dataset you are trying to access, phs000669.v1.p1, is not available on AnVIL, and it does not appear that anyone has requested to host it. Here is an example comparing a dataset that is available on AnVIL (left) and a dataset that is not (yours; right):

Is there a specific reason you thought this phs000669.v1.p1 dataset was already available on AnVIL?

Javier

Thanks Javier for getting back to me on this issue.
See the information on AnVIL’s website that formed the basis of my expectation. It appears that I misunderstood the instructions. I strictly followed the outlined steps and expected to get access to the approved dataset via telemetry files, not realizing the dataset had to be already hosted in AnVIL. Here is the webpage I am referring to:

So what does this mean? A dbGaP-approved dataset cannot be accessed through AnVIL unless it is already hosted on AnVIL? That is so disappointing. I see now that it is my error, as the list of instructions starts with a search of datasets in AnVIL. I misunderstood and followed all the other steps, but alas, it was all for nothing.
On a practical level, what do you advise? What is required for the dataset to be hosted on AnVIL? Can I download the dataset and upload it into AnVIL? What are my options? I really would like to conduct my analyses within the AnVIL environment.
Thanks for your time and effort in assisting me.

@Oyomoare,

You followed the instructions the right way for getting access to the dbGaP dataset of interest. Unfortunately, not all datasets are available directly on AnVIL.

We can definitely help you get this dataset onto AnVIL. It is my understanding that, yes, you can download the data yourself and then upload it to AnVIL. But please bear with me, as I am working with the AnVIL team and Terra support to provide recommendations on how to do that.

Thank you!

Javier

Thanks so much, Javier.
I am eagerly looking forward to hearing what steps I can take to get the dataset into AnVIL and conduct my analyses.
I really appreciate your assistance.
Best,
Oyomoare

Hi @Oyomoare

After gathering some information with the AnVIL team, we noticed that dbGaP has instructions on this: Downloading dbGaP data with JWT or NGC

One way to do this on AnVIL:

  1. Create Workspace (requires Billing Account and Billing Project)
  2. Upload JWT file
  3. Create Jupyter Cloud Environment
  4. Enter Terminal
  5. Copy JWT file from Workspace Storage to Persistent Disk
  6. Install SRA Toolkit
  7. Run prefetch
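
As a rough illustration of steps 5 and 7, here is a minimal Python sketch that only assembles the commands and does not run them. It rests on several assumptions that are not from the dbGaP instructions: that the Terra cloud environment exposes the workspace bucket as the `WORKSPACE_BUCKET` environment variable, that the persistent disk is mounted under `/home/jupyter`, and that the credential is passed to `prefetch` with `--ngc`. For a JWT file the flag is different, so check `prefetch --help` or the dbGaP page first. The function name, file name, bucket, and run accession below are hypothetical placeholders.

```python
import os
import posixpath
import shlex


def build_download_commands(credential_name, accession, bucket=None,
                            dest_dir="/home/jupyter/dbgap",
                            credential_flag="--ngc"):
    """Assemble (but do not run) the commands for steps 5 and 7.

    credential_name: file name of the credential in Workspace Storage
    accession:       run accession to fetch (placeholder in this sketch)
    bucket:          gs:// URL; falls back to WORKSPACE_BUCKET (assumption)
    credential_flag: prefetch flag for the credential file; "--ngc" suits
                     an NGC file, a JWT file needs a different flag
    """
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise ValueError("no bucket given and WORKSPACE_BUCKET is not set")

    remote = f"{bucket.rstrip('/')}/{credential_name}"
    local = posixpath.join(dest_dir, posixpath.basename(credential_name))

    return [
        # Make sure the target folder on the persistent disk exists.
        ["mkdir", "-p", dest_dir],
        # Step 5: copy the credential from Workspace Storage to the disk.
        ["gsutil", "cp", remote, local],
        # Step 7: download the run with SRA Toolkit's prefetch.
        ["prefetch", credential_flag, local,
         "--output-directory", dest_dir, accession],
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own file, bucket, and accession.
    for cmd in build_download_commands("prj_0000.ngc", "SRR0000000",
                                       bucket="gs://example-bucket"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into the terminal. Each list can also be passed to `subprocess.run(cmd, check=True)` once SRA Toolkit is installed (step 6).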

Please give it a try and let us know if this works. I also opened a Terra Support ticket and will let you know as soon as they provide more options/recommendations.

Your request (323711) has been received and is being reviewed by our support staff.

Some useful resources: