We are seeking access to sequencing data from a set of iPSCs provided by California’s Stem Cell Agency (CIRM), which are available through your database.
Recently, we had our request for data access approved by dbGaP’s DAC. Unfortunately, as we were not aware that the data wasn’t managed by dbGaP, we did not request access to use cloud computing.
How do we proceed from here? We have not used AnVIL before, so I don’t know whether we need to revise our data request in order to access the data.
Otherwise, can we download the data directly without the use of cloud computing?
If you’d like to use data locally, you will still need to egress (download) data from AnVIL. Whether you choose to (1) analyze data on AnVIL or (2) egress, you’ll need to:
Ensure you can log in to AnVIL
Link your eRA Commons ID in your AnVIL user settings [instructions here]
Ensure you can view the data on AnVIL
Please let us know if you are encountering issues on any of the steps above.
Do you know if we can simply see the data on AnVIL without having requested permission to use cloud computing on our dbGaP access request? Also, do you know if we can see the genotypes on AnVIL without doing any analysis – and without the need for cloud computing?
If we do request the use of cloud computing, we need to fill out the following section. Perhaps you could help here as well?
Cloud Use Statement
State the name of the cloud service provider and/or third-party IT system, their security standard, and how they will be used to carry out the work described in your Research Use Statement. Also, if applicable, describe the role of any collaborators. Please limit your statement to 2000 characters.
It is possible to view/browse data on AnVIL without cloud computing costs, as long as you are logged in and have permission to view the data. If the Workspace containing the data already has genotypes quantified, you could view these as well. However, it depends on what has already been done in the Workspace.
We are locating resources on our end that can be used for the Cloud Use Statement.
Can you say anything about the costs of using AnVIL for looking at these genotypes? I know it’s hard to predict, but could you give a general estimate? Also, which platforms/tools would you recommend we use?
For the dbGaP access request, can you describe AnVIL’s security standard very briefly (please see below)?
State the name of the cloud service provider and/or third-party IT system, their security standard, and how they will be used to carry out the work described in your Research Use Statement. Also, if applicable, describe the role of any collaborators. Please limit your statement to 2000 characters.
The costs will generally depend on the number of samples and which tools you are using (e.g., Jupyter notebooks, Galaxy, Workflows). Interactive sessions are generally easier to estimate because the tool shows the cost per hour. For example, the base configuration Jupyter notebook is $0.06 per hour. If you plan to run workflows, we recommend starting with a few samples to get a cost estimate.
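As a rough illustration of the arithmetic, here is a minimal Python sketch. The $0.06 per hour rate is the base Jupyter figure quoted above; the function names, the 40-hour session, and the pilot-run numbers are hypothetical and only there to show the calculation. It assumes workflow cost scales roughly linearly with the number of samples, which may not hold for every workflow, and it does not cover storage or egress charges.

```python
def interactive_cost(rate_per_hour: float, hours: float) -> float:
    """Cost of an interactive session billed at a flat hourly rate."""
    return rate_per_hour * hours


def extrapolate_workflow_cost(pilot_cost: float, pilot_samples: int,
                              total_samples: int) -> float:
    """Scale the cost of a small pilot run up to the full sample set.

    Assumes cost grows linearly with the number of samples.
    """
    if pilot_samples <= 0:
        raise ValueError("pilot_samples must be positive")
    return pilot_cost / pilot_samples * total_samples


# Base Jupyter rate quoted in the thread; 40 hours is a made-up figure.
session = interactive_cost(0.06, 40)
print(f"Interactive session: ${session:.2f}")

# Hypothetical pilot: 5 samples cost $2.50; the full set has 200 samples.
workflow = extrapolate_workflow_cost(2.50, 5, 200)
print(f"Workflow estimate: ${workflow:.2f}")
```

Replace the pilot numbers with the actual cost reported after running a few samples to get an estimate for your own data.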
You brought to our attention that our team should provide a general-use Cloud Use Statement for AnVIL. We’re currently working on this. In the meantime, you can find security information here: Platform and Data Security - AnVIL Portal