AnVIL Demos: AnVIL Data Explorer: Building and Exporting Dataset Cohorts on December 17, 2025

Topic: AnVIL Data Explorer: Building and Exporting Dataset Cohorts

December 17, 2025 at 10:00 AM ET on Zoom

10:00 AM - 10:30 AM ET – Demo on AnVIL

Join us for a demo of the AnVIL Data Explorer, a cohort builder for NHGRI-supported genomic datasets hosted on AnVIL. Learn how to register, link your NIH credentials, find and filter datasets, and request access through dbGaP or the Data Use Oversight System (DUOS). We’ll also walk through exporting selected datasets to AnVIL for analysis.

By the end of the session, you’ll understand how to use the AnVIL Data Explorer to find, access, and analyze genomic data in the AnVIL ecosystem.

10:30 AM - 11:00 AM ET – Q&A

We’ll open up the floor to questions about the demo presented, and will have AnVIL and Terra support on call to answer any questions about AnVIL you might have!

:pencil: Sign up: Register for AnVIL Demos

What are AnVIL Demos?

AnVIL Demos are a monthly, virtual meeting where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org/), a cloud-based computing platform for genomic data science! AnVIL Demos will start out with a 30-minute demonstration on the platform followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

:play_or_pause_button: Watch past AnVIL Demos recordings from our YouTube playlist!

Resources

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at http://bit.ly/anvil-mailing-list and learn about upcoming events at https://anvilproject.org/events!

I think I registered for this last week. Was I supposed to get a link to the Demo? I didn’t see anything in my email or Outlook. Thanks!

Hi @camancuso , apologies you didn’t get it yet! I forwarded the invite to you just now. Please let me know if you didn’t receive it.

Thanks!

Natalie

I got it now, thanks!

Q: Can you create more complicated queries using DSL or other languages over the data in the Data Explorer?

A: Not through the Explorer itself, but we would be very interested to understand which queries you’d want to run, so we can develop them as a use case.

Once the data are exported in a workspace, all the metadata are available for filtering and you can write scripts to filter the data from the exported dataset.
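As a rough illustration of the scripted filtering mentioned above, here is a minimal Python sketch. It assumes the exported metadata can be read as a tab-separated table; the column names (`file_id`, `subject_id`, `data_modality`, `file_format`) and the sample rows are hypothetical placeholders, not the actual AnVIL export schema.

```python
import csv
import io

# Hypothetical stand-in for an exported metadata table.
# Real exports will have different column names and values.
SAMPLE_TSV = (
    "file_id\tsubject_id\tdata_modality\tfile_format\n"
    "f1\ts1\tgenomic\t.cram\n"
    "f2\ts1\ttranscriptomic\t.bam\n"
    "f3\ts2\tgenomic\t.vcf.gz\n"
)


def filter_rows(tsv_text, **criteria):
    """Return the rows whose columns equal every given criterion."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if all(row.get(col) == val for col, val in criteria.items())
    ]


# Keep only the files tagged with the (hypothetical) "genomic" modality.
genomic = filter_rows(SAMPLE_TSV, data_modality="genomic")
print([row["file_id"] for row in genomic])
```

Several criteria can be combined in one call, for example `filter_rows(SAMPLE_TSV, data_modality="genomic", subject_id="s1")`, which is the kind of compound query the Explorer facets alone may not express.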

The Data Explorer search facets developed are based on a Findability Subset which includes public metadata we can expose for search across public and managed access datasets. Not all of these are populated, but this is the starting point for findability of the data of interest.

Q: Can someone export or download the full findability subset?

A: Yes, once you select data you’d like and select export, you can download the file manifest. This can be selected and exported across multiple cohorts in the Explorer.

An API is also available for downloading all metadata values in the findability subset.

There are efforts to harmonize the findability subset metadata. However, users should be aware that there will be some inconsistencies, particularly for data from completed consortia projects, due to the different ontologies and data models used.

Q: For indexing and export, is it fair to say that it’s a file-centric perspective? If an individual donor has 4 files associated with their data, does the export carry this association (e.g., DNA, RNA)?

A: These are linked by having an identical subject ID.
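To make the shared-subject-ID linkage concrete, the sketch below groups file records by subject after export. The field names (`file_id`, `subject_id`, `data_type`) and the records themselves are hypothetical and only illustrate the idea; the actual export columns may differ.

```python
from collections import defaultdict

# Hypothetical file records, as they might appear in an exported manifest.
MANIFEST_ROWS = [
    {"file_id": "f1", "subject_id": "s1", "data_type": "DNA"},
    {"file_id": "f2", "subject_id": "s1", "data_type": "RNA"},
    {"file_id": "f3", "subject_id": "s2", "data_type": "DNA"},
]


def group_by_subject(rows):
    """Map each subject ID to the list of file records that share it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["subject_id"]].append(row)
    return dict(grouped)


files_by_subject = group_by_subject(MANIFEST_ROWS)
for subject_id, files in files_by_subject.items():
    print(subject_id, [f["data_type"] for f in files])
```

Grouping this way recovers the donor-level view (all DNA and RNA files for one subject) from a file-centric export.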

Files are the smallest unit of export. However, a major design criterion is to act as a cohort builder. A user can go through and select characteristics specific to a file, and that file may meet the whole search criteria.

If you select a dataset and select all files associated with it, the export differs from one where you select specific files. When all files from a dataset are selected, your workspace will include the data structures and files as provided by the original submitter.

Q: Regarding file & project versioning, presumably you don’t automatically add new files to the Explorer without versioning?

A: In most cases, NHGRI chose final release snapshots of particular datasets. Some datasets have a v1 in their naming; for subsequent releases these versions get updated. The datasets made available to researchers through the AnVIL Data Explorer are the final versions submitted by the data submitters.

Q: When signing in, there are checkmarks that note that you have a Terra account and that you agree to the terms of service. Is this live?

A: Yes. If the terms of service change, users will have to re-accept the terms of service, so you will not have logged-in access to the Data Explorer if you haven’t accepted. For the NIH account linking, this is required to be renewed monthly.

For this check to work, users need to use the same email to log into Terra as the Data Explorer.

To request access to data, users need to use their institutional email address, not a public Gmail address.

Q: Any features up and coming in 2026 that we can preview?

A: New features are coming. It is a little early to discuss details, but at a high level, there are efforts to support some natural language queries.