AnVIL Demo: Setting up a data model for AnVIL data submission on October 24, 2024

11:00 AM - 11:30 AM ET – Demo on AnVIL

In this demo, Robert Carroll discussed key considerations for setting up a data model when planning for AnVIL data submission.

11:30 AM - 12:00 PM ET – Q&A

We’ll open the floor to questions about the demo, and AnVIL and Terra support will be on call to answer any questions you have about AnVIL!

What are AnVIL Demos?

AnVIL Demos are monthly virtual meetings where we highlight what you can do on the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), a cloud-based computing platform for genomic data science! Each AnVIL Demo starts with a 30-minute demonstration on the platform, followed by open time for Q&A and user support.

The demos will highlight a range of topics, from a capability of the platform to a scientific analysis powered by AnVIL. If you’re interested in showcasing how you use AnVIL at a future AnVIL Demos session, reach out to Natalie Kucher (nkucher3@jhu.edu). After the demo, we’ll open up the floor to answer questions about the demo and to answer any general questions you might have about AnVIL.

Watch our past Demos on our YouTube playlist!

Sign up for more demos: https://forms.gle/7CcaLE9AM7FrYqpP7

Resources

Upcoming Events

Sign up to hear about future AnVIL Demos and announcements at lists.anvilproject.org and learn about upcoming events at AnVIL Community Events!

Q&A
Q: How does the approach presented fit with the various data types AnVIL supports, and what is the extent of its applicability?
A: One clear example is the VCF format, which is standardized. There could also be valuable derived data products worth sharing. We have flexibility in how resources are shared, e.g., through links in an AnVIL/TDR data table that point to resources hosted on platforms like GitHub. Technical outputs, such as quality scores used for filtering, might also be worth sharing.

Q: There are multiple approaches to sharing data, such as Workspaces, TDR, and GitHub. What are your thoughts on how to approach this?
A: We could create rows in data tables within AnVIL/TDR pointing to where resources are stored, but controlled-access data must remain within AnVIL. The most valuable contributions are those that provide consistent data. While users can submit whatever data elements they wish, we have a minimal set of data we’d like represented in a particular format. Our goal is not to be restrictive but to encourage a common data representation. Submitters can extend their data model beyond what AnVIL suggests, as needed.

Q: How far should we go with integration? For example, platforms like SRA, NCBI, and OMOP have their own systems. Does AnVIL collaborate with them, or are we working in parallel?
A: We aim to avoid creating new standards when effective solutions already exist. At the same time, we don’t want to impose compliance where it isn’t relevant, such as in rare disease studies.