Q&A
–
Q: Is the seqr dashboard visible in AnVIL?
A: The seqr dashboard is visible on the seqr site. Seqr’s main public instance is available through AnVIL. To use the main instance of seqr with a user’s own data in AnVIL, users will need to log in to AnVIL, select their data in an AnVIL workspace, then open seqr. The seqr site (https://seqr.broadinstitute.org/) includes a demo on public data that users can walk through to learn how to use the platform.
Q: If any of the reference databases used by seqr are updated, will that be automatically reflected in seqr on AnVIL?
A: Yes. Seqr automatically updates weekly, every Sunday evening. The main instance of seqr runs on AnVIL. The platform pulls updates from the many related databases and sources that are part of analysis, including OMIM, ClinVar, GenCC, and PanelApp.
Q: During the demo, you showed selecting a .vcf file and selecting the link to analyze in seqr. Is the main activity in AnVIL to select datasets to move to seqr?
A: A user can either go directly to seqr.broadinstitute.org or go through the AnVIL workspace. The main seqr instance is within AnVIL, so seqr is under all the security boundaries and requirements of AnVIL. Seqr login is managed by AnVIL.
When data are initially uploaded to AnVIL, they are stored in the AnVIL workspace bucket. When a user opens seqr with .vcfs selected, seqr copies the data from AnVIL. Once .vcfs are loaded into seqr, they no longer need to be stored in AnVIL. However, .crams would need to remain in AnVIL if a user would like to visualize the variants in a genome sequence with IGV.
Launching seqr kicks off the data loading process. Users then kick off the analysis in seqr.
–
Q: Can people use scripts to run those workflows in seqr? Or is it designed for case-by-case scenarios, relying on the UI to control the analysis process and choose parameter values?
A: When used through the UI, seqr is designed for people who don’t have computational expertise. Users who want to customize seqr or develop additional features can run their own local installation from GitHub.
–
Q: Are your development teams going to incorporate AlphaGenome regulatory variant effect prediction?
A: Seqr has some scores for regulatory variant effect prediction. When new in silico prediction scores become available, the team reviews these tools. There is a postdoc working on bringing in AlphaGenome. AlphaGenome currently has some raw scores, and more robust scores are expected to be published soon, which the seqr team will incorporate.
–
Q: Is it possible to run a single genome without a family history on seqr, looking for rare variants? Or does it require a pedigree?
A: Yes, it is possible. You can load data from a program that has any number of participants. However, it’s difficult to identify de novo variants without family data, and you wouldn’t know whether variants are in cis or in trans. A pedigree also helps to create the manifest. Seqr requires at least one HPO term per individual. You need a joint call manifest, identifier, and family IDs, but you don’t need extra family members for these cases.
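The requirements in the answer above (an identifier, a family ID, and at least one HPO term per individual) can be checked before loading. The following is only a minimal sketch: the column names `family_id`, `individual_id`, and `hpo_terms`, the semicolon separator, and the sample rows are all assumptions for illustration, not seqr’s actual manifest template.

```python
import csv
import io
import re

# Hypothetical column names; the real seqr manifest template may differ.
REQUIRED = ("family_id", "individual_id", "hpo_terms")
# HPO identifiers have the form "HP:" followed by seven digits.
HPO_PATTERN = re.compile(r"^HP:\d{7}$")


def check_manifest(rows):
    """Return a list of human-readable problems found in manifest rows."""
    problems = []
    for n, row in enumerate(rows, start=1):
        # Every individual needs a family ID, an identifier, and HPO terms.
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"row {n}: missing {col}")
        # Assumed convention: multiple HPO terms are separated by ";".
        raw_terms = (row.get("hpo_terms") or "").split(";")
        for term in (t.strip() for t in raw_terms):
            if term and not HPO_PATTERN.match(term):
                problems.append(f"row {n}: malformed HPO term {term!r}")
    return problems


# Made-up example: two singletons, each in their own family.
example = io.StringIO(
    "family_id\tindividual_id\thpo_terms\n"
    "FAM1\tIND1\tHP:0001250\n"  # one well-formed HPO term: accepted
    "FAM2\tIND2\t\n"            # no HPO term: flagged
)
issues = check_manifest(csv.DictReader(example, delimiter="\t"))
print(issues)
```

A singleton passes as long as it has its own family ID; no additional family members are needed for the check to succeed.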
Q: If the individual doesn’t want to share their data in a seqr analysis, is it possible to restrict this?
A: We highly recommend sharing data, but understand that not all patients are consented for sharing their data. Seqr requires some high-level data to be shared, including, but not limited to, allele counts and high-level phenotype details. Identifiable information is never shared.
–
Q: Can seqr use long-read sequencing data?
A: This is in progress but not yet available; the team is looking to release it this year. It will be a great advantage to review short-read and long-read data side by side.
–
Q: Does seqr support analysis of variant types other than SNPs and indels, such as SVs?
A: Seqr has used some pipelines, such as GATK-SV, but the team doesn’t currently have a stable SV pipeline in seqr. The team is investigating other SV callers, like DRAGEN.
Q: Does seqr use hg38? Can we use other builds, like T2T?
A: Seqr currently supports GRCh37 (hg19) and GRCh38 (hg38). Bringing in new references is not currently on the roadmap, but it is a possibility for the future.
–
Q: Can you please clarify what happens to potentially newly discovered variants of unknown significance in the pipeline?
A: All the analysis is run by the user. If the user identifies a VUS, the burden is on the researcher to document and report this in publications and variant resources. The team is piloting automated reanalysis that runs when the literature or downstream resources (e.g., ClinVar) are updated for variants of unknown significance that a user has previously seen.
–
Q: Does a user need to set up seqr to run out of Docker in an AnVIL VM? To use seqr in AnVIL, how much does a user need to pay for storage of resources and data pulled in by the seqr analysis? Is the information being pulled on the fly for outside evidence from the related resources?
A: There is a centralized instance of seqr in AnVIL available for all users. All storage costs for .vcfs, annotations, and analysis are covered by the seqr program, not the user.
If a user deploys their own instance of seqr, then that user will be required to pay for everything involved in the deployment.
Q: So all the information served in the GUI is kept by the seqr team?
A: Yes. If a user continues to keep their .vcfs, .crams, and other data in the AnVIL workspace, then there will be storage costs incurred. But there is no seqr analysis cost the user is responsible for.
Q: How can I return to my results in seqr? How do I download the result that I have without having to return to seqr to share it with my collaborators?
A: Once logged in, a user can access all of their analysis in seqr at any time. The analysis is all run through the seqr site. Access to the case is controlled by the person who loads data to seqr, through the workspace. Any collaborators who have been given access to the workspace will have access to the cases, the notes, etc. in seqr.
Q: People don’t always want to navigate another platform or deal with login to see a single result. Is there a different way to share the results with collaborators, like an analyst and a clinician?
A: Results can be downloaded as an Excel or .tsv file. If someone just wants annotations of variants, tools in ClinVar and OMIM can annotate your data and return it to you. Seqr is a place to see data and annotations and make human decisions on whether variants are causal for a patient’s phenotype. That judgement requires analysis and review by a human. Seqr is a tool to capture decisions and make judgements on a variant. Other tools can be used to add annotations or output a result.
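For sharing a downloaded .tsv with a collaborator who does not want to log in, one option is to trim the export to a few columns first. This is a rough sketch only: the column names in `KEEP` and the sample rows are hypothetical placeholders, and a real seqr download will have its own header.

```python
import csv
import io

# Hypothetical export columns; a real seqr download will have its own header.
KEEP = ["gene", "variant", "classification", "notes"]


def slim_export(tsv_text, keep=KEEP):
    """Return a tab-separated copy of the export with only the chosen columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Columns missing from the export are written as empty strings.
        writer.writerow({k: row.get(k, "") for k in keep})
    return out.getvalue()


# Made-up rows standing in for a downloaded file.
sample = (
    "gene\tvariant\tinternal_id\tclassification\tnotes\n"
    "GENE_A\tchr1:100 A>G\tx1\tVUS\treview with clinician\n"
    "GENE_B\tchr2:200 C>T\tx2\tlikely pathogenic\t\n"
)
slim = slim_export(sample)
print(slim)
```

In practice the text would come from a file (for example, `open(path, newline="").read()`), and the trimmed result could be written back out for the clinician.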
Q: We are looking for a way to consolidate the information in a single place (our lab uses other tools and systems to capture notes). We could put an AI over the systems to pull out information. We are looking for tools that will give the most functionality and usage to justify transitioning to another system.
–
Q: Is there a way to focus output on all variants for a specific gene of interest, or genomic region of interest?
A: Yes! You can specify one gene or an entire list, specific variants, genomic regions, etc. This video details that process: https://www.youtube.com/watch?v=tbvhn3quTqg.
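To illustrate the idea of restricting output to a gene list or a genomic region, here is a small sketch of that kind of filter. It is not seqr’s implementation; the `Variant` type, the `select` helper, and the sample variants are hypothetical and exist only to show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    gene: str


def in_region(variant, chrom, start, end):
    """True if the variant lies within the inclusive interval chrom:start-end."""
    return variant.chrom == chrom and start <= variant.pos <= end


def select(variants, genes=(), regions=()):
    """Keep variants that hit any listed gene or fall in any listed region."""
    gene_set = set(genes)
    return [
        v for v in variants
        if v.gene in gene_set or any(in_region(v, *r) for r in regions)
    ]


# Made-up variants for illustration.
variants = [
    Variant("chr1", 1_000, "GENE_A"),
    Variant("chr1", 50_000, "GENE_B"),
    Variant("chr2", 1_500, "GENE_C"),
]

by_gene = select(variants, genes=["GENE_B"])
by_region = select(variants, regions=[("chr1", 500, 2_000)])
print([v.gene for v in by_gene], [v.gene for v in by_region])
```

Combining a gene list and regions in one call returns the union of both selections.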
–
Q: There is a GREGoR Training Dataset. Is there a hands-on demo that can be accessed publicly?
A: These data are not consented for open use, just for training. It’s fairly light-weight to apply for access for training use.