Workflow "out of memory" issue

Dear AnVIL Community,
I’d like to hear about your experience with the WholeGenomeGermlineSingleSample v3.3.4 workflow. I consistently encounter “out of memory” errors at the MarkDuplicates stage.

From my understanding, the memory_multiplier parameter controls the memory allocated to this step. I have experimented with several values:

  • 34, 68, 70, 80 → all returned out-of-memory errors.

  • 100, 250, 300 → returned the error “Invalid value for field ‘resource.properties.machineType’”, which I believe indicates that GCP rejected the request due to excessive resource allocation.
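That “Invalid … machineType” error is consistent with GCP’s per-vCPU memory limits. As a rough illustration only (the helper below is hypothetical, not Cromwell’s actual code, and the 6.5 GiB-per-vCPU figure is the standard n1 custom-machine limit without extended memory), a very large memory request paired with a small CPU count produces a custom machine type that GCP will reject:

```python
import math

# Sketch (assumption, not Cromwell's real logic): the backend translates a
# task's memory request into a custom machine type string like
# "custom-{vCPUs}-{memory_MiB}". Standard n1 custom machines allow at most
# ~6.5 GiB of memory per vCPU, so a huge memory request with few CPUs
# yields a machineType that GCP rejects as invalid.
N1_MAX_MEM_PER_VCPU_GIB = 6.5  # without extended-memory custom types

def custom_machine_type(vcpus: int, mem_gib: float) -> str:
    """Build a custom machine type string; memory is a multiple of 256 MiB."""
    mem_mib = math.ceil(mem_gib * 1024 / 256) * 256
    return f"custom-{vcpus}-{mem_mib}"

def is_valid_n1_custom(vcpus: int, mem_gib: float) -> bool:
    """Check the standard (non-extended) per-vCPU memory ceiling."""
    return mem_gib <= vcpus * N1_MAX_MEM_PER_VCPU_GIB

# e.g. memory_multiplier = 100 against a ~7.5 GiB base request:
mem = 100 * 7.5                       # 750 GiB
print(custom_machine_type(2, mem))    # custom-2-768000
print(is_valid_n1_custom(2, mem))     # False -> GCP rejects the machineType
```

This is also why the support reply below suggests raising the core count alongside the memory: the validity check scales with vCPUs.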

Since I am working with large uBAM files (400 samples, about 30 TB in total), I am unsure how best to configure these parameters so the workflow completes successfully. I have attached my current inputs.json file for reference.
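For scale, the per-sample arithmetic is worth writing out (assuming the 30 TB is spread roughly evenly across the 400 samples; real uBAM sizes vary, so the largest sample is what matters for sizing):

```python
# Back-of-envelope per-sample sizing from the figures above.
total_tb = 30
n_samples = 400

per_sample_gb = total_tb * 1000 / n_samples
print(f"~{per_sample_gb:.0f} GB of uBAM per sample")  # ~75 GB

# Sorting/dedup steps typically need room for input + output + scratch,
# so a few multiples of the per-sample size is a safer floor for
# additional_disk than the raw input size alone.
print(f"suggested scratch floor: ~{3 * per_sample_gb:.0f} GB")
```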

Please advise on how to properly set the parameters (particularly memory and disk sizing) so that the workflow can run successfully on large inputs. I’d also be happy to provide any additional details that would help in troubleshooting.

I greatly appreciate any insight you can share.

# input.json:
{
  "WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.read_length": "${151}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.gatk_docker": "${}",
  "WholeGenomeGermlineSingleSample.BamToGvcf.make_bamout": "${false}",
  "WholeGenomeGermlineSingleSample.fingerprint_genotypes_file": "gs://dsde-data-na12878-public/NA12878.hg38.reference.fingerprint.vcf",
  "WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.memory_multiplier": "${4}",
  "WholeGenomeGermlineSingleSample.references": "${{"contamination_sites_ud":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.UD","contamination_sites_bed":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.bed","contamination_sites_mu":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.mu","calling_interval_list":"gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list","reference_fasta":{"ref_dict":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict","ref_fasta":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta","ref_fasta_index":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai","ref_alt":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt","ref_sa":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa","ref_amb":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb","ref_bwt":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt","ref_ann":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann","ref_pac":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"},"known_indels_sites_vcfs":["gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz","gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"],"known_indels_sites_indices":["gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi","gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"],"dbsnp_vcf":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf","dbsnp_vcf_index":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx","evaluation_interval_list":"gs://gcp-public-data--broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list","haplotype_database_file":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"}}",
  "WholeGenomeGermlineSingleSample.sample_and_unmapped_bams": "${{ "sample_name": this.sample_name_id, "base_file_name": this.base_file_name, "flowcell_unmapped_bams": this.flowcell_unmapped_bams, "final_gvcf_base_name": this.final_gvcf_base_name, "unmapped_bam_suffix": ".bam" }}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.SortSampleBam.memory_multiplier": "${34}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.additional_disk": "${1000}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.read_name_regex": "${null}",
  "WholeGenomeGermlineSingleSample.cloud_provider": "gcp",
  "WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.additional_disk": "${1000}",
  "WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.additional_disk": "${1000}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.memory_multiplier": "${8}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.BaseRecalibrator.gatk_docker": "${}",
  "WholeGenomeGermlineSingleSample.BamToGvcf.HaplotypeCallerGATK4.memory_multiplier": "${8}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.additional_disk": "${1000}",
  "WholeGenomeGermlineSingleSample.BamToGvcf.make_gvcf": "${true}",
  "WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "gs://gcp-public-data--broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
  "WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.memory_multiplier": "${20}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.additional_disk": "${1500}",
  "WholeGenomeGermlineSingleSample.papi_settings": "${{"preemptible_tries":3,"agg_preemptible_tries":3}}",
  "WholeGenomeGermlineSingleSample.BamToCram.ValidateCram.memory_multiplier": "${4}",
  "WholeGenomeGermlineSingleSample.scatter_settings": "${{"haplotype_scatter_count":50,"break_bands_at_multiples_of":1000000}}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.memory_multiplier": "${80}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.memory_multiplier": "${4}",
  "WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBqsrReports.gatk_docker": "${}",
  "WholeGenomeGermlineSingleSample.AggregatedBamQC.CheckFingerprintTask.memory_size": "${1000}"
}

Hi @yxhan ,

I’m not sure if this is a Terra issue or perhaps an issue/something going on with the workflow itself. I’d recommend reaching out to Terra support via email (support@terra.bio) or through the AnVIL Menu > Support > Contact Us.

Thanks!
Ava

This is their response: “Unfortunately we can’t advise on specific memory values to use, but it may help to know that the amount of memory it’s possible to configure a machine with depends on its CPU count - so you may be able to access higher amounts of memory by increasing your core count.”

I have already tried many values, including memory requests that exceed what GCP/Terra/WARP can provide. Are there any other teams I can reach out to?

Thanks!

Hi @yxhan,

This is a tricky problem; it’s not immediately clear to me where the memory_multiplier parameter is being used. Perhaps in this sub-workflow? If that’s correct, then 80 × 7.5 is already 600 GB. Is it possible to subset the .bam file? Put another way, are there any .bam files that are succeeding?
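The multiplier arithmetic above can be sanity-checked quickly (assuming, as in this reply, a ~7.5 GB base memory for MarkDuplicates; the multiplier scales that base value):

```python
BASE_MEM_GIB = 7.5  # assumed MarkDuplicates base memory, per the reply above

# Memory each tried multiplier would request.
requested_gib = {m: m * BASE_MEM_GIB for m in (34, 80, 100, 300)}
for multiplier, mem in requested_gib.items():
    print(f"memory_multiplier={multiplier:>3} -> {mem:,.0f} GiB requested")
# 80 already requests 600 GiB, and 100+ requests more than a single
# machine offers, matching the "invalid machineType" rejection reported.
```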

I saw you opened an issue on GitHub, which should hopefully be closer to getting you a solution!

Thanks!
Ava