Workflow "out of memory" issue

yxhan · September 22, 2025, 3:22pm

Dear AnVIL Community,
I’d like to learn your experience with WholeGenomeGermlineSingleSample v3.3.4 workflow. I consistently encounter “out of memory” errors at the MarkDuplicates stage.

From my understanding, the memory_multiplier parameter controls this step. I have experimented with several values:

34, 68, 70, 80 → all returned out-of-memory errors.
100, 250, 300 → returned the error “Invalid value for field ‘resource.properties.machineType’”, which I believe indicates that GCP rejected the request due to excessive resource allocation.

Since I am working with large uBAM files (400 samples, total size is about 30 TB), I am unsure how best to configure these parameters to complete the workflow successfully. I have attached my current inputs.json file for reference.

Please advise on how to properly set the parameters (particularly memory and disk sizing) so that the workflow can run successfully on large inputs. I’d also be happy to provide any additional details that would help in troubleshooting.

I greatly appreciate any insight you can share.

# input.json:
{“WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.read_length”:“${151}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.gatk_docker”:“${}”,“WholeGenomeGermlineSingleSample.BamToGvcf.make_bamout”:“${false}”,“WholeGenomeGermlineSingleSample.fingerprint_genotypes_file”:“gs://dsde-data-na12878-public/NA12878.hg38.reference.fingerprint.vcf”,“WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.memory_multiplier”:“${4}”,“WholeGenomeGermlineSingleSample.references”:“${{“contamination_sites_ud”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.contam.UD”,“contamination_sites_bed”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.contam.bed”,“contamination_sites_mu”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.contam.mu”,“calling_interval_list”:“gs://gcp-public-data–broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list”,“reference_fasta”:{“ref_dict”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.dict”,“ref_fasta”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta”,“ref_fasta_index”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai”,“ref_alt”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt”,“ref_sa”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa”,“ref_amb”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb”,“ref_bwt”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt”,“ref_ann”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann”,“ref_pac”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac”},“known_indels_sites_vcfs”:[“gs://gcp-public-data–broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz”,“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz”],“known_indels_sites_indices”:[“gs://gcp-public-data–broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi”,“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi”],“dbsnp_vcf”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf”,“dbsnp_vcf_index”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx”,“evaluation_interval_list”:“gs://gcp-public-data–broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list”,“haplotype_database_file”:“gs://gcp-public-data–broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt”}}”,“WholeGenomeGermlineSingleSample.sample_and_unmapped_bams”:“${{ “sample_name”: this.sample_name_id, “base_file_name”: this.base_file_name, “flowcell_unmapped_bams”: this.flowcell_unmapped_bams, “final_gvcf_base_name”: this.final_gvcf_base_name, “unmapped_bam_suffix”: “.bam” }}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.SortSampleBam.memory_multiplier”:“${34}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.additional_disk”:“${1000}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.read_name_regex”:“${null}”,“WholeGenomeGermlineSingleSample.cloud_provider”:“gcp”,“WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.additional_disk”:“${1000}”,“WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.additional_disk”:“${1000}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.memory_multiplier”:“${8}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.BaseRecalibrator.gatk_docker”:“${}”,“WholeGenomeGermlineSingleSample.BamToGvcf.HaplotypeCallerGATK4.memory_multiplier”:“${8}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.additional_disk”:“${1000}”,“WholeGenomeGermlineSingleSample.BamToGvcf.make_gvcf”:“${true}”,“WholeGenomeGermlineSingleSample.wgs_coverage_interval_list”:“gs://gcp-public-data–broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list”,“WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.memory_multiplier”:“${20}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.additional_disk”:“${1500}”,“WholeGenomeGermlineSingleSample.papi_settings”:“${{“preemptible_tries”:3,“agg_preemptible_tries”:3}}”,“WholeGenomeGermlineSingleSample.BamToCram.ValidateCram.memory_multiplier”:“${4}”,“WholeGenomeGermlineSingleSample.scatter_settings”:“${{“haplotype_scatter_count”:50,“break_bands_at_multiples_of”:1000000}}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.memory_multiplier”:“${80}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.memory_multiplier”:“${4}”,“WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBqsrReports.gatk_docker”:“${}”,“WholeGenomeGermlineSingleSample.AggregatedBamQC.CheckFingerprintTask.memory_size”:“${1000}”}

avahoffman · September 23, 2025, 5:09pm

Hi @yxhan ,

I’m not sure if this is a Terra issue or perhaps an issue/something going on with the workflow itself. I’d recommend reaching out to Terra support via email (support@terra.bio) or through the AnVIL Menu > Support > Contact Us.

Thanks!
Ava

yxhan · September 23, 2025, 6:13pm

This is their response: “Unfortunately we can’t advise on specific memory values to use, but it may help to know that the amount of memory it’s possible to configure a machine with depends on its CPU count - so you may be able to access higher amounts of memory by increasing your core count.“

I already tried many times and requested memory that exceeded what the GCP/Terra/WARP can provide. Any other teams that I can reach out to?

Thanks!

avahoffman · September 24, 2025, 2:15pm

Hi @yxhan,

This is a tricky problem; it’s not immediately clear to me where the memory_multiplier parameter is being used. Perhaps in this sub-workflow? If that’s correct, then 80*7.5 is already 600 GB. Is it possible to subset the .bam file? Said another way, are there any .bam files that are succeeding?

I saw you opened an issue on GitHub , which should hopefully be closer to getting you a solution!

Thanks!
Ava

Topic		Replies	Views
Terra warp-pipelines out of memory issue Help terra	5	35	July 25, 2025
Increasing computational resources for Galaxy on Terra? Feature Requests galaxy , terra	3	512	February 23, 2021
“Retry with More Memory” feature for Terra workflow Feature Requests terra	3	13	September 17, 2025
Running bwa mem on AnVIL Help	4	592	September 15, 2021
Galaxy on Terra: How to best handle 100s of files Help	13	531	February 18, 2021

Workflow "out of memory" issue

Related topics