AnVIL

Galaxy on Terra: How to best handle 100s of files

Is there a way to automate importing files into the Galaxy instance on Terra, given a list of directories in the workspace?
I intend to process hundreds of files and I am currently limited to clicking on them one at a time in the Choose remote files section.

If automating the import as files are analyzed isn’t possible, is there at least a better way to import files in bulk?
Galaxy has a feature to batch import URLs from the web, is there something similar when accessing files in Terra workspaces?

On a related note, can I increase the compute resources in Galaxy in order to process a large list of files faster?

from slack:

lack of a ‘select all’ feature is an issue that was identified in the AnVIL dev days meeting and should def be added; for current ways to accomplish batch uploading:
gxfiles://<service>/file/path type URLs work with the ‘paste/fetch’ component of the uploader, and whole directories can be imported with the rule-based uploader.

from slack:

click paste/fetch, and include gxfiles:// formatted URIs, then press start:
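
For many files, the list to paste can be generated rather than typed. A minimal sketch, assuming the gxfiles://<service>/file/path form described above; the service name `my-service` and the example paths are placeholders, so substitute whatever your Galaxy instance shows under Choose remote files:

```python
def build_gxfiles_uris(service, paths):
    """Return one gxfiles:// URI per path, ready to paste one per line."""
    uris = []
    for path in paths:
        # Drop leading slashes so there is exactly one separator after the service name.
        clean = path.lstrip("/")
        uris.append(f"gxfiles://{service}/{clean}")
    return uris


if __name__ == "__main__":
    # Hypothetical service name and file paths, for illustration only.
    example_paths = ["bams/sample1.bam", "/bams/sample2.bam"]
    print("\n".join(build_gxfiles_uris("my-service", example_paths)))
```

Paste the printed lines into the paste/fetch box and press start.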

from slack:

for whole directories, you can go to rule-based and select remote files directory

the trick here is to select the little arrow box to navigate, then left click the name of the directory you want. In my (limited) experiments with this interface, though, it was a bit clunky and didn’t seem to handle subdirectories (e.g., if you click on some/dir/ you won’t get the contents of some/dir/a/ or some/dir/b/).
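
If the rule-based uploader really does not descend into subdirectories, one workaround is to work out which directories actually contain files and import each of them in turn. A rough sketch, assuming you already have a flat list of file paths from the workspace (the paths below are hypothetical):

```python
import posixpath
from collections import defaultdict


def group_by_directory(paths):
    """Map each immediate parent directory to the files it directly contains."""
    groups = defaultdict(list)
    for path in paths:
        # posixpath keeps forward slashes regardless of the local operating system.
        groups[posixpath.dirname(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    listing = ["some/dir/x.bam", "some/dir/a/y.bam", "some/dir/b/z.bam"]
    for directory, files in sorted(group_by_directory(listing).items()):
        print(directory, len(files))
```

Each key is a directory that would need its own pass through the rule-based uploader.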

but, to circle back — doing this with a ‘select all’ checkbox is probably the way to go.
thinking further, 100s of datafiles (unless they are quite small) are probably not yet supported on current Galaxy VM sizes and will likely fill the provisioned volume, but it would be good to have this user experience issue fixed before bigger volumes are supported

Thanks! I’m trying this but getting this error message:

{
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
  "onLine": true,
  "version": "20.09",
  "xhr": {
    "readyState": 4,
    "responseText": "{\"source\":\"leonardo\",\"message\":\"TraceId(1e3b656d242b2310b19d2062c12e26b3/16382466342422451293) | Unable to proxy connection to tool on mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy\",\"statusCode\":500,\"exceptionClass\":\"class org.broadinstitute.dsde.workbench.leonardo.http.service.ProxyException\"}",
    "responseJSON": {
      "source": "leonardo",
      "message": "TraceId(1e3b656d242b2310b19d2062c12e26b3/16382466342422451293) | Unable to proxy connection to tool on mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy",
      "statusCode": 500,
      "exceptionClass": "class org.broadinstitute.dsde.workbench.leonardo.http.service.ProxyException"
    },
    "status": 500,
    "statusText": "error"
  },
  "options": {
    "validate": true,
    "parse": true,
    "wait": true,
    "emulateHTTP": false,
    "emulateJSON": false,
    "textStatus": "error",
    "errorThrown": ""
  },
  "url": "https://notebooks.firecloud.org/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galax/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/api/histories/877d7351f28c2dc0/contents?details=5418c83dc470343f%2C528eabcb6d0deb65&order=hid&v=dev&q=update_time-ge&q=deleted&q=purged&qv=2021-02-17T19%3A36%3A14.000Z&qv=False&qv=False",
  "model": {
    "state": "error",
    "deleted": true,
    "purged": false,
    "name": "GTEX-1JMLX-0326-SM-CM2RP.Aligned.sortedByCoord.out.patched.md.bam",
    "accessible": true,
    "data_type": "galaxy.datatypes.data.Data",
    "file_ext": "data",
    "file_size": 0,
    "meta_files": [],
    "misc_blurb": "tool error",
    "misc_info": "",
    "tags": [],
    "history_id": "877d7351f28c2dc0",
    "history_content_type": "dataset",
    "hid": 133,
    "visible": true,
    "model_class": "HistoryDatasetAssociation",
    "url": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/api/histories/877d7351f28c2dc0/contents/6f27d97c8e729e85",
    "type": "file",
    "type_id": "dataset-6f27d97c8e729e85",
    "extension": "data",
    "create_time": "2021-02-18T00:28:15.847Z",
    "update_time": "2021-02-18T00:35:48.308Z",
    "dataset_id": "6f27d97c8e729e85",
    "id": "6f27d97c8e729e85",
    "hda_ldda": "hda",
    "rerunnable": false,
    "annotation": null,
    "permissions": {
      "manage": [
        "877d7351f28c2dc0"
      ],
      "access": []
    },
    "uuid": "83f7ec86-7e62-49e3-8610-592aaf135c35",
    "creating_job": "9a7ff6b167ebf5b4",
    "display_apps": [],
    "visualizations": [
      {
        "name": "editor",
        "html": "Editor",
        "description": "Manually edit text",
        "logo": null,
        "title": null,
        "target": "galaxy_main",
        "embeddable": false,
        "entry_point": {
          "type": "mako",
          "file": "editor.mako",
          "attr": {}
        },
        "settings": null,
        "groups": null,
        "specs": null,
        "href": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/plugins/visualizations/editor/show"
      }
    ],
    "validated_state": "unknown",
    "resubmitted": false,
    "api_type": "file",
    "file_name": "/galaxy/server/database/objects/8/3/f/dataset_83f7ec86-7e62-49e3-8610-592aaf135c35.dat",
    "genome_build": "?",
    "peek": null,
    "display_types": [],
    "download_url": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/api/histories/877d7351f28c2dc0/contents/6f27d97c8e729e85/display",
    "created_from_basename": null,
    "validated_state_message": null,
    "metadata_dbkey": "?",
    "urls": {
      "purge": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/datasets/6f27d97c8e729e85/purge_async",
      "display": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/datasets/6f27d97c8e729e85/display/?preview=True",
      "edit": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/datasets/edit?dataset_id=6f27d97c8e729e85",
      "download": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/datasets/6f27d97c8e729e85/display?to_ext=data",
      "report_error": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/dataset/errors?id=6f27d97c8e729e85",
      "rerun": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/tool_runner/rerun?id=6f27d97c8e729e85",
      "show_params": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/datasets/6f27d97c8e729e85/show_params",
      "visualization": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/visualization",
      "meta_download": "/proxy/google/v1/apps/mtdna-mitonucl/saturn-k8-e3c4b128-2807-4a9e-8371-f54f4d24216b/galaxy/dataset/get_metadata_file?hda_id=6f27d97c8e729e85&metadata_name="
    }
  },
  "user": {
    "id": "877d7351f28c2dc0",
    "username": "edmundotogo",
    "total_disk_usage": 17787974463,
    "nice_total_disk_usage": "16.6 GB",
    "quota_percent": null,
    "is_admin": true,
    "preferences": {
      "favorites": "{\"tools\": [\"toolshed.g2.bx.psu.edu/repos/lparsons/htseq_count/htseq_count/0.9.1\", \"toolshed.g2.bx.psu.edu/repos/devteam/samtools_idxstats/samtools_idxstats/2.0.3\"]}"
    },
    "tags_used": [
      "name:Skeletalmuscle-AfAm"
    ],
    "purged": false,
    "deleted": false,
    "quota": null
  }
}

From Slack 2:54pm EST

This looks like the transient client error that comes up when the web handler has problems in the back. [we’ll check] to see if there are any relevant logs

From Slack 9:06am EST

Looking at this - this user’s instance is currently RUNNING and the galaxy-web handler seems to be up. Looking through logs to see if there were any transient errors when the file import was tried

From Terra Support 9:17am EST

Your request (114068) has been received and is being reviewed by our support staff.

From Slack 9:21am EST

here is what I see in the galaxy-web pod logs:

  1. file imports happening at ~12-1pm yesterday
  2. ~30min gap in logs at around 1pm
  3. container restarts around 2pm

Wondering if the web handler could have run out of memory or something?

From Terra Support 9:33am

Thanks for writing in. Are you experiencing a specific error in Galaxy when trying to process these files? Any screenshots you can provide would be very helpful!

From Slack 11:04am EST

Was the import so big that it could’ve filled the disk?

From Slack 11:30am EST

Maybe. Not actually sure since I can’t see the data. […] would it be possible for the user to add me to their Terra workspace?

That might have been the case. I tried using the Rule-based method and I must have picked too many at a time. At about 5-7 GB per file, I’m pretty sure I hit the 250 GB limit.

Importing in smaller batches by clicking seems to work fine.
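
The arithmetic behind this is simple: at the upper end of 7 GB per file, a 250 GB volume holds at most 35 files (35 × 7 = 245 GB), and fewer once existing data and tool outputs are counted. A minimal sketch for splitting a file list into batches that stay under a chosen budget; the file names, sizes, and budget below are illustrative assumptions, not values read from Terra or Galaxy:

```python
def split_into_batches(files, budget):
    """Greedily split (name, size) pairs into batches whose total size <= budget.

    Sizes and budget must use the same unit (GB in the example below).
    """
    batches, current, used = [], [], 0
    for name, size in files:
        if size > budget:
            raise ValueError(f"{name} alone exceeds the budget")
        # Close the current batch when the next file would push it over the budget.
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical: 100 files of 7 GB each, with 100 GB reserved per batch
    # to leave headroom on the volume for outputs (an assumed safety margin).
    files = [(f"sample_{i}.bam", 7) for i in range(100)]
    for i, batch in enumerate(split_into_batches(files, budget=100), start=1):
        print(f"batch {i}: {len(batch)} files")
```

Each batch would be imported, processed, and purged before starting the next one, so the volume never has to hold the whole set at once.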
