gcloud with startup script on Jupyter

I’m trying to copy some files from a Terra GCP bucket to a computational environment that I spin up, using this command (made-up bucket ID):

gcloud storage cp --recursive "gs://fc-1111-111-1111-111111/module_data/module_1/part2" /home/jupyter

This isn’t the same bucket the workspace is in, but a bucket I have access to. It works well with RStudio using something like

gcloud storage cp --recursive "gs://fc-1111-111-1111-111111/module_data/module_1/part2" /home/rstudio

I’ve noticed some differences between RStudio and Jupyter in permission access, and I have tried things like

cd /home/jupyter
mkdir data
chmod -R ug+rw data
gcloud storage cp --recursive "gs://fc-1111-111-1111-111111/module_data/module_1/part2" /home/jupyter/data
chmod -R ug+rw data

Is there a difference using gcloud on RStudio vs. Jupyter? Is there an example of using a startup script with gcloud storage cp on Jupyter?

Thanks!

Hi @camancuso

I’ve noticed some differences between RStudio and Jupyter in permission access

Could you elaborate on this? Is this affecting you in a specific way?

You should not expect any differences as long as the gcloud version is the same on both images. The only potential source of a difference would be different base images, e.g. Ubuntu 20 vs. Ubuntu 22.
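You can check this from a terminal on each environment, e.g.:

# run on both the RStudio and Jupyter environments and compare the output
gcloud version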

Javier

Not sure how to elaborate on this further. In RStudio, I can use gcloud storage cp in a startup script and it copies data from a given GCP bucket into the computational environment that is built, whereas this does not work in a startup script within a Jupyter environment for me (nothing gets copied over when the computational environment is built). I would imagine, as you said, that it comes down to a difference between the RStudio and Jupyter base images. I can’t find the logs generated while the computational environment is built, so I have no idea why it isn’t copying.

I have one template workspace for a workshop I’ll be running. This workspace contains all the data needed for 8 different modules. Each module has a startup script that lists the extra packages to install and the data to copy, so when a module starts, users can run the module-specific startup script and have everything they need for the few hours they spend on that module. Right now, any module that needs Jupyter just can’t use a startup script, and we have to go into the terminal manually (or call terminal commands from a notebook) to get the data from the template bucket. It isn’t a big problem, but I just can’t figure out why gcloud storage cp doesn’t work on the Jupyter image.
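Roughly, each module’s startup script has this shape (simplified here, and the package names are placeholders). The RStudio version of this works fine:

#!/bin/bash

# Install the extra packages this module needs (placeholder names)
Rscript -e 'install.packages(c("pkgA", "pkgB"))'

# Copy this module's data from the template workspace bucket
gcloud storage cp --recursive "gs://fc-1111-111-1111-111111/module_data/module_1" /home/rstudio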

@camancuso

Thanks for sharing this. Just wanted to let you know that the issue has been identified and a fix is in progress. I’ll post again once it’s been fixed.

Javier

Hi @camancuso

UPDATE: The issue is now fixed. To successfully use gcloud commands in a Jupyter startup script, and subsequently after startup, the commands must be wrapped in the following block:

sudo -u jupyter -g users bash -c "{
    # gcloud commands here
}"

For your particular case, your gcloud command could look like this:

sudo -u jupyter -g users bash -c "{
      gcloud storage cp --recursive "gs://fc-1111-111-1111-111111/module_data/module_1/part2" /home/jupyter/data
}"

Please note that the --billing-project flag may be needed for “Requester Pays” workspace/bucket cases.
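For example (the project ID here is a placeholder):

sudo -u jupyter -g users bash -c "{
    # --billing-project is only needed when the source bucket is Requester Pays
    gcloud storage cp --billing-project=my-terra-billing-project gs://fc-1111-111-1111-111111/file /home/jupyter
}"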

This still didn’t work for me. Nothing was copied over. I didn’t set the --billing-project flag, since this is not a Requester Pays workspace/bucket.

@camancuso

If you haven’t done so already, could you try the minimal pattern above with just one file and let us know if that works for you? For example:

#!/bin/bash

sudo -u jupyter -g users bash -c "{
   gcloud storage cp gs://fc-1111-111-1111-111111/file .
}"

If possible, could you share the script you are having issues with?

This works! It works for recursive copying as well. I think removing the quotes around the gs:// path is what did it. Thanks!
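If the quotes were indeed the culprit, I assume escaping them inside the bash -c block would also work, e.g.:

sudo -u jupyter -g users bash -c "{
    # escaping the inner quotes keeps them inside the outer bash -c string
    gcloud storage cp --recursive \"gs://fc-1111-111-1111-111111/module_data/module_1/part2\" /home/jupyter/data
}"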