How to get big files into Compute Engine

I’ve been working with some large models recently and, as a Docker beginner, shoved them all into my Docker image. This worked… sort of… until docker push started trying to upload 20GB of data. Google Cloud doesn’t seem to support service keys for docker auth (even though they claim to! not that I’m bitter), so I kept getting authorization errors. Time to figure out docker volumes.

First, I needed to create an additional disk. I essentially followed the directions in the docs. In the Cloud Console, edit your Compute Engine instance, select “Add new disk” under “Additional Disks”, and fill in the size you want. The defaults are probably fine, although the disk type defaults to SSD, so you can select Standard if you don’t care about speed.

Save the instance and start it up. Hit the “SSH” button once it’s booted. Then, find your new disk:

$ sudo lsblk
sdb         8:16   0   20G  0 disk
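If the VM has several disks attached, the raw lsblk output gets noisy. Here is a minimal sketch of a filter that keeps only whole disks. The name `pick_disks` is something I made up for illustration, and it assumes you feed it the three columns produced by `lsblk -dn -o NAME,SIZE,TYPE`.

```shell
#!/bin/sh
# pick_disks (hypothetical helper): read "NAME SIZE TYPE" rows on stdin and
# print the device path and size of every row whose type is "disk".
pick_disks() {
  awk '$3 == "disk" { print "/dev/" $1, $2 }'
}

# On the VM you would feed it real data like this:
#   lsblk -dn -o NAME,SIZE,TYPE | pick_disks
```

The boot disk shows up in the list too; the new disk is the one with the size you picked in the console.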

Then format the disk:

$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
$ sudo mkdir -p /mnt/disks/vqgan_models
$ sudo mount -o discard,defaults /dev/sdb /mnt/disks/vqgan_models

I then ran a quick test to make sure it’s actually a writable directory:

$ cd /mnt/disks/vqgan_models/
$ echo "hello world" > test.txt
$ cat test.txt
hello world

Woot! Time to transfer some real data. Following the docs, I ran:

gcloud compute scp models/vqgan/model.ckpt vqgan-clip:/mnt/disks/vqgan_models
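With an upload this long, it can be worth checking that the file arrived intact. The sketch below compares two lines of `sha256sum` output, one from each side. The helper names `digest_of` and `checksums_match` are made up for illustration, and the commented usage assumes the same instance name and paths as above.

```shell
#!/bin/sh
# digest_of (hypothetical helper): print only the hex digest from one line of
# `sha256sum` output, which has the form "<digest>  <path>".
digest_of() {
  printf '%s\n' "$1" | awk '{ print $1 }'
}

# checksums_match (hypothetical helper): succeed when both lines carry the
# same digest, regardless of the file paths that follow it.
checksums_match() {
  [ "$(digest_of "$1")" = "$(digest_of "$2")" ]
}

# Possible usage from your local machine:
#   local_sum=$(sha256sum models/vqgan/model.ckpt)
#   remote_sum=$(gcloud compute ssh vqgan-clip --command='sha256sum /mnt/disks/vqgan_models/model.ckpt')
#   checksums_match "$local_sum" "$remote_sum" && echo "upload verified"
```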

After a long upload, I realized that I had created the disk in the wrong zone. If this happens to you: stop the VM and edit it to remove the disk (the disk has to be detached from the VM before you can change its zone). Then move the disk:

gcloud compute disks move vqgan-models --zone=us-east1-b --destination-zone=us-central1-c

“zone” is the source zone and “destination-zone” is, more obviously, the destination zone. This probably incurred some cross-data-center-networking cost, but life’s too short to wait for SCP.

Then I edited my us-central1-c instance to add an existing disk. Annoyingly, it isn’t mounted on startup. GCP claims that you can add it to your /etc/fstab, but that was destroyed every time I restarted the instance. Thus, I instead went to “Edit” -> “Management” -> “Metadata” -> “Automation” -> “Startup script” and added the lines:

sudo mkdir -p /mnt/disks/vqgan_models
sudo mount -o discard,defaults /dev/sdb /mnt/disks/vqgan_models
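If the startup script ever gets re-run while the disk is already mounted, the second mount attempt is unnecessary. A minimal sketch of a guard, assuming a Linux VM where `/proc/mounts` lists current mounts; `is_mounted` is a hypothetical helper name, and its optional second argument exists only so it can be tried against a sample file.

```shell
#!/bin/sh
# is_mounted (hypothetical helper): succeed if DIR appears as a mount point
# in the mounts table. The table defaults to /proc/mounts (Linux only).
is_mounted() {
  awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}" 2>/dev/null
}

# In the startup script the two lines above could then become:
#   mkdir -p /mnt/disks/vqgan_models
#   is_mounted /mnt/disks/vqgan_models || mount -o discard,defaults /dev/sdb /mnt/disks/vqgan_models
```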

I also managed to make my disk the wrong size. So, if you need to increase the size of your disk, run:

gcloud compute disks resize vqgan-models --size 40 --zone us-central1-c

At this point ext4 doesn’t know about the new, bigger size yet, so SSH into your VM and run:

sudo resize2fs /dev/sdb

Now df -h should show “40G” as the size.
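If you would rather check the size from a script than eyeball it, here is a rough sketch that reads POSIX-format `df -Pk` output and prints the size in whole GiB, rounded up. The name `size_gib` is made up for illustration, and rounding up is my own choice so that filesystem overhead doesn't make a 40 GB disk report as 39.

```shell
#!/bin/sh
# size_gib (hypothetical helper): read `df -Pk` output on stdin and print the
# filesystem size from the first data row, rounded up to whole GiB.
size_gib() {
  awk 'NR == 2 {
    kib = $2
    gib = int(kib / 1048576)          # 1 GiB = 1048576 KiB
    if (gib * 1048576 < kib) gib++    # round up any remainder
    print gib
  }'
}

# On the VM:
#   df -Pk /mnt/disks/vqgan_models | size_gib
```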

Now to actually mount this sucker as a docker volume. Shut the instance back down and go to “Edit”. Under “Container” select “Change” and select “Add Volume”. I want /mnt/disks/vqgan_models/pretrained to be mounted as /app/pretrained in the Docker container, so set “Mount path” to /app/pretrained and “Host path” to /mnt/disks/vqgan_models/pretrained.

Finally, it’s time to boot this up and try it out! Start the instance, hit the SSH button, find the docker container ID, and use that to check the filesystem in the container:

$ export CID=$(docker container ls -q | tail -n 1)
$ docker exec "$CID" ls /app/pretrained

Now you can (fairly) easily move big files around and attach them to your docker instances.

Note: I am a newbie at all of these tech stacks. If anyone knows a better way to do this, I’d love to hear about it! Please let me know in the comments.
