Generating Voronoi cells in Python

Voronoi cells are basically the shape you see soap suds make. They have a lot of cool properties that I wanted to use for image generation, but I didn’t want to have to figure out the math myself. I found this really excellent tutorial on generating Voronoi cells, which goes into some interesting history about them, too!

However, the Python code was a little out-of-date (and I think the author’s primary language was C++), so I wanted to clean up the example a bit.

It’s always a little tricky combining numpy and cv2, since numpy is column-major and cv2 is row-major (or maybe visa versa?) so I’m doing a rectangle instead of a square to make sure the coordinates are all ordered correctly. I started with some initialization:

import cv2
from matplotlib import pyplot as plt
import numpy as np

random.seed(42)
width = 256
height = 128
num_points = 25

Then we can use the Subdiv2D class and add a point for each cell:

subdiv  = cv2.Subdiv2D((0, 0, width, height))

def RandomPoint():
  return (int(random.random() * width), int(random.random() * height))

for i in range(num_points):
  subdiv.insert(RandomPoint())

Then it just spits out the cells!

# Note that this is height x width!
img = np.zeros((height, width, 3), dtype=np.uint8)

def RandomColor():
  """Generates a random RGB color."""
  return (
    random.randint(0, 256), 
    random.randint(0, 256), 
    random.randint(0, 256))

# idx is the list of indexes you want to get, [] means all.
facets, centers = subdiv.getVoronoiFacetList(idx=[])
for facet, center in zip(facets, centers):
  # Convert shape coordinates (floats) to int.
  ifacet = np.array(facet, int)

  # Draw the polygon.
  cv2.fillConvexPoly(img, ifacet, RandomColor(), cv2.LINE_AA, 0)

  # Draw a black edge around the polygon.
  cv2.polylines(img, np.array([ifacet]), True, (0, 0, 0), 1, cv2.LINE_AA, 0)

  # Draw the center point of each cell.
  cv2.circle(
    img, (int(center[0]), int(center[1])), 3, (0, 0, 0), cv2.FILLED, cv2.LINE_AA, 0)

Finally, write img to a file or just it display with:

plt.imshow(img)

If you use 42 as the seed, you should see exactly:

How to set up Python on Compute Engine

This is a followup to my previous post on setting up big files on GCP. I ran into similar problems with Python as I did with static files, but my solution was a bit different.

The right wayTM of running Python on GCP seems to be via a docker container. However, adding a virtual environment to a docker container is painful: for anything more than a small number of dependencies and the docker image becomes too unwieldy to upload. Thus, I decided to keep my virtual environment on a separate disk in GCP and mount it as a volume on container startup. This keeps the Python image svelte and the virtual environment static, both good things! It does mean that they can get out of sync: technically I should probably be setting up some sort of continuous deployment. However, I don’t want spend the rest of my life setting up ops stuff, so let’s go with this for now.

To create a separate disk, follow the instructions in the last post for creating and attaching a disk to your GCP instance. Make sure you mark the disk read/write, since we’re going to install a bunch of packages.

Start up the instance and mount your disk (I’m calling mine vqgan_models, because sharing is caring).

On your development environment, scp your requirements.txt file over to GCP:

gcloud compute scp requirements.txt vqgan-clip:/mnt/disks/vqgan_models/python_env/requirements.txt

Here’s where things get a little tricky, so here’s a high-level view of what we’re doing:

  1. Create a “scratch” Docker instance.
  2. Add our persistent disk to the container in such a way that it mimics what out prod app will look like.
  3. Install Python dependencies.

Virtual environments are not relocatable, so we need to make the virtual environment directory match what prod will look like. For instance, I’ll be running my python app in /app with a virtual environment /app/.venv. Thus, I am going to mount my persistent disk to /app in the scratch docker container:

docker run -v /mnt/disks/vqgan_models/python_env:/app -it python:3.10-slim bash

This will put you in a bash shell in a python environment container. Everything your create in /app will be saved to the persistent disk.

Note: when you want to leave, exit by hitting Ctrl-D! Typing “exit” seemed to cause changes in the volume not to actually be written to the persistent disk.

Now you can create a virtual environment that will match your production environment:

# Shell starts in /
$ cd /app
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip install -r requirements.txt

Hit Ctrl-D to exit the scratch docker instance. Shut down your instance so you can change your docker volumes. Go to Container -> Change -> Volume mounts and set the Mount path to /app/.venv and the Host path to /mnt/disks/vqgan_models/python_env/.venv.

On you development machine, set up a Dockerfile that copies your source code and then activates your virtual environment before starting your service:

FROM python:3.10-slim
WORKDIR /app
COPY mypkg ./mypkg
CMD . .venv/bin/activate && python -m mypkg.my_service

Build and push your image:

$ export BACKEND_IMAGE="${REGION}"-docker.pkg.dev/"${PROJECT_ID}"/"${BACKEND_ARTIFACT}"/my-python-app
$ docker build --platform linux/amd64 --tag "${BACKEND_IMAGE}" .
$ docker push "${BACKEND_IMAGE}"

Now start up your GCP instance and make sure it’s running by checking the docker logs.

$ export CID=$(docker container ls | tail -n 1 | cut -f 1 -d' ')
$ docker logs $CID
...
I0917 01:24:51.654180 139988588971840 my_service.py:119] Ready to serve

Now you can quickly upload new versions of code without hassling with giant Docker containers.

Note: I am a newbie at these technologies, so please let me know in the comments if there are better ways of doing this!

How to get big files into Compute Engine

I’ve been working with some large models recently and, as a Docker beginner, shoved them all into my Docker image. This worked… sort of… until docker push started trying to upload 20GB of data. Google Cloud doesn’t seem to support service keys for docker auth (even though they claim to! not that I’m bitter), so I kept getting authorization errors. Time to figure out docker volumes.

First, I needed to create an additional disk. I essentially followed the directions in the docs. Using the console in your compute engine instance, under “Additional Disks” select “Add new disk” and fill in the size you want. The defaults are probably fine, although it defaults to SSD so you can select Standard if don’t care about speed.

Save the instance and start it up. Hit the “SSH” button once it’s booted. Then, find your new disk:

$ sudo lsblk
...
sdb         8:16   0   20G  0 disk

Then format the disk:

$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
$ sudo mkdir -p /mnt/disks/vqgan_models
$ sudo mount -o discard,defaults /dev/sdb /mnt/disks/vqgan_models

I then ran a quick test to make sure it’s actually a writable directory:

$ cd /mnt/disks/vqgan_models/
$ echo "hello world" > test.txt
$ cat test.txt
hello world

Woot! Time to transfer some real data. Following the docs, I ran:

gcloud compute scp models/vqgan/model.ckpt vqgan-clip:/mnt/disks/vqgan_models

After a long upload, I realized that I created the disk in the wrong data center. So if this happens to you: stop the VM, edit it to remove the disk (you have to detach the disk from the VM to modify its zone). Then move the disk:

gcloud compute disks move vqgan-models --zone=us-east1-b --destination-zone=us-central1-c

“zone” is the source zone and “destination-zone” is, more obviously, the destination zone. This probably incurred some cross-data-center-networking cost, but life’s too short to wait for SCP.

Then I edited my us-central1-c instance to add an existing disk. Annoyingly, it isn’t mounted on startup. GCP claims that you can add it to your /etc/fstab, but that was destroyed every time I restarted the instance. Thus, I instead went to “Edit” -> “Management” -> “Metadata” -> “Automation” -> “Startup script” and added the lines:

sudo mkdir -p /mnt/disks/vqgan_models
sudo mount -o discard,defaults /dev/sdb /mnt/disks/vqgan_models

I also managed to make my disk the wrong size. So, if you need to increase the size of your disk, run:

gcloud compute disks resize vqgan-models --size 40 --zone us-central1-c

Then ext4 doesn’t know about the new, bigger size yet, so SSH into your VM and run:

sudo resize2fs /dev/sdb

Now df -h should show “40G” as the size.

Now to actually mount this sucker as a docker volume. Shut the instance back down and go to “Edit”. Under “Container” select “Change” and select “Add Volume”. I want /mnt/disks/vqgan_models/pretrained to be mounted as /app/pretrained in the Docker container, so set “Mount path” to /app/pretrained and “Host path” to /mnt/disks/vqgan_models/pretrained.

Finally, it’s time to boot this up and try it out! Start the instance, hit the SSH button, find the docker container ID, and use that to check the filesystem in the container:

$ export CID=$(docker container ls | tail -n 1 | cut -f 1 -d' ')
$ docker exec $CID ls /app/pretrained
model.ckpt

Now you can (fairly) easily move big files around and attach them to your docker instances.

Note: I am a newbie at all of these tech stacks. If anyone knows a better way to do this, I’d love to hear about it! Please let me know in the comments.

Using Warp workflows to make the shell easier

Disclaimer: GV is an investor in Warp.

Whenever I start a new Python project, I have to go look up the syntax for creating a virtual environment. Somehow it can never stick in my brain, but it seems too trivial to add a script for. I’ve been using Warp as my main shell for a few months now and noticed they they have a feature called “workflows,” which seems to make it easy to add a searchable, documented command you frequently use right to the shell.

To add a workflow to the Warp shell, create a ~/.warp/workflows directory and add a YAML file describing the workflow:

$ mkdir -p ~/.warp/workflows
$ emacs ~/.warp/workflows/venv.yaml

Then I used one of the built-in workflows as a template and modified it to create a virtual environment:

---
name: Create a virtual environment
command: "python3 -m venv {{directory}}"
tags: ["python"]
description: Creates a virtual environment for the current directory.
arguments:
  - name: directory
    description: The directory to contain the virtual environment.
    default_value: .venv
source_url: "https://docs.python.org/3/library/venv.html"
author: kchodorow
author_url: "https://www.kchodorow.com"
shells: []

I saved the file, typed Ctrl-Shift-R, and typed venv and my nice, documented workflow popped up:

However, I’d really like this to handle creating or activating it, so I changed the command to:

command: "[ -d '{{directory}}' ] && source '{{directory}}/bin/activate' || python3 -m venv {{directory}}"

Which now yields:

So nice.

Update: I realized I actually always want to activate the virtual environment, but I also want to create it first if it doesn’t exist. So I updated the command to: ! [ -d '{{directory}}' ] && python3 -m venv {{directory}}; source '{{directory}}/bin/activate'". This creates the virtual environment if it doesn’t exist, and then activates it regardless.

Why market cap is dumb

When I was a kid, I went to a tag sale “for kids, by kids” where kids sold their junk/toys to other kids. I was wandering around and saw a shoebox filled to the brim with marbles. I went over and there was a sign on the box that said, “25 cents/marble”.

“How much for the whole box?” I asked the kid.

He thought for a second. He was around my age, maybe a little older. “$5,” he said.

“Sold!” I said quickly, handed him $5, and ran off with my hundreds and hundreds of marbles before he realized how deep a discount he had just given me.

Suppose there were 300 marbles in the box. The box “should have” cost 300*25 cents=$75. Obviously no one is going to pay $75 for a box of marbles, which brings us to the basic problem with market cap and the stock market.

The market cap of a company is basically the number of shares it has issued multiplied by price per share. However, if we think of a share as a marble, the market cap is that ridiculously inflated $75.

How much of this stock are people actually trading? Google, for example, has 723,000,000,000(ish) shares outstanding. Daily trading volume is around 1,500,000. That is .0002% of the outstanding shares. Translating that into marbles… that’s a lot less than one marble.

But let’s say a couple of people buy individual marbles, and then start trading them between themselves for 25 cents. Someone who hasn’t seen the kid’s booth offers a buyer 30 cents for a marble. Doing some quick math, people realize that marble boy’s net worth has gone from $75 to $90. “Hey, that kid just made $15. We should tax him on that.”

And that’s why a wealth tax is stupid.

Shoulders of Giants

I’ve been thinking a lot about construction. Taking a very specific part of the process, building the staircase: you find a carpenter and they build the staircase to your measurements. Generally your contractor will find someone with decent experience that they think will do a good job for whatever price you’re willing to pay and then you get as staircase executed at whatever skill level happens to be available/at that price point.

Construction Physics had an interesting point the other day: mass production took off in America because the United States didn’t have skilled craftsmen the way Europe did. This is also borne out by my Instagram feed, currently: European tradesmen seem to be more artistic and skilled than the Americans in my feed (sorry fellow countrymen). My guess is that Europe’s aristocracy supported spending 5000 man-hours on a staircase in a way that the United States really couldn’t compete with. And now maybe a continuing culture that values these skills more? I don’t really know.

Regardless, I was thinking about how different this is than software engineering. There’s always someone’s first staircase they’ve ever built, which is not going to be as good as the thousandth (I hope!). However, if there’s a common component in software engineering, someone will have already built it and it will be the product of many engineers’ thousandth try at building user login, logging, whatever. Thus, a junior engineer can use these solid building blocks to create their own first-try-mess on top of. However, that mess will (hopefully!) have a solid foundation.

Open source and APIs are an incredible superpower software engineers have over the physical world. It’s like installing a staircase that was built by every master-craftsman over the last 500 years. And generally, the best tools are accessible to everyone: Fortune 500 companies can use Stripe/Twilio/Mercurial the same way an individual developer with a hobby project can. At least in the realm of software engineering, it is a golden age of equality.

5-minute design: meme generator

When I’ve talked to people who’ve attempted to make meme tools, they say that search is a really hard problem. This sort of makes sense: one person might search “communism”, another “bugs bunny”, and another “we” trying to get this image template:

See Know Your Meme for context.

I was thinking about it today, though, and you know what? All of those people who’ve actually tried to build this are wrong. This is a super easy problem.

How this should work:

  • A user searches for a meme and doesn’t find it. Keep a record of the query (say it’s “communism”).
  • The same user uploads a meme. Now there is a strong possibility that the query the user did is a good match for the uploaded image, so associate that image with “communism” and give that pair a score of 1.
  • Now diff that image to others in the database using image recognition to find “equivalent” images. For gifs, I’m not too familiar with gif’s file format, but I assume something could be done generating frames of image. (There are a lot of assumptions here.)
  • Now associate that query with all “equivalent” images, plus the new image. Then take all of the query terms associated with the existing images and add them to the new image.

Next time someone searches for “communism”, show the meme template uploaded above. If they choose that template, increase the (template, "communism") pair score. Whenever someone searches, show them a mix of high-scoring templates for their search term, plus some prospective templates that are still “young.”

In the example above, I assume the user is trustworthy. There’s also a strong possibility that the user is a bot/malicious actor/both. So that users rep should be tied to whether others use that prospective template/query pair, and that feeds back into how much a user can affect a template’s score.

Since memes change over time, you probably also want to overlay some decay function, so if you search for “drake” you get the latest templates, not ones from years ago.

Now, assuming you have some users, you can set up a “self-labeling” system.

Easy peasy.

A review of GitHub Copilot

I’ve been using GitHub Copilot for Unity development and it is incredible. It has saved me from tons of bugs, speeded up my work by 2x, and even taught me a lot of C# (which Unity uses and I haven’t used before).

How does it work? You install it as a VS Code extension and then, as you type, it (very magically) suggests what should come next. For example, I have a very simple utility class with a static field, WorldWidth, to get the size of the screen in world coordinates. Suppose I wanted to add a WorldHeight field. I press enter, and I literally don’t even have to start typing:

It knows that this is the other function I’d likely want.

Okay, but that’s very simply. How about something more complex? This is a long function, so excerpting part of it, but it’s a nested for loop that looks at each column on the game board.

At the end of this function, I press enter and:

Note that bool lastIter went from y == board[x].Length - 1 to board.Length - 1. Fancy!

Its context-awareness is also great when you don’t want to have to look up math formulas:

(…it works for much more complicated math, too.)

And it saves tons of time writing out annoying debugging messages or ToString functions:

It also taught me C#’s $-formatting and rolls with other formats as you go, e.g., if you prefer (x,y) format.

The downside of using this is that I’m much more likely to repeat code, because I don’t have to write it all out. For example, in the GetVerticalRuns/GetHorizontalRuns functions above, I probably would have pulled out the loops into a common function if I had to write it myself. However, I also probably would have messed it up on the first try.

If you do any programming, I highly recommend signing up for Copilot. It takes away a lot of repetitive, annoying, and route work and lets you concentrate on actually getting your project working.

Using TextMeshPro in a 2D Unity game

Unity has a simple, easy-to-use 2D text option under UI->Legacy->Text. However, this puts a text element on the weird ethereal Canvas that UI stuff sits on, which is probably not what you want (or at least not what I want). I want my text to be nested in other sprites in my scene. To do this, we can use TextMesh Pro objects.

TextMesh Pro was a Unity acquisition in 2017 and since then they’ve more-or-less integrated it into Unity. That said, the first thing you have to do is actually install it, so go to Window ->TextMeshPro -> Import TMP Essential Resources. Now you can use it.

To actually use TextMesh Pro, create a GameObject -> 3D Object -> Text (TextMeshPro). There is a similarly-named component under UI, but do not be fooled! If you use that one, it’ll be placed on a Canvas. Stick with the sprite-y 3D Object one, even in a 2D game. Now you can nest this object as a child of your other GameObjects and local scaling and positioning will work correctly.

From your scripts, you can reference TextMeshPro through its intuitively named package. Start with:

// Such package. Wow.
using TMPro;

Then the component is of type TMP_Text, so you can do:

// This class name...
TMP_Text scoreText = scoreObj.GetComponent<TMP_Text>();
scoreText.text = "Score: 0";

And that’s it. Now you have a nice text block that you can treat like a “normal” Sprite, not some weird 2D thing that lives on its own detached plane of existence.

(Side note: I’ve been mainly using Python for the last few years, so I’m very salty about how ambiguous and terribly-named packages in C# are.)

A YIMBY’s Modest Proposal

Let’s say there’s a nice house, around 100 years old, on a street. It’s been well-maintained and had normal renovations done over time to keep it comfortable and practical to live in (electrics upgrades, HVAC improvements, new roof when necessary, etc.). Now, suppose an inventor buys the empty lot next door. This inventor loves the old house so much he creates a machine to duplicate it in every detail down to the last molecule. Now there are two identical houses next to each other on the street.

The new one will immediately be deemed uninhabitable by the city.

The problem is that construction needs to adhere to building codes, which are the minimum standards to which a building must be built. These get stricter and stricter over time, so there is no chance that the 100-year-old house complies with them (likely offenders include insulating-ness of walls and windows, stair width, porch railings, and hundreds of other things). The weird thing is that the building code is supposed to be the minimum standard, but it obviously isn’t: literally most of the US lives in houses that wouldn’t be approved if they were new construction.

So building codes are too strict in one direction (new housing) and too lenient in another (old housing). There’s obviously a distinction between the true minimum (below which a building is uninhabitable) and the desired minimum (our current building code). Let’s call them BMBC (Bare Minimum Building Code) and DBC (Desired Building Code).

My modest proposal: allow people to build houses that only meet the BMBC. However, tax property for every standard in the DBC that the house fails to meet. Want that spiral staircase? No problem, but it’ll cost you $4k/year for the rest of your life.

Note that this does not exempt old houses. If you have a 23-bedroom house built by a robber baron at the turn of the century and the new DBC says all windows must be triple-glazed, you can either make sure all 6,000 windows are triple glazed or pay your new tax on being energy ineffecient.

This will have a stimulating effect on real estate. Housing prices, particularly for older houses, will be much more affordable to young people. This will create a virtuous cycle of young people with money and energy bringing old houses up to the DBC, and meanwhile older people can move into more modern housing. Modern housing will be more compatible with seniors’ mobility constraints and have lower carrying costs (since property taxes will be lower and more predictable). Plus, it’ll create ongoing employment in the construction industry and keep US housing in tip-top shape.

To sum up: housing is either safe or not. We should encourage more safe and efficient choices, but right now old houses are absurdly advantaged. If we want more and cheaper housing, the BMBC makes it easier to build new housing and the DBC incentivizes keeping old housing up-to-date.