A personal anecdote: hot wheels

Last night I was walking Domino and a woman passed us pulling a wagon of obviously-just-bought stuff from Home Depot. A guy behind her leaned down to adjust one of the items and I thought “Oh, I guess they’re together.” Then he picked it up, did a 180, and started walking away.

I realized what was happening and shouted “Hey!” and made a half-hearted attempt to grab the guy. He took off and the woman dropped the wagon handle and ran after him. I took up a post guarding the wagon, since apparently it needed guarding. A minute later she returned, her chase unsuccessful.

“It was just wheels,” she told me. “Not anything too important.”

“That’s good,” I replied.

Still, it’s the first time in years I’ve witnessed such blatant theft.

Unity Android App deployment

Here is the (roughly 52-step) process for testing your Unity game on an Android device.

In Unity Hub, go to Installs -> Settings -> Add Modules -> Android Build Support and make sure Android SDK & NDK Tools and OpenJDK are installed. If you need to install them, you’ll have to restart Unity afterwards to pick up on the change.

In Unity, go to File -> Build Settings and select Android. Make sure all of your scenes are included and select Build. This will pop up a naming dialog, which will have “.apk” added as an extension when built. In this example, I’m going to call it “game_dev”.

Once the build is finished, you should have a file, game_dev.apk, ready to be installed on your phone!

Now you need to explain to your computer how to get an APK onto your phone. Install adb and connect your phone for debugging: under Developer Options on your phone, enable Wireless debugging. Then select “Pair device with pairing code”:

adb pair $IP:$PORT
<put in the 6-digit code from phone>

You only have to do this once… until your phone’s IP address changes. On my home wifi, this seems to happen pretty regularly, but YMMV.

Above the options for getting a code, your phone will display “Wifi Address”. This port is different from the one you just typed in and you need to use this one to actually connect to the phone:

adb connect $IP:$PORT_FROM_PHONE

Now your phone will doodle-doop to let you know it’s connected. You have to run the connect command every time your computer loses connection to your phone (e.g., you turn off your machine).

Now we can move the APK from your computer to your phone wirelessly. If you’re just running a one-off test, you can select “Build and Run” in Unity and it’ll load the APK in some temporary way onto your phone. This is nice for quick turnaround, but if you want to keep your game around a while, you might want to actually install the APK. To do that, run:

adb devices

If you have more than one device, you’ll have to specify which one you want to deploy your APK to:

adb -s 192.168.12.34:12345 install /path/to/game_dev.apk

This will install the game under the name listed in Unity’s Edit -> Project Settings -> Player -> Product Name field.

Now you can “quickly” try out your game on your phone!

Combining columns in Pandas

Suppose we have a dataframe with a couple of columns and we’d like to merge them into one column with some delimiter. For instance, let’s take a restaurant with orders to fill:

        dinner	dessert
patron		
Alice	Steak	None
Bob	Fish	Cake
Carol	None	Pie

And we want to combine the columns into:

         patron
Alice    Steak     
Bob      Fish, Cake
Carol    Pie

Note that this is made more difficult by the missing values: if everyone would just order dinner and dessert, it would be much simpler. However, users gonna user so there are a couple of ways of doing this in Pandas. If we weren’t picky about formatting, we could simply do:

df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')
patron
Alice   Steak,
Bob     Fish, Cake
Carol   , Pie

Ew. However, easy enough to fix! Add an str.strip:

(df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')).str.strip(', ')
patron
Alice   Steak
Bob     Fish, Cake
Carol   Pie

This is probably the most straightforward way of getting the formatting we want. However, this is a long messy line and gets worse as we add more columns (what about drinks? Appetizers?). We can get more elegant using aggregation to ignore the missing values, instead of having to fill them in and then strip them out at the end. First, we stack up the two columns into one:

df.stack()
patron
Alice   dinner   Steak
Bob     dinner   Fish
        dessert  Cake
Carol   dessert  Pie

This creates a multiindex on name and part of meal. Now we can group by person and join on ‘,’ for any foods they list:

df.stack().groupby(level=0).agg(', '.join)
patron
Alice    Steak
Bob      Fish, Cake
Carol    Pie

:Chef’s kiss:

Centering a stupid box in Unity

If you have a 2D game with a board and you want to center it on the screen, here’s the best way I came up with to do it.

Suppose you have a variable-size board (could longer or shorter in either dimension) on a variable-sized screen (could be a phone, tablet, web browser, console, etc.). Given you want to center the board on the screen, leaving a small margin on each edge, you need to scale and position it correctly.

First we’re just going to find the width and height of the screen. ViewportToWorldPoint gets the coordinates at the upper-right corner of the screen and the camera is centered at (0, 0), so we actually need to double the URH coordinates to get the full width and height.


Vector3 world = Camera.main.ViewportToWorldPoint(
    new Vector3(1f, 1f, 0f));
world *= 2;

Then we need to scale the board to fit within this screen. Thus, if the board is longer than the screen is tall, we have to make it shorter. If the board is wider than the screen, we have to squeeze it thinner. In either case, we want to keep the board at the same aspect ratio after shrinking, so we choose the axis that requires the most compression and scale both axes by that. This code assumes that transform is your board GameObject‘s Transform and board is a Vector2 of board width & height (e.g., for chess it would be new Vector2(8, 8)).


float margin = .5f;
float scaleX = (world.x - 2 * margin) / board.x;
float scaleY = (world.y - 2 * margin) / board.y;
float scale = Mathf.Min(scaleX, scaleY);
transform.localScale = new Vector3(scale, scale, 1f);

Finally, we need to center the scaled-down board on the screen. Since (0, 0) is the center of the screen, we want to position the board at (-width/2, –height/2). However, this isn’t quite correct: if we put a 1-unit sprite at (0, 0), Unity centers the sprite at (0, 0) so its left edge is at -.5. Thus, if we position the sprite at –width/2, its right edge ends up at 0. To adjust for this we need to offset by 1/2 of tile’s width, i.e., the 1/2 of the scaling factor.


float offset = scale / 2;
transform.position = new Vector3(
    -(scale * board.x) / 2 + offset,
    -(scale * board.y) / 2 + offset,
    0f);

I’m kind of shocked that there isn’t an easier way to do this, so please chime in if you know of one!

Setting a random seed for Unity

Over the holidays I’ve been playing with game development in Unity, so I’m going to post a couple of things I’ve discovered that I think are handy. First up: setting a random seed for your game. Not exactly a groundbreaking discovery, but I implemented this early on and it’s game-changing (har har).

I added a method to log what the game’s random seed is and make it easy to set. Now, when I run into a bug, I can immediately reproduce it in subsequent runs.

The code is a short function that I put in my GameManager.cs:

using UnityEngine;

// If you are using System, disambiguate.
using Random = UnityEngine.Random;


class GameManager : MonoBehaviour {
  
  void Awake() {
    // ...
  }

  private void SetSeed(int state=0) {
    if (state == 0) {
      state = System.DateTime.Now.Millisecond;
    }
    Debug.Log("Random seed: " + state);
    Random.InitState(state);
  }
}

Now in GameManager.Awake, I can simply call SetSeed() (or SetSeed(123) if I’m trying to reproduce a specific failure). If I want to come back and debug something later, I can make a note of the seed with the bug report.

System.DateTime.Now.Millisecond returns a number between 0-999, so if you want more possible scenarios you might want to use a different mechanism for generating random seed. However, I liked having a 3-digit seed, since they’re pretty easy to read & remember.

Hope this helps someone!

Co-founder analysis

I recently signed up for yCombinator’s cofounder matching platform. After a week, I deactivated my profile with over 100 founders (or potential founders) having reached out to me. That seemed like a lot of responses, which might be because I had fairly flexible requirements: open to someone technical or non-technical, geography didn’t matter, and a half-dozen areas of interest. (I’m a natural lurker, so I didn’t reach out to anyone myself.)

It was a fascinating experience and there were some amazing founders out there. I thought it would be interesting to give a very general overview of what I saw, without identifying details. So I went through all 100+ emails and made a spreadsheet. Aside from how to spell “Philadelphia,” here are the things I learned:

New York will inspire you

I was delighted to see nearly 30% of founders based in NYC! (This may have been selection bias, since I’m in NYC. But still, makes me happy.) Bay Area followed with 16 founders, LA with 10, and Austin with 5. Over 20% of founders were international and almost 20% were in non-traditional US areas (e.g., flyover states, Dallas, Florida, etc). I love that they weren’t all concentrated in the Bay Area.

Heal the world

I’ve been thinking that there are two paths to making the world a better place. Option 1: become a politician and write legislation to make people do what you want them to do. Option 2: build a business and make it such a compelling choice for people that they will literally pay you to make the change you want to see happen.

I think a lot of entrepreneurs see their business as a way to do good in the world, so it was unsurprising that 10% of founders explicitly wanted to build a business to help fix global warming (or similar ESG focus).

Crypto

A little less than 10% of founders wanted to work on blockchain-/defi-/dao-/nft-related startups. They did not seem to be the most <positive attribute here> tools in the shed, but it was a small sample. This had the only neg of the bunch, plus this gem:

“Working on [crypto project], if done correctly can change securities market for good—but you have to be daring enough ’cause it might just get banned.”

Thank you anyway, I am not that daring.

Bio is hot

Results skewed heavier towards biotech/healthcare than I thought it would (especially since I didn’t express any interest in these area), with over a dozen founders reaching out.

Limitations of my analysis

When people said they had an idea but I couldn’t figure out what it was, I just marked them as not having an idea. My recording of this was not totally impartial, but I did my best. For example, if someone said they were interested in making a marketplace on the blockchain, I’d file that under “crypto.” However, when someone said they were interested in making a marketplace of ideas, I didn’t know what to do with that.

Some startups defied categorization, which was kind of interesting. I don’t want to give real examples because it would be way too identifying, but stuff like, “It’s a trading platform for futures on Venetian masks designed for pets.” Um… fintech?

Getting personal: profile analysis

I noticed two extremes in the data: some people would basically be like, “I don’t have an idea, here’s my ho-hum bio.” The other end was “I have already launched my product and I have 10 previous exits and ask me about the time I wrestled a shark.”

I felt like the ho hum founders needed to at least bring an idea to the table. If you’re not technical, have no relevant accomplishments and no passions, you better at least come with some creativity. You can even say, “I’m not set on this,” if you want, just bring something!

On the other hand, the shark wrestlers need to calm the fuck down. I think that they’re much more like the people I would want to be in the startup trenches with, but I also don’t want to sign up for listening to the same “thrilling” story every day for the next 10 years.

(There were plenty of people in the middle, which was “I’m interested in <things>, I’m looking for <other things>, I bring these skills & relevant accomplishments to the table, here are a couple interesting tidbits about me.”)

Conclusion

I highly recommend the cofounder search if you’re interested in starting a startup! A lot of really interesting and accomplished people on there.

Let me know if there are other breakdowns you’d be interested in! And watch out for shark-wrestlers.

Scraping politely

A lot of projects require scraping websites. I usually write a scraper, run it, it fetches all of the data, and then fails in some final step before writing it anywhere. Then I curse a bit and try to fix my program without being sure what the responses actually looked like. Then I rerun my script, crossing my fingers that I don’t go over any rate limits.

This isn’t optimal, so I’ve finally come up with a better system for this. My requirements are:

  1. Only download a page once.
  2. …for a given time period (e.g., a day). If I rerun after that time period, download the page again.
  3. Make everything human-readable. I want to be able to easily find the response for a given request and visually inspect it.
  4. Basic rate limiting support.
  5. Not reinvent the wheel.

So basically, if I request http://httpbin.org/anything?foo=bar I want it to save the response to a file like ./.db/cache/2021-07-31/httpbin.org_anything_foo_bar. Then I can cat the file and see the response (or delete it to “clear” the cache). However, URLs can be much longer than legal filenames (and the human-readable scheme above could cause collisions), so I’m going to compromise and store the response in file with an opaque hash for a name (e.g., ./.db/cache/2021-07-31/e23403ee51adae9260d7810e2f49f0f2098d8a25c3581440d25d20d02e00ccb9) and then have a CSV file in the directory that maps request URL -> hash. It’s not quite as user-friendly as being able to just visually examine the filename, but I can just do:

$ cat ./.db/cache/2021-07-31/cache_map.csv | grep 'foo=bar'
e23403ee51adae9260d7810e2f49f0f2098d8a25c3581440d25d20d02e00ccb9,http://httpbin.org/anything?foo=bar

I’m using Python, so for not reinventing the wheel, I decided to use requests-cache. The requests-cache package actually has an option to write responses to the filesystem, but I wanted some custom behavior: 1) the cache_map.csv file as described above and 2) naming cache directories by date. Thus, I implemented a custom storage layer for requests-cache to use.

requests-cache represents storage as a dict: each URL is hashed and then requests-cache calls the getter or setter for that hash, depending on if it’s reading or writing. Thus, to implement custom storage, I just have to implement the dict interface to read/write to the filesystem, plus keep my cache_map.csv up-to-date:

class FilesystemStorage(requests_cache.backends.BaseStorage):

    def __init__(self, **kwargs):
        # I'm using APIs that return JSON, so it's easiest to
        # use the built-in JSON serializer.
        super().__init__(serializer='json', **kwargs)

        # A cache a day keeps the bugs at bay.
        today = datetime.datetime.today().strftime('%Y-%m-%d')
        self._storage_dir = os.path.join('.db/cache', today)
        if not os.path.isdir(self._storage_dir):
            os.makedirs(self._storage_dir, exist_ok=True)

        # The map of filename hashes -> URLs.
        self._cache_map = os.path.join(self._storage_dir, 'cache_map.csv')
        # Load any existing cache.
        self._cache = self._LoadCacheMap()

    def _LoadCacheMap(self) -> Dict[str, str]:
        if not os.path.exists(self._cache_map):
            return {}
        # Using pandas is overkill, but are you even a data
        # scientist if you don't?
        return pd.read_csv(self._cache_map, index_col='filename')['url'].to_dict()

    # Dict implementation.

    def __getitem__(self, key: str) -> requests_cache.CachedResponse:
        if key not in self._cache:
            raise KeyError
        k = os.path.join(self._storage_dir, key)
        with open(k, mode='rb') as fh:
            content = fh.read()
        # I want to be able to get the URL from the response,
        # so adding it here.
        url = self._cache[key]
        return requests_cache.CachedResponse(content, url=url)

    def __setitem__(self, key: str, value: requests_cache.CachedResponse):
        # Note that `key` is already hashed, so we use `value`'s
        # URL attribute to get the human-readable URL.
        k = os.path.join(self._storage_dir, key)
        with open(k, mode='wt') as fh:
            json.dump(value.json(), fh)
        # Update cache map
        self._cache[key] = value.url
        # Write the cache back to the file system.
        (
            pd.Series(self._cache, name='url')
            .rename_axis('filename')
            .to_frame()
            .to_csv(self._cache_map)
        )

    # I don't plan on using these, so didn't both implementing them.
    def __delitem__(self, key):
        pass
    
    def __iter__(self):
        pass
    
    def __len__(self) -> int:
        return len(self._cache)

Now I add a simple cache class to use this custom storage:

class FilesystemCache(requests_cache.backends.BaseCache):
    """Stores a map of URL to filename."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        storage = FilesystemStorage(**kwargs)
        self.redirects = storage
        self.responses = storage

Note that I’m using the same instance of my cache for both responses and redirects. This isn’t optimal if I were actually expecting redirects, but I’m not and my storage layer is designed to be a singleton (as implemented, multiple instances would clobber each other).

Now I create a request class that uses my custom cache.

import requests_cache
from typing import Any, Dict

from lib import custom_cache

class Requester(object):

    def __init__(self):
        self._client = requests_cache.CachedSession(
            backend=custom_cache.FilesystemCache())

    def DoRequest(self, url: str) -> Dict[str, Any]:
        resp = self._client.get(url, headers=_GetHeader())
        body = resp.json()
        # The API I'm using always has a 'data' field in valid
        # responses, YMMV.
        if 'data' not in body:
            raise ValueError('Unexpected response: %s' % resp.text)
        return body

This reads auth info from environment variables:

def _GetHeader() -> Dict[str, str]:
    return {'Authorization': 'Bearer %s' % _GetBearerToken()}

def _GetBearerToken() -> str:
   bearer_token = os.getenv('bearer_token')
   if not bearer_token:
       raise RuntimeError('No bearer token found, try `source setup.env`')
   return bearer_token

Finally, I want to support rate limiting. I used the ratelimit package for this. ratelimit is based on the Twitter API, which rate limits on 15-minute intervals. So if I was hitting an endpoint that allowed 10 requests/minute (10*15 = 150 requests per 15 minutes) then I could write:

@ratelimit.sleep_and_retry
@ratelimit.limits(calls=150)
def DoApiCall(self, url) -> Dict[str, Any]:
    return self._requester.DoRequest(url)

This will block the program’s main thread if this function is called more frequently than the allowed rate limit (which may not be what you want, check the ratelimit docs for other options).

The downside of this implementation is that it still rate limits, even if you’re hitting the cache. You could get around this by checking the cache contents in Requester and then only conditionally calling DoApiCall, but this is left as an exercise for the reader 😉

Road to Thornmire

Yesterday, Andrew and I started working on a driveway for the undeveloped parcel of thorny, swampy woodland we bought during lockdown. We rented a chainsaw at an equipment rental place, where the guy asked if we had ever used one before. We had not. He showed us how to start it: open the choke, pull the string thing (technical term) vigorously a couple of times until you hear the motor almost catch. Then close the choke, one more vigorous pull and the engine catches. Easy.

We headed out to our woods a half-hour away, parked alongside the road, and determined a plan of attack. There’s an old stone wall that we wanted the driveway to run along, so we figured out the angle we needed to cut and tried to start the chainsaw. And failed. And failed. And failed. And then it smelled like gas we realized we had probably flooded the engine. And so there we were, sitting on the side of the road, Googling how to fix a flooded chainsaw engine. Then calling the rental place. Then finally driving back to the rental place, where they showed us how to get it going again and gave us updated instructions (pro tip: you shouldn’t open the choke at all if the saw is already warmed up).

When we got back to our property, the chainsaw started right up and I started cutting down saplings and hacking a path through the undergrowth. For every one minute of chopping, I had to stop the chainsaw, put it down, and rest, because it was so damn heavy. (It’s also freakin loud.) After we cut a narrow path about 30′ into the underbrush (of the ~1000′ we need to cut), the chain jumped the track. Some more Googling later, and we realized we didn’t have the hexwrenches we needed to get it back on.

This was all, of course, very frustrating and, in some ways, a huge waste of time. It certainly wasn’t how we were planning to spend the day. However, we learned a ton, so I’m counting it as not really a waste. We figured out a bunch of things that didn’t work and have a better idea of what to try next time (ear protection, bringing hexwrenches, rent a Brush Hog for the small stuff). My arms/shoulders/back are all noodles today, so we are going to be built by the end of this.

And machete-ing through the woods is pretty satisfying.

Optimizing resource allocation

Every year, I go to GenCon: a gaming conference where tens of thousands of nerds descend on Indianapolis to try out new board games, RPGs, and other assorted nerdery. Indianapolis is no stranger to huge conferences, but GenCon stretches the city to its limits. GenCon buys thousands of hotel rooms throughout the city and then doles them out by lottery to the attendees. The hotels closest to the conference center sell out immediately, then it gradually filters out to the further away/more expensive options. A day after GenCon’s housing portal opens, every hotel room in Indianapolis is booked.

Putting up with/hacking around this annoying system for a decade inspired me to create a theory of resource allocation that, while I can’t imagine is original, I’ve never heard anyone else talk about.

When you control a finite resource that a lot of people want, there are three groups that you should allocate it to:

  • The deserving. These are the people you think will best use the resource: artists and superfans and young people who want to go so bad they will wait in line all night. These people might need subsidies (or at least reasonably-priced options) to be able to partake. However, they are expected to materially improve the quality of the conference/neighborhood/magnet school.
  • A random lottery. You think you know what will make a great conference/neighborhood/magnet school, but no one really knows the secret sauce. If you think of the previous group as a garden, this group is the wildflowers.
  • The rich. These people will subsidize the other groups/the event itself. 

Basically all resource allocations can be broken down into some division between these three groups, and the interesting question is “what proportion should deserving vs. lucky vs. rich be?”  They all have different strength and weaknesses.  The rich might not add much of anything culturally.  The deserving may stultify into an old guard that prevents innovation. The randos might be useless.

Right now, GenCon is 99.9% random lottery, with a handful of slots for rich people to get preferential access to housing. This means that they are missing out on a lot of the extra money they could be making from their more well-heeled patrons. They are also excluding a lot of passionate fans getting rooms in preference to someone who is mostly there to hang out in the hot tub.

Covid vaccines are another interesting case. Who should get a vaccine? The breakdown we’ve gone with is the deserving (front line workers, the elderly, etc.) then random. If vaccine rollout is blocked on money, what if providing a rich person with a shot a few weeks early could fund ten shots for front-line workers?  A thousand? A million? If not, is there any number where you’d let someone you didn’t like get a shot early for the sake of humanity?

The problem with letting the rich pay their way in is that it feels so unfair. Lotteries feel fair. Letting in deserving people also feels fair (albeit subject to how deserving-ness is measured). In contrast, it’s infuriating when someone who already has enough gets more benefits. However, if you cut off the ways wealthy people can access a resource, they’re not going to just shrug and give up. They’ll just go outside the system: buy the most desirable hotel rooms a year in advance, send their kid to an expensive private school, or use their connections to get a vaccine.  If we build ways to serve the wealthy in the system, the whole system can benefit from their resources. 

Systems often default to a 0% allocation for the rich, because it feels fairer. However, I think it’s not usually the optimal choice, it’s just the easiest one to make a case for.

Intro to Altair

Altair is a beautiful graphing library for Python. I’ve been using it a lot recently, but it was a real struggle to get started with. Here’s the guide I wish I’d had.

I’m going to be using https://colab.research.google.com/, but this should work fine in any other interactive notebook you want to use.

Getting started

First, you’re going to want to import numpy and pandas as well as altair. They’ll make working with data easier.

import altair as altimport numpy as np
import pandas as pd

To start with, we’ll generate a random dataframe and graph it using pandas. It’ll use matplotlib and look pretty ugly:

Instead, if you use altair:

Not much prettier, but it’s a start. There are several important things to note:

  • There are three separate parts to creating this graph:
    1. Passing in the data you’re using (the alt.Chart call).
    2. What kind of marks you want. There are dozens of options: dots, stacks, pies, maps, etc. Line is a nice simple one to start with.
    3. What x and y should be. These should be the names of columns in your dataframe.
  • From point #3 above: Altair does not understand your indexes. You have to reset_index() on your dataframe before you pass it to Altair, otherwise you can’t access the index values. (The index becomes a column named “index” above.)
  • The API is designed to chain calls, each building up more graph configuration and returning a Chart object. The default behavior for showing a returned chart is displaying it.

Using this slightly more complicated configuration, you get a more attractive graph that you can do more with. However, as you try to do more with Altair, it just feels… not quite right. And it took me a while to figure out why.

Why Altair’s API feels weird

Why doesn’t Altair let you pass in a column (instead of a column name)? Why is typing and aggregation done in strings? Why is the API so weird in general?

The reason (I think) is that Altair is a thin wrapper around Vega, which is a JavaScript graphing library. Thus, if you take the code above and call to_json(), you can get the Vega config (a JSON object) for the graph:

chart = alt.Chart(df.reset_index()).mark_line().encode(
    x='index',
    y='val'
)
print(chart.to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-54155f6e9cef9af445e6523406ab9d2b"
  },
  "datasets": {
    "data-54155f6e9cef9af445e6523406ab9d2b": [
      {
        "index": 0,
        "val": 0.772999594224295
      },
      {
        "index": 1,
        "val": 0.6175666167357753
      },
      {
        "index": 2,
        "val": 0.824746009472559
      },
      {
        "index": 3,
        "val": 0.23636915023034855
      },
      {
        "index": 4,
        "val": 0.730579649676023
      },
      {
        "index": 5,
        "val": 0.507522783979701
      },
      {
        "index": 6,
        "val": 0.6662601853327993
      },
      {
        "index": 7,
        "val": 0.39232102729533436
      },
      {
        "index": 8,
        "val": 0.9814526591403565
      },
      {
        "index": 9,
        "val": 0.6932117440802663
      }
    ]
  },
  "encoding": {
    "x": {
      "field": "index",
      "type": "quantitative"
    },
    "y": {
      "field": "val",
      "type": "quantitative"
    }
  },
  "mark": "line"
}

The cool thing about Vega charts is that they are self-contained, so you can copy-paste that info into the online Vega chart editor and see it.

In general, I’ve found there are slightly confusing Python equivalents to everything you can do in Vega. But sometimes I’ve run into a feature that isn’t yet supported in Python and had to drop into JS.

Lipstick on the pig

We can give everything on this chart a nice, human-readable name by passing a title to the constructor, x, and y fields:

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
)

You can also use custom colors and such, but the last graph I made someone asked why it was puke-colored, so that’s left as an exercise to the reader.

Poking things

The real strength of Altair, I think, is how easy it is to make interactive graphs. Ready? Add .interactive().

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
).interactive()

Now your graph is zoomable and scrollable.

However, you might want to give more information. In this totally made up example, suppose we wanted to show who had collected each rainwater measurement. Let’s add that info to the dataframe, first:

rangers = (
    pd.Series(['Rick', 'Scarlett', 'Boomer'])
    .sample(10, replace=True)
    .reset_index(drop=True))
df = df.assign(ranger=rangers)

Now we’ll add tooltips to our chart:

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
    tooltip='ranger',
).interactive()

Which results in:

Pretty nifty! Give it a try yourself in a colab or the Vega editor, and let me know what you think!