Intro to Altair

Altair is a beautiful graphing library for Python. I’ve been using it a lot recently, but it was a real struggle to get started with. Here’s the guide I wish I’d had.

I’m going to be using Google Colab (https://colab.research.google.com/), but this should work fine in any other interactive notebook you want to use.

Getting started

First, you’re going to want to import numpy and pandas as well as altair. They’ll make working with data easier.

import altair as alt
import numpy as np
import pandas as pd

To start with, we’ll generate a random dataframe and graph it using pandas. It’ll use matplotlib and look pretty ugly:
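Here is a minimal sketch of that step. It assumes a dataframe with a single column named val holding ten random numbers, which matches the chart spec printed later in this post; the exact dataframe used for the original plot may have differed.

```python
import numpy as np
import pandas as pd

# Assumption: one column called 'val' with ten random values in [0, 1),
# chosen to match the chart spec shown later in this post.
df = pd.DataFrame({'val': np.random.rand(10)})

# pandas hands plotting off to matplotlib; with no arguments it draws each
# numeric column as a line against the index.
ax = df.plot()
```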

Instead, if you use altair:

Not much prettier, but it’s a start. There are several important things to note:

  • There are three separate parts to creating this graph:
    1. Passing in the data you’re using (the alt.Chart call).
    2. What kind of marks you want. There are dozens of options: dots, stacks, pies, maps, etc. Line is a nice simple one to start with.
    3. What x and y should be. These should be the names of columns in your dataframe.
  • From point #3 above: Altair does not understand your indexes. You have to reset_index() on your dataframe before you pass it to Altair, otherwise you can’t access the index values. (The index becomes a column named “index” above.)
  • The API is designed to chain calls, each building up more graph configuration and returning a Chart object. In a notebook, a chart that is the last expression in a cell is displayed automatically.

Using this slightly more complicated configuration, you get a more attractive graph that you can do more with. However, as you try to do more with Altair, it just feels… not quite right. And it took me a while to figure out why.

Why Altair’s API feels weird

Why doesn’t Altair let you pass in a column (instead of a column name)? Why is typing and aggregation done in strings? Why is the API so weird in general?

The reason (I think) is that Altair is a thin wrapper around Vega-Lite, a JSON-based charting grammar that gets rendered by a JavaScript library. Thus, if you take the code above and call to_json(), you can get the Vega-Lite spec (a JSON object) for the graph:

chart = alt.Chart(df.reset_index()).mark_line().encode(
    x='index',
    y='val'
)
print(chart.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-54155f6e9cef9af445e6523406ab9d2b"
  },
  "datasets": {
    "data-54155f6e9cef9af445e6523406ab9d2b": [
      {
        "index": 0,
        "val": 0.772999594224295
      },
      {
        "index": 1,
        "val": 0.6175666167357753
      },
      {
        "index": 2,
        "val": 0.824746009472559
      },
      {
        "index": 3,
        "val": 0.23636915023034855
      },
      {
        "index": 4,
        "val": 0.730579649676023
      },
      {
        "index": 5,
        "val": 0.507522783979701
      },
      {
        "index": 6,
        "val": 0.6662601853327993
      },
      {
        "index": 7,
        "val": 0.39232102729533436
      },
      {
        "index": 8,
        "val": 0.9814526591403565
      },
      {
        "index": 9,
        "val": 0.6932117440802663
      }
    ]
  },
  "encoding": {
    "x": {
      "field": "index",
      "type": "quantitative"
    },
    "y": {
      "field": "val",
      "type": "quantitative"
    }
  },
  "mark": "line"
}

The cool thing about Vega charts is that they are self-contained (the data is embedded right in the JSON), so you can copy-paste that JSON into the online Vega chart editor and see it.

In general, I’ve found there are slightly confusing Python equivalents to everything you can do in Vega. But sometimes I’ve run into a feature that isn’t yet supported in Python and had to drop into JS.

Lipstick on the pig

We can give everything on this chart a nice, human-readable name by passing a title to the constructor, x, and y fields:

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
)

You can also use custom colors and such, but the last graph I made someone asked why it was puke-colored, so that’s left as an exercise to the reader.

Poking things

The real strength of Altair, I think, is how easy it is to make interactive graphs. Ready? Add .interactive().

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
).interactive()

Now you can zoom the graph by scrolling and pan it by dragging.

However, you might want to give more information. In this totally made-up example, suppose we wanted to show who had collected each rainwater measurement. First, let’s add that info to the dataframe:

rangers = (
    pd.Series(['Rick', 'Scarlett', 'Boomer'])
    .sample(10, replace=True)
    .reset_index(drop=True))
df = df.assign(ranger=rangers)

Now we’ll add tooltips to our chart:

alt.Chart(df.reset_index(), title='Spring Rainfall').mark_line().encode(
    x=alt.X('index', title='Day'),
    y=alt.Y('val', title='Inches of rainfall'),
    tooltip='ranger',
).interactive()

Which results in:

Pretty nifty! Give it a try yourself in a colab or the Vega editor, and let me know what you think!
