## Part 4: compounding returns

As Einstein (maybe) said, compounding interest is the eighth wonder of the world. In the previous posts in this series, we used a very linear benchmark: 4% off of the amount contributed forever. However, this is a weird way to benchmark results. Imagine you and friend (call him Baelish) are both investing and comparing results. You start off by investor \$1k in August, 2020 and hope to have \$1040 in one year. One year later, you have succeeded and have \$1040. You tell your friend Peter Baelish about your investment and he puts \$1040 in the market and sets a goal of making 4%: he’ll try to have \$1081.60 by next year. However, if you stick with the model used in the previous post, your goal for the year will only be \$1080 by 2022, leaving you a whole \$1.60 poorer than your friend. (Arguably. We are talking about the benchmark, not the actual amount of money you’re earning. However, this whole series could probably be titled “It’s easy to mislead yourself that you’re doing better than you are,” so it’s fitting the theme)

So instead of a fixed amount, we want to continuous compound our benchmark. Because each value in the benchmark is determined based on the previous one, this is not Pandas-friendly. We’ll have to drop into “normal Python” to compute a more accurate benchmark.

Let’s stick with SPY, making once-a-month \$1k contributions with the goal of a 4% annual return. The only difference is that we now want a 4% compounding annual return.

There are 253 trading days in a year and the formula for continuously compounding returns is:

Pt = P0 * ert

That is, the amount you have at time t (Pt) is the amount you have at the beginning of the period (P0), multiplied by the constant e raised to the rate (4%) times the amount of time. For example, if we contribute \$1k and wait 1 year, we’ll want to have:

P1 = \$1000 * e4% * 1 year = \$1000 * e.04 * 1 = \$1040.81

We get an extra \$.81 cents off of the continuous compounding.

However, we want to be able to graph this benchmark by the day, so our t isn’t 1, it’s 1 year / 253 days = .00395. Plugging this in, each day we should have e.00016 more than the previous day. Using the code from previous posts, this gives us:

```import math
benchmark = []
idx = 0
for day in df.index:
contribution = 0
if day in my_portfolio.index:
contribution = my_portfolio.loc[day].total_cost_basis
# The first time through, benchmark is [] (False), so it just adds
# todays_benchmark. Each subsequent iteration it uses the previous row.
if benchmark:
today = benchmark[-1] * math.exp(.004 * (1/253)) + contribution
else:
today = contribution
benchmark.append(today)
df.assign(benchmark=benchmark)[['total', 'benchmark']].plot()
```

This is, honestly, extremely similar to the original chart. After all, at the end of 1 year we only expect them to have diverged by ~80 cents per \$1000! However, over time this number grows, which is clearer if we subtract our cost basis from the benchmark:

```(df.assign(cost_basis=my_portfolio.total_cost_basis.cumsum())
.fillna(method='ffill')
.assign(performance=lambda x: x.benchmark - x.cost_basis)
.performance
.plot())
```

The nice thing about investing is that your money works “while you sleep” (depending on timezone). Tightening up the benchmark line by using a compounding benchmark lets us hold our money accountable when reviewing performance.

## A multi-stock portfolio: comparing returns

The last posts have discussed portfolio performance with a very boring portfolio: one stock! Things get more interesting when there’s more than one stock to compare.

Let’s say we have a two stock portfolio now: SPY (as before) and DOCU (Docusign). We’ll combine the two tickers into one dataframe:

```spy = pandas_datareader.data.DataReader('SPY', 'yahoo', start='2020-05-01', end='2020-07-31')
df = (
pd.concat([spy.assign(ticker='SPY'), docu.assign(ticker='DOCU')])
.reset_index()
.set_index(['ticker', 'Date'])
.sort_index())
```

Now that things are getting more complicated, I’m going to switch to graphing with Altair, which I’ve found to be a more sophisticated (and prettier) alternative to matplotlib (which is what df.plot gives you).

```import altair as alt
alt.Chart(df.reset_index()).mark_line().encode(
x='Date',
y='Close',
color='ticker',
)
```

Both stocks are doing well and it looks like DOCU is doing a bit better. So what would happen if we had invested \$1000 in each of these stocks each month? We’ll create a random portfolio in a similar way to the first post, but notice that now we have to group by ticker symbol when we want ticker-specific info.

```import numpy as np
# Get a random price on the first of the month _for each ticker_.
first_of_the_month = df[df.index.get_level_values(1).day == 1]
my_portfolio = (
first_of_the_month
.apply(lambda x: np.random.randint(x.Low, x.High), axis=1)
.rename('share_cost')
.to_frame()
.assign(
# Add a cost basis column to show \$1k/month.
cost_basis=1000,
shares_purchased=lambda x: 1000/x.share_cost,
)
# Here's where the multiple tickers becomes interesting. We only want to
# sum over rows for the same stock, not all rows.
.assign(
cost_basis=lambda x: x.groupby('ticker').cost_basis.cumsum(),
total_shares=lambda x: x.groupby('ticker').shares_purchased.cumsum(),
)
)
my_portfolio
```

Add my portfolio back into the main dataframe:

```df = (
df
.assign(
shares_owned=my_portfolio.total_shares,
cost_basis=my_portfolio.cost_basis,
)
.fillna(method='ffill')
.assign(total_value=lambda x: x.shares_owned * x.Close)
)
alt.Chart(df.reset_index()).mark_line().encode(
x='Date',
y='total_value',
color='ticker',
)
```

This makes it nicely clear that, while SPY grew our investment by a couple hundred over this period, DOCU trounced it, nearly returning \$2k on the same investment.

…or did it? After all, these gains aren’t realized. You don’t actually have \$4917.68 in your wallet, you have shares that are worth that on the market. And to get that into your wallet, you need to pay taxes. So, a somewhat less exuberant way of looking at this is to include taxes in our calculations.

Everyone has a different tax situation, but since most people reading this are probably developers, let’s assume you’re in a pretty high tax bracket (40% income tax, 20% long-term cap gains). For the first year we hold this stock, profits will be taxed at 40% if we sell. After that, we could decrease the taxes to 20%, but we only have 3 months here, so let’s say we’re still in short-term gains territory.

```df = df.assign(
# Capital gains
gain=lambda x: x.total_value - x.cost_basis,
taxes=lambda x: x.gain * .4,
realizable_value=lambda x: x.total_value - x.taxes,
)
```

If the gain is less than zero it’s a loss and could be used to cancel out gains, but I’m not handling that here (technically I don’t have to because there aren’t any loses, but I hear that sometimes the stock market goes down).

If we subtract taxes and our original investment and pretend we sold on the last day on the chart (July 31), we get:

```(df.realizable_value - df.cost_basis).groupby('ticker').last()
```
```ticker
DOCU 1150.608974
SPY 166.252482
dtype: float64```

DOCU still does amazing (although less so), SPY’s returns are starting to look fairly ho-hum. Although less dramatic, I like this way of looking at returns since it’s more reflective of what you actually end up with. And a good incentive to let stocks marinate for a year.

## Show me the money: tracking returns

Last post went over building a very simple portfolio tracker to show a portfolio’s performance over time. However, it would be easy to trick myself: “My portfolio value is going up over time, I’m doing great!” But I’m also adding money to my portfolio over time, so that money shouldn’t “count” in terms of performance. I really want to be able to see the difference between having stashed the money in my mattress vs. put it into the market. We’ll figure out how to graph that in this post.

Last post, we ended with this chart:

Let’s say that we don’t care about cash invested, only profits and losses. To see that, we can subtract out the cost basis and see what the raw performance looks like:

```df = df.assign(total_profit=df.total_value - df.cash_pos)
_ = df.total_profit.plot()
```

This gives a pretty nice breakdown of how I’m doing vs. keeping money in a mattress. However, how am I doing vs. my goals? Say my goal is to return at least 4%/year, but I can’t just draw a line from 0% January 1 to 4% December 31 because the money wasn’t invested in a lump sum. Money that’s been sitting there for a year should have yielded 4%, but money that was put in 1 month ago should yield 1/12 of that.

I think the easiest way to model this is to figure out how much our cost basis (`cash_pos`) should return per day, and then take a cumulative sum as the days pass to get the total expected return for any given day. (This isn’t perfectly accurate, but good enough for my purposes.)

```benchmark_percent = .04  # 4%
trading_days_per_year = 253  # Market isn't open 365 days/year
df = df.assign(benchmark=np.cumsum(df.cash_pos * daily_return))
_ = df[['total_profit', 'benchmark']].plot()
```

We’re doing, uh, pretty good vs. the benchmark!

This is a nice way to handle benchmarking because it’ll work even if the portfolio isn’t quite such a “toy” example: if you’re investing irregular amounts at irregular intervals, this will still show you (roughly) the correct expected return over time.

## Building a (very simple) portfolio tracker with pandas

All of these graphs were created in Colab.

I’ve actually never found a commercial product that does everything I want, so I figured I’d build one up in a series of blog posts. We’ll see how many I get through! 🧵👇

First, we’ll get SPY’s stock history with `pandas_datareader`.

```import pandas_datareader
'SPY', 'yahoo', start='2020-05-01')
```

This returns a dataframe that looks like this (as of this writing):

If we plot the `High` and `Low` columns (`_ = df[['High', 'Low']].plot()`), we get:

Now let’s throw in an actual portfolio. Let’s say I bought stock on the first of each month at some random point between the high and low:

```import numpy as np
# Get the prices at the first day of each month.
first_of_the_month = df[df.index.day == 1]
my_portfolio = (
first_of_the_month
# Choose a random number between the day's high and low.
.apply(lambda x: np.random.randint(x.Low, x.High), axis=1)
.rename('cost_basis'))
my_portfolio
```
```Date
2020-05-01 282
2020-06-01 304
2020-07-01 310
Name: cost_basis, dtype: int64```

Let’s say that we have \$1000 to invest each month (and our broker supports buying partial shares). So we can invest the following amounts each month:

```my_portfolio = my_portfolio.assign(
# Get the number of shares purchased this month.
shares_purchased=1000/my_portfolio.cost_basis,
# Get all shares I own so far.
total_shares=lambda x: np.cumsum(x.shares_purchased),
)
my_portfolio
```

Finally, we want to add that back into the ticker info and plot it (let’s go with “Close” as the value for each day):

```df = (
df
# Combine my_portfolio with the stock ticker.
.assign(shares_owned=my_portfolio.total_shares)
# Fill in the # of shares owned per day.
.fillna(method='ffill')
# Portfolio value = number of shares * price
.assign(total_shares=lambda x: x.shares_owned * x.Close)
)
_ = df.total_value.plot()
```

Looks a bit less impressive than the S&P 500 graph. Let’s compare it to keeping our money in cash:

```my_portfolio = my_portfolio.assign(cash_pos=range(1000, 4000, 1000))
df = df.assign(cash_pos=my_portfolio.cash_pos).fillna(method='ffill')
_ = df[['total_value', 'cash_pos']].plot()
```

We can see that our portfolio is doing better than an all-cash position almost immediately. However, there are a couple of things I’d like to add:

• A real portfolio might have more than one stock. I want overall performance, plus broken out in different “sub-portfolios.”
• At least for me, it’s not good enough to beat cash, I want to beat a benchmark (e.g., return at least 4%/year).

Let me know any other features you wish your broker supported in the comments and I may (or may not 😅) implement them.

On the other hand, this is a quote I’ve been thinking about a lot as I strategize about my portfolio:

We are all at a wonderful ball where the champagne sparkles in every glass and soft laughter falls upon the summer air. We know, by the rules, that at some moment, the Black Horseman will come shattering through the great terrace doors, wreaking vengeance and scattering the survivors. Those who leave early are saved, but the ball is so splendid no one wants to leave while there is still time, so that everyone keeps asking “What time is it? What time is it?” But none of the clocks have any hands.

## The stock market is going down

Not the stocks, mind you, but the market itself. There are less than 4,000 companies listed, and new companies have less and less appetite for going public. Conversely, there are over 8,000 private-equity-backed companies in the US, and growing.

There are a couple of problems with the stock market mouldering. One is that this is where most of America keeps their retirement. Common wisdom is that if you invest in a nice index fund, then your money should grow apace to the US economy. This works until the economy grows without the market.

Suppose the public market continues to dwindle. New companies stay private, so the companies on the market get older and older. Old companies have a tendency to grow more slowly and, eventually, go out-of-business. Without new blood, I’d guess the market would ossify around a few large surviving stocks.

Without the possibility of explosive growth in the public markets, any money that could go into private equity would (and is). Unfortunately, at the moment, private equity is only open to accredited investors (i.e., the already well-off) so most of America won’t be able to participate at all.

It’s important that everyone in the US:

1. Has the opportunity to benefit from economic growth and
2. Able to participate in a way where they won’t get fleeced.

As I understand it, the SEC doesn’t allow the middle nor lower class to invest in private companies because they don’t want people who are barely making ends meet to be hoodwinked out of all of their money.

Why is investing private equity riskier than public companies? Small companies tend to be riskier, but there are plenty of private companies that are larger than public ones. If I understand correctly, most of it is related public companies being better-regulated than private ones.

My knee-jerk reaction is, then: how can we get more companies to go public? And maybe that’s a good goal. However, why are we in the pocket of big-market here?

The goal is to allow more people to benefit from private companies, not bail out banks/NYSE/NASDAQ. So one possible solution is making the private market more liquid while adding protections for investors.

Making the private markets more liquid is easy: just let anyone buy and sell shares without being accredited and without needing the company’s approval.

However, given that something like 50% of VC firms don’t even return the capital they invest and (almost) everyone there is doing this professionally and on the up-and-up, making sure people aren’t fleeced seems more difficult. My strawman proposal is to tighten up regulations on private companies based on top-of-line revenue. Making \$1M/year? Great, you need some sort of S-1-like prospectus for investors. Making \$10M/year? Great, you can’t just spout off about “funding secured” on Twitter. Making \$50M/year? Great, you basically have to follow all of the rules public companies do. I dunno, it’s a first draft. But I think it’s important the SEC gets on making private equity more accessible before the middle and lower classes are left in the dust.

## Losing money sucks – the mathematics of loss aversion

There’s a lot of research on loss aversion: how bad people feel when they lose \$50 vs. how good they feel good about gaining \$50. This research is kind of taking an absolute value of emotion, positive or negative. We can represent this with emoji equations:

• 😀 == 😫 (large-magnitude happiness or sadness)
• 🙂 == 🙁 (small-magnitude happiness or sadness)
• 😫 > 🙂 > 😐 (ordering magnitude of emotion)

What research has shown is that, when someone loses \$X and feels 😫 (large-magnitude), if they gain \$X they feel 🙂 (small-magnitude). Gaining money doesn’t make them as happy as losing money makes them sad!

My hypothesis is that people have an intuitive feeling about the math behind these experiments and the results make a lot of sense if you look at percentages instead of values.

For example, suppose you invest \$100 and your investment goes up to \$150. Holding that money in your hand, \$100 is a distant memory, when you had 33% less than you currently have. (\$150 – (33%*\$150) = \$100)

Now let’s say your investment doesn’t do well and you’re standing there with \$50, feeling sad. Now you have 50% of what you started with.

My hypothesis is that loss aversion is really an intuition about the difference between 33% and 50%. My guess is that the emoji magnitude would be (roughly) equal if you gained and lost the same percentage. E.g., these would have equal magnitudes of satisfaction:

If we extend this out to an infinite number of plans, we can see it breaks down at the ends. I don’t think losing 80% would feel the same magnitude as making 4x, but it also feels hard to imagine. Losing 80% of a meaningful amount of money would feel terrible, but would it feel as terrible as 4x-ing my money feels good? 10x-ing? 10x feels more significant than 4x, but 4x-5x? I’m not sure I’d be materially cockier.

Anyway, I assume the ratios are a bit different for different people. But it always struck me with the loss aversion studies that most people understand that going from \$1 to \$2 doubles your money, but going from \$2 to \$3 does less.

## The surprisingly complex math behind startup equity and taxes

Taxes for employees at startups are weird and can vastly change the amount you make.  To illustrate why, let’s take a simple example.

Suppose we have a group of early employees at a startup, we’ll call them the Unicorn Inc. Mafia.  They’re all fresh out of college and managed to get through it without any debt, so they each have a net worth of \$0. They all join the same day and are given the same equity package: \$10k in stock options with a strike prices of \$1 (so, 10,000 common stock). We’ll take four members of the mafia, each with a different strategy.

Name Description Net worth
Alice Exercise very early, before price changes \$0
Bob Exercise somewhat early \$0
Carol Wait until liquid to exercise, wait until long-term capital gains apply \$0
David Wait until liquid to exercise, sell immediately \$0

From here on out, keep in mind that this could be the end of the story. The company could always fold, leaving everyone with zero or, if they’ve exercised, a negative net worth.

However, suppose the company is doing well and lets the employees know that they’re going out to raise a \$40M round.  Alice exercises her options before the round happens.  This means that she has to pay for them, so she’s in the hole \$10k.  Now if anything goes wrong, she’s out \$10,000.

Once the company raises the round, the stock is worth \$5/share.  Unfortunately, Bob’s significant other got a job across the country, so he has to find a new job. He feels like the company is going places, though, so he wants to collect his equity before he goes. He exercises his options. Because he is buying his stock for \$1 and it is now worth \$5, the IRS says that he just “made” \$4. So he has to pay normal income taxes on that \$40,000. To keep things simple, let’s say everyone’s tax rate is 25%. So now he’s paid \$10k for the stock and \$10k for taxes:

Name Description Exercise Income taxes Net worth
Alice Exercise very early, before price changes (\$10,000) 0 (\$10,000)
Bob Exercise somewhat early (\$10,000) 25%*\$40,000 -> (\$10,000) (\$20,000)
Carol Wait until liquid to exercise, wait until long-term capital gains apply \$0 \$0 \$0
David Wait until liquid to exercise, sell immediately \$0 \$0 \$0

So Bob’s out \$20k if the company goes under (ouch!).

However, luckily for Alice & Bob, over the next several years, the company continues to grow and raise money. Finally, the company goes public for \$100/share. Wow! Once the lockup period expires, everyone eventually sells (somehow it’s still exactly at the IPO price) and makes \$1M. Our final shakeout looks like:

Name Description Exercise Income taxes Short-term capital gains Long-term capital gains Sell price Net worth
Alice Exercise very early, before price changes (\$10,000) 0 0 0 \$1,000,000 \$990,000
Bob Exercise somewhat early (\$10,000) 25%*\$40,000 -> (\$10,000) 0 20%*\$990,000 -> (\$198,000) \$1,000,000 \$782,000
Carol Wait until liquid to exercise, wait until long-term capital gains apply (\$10,000) 25%*\$990,000 -> (\$247,500) 0 20%*\$990,000 -> (\$198,000) \$1,000,000 \$544,500
David Wait until liquid to exercise, sell immediately (\$10,000) 25%*\$990,000 -> (\$247,500) 25%*\$990,000 -> (\$247,500) 0 \$1,000,000 \$495,000

There are, uh, a couple of different outcomes. Alice obviously has an accountant in the family: she avoided paying any taxes at all! How is this possible? First, she exercised his options before the price changed, so she didn’t have to pay any taxes on exercise. Then she held them long enough to qualify for long-term capital gains. However, she didn’t even have to pay those! It turns out that, if you own stock in a startup before it has \$50M in assets, long-term capital gains up to \$10M are tax-free (Google “QSBS” for details). However, Alice is also taking on more risk for longer than anyone else: most startups don’t have outcomes like this and she’d have just been out \$10,000 if they had gone out of business.

Obviously there are a ton of simplifying assumptions (stock prices never change! Everyone has the same tax rate, which happens to be one that make numbers easy!). However, I wish someone had told me about all this ~10 years ago, so putting this out there in the hopes that it’ll help someone else.

## Goals for 2019

Followup from last year’s post:

Programming: I really dug into Pandas and not I feel pretty comfy with Python & Pandas. This year, I’d like to focus a bit more on stats and machine learning.

Work-life balance: I am freakin obsessed with my job, which is terrific. But I also feel comfy taking days off and leaving at a reasonable time to walk Domino. So, solid win on that front. I’d like to keep that going this year.

Had many fun adventures with Andrew. We’ve kind of made that a priority this year and I’d like to continue doing that.

Baking: made all sorts of cool things. Highlights were Georgian cheese bread (which tasted like crack but sat like a bag of cement), apple cider donut muffins (they’re muffins, so they’re healthy!), and cinnamon pull-apart bread (I killed the yeast and they were still delicious).

Boxing: started going to sparring regularly, which is incredibly fun and hard.  However, one of the women I spar with is several inches taller than me, solid muscle, and just got down to 30lbs lighter than me for a fight. While I don’t feel the need to be that skinny, I’m thinking maybe I should lose a couple pounds next year. Speaking of:

Eating better: we ate weekly salads this year (thanks to Andrew!) but I wouldn’t say we ate more healthily overall. I think we have a solid plan for next year, though.

Finances: not sure if I saved more than usual from my paychecks, but the MongoDB IPO was a nice bump! Dropping this from goals for next year.

Travel: failed miserably, had to go to California many times for work. However, I think I’ve figured out the basics of air travel to the extent that I no longer feel totally screwed over every time I fly.  I even got a free upgrade last trip! Dropping this from goals for next year, I’m going to have to continue to travel a bunch. And playing Assassin’s Creed: Odyssey is making me want to visit Greece.

Read a couple of books: success! My favorite books this year were Spinning Silver and Bad Blood. Dropping this from goals for next year.

Writing: despite feeling like I wasn’t writing anything, actually did manage to hit my goals.  Also, started writing on Medium, since it’s a GV investment.  I’d like to continue with attempting 6 posts next year.

Went camping and hiking a couple times and it was very nice, aside from the one time we got caught in the rain several miles from the trailhead.  But that was fun too, just not at the moment.

Home improvement: fixed most of our toilet’s innards and it works so much better now.  Brings me joy with every flush!

Finally, it was fun to think about this list and what I had done so far all year, looking forward to following up around New Year’s.  So, hopefully it’ll be the same next year.  Happy 2019!

## AST Financial: the dumpster fire of a company

As an employee at MongoDB for several years, I had a bunch of shares that MongoDB was holding for me when it went public which I had to get to my brokerage account to actually sell. It turns out that the company can’t just hand you your shares, they have to go through a transfer agent who keeps the record of who owns what shares of a company. Then the transfer agent transfers those shares to a brokerage.

So several months ago, I got an email that MongoDB would be transferring my shares to AST Financial (a transfer agent). I got an email from them with some forms to return. I heard from international friends that the email address they gave to people outside the US didn’t actually work, but luckily mine worked fine.

I received a somewhat confusing snail mail from AST with an account summary, which added together my first and last stock grants and broke those out into a separate section than the total, for reasons that are completely unclear to me.

I figured that maybe their website would have a more detailed breakdown, so I tried to activate my account. Their password form was pretty badly done but whatever, banks always have crappy logins. Then we got to the “security questions,” which either didn’t actually apply to me (‘What is your oldest sibling’s middle name?’) or might as well have been yes/no questions (‘What color was your first dog?’ Black. It’s the most common damn color for dogs in the world.) Then I agreed to their terms and services and… got an error page. I tried going back to the homepage, saying “Sign up” again, and it took me directly to the terms & services. I nervously accepted, again. My account appeared!

Note that every time I have logged in subsequently, the site has presented an error page. Then I go back to the home page, click “Log in” again, and it takes me to my account.

I decided it was time to transfer my shares to my usual brokerage account. So I called AST, navigated their phone maze, and waited for someone to pick up. And waited, and waited. Eventually, the robot said that they were experiencing heavy call volumes and asked me to leave a message.

I left a message and a support person called a few hours later. Yay! I explained what I wanted and she asked to put me on hold while she looked up my account. Then she disconnected me.

After I finished raging, I called them back. After an hour of waiting for someone to pick up, I gave up and hung up. I sent the original person who had responded to emailed-in forms, asking her to please have someone contact me.

I received a response from help@astfinancial.com: the original email they sent with the forms to fill out and send back.

Luckily, the MongoDB alumni had a group where everyone was bitching and, to a lesser extent, offering advice. Apparently AST actually had an online conversion process: under Account Holdings -> Account Profile, which of course takes you to the General Account Information page, which (of course) is where you transfer shares. I clicked on the button to convert my shares and… got a page that said “ERROR. An error occurred while processing your request.” I tried going to the previous page (which of course didn’t work, back button just took me to the error page again and history was unhelpfully a zillion pages with the title “AST” so I manually put in the URL. This got me to the conversion confirmation page, where it asked if I wanted to submit to convert all shares… with a universal “go back” symbol on the button.

Playing with fire, I entered my total number of shares and pressed submit. It gave me a confirmation page… with a submit button. So I submitted again, and finally got to the real confirmation page.

I didn’t want to publish this before I got extricated from ever having to deal with them again, but now that all my shares are safely out of their hands: AST is the worst. Anyone working on a blockchain-backed transfer agent?

## Thinking with Pandas

You can see and run all of my work for this blog post at Colabratory.

Pandas is a Python library for manipulating data. Wrapping my head around it took a while: I’ve been using it for ~6 months and I’m still learning how to use it “right.” It uses all kinds of syntactic sugar to optimize working with vectors and matrices instead of scalars. This makes working with Pandas very different than working with “vanilla” Python.

For example, let’s say you wanted to get a bunch of random dice rolls for playing Dungeons and Dragons. D&D uses 20-sided dice, so in normal Python, you’d probably do something like:

```rolls = [random(1, 21) for i in range(0, 10000)]
```

In Pandas, it would be something like:

```rolls = pd.DataFrame(np.random.randint(1, 21, size=(10000, 1)), columns=['roll'])
```

In D&D, rolling a 20 on the die is special and called a “critical hit.” It usually does good things for the player. If we iterate through rolls seeing how many critical hits we have in vanilla Python, it’s pretty fast:

```%%timeit
count = 0
for roll in rolls:
if roll == 20:
count += 1

# Output:
1000 loops, best of 3: 267 µs per loop
```

If we do the same naive approach with Pandas, it’s, uh, slower:

```%%timeit
count = 0
for roll in rolls.iterrows():
count += (roll[1] == 20)
count

# Output:
1 loop, best of 3: 3.12 s per loop
```

That’s over 10,000x slower (267 µs -> 3.12 seconds). Like, they-must-have-put-a-sleep()-in-iterrows()-to-discourage-you-from-using-it slow.

So why use Pandas? Because it isn’t slow if you do it the “Pandas way”:

```%%timeit
(rolls.roll == 20).sum()

# Output:
1000 loops, best of 3: 341 µs per loop
```

Nice! Only 1.3x slower than vanilla Python. Also, notice the syntactic sugar: you can pretend that the vector is a single number, comparing it to a scalar. If you look at `(rolls.roll == 20)`, it’s a series of booleans:

```(rolls.roll == 20).head()
0    False
1    False
2    False
3    True
4    False
```

When you take the sum() of that series, False is converted to 0 and True to 1, so the sum is the number of True elements.

## Modifying some elements of a vector

If you’re attacking and your roll (as calculated above) is greater than the defender’s armor class (say, 14), then you roll for damage. Suppose you do 2d6 damage on a hit. If your attack roll is greater than or equal to 14, then you do 2d6 damage, otherwise the blow glances off their armor and you do 0 damage.

With vanilla Python, this would look something like:

```for roll in rolls:
if roll >= 14:
damage = roll_damage()
else:
damage = 0
```

However, with Pandas, we shouldn’t loop through the rows. Instead, we’ll use a mask. Like a theater mask, a Pandas mask is an opaque structure that you “punch holes” in wherever you want operations to happen. Then you apply the mask to your Pandas dataframe and apply the operation: only the entries where there are “holes” will get the operation applied.

First we create the mask with the criteria we want to update:

```mask = rolls.roll >= 14
0    False
1    False
2    False
3    True
4    True
Name: roll, dtype: bool
```

Now we want to:

1. Grab all the hits (wherever the mask is `True`).
2. Generate random numbers for each hit (equivalent to rolling 2d6).
3. Merge those hits back into the `rolls` dataframe.

First we’ll create a series that tracks all of the hits. We want to be able to merge that back into our original dataframe at the end, so we want to preserve the index of each hit from the original dataframe. Based on the mask above, this would be `[3, 4, ...]` and so on (wherever the mask is `True`). We get this with `mask[mask]`, which is a series of all the True values with their associated index. Then we set every element of this series to a randomly generate 2d6 roll:

```hits = pd.Series(index=mask[mask].index)
hits.loc[:] = np.random.randint(1, 7, size=(len(hits),)) + np.random.randint(1, 7, size=(len(hits),))
3     6
4     3
9     8
10    3
14    3
dtype: int64
```

Note that we’re generating a “1D” random int (`randint(len(hits),)`) for the damage instead of the 2D one above (`randint(10000,1)`) because this is a series (1D), not a dataframe (2D).

Then we can combine that damage back into the rolls dataframe using our original mask:

```rolls.loc[mask, 'damage'] = hits