Combining columns in Pandas

Suppose we have a dataframe with a couple of columns and we’d like to merge them into one column with some delimiter. For instance, let’s take a restaurant with orders to fill:

        dinner	dessert
patron		
Alice	Steak	None
Bob	Fish	Cake
Carol	None	Pie

And we want to combine the columns into:

         patron
Alice    Steak     
Bob      Fish, Cake
Carol    Pie

Note that this is made more difficult by the missing values: if everyone would just order dinner and dessert, it would be much simpler. However, users gonna user so there are a couple of ways of doing this in Pandas. If we weren’t picky about formatting, we could simply do:

df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')
patron
Alice   Steak,
Bob     Fish, Cake
Carol   , Pie

Ew. However, easy enough to fix! Add an str.strip:

(df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')).str.strip(', ')
patron
Alice   Steak
Bob     Fish, Cake
Carol   Pie

This is probably the most straightforward way of getting the formatting we want. However, this is a long messy line and gets worse as we add more columns (what about drinks? Appetizers?). We can get more elegant using aggregation to ignore the missing values, instead of having to fill them in and then strip them out at the end. First, we stack up the two columns into one:

df.stack()
patron
Alice   dinner   Steak
Bob     dinner   Fish
        dessert  Cake
Carol   dessert  Pie

This creates a multiindex on name and part of meal. Now we can group by person and join on ‘,’ for any foods they list:

df.stack().groupby(level=0).agg(', '.join)
patron
Alice    Steak
Bob      Fish, Cake
Carol    Pie

:Chef’s kiss:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: