Suppose we have a dataframe with a couple of columns and we’d like to merge them into one column with some delimiter. For instance, let’s take a restaurant with orders to fill:
dinner dessert
patron
Alice Steak None
Bob Fish Cake
Carol None Pie
And we want to combine the columns into:
patron
Alice Steak
Bob Fish, Cake
Carol Pie
Note that this is made more difficult by the missing values: if everyone would just order dinner and dessert, it would be much simpler. However, users gonna user so there are a couple of ways of doing this in Pandas. If we weren’t picky about formatting, we could simply do:
df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')
patron Alice Steak, Bob Fish, Cake Carol , Pie
Ew. However, easy enough to fix! Add an str.strip
:
(df['dinner'].fillna('') + ', ' + df['dessert'].fillna('')).str.strip(', ')
patron Alice Steak Bob Fish, Cake Carol Pie
This is probably the most straightforward way of getting the formatting we want. However, this is a long messy line and gets worse as we add more columns (what about drinks? Appetizers?). We can get more elegant using aggregation to ignore the missing values, instead of having to fill them in and then strip them out at the end. First, we stack up the two columns into one:
df.stack()
patron Alice dinner Steak Bob dinner Fish dessert Cake Carol dessert Pie
This creates a multiindex on name and part of meal. Now we can group by person and join on ‘,’ for any foods they list:
df.stack().groupby(level=0).agg(', '.join)
patron Alice Steak Bob Fish, Cake Carol Pie
:Chef’s kiss: