You are on page 1of 1

# 5 2 3 apples 1

# 6 2 3 bananas 1
# 7 3 4 coffee 3
# 8 3 5 milkshake 3
# 9 3 6 chocolate 3
# 10 3 6 strawberry 3
# 11 3 6 strawberry 3

Section 186.2: Multiple results per group


Using transform functions that return sub-calculations per group

In the previous example, we had one result per client. However, functions returning different values for the group
can also be applied.

# Create a dummy dataframe


orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']

# Let's try to see if the items were ordered more than once in each orders

# First, we define a function that will be applied per group


def multiple_items_per_order(_items):
# Apply .duplicated, which will return True is the item occurs more than once.
multiple_item_bool = _items.duplicated(keep=False)
return(multiple_item_bool)

# Then, we transform each group according to the defined function


orders_df['item_duplicated_per_order'] = ( # Put the results into a new column
orders_df # Take the orders dataframe
.groupby(['order_id'])['item'] # Create a separate group for each
order_id & select the item
.transform(multiple_items_per_order)) # Apply the defined function to each
group separately

# Inspecting the results ...


print(orders_df)
# customer_id order_id item item_duplicated_per_order
# 0 1 1 apples False
# 1 1 1 chocolate True
# 2 1 1 chocolate True
# 3 1 2 coffee True
# 4 1 2 coffee True
# 5 2 3 apples False
# 6 2 3 bananas False
# 7 3 4 coffee False
# 8 3 5 milkshake False
# 9 3 6 chocolate False
# 10 3 6 strawberry True
# 11 3 6 strawberry True

GoalKicker.com – Python® Notes for Professionals 714

You might also like