November 29, 2019
You can use the apply method and lambda functions when the operation you want to perform is more complicated than mean or count.
If we want to calculate the 75th percentile for each category, we can use the following combination of apply and lambda.
high_earners = df.groupby('category').wage.apply(lambda x: np.percentile(x, 75)).reset_index()
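To make the apply and lambda combination concrete, here is a minimal runnable sketch. The column names category and wage come from the line above; the data values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the text, values are invented.
df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "wage": [10, 20, 30, 40, 50, 60],
})

# For each category, pass that group's wages to the lambda,
# which returns the 75th percentile of the group.
high_earners = (
    df.groupby("category")["wage"]
      .apply(lambda x: np.percentile(x, 75))
      .reset_index()
)

print(high_earners)
```

The result has one row per category, with the wage column holding that group's 75th percentile. For this particular statistic, pandas also provides a built-in groupby quantile method (quantile(0.75)), so apply is mainly worth reaching for when no built-in aggregate fits.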
We can also group by multiple columns by passing a list of column names to groupby.
df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
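As a small sketch of grouping by two columns, the example below builds a tiny DataFrame and averages sales for each combination of location and day. The column names follow the text; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
df = pd.DataFrame({
    "Location": ["North", "North", "North", "South", "South"],
    "Day of Week": ["Mon", "Mon", "Tue", "Mon", "Tue"],
    "Total Sales": [100, 200, 50, 80, 120],
})

# One output row per (Location, Day of Week) pair,
# holding the mean of Total Sales for that pair.
mean_sales = (
    df.groupby(["Location", "Day of Week"])["Total Sales"]
      .mean()
      .reset_index()
)

print(mean_sales)
```

The two North/Mon rows collapse into a single row with their average, while every other pair appears once with its own value.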
Pivot Tables
You may want to reorganize the way your groupby data is displayed. This is called pivoting and the new table is called a pivot table.
In Pandas, the command for pivot is:
df.pivot(columns = 'ColumnsToPivot', index = 'Columns_To_Be_Rows', values = 'Columns_To_Be_Values')
First use the groupby statement:
unpivoted = df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
Now pivot the table:
pivoted = unpivoted.pivot(columns = 'Day of Week', index = 'Location', values = 'Total Sales').reset_index()
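Putting the two steps together, a minimal end-to-end sketch might look like this. The data is hypothetical; only the column names are taken from the text.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
df = pd.DataFrame({
    "Location": ["North", "North", "North", "South", "South"],
    "Day of Week": ["Mon", "Mon", "Tue", "Mon", "Tue"],
    "Total Sales": [100, 200, 50, 80, 120],
})

# Step 1: long format, one row per (Location, Day of Week) pair.
unpivoted = (
    df.groupby(["Location", "Day of Week"])["Total Sales"]
      .mean()
      .reset_index()
)

# Step 2: wide format, one row per Location and one column per day.
pivoted = unpivoted.pivot(
    columns="Day of Week",
    index="Location",
    values="Total Sales",
).reset_index()

print(pivoted)
```

After pivoting, each distinct value of Day of Week becomes its own column, and each cell holds the mean sales for that location on that day. Note that pivot raises an error if an index and column pair appears more than once, which is why the groupby step comes first.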