Aggregating
Adding and Modifying Columns
Adding a new column to a dataframe using assign
Chain methods together instead of writing long, hard-to-read lines
- Wrap the whole expression in parentheses so that each method call can start on its own line
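A minimal sketch of assign inside a parenthesized method chain. The dogs dataframe, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "height_cm": [56, 43, 46],
    "weight_kg": [24, 24, 17],
})

# The outer parentheses let each method call start on its own line.
result = (
    dogs
    .assign(bmi=lambda d: d["weight_kg"] / (d["height_cm"] / 100) ** 2)
    .sort_values("bmi", ascending=False)
    .reset_index(drop=True)
)
```

Because assign returns a new dataframe, dogs itself has no bmi column after this runs.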
Assign with special column names (spaces, special characters): keyword arguments must be valid Python identifiers, so pass a dictionary unpacked with ** instead
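One way to do this, sketched with a made-up column name, is to build a dictionary and unpack it into assign:

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({"height_cm": [56, 43, 46]})

# "height (in)" is not a valid keyword argument name,
# so it is passed as a dictionary key and unpacked with **.
with_inches = dogs.assign(**{"height (in)": dogs["height_cm"] / 2.54})
```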
df.copy()
returns a new, independent copy of the dataframe; the original is left unchanged
df[] =
assigns a column in place, modifying the existing dataframe
df.assign()
returns a new dataframe with the column added; the original is left unchanged
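The difference between the three can be seen in a small sketch (the dataframe and column names here are hypothetical):

```python
import pandas as pd

original = pd.DataFrame({"a": [1, 2, 3]})

# copy() returns an independent dataframe.
backup = original.copy()

# Bracket assignment modifies `original` itself.
original["b"] = original["a"] * 2

# assign() leaves `original` alone and returns a new dataframe.
extended = original.assign(c=original["a"] + 1)
```

After this runs, backup still has only column a, original has a and b, and only extended has c.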
Avoid inplace=True
- There are plans to remove it in future releases of pandas, and it is not considered good practice; reassign the result of the method call instead
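A short sketch of the usual alternative, using sort_values as an arbitrary example method:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 1, 2]})

# Instead of df.sort_values("x", inplace=True),
# call the method normally and reassign the returned dataframe.
df = df.sort_values("x").reset_index(drop=True)
```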
df[column].to_numpy()
returns the column's values as a NumPy array
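For example, with a made-up weight_kg column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"weight_kg": [24, 24, 17]})

# to_numpy() drops the index and returns just the values.
weights = df["weight_kg"].to_numpy()
```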
dogs.max(axis=1)
won't work as intended: axis=1 takes the max across each row, and each row mixes datatypes (for example, strings and numbers)
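One possible workaround, assuming the goal is a row-wise max over the numeric columns only, is to select those columns first. The dogs data below is hypothetical, and the exact behavior of max on mixed columns can differ between pandas versions.

```python
import pandas as pd

# Hypothetical toy data with one string column and two numeric columns.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "height_cm": [56, 43, 46],
    "weight_kg": [24, 24, 17],
})

# Keep only the numeric columns, then take the max across each row.
row_max = dogs.select_dtypes(include="number").max(axis=1)
```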
Data Granularity and the groupby method
Fine granularity: small details
Coarse: bigger picture
Prefer finer granularity when you have the resources: fine-grained data can always be aggregated to a coarser level, but coarse data cannot be broken back down
How to go from fine to coarse granularity: Aggregating
Aggregation
Aggregating is the act of combining many values into a single value
penguins.groupby('species')['body_mass_g'].mean()
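A runnable version of the line above, using a tiny made-up stand-in for the penguins dataframe (the values are illustrative, not real measurements):

```python
import pandas as pd

# Hypothetical stand-in for the penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5200.0, 3600.0],
})

# One row per penguin (fine) becomes one value per species (coarse).
mean_mass = penguins.groupby("species")["body_mass_g"].mean()
```

The result is a Series indexed by species, with one mean per group.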
“Split-apply-combine” Paradigm
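A way to picture what a groupby object does: split the rows into groups, apply a function to each group, then combine the results into a single output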
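The three steps can be written out by hand as a rough sketch. This is not how pandas implements groupby internally, and the data is made up; it only mirrors the idea.

```python
import pandas as pd

# Hypothetical stand-in for the penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5200.0, 3600.0],
})

# Split: one sub-dataframe per species.
pieces = {species: group for species, group in penguins.groupby("species")}

# Apply: compute the mean body mass within each piece.
applied = {species: group["body_mass_g"].mean() for species, group in pieces.items()}

# Combine: collect the per-group results into one Series.
combined = pd.Series(applied, name="body_mass_g")
```

The values in combined match what penguins.groupby('species')['body_mass_g'].mean() returns in one step.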
Aggregation Methods
count()
sum()
mean()
max()
last()
first()
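As a sketch, several of these methods can be computed at once by passing their names to agg. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5200.0, 3600.0],
})

grouped = penguins.groupby("species")["body_mass_g"]

# One row per species, one column per aggregation method.
summary = grouped.agg(["count", "sum", "mean", "max", "first", "last"])
```

Each method can also be called directly, for example grouped.count() or grouped.first().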
Generally, you should select column(s) directly after groupby, before calling the aggregation method, so that only the columns you need are aggregated
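A small comparison of the two orderings, using hypothetical data with an extra island column:

```python
import pandas as pd

# Hypothetical stand-in for the penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Dream", "Biscoe", "Biscoe", "Dream"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5200.0, 3600.0],
})

# Preferred: select the column first, so only it is aggregated.
select_first = penguins.groupby("species")["body_mass_g"].max()

# Same answer here, but every other column is aggregated and then discarded.
select_last = penguins.groupby("species").max()["body_mass_g"]
```

Selecting first also avoids problems with methods such as mean(), which are not defined for string columns like island.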