7 advanced pandas tricks for data science | by Félix Revert |... https://towardsdatascience.com/7-advanced-tricks-in-pandas-...

1 of 11 1/13/2023, 9:52 AM

# Turn the groupby object into an iterator that yields (group_id, sub-DataFrame) pairs.
generator = iter(df.groupby(['identifier']))

# Pull the next group; run this again to step through the groups one at a time.
group_id, grouped_data = next(generator)
print(group_id)
grouped_data
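To make the iteration pattern above concrete, here is a minimal sketch on a made-up frame. The data and the `value` column are hypothetical; only the `identifier` column name comes from the snippet. It groups by a plain string key so that each `group_id` is a scalar; with a one-element list key, recent pandas versions (2.0 and later, to my knowledge) return a 1-tuple instead.

```python
import pandas as pd

# Hypothetical toy data; only the 'identifier' column name mirrors the snippet above.
df = pd.DataFrame({
    "identifier": ["a", "a", "b", "c", "c", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

# iter() on a groupby object yields (group_id, sub-DataFrame) pairs, sorted by key.
generator = iter(df.groupby("identifier"))

# Inspect the first group by hand.
group_id, grouped_data = next(generator)
first_group_rows = len(grouped_data)

# The iterator remembers its position, so this only sees the remaining groups.
remaining_ids = [gid for gid, _ in generator]
```

Stepping through groups this way is mostly useful for debugging a function before handing it to `groupby(...).apply`.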

def female_proportion(dataframe):
    return (dataframe.Sex == 'female').sum() / len(dataframe)

female_proportion(df)
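As a quick sanity check of the function above, here is a minimal sketch on a hypothetical four-row frame (the data are made up; only the `Sex` column name comes from the snippet). It also shows that taking the mean of the boolean mask gives the same number.

```python
import pandas as pd

def female_proportion(dataframe):
    # Share of rows whose Sex is 'female'.
    return (dataframe.Sex == 'female').sum() / len(dataframe)

# Hypothetical toy data, not the dataset used in the article.
toy = pd.DataFrame({'Sex': ['female', 'male', 'female', 'male']})

proportion = female_proportion(toy)

# Equivalent shortcut: the mean of a boolean Series is the fraction of True values.
mask_mean = (toy.Sex == 'female').mean()
```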

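# Attach to every row the share of women among passengers on the same ticket.
# Tickets held by a single passenger are filtered out, so those rows get NaN after the left merge.
df.merge(
    df.loc[
        df.Ticket.isin(
            df.Ticket.value_counts().loc[
                df.Ticket.value_counts() > 1
            ].index
        )
    ].groupby('Ticket').apply(female_proportion)
     .reset_index().rename(columns={0: 'proportion_female'}),
    how='left', on='Ticket'
)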
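The chained merge above can be easier to follow when split into named steps. The sketch below runs on a hypothetical six-row frame (the ticket codes are invented) and, as a small variation, applies the function to the `Sex` column only, which lets the result be named directly instead of renaming column `0`.

```python
import pandas as pd

# Hypothetical toy data; the Ticket and Sex column names follow the snippet above.
df = pd.DataFrame({
    "Ticket": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Sex": ["female", "male", "male", "female", "female", "male"],
})

# Step 1: keep only tickets shared by more than one passenger.
ticket_counts = df["Ticket"].value_counts()
shared_tickets = ticket_counts[ticket_counts > 1].index

# Step 2: compute the share of women per shared ticket.
proportions = (
    df.loc[df["Ticket"].isin(shared_tickets)]
    .groupby("Ticket")["Sex"]
    .apply(lambda s: (s == "female").mean())
    .rename("proportion_female")
    .reset_index()
)

# Step 3: left-merge back so every original row is kept;
# passengers on single-holder tickets get NaN.
result = df.merge(proportions, how="left", on="Ticket")
```

Here the two `A1` rows get 0.5, the `B2` row gets NaN, and the three `C3` rows get 2/3.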

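import pandas as pd

# `variables` (feature names) and `model` (a fitted model exposing coef_) are assumed to be defined beforehand.
pd.DataFrame({
    'variable': variables,
    'coefficient': model.coef_[0]
}) \
.round(decimals=2) \
.sort_values('coefficient', ascending=False) \
.style.bar(color=['grey', 'lightblue'], align='zero')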
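The snippet above depends on `variables` and `model`, which are not shown here. As a rough sketch of the table it builds before styling, the example below substitutes hypothetical feature names and a hand-written coefficient array shaped like scikit-learn's `coef_` for a binary classifier. The `.style.bar(...)` step is left out because it only produces visible output in a notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the article's `variables` and `model.coef_`.
variables = ["Age", "Fare", "Pclass"]
coef_ = np.array([[0.123, 0.789, -1.456]])  # shape (1, n_features)

table = (
    pd.DataFrame({"variable": variables, "coefficient": coef_[0]})
    .round(decimals=2)                            # only numeric columns are rounded
    .sort_values("coefficient", ascending=False)  # largest positive effect first
)
```

In a notebook, appending `.style.bar(color=['grey', 'lightblue'], align='zero')` to `table` would draw the in-cell bars from the original snippet.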

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn_pandas import DataFrameMapper
from category_encoders import LeaveOneOutEncoder

# One imputer per column; add_indicator=True adds a missing-value flag
# for features that had missing values at fit time.
imputer_Pclass = SimpleImputer(strategy='most_frequent', add_indicator=True)
imputer_Age = SimpleImputer(strategy='median', add_indicator=True)
imputer_SibSp = SimpleImputer(strategy='constant', fill_value=0, add_indicator=True)
imputer_Parch = SimpleImputer(strategy='constant', fill_value=0, add_indicator=True)
imputer_Fare = SimpleImputer(strategy='median', add_indicator=True)
imputer_Embarked = SimpleImputer(strategy='most_frequent')

scaler_Age = MinMaxScaler()
scaler_Fare = StandardScaler()

onehotencoder_Sex = OneHotEncoder(drop=['male'], handle_unknown='error')
onehotencoder_Embarked = OneHotEncoder(handle_unknown='error')

leaveoneout_encoder = LeaveOneOutEncoder(sigma=.1, random_state=2020)

# Each entry maps a column to the transformers applied to it, in order.
mapper = DataFrameMapper([
    (['Age'], [imputer_Age, scaler_Age], {'alias': 'Age_scaled'}),
    (['Pclass'], [imputer_Pclass]),
    (['SibSp'], [imputer_SibSp]),
    (['Parch'], [imputer_Parch]),
    # The remaining column mappings are cut off in the source.
])

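from tqdm import notebook

# Register progress_apply / progress_map on pandas objects.
# Calling pandas() on the class avoids the deprecation warning raised by the instance form tqdm().pandas().
notebook.tqdm.pandas()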
