2/28/22, 1:53 AM    Scaler Class Code - EDA 2 - Jupyter Notebook
localhost:8888/notebooks/Jupyter Notebooks/Scaler/Scaler Class Code - EDA 2.ipynb

In [1]:
local_data_path_eda = r"F:\DATA_SCIENCE\Scaler\Data\marketing_data.csv"

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv(local_data_path_eda)

In [4]:
def cleanincome(x):
    # Income is read as text such as "$84,835.00": drop the leading "$" and the thousands separators.
    # Non-string values (missing incomes) fall through and return None, which pandas stores as NaN.
    if isinstance(x, str):
        return float(x[1:].replace(",", ""))

In [5]:
# The column name really does carry a leading and a trailing space.
df[" Income "] = df[" Income "].apply(cleanincome)

In [6]:
# 1st and 99th percentiles of MntWines, used as capping bounds
p99 = df["MntWines"].quantile(0.99)
p1 = df["MntWines"].quantile(0.01)

In [7]:
# Cap MntWines to the range [p1, p99]
df["MntWines"] = np.where(df["MntWines"] > p99, p99, df["MntWines"])
df["MntWines"] = np.where(df["MntWines"] < p1, p1, df["MntWines"])

In [8]:
df = df.fillna(0)

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 28 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   ID                   2240 non-null   int64
 1   Year_Birth           2240 non-null   int64
 2   Education            2240 non-null   object
 3   Marital_Status       2240 non-null   object
 4    Income              2240 non-null   float64
 5   Kidhome              2240 non-null   int64
 6   Teenhome             2240 non-null   int64
 7   Dt_Customer          2240 non-null   object
 8   Recency              2240 non-null   int64
 9   MntWines             2240 non-null   float64
 10  MntFruits            2240 non-null   int64
 11  MntMeatProducts      2240 non-null   int64
 12  MntFishProducts      2240 non-null   int64
 13  MntSweetProducts     2240 non-null   int64
 14  MntGoldProds         2240 non-null   int64
 15  NumDealsPurchases    2240 non-null   int64
 16  NumWebPurchases      2240 non-null   int64
 17  NumCatalogPurchases  2240 non-null   int64
 18  NumStorePurchases    2240 non-null   int64
 19  NumWebVisitsMonth    2240 non-null   int64
 20  AcceptedCmp3         2240 non-null   int64
 21  AcceptedCmp4         2240 non-null   int64
 22  AcceptedCmp5         2240 non-null   int64
 23  AcceptedCmp1         2240 non-null   int64
 24  AcceptedCmp2         2240 non-null   int64
 25  Response             2240 non-null   int64
 26  Complain             2240 non-null   int64
 27  Country              2240 non-null   object
dtypes: float64(2), int64(22), object(4)
memory usage: 490.1+ KB

In [10]:
cols = df.columns[4:20]
cols

Out[10]:
Index([' Income ', 'Kidhome', 'Teenhome', 'Dt_Customer', 'Recency',
       'MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts',
       'MntSweetProducts', 'MntGoldProds', 'NumDealsPurchases',
       'NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases',
       'NumWebVisitsMonth'],
      dtype='object')

In [11]:
df2 = df[cols]

In [13]:
plt.figure(figsize=(16, 9))
# Dt_Customer is text, so the correlation matrix is computed on the numeric columns only.
sns.heatmap(df2.select_dtypes("number").corr(), annot=True)
plt.show()

Case Study 2

In [14]:
local_data_path = r"E:\DATA_SCIENCE\Scaler\Data\HRDataset_v14.csv"

In [16]:
df = pd.read_csv(local_data_path)

In [18]:
df.describe().transpose()

Out[18]:
                        count          mean           std       min       25%       50%      75%
EmpID                   311.0  10156.000000     89.922189  10001.00  10078.50  10156.00  10233.5
MarriedID               311.0      0.398714      0.490423      0.00      0.00      0.00      1.0
MaritalStatusID         311.0      0.810289      0.943239      0.00      0.00      1.00      1.0
GenderID                311.0      0.434084      0.496435      0.00      0.00      0.00      1.0
EmpStatusID             311.0      2.392283      1.794383      1.00      1.00      1.00      5.0
DeptID                  311.0      4.610932      1.083487      1.00      5.00      5.00      5.0
PerfScoreID             311.0      2.977492      0.587072      1.00      3.00      3.00      3.0
FromDiversityJobFairID  311.0      0.093248      0.291248      0.00      0.00      0.00      0.0
Salary                  311.0  69020.684887  25156.636930  45046.00  55501.50  62810.00  72036.0
Termd                   311.0      0.334405      0.472542      0.00      0.00      0.00      1.0
PositionID              311.0     16.845659      6.223419      1.00     18.00     19.00     20.0
Zip                     311.0   6555.482315  16908.396884   1013.00   1901.50   2132.00   2355.0
ManagerID               303.0     14.570957      8.078306      1.00     10.00     15.00     19.0
EngagementSurvey        311.0      4.110000      0.789938      1.12      3.69      4.28      4.7
EmpSatisfaction         311.0      3.890675      0.909241      1.00      3.00      4.00      5.0
SpecialProjectsCount    311.0      1.218650      2.349421      0.00      0.00      0.00      0.0
DaysLateLast30          311.0      0.414791      1.294519      0.00      0.00      0.00      0.0
Absences                311.0     10.237942      5.852596      1.00      5.00     10.00     15.0

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 311 entries, 0 to 310
Data columns (total 36 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Employee_Name               311 non-null    object
 1   EmpID                       311 non-null    int64
 2   MarriedID                   311 non-null    int64
 3   MaritalStatusID             311 non-null    int64
 4   GenderID                    311 non-null    int64
 5   EmpStatusID                 311 non-null    int64
 6   DeptID                      311 non-null    int64
 7   PerfScoreID                 311 non-null    int64
 8   FromDiversityJobFairID      311 non-null    int64
 9   Salary                      311 non-null    int64
 10  Termd                       311 non-null    int64
 11  PositionID                  311 non-null    int64
 12  Position                    311 non-null    object
 13  State                       311 non-null    object
 14  Zip                         311 non-null    int64
 15  DOB                         311 non-null    object
 16  Sex                         311 non-null    object
 17  MaritalDesc                 311 non-null    object
 18  CitizenDesc                 311 non-null    object
 19  HispanicLatino              311 non-null    object
 20  RaceDesc                    311 non-null    object
 21  DateofHire                  311 non-null    object
 22  DateofTermination           104 non-null    object
 23  TermReason                  311 non-null    object
 24  EmploymentStatus            311 non-null    object
 25  Department                  311 non-null    object
 26  ManagerName                 311 non-null    object
 27  ManagerID                   303 non-null    float64
 28  RecruitmentSource           311 non-null    object
 29  PerformanceScore            311 non-null    object
 30  EngagementSurvey            311 non-null    float64
 31  EmpSatisfaction             311 non-null    int64
 32  SpecialProjectsCount        311 non-null    int64
 33  LastPerformanceReview_Date  311 non-null    object
 34  DaysLateLast30              311 non-null    int64
 35  Absences                    311 non-null    int64
dtypes: float64(2), int64(16), object(18)
memory usage: 87.6+ KB

In [20]:
# x is passed by keyword; seaborn 0.12 and later no longer accept it positionally.
sns.countplot(x="PerformanceScore", data=df)

Out[20]:
[Figure: count of employees per PerformanceScore (Exceeds, Fully Meets, Needs Improvement, PIP)]

In [22]:
plt.figure(figsize=(16, 9))
sns.countplot(x="RecruitmentSource", data=df)

Out[22]:
[Figure: count of employees per RecruitmentSource]

In [23]:
plt.figure(figsize=(16, 9))
sns.countplot(x="RecruitmentSource", hue="PerformanceScore", data=df)
plt.show()

[Figure: count of employees per RecruitmentSource, split by PerformanceScore]

In [24]:
# The end of this line is cut off in the export; the percent-scaled output below implies normalize=True and a factor of 100.
df.groupby("RecruitmentSource")["PerformanceScore"].value_counts(normalize=True).mul(100)

Out[24]:
RecruitmentSource        PerformanceScore
CareerBuilder            Fully Meets           73.913043
                         Needs Improvement     17.391304
                         Exceeds                8.695652
Diversity Job Fair       Fully Meets           62.068966
                         Exceeds               20.689655
                         Needs Improvement     13.793103
                         PIP                    3.448276
Employee Referral        Fully Meets           80.645161
                         Exceeds               16.129032
                         PIP                    3.225806
Google Search            Fully Meets           87.755102
                         Needs Improvement      6.122449
                         Exceeds                4.081633
                         PIP                    2.040816
Indeed                   Fully Meets           75.862069
                         Exceeds               13.793103
                         PIP                    5.747126
                         Needs Improvement      4.597701
LinkedIn                 Fully Meets           80.263158
                         Exceeds               11.842105
                         Needs Improvement      3.947368
                         PIP                    3.947368
On-line Web application  Fully Meets          100.000000
Other                    Fully Meets          100.000000
Website                  Fully Meets           76.923077
                         PIP                   15.384615
                         Exceeds                7.692308
Name: PerformanceScore, dtype: float64

In [31]:
# Same percentages as above, reshaped to a flat table and passed to catplot as its data argument.
(df
 .groupby("RecruitmentSource")["PerformanceScore"]
 .value_counts(normalize=True)
 .mul(100)
 .rename("percent")
 .reset_index()
 .pipe((sns.catplot, "data"),
       x="RecruitmentSource", y="percent", kind="bar",
       height=6, aspect=2,  # "size" was renamed to "height" in seaborn
       hue="PerformanceScore"))
plt.show()

[Figure: percentage of each PerformanceScore within each RecruitmentSource]

Let's look at the Salary field now

In [35]:
df[["EngagementSurvey"]].min()

Out[35]:
EngagementSurvey    1.12
dtype: float64

In [38]:
plt.figure(figsize=(8, 6))
sns.scatterplot(
    x="Salary",
    y="EngagementSurvey",
    hue="PerformanceScore",
    data=df
)
plt.show()

[Figure: EngagementSurvey vs. Salary, colored by PerformanceScore; the Salary axis runs to about 250000]
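The per-source percentages computed earlier in this case study with groupby and value_counts(normalize=True) can be cross-checked with pd.crosstab, which returns the same numbers as a wide table (one row per source, one column per score). The following is only a sketch: the toy rows are hypothetical stand-ins and are not taken from HRDataset_v14.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for the two HR columns used in this case study.
toy = pd.DataFrame({
    "RecruitmentSource": ["Indeed", "Indeed", "Indeed", "Indeed", "LinkedIn", "LinkedIn"],
    "PerformanceScore": ["Fully Meets", "Fully Meets", "Fully Meets", "Exceeds",
                         "Fully Meets", "PIP"],
})

# Route 1: the notebook's approach. Share of each score within each source, in percent.
pct_long = (
    toy.groupby("RecruitmentSource")["PerformanceScore"]
    .value_counts(normalize=True)
    .mul(100)
)

# Route 2: the same shares as a wide table. normalize="index" makes every row sum to 1.
pct_wide = pd.crosstab(
    toy["RecruitmentSource"], toy["PerformanceScore"], normalize="index"
).mul(100)

print(pct_long)
print(pct_wide.round(1))
```

In the wide form every row should add up to 100, which is a quick sanity check that the normalization ran within each source rather than over the whole table.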
In [41]:
plt.figure(figsize=(8, 6))
salary_plot = sns.scatterplot(
    x="Salary",
    y="EngagementSurvey",
    hue="PerformanceScore",
    data=df
)
# Zoom in on the bulk of the salaries; points above 130000 stay in df but fall outside the view.
salary_plot.set(xlim=(None, 130000))
plt.show()

[Figure: EngagementSurvey vs. Salary, colored by PerformanceScore, Salary axis limited to 130000]

In [42]:
plt.figure(figsize=(12, 8))
salary_plot = sns.scatterplot(
    x="Salary",
    y="EngagementSurvey",
    hue="PerformanceScore",
    style="Department",
    data=df
)
salary_plot.set(xlim=(None, 130000))
plt.show()

[Figure: EngagementSurvey vs. Salary, colored by PerformanceScore, marker style by Department]

In [45]:
plt.figure(figsize=(12, 8))
salary_plot = sns.scatterplot(
    x="Salary",
    y="EngagementSurvey",
    hue="PerformanceScore",
    style="Department",
    size="EmpSatisfaction",  # Absences
    data=df
)
salary_plot.set(xlim=(None, 130000))
plt.show()

[Figure: EngagementSurvey vs. Salary, colored by PerformanceScore, marker style by Department, marker size by EmpSatisfaction]
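The scatter plots above zoom in by passing a hard-coded upper limit to set(xlim=...). One possible alternative is to derive that limit from a quantile of the plotted column, which also makes it easy to count how many points the zoom leaves out of view. This is a sketch under assumptions: the salaries below are made up, and the 0.90 quantile is an arbitrary choice, not a value used in the notebook.

```python
import pandas as pd

# Hypothetical salaries; the last value plays the role of a single very high earner.
salaries = pd.Series(
    [45000, 52000, 58000, 61000, 64000, 70000, 75000, 88000, 99000, 250000],
    name="Salary",
)

# Upper view limit taken from the data instead of being hard-coded.
upper = salaries.quantile(0.90)

# Points beyond the limit remain in the data; they are only outside the plotted range.
hidden = salaries[salaries > upper]

print(f"x-axis upper limit: {upper:.0f}")
print(f"points outside the view: {len(hidden)}")
```

The resulting value could then be used in place of the constant, for example salary_plot.set(xlim=(None, upper)).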
In [46]:
# Termd - terminated from employment or not
# FacetGrid - Subplot
g = sns.FacetGrid(df, row="Termd", hue="PerformanceScore", height=4, aspect=2)
g.map(sns.scatterplot, "Salary", "EngagementSurvey")
g.set(xlim=(None, 120000))
plt.show()

[Figure: EngagementSurvey vs. Salary, one panel per Termd value, colored by PerformanceScore]

In [49]:
g = sns.FacetGrid(df, row="MaritalDesc", col="Sex", hue="PerformanceScore", height=...)  # remaining arguments are cut off in the export
g.map(sns.scatterplot, "Salary", "EngagementSurvey")
g.set(xlim=(None, 120000))
plt.show()

In [50]:
local_data_path = r"E:\DATA_SCIENCE\Scaler\Data\Weather.csv"

In [53]:
weather = pd.read_csv(local_data_path)

In [56]:
weather.index = weather["date"]
weather

Out[56]:
[Output: DataFrame preview indexed by date, from 2014-7-1 to 2015-6-30; visible columns: date, actual_mean_temp, actual_min_temp, actual_max_temp, average_min_temp, ...]
365 rows × 13 columns

In [59]:
plt.figure(figsize=(10, 6))
weather["actual_min_temp"].plot(label="Minimum Temperature")
weather["actual_max_temp"].plot(label="Maximum Temperature")
plt.legend()
plt.show()

[Figure: actual minimum and maximum temperature by date]

In [60]:
plt.figure(figsize=(10, 6))
weather["actual_min_temp"].plot(label="Minimum Temperature")
weather["actual_max_temp"].plot(label="Maximum Temperature")
weather["average_min_temp"].plot(label="Minimum Avg Temperature")
weather["average_max_temp"].plot(label="Maximum Avg Temperature")
plt.legend()
plt.show()

[Figure: actual and average minimum and maximum temperature by date]

In [61]:
plt.figure(figsize=(12, 8))
weather["actual_precipitation"].plot(label="Precipitation")
plt.show()

[Figure: actual precipitation by date]
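The weather cells above index the frame by the raw date column, which read_csv loads as plain strings. A possible refinement is to parse the dates while loading so that the index becomes a DatetimeIndex, which allows date-aware operations such as per-month averages. The sketch below uses a few made-up rows (hypothetical values, with dates written in zero-padded ISO form for simplicity) rather than Weather.csv itself.

```python
import io
import pandas as pd

# Hypothetical rows; only the column names mirror the weather data used above.
csv_text = """date,actual_min_temp,actual_max_temp
2014-07-01,70,90
2014-07-02,72,94
2014-08-01,68,88
2014-08-02,66,86
"""

# parse_dates converts the column to datetime64; index_col makes it the index in one step.
weather_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# With a DatetimeIndex, calendar fields such as the month are available directly.
monthly_mean = weather_demo.groupby(weather_demo.index.month).mean()

print(weather_demo.index.dtype)
print(monthly_mean)
```

Calling .plot() on a column of such a frame works the same way as in the cells above; the difference is that the x-axis is then treated as dates rather than as text labels.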
