
Statistical Learning Homework 1 (ch2.8) - Jupyter Notebook 2023/9/30 6:11 PM

In [33]: %matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

pd.options.display.float_format = '{:,.2f}'.format  # display floats with two decimal places

In [34]: # Part (a)
college = pd.read_csv('College.csv')  # read the CSV file with pandas
college

Out[34]:
     Unnamed: 0                      Private   Apps  Accept  Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal
0    Abilene Christian University        Yes   1660    1232     721         23         52         2885          537      7440        3300    450      2200
1    Adelphi University                  Yes   2186    1924     512         16         29         2683         1227     12280        6450    750      1500
2    Adrian College                      Yes   1428    1097     336         22         50         1036           99     11250        3750    400      1165
3    Agnes Scott College                 Yes    417     349     137         60         89          510           63     12960        5450    450       875
4    Alaska Pacific University           Yes    193     146      55         16         44          249          869      7560        4120    800      1500
...  ...                                 ...    ...     ...     ...        ...        ...          ...          ...       ...         ...    ...       ...
772  Worcester State College              No   2197    1515     543          4         26         3089         2029      6797        3900    500      1200
773  Xavier University                   Yes   1959    1805     695         24         47         2849         1107     11520        4960    600      1250
774  Xavier University of Louisiana      Yes   2097    1915     695         34         61         2793          166      6900        4200    617       781
775  Yale University                     Yes  10705    2453    1317         95         99         5217           83     19840        6510    630      2115
776  York College of Pennsylvania        Yes   2989    1855     691         28         63         2988         1726      4990        3560    500      1250

777 rows × 19 columns

In [35]: # Part (b)
college2 = pd.read_csv('College.csv', index_col=0)  # use the first column (college names) as the row index
college2

Out[35]:
                                Private   Apps  Accept  Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal  PhD
Abilene Christian University        Yes   1660    1232     721         23         52         2885          537      7440        3300    450      2200   70
Adelphi University                  Yes   2186    1924     512         16         29         2683         1227     12280        6450    750      1500   29
Adrian College                      Yes   1428    1097     336         22         50         1036           99     11250        3750    400      1165   53
Agnes Scott College                 Yes    417     349     137         60         89          510           63     12960        5450    450       875   92
Alaska Pacific University           Yes    193     146      55         16         44          249          869      7560        4120    800      1500   76
...                                 ...    ...     ...     ...        ...        ...          ...          ...       ...         ...    ...       ...  ...
Worcester State College              No   2197    1515     543          4         26         3089         2029      6797        3900    500      1200   60
Xavier University                   Yes   1959    1805     695         24         47         2849         1107     11520        4960    600      1250   73
Xavier University of Louisiana      Yes   2097    1915     695         34         61         2793          166      6900        4200    617       781   67
Yale University                     Yes  10705    2453    1317         95         99         5217           83     19840        6510    630      2115   96
York College of Pennsylvania        Yes   2989    1855     691         28         63         2988         1726      4990        3560    500      1250   75

777 rows × 18 columns

In [42]: college3 = college.rename({'Unnamed: 0': 'College'}, axis=1)
college3 = college3.set_index('College')  # replace the default integer index with the College column
college3

Out[42]:
                                Private   Apps  Accept  Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal  PhD
College
Abilene Christian University        Yes   1660    1232     721         23         52         2885          537      7440        3300    450      2200   70
Adelphi University                  Yes   2186    1924     512         16         29         2683         1227     12280        6450    750      1500   29
Adrian College                      Yes   1428    1097     336         22         50         1036           99     11250        3750    400      1165   53
Agnes Scott College                 Yes    417     349     137         60         89          510           63     12960        5450    450       875   92
Alaska Pacific University           Yes    193     146      55         16         44          249          869      7560        4120    800      1500   76
...                                 ...    ...     ...     ...        ...        ...          ...          ...       ...         ...    ...       ...  ...
Worcester State College              No   2197    1515     543          4         26         3089         2029      6797        3900    500      1200   60
Xavier University                   Yes   1959    1805     695         24         47         2849         1107     11520        4960    600      1250   73
Xavier University of Louisiana      Yes   2097    1915     695         34         61         2793          166      6900        4200    617       781   67
Yale University                     Yes  10705    2453    1317         95         99         5217           83     19840        6510    630      2115   96
York College of Pennsylvania        Yes   2989    1855     691         28         63         2988         1726      4990        3560    500      1250   75

777 rows × 18 columns
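As a quick check that `index_col=0` in part (b) and the `rename` + `set_index` cell yield the same row labels, the sketch below runs both approaches on a tiny inline CSV. The three rows and three columns are a hypothetical stand-in for `College.csv`, and `io.StringIO` replaces the file path so the example runs on its own. Consistent with the outputs above, the only visible difference is that `index_col=0` leaves the index unnamed, while the second approach names it `College`.

```python
import io

import pandas as pd

# Hypothetical stand-in for College.csv: the first header field is empty,
# so pandas reads that column back as 'Unnamed: 0'.
csv_text = """,Private,Apps
Abilene Christian University,Yes,1660
Adelphi University,Yes,2186
Adrian College,Yes,1428
"""

# Approach 1: use the first column as the row index while reading.
df_b = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Approach 2: read normally, rename the unnamed column, then make it the index.
df_raw = pd.read_csv(io.StringIO(csv_text))
df_named = df_raw.rename({'Unnamed: 0': 'College'}, axis=1).set_index('College')

print(df_raw.columns.tolist())  # ['Unnamed: 0', 'Private', 'Apps']
print(df_b.index.name)          # None
print(df_named.index.name)      # College
```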

In [37]: college=college3


In [38]: # Part (c)
college.describe(include='all')  # basic descriptive statistics; include='all' also summarizes the qualitative (non-numeric) columns

Out[38]:
        College                       Private       Apps     Accept    Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad   Outstate  Room.Board     Books
count   777                           777         777.00     777.00    777.00     777.00     777.00       777.00       777.00     777.00      777.00    777.00
unique  777                           2              NaN        NaN       NaN        NaN        NaN          NaN          NaN        NaN         NaN       NaN
top     Abilene Christian University  Yes            NaN        NaN       NaN        NaN        NaN          NaN          NaN        NaN         NaN       NaN
freq    1                             565            NaN        NaN       NaN        NaN        NaN          NaN          NaN        NaN         NaN       NaN
mean    NaN                           NaN       3,001.64   2,018.80    779.97      27.56      55.80     3,699.91       855.30  10,440.67    4,357.53    549.38
std     NaN                           NaN       3,870.20   2,451.11    929.18      17.64      19.80     4,850.42     1,522.43   4,023.02    1,096.70    165.11
min     NaN                           NaN          81.00      72.00     35.00       1.00       9.00       139.00         1.00   2,340.00    1,780.00     96.00
25%     NaN                           NaN         776.00     604.00    242.00      15.00      41.00       992.00        95.00   7,320.00    3,597.00    470.00
50%     NaN                           NaN       1,558.00   1,110.00    434.00      23.00      54.00     1,707.00       353.00   9,990.00    4,200.00    500.00
75%     NaN                           NaN       3,624.00   2,424.00    902.00      35.00      69.00     4,005.00       967.00  12,925.00    5,050.00    600.00
max     NaN                           NaN      48,094.00  26,330.00  6,392.00      96.00     100.00    31,643.00    21,836.00  21,700.00    8,124.00  2,340.00
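To see what `include='all'` adds, here is a minimal sketch on a hypothetical four-row DataFrame with one qualitative and one quantitative column. By default `describe()` summarizes only numeric columns; with `include='all'`, qualitative columns also get `count`, `unique`, `top` (the most frequent value) and `freq` (how often it occurs). This is why those four rows are NaN for the numeric columns in Out[38], and the numeric statistics are NaN for `College` and `Private`.

```python
import pandas as pd

# Hypothetical toy data: one qualitative column and one quantitative column.
df = pd.DataFrame({
    'Private': ['Yes', 'Yes', 'No', 'Yes'],
    'Apps': [1660, 2186, 1428, 417],
})

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include='all')  # also summarizes 'Private'

print(numeric_only.columns.tolist())  # ['Apps']
print(everything.loc[['count', 'unique', 'top', 'freq'], 'Private'])
```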

In [43]: # Part (d): scatterplot matrix of ['Top10perc', 'Apps', 'Enroll']
axes = pd.plotting.scatter_matrix(college[['Top10perc', 'Apps', 'Enroll']], figsize=(13, 13))
plt.show()
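`pd.plotting.scatter_matrix` returns a NumPy array of matplotlib Axes, one panel for each pair of columns. The sketch below uses randomly generated values (hypothetical, only loosely matching the ranges in Out[38]) and the non-interactive `Agg` backend so it runs without a display. Diagonal panels show each variable's histogram (`diagonal='hist'`, the default); off-diagonal panels are pairwise scatterplots.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical random data standing in for the three College columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Top10perc': rng.integers(1, 97, size=50),
    'Apps': rng.integers(81, 48095, size=50),
    'Enroll': rng.integers(35, 6393, size=50),
})

axes = pd.plotting.scatter_matrix(df, figsize=(6, 6), diagonal='hist')
print(axes.shape)  # (3, 3): one panel per pair of columns

# The bottom row's x-labels name the variables in column order.
bottom_labels = [ax.get_xlabel() for ax in axes[-1]]
print(bottom_labels)
plt.close('all')
```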


In [40]: # Part (e)
# boxplot of Outstate (y-axis) grouped by Private (x-axis)
sns.boxplot(x='Private', y='Outstate', data=college)

Out[40]: <AxesSubplot:xlabel='Private', ylabel='Outstate'>
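Each box in the plot above spans the first to third quartile of Outstate for one group, with a line at the median. Assuming seaborn's default whisker rule (`whis=1.5`, inherited from matplotlib), the whiskers reach the most extreme points within 1.5 × IQR of the box. The sketch below computes the box statistics directly with pandas; the six Outstate values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy values; in the notebook this would be the College DataFrame.
df = pd.DataFrame({
    'Private': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'Outstate': [9000, 11000, 13000, 5000, 6000, 7000],
})

# Quartiles per group: Q1 and Q3 bound the box, the median is the line inside it.
summary = df.groupby('Private')['Outstate'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['Q1', 'median', 'Q3']
summary['IQR'] = summary['Q3'] - summary['Q1']  # box height; whiskers extend up to 1.5 * IQR
print(summary)
```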


In [44]: # Part (f)
# Add an 'Elite' column: 'Yes' where Top10perc exceeds 50, 'No' for all other colleges
college.loc[college['Top10perc'] > 50, 'Elite'] = 'Yes'
college['Elite'] = college['Elite'].fillna('No')
sns.boxplot(x='Elite', y='Outstate', data=college)
college['Elite'].value_counts()  # number of Elite and non-Elite colleges

Out[44]: No 699
Yes 78
Name: Elite, dtype: int64
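The `loc` + `fillna` pattern above can also be written as a single `np.where` call. This sketch, on hypothetical Top10perc values, checks that both give the same labels. Note that a value of exactly 50 is labeled 'No' because the condition is a strict `> 50`.

```python
import numpy as np
import pandas as pd

# Hypothetical Top10perc values.
df = pd.DataFrame({'Top10perc': [23, 16, 60, 95, 50]})

# Pattern used in the notebook: set 'Yes' where the condition holds, then fill the rest.
a = df.copy()
a.loc[a['Top10perc'] > 50, 'Elite'] = 'Yes'
a['Elite'] = a['Elite'].fillna('No')

# Equivalent one-liner: np.where picks 'Yes' or 'No' elementwise.
b = df.copy()
b['Elite'] = np.where(b['Top10perc'] > 50, 'Yes', 'No')

print(b['Elite'].tolist())                     # ['No', 'No', 'Yes', 'Yes', 'No']
print(a['Elite'].tolist() == b['Elite'].tolist())  # True
```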

In [29]: # Part (g)
# To draw the histograms, first bin each numeric variable into labeled groups.
# Note: these assignments overwrite the original numeric columns.
college['PhD'] = pd.cut(college['PhD'], 3, labels=['Low', 'Medium', 'High'])  # 3 equal-width bins
college['Grad.Rate'] = pd.cut(college['Grad.Rate'], 5, labels=['Very low', 'Low', 'Medium', 'High', 'Very high'])  # 5 bins
college['Books'] = pd.cut(college['Books'], 2, labels=['Low', 'High'])  # 2 bins
college['Enroll'] = pd.cut(college['Enroll'], 4, labels=['Very low', 'Low', 'High', 'Very high'])  # 4 bins

# Plot the count in each bin as a 2x2 grid of bar charts.
# sort=False keeps the bars in bin order instead of sorting them by frequency.
fig = plt.figure()
plt.subplot(221)  # top-left of the 2x2 grid
college['PhD'].value_counts(sort=False).plot(kind='bar', title='PhD');
plt.subplot(222)  # top-right
college['Grad.Rate'].value_counts(sort=False).plot(kind='bar', title='Grad.Rate');
plt.subplot(223)  # bottom-left
college['Books'].value_counts(sort=False).plot(kind='bar', title='Books');
plt.subplot(224)  # bottom-right
college['Enroll'].value_counts(sort=False).plot(kind='bar', title='Enroll');
fig.subplots_adjust(hspace=1)  # add vertical space between subplots
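With an integer `bins` argument, `pd.cut` splits the observed range of the data into that many equal-width intervals, extending the range by 0.1% so the minimum and maximum fall inside a bin. The sketch below uses hypothetical PhD values and `retbins=True` to expose the bin edges. Calling `value_counts(sort=False)` on the resulting categorical column returns counts in label order (Low, Medium, High), which is the natural order for a histogram-style bar chart.

```python
import pandas as pd

# Hypothetical PhD percentages.
phd = pd.Series([8, 30, 50, 60, 75, 92, 103])

# bins=3: three equal-width intervals over [min, max]; retbins=True also returns the edges.
binned, edges = pd.cut(phd, 3, labels=['Low', 'Medium', 'High'], retbins=True)
print(edges)

# sort=False keeps categories in bin order rather than sorting by frequency.
counts = binned.value_counts(sort=False)
print(counts)
```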

