22. 12. 19. 3:05 PM Python Summary
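Naming conventions
Camel case (starts with an uppercase letter; this capitalized form is also called Pascal case)
ex) PrintHello
Used for class names
Snake case (lowercase words joined by underscores)
ex) print_hello() with trailing parentheses (function name)
ex) print_hello without trailing parentheses (variable name)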
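The naming conventions above can be sketched as follows. This is only an illustration: the class, function, and variable are made up for the example and reuse the names PrintHello and print_hello from the notes.

```python
class PrintHello:
    """Class names use capitalized camel case."""

    def greet(self):
        # Method names use snake case, like functions.
        return "hello"


def print_hello():
    """Function names use snake case and are called with parentheses."""
    return PrintHello().greet()


hello_text = print_hello()  # variable names use snake case, no parentheses
print(hello_text)
```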
Python arithmetic operators
1. Basic arithmetic
+, -, *, /
2. Exponentiation
** (or pow(a, b))
3. Remainder
%
4. Quotient (floor division)
//
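A short sketch of the arithmetic operators listed above; the values 7 and 2 are arbitrary choices for illustration.

```python
a, b = 7, 2

total = a + b            # addition
difference = a - b       # subtraction
product = a * b          # multiplication
true_quotient = a / b    # true division always returns a float
power = a ** b           # exponentiation, same result as pow(a, b)
remainder = a % b        # remainder
floor_quotient = a // b  # quotient (floor division)

print(total, difference, product, true_quotient, power, remainder, floor_quotient)
```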
Assignment operators
1. +=
2. -=
3. *=
4. /=
5. //=
6. %=
7. **=
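Relational operators
1. '==' (equal to)
2. '!=' (not equal to)
3. '>' (greater than)
4. '<' (less than)
5. '>=' (greater than or equal to)
6. '<=' (less than or equal to)
Logical operators
1. and (logical AND) [true only if both operands are true]
2. or (logical OR) [true if at least one operand is true]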
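The assignment, relational, and logical operators above can be combined as in this small sketch; the starting values are arbitrary.

```python
count = 10
count += 5    # 15
count -= 3    # 12
count *= 2    # 24
count //= 5   # 4
count **= 2   # 16
count %= 7    # 2

ratio = 9
ratio /= 2    # 4.5, true division produces a float

is_two = count == 2
small_and_positive = (count > 0) and (count < 5)   # both must be true
negative_or_large = (count < 0) or (count >= 100)  # at least one must be true
print(count, ratio, is_two, small_and_positive, negative_or_large)
```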
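Variable naming rules
1. A name starts with a letter (or an underscore); digits may follow but cannot come first.
2. Special characters cannot be used in a variable name.
3. A period (.) is not allowed in a variable name, but an underscore (_) is.
4. Names are case-sensitive.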
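One way to check these rules is the built-in str.isidentifier method together with the keyword module. The candidate names below are made up for illustration.

```python
import keyword

candidates = ["count2", "2count", "my_var", "my.var", "my-var", "Total", "total"]

# A usable variable name must be a valid identifier and not a reserved keyword.
results = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}

for name, valid in results.items():
    print(f"{name!r}: {'valid' if valid else 'invalid'}")
```

Because names are case-sensitive, Total and total are both valid and refer to two different variables.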
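Comments
A comment is text inside the program that is ignored when the program runs.
The author inserts comments to make the program easier to read.
In a Code cell, # starts a comment; in a Markdown cell, # makes a heading.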
Python data types
1. Integer (int)
2. Floating-point number (float)
3. Boolean (bool: True/False)
4. String (str)
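The four types can be inspected with the built-in type function; the sample values are arbitrary.

```python
values = [3, 3.14, True, "3"]

# type(v).__name__ gives the short name of each value's type.
type_names = [type(v).__name__ for v in values]
print(type_names)

# Conversions between types are explicit.
as_int = int("3") + 1
as_text = str(3) + "1"
print(as_int, as_text)
```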
Python packages
import package_name as name
Load a package in this form; afterwards, refer to it by the name given after as.
In [5]: pip install numpy
In [7]: import numpy as np
a=np.arange(15)
In [8]: a
In [9]: print(a)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
In [10]: m=np.mean(a)
In [11]: m
7.0
Out[11]:
In [12]: print(m)
7.0
In [14]: print(np.mean(height))
# The array contains a missing value, so nan is printed
nan
In [15]: print(np.nanmean(height))
# nanmean computes the mean while ignoring nan values
62.75
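The definition of height is not shown in these notes, so the array below is a hypothetical stand-in, chosen only so that the nan-ignoring mean comes out to 62.75 as in the output above.

```python
import numpy as np

# Hypothetical heights with one missing value (np.nan); not the original data.
height = np.array([60.0, 62.0, np.nan, 64.0, 65.0])

plain_mean = np.mean(height)        # nan propagates through the mean
nan_safe_mean = np.nanmean(height)  # mean of the four non-missing values

print(plain_mean, nan_safe_mean)
```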
In [16]: x=np.array([1,-1,np.inf,-np.inf])
In [17]: print(x/0)
# Dividing by zero produces inf and raises a RuntimeWarning
[ inf -inf inf -inf]
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_85878/2756779915.
py:1: RuntimeWarning: divide by zero encountered in true_divide
print(x/0)
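if conditional statement
Runs the given statements only when the condition holds.
Usage: if condition:
    (statements to run)
elif second condition:
    (statements to run)
else:
    (statements to run when none of the conditions above hold)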
In [18]: # Usage example
x = 6
if x > 5:
    print("Greater than 5.")
elif x < 5:
    print("Less than 5.")
elif x == 5:
    print("Equal to 5.")
else:
    print("Invalid value.")
Greater than 5.
Loops
1
2
3
4
5
6
7
8
9
10
Basic structure of a for statement
for variable in list (or tuple, string):
    <statement 1>
    <statement 2>
In [20]: letters = ["c","d","d"]  # avoid the name list, which would shadow the built-in
for i in letters:
    print(i)
c
d
d
In [21]: # Sum of the integers from 1 to 10
isum=0
# range(A, B) yields the (B - A) integers from A to B-1
x=range(1,11)
for i in x:
    isum += i
print("The sum from 1 to 10 is",isum)
The sum from 1 to 10 is 55
In [24]: sqsum=0
x=range(1,11)
for i in x:
    sqsum += i**2
print("The sum of squares from 1 to 10 is",sqsum)
The sum of squares from 1 to 10 is 385
In [ ]: # Computing the mean (accumulate the sum first)
xsum = 0
x=[10,12,21]
for i in range(0,3):
    xsum = xsum + x[i]
43
685
3 times table
========
3 * 1 = 3
3 * 2 = 6
3 * 3 = 9
3 * 4 = 12
3 * 5 = 15
3 * 6 = 18
3 * 7 = 21
3 * 8 = 24
3 * 9 = 27
4 times table
========
4 * 1 = 4
4 * 2 = 8
4 * 3 = 12
4 * 4 = 16
4 * 5 = 20
4 * 6 = 24
4 * 7 = 28
4 * 8 = 32
4 * 9 = 36
5 times table
========
5 * 1 = 5
5 * 2 = 10
5 * 3 = 15
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
5 * 7 = 35
5 * 8 = 40
5 * 9 = 45
6 times table
========
6 * 1 = 6
6 * 2 = 12
6 * 3 = 18
6 * 4 = 24
6 * 5 = 30
6 * 6 = 36
6 * 7 = 42
6 * 8 = 48
6 * 9 = 54
7 times table
========
7 * 1 = 7
7 * 2 = 14
7 * 3 = 21
7 * 4 = 28
7 * 5 = 35
7 * 6 = 42
7 * 7 = 49
7 * 8 = 56
7 * 9 = 63
8 times table
========
8 * 1 = 8
8 * 2 = 16
8 * 3 = 24
8 * 4 = 32
8 * 5 = 40
8 * 6 = 48
8 * 7 = 56
8 * 8 = 64
8 * 9 = 72
9 times table
========
9 * 1 = 9
9 * 2 = 18
9 * 3 = 27
9 * 4 = 36
9 * 5 = 45
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81
m1 = xsum/n1
var1=(xsqsum-(n1*m1**2))/(n1-1)
sd1=math.sqrt(var1)
cv1 = sd1/m1
m2 = ysum/n2
var2=(ysqsum-(n2*m2**2))/(n2-1)
sd2=math.sqrt(var2)
cv2 = sd2/m2
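if cv1>cv2:
    print("x has the larger coefficient of variation, so x varies more")
elif cv1<cv2:
    print("y has the larger coefficient of variation, so y varies more")
else:
    print("The two data sets have the same coefficient of variation")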
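The beginning of the cell above is not shown, so here is a self-contained sketch of the same comparison. It assumes the coefficient of variation is the sample standard deviation divided by the mean, as in the fragment; the list y is hypothetical, and x reuses the small list from the mean example.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation (n-1 denominator) divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


x = [10, 12, 21]
y = [100, 102, 111]  # hypothetical: same spread as x, but a larger mean

cv_x = coefficient_of_variation(x)
cv_y = coefficient_of_variation(y)

if cv_x > cv_y:
    print("x varies more relative to its mean")
elif cv_x < cv_y:
    print("y varies more relative to its mean")
else:
    print("both vary equally relative to their means")
```

Both lists have the same standard deviation, so the difference in the coefficient of variation comes entirely from the means.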
Functions
def function_name(parameters):
    statements
    return value
In [1]: # Function that computes a sum
def summary(x):
    isum = 0
    n=len(x)
    for i in range(0,n):
        isum+=x[i]
    print(isum)
In [2]: x=[10,20,30,40,50]
In [3]: summary(x)
150
m =xsum/n
var=sqsum/(n-1)-(n*m**2)/(n-1)
sd=math.sqrt(var)
print(" Mean ")
In [7]: # Odd or even
def even(x):
    remain = x%2
    if remain == 0:
        print("Even.")
    else:
        print("Odd.")
even(5)
Odd.
In [14]: # Correlation coefficient
import math
def corr(x,y):
    xsum=0; ysum=0; xysum=0; xsqsum=0; ysqsum=0
    n=len(x)
    for i in range(0,n):
        xsum += x[i]
        ysum += y[i]
        xysum += x[i] * y[i]  # sum of the products x_i * y_i
        xsqsum += x[i]**2
        ysqsum += y[i]**2
    xmean = xsum/len(x)
    ymean = ysum/len(y)
    r=(xysum-n*xmean*ymean)/(math.sqrt(xsqsum-n*xmean**2)*math.sqrt(ysqsum-n*ymean**2))
    print(round(r,3))
x=[73,77,68]
y=[45,67,98]
corr(x,y)
-0.633
Dates and times
In [1]: import datetime as dt
In [3]: print(now.year)
2022
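The cell that defines now is not shown. A plausible definition, assumed here rather than taken from the notes, is dt.datetime.now(); the sketch below also builds a fixed timestamp from the date in the page header.

```python
import datetime as dt

# Assumed definition of `now`: the current local date and time.
now = dt.datetime.now()
print(now.year, now.month, now.day)

# A fixed datetime (the date and time printed in the page header).
fixed = dt.datetime(2022, 12, 19, 15, 5)
formatted = fixed.strftime("%Y-%m-%d %H:%M")
print(formatted)
```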
Vectors and matrices
Python containers are classified by their properties as list, tuple, dict, and so on.
In [7]: import numpy as np
xv=np.array([1,2,3,4,5,6,7,8,9])
In [5]: xv=np.arange(1,10)
In [8]: xv
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[8]:
In [9]: xv[4]
5
Out[9]:
In [14]: xv[1:4]
array([2, 3, 4])
Out[14]:
In [20]: xm=np.array([[1,2,3],[4,5,6],[7,8,9]])
In [16]: xm=np.arange(1,10)
In [18]: xm=xm.reshape(3,3)
In [19]: xm
In [22]: xm[:,0] # first column
array([1, 4, 7])
Out[22]:
In [23]: xm[0:2,:] # first two rows
Out[23]:
array([[1, 2, 3],
       [4, 5, 6]])
In [25]: xm[0,2]=11
xm
Out[25]:
array([[ 1,  2, 11],
       [ 4,  5,  6],
       [ 7,  8,  9]])
In [26]: x=list(range(20))
x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Out[26]:
List
Notation: [ ]
In [32]: a=[1,2,'c']
x=[1,2,'1',a]
In [33]: print(x)
In [34]: x2=[[1,2,3],[4,5,6],[7,8,9]]
x2
In [ ]: x2[0][2] # value in the first row, third column
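Tuple
A tuple is an immutable, ordered collection of objects. It resembles a list, but once created its values cannot be changed.
It can also be viewed as a finite, ordered enumeration of items.
Tuples are written with parentheses.
Notation: ( )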
In [35]: tp1 =()
tp1
()
Out[35]:
In [37]: tp1=(1,)
tp1
(1,)
Out[37]:
In [39]: tp1=(1,2,3)
tp1
(1, 2, 3)
Out[39]:
In [40]: tp1[2]
3
Out[40]:
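The cells above show how to build and index a tuple. The immutability mentioned in the description can be seen in this short sketch, which reuses the tp1 value from the notes.

```python
tp1 = (1, 2, 3)

try:
    tp1[2] = 10  # item assignment is not allowed on a tuple
    changed = True
except TypeError as error:
    changed = False
    print("tuples are immutable:", error)

# "Modifying" a tuple really means building a new one.
tp2 = tp1 + (4,)
print(tp1, tp2)
```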
Dictionary
A dictionary is a data type in which each key is paired with one value.
dictionary = {key: value}
dictionary = {Key1: Value1, Key2: Value2}
※ Caution
In [44]: print(country_code['korea'])
82
KeyError: 1
In [47]: print(country_code.keys())
In [48]: print(country_code.values())
dict_values([82, 1, 86])
In [49]: print(country_code.items())
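The cell that creates country_code is not shown. The sketch below is a guess at its shape: only the key 'korea' and the values 82, 1, 86 appear in the outputs above, so the other two keys are hypothetical. It also shows one way a KeyError like the one above can arise, namely looking up something that is a value rather than a key.

```python
# Keys 'usa' and 'china' are hypothetical; only 'korea' appears in the notes.
country_code = {"korea": 82, "usa": 1, "china": 86}

print(country_code["korea"])

try:
    country_code[1]  # 1 is a value, not a key, so the lookup fails
    lookup_failed = False
except KeyError as error:
    lookup_failed = True
    print("KeyError:", error)

# get() returns a default instead of raising an error.
fallback = country_code.get(1, "no such key")

print(list(country_code.keys()))
print(list(country_code.values()))
print(list(country_code.items()))
```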
[[-2. 1. ]
[ 1.5 -0.5]]
In [54]: print(a*a.I)
[[1.00000000e+00 1.11022302e-16]
[0.00000000e+00 1.00000000e+00]]
In [60]: np.linalg.eig(a)
Out[60]:
(array([-0.37228132,  5.37228132]),
 matrix([[-0.82456484, -0.41597356],
         [ 0.56576746, -0.90937671]]))
array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
Out[63]:
In [65]: np.sum(fish_data) # compute the sum (scipy.sum is deprecated; use numpy.sum)
Out[65]:
40
sum_value=np.sum(fish_data)
Out[69]:
4.0
np.mean(fish_data) # using NumPy's mean function
Out[67]:
4.0
sigma_2_pop=np.sum((fish_data-mu)**2)/N # population variance
Out[70]:
1.2
np.var(fish_data,ddof=0) # ddof=1 gives the sample variance, ddof=0 the population variance
Out[71]:
1.2
sigma_2_sample=np.sum((fish_data-mu)**2/(N-1)) # sample variance
Out[76]:
1.3333333333333335
sigma_sample=np.sqrt(sigma_2_sample) # sample standard deviation
Out[77]:
1.1547005383792517
np.std(fish_data,ddof=1)
Out[78]:
1.1547005383792515
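The role of ddof can be checked by computing both variances by hand and with NumPy. This sketch uses the fish_data array from the notes; the names N and mu are assumed to be the sample size and the mean, which matches the outputs above.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
N = len(fish_data)
mu = np.mean(fish_data)

# Population variance divides by N, sample variance by N - 1.
pop_var_by_hand = np.sum((fish_data - mu) ** 2) / N
sample_var_by_hand = np.sum((fish_data - mu) ** 2) / (N - 1)

pop_var = np.var(fish_data, ddof=0)
sample_var = np.var(fish_data, ddof=1)
sample_std = np.std(fish_data, ddof=1)

print(pop_var, sample_var, sample_std)
```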
In [87]: fish_data_3=np.array([1,2,3,4,5,6,7,8,9])
stats.scoreatpercentile(fish_data_3,25) # 25th percentile
3.0
Out[87]:
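Managing multivariate data
Multivariate data can be managed easily with a pandas DataFrame. The following code imports pandas and scipy and then sets the number of displayed digits.
In [90]: # Libraries used for numerical computation
import pandas as pd
import scipy as sp
# Set the number of displayed digits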
%precision 3
'%.3f'
Out[90]:
In [91]: fish_multi=pd.read_csv('/Users/mind/desktop/anaconda/fish_multi.csv')
In [92]: fish_multi
Out[98]: length
count mean std min 25% 50% 75% max
spcies
A 3.0 3.0 1.0 2.0 2.5 3.0 3.5 4.0
B 3.0 8.0 2.0 6.0 7.0 8.0 9.0 10.0
In [103… N=len(x)
mu_x=np.mean(x)
mu_y=np.mean(y)
In [104… cov=sum((x-mu_x)*(y-mu_y))/N
cov
Out[104]:
2.000
In [105… cov_sample=sum((x-mu_x)*(y-mu_y))/(N-1)
cov_sample
Out[105]:
2.500
In [110… # Variances
sigma_2_x=np.var(x,ddof=1) # sample variance
sigma_2_y=np.var(y,ddof=1)
# Correlation coefficient
rho=cov_sample/(np.sqrt(sigma_2_x)*np.sqrt(sigma_2_y))
rho # shown as 1 because of rounding to three decimals; the actual value is 0.9999999999999999998
Out[110]:
1.000
np.corrcoef(x,y) # correlation matrix computed with NumPy's corrcoef function
Out[111]:
array([[1., 1.],
       [1., 1.]])
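The data behind x and y are not shown here. As a rough illustration, the hypothetical arrays below were picked only because they reproduce the values above (covariance 2.0 with N, 2.5 with N-1, correlation 1); np.cov with ddof gives the same numbers as the hand-written sums.

```python
import numpy as np

# Hypothetical data, not the original x and y.
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])
N = len(x)

cov_by_hand = np.sum((x - np.mean(x)) * (y - np.mean(y))) / N

# np.cov returns the 2x2 variance-covariance matrix; [0, 1] is cov(x, y).
cov_pop = np.cov(x, y, ddof=0)[0, 1]
cov_sample = np.cov(x, y, ddof=1)[0, 1]
rho = np.corrcoef(x, y)[0, 1]

print(cov_pop, cov_sample, rho)
```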
In [114… x=np.array([0,1,2,3,4,5,6,7,8,9])
y=np.array([2,3,4,3,5,4,6,7,4,8])
# Line plot
plt.plot(x,y,color='black')
plt.title("lineplot matplotlib")
plt.xlabel("X")
plt.ylabel("Y")
In [117… fish_data=np.array([2,3,3,4,4,4,4,5,5,6])
fish_data
array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
Out[117]:
<AxesSubplot:>
Out[120]:
In [121… sns.distplot(fish_data,color='black')
<AxesSubplot:ylabel='Density'>
Out[121]:
<AxesSubplot:xlabel='spcies', ylabel='length'>
Out[122]:
In [124… # Bar plot (barplot)
sns.barplot(x='spcies',y='length',data=fish_multi,color='gray')
# The bar height is the mean and the line on each bar is the error bar.
Out[124]:
<AxesSubplot:xlabel='spcies', ylabel='length'>
In [126… iris.groupby('species').mean()
In [127… sns.pairplot(iris,hue='species',palette="gray")
<seaborn.axisgrid.PairGrid at 0x7fe148667cd0>
Out[127]:
The final-exam scope starts here
Chapter 5: Sampling simulation
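※ Premise
Assuming the population is completely known, we simulate drawing samples from it.
Generating random numbers
As an example, consider a lake that contains only five fish; we store the body lengths of the fish in a numpy array.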
In [129… fish_5=np.array([2,3,4,5,6])
fish_5
array([2, 3, 4, 5, 6])
Out[129]:
array([2])
Out[132]:
In [133… np.random.choice(fish_5,size=3,replace=False)
array([4, 2, 3])
Out[133]:
In [135… # To draw the same random numbers again, fix n in seed(n)
# seed: initial value for the random number generator / size: number of samples to draw
np.random.seed(1)
np.random.choice(fish_5,size=3,replace=False)
array([4, 3, 6])
Out[135]:
In [138… np.random.choice(fish_5,size=3,replace=False)
array([4, 5, 3])
Out[138]:
In [139… # Sampling with replacement
np.random.choice(fish_5,size=3,replace=True)
array([6, 3, 4])
Out[139]:
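The effect of the seed and of the replace argument can be checked with a small sketch. It uses the same legacy np.random.seed and np.random.choice calls as the cells above; the variable names are made up for the example.

```python
import numpy as np

fish_5 = np.array([2, 3, 4, 5, 6])

# Resetting the seed to the same value reproduces the same draw.
np.random.seed(1)
first = np.random.choice(fish_5, size=3, replace=False)
np.random.seed(1)
second = np.random.choice(fish_5, size=3, replace=False)

# Without replacement every fish appears at most once.
all_distinct = len(set(first.tolist())) == len(first)

# With replacement the sample may be larger than the population,
# so repeated values are unavoidable here.
with_replacement = np.random.choice(fish_5, size=10, replace=True)

print(first, second, with_replacement)
```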
Probability density function of the normal distribution
Compare the histogram of the population with the probability density function of N(4, 0.64).
In [141… import numpy as np
In [142… x=np.arange(start=1,stop=7.1,step=0.1)
x
Out[142]:
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1,
       6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. ])
In [144… # loc: mean, scale: standard deviation
stats.norm.pdf(x=x,loc=4,scale=0.8)
In [146… plt.plot(x,stats.norm.pdf(x=x,loc=4,scale=0.8),color='black')
[<matplotlib.lines.Line2D at 0x7fe1486eebe0>]
Out[146]:
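For N(4, 0.64) the variance is 0.64, so scale (the standard deviation) is 0.8. As a sanity check, the density can also be written out from the textbook formula and compared with stats.norm.pdf. The helper normal_pdf is a name made up for this sketch.

```python
import numpy as np
from scipy import stats


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2): exp(-(x-mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


x = np.arange(start=1, stop=7.1, step=0.1)
by_hand = normal_pdf(x, mu=4, sigma=0.8)
by_scipy = stats.norm.pdf(x=x, loc=4, scale=0.8)

density_at_3 = normal_pdf(3, mu=4, sigma=0.8)
print(round(float(density_at_3), 3))
```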
Drawing random numbers from a normal distribution
In [147… sampling_norm=stats.norm.rvs(loc=4,scale=0.8,size=10)
sampling_norm
In [149… np.mean(sampling_norm)
Out[149]:
3.910
In [150… np.var(sampling_norm)
Out[150]:
0.822
Computing the sample mean many times
In [151… population=stats.norm(loc=4,scale=0.8)
In [153… sampling_mean_array
In [156… np.random.seed(1)
for i in range(0,1000):
    sample=population.rvs(size=10)
    sampling_mean_array[i]=np.mean(sample)
In [157… sampling_mean_array
In [158… np.mean(sampling_mean_array)
Out[158]:
4.008
In [159… np.var(sampling_mean_array)
In [160… sns.distplot(sampling_mean_array,color='black')
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions.
py:2619: FutureWarning: `distplot` is a deprecated function and will be remo
ved in a future version. Please adapt your code to use either `displot` (a f
igure-level function with similar flexibility) or `histplot` (an axes-level
function for histograms).
warnings.warn(msg, FutureWarning)
<AxesSubplot:ylabel='Density'>
Out[160]:
In [162… size_array=np.arange(10,100100,100)
size_array
In [163… sampling_mean_array_size=np.zeros(len(size_array))
In [164… np.random.seed(1)
for i in range(0,len(size_array)):
    sample=population.rvs(size=size_array[i])
    sampling_mean_array_size[i]=np.mean(sample)
In [166… plt.plot(size_array,sampling_mean_array_size,color='black')
plt.xlabel("sample size")
plt.ylabel("sample mean")
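The later cells call calc_sample_mean(size, n_trial), but its definition does not appear in these notes. The following is a reconstruction under assumptions: it takes the signature from those calls, the array name sample_mean_array from the warning traces, and the N(4, 0.64) population defined earlier. The original function may differ.

```python
import numpy as np
from scipy import stats

population = stats.norm(loc=4, scale=0.8)


def calc_sample_mean(size, n_trial):
    """Draw n_trial samples of the given size and return their sample means."""
    sample_mean_array = np.zeros(n_trial)
    for i in range(0, n_trial):
        sample = population.rvs(size=size)
        sample_mean_array[i] = np.mean(sample)
    return sample_mean_array


np.random.seed(1)
means = calc_sample_mean(size=10, n_trial=1000)
print(np.mean(means))
```

Wrapping the loop in a function makes it easy to repeat the experiment for different sample sizes, which is how the cells that follow use it.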
In [170… np.random.seed(1)
np.mean(calc_sample_mean(size=10,n_trial=10000))
Out[170]:
4.004
Out[171]:
array([  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,  26,
        28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,  52,
        54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,  78,
        80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])
In [179… sample_mean_std_array=np.zeros(len(size_array))
In [181… np.random.seed(1)
for i in range(0,len(size_array)):
    sample_mean=calc_sample_mean(size=size_array[i],n_trial=100)
    sample_mean_std_array[i]=np.std(sample_mean,ddof=1)
In [183… plt.plot(size_array,sample_mean_std_array,color='black')
plt.xlabel("sample size")
plt.ylabel("mean_std value")
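The simulated standard deviation of the sample mean can be compared with the theoretical standard error, sigma / sqrt(n). A minimal sketch, assuming the population standard deviation 0.8 and the sample sizes 2, 4, ..., 100 shown above:

```python
import numpy as np

size_array = np.arange(start=2, stop=102, step=2)

# Theoretical standard error of the mean for each sample size.
standard_error = 0.8 / np.sqrt(size_array)

print(standard_error[0], standard_error[-1])
```

Plotting standard_error against size_array on the same axes as the simulated curve is one way to see how closely the two agree.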
Central limit theorem
In [186… # Sample size and number of trials
n_size=10000
n_trial=50000
# heads = 1, tails = 0
coin=np.array([0,1])
# Number of heads
count_coin=np.zeros(n_trial)
# Repeat "toss the coin n_size times" n_trial times
np.random.seed(1)
for i in range(0,n_trial):
    count_coin[i]=np.sum(np.random.choice(coin,size=n_size,replace=True))
# Draw the histogram
sns.distplot(count_coin,color='black')
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions.
py:2619: FutureWarning: `distplot` is a deprecated function and will be remo
ved in a future version. Please adapt your code to use either `displot` (a f
igure-level function with similar flexibility) or `histplot` (an axes-level
function for histograms).
warnings.warn(msg, FutureWarning)
<AxesSubplot:ylabel='Density'>
Out[186]:
Normal distribution and its applications
1. Importing libraries
In [187… # Libraries used for numerical computation
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
# Libraries for drawing graphs
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
# Setting to display graphs inside the Jupyter notebook
%matplotlib inline
In [188… np.pi
Out[188]:
3.142
In [189… np.exp(1)
Out[189]:
2.718
0.228
Out[190]:
In [191… np.random.seed(1)
simulated_sample=stats.norm.rvs(loc=4,scale=0.8,size=100000)
simulated_sample
np.sum(simulated_sample<=3)
Out[193]:
10371
np.sum(simulated_sample<=3)/len(simulated_sample)
Out[194]:
0.104
In [195… # Compute P(X<=4)
stats.norm.cdf(loc=4,scale=0.8,x=4)
0.500
Out[195]:
In [196… # Compute P(X<=3)
stats.norm.cdf(loc=4,scale=0.8,x=3)
0.106
Out[196]:
2.432
Out[197]:
In [198… stats.norm.ppf(loc=0,scale=1,q=0.025)
-1.960
Out[198]:
In [200… lower=stats.norm.cdf(loc=0,scale=1,x=-1.96)
stats.norm.ppf(loc=0,scale=1,q=lower)
-1.960
Out[200]:
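The cells above use cdf (value to cumulative probability) and ppf (cumulative probability back to value) as inverses of each other. A compact sketch with the same distributions:

```python
from scipy import stats

# P(X <= 3) for X ~ N(4, 0.8^2)
p_below_3 = stats.norm.cdf(x=3, loc=4, scale=0.8)

# ppf inverts cdf: feeding the probability back returns the point 3.
x_back = stats.norm.ppf(q=p_below_3, loc=4, scale=0.8)

# Lower 2.5% point of the standard normal distribution.
z_lower = stats.norm.ppf(q=0.025, loc=0, scale=1)

print(round(p_below_3, 3), round(x_back, 3), round(z_lower, 3))
```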
0.500
Out[206]:
-2.228
Out[207]:
In [212… x=range(0,21)
0.560
Out[214]:
0.440
Out[215]:
3.247
Out[216]:
0.002
Out[218]:
0.957
Out[219]:
0.265
Out[220]:
0.246
Out[221]:
0.623
Out[222]:
0.377
Out[223]:
array([7, 2, 7])
Out[224]:
0.135
Out[225]:
0.406
Out[226]:
array([1, 1, 0, 3, 3])
Out[227]:
Estimation and hypothesis testing
mu=np.mean(sample)
std=np.std(sample,ddof=1)
se=std/np.sqrt(len(sample))
Out[230]:
0.948
Confidence interval for the population mean
Find the 100(1-α)% confidence interval for μ.
In [231… x = [3.4,3.3,4.2,4.4,3.7,4.5,4.6,3.8,4.1]
lower=np.mean(x)-stats.norm.ppf(0.975)*np.sqrt(0.16)/np.sqrt(len(x))
upper=np.mean(x)+stats.norm.ppf(0.975)*np.sqrt(0.16)/np.sqrt(len(x))
stats.norm.interval(0.95,np.mean(x), np.sqrt(0.16)/np.sqrt(len(x)))
Out[234]:
(3.739, 4.261)
In [237… stats.t.interval(0.95,df=df,loc=np.mean(x),scale=np.std(x,ddof=1)/np.sqrt(len(x)))
Out[237]:
(3.635, 4.365)
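Both intervals above follow the pattern "estimate ± critical value × standard error". The sketch below recomputes them from the data in the notes. It assumes, as the cells suggest, a known variance of 0.16 for the z interval and df = n - 1 for the t interval; the confidence level is passed positionally because the keyword name for it has changed across SciPy versions.

```python
import numpy as np
from scipy import stats

x = [3.4, 3.3, 4.2, 4.4, 3.7, 4.5, 4.6, 3.8, 4.1]
n = len(x)
mean = np.mean(x)

# Known population variance (0.16): use the normal distribution.
se_known = np.sqrt(0.16) / np.sqrt(n)
z_interval = stats.norm.interval(0.95, loc=mean, scale=se_known)

# Unknown variance: estimate it from the sample and use t with n - 1 df.
se_estimated = np.std(x, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
t_interval = (mean - t_crit * se_estimated, mean + t_crit * se_estimated)

print(z_interval, t_interval)
```

The t interval is wider because the variance is estimated from only nine observations.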
lower=hatp-stats.norm.ppf(0.975)*np.sqrt(v)
upper=hatp+stats.norm.ppf(0.975)*np.sqrt(v)
In [2]: stats.norm.interval(0.95,loc=hatp,scale=np.sqrt(v))
Out[2]:
(-0.007439495783560755, 0.047439495783560756)
Image('/Users/mind/downloads/img1.jpeg')
Out[5]:
hatp1=x1/n
hatp2=x2/n
hatp=hatp1-hatp2
v=hatp1*(1-hatp1)/n+hatp2*(1-hatp2)/n  # variance of the difference of two proportions
lower=hatp-stats.norm.ppf(0.975)*np.sqrt(v)
upper=hatp+stats.norm.ppf(0.975)*np.sqrt(v)
fci=stats.norm.interval(0.95,loc=hatp,scale=np.sqrt(v))
(0.005, 0.035)
Lower confidence limit: 0.005, upper confidence limit: 0.035
x=[45,47,44,46,47,48]
n=len(x)
var=np.var(x,ddof=1)
lower=(n-1)*var/stats.chi2.ppf(0.975,n-1)
upper=(n-1)*var/stats.chi2.ppf(0.025,n-1)
print('')
df = n-1
chi2_under, chi2_upper = stats.chi2.interval(0.95, df=n-1)
interval_under = df*var/chi2_upper
interval_upper = df*var/chi2_under
print("Lower confidence limit: ", interval_under, " Upper confidence limit: ", interval_upper)
Lower confidence limit:  0.8442105318489914  Upper confidence limit:  13.033183316449364
Confidence interval for the ratio of two population variances
In [18]: from IPython.display import Image
Image('/Users/mind/downloads/img2.jpeg')
Out[18]:
lower = v1/v2/stats.f.ppf(0.99,n1-1,n2-1)
upper = v1/v2/stats.f.ppf(0.01,n1-1,n2-1)
Image('/Users/mind/downloads/img3_1.jpeg')
Out[24]:
In [25]: Image('/Users/mind/downloads/img3_2.jpeg')
Out[25]:
In [26]: Image('/Users/mind/downloads/img3_3.jpeg')
Out[26]:
In [27]: z=(0.59-0.6)/(0.1/np.sqrt(100))
if z<stats.norm.ppf(0.05,loc=0,scale=1):
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
Fail to reject the null hypothesis.
In [28]: # Print the p-value
stats.norm.cdf(z,loc=0,scale=1)
0.15865525393145685
Out[28]:
In [29]: Image('/Users/mind/downloads/img4.jpeg')
Out[29]:
t=(14.2-15)/(2.5/np.sqrt(25))
Fail to reject the null hypothesis.
0.12268143730144648
Out[37]:
In [40]: x=[3.4,3.3,4.2,4.4,3.7,4.5,4.6,3.8,4.1]
t=(np.mean(x)-3.5)/(np.std(x,ddof=1)/np.sqrt(len(x)))
t
Out[40]:
3.162277660168379
In [42]: # Two-sided test (the test is two-sided when alternative is not specified)
ttest=stats.ttest_1samp(x,3.5)
print('t-value=%.3f,p-value=%.3f'%ttest)
t-value=3.162,p-value=0.007
t-value=3.162,p-value=0.993
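The two printed p-values (0.007 and 0.993) look like one-sided results, although the cells that produced them are not shown. A sketch of how one-sided and two-sided tests can be requested, assuming a SciPy version whose ttest_1samp accepts the alternative argument (1.6 or later):

```python
from scipy import stats

x = [3.4, 3.3, 4.2, 4.4, 3.7, 4.5, 4.6, 3.8, 4.1]

two_sided = stats.ttest_1samp(x, 3.5)                       # H1: mu != 3.5
greater = stats.ttest_1samp(x, 3.5, alternative='greater')  # H1: mu > 3.5
less = stats.ttest_1samp(x, 3.5, alternative='less')        # H1: mu < 3.5

for label, result in [("two-sided", two_sided), ("greater", greater), ("less", less)]:
    print('%s: t-value=%.3f, p-value=%.3f' % (label, result.statistic, result.pvalue))
```

The t statistic is the same in all three cases; only the tail area used for the p-value changes.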
In [45]: Image('/Users/mind/downloads/img5.jpeg')
Out[45]:
In [46]: # Check for equal variances
x=[44,44,56,46,47,38,58,53,49,35,46,30,41]
y=[35,47,55,29,40,39,32,41,42,57,51,39]
# Levene's test for equal variances
equal=stats.levene(x,y)
print("LeveneResult(statistic=%.3f,pvalue=%.3f)"% equal)
LeveneResult(statistic=0.092,pvalue=0.764)
test statistic=0.868, p-value=0.197
In [48]: Image('/Users/mind/downloads/img6.jpeg')
Out[48]:
In [51]: x=[70,80,72,76,76,76,72,78,82,64,74,92,74,68,84]
y=[68,72,62,70,58,66,68,52,64,72,74,60,74,72,74]
Ttest_relResult(statistic=3.105360487466109, pvalue=0.0038747180533270594)
test statistic: 3.105, p-value: 0.004
In [52]: Image('/Users/mind/downloads/img7.jpeg')
Out[52]:
a=np.array([1.1,2.3,4.3,2.2,5.3])
n=len(a)
df= n-1
alpha=0.05
sigma2=1
chisq=(n-1)*np.var(a,ddof=1)/sigma2
p_value=2*(1-stats.chi2.cdf(chisq,df))
print('chisq =',round(chisq,3),' p-value =',round(p_value,3))
if p_value<alpha:
    print("Conclusion: reject the null hypothesis; the population variance cannot be said to be 1.")
else:
    print("Conclusion: fail to reject the null hypothesis; the population variance can be taken to be 1.")
chisq = 11.712 p-value = 0.039
Conclusion: reject the null hypothesis; the population variance cannot be said to be 1.
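f=np.var(a,ddof=1)/np.var(b,ddof=1)
p_value=2*(1-stats.f.cdf(f,df1,df2))
if p_value<alpha:
    print("Conclusion: reject the null hypothesis; the population variances differ.")
else:
    print("Conclusion: fail to reject the null hypothesis; the two population variances can be taken as equal.")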
F= 2.889 p-value = 0.547
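Chi-square tests
Goodness-of-fit test
The chi-square test statistic is an approximation. When the expected frequency of every category is at least 5, the test statistic approximately follows a chi-square distribution.
If this condition is not met, the chi-square test should not be used.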
In [59]: Image('/Users/mind/downloads/img8.jpeg')
Out[59]:
Power_divergenceResult(statistic=9.5, pvalue=0.09070739170404737)
Out[58]:
In [60]: Image('/Users/mind/downloads/img9.jpeg')
Out[60]:
p0=np.array([9/16,3/16,3/16,1/16])
n=381
observed=[216,79,65,21]
expected=n*p0
sp.stats.chisquare(f_obs=observed,f_exp=expected)
Out[62]:
Power_divergenceResult(statistic=1.726159230096238, pvalue=0.6311337459933084)
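The expected-frequency condition can be checked before running the test. The sketch below reuses the 9:3:3:1 example from the cell above, verifies that every expected count is at least 5, and recomputes the statistic by hand as the sum of (observed - expected)² / expected with k - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

p0 = np.array([9/16, 3/16, 3/16, 1/16])
observed = np.array([216, 79, 65, 21])
expected = observed.sum() * p0

# Rule of thumb from the notes: every expected frequency should be >= 5.
all_expected_ok = bool(np.all(expected >= 5))

# Chi-square statistic and upper-tail p-value computed by hand.
statistic = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(statistic, df=len(observed) - 1)

# Same test through scipy.stats.chisquare.
result = stats.chisquare(f_obs=observed, f_exp=expected)
print(all_expected_ok, round(statistic, 3), round(p_value, 3))
```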
In [64]: Image('/Users/mind/downloads/img10.jpeg')
# Hypotheses
# H0: gender and television size are independent vs. H1: not H0
Out[64]:
Out[66]:
(23.55514855514856,
 9.80647431753431e-05,
 4,
 array([[62.4, 93.6, 52. , 36.4, 15.6],
        [57.6, 86.4, 48. , 33.6, 14.4]]))
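Analysis of variance (ANOVA)
Overview of ANOVA
ANOVA is used when there are three or more groups.
If the variance of the group means being compared is 0, the group means can be judged to be equal.
Each group mean is assumed to be the overall mean shifted by the effect of the factor for that group.
The mean μa of group A is expressed as "μ + the effect of belonging to group a".
The source of such an effect is called a factor, and the groups are defined by the values (levels) the factor takes.
With a single factor the layout is called a one-way layout, and ANOVA on one-way data is called one-way ANOVA.
As the number of factors grows, the layouts are called two-way, three-way, and so on.
※ A related helper, np.tile, repeats (copies and pastes) the same array a given number of times.
※ Use vstack and hstack to join arrays.
vstack = joins arrays vertically
hstack = joins arrays horizontally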
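A small sketch of np.tile, np.hstack, and np.vstack. The group labels a1 and a2 follow the naming used in the next cell; the two rows are arbitrary.

```python
import numpy as np

# np.tile repeats a value or array; np.hstack joins the pieces end to end.
labels = np.hstack([np.tile("a1", 2), np.tile("a2", 2)])
print(labels)

row1 = np.array([1, 2, 3])
row2 = np.array([4, 5, 6])

stacked_vertically = np.vstack([row1, row2])    # shape (2, 3): rows on top of each other
stacked_horizontally = np.hstack([row1, row2])  # shape (6,): one longer row
print(stacked_vertically.shape, stacked_horizontally.shape)
```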
In [67]: Image('/Users/mind/downloads/img11_1.jpeg')
Out[67]:
In [68]: Image('/Users/mind/downloads/img11_2.jpeg')
Out[68]:
In [76]: data1={'cotton':[7,7,15,11,9,12,17,12,18,18,14,18,18,19,19,19,25,22,19,23,7,
'group':np.hstack([np.tile("a1",5),np.tile("a2",5),np.tile("a3",5),np.
data=DataFrame(data1)
print(data)
cotton group
0 7 a1
1 7 a1
2 15 a1
3 11 a1
4 9 a1
5 12 a2
6 17 a2
7 12 a2
8 18 a2
9 18 a2
10 14 a3
11 18 a3
12 18 a3
13 19 a3
14 19 a3
15 19 a4
16 25 a4
17 22 a4
18 19 a4
19 23 a4
20 7 a5
21 10 a5
22 11 a5
23 15 a5
24 11 a5
Out[81]:
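The analysis cell for this table is not shown. As one possible way to run a one-way ANOVA on the printed data, scipy.stats.f_oneway takes one sequence per group. The group lists below are read off the printed table; the function choice is an assumption, not necessarily what the notebook used.

```python
from scipy import stats

# Values read from the printed DataFrame, one list per group.
a1 = [7, 7, 15, 11, 9]
a2 = [12, 17, 12, 18, 18]
a3 = [14, 18, 18, 19, 19]
a4 = [19, 25, 22, 19, 23]
a5 = [7, 10, 11, 15, 11]

result = stats.f_oneway(a1, a2, a3, a4, a5)
print("F = %.3f, p-value = %.6f" % (result.statistic, result.pvalue))
```

A small p-value here means that at least one group mean differs from the others.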
Hypotheses: H0: α1=α2=α3=...=αa=0
H1: at least one level has a nonzero effect. F0 = MSA/MSE ~ F(a-1, n-a-b+1).
If F0 > F, reject the null hypothesis.
If F0 < F, fail to reject the null hypothesis.
In [82]: Image('/Users/mind/downloads/img13.jpeg')
Out[82]:
Out[87]: design ad y
0 a 1 23
1 b 1 15
2 c 1 18
3 a 0 16
4 b 0 9
sum_sq df F PR(>F)
design 58.333333 2.0 175.0 0.005682
ad 66.666667 1.0 400.0 0.002491
Residual 0.333333 2.0 NaN NaN
Intercept 16.166667
design[T.b] -7.500000
design[T.c] -5.000000
ad 6.666667
dtype: float64
<AxesSubplot:xlabel='design', ylabel='y'>
Out[91]:
In [92]: sns.boxplot(x="ad",y="y",data=df)
<AxesSubplot:xlabel='ad', ylabel='y'>
Out[92]:
Two-way ANOVA with replication
In [93]: Image('/Users/mind/downloads/img14.jpeg')
Out[93]:
In [94]: Image('/Users/mind/downloads/img15.jpeg')
Out[94]:
pressure = np.array([200,220,240,200,220,240,200,220,240,200,220,240])
temp=np.array(["low","low","low","low","low","low","high","high","high","hig
y=np.array([90.4,90.7,90.2,90.2,90.1,90.4,92.2,91.6,90.5,93.7,91.8,92.8])
df=pd.DataFrame({
'pressure': pressure,
'temp':temp,
'y':y
})
df.head()
In [102… formula='y~C(temp)+C(pressure)+C(temp):C(pressure)'
model=smf.ols(formula,data=df).fit()
aov_table=sm.stats.anova_lm(model,typ=2)
print(aov_table)
sum_sq df F PR(>F)
C(temp) 9.363333 1.0 14.009975 0.009589
C(pressure) 1.011667 2.0 0.756858 0.509201
C(temp):C(pressure) 1.171667 2.0 0.876559 0.463473
Residual 4.010000 6.0 NaN NaN
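Because the design is balanced (two observations per cell), the sums of squares in the table can be reproduced with plain group means. A minimal sketch, assuming the truncated `temp` array holds six "low" values followed by six "high" values:

```python
# Balanced two-way ANOVA with replication, computed by hand (pure Python).
# ASSUMPTION: the truncated `temp` array is six "low" then six "high".
pressure = [200, 220, 240] * 4
temp = ["low"] * 6 + ["high"] * 6
y = [90.4, 90.7, 90.2, 90.2, 90.1, 90.4, 92.2, 91.6, 90.5, 93.7, 91.8, 92.8]

grand = sum(y) / len(y)

def group_values(keys):
    """Collect the y values that share each key."""
    groups = {}
    for key, value in zip(keys, y):
        groups.setdefault(key, []).append(value)
    return groups

def between_ss(keys):
    """Sum of squares of the group means around the grand mean."""
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2
               for v in group_values(keys).values())

cells = list(zip(temp, pressure))
ss_temp = between_ss(temp)
ss_pressure = between_ss(pressure)
ss_inter = between_ss(cells) - ss_temp - ss_pressure
ss_error = sum((value - sum(v) / len(v)) ** 2
               for v in group_values(cells).values() for value in v)

df_error = len(y) - len(set(cells))          # 12 - 6 = 6
f_temp = (ss_temp / 1) / (ss_error / df_error)
print(round(ss_temp, 4), round(ss_pressure, 4), round(ss_inter, 4),
      round(ss_error, 4), round(f_temp, 2))
```

Only the temperature effect has a small p-value in the table; the pressure and interaction terms do not.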
In [104… Image('/Users/mind/downloads/img16_1.jpeg')
Out[104]:
In [105… Image('/Users/mind/downloads/img16_2.jpeg')
Out[105]:
In [106… Image('/Users/mind/downloads/img16_3.jpeg')
Out[106]:
In [107… Image('/Users/mind/downloads/img16_4.jpeg')
Out[107]:
In [109… plt.scatter(x,y,label="stars",color="green",marker="+")
<matplotlib.collections.PathCollection at 0x7fa1191053a0>
Out[109]:
In [112… # The entries at row 1, column 2 and row 2, column 1 show a strong positive correlation.
r=np.corrcoef(x,y)
r
array([[1. , 0.96170475],
[0.96170475, 1. ]])
Out[112]:
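The off-diagonal value is the Pearson correlation coefficient. A sketch of the same calculation without NumPy, assuming `x` and `y` are the ten observations listed in the quadratic regression cell of this notebook:

```python
import math

# ASSUMPTION: same x and y as in the quadratic regression cell.
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4]
y = [0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0]

def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(round(r, 4))  # 0.9617
```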
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:1
541: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n
=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
R^2 = 0.925
Intercept (β0) = -0.9725, x (β1) = 0.9367
Estimated model
y = -0.9725 + 0.9367x
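For simple linear regression the least-squares estimates have a closed form: slope = Sxy/Sxx and intercept = mean(y) - slope * mean(x). A sketch that evaluates these formulas, assuming `x` and `y` are the ten observations listed in the quadratic regression cell of this notebook:

```python
# ASSUMPTION: same x and y as in the quadratic regression cell.
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4]
y = [0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

slope = sxy / sxx                       # estimate of β1
intercept = mean_y - slope * mean_x     # estimate of β0
r_squared = sxy ** 2 / (sxx * syy)      # share of variance explained

print(round(intercept, 4), round(slope, 4), round(r_squared, 3))
# -0.9725 0.9367 0.925
```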
In [114… model_ex=sm.OLS(y,x) # no intercept
result_ex=model_ex.fit()
print(result_ex.summary())
print(result_ex.predict(x))
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[ 1.30331969 0.54876618 0.7545535 0.06859577 -0.06859577 3.01821401
3.15540555 1.09753237 3.77276751 2.33225628]
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:1
541: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n
=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
In [117… # same as above
lm_reg=smf.ols(formula='y~x+0',data=df).fit()
lm_reg.summary()
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:1
541: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n
=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a
constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
Estimated regression model (no intercept)
y = 0.6860x
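When the model has no intercept, the least-squares slope reduces to Σxy / Σx². A sketch with the ten observations listed in the quadratic regression cell (assumed to be the same `x` and `y` used here); it reproduces the first fitted value in the `predict` output above.

```python
# ASSUMPTION: same x and y as in the quadratic regression cell.
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4]
y = [0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0]

# Regression through the origin: minimise sum((y - b*x)^2) over b.
slope_origin = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
first_fitted = slope_origin * x[0]

print(round(slope_origin, 4), round(first_fitted, 4))  # 0.686 1.3033
```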
In [118… # Fit a quadratic polynomial regression model
import numpy as np
import matplotlib.pyplot as plt
x=np.array([1.9,0.8,1.1,0.1,-0.1,4.4,4.6,1.6,5.5,3.4])
y=np.array([0.7,-1.0,-0.2,-1.2,-0.1,3.4,4.0,0.8,3.7,2.0])
x2 = x**2
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:1
541: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n
=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
Estimated regression model
y = -0.9053 + 0.8280x + 0.0206x^2
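As a cross-check, the same quadratic can be fitted with `np.polyfit`. This is a swapped-in technique, not the statsmodels call used in the notebook; it should give coefficients close to the ones reported above.

```python
import numpy as np

x = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4])
y = np.array([0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0])

# polyfit returns coefficients from the highest power down to the constant.
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(f"y = {b0:.4f} + {b1:.4f}x + {b2:.4f}x^2")

# Fitted values from the estimated polynomial.
fitted = b0 + b1 * x + b2 * x ** 2
```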
Multiple regression analysis (multiple regression model)
In [120… import pandas as pd; import scipy.stats as ss; import seaborn as sns
all=pd.DataFrame({'temp':[195,179,205,204,167,184,187],
'pressure':[57,61,60,62,61,59,62],
'intensity':[81.4,122.2,170.7,175.6,150.3,96.8,169.8]})
scatter=sns.PairGrid(all)
scatter.map(sns.scatterplot)
<seaborn.axisgrid.PairGrid at 0x7fa0c8326bb0>
Out[120]:
In [122… prod=smf.ols(formula='intensity~temp+pressure',data=all).fit()
prod.summary()
print(prod.summary())
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/statsmodels/stats/stat
tools.py:74: ValueWarning: omni_normtest is not valid with less than 8 obser
vations; 7 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
Data processing practical assignment
In [6]: s = [20,22,18,19,7,6]
sorted(s)
In [5]: s.sort()
s
In [9]: s = [20,22,18,19,7,6]
sorted(s,reverse=True)
In [10]: s = [20,22,18,19,7,6,22,19,6]
set(s)
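The cells above use three different tools. A short sketch of how they differ: `sorted()` returns a new list, `list.sort()` sorts in place and returns `None`, and `set()` drops duplicates without guaranteeing any order.

```python
s = [20, 22, 18, 19, 7, 6]

ascending = sorted(s)                 # new list; s itself is unchanged
descending = sorted(s, reverse=True)  # new list, largest first
print(s, ascending, descending)

returned = s.sort()                   # sorts s in place
print(returned, s)                    # None [6, 7, 18, 19, 20, 22]

# set() removes duplicates; wrap it in sorted() for a stable order.
unique_sorted = sorted(set([20, 22, 18, 19, 7, 6, 22, 19, 6]))
print(unique_sorted)                  # [6, 7, 18, 19, 20, 22]
```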
In [12]: x1=np.random.normal(loc=0,scale=1,size=1000)
x2=np.random.normal(loc=0,scale=1,size=1000)
In [13]: stats.shapiro(x1)
ShapiroResult(statistic=0.9983627200126648, pvalue=0.4683176875114441)
Out[13]:
In [14]: stats.bartlett(x1,x2)
BartlettResult(statistic=6.649219979740087, pvalue=0.009919924282583092)
Out[14]:
In [15]: stats.levene(x1,x2)
LeveneResult(statistic=8.886769444543772, pvalue=0.0029071468929051667)
Out[15]:
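Each test returns a p-value that is compared with a significance level. The Shapiro-Wilk test has normality as its null hypothesis; Bartlett's and Levene's tests have equal variances as theirs. A sketch of the decision rule applied to the p-values printed above, assuming a 5% level (the notebook does not state one):

```python
ALPHA = 0.05  # ASSUMPTION: significance level, not stated in the notebook

# p-values copied from the outputs above.
p_values = {
    "shapiro(x1)": 0.4683176875114441,
    "bartlett(x1, x2)": 0.009919924282583092,
    "levene(x1, x2)": 0.0029071468929051667,
}

decisions = {
    name: "reject H0" if p < ALPHA else "fail to reject H0"
    for name, p in p_values.items()
}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]:.4f} -> {decision}")
```

Because `x1` and `x2` are drawn at random without a fixed seed, rerunning the cells gives different p-values, so these decisions apply only to the run shown.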
In [19]: x= np.array([0,1,2,3,4,5,6,7,8,9])
y=(x**2)/10
plt.plot(x,y,color='blue')
plt.title("quadratic plot")
plt.xlabel("x")
plt.ylabel("y")
In [22]: pi=np.pi
x=np.arange(0,20)*pi/10
y=np.sin(x)
plt.plot(x,y,color='blue')
plt.title("sin plot")
plt.xlabel("x")
plt.ylabel("sin x")
In [25]: fig=plt.figure(figsize=(10,10))
ax=fig.add_subplot(projection='3d') # replaces the deprecated Axes3D(fig)
x1=np.arange(-3,4,0.1)
x2=np.arange(-4,5,0.1)
x1, x2 = np.meshgrid(x1, x2)
y=(x1**2+x2**2+x1*x2)
ax.plot_surface(x1,x2,y,rstride=1,cstride=1,cmap=plt.cm.hot)
ax.contourf(x1,x2,y,zdir='z',offset=-2,cmap=plt.cm.hot)
ax.set_zlim(-2.50)
plt.show()
x=np.arange(-3,3.1,step=0.1)
plt.plot(x,stats.norm.pdf(x=x),color='black',linestyle='dotted')
plt.plot(x,stats.t.pdf(x=x,df=5),color='green')
[<matplotlib.lines.Line2D at 0x7f79f0c9cee0>]
Out[27]:
x=np.arange(-3,3.1,step=0.1)
plt.plot(x,stats.norm.pdf(x=x),color='black',linestyle='dotted')
plt.plot(x,stats.t.pdf(x=x,df=20),color='green')
[<matplotlib.lines.Line2D at 0x7f79f0c13460>]
Out[28]:
x=np.arange(-3,3.1,step=0.1)
plt.plot(x,stats.norm.pdf(x=x),color='black',linestyle='dotted')
plt.plot(x,stats.t.pdf(x=x,df=30),color='green')
[<matplotlib.lines.Line2D at 0x7f79e6a41a90>]
Out[31]:
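The three plots compare the standard normal density with t densities for 5, 20 and 30 degrees of freedom. A sketch, using only the standard library, that measures the gap between the two curves at x = 0; the helper names are introduced here for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

# Gap between the normal peak and the t peak for each df used above.
gaps = {df: normal_pdf(0) - t_pdf(0, df) for df in (5, 20, 30)}
for df, gap in gaps.items():
    print(df, round(gap, 4))
```

The gap shrinks as the degrees of freedom grow, which is why the green curve moves toward the dotted normal curve from plot to plot.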