Exercise Python

22. 12. 19.
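3:05 PM Python Notes
Naming
Camel case (starts with an uppercase letter; also called Pascal case)
ex) PrintHello
Used for class names
Snake case (starts with a lowercase letter)
Followed by parentheses: function name, ex) print_hello()
Not followed by parentheses: variable name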
Python arithmetic
1. Basic arithmetic
+, -, *, /
2. Exponentiation
** (or pow(a,b))
3. Remainder
%
4. Quotient (floor division)
//
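A minimal sketch of the operators listed above. The operands 7 and 2 are arbitrary example values, not taken from the notes.

```python
# Arithmetic operators applied to two example integers (values are illustrative)
a, b = 7, 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
ratio = a / b        # true division always returns a float
power = a ** b       # exponentiation, same result as pow(a, b)
remainder = a % b    # remainder after division
quotient = a // b    # floor division (integer quotient)

print(total, difference, product, ratio, power, remainder, quotient)
```

Note that `/` returns a float even when both operands are integers, while `//` keeps the integer quotient.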
Compound assignment operators
1. +=
2. -=
3. *=
4. /=
5. //=
6. %=
7. **=
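Relational operators
1. '==' (equal to)
2. '!=' (not equal to)
3. '>' (greater than)
4. '<' (less than)
5. '>=' (greater than or equal to)
6. '<=' (less than or equal to)
Logical operators
1. and (logical conjunction) [true only if both are true]
2. or (logical disjunction) [true if at least one of the two is true]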
3. not (logical negation) [true becomes false, false becomes true]
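The relational and logical operators can be combined as in the following sketch. The value of `x` and the variable names are illustrative assumptions.

```python
x = 6  # example value

is_big = x > 5                   # a relational operator returns a bool
in_range = x >= 1 and x <= 10    # and: both sides must be true
outside = x < 1 or x > 10        # or: at least one side must be true
not_five = not (x == 5)          # not: inverts the truth value

print(is_big, in_range, outside, not_five)
```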
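Variable naming rules
1. A name starts with a letter (or an underscore) and may be followed by digits.
2. Special characters cannot be used in a variable name.
3. A period (.) cannot be used in a variable name, but an underscore (_) can.
4. Names are case-sensitive.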
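One way to check these rules is to ask Python itself. The sketch below uses `str.isidentifier()` and the `keyword` module; the candidate names are made up for illustration, and the keyword check is an extra rule that the list above does not mention.

```python
import keyword

# Hypothetical candidate names, chosen to exercise each rule
candidates = ["height1", "1height", "my_value", "my.value", "my-value", "class"]

valid_names = []
for name in candidates:
    # isidentifier() checks the spelling rules; reserved keywords are rejected separately
    ok = name.isidentifier() and not keyword.iskeyword(name)
    if ok:
        valid_names.append(name)
    print(f"{name!r}: {'valid' if ok else 'invalid'}")

# Names are case-sensitive: these are two different variables
Value, value = 1, 2
print(Value, value)
```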
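Comments
A comment is text inside the program that is not used when the program runs.
The author inserts comments to make the program easier to understand.
In a Code cell, # starts a comment; in a Markdown cell, # starts a heading.
Python data types
1. Integer (int)
2. Floating point (float)
3. Boolean (bool: True/False)
4. String (str)
Python packages
import package_name as name
loads a package, which is then referred to by the name given after as.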
In [5]: pip install numpy
Requirement already satisfied: numpy in /Users/mind/opt/anaconda3/lib/python3.9/site-packages (1.21.5)
Note: you may need to restart the kernel to use updated packages.
In [6]: import numpy as np
In [7]: a=np.arange(15)
In [8]: a
Out[8]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
In [9]: print(a)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
In [10]: m=np.mean(a)
In [11]: m
Out[11]: 7.0
In [12]: print(m)
7.0
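Missing values (values that were not observed, or that could not be computed for some reason) and infinity
In [13]: height=np.array([45,75,60,71,np.nan])
In [14]: print(np.mean(height)) # the array contains a missing value, so nan is printed
nan
In [15]: print(np.nanmean(height)) # the nanmean function computes the mean with NaN excluded
62.75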
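What `np.nanmean` does can be sketched in plain Python: drop the NaN entries, then average what is left. This is only an illustration of the idea with the standard library, not NumPy's actual implementation.

```python
import math

height = [45, 75, 60, 71, float("nan")]

# Any arithmetic involving NaN yields NaN, so the plain mean is NaN
plain_mean = sum(height) / len(height)

# Keep only the observed values before averaging
observed = [h for h in height if not math.isnan(h)]
clean_mean = sum(observed) / len(observed)

print(plain_mean, clean_mean)
```

Note that NaN must be detected with `math.isnan` (or `np.isnan`), because `nan == nan` is False.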
In [16]: x=np.array([1,-1,np.inf,-np.inf])
In [17]: print(x/0) # dividing by zero produces inf and triggers a warning
[ inf -inf inf -inf]
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_85878/2756779915.py:1: RuntimeWarning: divide by zero encountered in true_divide
print(x/0)
if statement
Runs the given statements only when the condition is met.
Usage:
if condition:
    (statements to run)
elif second condition:
    (statements to run)
else:
    (statements to run when none of the other conditions apply)
In [18]: # Example
x = 6
if x>5:
    print("Greater than 5.")
elif x<5:
    print("Less than 5.")
elif x == 5:
    print("Equal to 5.")
else:
    print("The value is invalid.")
Greater than 5.
Loops
A while loop repeats its statements as long as the condition is true.
while condition:
    statements to run
In [19]: a=0
while a<10:
    a+=1
    print(a)
1
2
3
4
5
6
7
8
9
10
Basic structure of a for loop
for variable in list (or tuple, string):
    <statement 1>
    <statement 2>
In [20]: letters = ["c","d","d"] # named letters rather than list, which would shadow the built-in list type
for i in letters:
    print(i)
c
d
d
In [21]: # Sum of the numbers from 1 to 10
isum=0
# range(A, B) yields the B-A integers from A through B-1
x=range(1,11)
for i in x:
    isum += i
print("The sum from 1 to 10 is",isum)
The sum from 1 to 10 is 55
In [24]: sqsum=0
x=range(1,11)
for i in x:
    sqsum += i**2
print("The sum of squares from 1 to 10 is",sqsum)
The sum of squares from 1 to 10 is 385
In [25]: # Printing the squares of 1 through 10
x=range(1,11)
for i in x:
    x2=i**2
    print(x2)
1
4
9
16
25
36
49
64
81
100
In [ ]: # Computing the mean
xsum = 0
x=[10,12,21]
for i in range(0,3):
    xsum = xsum + x[i]
m = xsum/len(x) # divide the sum by the number of values
In [26]: # Computing the total (cumulative sum)
xsum=0
x=[10,12,21]
for i in range(0,len(x)):
    xsum += x[i]
print(xsum)
43
In [27]: # Computing the sum of squares
sqsum = 0
x=[10,12,21]
for i in range(0,len(x)):
    sqsum+=x[i]**2
print(sqsum)
685
In [29]: import math
xsum =0; sqsum=0
x=[10,12,21]
for i in range(0,len(x)):
    xsum+=x[i]
    sqsum+=x[i]**2
m = xsum/len(x)
var=sqsum/(len(x)-1)-(len(x)*m**2)/(len(x)-1)
sd=math.sqrt(var)
print('Mean:',round(m,3),'|','Variance:',round(var,3),'|','Sample standard deviation:',round(sd,3))
Mean: 14.333 | Variance: 34.333 | Sample standard deviation: 5.859
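The variance in the cell above is computed with the shortcut formula (sum of squares minus n times the squared mean, divided by n-1). The sketch below checks that this agrees with the definition based on squared deviations; the variable names `var_definition` and `var_shortcut` are illustrative.

```python
import math
import statistics

x = [10, 12, 21]
n = len(x)
m = sum(x) / n

# Definition: squared deviations from the mean, divided by n - 1
var_definition = sum((xi - m) ** 2 for xi in x) / (n - 1)

# Shortcut used in the notes: (sum of squares - n * mean^2) / (n - 1)
var_shortcut = (sum(xi ** 2 for xi in x) - n * m ** 2) / (n - 1)

sd = math.sqrt(var_shortcut)
print(round(var_definition, 3), round(var_shortcut, 3), round(sd, 3))

# Cross-check against the standard library's sample variance
print(round(statistics.variance(x), 3))
```

The shortcut needs only one pass over the data, but it can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice.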
In [31]: # Multiplication table
for i in range(2,10):
    print('')
    print(i,'times table')
    print('========')
    for j in range(1,10):
        k=i*j
        print(i,'*',j,'=',k)
2 times table
========
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
2 * 5 = 10
2 * 6 = 12
2 * 7 = 14
2 * 8 = 16
2 * 9 = 18
3 times table
========
3 * 1 = 3
3 * 2 = 6
3 * 3 = 9
3 * 4 = 12
3 * 5 = 15
3 * 6 = 18
3 * 7 = 21
3 * 8 = 24
3 * 9 = 27
4 times table
========
4 * 1 = 4
4 * 2 = 8
4 * 3 = 12
4 * 4 = 16
4 * 5 = 20
4 * 6 = 24
4 * 7 = 28
4 * 8 = 32
4 * 9 = 36
5 times table
========
5 * 1 = 5
5 * 2 = 10
5 * 3 = 15
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
5 * 7 = 35
5 * 8 = 40
5 * 9 = 45
6 times table
========
6 * 1 = 6
6 * 2 = 12
6 * 3 = 18
6 * 4 = 24
6 * 5 = 30
6 * 6 = 36
6 * 7 = 42
6 * 8 = 48
6 * 9 = 54
7 times table
========
7 * 1 = 7
7 * 2 = 14
7 * 3 = 21
7 * 4 = 28
7 * 5 = 35
7 * 6 = 42
7 * 7 = 49
7 * 8 = 56
7 * 9 = 63
8 times table
========
8 * 1 = 8
8 * 2 = 16
8 * 3 = 24
8 * 4 = 32
8 * 5 = 40
8 * 6 = 48
8 * 7 = 56
8 * 8 = 64
8 * 9 = 72
9 times table
========
9 * 1 = 9
9 * 2 = 18
9 * 3 = 27
9 * 4 = 36
9 * 5 = 45
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81
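In [32]: # Computing the coefficient of variation
import math
xsum=0; ysum=0; xsqsum=0; ysqsum=0
x=[10,12,21]
y=[150,180,186,200]
n1=len(x)
n2=len(y)
# x and y have different lengths, so each gets its own loop
for i in range(0,n1):
    xsum += x[i]
    xsqsum += x[i]**2
for i in range(0,n2):
    ysum += y[i]
    ysqsum += y[i]**2
m1 = xsum/n1
var1=(xsqsum-(n1*m1**2))/(n1-1)
sd1=math.sqrt(var1)
cv1 = sd1/m1
m2 = ysum/n2
var2=(ysqsum-(n2*m2**2))/(n2-1)
sd2=math.sqrt(var2)
cv2 = sd2/m2
if cv1>cv2:
    print("x has the larger coefficient of variation, so x varies more")
elif cv1<cv2:
    print("y has the larger coefficient of variation, so y varies more")
else:
    print("The two data sets have the same coefficient of variation")
x has the larger coefficient of variation, so x varies more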

Functions
def function_name(parameters):
    statements
    return value
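The functions in the cells that follow print their result instead of returning it. As a hedged sketch of the `return` form in the template above, the hypothetical function `mean` below hands its result back to the caller so that it can be stored and reused.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for v in values:
        total += v
    return total / len(values)   # return hands the result back to the caller

m = mean([10, 20, 30, 40, 50])   # the returned value can be stored
doubled = 2 * m                  # and used in later calculations
print(m, doubled)
```

A function that only prints returns `None`, so its result cannot be used in a later expression.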
In [1]: # Function that computes a sum
def summary(x):
    isum = 0
    n=len(x)
    for i in range(0,n):
        isum+=x[i]
    print(isum)
In [2]: x=[10,20,30,40,50]
In [3]: summary(x)
150
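In [ ]: def summary(x):
    import math
    xsum=0; sqsum=0
    n=len(x)
    for i in range(0,n):
        xsum+=x[i]
        sqsum+=x[i]**2
    m =xsum/n
    var=sqsum/(n-1)-(n*m**2)/(n-1)
    sd=math.sqrt(var)
    print("Mean:",round(m,3),"| Variance:",round(var,3),"| Sample standard deviation:",round(sd,3))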
In [4]: # Absolute value
def ab(x):
    if(x>0):
        print(x)
    else:
        print(-x)
ab(-5)
In [7]: # Odd or even
def even(x):
    remain = x%2
    if remain == 0:
        print("Even.")
    else:
        print("Odd.")
even(5)
Odd.
In [14]: # Correlation coefficient
def corr(x,y):
    import math
    xsum=0; ysum=0; xysum=0; xsqsum=0; ysqsum=0
    n=len(x)
    for i in range(0,n):
        xsum += x[i]
        ysum += y[i]
        xysum += x[i] * y[i] # sum of the products x*y
        xsqsum += x[i]**2
        ysqsum += y[i]**2
    xmean = xsum/len(x)
    ymean = ysum/len(y)
    corr=(xysum-n*xmean*ymean)/(math.sqrt(xsqsum-n*xmean**2)*math.sqrt(ysqsum-n*ymean**2))
    print(round(corr,3))
x=[73,77,68]
y=[45,67,98]
corr(x,y)
-0.633
Date and time
In [1]: import datetime as dt
In [2]: now= dt.datetime.now()
In [3]: print(now.year)
2022
Vectors and matrices
Collections are classified as list, tuple, dict, and so on according to their properties.
xv=np.array([1,2,3,4,5,6,7,8,9])
In [5]: xv=np.arange(1,10)
In [8]: xv
Out[8]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [9]: xv[4]
Out[9]: 5
In [14]: xv[1:4]
Out[14]: array([2, 3, 4])
In [20]: xm=np.array([[1,2,3],[4,5,6],[7,8,9]])
In [16]: xm=np.arange(1,10)
In [18]: xm=xm.reshape(3,3)
In [19]: xm
Out[19]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [22]: xm[:,0] # first column
Out[22]: array([1, 4, 7])
In [23]: xm[0:2,:] # first two rows
Out[23]: array([[1, 2, 3],
       [4, 5, 6]])
In [24]: # Changing the value at a specific position in an array
xv[3]=11
xv
Out[24]: array([ 1, 2, 3, 11, 5, 6, 7, 8, 9])
In [25]: xm[0,2]=11
xm
Out[25]: array([[ 1, 2, 11],
       [ 4, 5, 6],
       [ 7, 8, 9]])
In [26]: x=list(range(20))
x
Out[26]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
In [28]: x[0:5] # the first through fifth items of the list
Out[28]: [0, 1, 2, 3, 4]
In [29]: x[:5] # the first through fifth items of the list
Out[29]: [0, 1, 2, 3, 4]
In [30]: x[-1] # the last item
Out[30]: 19
In [31]: x[-5:-2] # from the fifth item from the end (inclusive) up to, but not including, the second from the end
Out[31]: [15, 16, 17]
List
Written with [ ]
In [32]: a=[1,2,'c']
x=[1,2,'1',a]
In [33]: print(x)
[1, 2, '1', [1, 2, 'c']]
In [34]: x2=[[1,2,3],[4,5,6],[7,8,9]]
x2
Out[34]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [ ]: x2[0][2] # value in the first row, third column
Tuple
A tuple is an immutable, ordered collection of objects. It is similar to a list, but once it is created its values cannot be changed.
It can also be viewed as a finite ordered sequence.
A tuple can be created with parentheses.
Written with ( )
In [35]: tp1 =()
tp1
Out[35]: ()
In [37]: tp1=(1,)
tp1
Out[37]: (1,)
In [39]: tp1=(1,2,3)
tp1
Out[39]: (1, 2, 3)
In [40]: tp1[2]
Out[40]: 3
In [42]: tp1[2]=3 # the items of a tuple cannot be changed

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [42], in <cell line: 1>()
----> 1 tp1[2]=3
TypeError: 'tuple' object does not support item assignment
Dictionary
A dictionary is a data type in which each key is paired with one value.
dictionary={key:value}
dictionary={Key1:Value1, Key2:Value2}
※ Notes
1. A list or a set cannot be used as a key.
2. Keys cannot be duplicated.
3. Adding a key-value pair, accessing a value, and changing a value
dictionary[key]=value adds a key-value pair.
dictionary[key] accesses the value.
dictionary[key]=value changes the value of an existing key.
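The three operations in the notes above can be sketched as follows. The starting value 0 for 'us' is a deliberately wrong placeholder used to show an update, and `grid` is a hypothetical dictionary used only to show which types can be keys.

```python
country_code = {"korea": 82, "us": 0}   # 0 is a placeholder to be corrected below

country_code["china"] = 86      # new key: adds a key-value pair
print(country_code["korea"])    # existing key: accesses the value
country_code["us"] = 1          # existing key: replaces the old value

# A tuple is hashable and can be a key; a list is not
grid = {(0, 0): "origin"}
try:
    grid[[0, 1]] = "bad"
except TypeError as err:
    print("TypeError:", err)

print(country_code)
```

The same syntax adds or updates depending on whether the key already exists, which is why duplicate keys cannot occur.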
In [43]: country_code={'korea':82,'us':1,'china':86}
print(country_code)
{'korea': 82, 'us': 1, 'china': 86}
In [44]: print(country_code['korea'])
82
In [46]: print(country_code[1]) # a value cannot be accessed by position number

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [46], in <cell line: 1>()
----> 1 print(country_code[1])
KeyError: 1
In [47]: print(country_code.keys())
dict_keys(['korea', 'us', 'china'])
In [48]: print(country_code.values())
dict_values([82, 1, 86])
In [49]: print(country_code.items())
dict_items([('korea', 82), ('us', 1), ('china', 86)])

a=np.mat([[1,2],[3,4]]) # np.matrix may be used instead of np.mat
b=np.mat([[1,5],[5,1]])
In [51]: print(a+b) # arithmetic operations work
[[2 7]
 [8 5]]
In [53]: print(a.I) # inverse matrix, or np.linalg.inv(a)
[[-2. 1. ]
 [ 1.5 -0.5]]
In [54]: print(a*a.I)
[[1.00000000e+00 1.11022302e-16]
 [0.00000000e+00 1.00000000e+00]]
In [55]: print(a.T) # transpose of the matrix
[[1 3]
 [2 4]]
In [56]: np.eye(3) # create an identity matrix
Out[56]: array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
In [57]: np.linalg.det(a) # determinant
Out[57]: -2.0000000000000004
In [59]: np.linalg.eigvals(a) # eigenvalues
Out[59]: array([-0.37228132, 5.37228132])
In [60]: np.linalg.eig(a)
Out[60]: (array([-0.37228132, 5.37228132]),
 matrix([[-0.82456484, -0.41597356],
         [ 0.56576746, -0.90937671]]))
Data analysis with Python

Descriptive statistics
1. Statistical processing and scipy
2. Computing quartiles
import scipy as sp
In [63]: fish_data = np.array([2,3,3,4,4,4,4,5,5,6])
fish_data
Out[63]: array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
In [64]: len(fish_data) # number of data points
Out[64]: 10
In [65]: sp.sum(fish_data) # compute the sum
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_8412/3877179421.py:1: DeprecationWarning: scipy.sum is deprecated and will be removed in SciPy 2.0.0, use numpy.sum instead
sp.sum(fish_data) # compute the sum
Out[65]: 40
In [69]: # Computing the population mean
N=len(fish_data)
sum_value=sp.sum(fish_data)
mu=sum_value/N
mu
Out[69]: 4.0
In [67]: sp.mean(fish_data) # use scipy's mean function
DeprecationWarning: scipy.mean is deprecated and will be removed in SciPy 2.0.0, use numpy.mean instead
sp.mean(fish_data) # use scipy's mean function
Out[67]: 4.0
In [70]: sigma_2_pop=sp.sum((fish_data-mu)**2)/N # population variance
sigma_2_pop
Out[70]: 1.2
In [71]: sp.var(fish_data,ddof=0) # ddof=1 gives the sample variance, ddof=0 the population variance
DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
sp.var(fish_data,ddof=0) # ddof=1 gives the sample variance, ddof=0 the population variance
Out[71]: 1.2
In [76]: sigma_2_sample=sp.sum((fish_data-mu)**2/(N-1)) # sample variance
sigma_2_sample
Out[76]: 1.3333333333333335
In [77]: sigma_sample=sp.sqrt(sigma_2_sample) # sample standard deviation
sigma_sample
DeprecationWarning: scipy.sqrt is deprecated and will be removed in SciPy 2.0.0, use numpy.lib.scimath.sqrt instead
sigma_sample=sp.sqrt(sigma_2_sample) # sample standard deviation
Out[77]: 1.1547005383792517
In [78]: # Sample standard deviation with scipy
sp.std(fish_data,ddof=1)
DeprecationWarning: scipy.std is deprecated and will be removed in SciPy 2.0.0, use numpy.std instead
sp.std(fish_data,ddof=1)
Out[78]: 1.1547005383792515
In [82]: sp.amax(fish_data) # maximum
DeprecationWarning: scipy.amax is deprecated and will be removed in SciPy 2.0.0, use numpy.amax instead
sp.amax(fish_data) # maximum
Out[82]: 6
In [83]: sp.amin(fish_data) # minimum
DeprecationWarning: scipy.amin is deprecated and will be removed in SciPy 2.0.0, use numpy.amin instead
sp.amin(fish_data) # minimum
Out[83]: 2
In [84]: sp.median(fish_data) # median
DeprecationWarning: scipy.median is deprecated and will be removed in SciPy 2.0.0, use numpy.median instead
sp.median(fish_data) # median
Out[84]: 4.0
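The deprecation warnings in the cells above point to the NumPy functions of the same names. As a cross-check that needs no third-party package, the same statistics can also be sketched with the standard-library `statistics` module. This is a swapped-in alternative for illustration, not the approach the notes use.

```python
import statistics

fish_data = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6]

mu = statistics.mean(fish_data)               # mean
var_pop = statistics.pvariance(fish_data)     # population variance (divide by N)
var_sample = statistics.variance(fish_data)   # sample variance (divide by N - 1)
sd_sample = statistics.stdev(fish_data)       # sample standard deviation
med = statistics.median(fish_data)            # median

print(mu, var_pop, round(var_sample, 3), round(sd_sample, 3), med)
print(max(fish_data), min(fish_data))         # maximum and minimum
```

Here `pvariance` corresponds to `ddof=0` and `variance` to `ddof=1` in the cells above.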
Decimal display: round(x,3) rounds x to three decimal places

Computing quartiles
The interquartile range is the upper percentile (75th) minus the lower percentile (25th).
In [86]: from scipy import stats
In [87]: fish_data_3=np.array([1,2,3,4,5,6,7,8,9])
stats.scoreatpercentile(fish_data_3,25) # 25th percentile
Out[87]: 3.0
In [88]: stats.scoreatpercentile(fish_data_3,75) # 75th percentile
Out[88]: 7.0
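The notes define the interquartile range but stop at the two percentiles. A minimal sketch of the subtraction, using the standard-library `statistics.quantiles` rather than scipy (an assumption made here to keep the example dependency-free):

```python
import statistics

fish_data_3 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# method="inclusive" interpolates linearly between data points and treats
# the smallest and largest values as the 0th and 100th percentiles
q1, q2, q3 = statistics.quantiles(fish_data_3, n=4, method="inclusive")

iqr = q3 - q1   # interquartile range: 75th percentile minus 25th percentile
print(q1, q3, iqr)
```

Different quantile methods can give slightly different quartiles on small data sets, so the method should be stated when reporting an interquartile range.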
Managing multivariate data
Multivariate data can be managed easily with a pandas DataFrame. The following code imports pandas and scipy and then sets the display precision.
In [90]: # Libraries used for numerical computation
import pandas as pd
import scipy as sp
# Set the display precision
%precision 3
Out[90]: '%.3f'
In [91]: fish_multi=pd.read_csv('/Users/mind/desktop/anaconda/fish_multi.csv')
In [92]: fish_multi
Out[92]:   spcies  length
0      A       2
1      A       3
2      A       4
3      B       6
4      B       8
5      B      10
In [96]: # Computing statistics by group
group=fish_multi.groupby("spcies")
print(group.mean()) # group means
        length
spcies
A          3.0
B          8.0
In [97]: print(group.std(ddof=1)) # group standard deviations
        length
spcies
A          1.0
B          2.0
Count, mean, standard deviation, minimum, quartiles, and maximum

describe() summarizes the overall statistics of the data.
In [98]: group.describe()
Out[98]:        length
        count  mean  std  min  25%  50%  75%   max
spcies
A         3.0   3.0  1.0  2.0  2.5  3.0  3.5   4.0
B         3.0   8.0  2.0  6.0  7.0  8.0  9.0  10.0
In [102]: # Computing the correlation coefficient
x=[1,2,3,4,5]
y=[10,11,12,13,14]
In [103]: N=len(x)
mu_x=sp.mean(x)
mu_y=sp.mean(y)
In [104]: cov=sum((x-mu_x)*(y-mu_y))/N
cov
Out[104]: 2.000
In [105]: cov_sample=sum((x-mu_x)*(y-mu_y))/(N-1)
cov_sample
Out[105]: 2.500
In [107]: sp.cov(x,y,ddof=0) # the covariance is in row 1, column 2 and in row 2, column 1
DeprecationWarning: scipy.cov is deprecated and will be removed in SciPy 2.0.0, use numpy.cov instead
sp.cov(x,y,ddof=0) # the covariance is in row 1, column 2 and in row 2, column 1
Out[107]: array([[2., 2.],
       [2., 2.]])
In [108]: sp.cov(x,y,ddof=1) # the covariance is in row 1, column 2 and in row 2, column 1
DeprecationWarning: scipy.cov is deprecated and will be removed in SciPy 2.0.0, use numpy.cov instead
sp.cov(x,y,ddof=1) # the covariance is in row 1, column 2 and in row 2, column 1
Out[108]: array([[2.5, 2.5],
       [2.5, 2.5]])
In [110]: # Variances
sigma_2_x=sp.var(x,ddof=1) # sample variance
sigma_2_y=sp.var(y,ddof=1)
# Correlation coefficient
rho=cov_sample/(sp.sqrt(sigma_2_x)*sp.sqrt(sigma_2_y))
rho # shown as 1 because the display is rounded to three decimal places; the underlying value is 0.9999999999999999998
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_8412/1312370.py:2: DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
sigma_2_x=sp.var(x,ddof=1) # sample variance
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_8412/1312370.py:3: DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
sigma_2_y=sp.var(y,ddof=1)
/var/folders/yn/kwys170j0y35xvl985n3ld2r0000gn/T/ipykernel_8412/1312370.py:5: DeprecationWarning: scipy.sqrt is deprecated and will be removed in SciPy 2.0.0, use numpy.lib.scimath.sqrt instead
rho=cov_sample/(sp.sqrt(sigma_2_x)*sp.sqrt(sigma_2_y))
Out[110]: 1.000
In [111]: sp.corrcoef(x,y) # compute the correlation matrix with scipy's corrcoef function
DeprecationWarning: scipy.corrcoef is deprecated and will be removed in SciPy 2.0.0, use numpy.corrcoef instead
sp.corrcoef(x,y) # compute the correlation matrix with scipy's corrcoef function
Out[111]: array([[1., 1.],
       [1., 1.]])
Data visualization with matplotlib and seaborn

matplotlib is the standard library for drawing graphs,
and seaborn is a library for making matplotlib graphs look nicer.
In [113]: # Libraries used for numerical computation
import numpy as np
import pandas as pd
# Set the display precision
%precision 3
# Library for drawing graphs
from matplotlib import pyplot as plt
# Setting for drawing graphs inside the Jupyter notebook
%matplotlib inline
In [114]: x=np.array([0,1,2,3,4,5,6,7,8,9])
y=np.array([2,3,4,3,5,4,6,7,4,8])
# Line plot
plt.plot(x,y,color='black')
plt.title("lineplot matplotlib")
plt.xlabel("X")
plt.ylabel("Y")
Out[114]: Text(0, 0.5, 'Y')
In [115]: import seaborn as sns
sns.set()
In [116]: # Line plot
plt.plot(x,y,color='black')
plt.title("lineplot matplotlib")
plt.xlabel("X")
plt.ylabel("Y")
Out[116]: Text(0, 0.5, 'Y')

In [117]: fish_data=np.array([2,3,3,4,4,4,4,5,5,6])
fish_data
Out[117]: array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
In [120]: # Histogram with seaborn
sns.distplot(fish_data,bins=5,color='black',kde=False)
Out[120]: <AxesSubplot:>
In [121]: sns.distplot(fish_data,color='black')
Out[121]: <AxesSubplot:ylabel='Density'>
In [122]: # boxplot: box-and-whisker plot
import pandas as pd
fish_multi=pd.read_csv('/Users/mind/desktop/anaconda/fish_multi.csv')
sns.boxplot(x='spcies',y='length',data=fish_multi,color='gray')
Out[122]: <AxesSubplot:xlabel='spcies', ylabel='length'>
In [124]: # barplot: bar chart
sns.barplot(x='spcies',y='length',data=fish_multi,color='gray')
# The height of each bar is the mean, and the line on the bar is the error bar.
Out[124]: <AxesSubplot:xlabel='spcies', ylabel='length'>

In [125]: # Pair plot: a way of plotting data that has many variables.
iris=sns.load_dataset('iris')
iris.head(n=3)
Out[125]:    sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
In [126]: iris.groupby('species').mean()
Out[126]:             sepal_length  sepal_width  petal_length  petal_width
species
setosa             5.006        3.428         1.462        0.246
versicolor         5.936        2.770         4.260        1.326
virginica          6.588        2.974         5.552        2.026
In [127]: sns.pairplot(iris,hue='species',palette="gray")
Out[127]: <seaborn.axisgrid.PairGrid at 0x7fe148667cd0>
Final exam coverage starts here
Chapter 5: Sampling simulation
※ Premise
We simulate sampling from a population under the premise that the population is completely known.
In [128]: # Libraries used for numerical computation
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
# Libraries for drawing graphs
import seaborn as sns
sns.set()
# Set the display precision
%precision 3
# Setting for drawing graphs inside the Jupyter notebook
%matplotlib inline
Generating random numbers
As an example, take a lake that contains only 5 fish, and store the body lengths of the fish in a numpy array.
In [129]: fish_5=np.array([2,3,4,5,6])
fish_5
Out[129]: array([2, 3, 4, 5, 6])
In [132]: # To draw some fish at random from the example above, use the np.random.choice function
np.random.choice(fish_5,size=1,replace=False)
# replace=True samples with replacement; replace=False samples without replacement.
Out[132]: array([2])
In [133]: np.random.choice(fish_5,size=3,replace=False)
Out[133]: array([4, 2, 3])
In [135]: # To draw the same random numbers again, fix the seed with seed(n)
# seed: initial value for random number generation / size: number of samples to draw
np.random.seed(1)
np.random.choice(fish_5,size=3,replace=False)
Out[135]: array([4, 3, 6])
In [138]: np.random.choice(fish_5,size=3,replace=False)
Out[138]: array([4, 5, 3])
In [139]: # Sampling with replacement
np.random.choice(fish_5,size=3,replace=True)
Out[139]: array([6, 3, 4])
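The difference between sampling with and without replacement, and the effect of fixing the seed, can also be sketched with the standard-library `random` module. This is a swapped-in illustration: `random.sample` and `random.choices` are not the NumPy functions used above, and their draws for a given seed differ from NumPy's.

```python
import random

fish_5 = [2, 3, 4, 5, 6]

random.seed(1)
without_replacement = random.sample(fish_5, k=3)   # each fish is drawn at most once
with_replacement = random.choices(fish_5, k=3)     # the same fish may be drawn again

# Resetting the seed reproduces the same sequence of draws
random.seed(1)
repeat = random.sample(fish_5, k=3)

print(without_replacement, with_replacement, repeat)
```

Without a fixed seed, each run gives different draws, which is why the cells above call `np.random.seed(1)` before sampling.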

Probability density function of the normal distribution
Let us compare the histogram of the population with the probability density function of N(4, 0.64), the normal distribution with mean 4 and variance 0.64 (standard deviation 0.8).
In [141]: import numpy as np
In [142]: x=np.arange(start=1,stop=7.1,step=0.1)
x
Out[142]: array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1,
6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. ])
In [143]: from scipy import stats
In [144]: # loc: mean, scale: standard deviation
stats.norm.pdf(x=x,loc=4,scale=0.8)
Out[144]: array([4.407e-04, 6.988e-04, 1.091e-03, 1.676e-03, 2.536e-03, 3.778e-03,
5.540e-03, 7.998e-03, 1.137e-02, 1.591e-02, 2.191e-02, 2.971e-02,
3.967e-02, 5.215e-02, 6.749e-02, 8.598e-02, 1.078e-01, 1.332e-01,
1.619e-01, 1.938e-01, 2.283e-01, 2.648e-01, 3.025e-01, 3.401e-01,
3.764e-01, 4.102e-01, 4.401e-01, 4.648e-01, 4.833e-01, 4.948e-01,
4.987e-01, 4.948e-01, 4.833e-01, 4.648e-01, 4.401e-01, 4.102e-01,
3.764e-01, 3.401e-01, 3.025e-01, 2.648e-01, 2.283e-01, 1.938e-01,
1.619e-01, 1.332e-01, 1.078e-01, 8.598e-02, 6.749e-02, 5.215e-02,
3.967e-02, 2.971e-02, 2.191e-02, 1.591e-02, 1.137e-02, 7.998e-03,
5.540e-03, 3.778e-03, 2.536e-03, 1.676e-03, 1.091e-03, 6.988e-04,
4.407e-04])
In [145]: from matplotlib import pyplot as plt
In [146]: plt.plot(x,stats.norm.pdf(x=x,loc=4,scale=0.8),color='black')
Out[146]: [<matplotlib.lines.Line2D at 0x7fe1486eebe0>]
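The cells above pass `scale=0.8` because N(4, 0.64) is stated in terms of the variance, and the standard deviation is its square root. As a hedged sketch, the density can be written out from the normal formula and checked at two grid points; the helper name `normal_pdf` is hypothetical.

```python
import math
from statistics import NormalDist

mu, variance = 4, 0.64
sigma = math.sqrt(variance)   # standard deviation of N(4, 0.64)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution, written out from its formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(4, mu, sigma)   # density at the mean
tail = normal_pdf(1, mu, sigma)   # density at the left end of the grid

# Cross-check against the standard library's normal distribution
reference = NormalDist(mu, sigma).pdf(4)
print(peak, tail, reference)
```

The density at the mean equals 1/(sigma * sqrt(2 * pi)), so a smaller standard deviation gives a taller, narrower curve.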
Drawing random numbers from a normal distribution
In [147]: sampling_norm=stats.norm.rvs(loc=4,scale=0.8,size=10)
sampling_norm
Out[147]: array([3.628, 4.689, 1.877, 4.922, 4.157, 2.764, 3.93 , 4.682, 4.542,
       3.914])
In [148]: import scipy as sp
In [149]: sp.mean(sampling_norm)
Out[149]: 3.910
In [150]: sp.var(sampling_norm)
Out[150]: 0.822
Computing the sample mean many times
In [151]: population=stats.norm(loc=4,scale=0.8)
In [152]: # Prepare an array of length 1000 to store the sample means
sampling_mean_array=np.zeros(1000)
In [153]: sampling_mean_array
Out[153]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [156]: np.random.seed(1)
for i in range(0,1000):
    sample=population.rvs(size=10)
    sampling_mean_array[i]=sp.mean(sample)
In [157]: sampling_mean_array
Out[157]: array([3.922, 3.864, 4.07 , 3.857, 4.185, 4.142, 4.365, 3.912, 4.116,
4.052, 4.032, 3.827, 4.276, 3.914, 4.289, 3.957, 4.02 , 4.3 ,
4.09 , 4.516, 3.761, 4.018, 4.27 , 3.908, 3.863, 4.028, 4.034,
4.257, 3.938, 4.002, 3.965, 4.402, 4.12 , 3.415, 3.967, 4.012,
3.834, 4.216, 3.806, 4.099, 3.889, 3.827, 4.146, 4.661, 4.019,
3.953, 4.217, 3.826, 4.351, 3.626, 3.733, 4.149, 4.578, 3.934,
3.638, 3.959, 4.353, 4.337, 3.896, 4.43 , 3.71 , 3.918, 4.056,
3.863, 3.824, 4.334, 3.693, 3.984, 4.257, 3.899, 3.958, 3.968,
3.876, 3.908, 3.854, 4.031, 3.836, 3.956, 3.88 , 3.954, 4.2 ,
4.092, 3.974, 4.194, 4.253, 4.462, 3.836, 4.315, 4.071, 3.459,
4.233, 4.123, 4.017, 4.172, 4.219, 4.239, 3.706, 3.664, 3.962,
4.011, 3.763, 3.703, 4.113, 4.33 , 3.871, 4.091, 3.732, 4.338,
4.055, 4.287, 3.817, 4.046, 3.895, 4.089, 3.692, 4.21 , 3.595,
3.806, 3.952, 3.938, 4.425, 3.88 , 4.539, 4.202, 4.13 , 4.135,
4.175, 4.119, 3.92 , 3.936, 4.055, 4.235, 4.238, 4.169, 4.037,
3.794, 4.333, 4.064, 3.8 , 4.103, 4.289, 4.301, 4.253, 3.968,
4.413, 4.093, 3.812, 4.422, 3.864, 4.087, 4.158, 3.704, 4.059,
3.877, 3.793, 4.017, 4.002, 3.76 , 4.127, 3.511, 4.062, 3.336,
3.819, 4.124, 4.064, 3.921, 4.226, 4.117, 4.042, 4.115, 3.899,
3.668, 4.052, 4.016, 4.385, 3.726, 3.801, 4.053, 4.374, 4.83 ,
4.073, 4.138, 3.664, 4.199, 4.273, 4.166, 4.06 , 3.68 , 4.03 ,
3.842, 3.659, 4.033, 4.212, 4.053, 4.142, 3.769, 3.406, 4.002,
4.037, 4.004, 3.821, 4.11 , 3.306, 4.253, 3.89 , 3.942, 4.035,
4.329, 3.738, 4.11 , 4.04 , 4.099, 3.769, 4.48 , 4.004, 3.686,
3.988, 4.398, 4.025, 3.586, 3.902, 4.162, 3.802, 4.407, 4.091,
4.287, 3.804, 3.718, 4.248, 4.17 , 3.667, 3.567, 3.928, 4.251,
3.338, 3.786, 4.281, 3.855, 3.901, 4.296, 4.4 , 4.431, 3.863,
4.047, 3.886, 4.05 , 3.444, 4.266, 4.148, 3.854, 3.926, 3.641,
3.76 , 3.7 , 4.314, 4.117, 4.083, 4.162, 3.83 , 3.797, 4.16 ,
3.909, 4.504, 3.992, 4.445, 4.36 , 3.85 , 3.84 , 3.547, 3.871,
4.136, 3.649, 4.128, 4.328, 3.894, 3.795, 3.996, 3.971, 3.958,
3.986, 3.819, 4.05 , 4.137, 3.514, 4.079, 4.143, 3.975, 4.156,
3.995, 3.839, 3.944, 3.602, 3.924, 4.15 , 3.918, 3.625, 4.142,
4.464, 3.948, 3.697, 4.369, 4.173, 4.582, 4.118, 4.18 , 4.21 ,
4.561, 4.213, 3.825, 4.267, 4.028, 4.283, 3.992, 3.987, 4.065,
3.933, 4.336, 4.103, 4.033, 4.044, 3.985, 3.969, 3.76 , 3.804,
3.663, 4.199, 3.952, 4.271, 4.268, 3.531, 4.009, 4.446, 3.696,
4.241, 3.963, 4.271, 3.825, 3.971, 3.619, 3.426, 3.916, 4.25 ,
4.138, 4.137, 3.871, 3.928, 3.758, 3.636, 3.798, 3.841, 3.434,
3.792, 3.847, 3.787, 3.78 , 4.189, 3.832, 3.702, 3.888, 3.79 ,
4.453, 3.835, 3.97 , 4.089, 3.988, 3.394, 4.138, 4.128, 4.391,
4.099, 3.761, 4.001, 3.733, 4.077, 4.187, 4.26 , 3.721, 3.93 ,
4.049, 4.008, 3.977, 3.941, 4.213, 4.071, 3.707, 3.644, 4.12 ,
4.133, 3.802, 4.223, 4.365, 4.111, 3.887, 3.849, 4.436, 4.331,
3.872, 3.881, 4.155, 3.874, 4.413, 4.496, 3.796, 4.277, 3.963,
3.811, 4.186, 4.344, 3.63 , 3.968, 3.765, 4.241, 3.68 , 3.896,
4.324, 4.401, 3.732, 4.269, 4.21 , 3.744, 3.746, 4.363, 4.295,
3.974, 4.208, 3.728, 4.182, 3.657, 4.202, 4.396, 3.604, 4.098,
3.912, 3.668, 4.209, 3.566, 3.745, 3.863, 4.357, 4.057, 4.386,
3.377, 4.089, 4.466, 4.007, 3.867, 4.308, 4.183, 3.943, 4.419,
4.027, 3.719, 4.339, 4.474, 4.244, 3.964, 4.152, 3.856, 4.043,
4.053, 4.494, 3.874, 4.125, 4.137, 4.575, 3.801, 3.725, 3.661,
4.271, 4.184, 4.439, 4.439, 4.014, 4.149, 4.241, 4.151, 4.437,
3.368, 3.726, 4.126, 4.253, 3.982, 3.887, 4.032, 3.988, 4.358,
4.132, 4.226, 4.207, 3.849, 3.857, 4.231, 4.116, 3.81 , 3.379,
4.025, 3.933, 3.911, 4.313, 4.594, 4.159, 3.929, 3.913, 3.901,
4.003, 3.859, 4.403, 3.917, 3.478, 4.023, 4.389, 4.296, 4.009,
4.058, 3.954, 3.725, 3.535, 3.508, 4.254, 4.352, 3.71 , 3.912,
3.602, 4.18 , 4.251, 3.862, 4.304, 3.787, 3.878, 3.852, 3.976,
4.213, 4.201, 4.206, 4.304, 4.221, 4.057, 3.689, 4.036, 3.642,
3.893, 3.88 , 4.426, 3.971, 4.15 , 3.959, 4.102, 4.121, 3.644,
3.614, 3.751, 4.11 , 4.078, 4.207, 4.067, 4.069, 4.104, 4.133,
4.453, 3.543, 3.886, 3.69 , 3.542, 3.987, 4.034, 3.745, 3.957,
4.559, 3.915, 3.709, 4.114, 3.915, 4.062, 4.113, 4.212, 3.998,
4.026, 3.701, 4.165, 3.754, 3.817, 3.811, 4.043, 3.568, 3.784,
4.208, 3.744, 3.756, 3.947, 4.016, 3.581, 3.49 , 4.112, 3.808,
3.904, 4.04 , 3.778, 4.101, 3.746, 3.805, 3.965, 3.95 , 3.883,
4.178, 3.855, 4.451, 3.893, 4.087, 4.324, 3.638, 3.89 , 3.832,
4.035, 4.138, 4.033, 3.933, 4.176, 3.524, 3.883, 3.59 , 4.188,
3.941, 4.013, 4.219, 3.723, 4.357, 4.2 , 4.252, 3.68 , 4.216,
4.466, 4.167, 3.593, 4.106, 3.936, 3.992, 3.974, 4.067, 3.921,
4.151, 4.362, 3.665, 3.913, 4.093, 3.78 , 4.201, 4.033, 3.923,
4.016, 3.861, 4.247, 4.126, 3.769, 4.112, 3.809, 3.971, 4.177,
3.455, 4.381, 3.87 , 4.043, 3.898, 4.115, 4.23 , 4.182, 4.661,
3.905, 3.667, 3.936, 3.76 , 3.756, 4.148, 4.129, 4.254, 4.116,
3.664, 3.891, 3.793, 4.246, 3.494, 3.771, 3.682, 3.582, 3.67 ,
4.241, 3.752, 3.833, 4.178, 4.184, 4.234, 4.036, 3.934, 3.889,
4.202, 4.217, 3.878, 3.982, 3.988, 3.727, 3.656, 4.52 , 3.742,
3.953, 3.464, 3.886, 3.746, 4.048, 3.792, 4.043, 3.848, 4.28 ,
3.946, 3.673, 4.179, 3.8 , 4.089, 3.644, 3.836, 3.965, 4.363,
4.136, 3.99 , 3.56 , 3.897, 4.024, 3.923, 3.883, 4.114, 4.029,
4.236, 3.981, 3.973, 3.795, 4.482, 4.01 , 3.699, 3.581, 4.242,
3.991, 3.51 , 3.765, 4.034, 3.915, 4.229, 4.525, 3.683, 4.146,
3.969, 4.121, 4.067, 3.523, 4.255, 4.13 , 3.956, 4.006, 4.092,
4.179, 3.746, 3.807, 4.004, 3.937, 4.018, 4.134, 3.671, 4.221,
4.024, 3.988, 4.281, 3.762, 3.493, 3.953, 3.804, 4.001, 4.049,
4.339, 4.659, 3.4 , 4.058, 4.032, 4.28 , 4.475, 4.042, 3.955,
4.047, 3.966, 3.925, 4.152, 3.745, 3.589, 4.31 , 4.144, 3.843,
4.043, 4.036, 3.927, 4.238, 4.459, 3.756, 4.129, 4.547, 3.577,
3.592, 4.145, 4.086, 3.938, 3.961, 4.099, 3.93 , 4.295, 4.167,
4.025, 4.392, 4.082, 4.001, 3.987, 3.828, 3.988, 4.306, 4.281,
4.227, 4.132, 4.284, 3.754, 3.38 , 3.676, 3.795, 4.022, 4.018,
3.81 , 3.931, 3.692, 4.022, 4.277, 4.177, 3.732, 3.554, 4.051,
4.063, 4.711, 4.093, 3.957, 4.519, 3.953, 3.866, 4.149, 4.041,
3.818, 3.977, 4.207, 3.704, 3.837, 4.39 , 3.622, 4.002, 3.262,
4.355, 4.184, 4.035, 4.248, 3.847, 4.422, 4.423, 4.225, 4.479,
4.145, 4.137, 4.08 , 3.933, 3.922, 3.908, 4.18 , 3.891, 4.529,
3.859, 4.149, 3.658, 4.281, 4.409, 4.026, 3.642, 3.85 , 3.753,
3.923, 3.899, 3.948, 3.748, 3.663, 3.549, 4.307, 4.03 , 3.968,
3.843, 4.122, 3.97 , 3.956, 4.128, 4.176, 4.264, 4.054, 4.38 ,
4.088, 3.92 , 4.464, 4.426, 3.987, 3.818, 3.932, 3.994, 3.762,
4.111, 4.097, 4.209, 4.285, 4.454, 3.95 , 3.859, 3.691, 4.281,
4.196, 3.99 , 3.817, 3.911, 3.773, 4.149, 3.914, 3.804, 3.628,
4.078, 4.251, 4.359, 3.97 , 4.382, 4.012, 4.004, 4.223, 4.14 ,
3.968, 3.962, 4.296, 4.234, 4.186, 4.194, 4.199, 4.147, 3.928,
4.246, 3.846, 3.823, 3.961, 4.353, 4.04 , 3.736, 4.67 , 4.112,
3.604, 3.349, 3.974, 4.323, 4.257, 3.765, 3.756, 4.063, 4.208,
3.976, 3.846, 4.202, 4.243, 3.698, 3.742, 4.174, 4.663, 3.881,
3.741, 3.57 , 4.133, 3.85 , 3.785, 3.742, 3.875, 3.943, 4.209,
3.809, 3.971, 4.141, 3.627, 4.101, 4.076, 3.523, 4.19 , 3.438,
3.735, 4.116, 4.364, 4.31 , 3.819, 4.018, 4.447, 4.215, 3.939,
3.852])
In [158]: sp.mean(sampling_mean_array)
Out[158]: 4.008
In [159]: sp.var(sampling_mean_array)
Out[159]: 0.063
In [160]: sns.distplot(sampling_mean_array,color='black')
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
Out[160]:
In [162]: size_array=np.arange(10,100100,100)
size_array
Out[162]: array([ 10, 110, 210, ..., 99810, 99910, 100010])
In [163]: sampling_mean_array_size=np.zeros(len(size_array))
for i in range(0,len(size_array)):
    sample=population.rvs(size=size_array[i])
    sampling_mean_array_size[i]=sp.mean(sample)
In [166]: plt.plot(size_array,sampling_mean_array_size,color='black')
plt.xlabel("sample size")
plt.ylabel("sample mean")
Out[166]: Text(0, 0.5, 'sample mean')
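The plot of sample mean against sample size illustrates the law of large numbers: the larger the sample, the closer its mean tends to lie to the population mean. The following is a minimal sketch of that idea, not the notebook's own cell. It assumes the population is normal with mean 4 and standard deviation 0.8 (the definition of `population` lies outside this section), and `sample_mean_error` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Assumption: the population is N(4, 0.8^2); its definition is not shown in this section.
population = stats.norm(loc=4, scale=0.8)
rng = np.random.default_rng(1)

def sample_mean_error(size):
    """Absolute gap between one sample mean and the population mean."""
    sample = population.rvs(size=size, random_state=rng)
    return abs(np.mean(sample) - population.mean())

# Average the gap over 200 repetitions for a small and a large sample size.
small_error = np.mean([sample_mean_error(10) for _ in range(200)])
large_error = np.mean([sample_mean_error(10000) for _ in range(200)])
print("mean gap, n=10:   ", small_error)
print("mean gap, n=10000:", large_error)
```

The gap for the large sample size should come out far smaller than the gap for the small one.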
Writing a function that computes the sample mean any number of times

In [169]: def calc_sample_mean(size,n_trial):
    sample_mean_array=np.zeros(n_trial)
    for i in range(0,n_trial):
        sample=population.rvs(size=size)
        sample_mean_array[i]=sp.mean(sample)
    return(sample_mean_array)
In [170]: sp.mean(calc_sample_mean(size=10,n_trial=10000))
Out[170]: 4.004
The standard deviation of the sample mean is smaller than that of the population

In [171]: size_array=np.arange(2,102,2)
size_array
Out[171]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26,
28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78,
80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100])
In [179]: sample_mean_std_array=np.zeros(len(size_array))
for i in range(0,len(size_array)):
    sample_mean=calc_sample_mean(size=size_array[i],n_trial=100)
    sample_mean_std_array[i]=sp.std(sample_mean,ddof=1)
In [183]: plt.plot(size_array,sample_mean_std_array,color='black')
plt.xlabel("sample size")
plt.ylabel("mean_std value")
Out[183]: Text(0, 0.5, 'mean_std value')
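The shrinking curve can be compared with the theoretical standard error σ/√n. The sketch below assumes a normal population with mean 4 and σ = 0.8 (an assumption, since `population` is defined outside this section) and uses a hypothetical helper `simulated_se`.

```python
import numpy as np

sigma = 0.8  # assumed population standard deviation
rng = np.random.default_rng(1)

def simulated_se(size, n_trial=2000):
    """Standard deviation of n_trial sample means, each from a sample of the given size."""
    samples = rng.normal(loc=4, scale=sigma, size=(n_trial, size))
    return np.std(samples.mean(axis=1), ddof=1)

# Compare the simulated value with sigma / sqrt(n) for a few sample sizes.
for size in (4, 16, 64):
    print(size, round(simulated_se(size), 3), round(sigma / np.sqrt(size), 3))
```

Quadrupling the sample size should roughly halve both columns.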
Central limit theorem
In [186]: # sample size and number of trials
n_size=10000
n_trial=50000
# heads = 1, tails = 0
coin=np.array([0,1])
# number of heads in each trial
count_coin=np.zeros(n_trial)
# run the experiment of tossing the coin n_size times, n_trial times
np.random.seed(1)
for i in range(0,n_trial):
    count_coin[i]=sp.sum(np.random.choice(coin,size=n_size,replace=True))
# draw the histogram
sns.distplot(count_coin,color='black')
y:11: DeprecationWarning: scipy.sum is deprecated and will be removed in SciPy 2.0.0, use numpy.sum instead
count_coin[i]=sp.sum(np.random.choice(coin,size=n_size,replace=True))
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
Out[186]:
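The coin-toss experiment can be checked against what the central limit theorem predicts: the number of heads in n fair tosses has mean n/2 and standard deviation √n/2, and its histogram is close to a normal curve. The sketch below is a swapped-in shortcut rather than the loop in the cell above: it draws each head count directly from a binomial distribution.

```python
import numpy as np

n_size = 10000   # tosses per trial
n_trial = 50000  # number of trials
rng = np.random.default_rng(1)

# One binomial draw is the number of heads in n_size fair tosses.
count_heads = rng.binomial(n=n_size, p=0.5, size=n_trial)

expected_mean = n_size * 0.5
expected_std = np.sqrt(n_size * 0.5 * 0.5)
print("simulated mean:", count_heads.mean(), "expected:", expected_mean)
print("simulated std: ", count_heads.std(ddof=1), "expected:", expected_std)
```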
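The normal distribution and its applications
1. Importing libraries
In [187]: # libraries used for numerical computation
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
# libraries for drawing graphs
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
# setting for displaying graphs inside the Jupyter notebook
%matplotlib inline
In [188]: sp.pi
Out[188]: 3.142
In [189]: sp.exp(1)
y:1: DeprecationWarning: scipy.exp is deprecated and will be removed in SciPy 2.0.0, use numpy.exp instead
sp.exp(1)
Out[189]: 2.718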
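The deprecation warnings in this notebook all point to NumPy replacements for the `sp.` helpers. A minimal sketch of the equivalent calls follows; the array `data` is hypothetical and only serves as input.

```python
import numpy as np

data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])  # hypothetical data for illustration

print(np.pi)                 # replaces sp.pi
print(np.exp(1))             # replaces sp.exp(1)
print(np.sum(data))          # replaces sp.sum
print(np.mean(data))         # replaces sp.mean
print(np.var(data, ddof=1))  # replaces sp.var (sample variance)
print(np.std(data, ddof=1))  # replaces sp.std (sample standard deviation)
print(np.sqrt(16))           # replaces sp.sqrt for non-negative input
```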
In [190]: # computing f(3) for the distribution X ~ N(4, 0.64)
stats.norm.pdf(loc=4,scale=0.8,x=3)
Out[190]: 0.228
In [191]: simulated_sample=stats.norm.rvs(loc=4,scale=0.8,size=100000)
simulated_sample
Out[191]: array([5.299, 3.511, 3.577, ..., 4.065, 4.275, 3.402])
In [193]: # count the data points that are 3 or less, using a comparison operator
sp.sum(simulated_sample<=3)
Out[193]: 10371
In [194]: # proportion of data points that are 3 or less
sp.sum(simulated_sample<=3)/len(simulated_sample)
Out[194]: 0.104
In [195]: # compute P(X<=4)
stats.norm.cdf(loc=4,scale=0.8,x=4)
Out[195]: 0.500
In [196]: # compute P(X<=3)
stats.norm.cdf(loc=4,scale=0.8,x=3)
Out[196]: 0.106
In [197]: # find a quantile with the percent point function (ppf)
stats.norm.ppf(loc=4,scale=0.8,q=0.025)
Out[197]: 2.432
In [198]: stats.norm.ppf(loc=0,scale=1,q=0.025)
Out[198]: -1.960
In [200]: lower=stats.norm.cdf(loc=0,scale=1,x=-1.96)
stats.norm.ppf(loc=0,scale=1,q=lower)
Out[200]: -1.960
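The cells above use three related functions: `pdf` (density), `cdf` (area under the density up to x), and `ppf` (the inverse of `cdf`). A short sketch of how they fit together for N(4, 0.8²), using `scipy.integrate.quad` for the area; the variable names are illustrative.

```python
import numpy as np
from scipy import stats, integrate

dist = stats.norm(loc=4, scale=0.8)

p = dist.cdf(3)                                    # P(X <= 3)
area, _ = integrate.quad(dist.pdf, -np.inf, 3)     # area under the pdf up to 3
x_back = dist.ppf(p)                               # ppf undoes cdf
p_std = stats.norm.cdf((3 - 4) / 0.8)              # same probability after standardizing

print(p, area, x_back, p_std)
```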
Computing the pdf, cumulative probability, and quantiles of the t distribution

In [201]: x=np.arange(-3,3.1,0.1)
In [204]: # pdf values of the t distribution with 10 degrees of freedom
stats.t.pdf(x=x,df=10)
Out[204]: array([0.011, 0.014, 0.016, 0.019, 0.023, 0.027, 0.032, 0.038, 0.044,
0.052, 0.061, 0.071, 0.083, 0.096, 0.111, 0.127, 0.145, 0.165,
0.186, 0.208, 0.23 , 0.254, 0.277, 0.299, 0.32 , 0.34 , 0.357,
0.37 , 0.381, 0.387, 0.389, 0.387, 0.381, 0.37 , 0.357, 0.34 ,
0.32 , 0.299, 0.277, 0.254, 0.23 , 0.208, 0.186, 0.165, 0.145,
0.127, 0.111, 0.096, 0.083, 0.071, 0.061, 0.052, 0.044, 0.038,
0.032, 0.027, 0.023, 0.019, 0.016, 0.014, 0.011])
In [205]: # probability of 0 or less under the t distribution with 10 degrees of freedom
stats.t.cdf(0,10)
Out[205]: 0.500
In [206]: # probability of 0 or more under the t distribution with 10 degrees of freedom
1-stats.t.cdf(0,10)
Out[206]: 0.500
In [207]: # quantile at cumulative probability 0.025 under the t distribution with 10 degrees of freedom
stats.t.ppf(0.025,10)
Out[207]: -2.228
In [209]: # draw 2 random numbers from the t distribution with 10 degrees of freedom
stats.t.rvs(10,size=2)
Out[209]: array([ 0.057, -0.924])
Computing the pdf, cumulative probability, and quantiles of the χ² distribution

import scipy as sp
In [212]: x=range(0,21)
In [213]: # pdf of the chi-square distribution with 10 degrees of freedom
stats.chi2.pdf(x,10)
Out[213]: array([0. , 0.001, 0.008, 0.024, 0.045, 0.067, 0.084, 0.094, 0.098,
0.095, 0.088, 0.078, 0.067, 0.056, 0.046, 0.036, 0.029, 0.022,
0.017, 0.013, 0.009])
In [214]: # probability of 10 or less under the chi-square distribution with 10 degrees of freedom
stats.chi2.cdf(10,10)
Out[214]: 0.560
In [215]: # probability of 10 or more under the chi-square distribution with 10 degrees of freedom
1-stats.chi2.cdf(10,10)
Out[215]: 0.440
In [216]: # quantile at cumulative probability 0.025 under the chi-square distribution with 10 degrees of freedom
stats.chi2.ppf(0.025,10)
Out[216]: 3.247
In [217]: # draw 20 random numbers from the chi-square distribution with 10 degrees of freedom
stats.chi2.rvs(10,size=20)
Out[217]: array([17.409, 6.015, 9.135, 7.811, 8.862, 12.656, 4.474, 3.946,
12.813, 12.437, 9.818, 6.294, 12.77 , 10.103, 13.21 , 10.502,
6.448, 10.81 , 8.638, 8.858])
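Computing the pdf, cumulative probability, and quantiles of the F distribution

In [218]: # pdf value f(10) of the F distribution with (13, 5) degrees of freedom
stats.f.pdf(10,13,5)
Out[218]: 0.002
In [219]: # probability of 5 or less under the F distribution with (13, 5) degrees of freedom
stats.f.cdf(5,13,5)
Out[219]: 0.957
In [220]: # quantile at cumulative probability 0.025 under the F distribution with (13, 5) degrees of freedom
stats.f.ppf(0.025,13,5)
Out[220]: 0.265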
Computing individual probabilities, cumulative probabilities, and random numbers for the binomial distribution

In [221]: # P(X=5) for the B(10, 0.5) distribution, rounded to 4 decimal places
round(stats.binom.pmf(5,10,0.5),4)
Out[221]: 0.246
In [222]: # P(X<=5) for the B(10, 0.5) distribution
stats.binom.cdf(5,10,0.5)
Out[222]: 0.623
In [223]: # P(X>5), that is P(X>=6), for the B(10, 0.5) distribution
1-stats.binom.cdf(5,10,0.5)
Out[223]: 0.377
In [224]: # number of heads when the experiment of tossing a coin ten times is run three times
stats.binom.rvs(10,0.5,size=3)
Out[224]: array([7, 2, 7])
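Computing individual probabilities, cumulative probabilities, and random numbers for the Poisson distribution

In [225]: # P(X=0) for the Poisson distribution with mean 2
stats.poisson.pmf(0,2)
Out[225]: 0.135
In [226]: # P(X<=1) for the Poisson distribution with mean 2
stats.poisson.cdf(1,2)
Out[226]: 0.406
In [227]: # random numbers from the Poisson distribution with mean 2
stats.poisson.rvs(2,size=5)
Out[227]: array([1, 1, 0, 3, 3])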
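For discrete distributions the cumulative probability is a sum of individual probabilities, so an upper-tail probability such as P(X>5) can be obtained in three equivalent ways. A small sketch with the same B(10, 0.5) and Poisson(2) settings; the variable names are illustrative.

```python
from scipy import stats

n, p = 10, 0.5

upper_tail = 1 - stats.binom.cdf(5, n, p)                           # complement of P(X <= 5)
summed = sum(stats.binom.pmf(k, n, p) for k in range(6, n + 1))     # P(X=6) + ... + P(X=10)
survival = stats.binom.sf(5, n, p)                                  # survival function, P(X > 5)

poisson_cdf = stats.poisson.cdf(1, 2)                               # P(X <= 1), mean 2
poisson_sum = stats.poisson.pmf(0, 2) + stats.poisson.pmf(1, 2)     # P(X=0) + P(X=1)

print(upper_tail, summed, survival)
print(poisson_cdf, poisson_sum)
```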
Estimation and hypothesis testing
In [228]: # simulation of confidence intervals
import numpy as np
import scipy as sp
In [229]: # repeat the computation of a 95% confidence interval 20000 times
# an entry is True if the confidence interval contains the population mean 4
be_included_array=np.zeros(20000)
np.random.seed(1)
norm_dist=stats.norm(loc=4,scale=0.8)
for i in range(0,20000):
    sample=norm_dist.rvs(size=10)
    df=len(sample)-1
    mu=sp.mean(sample)
    std=sp.std(sample,ddof=1)
    se=std/sp.sqrt(len(sample))
    interval=stats.t.interval(0.95,df,mu,se)
    if(interval[0]<=4 and interval[1]>=4):
        be_included_array[i]=True
y:10: DeprecationWarning: scipy.mean is deprecated and will be removed in SciPy 2.0.0, use numpy.mean instead
mu=sp.mean(sample)
y:11: DeprecationWarning: scipy.std is deprecated and will be removed in SciPy 2.0.0, use numpy.std instead
std=sp.std(sample,ddof=1)
y:12: DeprecationWarning: scipy.sqrt is deprecated and will be removed in SciPy 2.0.0, use numpy.lib.scimath.sqrt instead
se=std/sp.sqrt(len(sample))
In [230]: # proportion of confidence intervals that contain the population mean 4
sum(be_included_array)/len(be_included_array)
Out[230]: 0.948
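The same coverage experiment can be written without a Python loop by drawing all samples at once. This is a sketch under the same assumptions as the cell above (normal population with mean 4 and standard deviation 0.8, samples of size 10); it uses a different random generator, so the result will not match 0.948 digit for digit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trial, size = 20000, 10

# Each row is one sample of 10 observations.
samples = rng.normal(loc=4, scale=0.8, size=(n_trial, size))
means = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(size)

# Half-width of the 95% t interval with size - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=size - 1)
covered = (means - t_crit * se <= 4) & (4 <= means + t_crit * se)

coverage = covered.mean()
print(coverage)
```

The printed proportion should land close to the nominal 0.95.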
Confidence interval for the population mean
Finding the 100×(1−α)% confidence interval for μ
In [231]: x = [3.4,3.3,4.2,4.4,3.7,4.5,4.6,3.8,4.1]
In [232]: # when the population variance is 0.16
lower=sp.mean(x)-stats.norm.ppf(0.975)*sp.sqrt(0.16)/sp.sqrt(len(x))
upper=sp.mean(x)+stats.norm.ppf(0.975)*sp.sqrt(0.16)/sp.sqrt(len(x))
print("lower limit: ",round(lower,3)," upper limit: ",round(upper,3))
lower limit:  3.739  upper limit:  4.261
In [234]: stats.norm.interval(0.95,sp.mean(x), sp.sqrt(0.16)/sp.sqrt(len(x)))
Out[234]: (3.739, 4.261)
In [235]: # when the population variance is unknown
df=len(x)-1
lower=sp.mean(x)-stats.t.ppf(0.975,df=df)*sp.std(x,ddof=1)/sp.sqrt(len(x))
upper=sp.mean(x)+stats.t.ppf(0.975,df=df)*sp.std(x,ddof=1)/sp.sqrt(len(x))
print("lower limit: ",round(lower,3)," upper limit: ",round(upper,3))
lower limit:  3.635  upper limit:  4.365
In [237]: stats.t.interval(alpha=0.95,df=df,loc=np.mean(x),scale=sp.std(x,ddof=1)/sp.sqrt(len(x)))
Out[237]: (3.635, 4.365)
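Both cases (known and unknown population variance) follow the same pattern: sample mean ± critical value × standard error. A minimal sketch that wraps the two cases in one hypothetical helper, `mean_ci`, and reuses the data from the cells above:

```python
import numpy as np
from scipy import stats

def mean_ci(x, conf=0.95, sigma2=None):
    """Confidence interval for the mean.

    Uses the normal quantile when the population variance sigma2 is given,
    otherwise the t quantile with n - 1 degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    q = 1 - (1 - conf) / 2
    if sigma2 is not None:
        margin = stats.norm.ppf(q) * np.sqrt(sigma2 / n)
    else:
        margin = stats.t.ppf(q, df=n - 1) * np.std(x, ddof=1) / np.sqrt(n)
    return x.mean() - margin, x.mean() + margin

x = [3.4, 3.3, 4.2, 4.4, 3.7, 4.5, 4.6, 3.8, 4.1]
known = mean_ci(x, sigma2=0.16)   # population variance known
unknown = mean_ci(x)              # population variance unknown
print(known)
print(unknown)
```

The interval for the unknown-variance case is wider because the t quantile exceeds the normal quantile for small samples.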
In [1]: # confidence interval for a proportion
import scipy as sp
from scipy import stats
n=100
x=2
hatp=x/n
v=hatp*(1-hatp)/n
lower=hatp-stats.norm.ppf(0.975)*sp.sqrt(v)
upper=hatp+stats.norm.ppf(0.975)*sp.sqrt(v)
print("lower limit: ",round(lower,3)," upper limit: ",round(upper,3))
lower limit:  -0.007  upper limit:  0.047
py:9: DeprecationWarning: scipy.sqrt is deprecated and will be removed in Sc
py:10: DeprecationWarning: scipy.sqrt is deprecated and will be removed in SciPy 2.0.0, use numpy.lib.scimath.sqrt instead
In [2]: stats.norm.interval(0.95,loc=hatp,scale=sp.sqrt(v))
Out[2]: (-0.007439495783560755, 0.047439495783560756)
In [4]: pip install IPython

Requirement already satisfied: IPython in /Users/mind/opt/anaconda3/lib/pyth
on3.9/site-packages (8.2.0)
Requirement already satisfied: pexpect>4.3 in /Users/mind/opt/anaconda3/lib/
python3.9/site-packages (from IPython) (4.8.0)
Requirement already satisfied: decorator in /Users/mind/opt/anaconda3/lib/py
thon3.9/site-packages (from IPython) (5.1.1)
Requirement already satisfied: matplotlib-inline in /Users/mind/opt/anaconda
3/lib/python3.9/site-packages (from IPython) (0.1.2)
Requirement already satisfied: pickleshare in /Users/mind/opt/anaconda3/lib/
python3.9/site-packages (from IPython) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
in /Users/mind/opt/anaconda3/lib/python3.9/site-packages (from IPython) (3.
0.20)
Requirement already satisfied: stack-data in /Users/mind/opt/anaconda3/lib/p
ython3.9/site-packages (from IPython) (0.2.0)
Requirement already satisfied: jedi>=0.16 in /Users/mind/opt/anaconda3/lib/p
ython3.9/site-packages (from IPython) (0.18.1)
Requirement already satisfied: pygments>=2.4.0 in /Users/mind/opt/anaconda3/
lib/python3.9/site-packages (from IPython) (2.11.2)
Requirement already satisfied: setuptools>=18.5 in /Users/mind/opt/anaconda
3/lib/python3.9/site-packages (from IPython) (61.2.0)
Requirement already satisfied: appnope in /Users/mind/opt/anaconda3/lib/pyth
on3.9/site-packages (from IPython) (0.1.2)
Requirement already satisfied: backcall in /Users/mind/opt/anaconda3/lib/pyt
hon3.9/site-packages (from IPython) (0.2.0)
Requirement already satisfied: traitlets>=5 in /Users/mind/opt/anaconda3/li
b/python3.9/site-packages (from IPython) (5.1.1)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /Users/mind/opt/anacon
da3/lib/python3.9/site-packages (from jedi>=0.16->IPython) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /Users/mind/opt/anaconda3/
lib/python3.9/site-packages (from pexpect>4.3->IPython) (0.7.0)
Requirement already satisfied: wcwidth in /Users/mind/opt/anaconda3/lib/pyth
on3.9/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->IPyt
hon) (0.2.5)
Requirement already satisfied: pure-eval in /Users/mind/opt/anaconda3/lib/py
thon3.9/site-packages (from stack-data->IPython) (0.2.2)
Requirement already satisfied: executing in /Users/mind/opt/anaconda3/lib/py
Requirement already satisfied: asttokens in /Users/mind/opt/anaconda3/lib/py
Requirement already satisfied: six in /Users/mind/opt/anaconda3/lib/python3.
9/site-packages (from asttokens->stack-data->IPython) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
In [5]: from IPython.display import Image
Image('/Users/mind/downloads/img1.jpeg')
Out[5]:

import scipy as sp
n=1000
x1=40
x2=20
hatp1=x1/n
hatp2=x2/n
hatp=hatp1-hatp2
# as recorded, the first term uses hatp; the usual variance of hatp1-hatp2 is
# hatp1*(1-hatp1)/n+hatp2*(1-hatp2)/n, so the interval printed below is narrower than the standard one
v=hatp*(1-hatp1)/n+hatp2*(1-hatp2)/n
lower=hatp-stats.norm.ppf(0.975)*sp.sqrt(v)
upper=hatp+stats.norm.ppf(0.975)*sp.sqrt(v)
print("lower limit: ", round(lower,3)," upper limit: ", round(upper,3))
print("")
fci=stats.norm.interval(0.95,loc=hatp,scale=sp.sqrt(v))
print(fci)
print('lower limit: %.3f, upper limit: %.3f'%fci)
lower limit:  0.008  upper limit:  0.032

(0.007791453721145501, 0.0322085462788545)
lower limit: 0.008, upper limit: 0.032
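For reference, the standard large-sample interval for a difference of two proportions uses the variance p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2. The recorded cell uses `hatp` (the difference) in the first term instead of `hatp1`, so its interval is somewhat narrower. The sketch below applies the standard formula to the same counts; `diff_prop_ci` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample (normal approximation) interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lower, upper = diff_prop_ci(40, 1000, 20, 1000)
print("lower limit: %.3f, upper limit: %.3f" % (lower, upper))
```

With these counts the standard formula gives roughly (0.005, 0.035).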
In [17]: # confidence interval for the population variance
import numpy as np
import scipy as sp
x=[45,47,44,46,47,48]
n=len(x)
var=sp.var(x,ddof=1)
lower=(n-1)*var/stats.chi2.ppf(0.975,n-1)
upper=(n-1)*var/stats.chi2.ppf(0.025,n-1)
print("lower limit: ",lower," upper limit: ",upper)
print('')
df = n-1
chi2_under, chi2_upper = stats.chi2.interval(alpha=0.95, df=n-1)
interval_under = df*var/chi2_upper
interval_upper = df*var/chi2_under
print("lower limit: ", interval_under, " upper limit: ", interval_upper)
lower limit:  0.8442105318489914  upper limit:  13.033183316449371

lower limit:  0.8442105318489914  upper limit:  13.033183316449364
py:9: DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
var=sp.var(x,ddof=1)
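The variance interval rests on the fact that (n−1)s²/σ² follows a chi-square distribution with n−1 degrees of freedom, so dividing (n−1)s² by the upper and lower chi-square quantiles gives the lower and upper limits. A minimal sketch with a hypothetical helper `variance_ci`, applied to the same data:

```python
import numpy as np
from scipy import stats

def variance_ci(x, conf=0.95):
    """Chi-square based confidence interval for the population variance."""
    x = np.asarray(x, dtype=float)
    df = len(x) - 1
    ss = df * np.var(x, ddof=1)  # (n - 1) * s^2, the sum of squared deviations
    alpha = 1 - conf
    # The larger chi-square quantile gives the lower limit, and vice versa.
    return ss / stats.chi2.ppf(1 - alpha / 2, df), ss / stats.chi2.ppf(alpha / 2, df)

lower, upper = variance_ci([45, 47, 44, 46, 47, 48])
print(lower, upper)
```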
Confidence interval for the ratio of population variances
In [18]: from IPython.display import Image
Image('/Users/mind/downloads/img2.jpeg')
Out[18]:

import scipy as sp
n1=10
n2=8
v1=0.25
v2=0.49
lower = v1/v2/stats.f.ppf(0.99,n1-1,n2-1)
upper = v1/v2/stats.f.ppf(0.01,n1-1,n2-1)
print("98% confidence interval for the ratio of population variances: (", round(lower,3),round(upper,3),")")
98% confidence interval for the ratio of population variances: ( 0.076 2.864 )

In [24]: # hypothesis tests
from IPython.display import Image
Image('/Users/mind/downloads/img3_1.jpeg')
Out[24]:
In [25]: Image('/Users/mind/downloads/img3_2.jpeg')
Out[25]:
Out[26]:
In [27]: z=(0.59-0.6)/(0.1/sp.sqrt(100))
if(z<stats.norm.ppf(0.05,loc=0,scale=1)):
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
Fail to reject the null hypothesis.
In [28]: # print the p-value
stats.norm.cdf(z,loc=0,scale=1)
Out[28]: 0.15865525393145685
In [29]: Image('/Users/mind/downloads/img4.jpeg')
Out[29]:
In [37]: # two-sided test (the p-value of a two-sided test is twice the one-sided p-value)
t=(14.2-15)/(2.5/sp.sqrt(25))
if(t>stats.t.ppf(0.995,24) or t<stats.t.ppf(0.005,24)):
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
2*stats.t.cdf(t,df=24)
Fail to reject the null hypothesis
Out[37]: 0.12268143730144648
In [40]: x=[3.4,3.3,4.2,4.4,3.7,4.5,4.6,3.8,4.1]
t=(sp.mean(x)-3.5)/(sp.std(x,ddof=1)/sp.sqrt(len(x)))
t
Out[40]: 3.162277660168379
In [42]: # two-sided test (when the alternative argument is not given, the test is two-sided)
ttest=stats.ttest_1samp(x,3.5)
print('t-value=%.3f,p-value=%.3f'%ttest)
t-value=3.162,p-value=0.013
In [43]: # test of whether the mean is greater than 3.5
ttest=stats.ttest_1samp(x,3.5,alternative='greater')
In [44]: # test of whether the mean is less than 3.5
ttest=stats.ttest_1samp(x,3.5,alternative='less')
Out[45]:
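The three `alternative` settings differ only in which tail of the t distribution supplies the p-value. The sketch below recomputes the statistic by hand for the same data and derives the two-sided, greater, and less p-values from `stats.t`; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([3.4, 3.3, 4.2, 4.4, 3.7, 4.5, 4.6, 3.8, 4.1])
mu0 = 3.5
n = len(x)

# t statistic: (sample mean - hypothesized mean) / standard error
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

p_two_sided = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # both tails
p_greater = stats.t.sf(t_manual, df=n - 1)             # upper tail only
p_less = stats.t.cdf(t_manual, df=n - 1)               # lower tail only

result = stats.ttest_1samp(x, mu0)  # two-sided by default
print(t_manual, p_two_sided, p_greater, p_less)
print(result.statistic, result.pvalue)
```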
In [46]: # checking equality of variances
x=[44,44,56,46,47,38,58,53,49,35,46,30,41]
y=[35,47,55,29,40,39,32,41,42,57,51,39]
# test for equal variances
equal=stats.levene(x,y)
print("LeveneResult(statistic=%.3f,pvalue=%.3f)"% equal)
LeveneResult(statistic=0.092,pvalue=0.764)
The p-value is 0.764, so the null hypothesis is not rejected and the variances can be treated as equal.

In [47]: # ttest_ind: ind stands for independent
test=stats.ttest_ind(x,y,equal_var=True,alternative="greater") # equal-variance case
print("test statistic=%.3f, p-value=%.3f"%test)
test statistic=0.868, p-value=0.197
Out[48]:
In [51]: x=[70,80,72,76,76,76,72,78,82,64,74,92,74,68,84]
y=[68,72,62,70,58,66,68,52,64,72,74,60,74,72,74]
# ttest_rel is for paired (matched) samples
test=stats.ttest_rel(x,y,alternative="greater")
print(test)
print("test statistic: %.3f, p-value: %.3f"%test)
Ttest_relResult(statistic=3.105360487466109, pvalue=0.0038747180533270594)
test statistic: 3.105, p-value: 0.004
Out[52]:

import scipy as sp
a=np.array([1.1,2.3,4.3,2.2,5.3])
n=len(a)
df= n-1
alpha=0.05
sigma2=1
chisq=(n-1)*sp.var(a,ddof=1)/sigma2
p_value=2*(1-sp.stats.chi2.cdf(chisq,df))
print('chisq =',round(chisq,3),' p-value =',round(p_value,3))
if(p_value<alpha):
    print("Conclusion: reject the null hypothesis; the population variance cannot be said to be 1.")
else:
    print("Conclusion: fail to reject the null hypothesis; the population variance can be said to be 1.")
chisq = 11.712  p-value = 0.039
Conclusion: reject the null hypothesis; the population variance cannot be said to be 1.
py:10: DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
chisq=(n-1)*sp.var(a,ddof=1)/sigma2
Test whether the two population variances can be considered equal.

import scipy as sp
a=np.array([1.1,2.3,4.3,2.2,5.3])
b=np.array([2.3, 4.3,3.5])
df1=len(a)-1
df2=len(b)-1
alpha=0.05
f=sp.var(a,ddof=1)/sp.var(b,ddof=1)
p_value=2*(1-sp.stats.f.cdf(f,df1,df2))
print('F=',round(f,3),' p-value = ', round(p_value,3))

print("")
if(p_value<alpha):
    print("Conclusion: reject the null hypothesis; the population variances differ.")
else:
    print("Conclusion: fail to reject the null hypothesis; the two population variances can be treated as equal.")
F= 2.889  p-value =  0.547

Conclusion: fail to reject the null hypothesis; the two population variances can be treated as equal.
py:10: DeprecationWarning: scipy.var is deprecated and will be removed in SciPy 2.0.0, use numpy.var instead
f=sp.var(a,ddof=1)/sp.var(b,ddof=1)
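Both variance tests compute the two-sided p-value as 2×(1−cdf), which is only appropriate when the statistic falls in the upper tail, as it does in both examples here. A slightly more general sketch doubles the smaller of the two tail probabilities instead; `two_sided_p` is a hypothetical helper, and doubling the smaller tail is one common convention rather than the only one.

```python
import numpy as np
from scipy import stats

def two_sided_p(stat, dist):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    return min(1.0, 2 * min(dist.cdf(stat), dist.sf(stat)))

a = np.array([1.1, 2.3, 4.3, 2.2, 5.3])
b = np.array([2.3, 4.3, 3.5])

# Test of H0: variance = 1 (chi-square statistic with n - 1 degrees of freedom).
chisq = (len(a) - 1) * np.var(a, ddof=1) / 1.0
p_var = two_sided_p(chisq, stats.chi2(len(a) - 1))

# Test of H0: equal variances (F statistic with (n1 - 1, n2 - 1) degrees of freedom).
f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
p_ratio = two_sided_p(f_stat, stats.f(len(a) - 1, len(b) - 1))

print(round(chisq, 3), round(p_var, 3))
print(round(f_stat, 3), round(p_ratio, 3))
```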
Chi-square test
Goodness-of-fit test
The chi-square test statistic is an approximate statistic. When the expected frequency in every category is 5 or more, the test statistic approximately follows a chi-square distribution.
If this condition is not met, the chi-square test cannot be used.

Out[59]:

import scipy as sp
from scipy.stats import chisquare
p0 = np.array([1/6,1/6,1/6,1/6,1/6,1/6])
n=1000
observed=[150,160,165,155,170,200] # observed frequencies
expected=n*p0 # expected frequencies
sp.stats.chisquare(f_obs=observed, f_exp=expected)
Out[58]: Power_divergenceResult(statistic=9.5, pvalue=0.09070739170404737)
Out[60]:

import scipy as sp
from scipy.stats import chisquare
p0=np.array([9/16,3/16,3/16,1/16])
n=381
observed=[216,79,65,21]
expected=n*p0
sp.stats.chisquare(f_obs=observed,f_exp=expected)
Out[62]: Power_divergenceResult(statistic=1.726159230096238, pvalue=0.6311337459933084)
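The goodness-of-fit statistic is the sum of (observed − expected)² / expected over all categories, compared against a chi-square distribution with (number of categories − 1) degrees of freedom. A minimal hand computation for the die example, including the expected-frequency check mentioned in the text:

```python
import numpy as np
from scipy import stats

observed = np.array([150, 160, 165, 155, 170, 200])
expected = observed.sum() * np.full(6, 1 / 6)  # fair die: equal expected counts

# The approximation is only used when every expected frequency is at least 5.
rule_ok = bool(expected.min() >= 5)

statistic = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(statistic, df=len(observed) - 1)
print(rule_ok, statistic, p_value)
```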
# setting up the hypotheses
# H0: sex and television size are independent vs. H1: not H0
Out[64]:

from scipy.stats import chi2_contingency
obs=np.array([[40,100,60,40,20],[80,80,40,30,10]])
chi2_contingency(obs)
Out[66]: (23.55514855514856,
9.80647431753431e-05,
4,
array([[62.4, 93.6, 52. , 36.4, 15.6],
[57.6, 86.4, 48. , 33.6, 14.4]]))
The p-value, 9.80647431753431e-05, is smaller than the significance level 0.05, so the null hypothesis is rejected.
Television size preference differs by sex.
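Analysis of variance
Overview of analysis of variance
It is used when there are three or more groups.
If the variance of the group means being compared is 0, the group means can be judged to be equal.
The mean of each group is assumed to be the overall mean shifted by the amount of the factor effect for that group.
The mean μa of a group A is expressed as "μ + the effect of belonging to group a".
Such an effect is called a factor, and the groups are formed according to the values the factor takes.
When there is a single factor the layout is called one-way, and the analysis of variance carried out on one-way data is called one-way ANOVA.
As the number of factors grows, the layouts are called two-way, three-way, and so on.
※ Related helper: np.tile copies the same array repeatedly.
※ vstack and hstack are used to join arrays:
vstack = joins arrays vertically
hstack = joins arrays horizontally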
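A small sketch of the three helpers mentioned in the notes, with made-up values: `np.tile` repeats a value or array, `np.hstack` joins arrays side by side, and `np.vstack` stacks them as rows.

```python
import numpy as np

# Build a label vector: "a1" twice followed by "a2" three times.
labels = np.hstack([np.tile("a1", 2), np.tile("a2", 3)])

row = np.array([1, 2, 3])
stacked_rows = np.vstack([row, row * 10])  # 2 x 3 array, one input per row
stacked_cols = np.hstack([row, row * 10])  # single array of length 6

print(labels)
print(stacked_rows)
print(stacked_cols)
```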

Out[67]:
Out[68]:

import scipy.stats as stats
from pandas import Series,DataFrame
import matplotlib.pyplot as plt
%matplotlib inline
In [76]: data1={'cotton':[7,7,15,11,9,12,17,12,18,18,14,18,18,19,19,19,25,22,19,23,7,
'group':np.hstack([np.tile("a1",5),np.tile("a2",5),np.tile("a3",5),np.
data=DataFrame(data1)
print(data)
cotton group
0 7 a1
1 7 a1
2 15 a1
3 11 a1
4 9 a1
5 12 a2
6 17 a2
7 12 a2
8 18 a2
9 18 a2
10 14 a3
11 18 a3
12 18 a3
13 19 a3
14 19 a3
15 19 a4
16 25 a4
17 22 a4
18 19 a4
19 23 a4
20 7 a5
21 10 a5
22 11 a5
23 15 a5
24 11 a5

In [77]: import seaborn as sns
sns.boxplot(x='group', y='cotton', data=data)
plt.show()
In [78]: group1 = data[data['group']=='a1']['cotton']
group2 = data[data['group']=='a2']['cotton']
group3 = data[data['group']=='a3']['cotton']
group4 = data[data['group']=='a4']['cotton']
group5 = data[data['group']=='a5']['cotton']
In [79]: F_statistic, pVal = stats.f_oneway(group1,group2, group3,group4,group5)
print("F test statistic : ",round(F_statistic,3),"p-value : ",round(pVal,5))
F test statistic :  14.757 p-value :  1e-05
The p-value shows that the result is highly significant.
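The F statistic of one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it by hand and compares it with `stats.f_oneway`. The group values are read off the printed data table (the code line that defines them is cut off in this export), and the dictionary layout is an illustrative choice.

```python
import numpy as np
from scipy import stats

# Values taken from the printed cotton/group table.
groups = {
    "a1": [7, 7, 15, 11, 9],
    "a2": [12, 17, 12, 18, 18],
    "a3": [14, 18, 18, 19, 19],
    "a4": [19, 25, 22, 19, 23],
    "a5": [7, 10, 11, 15, 11],
}
values = [np.array(v, dtype=float) for v in groups.values()]

k = len(values)                       # number of groups
n = sum(len(v) for v in values)       # total number of observations
grand = np.concatenate(values).mean()

ss_between = sum(len(v) * (v.mean() - grand) ** 2 for v in values)
ss_within = sum(((v - v.mean()) ** 2).sum() for v in values)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat, p_val = stats.f_oneway(*values)
print(f_manual, f_stat, p_val)
```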
In [80]: from statsmodels.sandbox.stats.multicomp import MultiComparison
comp = MultiComparison(data['cotton'], data['group'])

result = comp.allpairtest(stats.ttest_ind, method='bonf')
result[0]
Out[80]: Test Multiple Comparison ttest_ind FWER=0.05

method=bonf alphacSidak=0.01, alphacBonf=0.005
group1 group2 stat pval pval_corr reject
a1 a2 -2.7325 0.0257 0.2575 False
a1 a3 -4.4301 0.0022 0.022 True
a1 a4 -6.2191 0.0003 0.0025 True
a1 a5 -0.5077 0.6254 1.0 False
a2 a3 -1.3101 0.2265 1.0 False
a2 a4 -3.4027 0.0093 0.0932 False
a2 a5 2.4244 0.0416 0.4156 False
a3 a4 -2.6846 0.0277 0.2773 False
a3 a5 4.3007 0.0026 0.0261 True
a4 a5 6.2354 0.0002 0.0025 True
Two-way analysis of variance (two-way ANOVA)

Purpose of two-way ANOVA: for two factors that affect the dependent variable, test the effects of both factors at the same time, and also test the hypothesis about their interaction.
Out[81]:
Hypotheses: H0: α1=α2=α3=...=αa=0
H1: at least one αi is not 0.
F0 = MSA/MSE ~ F(a-1, n-a-b+1)
If F0 > F, reject the null hypothesis.
If F0 < F, do not reject the null hypothesis.
Out[82]:
In [87]: import pandas as pd

df=pd.DataFrame({
'design':["a","b","c","a","b","c"],
'ad':[1,1,1,0,0,0],
'y':[23,15,18,16,9,11]
})
df.head()
Out[87]: design ad y
0 a 1 23
1 b 1 15
2 c 1 18
3 a 0 16
4 b 0 9
In [90]: import statsmodels.formula.api as smf

import statsmodels.api as sm
aov_model=smf.ols("y~design+ad",data=df).fit()
print(sm.stats.anova_lm(aov_model,typ=2))
print(aov_model.params)
sum_sq df F PR(>F)
design 58.333333 2.0 175.0 0.005682
ad 66.666667 1.0 400.0 0.002491
Residual 0.333333 2.0 NaN NaN
Intercept 16.166667
design[T.b] -7.500000
design[T.c] -5.000000
ad 6.666667
dtype: float64
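At significance level 0.05, the p-value for the advertising effect is 0.0025 and the p-value for the design effect is 0.0057, so both effects are significant.
That is, sales differ by design type and by whether advertising is used.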
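The ANOVA table for this unreplicated two-way layout can be reproduced from row and column means. The sketch below arranges the six sales values as a 2×3 array (rows: ad = 1 and ad = 0; columns: design a, b, c) and applies F0 = MSA/MSE with (a−1)(b−1) error degrees of freedom; the array layout and variable names are illustrative.

```python
import numpy as np
from scipy import stats

# rows: ad = 1, ad = 0; columns: design a, b, c
y = np.array([[23.0, 15.0, 18.0],
              [16.0, 9.0, 11.0]])
grand = y.mean()
n_ad, n_design = y.shape

# Sums of squares from the column (design) and row (ad) means.
ss_design = n_ad * np.sum((y.mean(axis=0) - grand) ** 2)
ss_ad = n_design * np.sum((y.mean(axis=1) - grand) ** 2)
ss_total = np.sum((y - grand) ** 2)
ss_error = ss_total - ss_design - ss_ad

df_design, df_ad = n_design - 1, n_ad - 1
df_error = df_design * df_ad

f_design = (ss_design / df_design) / (ss_error / df_error)
f_ad = (ss_ad / df_ad) / (ss_error / df_error)
p_design = stats.f.sf(f_design, df_design, df_error)
p_ad = stats.f.sf(f_ad, df_ad, df_error)

print(ss_design, ss_ad, ss_error)
print(f_design, p_design)
print(f_ad, p_ad)
```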
In [91]: sns.boxplot(x="design",y="y",data=df)
<AxesSubplot:xlabel='design', ylabel='y'>
Out[91]:

In [92]: sns.boxplot(x="ad",y="y",data=df)
<AxesSubplot:xlabel='ad', ylabel='y'>
Out[92]:
Two-way ANOVA with replication

Out[93]:

Out[94]:
In [101]: import pandas as pd

import statsmodels.formula.api as smf
import statsmodels.api as sm
pressure = np.array([200,220,240,200,220,240,200,220,240,200,220,240])
temp=np.array(["low","low","low","low","low","low","high","high","high","hig
y=np.array([90.4,90.7,90.2,90.2,90.1,90.4,92.2,91.6,90.5,93.7,91.8,92.8])
df=pd.DataFrame({
'pressure': pressure,
'temp':temp,
'y':y
})
df.head()

Out[101]: pressure temp y

0 200 low 90.4
1 220 low 90.7
2 240 low 90.2
3 200 low 90.2
4 220 low 90.1
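In [102]: formula='y~C(temp)+C(pressure)+C(temp):C(pressure)'
model=smf.ols(formula,data=df).fit()
aov_table=sm.stats.anova_lm(model,typ=2)
print(aov_table)
sum_sq df F PR(>F)
C(temp) 9.363333 1.0 14.009975 0.009589
C(pressure) 1.011667 2.0 0.756858 0.509201
C(temp):C(pressure) 1.171667 2.0 0.876559 0.463473
Residual 4.010000 6.0 NaN NaN
At significance level 0.05, temp is significant, while neither pressure nor the interaction is significant.
An independent variable that should be treated as categorical must be wrapped in C() in the formula. This matters for numerically coded factors such as pressure; string-valued columns such as temp are treated as categorical automatically.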
In [103… from statsmodels.graphics.factorplots import interaction_plot
fig1=interaction_plot(x=temp,trace=pressure,response=y,xlabel='temperature')
fig2=interaction_plot(x=pressure,trace=temp,response=y,xlabel='pressure')

In [104]: Image('/Users/mind/downloads/img16_1.jpeg')
Out[104]:

Out[105]:

Out[106]:
Out[107]:

In [108]: # scatter plot and correlation coefficient
import numpy as np
x=np.array([1.9,0.8,1.1,0.1,-0.1,4.4,4.6,1.6,5.5,3.4])
y=np.array([0.7,-1.0,-0.2,-1.2,-0.1,3.4,4.0,0.8,3.7,2.0])
In [109]: plt.scatter(x,y,label="stars",color="green",marker="+")
Out[109]: <matplotlib.collections.PathCollection at 0x7fa1191053a0>
In [112]: # the values at row 1, column 2 and at row 2, column 1 show a strong positive correlation
r=np.corrcoef(x,y)
r
Out[112]: array([[1. , 0.96170475],
[0.96170475, 1. ]])
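The off-diagonal entry of `np.corrcoef` is the Pearson correlation coefficient, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. A short check of that formula against NumPy for the same data:

```python
import numpy as np

x = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4])
y = np.array([0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0])

# Deviations from the means.
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```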

df=pd.DataFrame()
df['x']=x
df['y']=y
lm_ex=smf.ols("y~x",df).fit()
lm_ex.summary()
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:1
541: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n
=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "

Out[113]: OLS Regression Results

Dep. Variable: y R-squared: 0.925
Model: OLS Adj. R-squared: 0.915
Method: Least Squares F-statistic: 98.49
Date: Sat, 17 Dec 2022 Prob (F-statistic): 8.98e-06
Time: 09:39:16 Log-Likelihood: -7.3987
No. Observations: 10 AIC: 18.80
Df Residuals: 8 BIC: 19.40
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.9725 0.284 -3.428 0.009 -1.627 -0.318
x 0.9367 0.094 9.924 0.000 0.719 1.154
Omnibus: 0.650 Durbin-Watson: 1.498
Prob(Omnibus): 0.722 Jarque-Bera (JB): 0.556
Skew: 0.453 Prob(JB): 0.757
Kurtosis: 2.283 Cond. No. 5.09
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
R^2 = 0.925
Intercept (β0) = -0.9725, x (β1) = 0.9367
Estimated model
y=-0.9725+0.9367x
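The fitted coefficients can be cross-checked without statsmodels. The sketch below uses `np.polyfit` as a swapped-in least-squares routine (not the `smf.ols` call of the notebook) on the same data; for a simple regression, R² equals the squared correlation coefficient.

```python
import numpy as np

x = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4])
y = np.array([0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0])

# Degree-1 least-squares fit; coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, deg=1)
r_squared = np.corrcoef(x, y)[0, 1] ** 2

print("intercept:", intercept, "slope:", slope, "R^2:", r_squared)
```

The slope is the `coef` column of the summary table; the 0.094 next to it is the standard error, not the coefficient.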
In [114]: model_ex=sm.OLS(y,x) # no intercept
result_ex=model_ex.fit()
print(result_ex.summary())
print(result_ex.predict(x))
OLS Regression Results
Dep. Variable: y R-squared (uncentered): 0.870
Model: OLS Adj. R-squared (uncentered): 0.856
Method: Least Squares F-statistic: 60.29
Date: Sat, 17 Dec 2022 Prob (F-statistic): 2.81e-05
Time: 09:46:23 Log-Likelihood: -11.917
No. Observations: 10 AIC: 25.83
Df Residuals: 9 BIC: 26.14
Df Model: 1
coef std err t P>|t| [0.025 0.975]
x1 0.6860 0.088 7.765 0.000 0.486 0.886
Skew: -0.008 Prob(JB): 0.882
Kurtosis: 2.223 Cond. No. 1.00
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[ 1.30331969 0.54876618 0.7545535 0.06859577 -0.06859577 3.01821401
3.15540555 1.09753237 3.77276751 2.33225628]
In [117]: # same as above
lm_reg=smf.ols(formula='y~x+0',data=df).fit()
lm_reg.summary()
Dep. Variable: y R-squared (uncentered): 0.870
Model: OLS Adj. R-squared (uncentered): 0.856
Date: Sat, 17 Dec 2022 Prob (F-statistic): 2.81e-05
Df Model: 1
coef std err t P>|t| [0.025 0.975]
x 0.6860 0.088 7.765 0.000 0.486 0.886
Skew: -0.008 Prob(JB): 0.882
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Estimated regression models
With intercept: y = -0.9725+0.9367x
Without intercept: y = 0.6860x
In [118]: # try fitting a quadratic polynomial regression model
import numpy as np
x=np.array([1.9,0.8,1.1,0.1,-0.1,4.4,4.6,1.6,5.5,3.4])
y=np.array([0.7,-1.0,-0.2,-1.2,-0.1,3.4,4.0,0.8,3.7,2.0])
x2 = x**2

df=pd.DataFrame()
df['x']=x
df['x2']=x2
df['y']=y
lm_ex=smf.ols("y~x+x2",df).fit()
lm_ex.summary()
Dep. Variable: y R-squared: 0.926
Date: Sat, 17 Dec 2022 Prob (F-statistic): 0.000112
Df Model: 2
coef std err t P>|t| [0.025 0.975]
Intercept -0.9053 0.381 -2.375 0.049 -1.807 -0.004
x 0.8280 0.391 2.119 0.072 -0.096 1.752
x2 0.0206 0.072 0.288 0.782 -0.149 0.190
Skew: 0.257 Prob(JB): 0.778
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Estimated regression model
y=-0.9053+0.8280x+0.0206x^2
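Adding the squared term barely changes the fit: R² moves from 0.925 to 0.926 and the x2 coefficient has a p-value of 0.782. The sketch below compares the two fits using `np.polyfit` as a swapped-in fitting routine; `r_squared` is a hypothetical helper.

```python
import numpy as np

x = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4])
y = np.array([0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 4.0, 0.8, 3.7, 2.0])

def r_squared(coeffs):
    """Coefficient of determination of a polynomial fit on (x, y)."""
    residuals = y - np.polyval(coeffs, x)
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

linear = np.polyfit(x, y, deg=1)
quadratic = np.polyfit(x, y, deg=2)  # coefficients ordered x^2, x, constant
c2, c1, c0 = quadratic

print("quadratic fit:", c0, c1, c2)
print("R^2 linear:", r_squared(linear), "R^2 quadratic:", r_squared(quadratic))
```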
Multiple regression analysis (multiple regression model)
In [120… import pandas as pd; import scipy.stats as ss; import seaborn as sns
all=pd.DataFrame({'temp':[195,179,205,204,167,184,187],
'pressure':[57,61,60,62,61,59,62],
'intensity':[81.4,122.2,170.7,175.6,150.3,96.8,169.8]})
scatter=sns.PairGrid(all)
scatter.map(sns.scatterplot)
<seaborn.axisgrid.PairGrid at 0x7fa0c8326bb0>
Out[120]:

In [122]: prod=smf.ols(formula='intensity~temp+pressure',data=all).fit()
prod.summary()
print(prod.summary())
OLS Regression Results
Dep. Variable: intensity R-squared: 0.826
Date: Sat, 17 Dec 2022 Prob (F-statistic): 0.0303
No. Observations: 7 AIC: 63.58
Df Residuals: 4 BIC: 63.42
Df Model: 2
coef std err t P>|t| [0.025 0.975]
Intercept -1180.2817 304.169 -3.880 0.018 -2024.790 -335.773
temp 0.9945 0.587 1.695 0.165 -0.635 2.624
pressure 18.7559 4.475 4.192 0.014 6.333 31.179
Omnibus: nan Durbin-Watson: 3.417
Prob(Omnibus): nan Jarque-Bera (JB): 0.626
Skew: 0.364 Prob(JB): 0.731
Kurtosis: 1.728 Cond. No. 8.16e+03
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
/Users/mind/opt/anaconda3/lib/python3.9/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 7 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
안양대학교 2017E8122 이용준

Data processing practice assignment
In [6]: s = [20,22,18,19,7,6]
sorted(s)
Out[6]: [6, 7, 18, 19, 20, 22]
In [5]: s.sort()
s
Out[5]: [6, 7, 18, 19, 20, 22]
In [9]: s = [20,22,18,19,7,6]
sorted(s,reverse=True)
Out[9]: [22, 20, 19, 18, 7, 6]
In [10]: s = [20,22,18,19,7,6,22,19,6]
set(s)
Out[10]: {6, 7, 18, 19, 20, 22}

import scipy.stats as stats
In [12]: x1=np.random.normal(loc=0,scale=1,size=1000)
x2=np.random.normal(loc=0,scale=1,size=1000)
In [13]: stats.shapiro(x1)
ShapiroResult(statistic=0.9983627200126648, pvalue=0.4683176875114441)
Out[13]:
In [14]: stats.bartlett(x1,x2)
BartlettResult(statistic=6.649219979740087, pvalue=0.009919924282583092)
Out[14]:
In [15]: stats.levene(x1,x2)
LeveneResult(statistic=8.886769444543772, pvalue=0.0029071468929051667)
Out[15]:

%matplotlib inline
In [19]: x= np.array([0,1,2,3,4,5,6,7,8,9])
y=(x**2)/10
plt.plot(x,y,color='blue')
plt.title("quadratic plot")
plt.xlabel("x")
plt.ylabel("y")
Text(0, 0.5, 'y')

Out[19]:

In [22]: pi=3.14
x=np.arange(0,20)*pi/10
y=np.sin(x)
plt.plot(x,y,color='blue')
plt.title("sin plot")
plt.xlabel("x")
plt.ylabel("sin x")
Text(0, 0.5, 'sin x')

Out[22]:
In [23]: import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
In [25]: fig=plt.figure(figsize=(10,10))
ax=Axes3D(fig)
x1=np.arange(-3,4,0.1)
x2=np.arange(-4,5,0.1)
x1, x2 = np.meshgrid(x1, x2)
y=(x1**2+x2**2+x1*x2)
ax.plot_surface(x1,x2,y,rstride=1,cstride=1,cmap=plt.cm.hot)
ax.contourf(x1,x2,y,zdir='z',offset=-2,cmap=plt.cm.hot)
ax.set_zlim(-2.50)
plt.show()
py:2: MatplotlibDeprecationWarning: Axes3D(fig) adding itself to the figure is deprecated since 3.4. Pass the keyword argument auto_add_to_figure=False and use fig.add_axes(ax) to suppress this warning. The default value of auto_add_to_figure will change to False in mpl3.5 and True values will no longer work in 3.6. This is consistent with other Axes classes.
ax=Axes3D(fig)

x=np.arange(-3,3.1,step=0.1)
plt.plot(x,stats.norm.pdf(x=x),color='black',linestyle='dotted')
plt.plot(x,stats.t.pdf(x=x,df=5),color='green')
[<matplotlib.lines.Line2D at 0x7f79f0c9cee0>]
Out[27]:


[<matplotlib.lines.Line2D at 0x7f79f0c13460>]
Out[28]:

[<matplotlib.lines.Line2D at 0x7f79e6a41a90>]
Out[31]:


In [77]: sigma_sample=sp.sqrt(sigma_2_sample) # 표본 표준편차 구하는 방법

In [78]: #scipy 로 표본 표준편차 구하기

In [82]: sp.amax(fish_data) # 최대값 구하기

localhost:8888/nbconvert/html/Desktop/anaconda/파이썬 정리.ipynb?download=false 14/71

In [83]: sp.amin(fish_data) # 최소값 구하기

In [84]: sp.median(fish_data) # 중앙값 구하기

소수표현 round(x,3) 소수 셋째자리까지의 표현

In [88]: stats.scoreatpercentile(fish_data_3,75) # 제 75 백분위수