
WINTER 2022

STA 238

Open Class: Class 1
By Alex

Disclaimers

This copy of the handout and its content are the intellectual property of SavvyUni Edu., UTSG campus, 2020. All rights reserved.

This handout is intended to be used as supplementary study material for the class content taught in school. The purpose of this handout is to help students strengthen their knowledge of the subject area by clarifying concepts, summarizing key points, and providing additional practice. This handout is NOT a direct substitute for any course material, lecture notes, problem sets, or past exams provided by professors, school programs, or other publicly available resources.

You may not, except as agreed upon, distribute or commercially exploit this handout and associated content. Any form of redistribution or reproduction of part or all of this handout in any form is prohibited. SavvyUni Edu. reserves the right to seek all remedies available for any violations.

Winter 2022 STA238 Open Class

Topic: Exploratory Data Analysis

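1 Course Description

STA238 builds on STA237 and is your first true data analysis course. It will gradually train you to become a "statistician" who analyzes many things in everyday life with statistical thinking. STA238 introduces many core statistical concepts, such as Maximum Likelihood Estimation, Confidence Interval, and Linear Regression, which recur throughout later STA courses and can be considered the foundational core knowledge of data analysis.

In studying STA238, the core materials are the textbook, lecture notes, end-of-chapter exercises, and tests. As you study, be sure to start from the basic definitions and methods rather than blindly grinding through problems. Also, be sure to summarize and organize every concept you do not understand or have not mastered, along with every problem you got wrong; these are valuable material for pre-exam review.
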
2 Review of Statistical Concepts

2.1 Population vs. Sample, Parameter vs. Estimate vs. Estimator

Population: all members of a specified group. It is what we are interested in but cannot directly observe.

Everyone we want to study, but whom we cannot observe.

Parameter: an unknown true quantity about the population.

A quantity in the population that we want to know but do not.

Sample: a part, or a subset, of a population. We use information from the sample to make inferences about the population.

The sample collected in an experiment, used to help infer the unknown quantities in the population.

Statistic: a value computed from the sample data to describe a feature of the data.

An estimate of a parameter, obtained by computing on the experimental data.

Outlier: an unusually large or small observation in your data that may not be representative of the population.
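
As a rough illustration of how these terms relate, the sketch below simulates a population, draws a sample from it, and computes a statistic. The population itself (10,000 normally distributed heights with mean 170 and standard deviation 2), the seed, and all variable names are assumptions made up for this illustration. In practice the population cannot be observed and the parameter stays unknown.

```python
import random
import statistics

random.seed(238)  # arbitrary fixed seed so reruns give the same numbers

# Hypothetical population: 10,000 simulated heights in cm (not real data).
population = [random.gauss(170.0, 2.0) for _ in range(10_000)]

# Parameter: the true population mean. We can compute it here only because
# the population was simulated; normally this value is unknown.
mu = statistics.fmean(population)

# Sample: a randomly chosen subset of the population.
sample = random.sample(population, k=10)

# Statistic: a value computed from the sample. The rule "take the sample
# mean" is the estimator; the number it returns for this sample is the
# estimate of mu.
x_bar = statistics.fmean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The two printed numbers will generally be close but not equal, which is why inference has to account for sampling variability.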

3 Statistical Process

1. Formulate a research question

What do you want to study?

2. Design your experiment/study

How do you plan to study it?

3. Collect your data

4. Exploratory data analysis

Get familiar with your data through calculations and plots.

5. Inference on data

Try to use the data to answer your research question.

6. Draw conclusions

4 Exploratory Data Analysis

Example 1. We surveyed 10 students and recorded their heights below:

169.1 170.9 167.6 170.5 168.6 169.8 170.3 171.0 166.6 172.1

What are some statistics that you can use to help describe this data sample?

4.1 Numerical Summaries - Centre of Data

Sample Mean:

$$\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Sample Median:

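a number such that half of the values are smaller than the median and half of the values are larger than the median. With the data sorted, this is the middle value when n is odd and the average of the two middle values when n is even.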
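
To make the two measures of centre concrete, here is a minimal sketch that computes them for the ten heights from Example 1. The variable names are chosen for this illustration, and the median uses the usual convention of averaging the two middle values when n is even.

```python
# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]

n = len(heights)

# Sample mean: the sum of the observations divided by n.
mean = sum(heights) / n

# Sample median: the middle of the sorted data.
ordered = sorted(heights)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]                           # odd n: middle value
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of middle two

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
```

For this data the mean is 169.65 and the median is 170.05, so the two measures of centre are close to each other.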


4.2 Numerical Summaries - Variability of Data

Sample Variance:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$

Sample Standard Deviation:

$$s = \sqrt{s^2}$$
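
The sample variance and sample standard deviation can be worked out step by step for the Example 1 heights. This is a sketch with illustrative variable names; the calls to Python's `statistics.variance` and `statistics.stdev` are there only as a cross-check, since both use the same n - 1 divisor.

```python
import math
import statistics

# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]

n = len(heights)
mean = sum(heights) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in heights]
s2 = sum(squared_deviations) / (n - 1)

# Sample standard deviation: the square root of the sample variance.
s = math.sqrt(s2)

print(f"s^2 = {s2:.4f}")
print(f"s   = {s:.4f}")

# Cross-check against the standard library (same n - 1 convention).
print(math.isclose(s2, statistics.variance(heights)))
print(math.isclose(s, statistics.stdev(heights)))
```

Here the squared deviations sum to 25.465, so s^2 = 25.465 / 9, roughly 2.83, and s is roughly 1.68.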

Inter-Quartile Range (IQR):

$$\mathrm{IQR} = q_n(0.75) - q_n(0.25)$$

$$q_n(p) = x_{(k)} + \alpha \left( x_{(k+1)} - x_{(k)} \right)$$

$$k = \lfloor p(n+1) \rfloor \quad \text{and} \quad \alpha = p(n+1) - k$$

Range:

range = max - min
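
The quantile rule above can be sketched directly in code and applied to the Example 1 heights to get the IQR and the range. The function name `empirical_quantile` and the error handling are assumptions made for this illustration; the sketch only covers values of p for which both x_(k) and x_(k+1) exist.

```python
import math

# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]


def empirical_quantile(data, p):
    """q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)), with k = floor(p(n+1))."""
    ordered = sorted(data)          # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)
    position = p * (n + 1)
    k = math.floor(position)
    alpha = position - k
    if k < 1 or k >= n:
        raise ValueError("p is too extreme for this sample size")
    # Python lists are 0-indexed, so x_(k) is ordered[k - 1].
    return ordered[k - 1] + alpha * (ordered[k] - ordered[k - 1])


q1 = empirical_quantile(heights, 0.25)
q3 = empirical_quantile(heights, 0.75)
iqr = q3 - q1
data_range = max(heights) - min(heights)

print(f"q(0.25) = {q1:.3f}")
print(f"q(0.75) = {q3:.3f}")
print(f"IQR     = {iqr:.3f}")
print(f"range   = {data_range:.1f}")
```

With n = 10, p = 0.25 gives k = 2 and alpha = 0.75, so q(0.25) = 167.6 + 0.75 (168.6 - 167.6) = 168.35. Likewise q(0.75) = 170.925, which gives an IQR of 2.575, while the range is 5.5. Other textbooks and software packages use slightly different quantile conventions, so their quartiles may differ a little.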

Example 2. Using the same data as in Example 1, what are some graphs that can help visualize this set of data?

4.3 Graphical Summaries

Histogram:

Bins:

$$B_i = [x_0 + (i-1)b, \; x_0 + ib)$$

Proportion of observations in bin $B_i$:

$$\frac{\text{the number of } x_j \text{ in } B_i}{n}$$
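
The binning rule can be tried out on the Example 1 heights with a short sketch. The starting point x0 = 166, the bin width b = 1, and the number of bins are hypothetical choices made only for this illustration; the handout does not specify them. The sketch prints a text histogram rather than drawing a plot.

```python
import math

# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]

x0 = 166.0     # left edge of the first bin (hypothetical choice)
b = 1.0        # bin width (hypothetical choice)
num_bins = 7   # enough bins to cover [166, 173)

# Count how many observations fall in each B_i = [x0 + (i-1)b, x0 + ib).
counts = [0] * num_bins
for x in heights:
    i = math.floor((x - x0) / b) + 1   # 1-based bin index
    if 1 <= i <= num_bins:
        counts[i - 1] += 1

n = len(heights)
proportions = [c / n for c in counts]

for i, (c, prop) in enumerate(zip(counts, proportions), start=1):
    left = x0 + (i - 1) * b
    right = x0 + i * b
    print(f"B{i} = [{left:.1f}, {right:.1f}): {'#' * c} ({prop:.1f})")
```

Because each bin is closed on the left and open on the right, the observation 171.0 is counted in [171, 172) and not in [170, 171).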

Empirical Cumulative Distribution Function:

$$F_n(x) = \frac{\text{number of elements in the dataset} \le x}{n}$$
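
A minimal sketch of the empirical cumulative distribution function, evaluated on the Example 1 heights. The function name `ecdf` and the evaluation points are chosen for this illustration.

```python
# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]


def ecdf(data, x):
    """F_n(x): the fraction of observations that are less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


# F_n is 0 below the smallest observation, jumps by 1/n at each
# observation, and reaches 1 at the largest observation.
for x in (166.5, 169.8, 171.0, 172.1):
    print(f"F_n({x}) = {ecdf(heights, x):.1f}")
```

For example, five of the ten heights are at most 169.8, so F_n(169.8) = 0.5.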

Box-and-Whisker Plot:

A boxplot displays the distribution of data based on a five-number summary: the minimum, the first quartile, the median, the third quartile, and the maximum.
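
The five numbers behind a boxplot can be computed for the Example 1 heights as sketched below. The quartiles use the empirical quantile rule q_n(p) with k = floor(p(n + 1)) from section 4.2; the function and variable names are assumptions for this illustration, and plotting software may use a different quartile convention.

```python
import math

# Heights from Example 1.
heights = [169.1, 170.9, 167.6, 170.5, 168.6,
           169.8, 170.3, 171.0, 166.6, 172.1]


def empirical_quantile(data, p):
    """q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)), with k = floor(p(n+1))."""
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1)
    k = math.floor(position)
    alpha = position - k
    if k < 1 or k >= n:
        raise ValueError("p is too extreme for this sample size")
    # ordered is 0-indexed, so x_(k) is ordered[k - 1].
    return ordered[k - 1] + alpha * (ordered[k] - ordered[k - 1])


five_number_summary = {
    "min": min(heights),
    "Q1": empirical_quantile(heights, 0.25),
    "median": empirical_quantile(heights, 0.50),
    "Q3": empirical_quantile(heights, 0.75),
    "max": max(heights),
}

for name, value in five_number_summary.items():
    print(f"{name:>6}: {value:.3f}")
```

In a boxplot the box spans Q1 to Q3 with a line at the median, so its length equals the IQR.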
