EDX S1 Lecture Note: 1 Basic Statistics

The document provides an overview of basic statistics concepts including types of data, measures of central tendency, measures of spread, and representation of data. It discusses raw and grouped data, frequency tables, interpolation, modes, means, medians, quartiles, variance, standard deviation, histograms, box plots, and comparing data.

Uploaded by

lasnieyan

EDX S1 Lecture Note

Yan Jiaqi

Updated August 17, 2023

1 Basic Statistics
1. Types of data.

• Numerical
    • Continuous (measured): heights, weights, …
    • Discrete (counted): number of pets, …
• Non-numerical (nouns): colors, animals, …

2. Raw data vs. grouped data.

• Raw data is data that has not been processed for use.
• Grouped data is data that is organized into a number of groups.

3. Frequency table.

t (min)    Frequency          t (min)    Frequency
300-349    3                  300-350    3
350-399    6                  350-400    6
400-449    10                 400-450    10
450-499    7                  450-500    7
500-549    5                  500-550    5

Table 1: gapped               Table 2: ungapped

For Table 1, gap = 350 − 349 = 1. For Table 2, gap = 0 (hence it is ungapped).

(a) Class. Each row of the frequency table is called a class.


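(b) Class Boundary. The lower boundary (LB) and upper boundary (UB) are the class limits extended outwards by one half of the gap. For example, the class 350-399 in Table 1 has LB = 349.5 and UB = 399.5.
(c) Class Interval. LB ⩽ t < UB.
(d) Class Width = UB − LB.
(e) Class Mid-value = $\frac{1}{2}(\text{LB} + \text{UB})$.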
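
The definitions in (b) to (e) can be checked numerically. The following is a minimal Python sketch, not part of the original note; the function name `class_stats` is hypothetical, and it assumes each class is written as "lower-upper" with a known gap.

```python
def class_stats(lower, upper, gap):
    """Return (LB, UB, width, mid-value) for a class written as lower-upper."""
    lb = lower - gap / 2   # lower boundary: extend outwards by half the gap
    ub = upper + gap / 2   # upper boundary: extend outwards by half the gap
    return lb, ub, ub - lb, (lb + ub) / 2

# Class 350-399 from Table 1 (gap = 1)
print(class_stats(350, 399, 1))   # (349.5, 399.5, 50.0, 374.5)
# Class 350-400 from Table 2 (gap = 0)
print(class_stats(350, 400, 0))   # (350.0, 400.0, 50.0, 375.0)
```

Both versions of the class have the same width, 50; only the boundaries and the mid-value shift by half the gap.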

4. Measure of central tendency.

A value that describes the centre of a set of data.

            raw data                             grouped data
Mode        the value that occurs most often     modal class
Median      the $\frac{n+1}{2}$-th value         the $\frac{n}{2}$-th value, found by interpolation
Mean        $\mu = \frac{\sum x}{n}$             $\mu = \frac{\sum x \cdot f}{\sum f}$
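
As a rough illustration of these measures, the Python sketch below computes them for a small raw data set and for the grouped data of Table 1. The raw values are hypothetical, and taking x to be the class mid-value in the grouped mean is an assumption that the note does not state explicitly.

```python
from statistics import mode

# Hypothetical raw data (not from the note), with an odd number of values
raw = [2, 3, 3, 5, 7]
n = len(raw)
ordered = sorted(raw)

raw_mode = mode(raw)                    # the value that occurs most often
raw_median = ordered[(n + 1) // 2 - 1]  # the (n+1)/2-th value (1-based); valid as written for odd n
raw_mean = sum(raw) / n                 # sum of x divided by n

# Grouped data from Table 1; x is taken as each class mid-value (assumption)
classes = ["300-349", "350-399", "400-449", "450-499", "500-549"]
mid_values = [324.5, 374.5, 424.5, 474.5, 524.5]
freqs = [3, 6, 10, 7, 5]

modal_class = classes[freqs.index(max(freqs))]   # class with the highest frequency
grouped_mean = sum(x * f for x, f in zip(mid_values, freqs)) / sum(freqs)

print(raw_mode, raw_median, raw_mean)        # 3 3 4.0
print(modal_class, round(grouped_mean, 2))   # 400-449 432.56
```

Picking the modal class by highest frequency works here because all classes in Table 1 have the same width.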

• Interpolation.
Interpolation is used to find measures of location for grouped data. It assumes that the data values are evenly distributed within each class.

Example (Table 1): the median is the $\frac{n}{2}$-th value, where $\frac{n}{2} = \frac{3 + 6 + 10 + 7 + 5}{2} = 15.5$.

t (min)    Frequency
300-349    3
350-399    6
400-449    10
450-499    7
500-549    5

Position   9        15.5      19
Value      399.5    Median    449.5

$\frac{15.5 - 9}{\text{Median} - 399.5} = \frac{19 - 9}{449.5 - 399.5}$, which gives Median = 432.
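
The proportion above can be solved mechanically. The following Python sketch is one possible way to do it, assuming the class boundaries of Table 1 (limits extended by half the gap); the helper name `interpolate` is hypothetical.

```python
def interpolate(position, boundaries, freqs):
    """Value at a cumulative position, assuming values are evenly spread within each class."""
    cum = 0  # cumulative frequency below the current class
    for (lb, ub), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= position:
            # Move the same fraction through the class as through its frequency
            return lb + (position - cum) / f * (ub - lb)
        cum += f
    raise ValueError("position exceeds total frequency")

# Class boundaries for Table 1 (gap = 1, so each limit moves outwards by 0.5)
boundaries = [(299.5, 349.5), (349.5, 399.5), (399.5, 449.5),
              (449.5, 499.5), (499.5, 549.5)]
freqs = [3, 6, 10, 7, 5]

n = sum(freqs)                                   # 31
median = interpolate(n / 2, boundaries, freqs)   # position 15.5
print(median)                                    # 432.0
```

The 15.5-th value lies in the class 400-449, which starts at cumulative frequency 9, so the result is 399.5 + (6.5 / 10) × 50 = 432.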
5. Measure of other locations.
A value that describes other positions of a set of data.

Min         Q1          Q2          Q3         Max
 |    25%    |    25%    |    25%    |    25%    |

              raw data                                grouped data
Quartile      Median-and-Median                       Q1: the $\frac{n}{4}$-th value.
                                                      Q2: the $\frac{n}{2}$-th value.
                                                      Q3: the $\frac{3n}{4}$-th value.
                                                      Interpolation.
Percentile    $P_t$: the $(t\% \cdot n)$-th value.    Interpolation.
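
For grouped data, quartiles and percentiles follow the same interpolation idea, only with a different position. The Python sketch below applies it to the frequencies of Table 1 under the assumption of evenly spread values within each class; the names `value_at` and `quantile` are hypothetical, and the raw-data method is not covered.

```python
def value_at(position, boundaries, freqs):
    """Interpolated value at a cumulative position in grouped data."""
    cum = 0  # cumulative frequency below the current class
    for (lb, ub), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= position:
            return lb + (position - cum) / f * (ub - lb)
        cum += f
    raise ValueError("position exceeds total frequency")

# Class boundaries and frequencies for Table 1 (gap = 1)
boundaries = [(299.5, 349.5), (349.5, 399.5), (399.5, 449.5),
              (449.5, 499.5), (499.5, 549.5)]
freqs = [3, 6, 10, 7, 5]
n = sum(freqs)

def quantile(fraction):
    """Value at position fraction * n, e.g. 0.25 for Q1 or 0.90 for the 90th percentile."""
    return value_at(fraction * n, boundaries, freqs)

q1 = quantile(0.25)    # the n/4-th value
q3 = quantile(0.75)    # the 3n/4-th value
p90 = quantile(0.90)   # the (90% of n)-th value

print(round(q1, 2), round(q3, 2), round(p90, 2))   # 389.08 479.86 518.5
```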

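6. Measures of Spread.
A value that describes how spread out a set of data is.

The following apply to both raw data and grouped data:

Range                    Range = Max − Min
IQR                      IQR = Q3 − Q1
Interpercentile Range    IPR = $P_t - P_s$

The variance depends on the type of data:

Variance ($\sigma^2$), raw data:        $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
Variance ($\sigma^2$), grouped data:    $\sigma^2 = \frac{\sum x^2 \cdot f}{\sum f} - \left(\frac{\sum x \cdot f}{\sum f}\right)^2$
Standard Deviation ($\sigma$):          the square root of the variance.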
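
The variance formulas can be tried out directly. The Python sketch below uses a hypothetical raw data set and the grouped data of Table 1, with x taken as the class mid-value (an assumption). The cross-check uses `statistics.pvariance`, which computes the population variance.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical raw data (not from the note)
raw = [2, 3, 3, 5, 7]
n = len(raw)
var_raw = sum(x ** 2 for x in raw) / n - (sum(raw) / n) ** 2   # mean of squares minus square of mean
sd_raw = sqrt(var_raw)

# Grouped data from Table 1; x is the class mid-value (assumption)
mid_values = [324.5, 374.5, 424.5, 474.5, 524.5]
freqs = [3, 6, 10, 7, 5]
total_f = sum(freqs)
var_grouped = (sum(x ** 2 * f for x, f in zip(mid_values, freqs)) / total_f
               - (sum(x * f for x, f in zip(mid_values, freqs)) / total_f) ** 2)
sd_grouped = sqrt(var_grouped)

# Cross-check: repeat each mid-value f times and use the population variance
expanded = [x for x, f in zip(mid_values, freqs) for _ in range(f)]

print(round(var_raw, 4), round(sd_raw, 4))   # 3.2 1.7889
print(abs(var_grouped - pvariance(expanded)) < 1e-6)   # True
```

The formulas divide by n (or by the total frequency), so they give the population variance rather than the sample variance.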


• $S_{xx}$ symbols: $S_{xx} = \sum (x - \bar{x})^2$.
The subscript xx means the product of $(x - \bar{x})$ and $(x - \bar{x})$.
Similarly, we have $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $S_{yy} = \sum (y - \bar{y})^2$.
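
The link between $S_{xx}$ and the raw-data variance of item 6 is not spelled out in the note. A short derivation, assuming $\bar{x}$ denotes the same mean that is written $\mu$ in item 4, might run as follows:

```latex
\begin{align*}
S_{xx} &= \sum (x - \bar{x})^2
        = \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
       &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}
        \qquad \text{since } \bar{x} = \frac{\sum x}{n}, \\
\sigma^2 &= \frac{S_{xx}}{n}
          = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 .
\end{align*}
```

The last line agrees with the raw-data variance formula, so $S_{xx}$ is n times the variance.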

7. Coded data.

8. Combined set of data.

2 Representation of data
1. Histogram.

2. Box plot.

3. Stem and leaf diagrams.

4. Scatter diagram.

3 Analysis of data
1. Outliers.

2. Skewness.

3. Comparing data.

4 Probability

5 Discrete Random variables

6 Normal distributions
