
STAT 245

◮ Course : STAT 245 (Summer 2021)


◮ Class Room : Remotely
◮ Instructor : Dr. Osama Mohammad Al-Bataineh
1. PhD (Stat), Univ. of Sask.
◮ Office : U of Sask
Course Outline

◮ Textbook : Introduction to Applied Statistics for Psychology Students, Gordon Sarty.
◮ Assignments : 15%.
◮ Midterms (mandatory) :
1. Quiz 1 : Tuesday July 13th, 2021.
2. Mid 1 : Thursday July 22nd, 2021.
3. Mid 2 : Tuesday August 10th, 2021.
◮ Final Exam : TBA, 50% (All course material).
Course Outline
◮ Chapter 1 : Background and Motivation.
◮ Chapter 2 : Descriptive statistics : frequency data (Counting).
◮ Chapter 3 : Descriptive statistics : central tendency and dispersion.
◮ Chapter 4 : Probability and the Binomial distributions.
◮ Chapter 5 : The Normal distribution.
◮ Chapter 6 : Percentiles and quartiles.
◮ Chapter 7 : The Central limit theorem.
◮ Chapter 8 : Confidence intervals.
◮ Chapter 9 : Hypothesis testing.
◮ Chapter 10 : Comparing two population means.
◮ Chapter 11 : Comparing proportions.
◮ Chapter 12 : ANOVA.
◮ Chapter 14 : Correlation and Regression.
◮ Chapter 15 : Goodness of Fit and Contingency Tables.
◮ Chapter 16 : Non–parametric Tests.
Chapter 1
Background and Motivation
Section 1.1 : Overview

◮ Section 1.1 : Overview (slide 5).


◮ Section 1.2 : Basic Definitions (slide 7).
◮ Section 1.3 : Summation Convention (slide 10).

◮ Statistics is the art of turning data into useful information.
◮ In a statistical study, we collect, analyze and interpret the data at hand.

◮ Descriptive statistics : The presentation, organization and description of data (Graphs, means, standard deviations, etc.).
Descriptive statistics lead to ideas about probability - we will
cover probabilities as given by functions known as the binomial
distribution and the normal distribution.
◮ Inferential statistics : The use of probability to infer things
about a population from a sample through the use of hypothesis
testing. Why do we need inferential statistics? Because it is
usually impossible to measure (poll) an entire population.
Chapter 1
Background and Motivation
Section 1.2 : Basic Definitions
◮ The Data : The numbers we collect (Note the word data is
plural. Datum is singular.). Data may be grouped into sets,
hence data set.
◮ A Variable : A mathematical term used to denote something
that can take on a range of values. There are two important
types of variables:
1. Independent variable : You set the value, a.k.a.
explanatory variable.
2. Dependent variable : Value set (generally caused)
by the independent variable, a.k.a. outcome variable.
◮ The Random variable : A dependent variable with random
noise added. Value given by a stochastic process. We will only
refer to random variables when discussing the theoretical
relationship between probability distributions. Random
variables, which we will denote with capital letters like X , are
defined by their probability distribution. A stochastic process
produces values that form a probability distribution if you allow
the process that generates their values to run for long enough.

◮ Qualitative variables measure a quality or characteristic on each experimental unit (eye color, score rank, ...).
◮ Quantitative variables measure a numerical quantity or amount on each experimental unit (weight, height, ...).
1. A discrete variable assumes only a countable
number of values (number of family members,
number of registered courses, ...).
2. A continuous variable assumes many values
corresponding to the points on a line interval
(temperature, time, ...).

◮ Univariate data result when a single variable is measured on a single experimental unit.
◮ Bivariate data result when two variables are measured on a
single experimental unit.
◮ Multivariate data result when more than two variables are
measured on a single experimental unit.
Chapter 1
Background and Motivation
Section 1.3 : Summation Convention
◮ The summation notation is $\sum$ (pronounced sigma).

◮ On page (15) :

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \tag{1}$$

$$\left( \sum_{i=1}^{n} x_i \right)^2 = (x_1 + x_2 + \cdots + x_n)^2 \tag{2}$$

$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{3}$$
◮ Example : Annual salaries (in thousands) of four workers are 75, 90, 125, and 61, respectively. Find :

a. $\sum_{i=1}^{4} x_i$
b. $\left( \sum_{i=1}^{4} x_i \right)^2$
c. $\sum_{i=1}^{4} x_i^2$

◮ Sol : $x_1 = 75$, $x_2 = 90$, $x_3 = 125$, $x_4 = 61$.

a. $\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 75 + 90 + 125 + 61 = 351$

b. $\left( \sum_{i=1}^{4} x_i \right)^2 = (75 + 90 + 125 + 61)^2 = (351)^2 = 123{,}201$

c. $\sum_{i=1}^{4} x_i^2 = 75^2 + 90^2 + 125^2 + 61^2 = 33{,}071$
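The difference between the square of the sum (part b) and the sum of the squares (part c) is easy to check numerically. The following is a minimal Python sketch of the salary example; the variable names are illustrative and are not part of the course notes.

```python
# Annual salaries (in thousands) of the four workers in the example.
salaries = [75, 90, 125, 61]

# (a) Sum of the values: x1 + x2 + x3 + x4.
sum_x = sum(salaries)

# (b) Square of the sum: add first, then square once.
square_of_sum = sum_x ** 2

# (c) Sum of the squares: square each value first, then add.
sum_of_squares = sum(x ** 2 for x in salaries)

print(sum_x, square_of_sum, sum_of_squares)  # 351 123201 33071
```

The order of operations is the whole point: (b) and (c) use the same data but give very different numbers.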
◮ Example :

m    12    15    20    30
f     5     9    10    16

◮ Find

a. $\sum_{i=1}^{4} m_i$
b. $\sum_{i=1}^{4} f_i^2$
c. $\sum_{i=1}^{4} m_i f_i$
d. $\sum_{i=1}^{4} m_i^2 f_i$
◮ Sol : a. $\sum_{i=1}^{4} m_i = 12 + 15 + 20 + 30 = 77$

◮ b. $\sum_{i=1}^{4} f_i^2 = 5^2 + 9^2 + 10^2 + 16^2 = 462$

◮ c. $\sum_{i=1}^{4} m_i f_i = m_1 f_1 + m_2 f_2 + m_3 f_3 + m_4 f_4 = 12(5) + 15(9) + 20(10) + 30(16) = 875$

◮ d. $\sum_{i=1}^{4} m_i^2 f_i = m_1^2 f_1 + m_2^2 f_2 + m_3^2 f_3 + m_4^2 f_4 = 12^2(5) + 15^2(9) + 20^2(10) + 30^2(16) = 21{,}145$
◮ Calculations are given in the following table

m      f      $f^2$    $mf$     $m^2 f$
12     5      25       60       720
15     9      81       135      2025
20     10     100      200      4000
30     16     256      480      14,400

Column totals (each sum runs over $i = 1, \dots, 4$) :
$\sum m_i = 77$, $\sum f_i = 40$, $\sum f_i^2 = 462$, $\sum m_i f_i = 875$, $\sum m_i^2 f_i = 21{,}145$
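The column totals in the table can be reproduced with paired sums. This is a small Python sketch, assuming the m and f values are stored as two lists of equal length; the names are chosen here for illustration only.

```python
# Values from the m / f table in the example.
m = [12, 15, 20, 30]
f = [5, 9, 10, 16]

sum_m = sum(m)                                       # sum of m_i
sum_f = sum(f)                                       # sum of f_i
sum_f2 = sum(fi ** 2 for fi in f)                    # sum of f_i^2
sum_mf = sum(mi * fi for mi, fi in zip(m, f))        # sum of m_i * f_i
sum_m2f = sum(mi ** 2 * fi for mi, fi in zip(m, f))  # sum of m_i^2 * f_i

print(sum_m, sum_f, sum_f2, sum_mf, sum_m2f)  # 77 40 462 875 21145
```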
Chapter 2
Descriptive Statistics :
Frequency Data (Counting)

◮ Section 2.1 : Frequency Tables (slide 17).


◮ Section 2.2 : Plotting Frequency Data (slide 29).
◮ Questions of Direct Relevance to Exam : slide 21, 23, 25, ??,
30, 38, 40, 45, 47.
Section 2.1 : Frequency Tables

◮ To determine a frequency table :

1. Determine the classes : define the classes based on
the number of groups you want.
2. To group data into classes :
◮ determine the high data limit, H, and the low data limit, L.
◮ compute the range R = H − L.
◮ compute the class width :

$$W = \frac{R + 1}{G} \tag{4}$$

where G is the number of groups (classes).
3. Begin the frequency table’s first two columns :

◮ Construction of frequency table

Class                        Class Boundaries
L to (L + W − 1)             (L − 0.5) to (L − 0.5 + W)
(L + W) to (L + 2W − 1)      (L − 0.5 + W) to (L − 0.5 + 2W)
...                          ...
                             (H + 0.5 − W) to (H + 0.5)
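The class limits and class boundaries follow mechanically from L, H and G. The sketch below is one way to generate the first two columns in Python. It assumes integer-valued data (so that boundaries sit 0.5 beyond the limits) and that (R + 1)/G is a whole number. The function name `class_table` and the values L = 10, H = 29, G = 4 are hypothetical and chosen only for illustration.

```python
def class_table(low, high, groups):
    """Return (width, rows) where each row is
    ((lower limit, upper limit), (lower boundary, upper boundary))."""
    width = (high - low + 1) / groups  # W = (R + 1) / G with R = H - L
    if width != int(width):
        raise ValueError("this sketch assumes (R + 1) / G is a whole number")
    width = int(width)

    rows = []
    for k in range(groups):
        lower = low + k * width    # L, L + W, L + 2W, ...
        upper = lower + width - 1  # L + W - 1, L + 2W - 1, ...
        rows.append(((lower, upper), (lower - 0.5, upper + 0.5)))
    return width, rows


# Hypothetical limits: L = 10, H = 29, with G = 4 classes.
width, rows = class_table(10, 29, 4)
for (lower, upper), (lb, ub) in rows:
    print(f"{lower}-{upper}   {lb} to {ub}")
```

With these inputs the last upper boundary comes out as H + 0.5, matching the last row of the table.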
4. Construct the frequency table and fill it :

◮ Construction of frequency table

Class   Class Boundaries   Tally   Frequency   Cumulative Frequency   Relative Frequency
                                   a           a                      a/n
                                   b           a + b                  b/n
                                   c           a + b + c              c/n
                                   ...         ...
                                               n
◮ To calculate relative frequency and percentage of a class

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f} \tag{5}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100\% \tag{6}$$

◮ A class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class.
Question of Direct Relevance to Exam

◮ Example 2.1 : 25 army inductees were tested for blood type. The data are :
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Construct a frequency table.


Solution

◮ Solution : Step 1 : classes are given : A, B, O and AB. Then Step 2 : construct the frequency table :

Construction of frequency table

Class   Tally        Frequency   Cumulative Frequency   Relative Frequency
A       |||||        5           5                      5/25 = 0.20
B       |||||||      7           12                     7/25 = 0.28
O       |||||||||    9           21                     9/25 = 0.36
AB      ||||         4           25                     4/25 = 0.16
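For categorical data the tally step is just counting. A minimal Python sketch of Example 2.1 follows, using `collections.Counter` from the standard library. The class order A, B, O, AB is taken from the table, and the variable names are illustrative.

```python
from collections import Counter

# The 25 blood types from Example 2.1, read row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)  # tally of each class
n = len(blood_types)           # total number of observations

table = []
cumulative = 0
for cls in ["A", "B", "O", "AB"]:
    freq = counts[cls]
    cumulative += freq  # running total of the frequencies so far
    table.append((cls, freq, cumulative, freq / n))

for cls, freq, cum, rel in table:
    print(f"{cls:<3} {freq:>2} {cum:>3} {rel:.2f}")
```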
Question of Direct Relevance to Exam

◮ Example 2.2 : Given the high temperature data for each of 50 states for the month of July :

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Construct the frequency table using seven classes.


Solution
◮ Solution : Step 1 :
1. High limit (H) = 134 and Low limit (L) = 100.
2. Range (R) = H − L = 134 − 100 = 34.
3. Class width (W) = $\frac{R + 1}{G} = \frac{34 + 1}{7} = 5$
◮ Step 2 : Frequency Table :
Construction of frequency table

Class     Class Boundaries   Tally      Frequency   Cumulative Frequency   Relative Frequency
100–104   99.5 to 104.5      ||         2           2                      0.04
105–109   104.5 to 109.5     ||||||||   8           10                     0.16
110–114   109.5 to 114.5     etc.       18          28                     0.36
115–119   114.5 to 119.5                13          41                     0.26
120–124   119.5 to 124.5                7           48                     0.14
125–129   124.5 to 129.5                1           49                     0.02
130–134   129.5 to 134.5                1           50                     0.02
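The frequency, cumulative frequency and relative frequency columns can be checked by grouping the raw data. The Python sketch below does this for Example 2.2. It assumes integer data and a class width that divides evenly, and it assigns each value to a class by integer division, which is an implementation choice made here rather than a method stated in the notes.

```python
# The 50 July high temperatures from Example 2.2, row by row.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

groups = 7
low, high = min(temps), max(temps)    # L = 100, H = 134
width = (high - low + 1) // groups    # W = (R + 1) / G = 35 / 7 = 5

# Class k covers low + k*width up to low + (k + 1)*width - 1.
freqs = [0] * groups
for t in temps:
    freqs[(t - low) // width] += 1

cumulative = []
running = 0
for fr in freqs:
    running += fr
    cumulative.append(running)

relative = [fr / len(temps) for fr in freqs]

for k in range(groups):
    lower = low + k * width
    print(f"{lower}-{lower + width - 1}  {freqs[k]:>2}  {cumulative[k]:>2}  {relative[k]:.2f}")
```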
Question of Direct Relevance to Exam

◮ To construct a frequency distribution, we need to find the class width :

$$\text{Class width} = \frac{\text{Largest value} - \text{Smallest value} + 1}{\text{Number of classes}} \tag{7}$$

◮ Example: Look at the following set of data, then construct a frequency distribution table.
8 25 11 15 29 22 10 5 17 21
22 13 26 16 18 12 9 26 20 16
23 14 19 23 20 16 27 16 21 14

$$\text{Approximate class width} = \frac{29 - 5 + 1}{5} = 5$$
Solution

◮ Table : Frequency distribution for the data on iPods sold

iPods Sold   f
5 - 9        f1 = 3
10 - 14      f2 = 6
15 - 19      f3 = 8
20 - 24      f4 = 8
25 - 29      f5 = 5

$\sum_{i=1}^{5} f_i = 30$
Solution


◮ Table : Relative frequency and percentage distributions

iPods Sold   Class Boundaries   Relative Frequency   Percentage
5 - 9        4.5 to 9.5         3/30 = 0.100         10.0
10 - 14      9.5 to 14.5        6/30 = 0.200         20.0
15 - 19      14.5 to 19.5       8/30 = 0.267         26.7
20 - 24      19.5 to 24.5       8/30 = 0.267         26.7
25 - 29      24.5 to 29.5       5/30 = 0.167         16.7
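The relative frequency and percentage columns are each class frequency divided by the total, then multiplied by 100. A short Python sketch, starting from the class frequencies of the iPods example, is given below. Rounding to three and one decimal places is an assumption made to match how the table is printed.

```python
# Class frequencies for 5-9, 10-14, 15-19, 20-24, 25-29.
freqs = [3, 6, 8, 8, 5]
total = sum(freqs)  # sum of all frequencies, 30

# Relative frequency = f / (sum of f); percentage = relative frequency x 100.
rel = [round(f / total, 3) for f in freqs]
pct = [round(100 * f / total, 1) for f in freqs]

print(rel)  # [0.1, 0.2, 0.267, 0.267, 0.167]
print(pct)  # [10.0, 20.0, 26.7, 26.7, 16.7]
```

Because of rounding, the printed relative frequencies add up to 1.001 rather than exactly 1.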


Section 2.2 : Plotting Frequency Data

◮ There are several ways of presenting the same data graphically, the primary way being the histogram :
1. Histogram : plot of frequency data using steps.
2. Frequency polygon : plot of frequency data using
straight lines.
3. Cumulative frequency graph : the cumulative
frequency graph shows the “area under the curve” (of
the traditional histogram) from the beginning of the
first class up to the given point.
4. Pie charts : a pie chart is a round histogram.
5. Stem and leaf plots : a stem and leaf plot is a fancy
kind of histogram that lets you see all your data
instead of just class frequency information.
Question of Direct Relevance to Exam
◮ Example 2.3 : continuing on data given in Ex 2.1 (slide 21).
The data are :
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

We will demonstrate most of the graph types using these data.

Class Frequency Cumulative Relative


Frequency Frequency

A 5 5 5/25 = 0.20
B 7 12 7/25 = 0.28
O 9 21 9/25 = 0.36
AB 4 25 4/25 = 0.16
Solution
1. Histogram :
[Figure: histogram of Frequency against Class (A, B, O, AB)]
[Figure: histogram of Relative Frequency against Class (A, B, O, AB)]
2. Frequency Polygons :
[Figure: frequency polygon of Frequency against Class (A, B, O, AB)]
[Figure: frequency polygon of Relative Frequency against Class (A, B, O, AB)]
3. Cumulative Frequency Graph :
[Figure: cumulative frequency graph of Cumulative Frequency against Class (A, B, O, AB)]
Solution
4. Pie Chart :

Construction of frequency table

Class   Angle
A       0.20 × 360° = 72°
B       0.28 × 360° = 100.8°
O       0.36 × 360° = 129.6°
AB      0.16 × 360° = 57.6°
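Each slice angle is the class's relative frequency times 360°. The Python sketch below computes the angles for the blood type data. It works from the raw frequencies (360 × f / n) rather than from the rounded relative frequencies, which is a choice made here to limit rounding error.

```python
# Frequencies of each blood type from Example 2.1.
freqs = {"A": 5, "B": 7, "O": 9, "AB": 4}
n = sum(freqs.values())  # 25 observations

# Angle of each slice in degrees: relative frequency x 360.
angles = {cls: round(360 * f / n, 1) for cls, f in freqs.items()}

for cls, angle in angles.items():
    print(f"{cls:<3} {angle}")
```

The four angles add up to 360°, which is a quick check on the arithmetic.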
[Figure: Pie Chart of Blood Type Data]
Question of Direct Relevance to Exam

◮ Example :

Rating   Frequency   Relative Frequency   Percent   Angle
A        36          36/400 = 0.09        9%        0.09 × 360° = 32.4°
B        260         260/400 = 0.65       65%       0.65 × 360° = 234.0°
C        92          92/400 = 0.23        23%       0.23 × 360° = 82.8°
D        12          12/400 = 0.03        3%        0.03 × 360° = 10.8°
Solution
◮ A pie chart is a circular graph that shows how the
measurements are distributed among the categories.

[Figure: Pie Chart of Education Quality Rates]
Question of Direct Relevance to Exam
The histogram for data on Slide 25
[Figure: histogram of Frequency against iPods Sold (classes 5−9, 10−14, 15−19, 20−24, 25−29)]
Solution
[Figure: histogram of Relative Frequency against iPods Sold (classes 5−9, 10−14, 15−19, 20−24, 25−29)]
Solution
Histograms have different shapes :
[Figure: four histogram panels titled Symmetric, Right Skewed, Left Skewed and Uniform]
Solution
This polygon is for the Histogram on Slide 40
[Figure: frequency polygon of Frequency against classes 5−9, 10−14, 15−19, 20−24, 25−29]

Section 2.2 : Plotting Frequency Data
Stem and Leaf Plots

5. Stem and Leaf : the steps for making a stem and leaf plot are:
5.1 Order the data (this is a frequently used, tedious, step
for many procedures as we’ll see).
5.2 Divide into classes of 10’s or 5’s (low decade and
high decade).
5.3 Use “leading” and “trailing” digits of the data values to
make the plot.
Question of Direct Relevance to Exam

◮ Example 2.4 : Make a stem and leaf plot for data that are already in order (i.e. with the tedious ordering step 5.1 already done), dividing the data into classes of 5’s :
50 − 54, 55 − 59, 60 − 64, 65 − 69, 70 − 74, 75 − 79
◮ Solution :
1. 50-54 : 50,51,51,52,53,53.
2. 55-59 : 55,55,56,57,57,58,59.
3. 60-64 : 62,63.
4. 65-69 : 65,65,66,66,67,68,69,69.
5. 70-74 : 72,73.
6. 75-79 : 75,75,77,78,79.
Solution
Stem and Leaf Plots

◮ Solution :

Stem and Leaf Method

Stem Leaf

50-54 011233
55-59 5567789
60-64 23
65-69 55667899
70-74 23
75-79 55789
Question of Direct Relevance to Exam

◮ Example :
75 52 80 96 65 79 71 87 93 95 69 72
81 61 76 86 79 68 50 92 83 84 77 64
71 87 72 92 57 98
Solution
A stem-and-leaf display :

5|207
6|59184
7|591269712
8|0716347
9|635228
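The display above uses the tens digit as the stem and the units digit as the leaf. A minimal Python sketch that rebuilds it from the raw data follows. It assumes two-digit values and keeps the leaves in the order the data were given, as the display does. The variable names are illustrative.

```python
from collections import defaultdict

# The 30 data values from the example, row by row.
scores = [
    75, 52, 80, 96, 65, 79, 71, 87, 93, 95, 69, 72,
    81, 61, 76, 86, 79, 68, 50, 92, 83, 84, 77, 64,
    71, 87, 72, 92, 57, 98,
]

# Split each value into its leading digit (stem) and trailing digit (leaf).
stems = defaultdict(list)
for s in scores:
    stem, leaf = divmod(s, 10)
    stems[stem].append(leaf)

# One line per stem, stems in increasing order, leaves in data order.
lines = [
    f"{stem}|{''.join(str(leaf) for leaf in stems[stem])}"
    for stem in sorted(stems)
]
print("\n".join(lines))
```

Looping over `sorted(scores)` instead of `scores` would apply the ordering step 5.1 first, so the leaves within each stem would come out in increasing order.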
