INTRODUCTION TO STATISTICS
Statistics covers five activities: collecting, organizing, presenting, analyzing, and interpreting data.
1. Descriptive Statistics
→ Consists of methods for organizing, displaying, and describing
data by using tables, graphs, and summary measures.
→ Deals with the description and analysis of a given group of data.
→ Presents information in a convenient, usable, and comprehensible
form.
2. Inductive Statistics (Inferential Statistics)
→ Consists of methods that use sample results to make decisions
or predictions about a population.
→ Deals with the problems of making inferences or drawing
conclusions about a population based on information obtained from
samples taken from that population.
PURPOSE OF STATISTICS
Statistic and Parameter
• A summary measure such as mean, median, mode or standard
deviation, computed from sample data is called a statistic.
• A summary measure for the entire population is called a parameter.
• Statisticians often estimate population parameters from the
corresponding sample statistics.
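The distinction can be sketched in Python. This is an illustration only, under two assumptions that are not in the notes: the 30 weekly book counts from the library example later in these notes are treated as the whole population, and the first 10 weeks are taken as a convenience sample (not a random one).

```python
from statistics import mean

# Assumption for illustration: these 30 weekly counts are the entire population.
population = [21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
              89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
              87, 27, 74, 21, 66, 25, 47, 10, 89, 30]

# Assumption for illustration: the first 10 weeks form the sample.
sample = population[:10]

parameter = mean(population)  # population mean: a parameter
statistic = mean(sample)      # sample mean: a statistic that estimates the parameter

print(f"population mean = {parameter:.2f}, sample mean = {statistic:.2f}")
```

The two values differ because the statistic depends on which observations happen to fall in the sample.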
TYPES OF VARIABLES
Discrete
• Numerical response which arises from a counting process.
• E.g. How many mobile phones do you have?
Continuous
• Numerical response which arises from a measuring process.
• E.g. What is your weight?
DATA PRESENTATION
Raw data
• Data collected that have not been organized or processed are called raw
data.
• When every observed value of the random variable is listed, the data are
called ungrouped data.
• Grouping is one of the most common methods of organizing data. When
we group data, we are actually constructing frequency distributions for the
raw data.
Frequency Distribution
Example
The frequency distribution below represents the number of books read by 500
students in a school during one year:
Number of books read Number of students (Frequency)
0–9 52
10 – 19 63
20 – 29 71
30 – 39 96
40 – 49 43
50 – 59 58
60 – 79 72
80 – 99 45
The following are guidelines, not absolute rules, for the construction of
frequency distributions.
*class (exclusive type) is mainly used for continuous data or discrete data
which have been rounded to the nearest tens, hundreds, thousands, millions
etc.
**class (inclusive type) is mainly used for discrete data where there is a gap
between classes.
Example
The following is a record of the number of books borrowed per week in the
library for 30 weeks:
21 47 64 42 89 76 55 100 75 67
89 15 97 25 35 12 92 36 93 34
87 27 74 21 66 25 47 10 89 30
Tabulate the data in the form of a frequency distribution, using a suitable
class size.
Solution:
The variable is the number of books borrowed per week which is discrete.
Number of classes: k = log n / log 2 = log 30 / log 2 = 4.9069. Use k = 5.
Class size: Lowest value L = 10; highest value H = 100
i = (H − L) / k = (100 − 10) / 5 = 18. Use i = 20.
Frequency distribution for the number of books borrowed per week in the library
for 30 weeks:
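The working above can be checked with a minimal Python sketch. It assumes inclusive classes that start at the lowest value (10 – 29, 30 – 49, and so on); the rounded class size of 20 is taken from the solution rather than computed.

```python
import math

books = [21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
         89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
         87, 27, 74, 21, 66, 25, 47, 10, 89, 30]

n = len(books)
k_exact = math.log(n) / math.log(2)  # number of classes before rounding
k = math.ceil(k_exact)               # round up to a whole number of classes

low, high = min(books), max(books)
i_exact = (high - low) / k           # class size before rounding
i = 20                               # convenient size chosen in the solution

# Inclusive classes starting at the lowest value: (10, 29), (30, 49), ...
classes = [(low + j * i, low + (j + 1) * i - 1) for j in range(k)]

# Count the observations that fall inside each class.
freq = [sum(lo <= x <= hi for x in books) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo} - {hi}: {f}")
```

The class frequencies must add up to the 30 weeks recorded, which is a quick check that no observation was missed or counted twice.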
Example
The amount of rainfall (in cm) for a small town was recorded for the month of
December.
Construct a grouped frequency distribution for the data using suitable class
size.
Solution:
The variable is the amount of rainfall which is continuous.
Number of classes: k = log n / log 2 = log 31 / log 2 = 4.9542. Use k = 5.
Class size: Lowest value L = 19.22; highest value H = 33.01
i = (H − L) / k = (33.01 − 19.22) / 5 = 2.758. Use i = 3.
Frequency distribution for the amount of rainfall in the month of December:
Class limits – the smallest and largest possible measurements in each class,
i.e. the lower and upper limits of the class.
Class size / class width = upper class boundary – lower class boundary.
Open-ended first and last classes, which are used to include extreme values,
are an exception.
Open-ended classes – one boundary is not specified, e.g. below 20; 50 and
above. In further calculations, an open-ended class is assumed to be the same
size as its immediate neighboring class.
Class mark or class mid-point – the value exactly at the middle of a class. It
lies halfway between the class limits or the class boundaries.
Example
Class      Class boundaries   Class size           Class mark
10 – 29    9.5 – 29.5         29.5 – 9.5 = 20      19.5
30 – 49    29.5 – 49.5        49.5 – 29.5 = 20     39.5
50 – 69    49.5 – 69.5        69.5 – 49.5 = 20     59.5
70 – 89    69.5 – 89.5        89.5 – 69.5 = 20     79.5
90 – 109   89.5 – 109.5       109.5 – 89.5 = 20    99.5
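The columns of this table can be reproduced with a short Python sketch. The helper name `describe_class` and its `gap` parameter are hypothetical, introduced here for illustration; `gap` is the distance between the upper limit of one class and the lower limit of the next (1 for whole-number counts).

```python
def describe_class(lower_limit, upper_limit, gap=1):
    """Return (boundaries, size, mark) for one inclusive class."""
    # Each boundary sits halfway across the gap between adjacent classes.
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    size = upper_boundary - lower_boundary
    mark = (lower_boundary + upper_boundary) / 2
    return (lower_boundary, upper_boundary), size, mark

limits = [(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)]
rows = [describe_class(lo, hi) for lo, hi in limits]

for (lo, hi), (bounds, size, mark) in zip(limits, rows):
    print(f"{lo} - {hi}: boundaries {bounds}, size {size}, mark {mark}")
```

For exclusive classes such as 19 – < 22, the limits already coincide with the boundaries, so calling the helper with `gap=0` would leave them unchanged.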
Example
Class Class boundaries Class size Class mark
19 – < 22 19 – 22 22 – 19 = 3 20.5
22 – < 25 22 – 25 25 – 22 = 3 23.5
25 – < 28 25 – 28 28 – 25 = 3 26.5
28 – < 31 28 – 31 31 – 28 = 3 29.5
31 – < 34 31 – 34 34 – 31 = 3 32.5
Histogram
Example
Construct a histogram for the frequency distribution of the number of books
borrowed per week in the library for 30 weeks:
Number of books Number of weeks
10 – 29 8
30 – 49 7
50 – 69 4
70 – 89 7
90 – 109 4
Total 30
Solution:
Example
Construct a histogram for the frequency distribution of the amount of rainfall in
the month of December:
Amount of rainfall (cm) Number of days
19 - < 22 4
22 - < 25 10
25 - < 28 12
28 - < 31 4
31 - < 34 1
Total 31
Solution:
• For a frequency distribution with unequal class sizes, the height of each bar
is drawn proportional to the adjusted frequency of the class, where
adjusted frequency = (common class size × frequency) / class size.
Example
Construct a histogram for the frequency distribution of sales of 46 branches of
a company in the course of one week.
Sales (units) Number of branches
0 – 99 10
100 – 199 18
200 – 299 8
300 – 499 6
500 – 699 4
Solution:
Sales (units)   Number of branches (frequency)   Class boundaries   Class size   Adjusted frequency*
0 – 99          10                               −0.5 – 99.5        100          10
100 – 199       18                               99.5 – 199.5       100          18
200 – 299       8                                199.5 – 299.5      100          8
300 – 499       6                                299.5 – 499.5      200          3
500 – 699       4                                499.5 – 699.5      200          2
*Adjusted frequency = (100 × frequency) / class size, where the common class size is 100.
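The adjusted frequencies in the table can be recomputed with a minimal Python sketch. It assumes whole-number class limits, so each class boundary lies 0.5 beyond the limit.

```python
# (lower limit, upper limit, frequency) for each sales class
classes = [(0, 99, 10), (100, 199, 18), (200, 299, 8), (300, 499, 6), (500, 699, 4)]
common_size = 100  # the common class size used as the standard bar width

adjusted = []
for lower, upper, f in classes:
    size = (upper + 0.5) - (lower - 0.5)     # upper boundary minus lower boundary
    adjusted.append(common_size * f / size)  # height of the bar for this class

print(adjusted)
```

The two wider classes are twice the common size, so their bars are drawn at half their raw frequency; this keeps the area of each bar proportional to its frequency.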
• If the peak of the histogram lies at the centre of the distribution with two
slopes virtually identical, the distribution is said to be symmetrical, or not
skewed.
1. “Less than” cumulative frequency distribution
A table showing the total frequency of all values less than the upper class
boundary of each class is called a “less than” cumulative frequency distribution.
Example
Number of books   Number of weeks (freq.)   Class boundaries   ‘<’ cumulative frequency table
                                                               No. of books   Cum. freq.
                                                               < 9.5          0
10 – 29           8                         9.5 – 29.5         < 29.5         8
30 – 49           7                         29.5 – 49.5        < 49.5         15
50 – 69           4                         49.5 – 69.5        < 69.5         19
70 – 89           7                         69.5 – 89.5        < 89.5         26
90 – 109          4                         89.5 – 109.5       < 109.5        30
(The values in the “No. of books” column are the upper class boundaries.)
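The cumulative column is a running total of the class frequencies, starting from 0 at the lowest class boundary. A minimal Python sketch for the book example:

```python
from itertools import accumulate

lowest_boundary = 9.5
upper_boundaries = [29.5, 49.5, 69.5, 89.5, 109.5]
freq = [8, 7, 4, 7, 4]

# Running total of frequencies, with 0 below the lowest class boundary.
cum_freq = [0] + list(accumulate(freq))

# Pair each boundary with the number of weeks below it.
table = list(zip([lowest_boundary] + upper_boundaries, cum_freq))

for boundary, c in table:
    print(f"< {boundary}: {c}")
```

The last cumulative frequency always equals the total number of observations, here 30 weeks.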
Example
Amount of rainfall (cm)   Number of days (freq.)   Class boundaries   ‘<’ cumulative frequency table
                                                                      Amount of rainfall (cm)   Cum. freq.
                                                                      < 19                      0
19 – < 22                 4                        19 – 22            < 22                      4
22 – < 25                 10                       22 – 25            < 25                      14
25 – < 28                 12                       25 – 28            < 28                      26
28 – < 31                 4                        28 – 31            < 31                      30
31 – < 34                 1                        31 – 34            < 34                      31
(The values in the “Amount of rainfall (cm)” column of the cumulative table are the upper class boundaries.)
Ogives (Cum. Freq. Polygon/ Cum. Freq. Curve)
1. A “less than” ogive shows the cumulative frequency less than the upper
class boundary, plotted against the upper class boundary of each class.
2. A “more than” ogive shows the cumulative frequency more than or equal
to the lower class boundary, plotted against the lower class boundary of each
class.
Example
The following table shows the output produced by 20 employees in an hour in
a factory.
Output (units) Number of employees
1–5 1
6 – 10 2
11 – 15 3
16 – 20 9
21 – 25 5
Construct a ‘less than’ cumulative frequency distribution and plot a ‘less than’
ogive. Hence estimate
(i) the number of employees producing output less than 13 units
(ii) the proportion of employees producing output more than 22 units
(iii) the number of units of output which will be exceeded by 90% of the
employees
(iv) the number of employees producing output between 8 and 18 units.
Solution:
Output (units)   Number of employees (freq.)   Class boundaries   ‘<’ cumulative frequency table
                                                                  Output (units)   Cum. freq.
                                                                  < 0.5            0
1 – 5            1                             0.5 – 5.5          < 5.5            1
6 – 10           2                             5.5 – 10.5         < 10.5           3
11 – 15          3                             10.5 – 15.5        < 15.5           6
16 – 20          9                             15.5 – 20.5        < 20.5           15
21 – 25          5                             20.5 – 25.5        < 25.5           20
'<' Ogive of output produced by 20 employees
[Figure: ogive plot. Horizontal axis: Output (class boundaries 0.5, 5.5, 10.5, 15.5, 20.5, 25.5); vertical axis: Cumulative Frequency (0 to 20).]
From the ‘<’ ogive, we can estimate
(i) the number of employees producing output less than 13 units to be 4.5.
(ii) the proportion of employees producing output more than 22 units to be
(20 − 16.5) / 20 = 3.5 / 20 = 0.175.
(iii) the number of units of output which will be exceeded by 90% of the employees to be
x units.
If 90% of the employees produce more than x units, then the other 10%
(10% × 20 = 2 employees) produce less than x units.
From the ‘<’ ogive, x = 8 units.
(iv) the number of employees producing output between 8 and 18 units to be 10.5 - 2 =
8.5.
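These readings can be reproduced numerically. The sketch below assumes the ogive is drawn as a polygon, so that values between plotted points follow straight lines; readings taken by eye from a hand-drawn graph may differ slightly. The function names `cum_below` and `output_at` are hypothetical helpers introduced for illustration.

```python
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5]
cum_freq = [0, 1, 3, 6, 15, 20]

# Each segment joins two neighbouring plotted points (x0, c0) and (x1, c1).
segments = list(zip(boundaries, cum_freq, boundaries[1:], cum_freq[1:]))

def cum_below(x):
    """Estimated number of employees producing less than x units."""
    for x0, c0, x1, c1 in segments:
        if x0 <= x <= x1:
            return c0 + (x - x0) * (c1 - c0) / (x1 - x0)
    raise ValueError("x lies outside the plotted class boundaries")

def output_at(c):
    """Estimated output below which c employees fall (the inverse reading)."""
    for x0, c0, x1, c1 in segments:
        if c0 <= c <= c1 and c1 > c0:
            return x0 + (c - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("c lies outside the cumulative frequency range")

n = cum_freq[-1]                          # 20 employees in total
part_i = cum_below(13)                    # fewer than 13 units
part_ii = (n - cum_below(22)) / n         # proportion above 22 units
part_iii = output_at(0.10 * n)            # output exceeded by 90% of employees
part_iv = cum_below(18) - cum_below(8)    # between 8 and 18 units

print(part_i, part_ii, part_iii, part_iv)
```

Parts (i), (ii) and (iv) read the graph from the output axis to the cumulative frequency axis, while part (iii) reads it in the opposite direction.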
BAMS1753 FINANCIAL MATHEMATICS
TUTORIAL 1 (Introduction to Statistics and Data Presentation)
(a) Tabulate the above data in the form of a frequency distribution, using
160 - <165 as the first class, 165 - <170 as the second class and so
on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon (ogive).
(e) Using the ogive in part (d), estimate:
(i) the height which will be exceeded by 25% of the employees.
(ii) the number of employees who have heights less than 175 cm.
(iii) the proportion of employees who have heights exceeding 175
cm.
3. The following table shows the gross profit of a random sample of 500
small companies in a year.
4. The following data shows the number of rejects from the assembly line
of a local manufacturer recorded for a period of 80 days:
5. The following cumulative frequency distribution shows the duration of each
telephone call made by an employee recorded for a period of one month:
(a) Draw the ogive for the above cumulative frequency distribution.
(b) Use the ogive to estimate:
(i) the number of calls that lasted between 5 and 10 minutes;
(ii) the duration not exceeded by 90% of the calls.
(c) Redraft the above data in the form of a frequency distribution and
construct a histogram.
Answers:
2. (e) (i) 185.5 cm. (ii) 23 (iii) 0.7294
3. (c) (i) 100 (ii) 0.865
4. (b) (i) 26.5 days (ii) 24 rejects
5. (b) (i) 68 calls (ii) 14 min.