
BAMS1753 FINANCIAL MATHEMATICS

CHAPTER 1: INTRODUCTION TO STATISTICS AND DATA PRESENTATION

INTRODUCTION TO STATISTICS

Statistics refers to the scientific procedures and methods for collecting,
organizing, summarizing, presenting and analyzing data, as well as
drawing valid conclusions and making reasonable decisions based on the
analysis. The figures that result from statistical analysis are also
referred to as “statistics”.

[Diagram: STATISTICS at the centre, surrounded by five activities: Collecting, Organizing, Presenting, Analyzing and Interpreting]

The field of statistics is divided into two categories:

1. Descriptive Statistics
→ Consists of methods for organizing, displaying, and describing
data by using tables, graphs, and summary measures.
→ Deals with the description and analysis of a given group of data.
→ Presents information in a convenient, usable and comprehensible
form.
2. Inductive Statistics (Inferential Statistics)
→ Consists of methods that use sample results to make decisions
or predictions about a population.
→ Deals with the problems of making inferences or drawing
conclusions about a population based on information obtained from
samples taken from that population.
PURPOSE OF STATISTICS

• Statistical techniques are used extensively by marketing managers,
accountants, consumers, educators, politicians, physicians, etc.
• Statistical techniques are used to make many decisions that affect our
lives. Regardless of what your future line of work is, you will make
decisions that involve data.
• Some problems, which are the concern of management and for which
statistical methods are appropriate, are given below:
(a) Stock control: Carrying too large a stock means idle capital and
unnecessary costs of storage. Too small a stock means that materials
are not available when required, resulting perhaps in lost sales. The
“right” amount is a matter of considerable importance.
(b) Market research: To be successful, a business must sell what its
customers want, in the “right” types and quantities, at the right time and
place.
(c) Sales trend: Being able to project future demand is vital for future
planning; statistical methods are preferable to optimistic guesses or
judgement.
(d) The relationship between costs and methods of production: Data on
costs, revenue and profit are important inputs into the decision-making
process.

• The reasons for learning statistics are the following:

1. To know how to properly present and describe information.
2. To know how to obtain reliable forecasts of variables of interest.
3. To know how to draw conclusions about large populations based on
information obtained from samples.

Population and Sample


Population: A set of all items under observation.

Sample: A set of items selected from the population, i.e. a subset of the
population.

Statistic and Parameter
• A summary measure such as mean, median, mode or standard
deviation, computed from sample data is called a statistic.
• A summary measure for the entire population is called a parameter.
• Statisticians often estimate population parameters from the
corresponding sample statistics.
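
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population below is made-up data (hypothetical monthly incomes generated at random), and the sample size of 50 is an arbitrary choice; neither comes from these notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: monthly incomes of 1000 respondents (invented data).
population = [random.randint(1500, 9000) for _ in range(1000)]

# A sample is a subset of the population, here 50 items chosen at random.
sample = random.sample(population, 50)

parameter_mean = statistics.mean(population)  # parameter: uses the entire population
statistic_mean = statistics.mean(sample)      # statistic: uses the sample data only

print("Population mean (parameter):", parameter_mean)
print("Sample mean (statistic):    ", statistic_mean)
```

The sample mean will usually be close to, but not equal to, the population mean, which is why it serves only as an estimate of the parameter.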

TYPES OF VARIABLES

A variable measures a characteristic of the population that the researcher
wants to study.
Variable
• The characteristics of the population of interest
• E.g. monthly income of respondents, respondents’ age, gender, level of
education, number of children and type of house owned

A variable is either quantitative or qualitative:

1. Quantitative or Numerical
• Measured on a numerical scale
• Yields a numerical response
• E.g. How tall are you? The answer is numerical.

2. Qualitative or Attributive
• Measured on a non-numerical scale
• Yields a categorical response
• E.g. Are you a Malaysian? The answer is only “Yes” or “No”.

A quantitative variable is either discrete or continuous:

(a) Discrete
• Numerical response which arises from a counting process.
• E.g. How many mobile phones do you have?

(b) Continuous
• Numerical response which arises from a measuring process.
• E.g. What is your weight?

DATA PRESENTATION

Raw data

• Data collected that have not been organized or processed are called raw
data.
• When every observed value of the random variable is listed, the data are
called ungrouped data.
• Grouping is one of the most common methods of organizing data. When
we group data, we are actually constructing frequency distributions for the
raw data.

Frequency Distribution

• A frequency distribution is a table in which possible values for a variable
are grouped into non-overlapping classes, and the number of observed
values which fall into each class is recorded.
• Data organized in a frequency distribution are called grouped data.

Example
The frequency distribution below represents the number of books read by 500
students in a school during one year:
Number of books read Number of students (Frequency)
0–9 52
10 – 19 63
20 – 29 71
30 – 39 96
40 – 49 43
50 – 59 58
60 – 79 72
80 – 99 45

The variable is the number of books read.
The data (number of books read) are grouped into 8 classes.

The following are guidelines, not absolute rules, for the construction of
frequency distributions.

• Classes / class intervals should be non-overlapping, so that no value is
counted twice.
• Normally, the number of classes should not be less than 5 or more than
15.
• The number of classes, k, can be estimated based on the formula:
k ≈ log n / log 2, where n = number of observations.
• Use equal class sizes / widths whenever possible.
• The class size / width, i, can be determined as:
i ≈ (H − L) / k, where H = the highest data value;
L = the lowest data value.
• The start point should be a little smaller than the lowest value. If possible,
it should be an even multiple of the class size.
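
As a rough illustration, the two formulas above can be evaluated in Python. The function names are hypothetical, and rounding k up to the next whole number is an assumption (the guideline itself only says k is estimated). The final step of rounding the class size to a convenient value such as 20 is left to judgement and is not automated here.

```python
import math

def estimate_num_classes(n):
    """Estimate k from k ≈ log n / log 2 (= log2 n), rounded up (assumed rounding rule)."""
    return math.ceil(math.log2(n))

def raw_class_width(highest, lowest, k):
    """Raw class size i ≈ (H - L) / k, before rounding to a convenient value."""
    return (highest - lowest) / k

# Illustration: 30 observations ranging from 10 to 100.
k = estimate_num_classes(30)
i = raw_class_width(100, 10, k)
print(k, i)  # 5 classes, raw width 18.0
```

A raw width of 18 would normally be rounded to a more convenient size, such as 20, when the classes are set up.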

Some common practices for classes:

*Class (exclusive type)        **Class (inclusive type)    Class (open-ended)
0 - < 10    or    0 - 10       0 – 9                       Below 20
10 - < 20         10 - 20      10 – 19                     20 - < 30
20 - < 30         20 - 30      20 – 29                     30 - < 40
30 - < 40         30 - 40      30 – 39                     40 - < 50
40 - < 50         40 - 50      40 – 49                     50 and above

*class (exclusive type) is mainly used for continuous data or discrete data
which have been rounded to the nearest tens, hundreds, thousands, millions
etc.
**class (inclusive type) is mainly used for discrete data where there is a gap
between classes.

Example
The following is a record of the number of books borrowed per week in the
library for 30 weeks: -
21 47 64 42 89 76 55 100 75 67
89 15 97 25 35 12 92 36 93 34
87 27 74 21 66 25 47 10 89 30
Tabulate the data in the form of a frequency distribution, grouping by suitable
class size.

Solution:
The variable is the number of books borrowed per week which is discrete.
Number of classes: k ≈ log n / log 2 = log 30 / log 2 = 4.9069 → Use k = 5
Class size: Lowest value = 10; highest value = 100
i ≈ (H − L) / k = (100 − 10) / 5 = 18 → Use i = 20
Frequency distribution for the number of books borrowed per week in the library
for 30 weeks:

Number of books Tally count Number of weeks


10 – 29
30 – 49
50 – 69
70 – 89
90 – 109
Total 30
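
The tally can also be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the classes are the inclusive-type classes chosen in the solution above.

```python
# Number of books borrowed per week for 30 weeks (data from the example).
books = [21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
         89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
         87, 27, 74, 21, 66, 25, 47, 10, 89, 30]

# Inclusive-type classes: both class limits belong to the class.
classes = [(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)]

def tally_inclusive(values, class_limits):
    """Count the values falling in each class, lower and upper limits included."""
    return [sum(1 for v in values if lower <= v <= upper)
            for lower, upper in class_limits]

frequencies = tally_inclusive(books, classes)
for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper}: {'/' * f} ({f})")
print("Total", sum(frequencies))
```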

Example
The amount of rainfall (in cm) for a small town was recorded for the month of
December.

20.42   21.06   22.40   21.117   22.6    33.01    22.89
22.9    30.34   25.61   23       24.5    26.881   24.49
23.7    28      25.0    25.69    27.14   26.321   27.216
19.22   29.6    26.5    24.15    24.18   26.4     25
25.7    28      25.556

Construct a grouped frequency distribution for the data using suitable class
size.

Solution:
The variable is the amount of rainfall which is continuous.
Number of classes: k ≈ log n / log 2 = log 31 / log 2 = 4.9542 → Use k = 5
Class size: Lowest value = 19.22; highest value = 33.01
i ≈ (H − L) / k = (33.01 − 19.22) / 5 = 2.758 → Use i = 3
Frequency distribution for the amount of rainfall in the month of December:

Amount of rainfall (cm) Tally count Number of days


19 - < 22
22 - < 25
25 - < 28
28 - < 31
31 - < 34
Total 31
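
For continuous data the classes are of the exclusive type, so each value is counted in the class whose lower boundary it reaches but whose upper boundary it does not. A minimal Python sketch of that counting rule follows; the function name is hypothetical.

```python
# Amount of rainfall (cm) recorded in December (data from the example).
rainfall = [20.42, 21.06, 22.40, 21.117, 22.6, 33.01, 22.89,
            22.9, 30.34, 25.61, 23, 24.5, 26.881, 24.49,
            23.7, 28, 25.0, 25.69, 27.14, 26.321, 27.216,
            19.22, 29.6, 26.5, 24.15, 24.18, 26.4, 25,
            25.7, 28, 25.556]

# Boundaries of the classes 19 - < 22, 22 - < 25, ..., 31 - < 34.
boundaries = [19, 22, 25, 28, 31, 34]

def tally_exclusive(values, class_boundaries):
    """Count values with lower boundary <= value < upper boundary for each class."""
    return [sum(1 for v in values if lower <= v < upper)
            for lower, upper in zip(class_boundaries, class_boundaries[1:])]

frequencies = tally_exclusive(rainfall, boundaries)
print(frequencies, "Total", sum(frequencies))
```

Note that a value of exactly 25 or 28 falls into the class that starts at that value, not the class that ends there.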

Basic components of a frequency distribution:

Class limits – the smallest and largest possible measurements in each class,
i.e. the lower and upper class limits.

Class boundaries – the dividing lines between successive classes.

Class size / class width = upper class boundary – lower class boundary. An
exception is the opening and closing classes, whose size may differ so that
extreme values are included.

Open-ended classes – one boundary is not specified, e.g. below 20; 50 and
above. In further calculations, an open-ended class is assumed to be of the
same size as its immediate neighbouring class.

Class mark or class mid-point – the value exactly at the middle of a class. It
lies half way between the class limits or the class boundaries.

Class mark = (lower class limit + upper class limit) / 2
or
Class mark = (lower class boundary + upper class boundary) / 2

Example
Class       Class boundaries   Class size            Class mark
10 – 29     9.5 – 29.5         29.5 – 9.5 = 20       19.5
30 – 49     29.5 – 49.5        49.5 – 29.5 = 20      39.5
50 – 69     49.5 – 69.5        69.5 – 49.5 = 20      59.5
70 – 89     69.5 – 89.5        89.5 – 69.5 = 20      79.5
90 – 109    89.5 – 109.5       109.5 – 89.5 = 20     99.5

[Diagram: number line showing the 1st, 2nd and 3rd classes, with class limits 10 and 29, 30 and 49, 50 and 69, class boundaries 9.5, 29.5, 49.5 and 69.5, and class marks 19.5, 39.5 and 59.5 at the middle of each class.]
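
The columns of the table above can be reproduced with a short Python sketch. It assumes that each class boundary lies halfway across the gap between adjacent class limits; the `gap` parameter (1 for whole-number data) and the function name are assumptions introduced for this illustration.

```python
def describe_inclusive_class(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, class size, class mark).

    gap is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1 for whole-number data).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    class_size = upper_boundary - lower_boundary
    class_mark = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, class_size, class_mark

for limits in [(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)]:
    print(limits, describe_inclusive_class(*limits))
```

For exclusive-type classes such as 19 - < 22 there is no gap between classes, so the class limits and class boundaries coincide and `gap=0` gives the same result.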

Example
Class Class boundaries Class size Class mark
19 – < 22 19 – 22 22 – 19 = 3 20.5
22 – < 25 22 – 25 25 – 22 = 3 23.5
25 – < 28 25 – 28 28 – 25 = 3 26.5
28 – < 31 28 – 31 31 – 28 = 3 29.5
31 – < 34 31 – 34 34 – 31 = 3 32.5

[Diagram: number line showing the 1st, 2nd and 3rd classes, with class marks 20.5, 23.5 and 26.5. For these exclusive-type classes the class limits and the class boundaries coincide at 19, 22, 25 and 28.]

Histogram

• A histogram is a graphical representation of a frequency distribution.
• A bar is drawn for each class and the area of each bar is proportional to the
class frequency. The bars are drawn adjacent to one another. Class boundaries
are marked on the horizontal axis.
• For a frequency distribution with equal class sizes, the height of each bar is
drawn proportional to the actual frequency of the class, and the width of
each bar extends from the lower class boundary to the upper class boundary
of the class.

Example
Construct a histogram for the frequency distribution of the number of books
borrowed per week in the library for 30 weeks:
Number of books Number of weeks
10 – 29 8
30 – 49 7
50 – 69 4
70 – 89 7
90 – 109 4
Total 30

Solution:

Example
Construct a histogram for the frequency distribution of the amount of rainfall in
the month of December:
Amount of rainfall (cm) Number of days
19 - < 22 4
22 - < 25 10
25 - < 28 12
28 - < 31 4
31 - < 34 1
Total 31

Solution:

• For a frequency distribution with unequal class sizes, the height of each bar is
drawn proportional to the adjusted frequency of the class, where

Adjusted frequency = (common class size × frequency) / class size

Example
Construct a histogram for the frequency distribution of sales of 46 branches of
a company in the course of one week.
Sales (units) Number of branches
0 – 99 10
100 – 199 18
200 – 299 8
300 – 499 6
500 – 699 4

Solution:
Sales (units)   Number of branches   Class boundaries   Class size   *Adjusted frequency
                (frequency)
0 – 99          10                   −0.5 – 99.5        100          10
100 – 199       18                   99.5 – 199.5       100          18
200 – 299       8                    199.5 – 299.5      100          8
300 – 499       6                    299.5 – 499.5      200          3
500 – 699       4                    499.5 – 699.5      200          2

*Adjusted frequency = (100 × frequency) / class size, where the common class size is 100.
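
The adjusted frequencies in the last column can be verified with a small Python sketch; the function name is hypothetical.

```python
def adjusted_frequencies(frequencies, class_sizes, common_size):
    """Adjusted frequency = common class size x frequency / class size, for each class."""
    return [common_size * f / size for f, size in zip(frequencies, class_sizes)]

frequencies = [10, 18, 8, 6, 4]          # number of branches in each class
class_sizes = [100, 100, 100, 200, 200]  # upper boundary minus lower boundary

adjusted = adjusted_frequencies(frequencies, class_sizes, common_size=100)
print(adjusted)  # [10.0, 18.0, 8.0, 3.0, 2.0]
```

The last two classes are twice as wide as the common class size, so their bars are drawn at half of the actual frequency and the area of each bar stays proportional to the class frequency.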

Vertical axis label             Type of histogram
No. of observations             Frequency histogram
Proportion of observations      Relative frequency histogram
Percentage of observations      Percentage histogram
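
The three vertical-axis scales differ only by a constant factor. As a sketch, the rainfall frequencies from the earlier example can be converted to proportions and percentages as follows; the function name is hypothetical.

```python
def relative_frequencies(frequencies):
    """Proportion of observations in each class: frequency / total frequency."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

frequencies = [4, 10, 12, 4, 1]  # number of days in each rainfall class
proportions = relative_frequencies(frequencies)
percentages = [100 * p for p in proportions]

for f, p, pct in zip(frequencies, proportions, percentages):
    print(f"{f:>3}  {p:.3f}  {pct:.1f}%")
```

Because every bar is rescaled by the same factor, the shape of the histogram is the same whichever of the three scales is used.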
• The term skewness is used to describe the shape of a frequency
distribution.

• Positive skewness: the peak of the histogram lies to the left of the centre
of the distribution.
• Negative skewness: the peak of the histogram lies to the right of the centre
of the distribution.

• If the peak of the histogram lies at the centre of the distribution with two
slopes virtually identical, the distribution is said to be symmetrical, or not
skewed.

Cumulative Frequency Distribution

• Given a frequency distribution, a cumulative frequency distribution can be
derived by the addition of the frequencies of the successive classes.
• There are two types of cumulative frequency distributions:

1. “Less than” cumulative frequency distribution
A table showing the total frequency of all values less than the upper class
boundary of each class is called a “less than” cumulative frequency distribution.

2. “More than” cumulative frequency distribution
A table showing the total frequency of all values more than or equal to the lower
class boundary of each class is called a “more than” cumulative frequency
distribution.

* In the examination, only the ‘less than’ cumulative frequency distribution will be
included.

Example
Number of   Number of weeks   Class           ‘<’ Cum. Freq. table
books       (freq.)           boundaries      No. of books     Cum. freq.
                                              < 9.5            0
10 – 29     8                 9.5 – 29.5      < 29.5           8
30 – 49     7                 29.5 – 49.5     < 49.5           15
50 – 69     4                 49.5 – 69.5     < 69.5           19
70 – 89     7                 69.5 – 89.5     < 89.5           26
90 – 109    4                 89.5 – 109.5    < 109.5          30

(The values in the “No. of books” column are the upper class boundaries.)
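
A “less than” cumulative frequency table is a running total of the class frequencies. The sketch below rebuilds the table above using `itertools.accumulate` from the Python standard library; the function name is hypothetical.

```python
from itertools import accumulate

def less_than_cumulative(first_lower_boundary, upper_boundaries, frequencies):
    """Return (boundary, cumulative frequency) rows, starting from a cumulative frequency of 0."""
    rows = [(first_lower_boundary, 0)]
    rows.extend(zip(upper_boundaries, accumulate(frequencies)))
    return rows

upper_boundaries = [29.5, 49.5, 69.5, 89.5, 109.5]
frequencies = [8, 7, 4, 7, 4]

table = less_than_cumulative(9.5, upper_boundaries, frequencies)
for boundary, cum_freq in table:
    print(f"< {boundary}: {cum_freq}")
```

The last cumulative frequency should always equal the total number of observations, which is a quick check on the arithmetic.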

Example
Amount of       Number of      Class         ‘<’ Cum. Freq. table
rainfall (cm)   days (freq.)   boundaries    Amount of rainfall (cm)   Cum. freq.
                                             < 19                      0
19 - < 22       4              19 – 22       < 22                      4
22 - < 25       10             22 – 25       < 25                      14
25 - < 28       12             25 – 28       < 28                      26
28 - < 31       4              28 – 31       < 31                      30
31 - < 34       1              31 – 34       < 34                      31

(The values in the “Amount of rainfall (cm)” column are the upper class boundaries.)

Ogives (Cum. Freq. Polygon/ Cum. Freq. Curve)

• An ogive is a line chart of a cumulative frequency distribution.
• There are two types of ogives:

1. “Less than” ogive showing the cumulative frequency less than the upper
class boundary plotted against the upper class boundary of any class.

2. “More than” ogive showing the cumulative frequency more than or equal
to the lower class boundary plotted against the lower class boundary of any
class.

*In the examination, only the ‘less than’ ogive will be included.

Example
The following table shows the output produced by 20 employees in an hour in
a factory.
Output (units) Number of employees
1–5 1
6 – 10 2
11 – 15 3
16 – 20 9
21 – 25 5

Construct a ‘less than’ cumulative frequency distribution and plot a ‘less than’
ogive. Hence estimate
(i) the number of employees producing output less than 13 units
(ii) the proportion of employees producing output more than 22 units
(iii) the number of units of output which will be exceeded by 90% of the
employees
(iv) the number of employees producing output between 8 and 18 units.

Solution:
Output (units)   Number of           Class          ‘<’ Cum. Freq. table
                 employees (freq.)   boundaries     Output (units)   Cum. freq.
                                                    < 0.5            0
1 – 5            1                   0.5 – 5.5      < 5.5            1
6 – 10           2                   5.5 – 10.5     < 10.5           3
11 – 15          3                   10.5 – 15.5    < 15.5           6
16 – 20          9                   15.5 – 20.5    < 20.5           15
21 – 25          5                   20.5 – 25.5    < 25.5           20

[Figure: ‘<’ ogive of output produced by 20 employees. Horizontal axis: Output (units), marked at the class boundaries 0.5, 5.5, 10.5, 15.5, 20.5 and 25.5; vertical axis: Cumulative frequency, from 0 to 20.]
From the ‘<’ ogive, we can estimate
(i) the number of employees producing output less than 13 units to be 4.5.
(ii) the proportion of employees producing output more than 22 units to be
(20 − 16.5) / 20 = 3.5 / 20 = 0.175.
(iii) the number of units of output which will be exceeded by 90% of the employees to be
x units.
90% of the employees are producing more than x units
→ the other 10% of the employees (10% × 20 = 2 employees) are producing less than
x units. From the ‘<’ ogive, x = 8 units.
(iv) the number of employees producing output between 8 and 18 units to be 10.5 − 2 =
8.5.
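
Readings taken from the ogive can be checked numerically. The sketch below assumes the ogive is drawn as straight line segments between the plotted points (a cumulative frequency polygon), so every reading is a linear interpolation; the function names are hypothetical, and readings taken by eye from a hand-drawn graph may differ slightly.

```python
# Points of the '<' ogive: (upper class boundary, cumulative frequency).
points = [(0.5, 0), (5.5, 1), (10.5, 3), (15.5, 6), (20.5, 15), (25.5, 20)]

def cum_freq_below(x):
    """Cumulative frequency read off the ogive at output x (linear interpolation)."""
    if x <= points[0][0]:
        return 0.0
    if x >= points[-1][0]:
        return float(points[-1][1])
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (x - x0) / (x1 - x0) * (c1 - c0)

def output_at_cum_freq(c):
    """Output value read off the ogive at cumulative frequency c."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= c <= c1 and c1 > c0:
            return x0 + (c - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative frequency is outside the range of the ogive")

total = points[-1][1]
part_i = cum_freq_below(13)                         # employees below 13 units
part_ii = (total - cum_freq_below(22)) / total      # proportion above 22 units
part_iii = output_at_cum_freq(0.10 * total)         # output exceeded by 90%
part_iv = cum_freq_below(18) - cum_freq_below(8)    # employees between 8 and 18 units
print(part_i, part_ii, part_iii, part_iv)
```

Under this straight-line assumption the four results agree with the estimates read from the graph above.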

BAMS1753 FINANCIAL MATHEMATICS
TUTORIAL 1 (Introduction to Statistics and Data Presentation)

1. The data below are the marks obtained by 40 students in an
examination.
62 54 38 33 80 66 56 60 68 52
57 71 85 47 50 71 52 76 49 69
48 68 55 49 79 41 61 65 75 81
64 58 66 59 52 43 65 48 41 56

(a) Construct a frequency distribution table using 30 – 39 as the first
class, 40 – 49 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon.

2. The following data are the heights (to the nearest centimetre) of 85
employees in a company:
169 179 183 186 166 181 177 173 167 193 176
183 162 170 186 174 188 165 168 174 170 176
186 177 185 175 179 166 190 182 182 180 194
177 184 175 168 181 180 172 178 192 175 189
180 175 183 191 172 188 180 176 185 178 179
173 165 170 178 181 181 189 187 191 179 196
179 182 171 169 171 184 198 182 175 190 187
176 164 187 167 185 177 184 178

(a) Tabulate the above data in the form of a frequency distribution, using
160 - <165 as the first class, 165 - <170 as the second class and so
on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon (ogive).
(e) Using the ogive in part (d), estimate:
(i) the height which will be exceeded by 25% of the employees.
(ii) the number of employees who have heights less than 175 cm.
(iii) the proportion of employees who have heights exceeding 175
cm.

3. The following table shows the gross profit of a random sample of 500
small companies in a year.

Gross Profit ($ thousand)    Percentage of companies
Under 10                     8
10 and under 20              22
20 and under 30              36
30 and under 40              18
40 and under 60              10
60 and under 90              6

(a) Draw a histogram.
(b) Construct a “less than” cumulative frequency distribution.
(c) Plot a “less than” ogive and use it to estimate
(i) the number of small companies which earned at least $38,000 of
gross profit;
(ii) the proportion of small companies which earned less than $45,000
of gross profits.

4. The following data shows the number of rejects from the assembly line
of a local manufacturer recorded for a period of 80 days:

Number of rejects    Number of days
0 – 4                1
5 – 9                14
10 – 14              23
15 – 19              20
20 – 24              16
25 – 29              6

(a) Draw a histogram for the data.
(b) Construct a “less than” cumulative frequency distribution and plot a
“less than” cumulative frequency polygon. Use the graph to estimate
(i) the number of days that produce at most 12 rejects;
(ii) the number of rejects exceeded by 10% of the days.

5. The following cumulative frequency distribution shows the duration of each
telephone call made by an employee recorded for a period of one month:

Duration (minutes)    Number of calls
Under 3               45
Under 6               104
Under 9               142
Under 12              173
Under 18              192
Under 24              200

(a) Draw the ogive for the above cumulative frequency distribution.
(b) Use the ogive to estimate:
(i) the number of calls that lasted between 5 and 10 minutes;
(ii) the duration not exceeded by 90% of the calls.
(c) Redraft the above data in the form of frequency distribution and
construct a histogram.

Answers:
2. (e) (i) 185.5 cm. (ii) 23 (iii) 0.7294
3. (c) (i) 100 (ii) 0.865
4. (b) (i) 26.5 days (ii) 24 rejects
5. (b) (i) 68 calls (ii) 14 min.
