
Managerial Accounting – Week 3

Data analysis and statistical techniques


Dr. Arpad Toth PhD Mr. Alex Suta
suta.alex@ga.sze.hu
Agenda
1. Forecasting
2. Summarising and analysing data

1. Forecasting
I. Forecasting
We will be covering two principal forecasting techniques in this chapter, regression analysis and time series
analysis. Regression analysis can be applied to costs and revenues while time series analysis is generally
applied to revenue.
TOPIC LIST
1. Correlation
2. The correlation coefficient and the coefficient of determination
3. Lines of best fit
4. Least squares method of linear regression analysis
5. The reliability of regression analysis forecasts
6. The high-low method
7. The components of time series
8. Finding the trend
9. Finding the seasonal variations
10. Deseasonalisation
11. Sales forecasting: time series analysis
12. Forecasting problems
13. Using index numbers
14. Sales forecasting: the product life cycle

1. Correlation

Two variables are said to be correlated if a change in the value of one variable is accompanied by a change in the value
of another variable. This is what is meant by correlation.

1.1. Scattergraphs

One way of showing the correlation between two related variables is on a scattergraph or scatter diagram, plotting
a number of pairs of data on the graph. For example, a scattergraph showing monthly selling costs against the
volume of sales for a 12-month period might be as follows.

[Figure: scattergraph of monthly selling costs against sales volume]

This scattergraph suggests that there is some correlation between selling costs and sales volume, so that as sales
volume rises, selling costs tend to rise as well.

1.2. Degrees of correlation

Two variables might be perfectly correlated, partly correlated or uncorrelated. Correlation can be positive or negative.
The differing degrees of correlation can be illustrated by scatter diagrams.

1.2.1. Perfect correlation

All the pairs of values lie on a straight line. An exact linear relationship exists between the two variables.

1.2.2. Partial correlation


1.2.3. No correlation

The values of these two variables are not correlated with each other.

1.2.4. Positive and negative correlation

Correlation, whether perfect or partial, can be positive or negative.


Positive correlation means that low values of one variable are associated with low values of the other, and high
values of one variable are associated with high values of the other.
Negative correlation means that low values of one variable are associated with high values of the other, and
high values of one variable with low values of the other.

2. The correlation coefficient and the coefficient of determination

2.1. The correlation coefficient

The degree of linear correlation between two variables is measured by the Pearsonian (product moment) correlation
coefficient, r. The nearer r is to +1 or –1, the stronger the relationship.
When we have measured the degree of correlation between two variables we can decide, using actual results in the
form of pairs of data, whether two variables are perfectly or partially correlated and, if they are partially
correlated, whether there is a high or low degree of partial correlation.

FORMULA
Correlation coefficient:

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$

where X and Y represent pairs of data for two variables X and Y
n = the number of pairs of data used in the analysis

• r = +1 means that the variables are perfectly positively correlated
• r = –1 means that the variables are perfectly negatively correlated
• r = 0 means that the variables are uncorrelated

2.2. Example: The correlation coefficient formula

The cost of output at a factory is thought to depend on the number of units produced. Data have been collected for the
number of units produced each month in the last six months, and the associated costs, as follows.

Required
Assess whether there is any correlation between output and cost.

Month   Output            Cost
        '000s of units    $'000
        X                 Y
1       2                 9
2       3                 11
3       1                 7
4       4                 13
5       3                 11
6       5                 15

Solution
We need to find the values for the following.

(a) ΣXY   Multiply each value of X by its corresponding Y value, so that there are six values for XY. Add up the six
          values to get the total.
(b) ΣX    Add up the six values of X to get a total. (ΣX)² will be the square of this total.
(c) ΣY    Add up the six values of Y to get a total. (ΣY)² will be the square of this total.
(d) ΣX²   Find the square of each value of X, so that there are six values for X². Add up these values to get a total.
(e) ΣY²   Find the square of each value of Y, so that there are six values for Y². Add up these values to get a total.

2.2. Example solution, workings
X      Y      XY     X²     Y²
2      9      18     4      81
3      11     33     9      121
1      7      7      1      49
4      13     52     16     169
3      11     33     9      121
5      15     75     25     225
ΣX = 18   ΣY = 66   ΣXY = 218   ΣX² = 64   ΣY² = 766

With n = 6, (ΣX)² = 18² = 324 and (ΣY)² = 66² = 4,356:

$$ r = \frac{6(218) - (18)(66)}{\sqrt{[6(64) - 324]\,[6(766) - 4{,}356]}} = \frac{120}{\sqrt{60 \times 240}} = \frac{120}{120} = 1 $$

There is perfect positive correlation between the volume of output at the factory and costs, which means
that there is a perfect linear relationship between output and costs.
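The workings above can be checked with a short script. This is a minimal sketch of the product moment formula applied to the six pairs of data from the example; the function name `correlation_coefficient` is an illustrative choice, not something defined in the course material.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson (product moment) correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from the example: output ('000 units) and cost ($'000) for six months
output = [2, 3, 1, 4, 3, 5]
cost = [9, 11, 7, 13, 11, 15]

r = correlation_coefficient(output, cost)
print("r =", r)
print("r squared =", r ** 2)
```

The squared value printed at the end is the coefficient of determination for the same data.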


2.3. The correlation coefficient without the formula

Number of units purchased   Total cost
x                           y
                            $
1                           10
2                           20
3                           30
4                           40
5                           50

A correlation coefficient of +1 means that there is a perfect linear relationship between the two variables. The equation
relating the two variables would be of the form y = a + bx. If you plotted a graph, it would be a straight line.
In the table above every extra unit purchased adds exactly $10 to total cost (y = 10x), so r = +1 can be stated
without using the formula.

2.4. Correlation in a time series

Correlation exists in a time series if there is a relationship between the period of time and the recorded value for
that period of time. The correlation coefficient is calculated with time as the X variable, although it is convenient
to use simplified values for X instead of year numbers. For example, instead of having a series of years 20X1 to 20X5,
we could have values for X from 0 (20X1) to 4 (20X5).

2.5. The coefficient of determination, r²

The coefficient of determination, r² (alternatively R²), measures the proportion of the total variation in the value
of one variable that can be explained by variations in the value of the other variable. It denotes the strength of
the linear association between two variables.
3. Lines of best fit

3.1. Linear relationships

Correlation enables us to determine the strength of any relationship between two variables but it does not offer us
any method of forecasting values for one variable, Y, given values of another variable, X.
If we assume that there is a linear relationship between the two variables, however, and we determine the equation
of a straight line (Y = a + bX) which is a good fit for the available data plotted on a scattergraph, we can use the
equation for forecasting: we can substitute values for X into the equation and derive values for Y.

3.2. Estimating the equation of the line of best fit

There are a number of techniques for estimating the equation of a line of best fit. We will be looking at simple
linear regression analysis. This provides a technique for estimating values for a and b in the equation

Y = a + bX

where X and Y are the related variables and a and b are estimated using pairs of data for X and Y.

4. Least squares method of linear regression analysis

Linear regression analysis (the least squares method) is one technique for estimating a line of best fit. Once an
equation for a line of best fit has been determined, forecasts can be made.

FORMULA
The least squares method of linear regression analysis involves using the following formulae for a and b in
Y = a + bX.

$$ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$

where n is the number of pairs of data.

4.2. Example: The least squares method
(a) Using the data below for variables X (output) and Y (total cost), calculate an equation to determine the
expected level of costs, for any given volume of output, using the least squares method.

Time period            1     2     3     4     5
Output ('000 units)    20    16    24    22    18
Total cost ($'000)     82    70    90    85    73

(b) Prepare a budget for total costs if output is 22,000 units.
(c) Confirm that the degree of correlation between output and costs is high by calculating the correlation coefficient.

4.2. Example solutions-workings

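Solution
(a) Workings

X      Y      XY       X²       Y²
20     82     1,640    400      6,724
16     70     1,120    256      4,900
24     90     2,160    576      8,100
22     85     1,870    484      7,225
18     73     1,314    324      5,329
ΣX = 100   ΣY = 400   ΣXY = 8,104   ΣX² = 2,040   ΣY² = 32,278

n = 5

$$ b = \frac{5(8{,}104) - (100)(400)}{5(2{,}040) - 100^2} = \frac{520}{200} = 2.6 \qquad a = \frac{400}{5} - 2.6 \times \frac{100}{5} = 28 $$

Y = 28 + 2.6X

where Y = total cost, in thousands of dollars
X = output, in thousands of units

Note that the fixed costs are $28,000 (when X = 0 costs are $28,000) and the variable cost per unit is $2.60.

(b) If the output is 22,000 units, we would expect costs to be 28 + 2.6 × 22 = 85.2 = $85,200.

(c)

$$ r = \frac{520}{\sqrt{200 \times [5(32{,}278) - 400^2]}} = \frac{520}{\sqrt{200 \times 1{,}390}} = \frac{520}{527.3} = +0.99 $$

There is a high degree of positive correlation between output and costs.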
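The three parts of the example can be reproduced in a few lines. The following is a minimal sketch of the least squares calculation using the five pairs of data from the example; the helper name `least_squares` is a hypothetical choice made for illustration.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (a, b, r) for the line Y = a + bX fitted by least squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return a, b, r


# Data from the example: output ('000 units) and total cost ($'000)
output = [20, 16, 24, 22, 18]
total_cost = [82, 70, 90, 85, 73]

a, b, r = least_squares(output, total_cost)
budget = a + b * 22  # part (b): output of 22,000 units, in $'000

print("a =", round(a, 2), "b =", round(b, 2))
print("budgeted cost ($'000) =", round(budget, 1))
print("r =", round(r, 2))
```

Working in thousands, as the example does, keeps the sums small; the fitted a and b then read directly as fixed cost in $'000 and variable cost per unit in $.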

5. Regression lines and time series

The same technique can be applied to calculate a regression line (a trend line) for a time series. This is particularly
useful for purposes of forecasting. As with correlation, years can be numbered from 0 upwards.

The reliability of regression analysis forecasts


As with all forecasting techniques, the results from regression analysis will not be wholly reliable. There are a number
of factors which affect the reliability of forecasts made using regression analysis.

Advantages of regression analysis


(a) It gives a definitive line of best fit, taking account of all the data.
(b) Linear regression makes efficient use of data and good results can be obtained with relatively small data sets.
(c) The significance/reliability of the relationship between variables can be statistically tested (but you don't need
to know the details of this for FMA).
(d) Many processes are linear so are well described by regression analysis. Even many non-linear relationships can
be well approximated by a linear model over a short range.

6. The high-low method

(a) Records of costs in previous periods are reviewed and the costs of the following two periods are selected.
    (i) The period with the highest volume of activity
    (ii) The period with the lowest volume of activity
(b) The difference between the total cost of these two periods will be the variable cost of the difference in
    activity levels (since the same fixed cost is included in each total cost).
(c) The variable cost per unit may be calculated from this (difference in total costs ÷ difference in activity
    levels), and the fixed cost may then be determined by substitution.
(d) This method may be applied to annual sales figures or any other activity as well as costs. So be prepared to
    use this outside the context of costs.

6.2. Example: The high-low method using revenues

The sales revenues and website 'hits' for a development, Cool Blue, for the last four months have been as follows.

Month   Sales revenues ($)   Website 'hits'
1       110,000              70,000
2       115,000              80,000
3       111,000              77,000
4       97,000               60,000

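Required
Calculate the revenues that should be expected in month five when the number of hits is expected to be 75,000.
Ignore inflation.

6.3. Solution
(a)
                 Hits       Revenue ($)
High activity    80,000     115,000
Low activity     60,000     97,000
Difference       20,000     18,000

Variable revenue per hit = $18,000 ÷ 20,000 = $0.90

(b) Substituting the high activity figures: fixed revenue = $115,000 – (80,000 × $0.90) = $43,000.
Expected revenue for 75,000 hits = $43,000 + (75,000 × $0.90) = $110,500.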
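The high-low steps can be sketched as a small function. This is an illustrative implementation under the assumption that the "high" and "low" periods are chosen by activity level (here, hits), as step (a) of the method describes; `high_low` is a hypothetical helper name.

```python
def high_low(activity, totals):
    """Return (fixed, variable_per_unit) from the highest and lowest activity periods."""
    high = max(range(len(activity)), key=lambda i: activity[i])
    low = min(range(len(activity)), key=lambda i: activity[i])
    variable = (totals[high] - totals[low]) / (activity[high] - activity[low])
    fixed = totals[high] - variable * activity[high]  # substitute the high period
    return fixed, variable


# Cool Blue data from the example
hits = [70_000, 80_000, 77_000, 60_000]
revenue = [110_000, 115_000, 111_000, 97_000]

fixed, per_hit = high_low(hits, revenue)
month_five = fixed + per_hit * 75_000

print("variable revenue per hit:", round(per_hit, 2))
print("fixed element:", round(fixed))
print("expected revenue for 75,000 hits:", round(month_five))
```

Only months 2 and 4 influence the result; months 1 and 3 are ignored, which is the weakness of the method noted later in the chapter.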

6.4. Example: The high-low method with stepped fixed costs

The following data relate to the overhead expenditure of contract cleaners (for industrial cleaning) at two
activity levels.

Square metres cleaned    12,750     15,100
Overheads                $73,950    $83,585

When more than 20,000 square metres are industrially cleaned, it is necessary to have another supervisor
and so the fixed costs rise to $43,350.

Required
Calculate the estimated overhead expenditure if 22,000 square metres are to be industrially cleaned.

6.5. Solution
               Units                   $
High output    15,100    Total cost    83,585
Low output     12,750    Total cost    73,950
Difference     2,350                   9,635

Variable cost = $9,635 ÷ 2,350 = $4.10 per square metre

Estimated overhead expenditure if 22,000 square metres are to be industrially cleaned:
                                   $
Fixed costs                        43,350
Variable costs (22,000 × $4.10)    90,200
                                   133,550
6.6. Example: The high-low method with inflation

You may be asked to use the high-low method when cost inflation is included. You need to deflate (reduce) all the
costs to a base year before the high-low method can be applied.

                            Year 1      Year 2      Year 3      Year 4
Sales/production (units)    85,000      93,400      95,800      94,300
Total costs                 $337,500    $365,670    $379,080    $382,395
Cost inflation index        100         102         104         106

Required
Establish a linear equation for total costs per annum (at Year 1 prices) using the high-low method.

6.7. Solution
Cost data has to be reduced by dividing by the inflation index before the high-low method can be applied.

Cost/inflation index
Year 1    $337,500/1.00 = $337,500
Year 2    $365,670/1.02 = $358,500
Year 3    $379,080/1.04 = $364,500
Year 4    $382,395/1.06 = $360,750

After adjusting for inflation, the year of highest output (Year 3) is now also the year of the highest cost.
Using the high-low method for Year 1 and Year 3:

        Units     Cost ($)
High    95,800    364,500
Low     85,000    337,500
        10,800    27,000

Variable cost per unit = $27,000/10,800 = $2.50
Fixed cost = $337,500 – (85,000 × $2.50) = $125,000
Total cost (y) = $2.50x + $125,000 (where x is the number of units)
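As a cross-check on the solution, the deflation step and the high-low calculation can be combined in one script. This is a sketch only: it assumes the index is expressed with Year 1 = 100, as in the example, and the variable names are illustrative.

```python
# Data from the example (Years 1 to 4)
units = [85_000, 93_400, 95_800, 94_300]
total_costs = [337_500, 365_670, 379_080, 382_395]
inflation_index = [100, 102, 104, 106]

# Step 1: deflate every cost to Year 1 prices
deflated = [cost / (index / 100) for cost, index in zip(total_costs, inflation_index)]

# Step 2: apply the high-low method to the years of highest and lowest output
high = units.index(max(units))
low = units.index(min(units))
variable_cost = (deflated[high] - deflated[low]) / (units[high] - units[low])
fixed_cost = deflated[low] - variable_cost * units[low]

print("deflated costs:", [round(c) for c in deflated])
print("variable cost per unit:", round(variable_cost, 2))
print("fixed cost:", round(fixed_cost))
```

Without the deflation step, Year 4 would show the highest cost even though Year 3 has the highest output, so the cost difference would partly reflect price changes rather than activity.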
6.8. Advantages and disadvantages of the high-low method

Advantages
• It is easy to use and understand.
• It needs just two activity levels.

Disadvantages
• It uses two extreme data points which may not be representative of normal conditions.
• Using only two points to determine a formula may mean that the formula is not very accurate.

7. The components of time series

A time series is a series of figures or values recorded over time.


There are four components of a time series: trend, seasonal variations, cyclical variations and random variations.

There are several features of a time series which it may be necessary to analyse in order to prepare forecasts.
(a) A trend
(b) Seasonal variations or fluctuations
(c) Cycles, or cyclical variations
(d) Non-recurring, random variations, which may be caused by unforeseen circumstances, such as a change in
the government of the country, a war, the collapse of a company, technological change or a fire


The trend is the underlying long-term movement over time in the values of the data recorded. In the following
examples of time series, there are three types of trend.

Year    Output per labour hour    Cost per unit    Number of employees
        Units                     $
4       30                        1.00             100
5       24                        1.08             103
6       26                        1.20             96
7       22                        1.15             102
8       21                        1.18             103
9       17                        1.25             98
        (A)                       (B)              (C)

(a) In time series (A) there is a downward trend in the output per labour hour. Output per labour hour did not
    fall every year, because it went up between years 5 and 6, but the long-term movement is clearly a downward one.
(b) In time series (B) there is an upward trend in the cost per unit. Although unit costs went down in year 7 from
    a higher level in year 6, the basic movement over time is one of rising costs.
(c) In time series (C) there is no clear movement up or down, and the number of employees remained fairly
    constant, around 100. The trend is therefore a static, or level one.

7.2. Seasonal variations
Seasonal variations are short-term fluctuations in recorded values, due to different circumstances which affect
results at different times of the year, on different days of the week, at different times of day, or over some
other regular period.

7.4. Cyclical variations
Cyclical variations are fluctuations which take place over a longer time period than seasonal variations. It may
take several years to complete the cycle. For example, the sales of fashion items, such as flared trousers, could
be said to be cyclical. The last cycle took approximately 30 years (mid 1960s to mid 1990s) to complete.
8-9. Trends, seasonal variations

8. Finding the trend
One method of finding the trend is by the use of moving averages. Remember that when finding the moving
average of an even number of results, a second moving average has to be calculated so that trend values can
relate to specific actual figures.

9. Finding the seasonal variations
Once a trend has been established, we can find the seasonal variations.
Seasonal variations are the difference between actual and trend figures (additive model). An average of the
seasonal variations for each time period within the cycle must be determined and then adjusted so that the total
of the seasonal variations sums to zero.

10. Deseasonalisation
Deseasonalised data are often used by economic commentators. Economic statistics, such as unemployment
figures, are often 'seasonally adjusted' or 'deseasonalised' so as to ensure that the overall trend (rising, falling
or stationary) is clear. All this means is that seasonal variations (derived from previous data) have been taken
out, to leave a figure which might be taken as indicating the trend.
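The three steps above (moving average trend, averaged and adjusted seasonal variations, deseasonalised figures) can be sketched in code. The quarterly sales figures below are invented purely for illustration and do not come from the course material; the function names are likewise hypothetical. The additive model is assumed throughout.

```python
def centred_moving_average(values, period):
    """Return {index: trend value} from moving averages of the given period."""
    first = [sum(values[i:i + period]) / period
             for i in range(len(values) - period + 1)]
    if period % 2 == 1:
        # Odd period: each average already lines up with a middle observation.
        return {i + period // 2: avg for i, avg in enumerate(first)}
    # Even period: a second moving average of adjacent pairs centres the trend.
    second = [(a + b) / 2 for a, b in zip(first, first[1:])]
    return {i + period // 2: avg for i, avg in enumerate(second)}


def seasonal_variations(values, trend, period):
    """Average (actual - trend) per season, adjusted so the variations sum to zero."""
    by_season = {s: [] for s in range(period)}
    for i, t in trend.items():
        by_season[i % period].append(values[i] - t)
    averages = [sum(by_season[s]) / len(by_season[s]) for s in range(period)]
    correction = sum(averages) / period
    return [avg - correction for avg in averages]


# Hypothetical quarterly sales for three years (illustrative data only)
sales = [110, 97, 89, 116, 118, 105, 97, 124, 126, 113, 105, 132]

trend = centred_moving_average(sales, 4)
variations = seasonal_variations(sales, trend, 4)
deseasonalised = [v - variations[i % 4] for i, v in enumerate(sales)]

print("trend:", trend)
print("seasonal variations:", variations)
print("deseasonalised:", deseasonalised)
```

With four quarters per cycle the first moving averages fall between quarters, which is why the second averaging step is needed before actual and trend figures can be compared.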
11. Sales forecasting: time series analysis

Forecasting using time series analysis involves calculating a trend line, extrapolating the trend line and
adjusting the forecasts by appropriate seasonal variations. The trend line can be extrapolated by eye or by
using a common sense 'rule of thumb' approach.
(a) The trend line should be calculated.
(b) The trend line should be used to forecast future trend line values.
(c) These values should be adjusted by the average seasonal variation applicable to the future period,
    to determine the forecast for the period.

12. Forecasting problems
All forecasts are subject to error, but the likely errors vary from case to case.
(a) The further into the future the forecast is for, the more unreliable it is likely to be.
(b) The less data available on which to base the forecast, the less reliable the forecast.
(c) The pattern of trend and seasonal variations cannot be guaranteed to continue in the future.
(d) There is always the danger of random variations upsetting the pattern of trend and seasonal variation.
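Steps (a) to (c) of the forecasting procedure can be illustrated as follows. This is only a sketch: the trend values and seasonal variations are hypothetical numbers, and the 'rule of thumb' used here (average change in the trend per period) is one possible approach assumed for the example, not the only one.

```python
def extrapolate_trend(trend_values, periods_ahead):
    """Rule of thumb: extend the trend by its average change per period."""
    step = (trend_values[-1] - trend_values[0]) / (len(trend_values) - 1)
    return trend_values[-1] + step * periods_ahead


def forecast(trend_values, seasonal, last_season, periods_ahead):
    """Extrapolated trend plus the average seasonal variation for the target season."""
    season = (last_season + periods_ahead) % len(seasonal)
    return extrapolate_trend(trend_values, periods_ahead) + seasonal[season]


# Hypothetical inputs: eight quarterly trend values, the last one falling in season 1,
# and average seasonal variations for seasons 0 to 3 (additive model)
trend_values = [104, 106, 108, 110, 112, 114, 116, 118]
seasonal = [10, -5, -15, 10]

next_quarter = forecast(trend_values, seasonal, last_season=1, periods_ahead=1)
three_ahead = forecast(trend_values, seasonal, last_season=1, periods_ahead=3)

print("forecast one quarter ahead:", next_quarter)
print("forecast three quarters ahead:", three_ahead)
```

The further ahead `periods_ahead` reaches, the more the forecast depends on the trend continuing unchanged, which is the first of the forecasting problems listed above.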

13. Using index numbers

An index is a measure, over time, of the average changes in the value (price or quantity) of a group of items
relative to the situation at some period in the past.
• Composite indices cover more than one item.
• Weighting is used to reflect the importance of each item in the index.
• Weighted aggregate indices are found by applying weights and then calculating the index.
• There are two types of weighted aggregate index, the Laspeyre (which uses quantities/prices from the base
  period as the weights) and the Paasche (which uses quantities/prices from the current period as weights).
• Fisher's ideal index is the geometric mean of the Laspeyre and Paasche indices.
• Index numbers are a very useful way of summarising a large amount of data in a single series of numbers.
  You should remember, however, that any summary hides some detail and that index numbers should therefore
  be interpreted with caution.

13.1. Price indices and quantity indices
An index may be a price index or a quantity index.
(a) A price index measures the change in the monetary value of a group of items over time. Perhaps the best
    known price index in the UK is the RPI, which measures changes in the costs of items of expenditure of the
    average household.
(b) A quantity index (also called a volume index) measures the change in the non-monetary values of a group of
    items over time. An example is a productivity index, which measures changes in the productivity of various
    departments or groups of workers.
Laspeyre, Paasche indices

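In the formulae below, p0 and q0 are the prices and quantities in the base period, and pn and qn are the prices
and quantities in the current period.

Laspeyre indices: Laspeyre indices use weights from the base period and are therefore sometimes called base
weighted indices.

Laspeyre price index: uses quantities consumed in the base period as weights. It can be expressed as follows.

$$ \text{Laspeyre price index} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 $$

Laspeyre quantity index: uses prices from the base period as weights and can be expressed as follows.

$$ \text{Laspeyre quantity index} = \frac{\sum p_0 q_n}{\sum p_0 q_0} \times 100 $$

Paasche indices: Paasche indices use current time period weights. In other words, the weights are changed every
time period.

Paasche price index: uses quantities consumed in the current period as weights and can be expressed as follows.

$$ \text{Paasche price index} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100 $$

Paasche quantity index: uses prices from the current period as weights and can be expressed as follows.

$$ \text{Paasche quantity index} = \frac{\sum p_n q_n}{\sum p_n q_0} \times 100 $$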
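To make the difference between base and current weighting concrete, the sketch below computes the Laspeyre and Paasche price and quantity indices, plus Fisher's ideal price index (their geometric mean), for a two-item basket. The prices and quantities are invented for illustration, and the helper `weighted_sum` is a hypothetical name.

```python
from math import sqrt


def weighted_sum(prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


# Hypothetical two-item basket (illustrative figures only)
p0, q0 = [1.00, 2.00], [10, 5]   # base period prices and quantities
pn, qn = [1.50, 2.00], [8, 6]    # current period prices and quantities

laspeyre_price = 100 * weighted_sum(pn, q0) / weighted_sum(p0, q0)
paasche_price = 100 * weighted_sum(pn, qn) / weighted_sum(p0, qn)
laspeyre_quantity = 100 * weighted_sum(p0, qn) / weighted_sum(p0, q0)
paasche_quantity = 100 * weighted_sum(pn, qn) / weighted_sum(pn, q0)
fisher_price = sqrt(laspeyre_price * paasche_price)

print("Laspeyre price index:", laspeyre_price)
print("Paasche price index:", paasche_price)
print("Fisher's ideal price index:", round(fisher_price, 2))
print("Laspeyre quantity index:", laspeyre_quantity)
print("Paasche quantity index:", paasche_quantity)
```

In this basket the item whose price rose is bought less in the current period, so the base weighted price index comes out above the current weighted one, and Fisher's index lies between the two.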

Which to use – Paasche or Laspeyre?

Both patterns of consumption and prices change and a decision therefore has to be made as to whether a Paasche
or a Laspeyre index should be used.

The following points should be considered when deciding which type of index to use.
(a) A Paasche index requires quantities to be ascertained each year. A Laspeyre index only requires them for the
    base year. Constructing a Paasche index may therefore be costly.
(b) For the Paasche index the denominator has to be recalculated each year because the quantities/prices must be
    changed to current year consumption/price levels.
    For the Laspeyre index, the denominator is fixed. The Laspeyre index can therefore be calculated as soon as
    current prices/quantities are known. The Paasche index, on the other hand, cannot be calculated until the end
    of a period, when information about current quantities/prices becomes available.
(c) The denominator of a Laspeyre index is fixed and therefore the Laspeyre index numbers for several different
    years can be directly compared. With the Paasche index, on the other hand, comparisons can only be drawn
    directly between the current year and the base year (although indirect comparisons can be made).
(d) The weights for a Laspeyre index become out of date, whereas those for the Paasche index are updated each
    year.
(e) A Laspeyre price index implicitly assumes that, whatever the price changes, the quantities purchased will
    remain the same. In terms of economic theory, no substitution of cheaper alternative goods and services is
    allowed to take place. Even if goods become relatively more expensive, it assumes that the same quantities
    are bought. As a result, the index tends to overstate inflation.
(f) The effect of current year weighting when using the Paasche price index means that greater importance is
    placed on goods that are relatively cheaper now than they were in the base year. As a consequence, the
    Paasche price index tends to understate inflation.

In practice, it is common to use a Laspeyre index and revise the weights every few years. (Where appropriate, a
new base year may be created when the weights are changed.)
Additions – Fisher’s index and collecting data

Because Laspeyre's index uses base period weights, it tends to overstate any change in prices or quantities.
When prices increase there is usually a reduction in the quantities consumed. The index numerator is therefore
likely to be too large. Likewise, when prices decrease, quantities consumed increase, resulting in an
underweighting of those prices which have decreased and hence an overstatement of change. The Paasche index,
on the other hand, tends to understate change.

To overcome these difficulties some statisticians prefer to use Fisher's ideal index. This index is found by
taking the geometric mean of the Laspeyre index and the Paasche index.

Collecting the data
Data are required to determine the following.
(a) The values for each item
(b) The weight that will be attached to each item

Consider as an example a cost of living index. The prices of a particular commodity will vary from place to
place, from shop to shop and from type to type. Also the price will vary during the period under consideration.
The actual prices used must obviously be some sort of average. The way in which the average is to be obtained
should be clearly defined at the outset.
When constructing a price index, it is common practice to use the quantities consumed as weights; similarly,
when constructing a quantity index, the prices may be used as weights. Care must be taken in selecting the basis
for the weighting. For example, in a cost of living index, it may be decided to use the consumption of a typical
family as the weights, but some difficulty may be encountered in defining a typical family.
14. Sales forecasting: the product life cycle

The product life cycle model shows how sales of a product can be expected to vary with the passage of time.

A product will probably go through the stages of


introduction, growth, maturity, decline and
senility. Different levels of sales and profit
can be expected at each stage. Note that the
product life cycle is a model of what might
happen, not a law prescribing what will
happen. In other words, not all products go
through these stages or even have a life
cycle. However, the idea of a life cycle can be
useful to experienced marketing staff when
forecasting sales and profits.

Forecasting - Chapter summary
• Two variables are said to be correlated if a change in the value of one variable is accompanied by a change in
  the value of another variable. This is what is meant by correlation.
• Two variables might be perfectly correlated, partly correlated or uncorrelated. Correlation can be positive or
  negative.
• The degree of linear correlation between two variables is measured by the Pearsonian (product moment)
  correlation coefficient, r. The nearer r is to +1 or –1, the stronger the relationship.
• The coefficient of determination, r² (alternatively R²), measures the proportion of the total variation in the
  value of one variable that can be explained by variations in the value of the other variable. It denotes the
  strength of the linear association between two variables.
• Linear regression analysis (the least squares method) is one technique for estimating a line of best fit.
• Once an equation for a line of best fit has been determined, forecasts can be made.
• As with all forecasting techniques, the results from regression analysis will not be wholly reliable. There are
  a number of factors which affect the reliability of forecasts made using regression analysis.
• The high-low method is a simple forecasting technique.
• A time series is a series of figures or values recorded over time.
• There are four components of a time series: trend, seasonal variations, cyclical variations and random
  variations.

2. Summarising and analysing data
II. Summarising and analysing data

In Chapter 2 we saw how data can be summarised and presented in tabular, chart and graphical formats. Sometimes
management might need more information than that provided by diagrammatic analysis. For example, they
might wish to calculate a measure of centrality (averages) and a measure of dispersion. These are the
subjects of this chapter.

TOPIC LIST
1 Grouped and ungrouped data
2 Averages
3 Dispersion
4 Probabilities and expected values
5 Normal distribution
6 The standard normal distribution
7 Using the normal distribution to calculate probabilities

1. Grouped and ungrouped data

Grouped data is data where the frequency is shown in terms of a range. Ungrouped data is data where the
frequency is shown in terms of a specific measure or value. Discrete data can only take on a countable
number of values. Continuous data can take on any value.

1.2. Frequency distributions


Frequency diagrams are used if values of particular variables occur more than once.
Frequently the data collected from a statistical survey or investigation is simply a mass of numbers.
e.g.
65 69 70 71 70 68 69 67 70 68
72 71 69 74 70 73 71 67 69 70
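A mass of numbers like this becomes easier to read once each value is shown with the number of times it occurs. The snippet below is a minimal sketch of building such a frequency distribution from the twenty figures above using the standard library `collections.Counter`.

```python
from collections import Counter

# The twenty observations listed above
data = [65, 69, 70, 71, 70, 68, 69, 67, 70, 68,
        72, 71, 69, 74, 70, 73, 71, 67, 69, 70]

frequency = Counter(data)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
print("Total ", sum(frequency.values()))
```

Sorting the values before printing gives the usual layout of a frequency distribution, with the total frequency equal to the number of observations.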

Some examples on distributions

Ogives
A cumulative frequency distribution can be graphed as an ogive.
The ogive is drawn by plotting the cumulative frequencies on the graph, and joining them with straight lines.
An ogive drawn with straight lines may be referred to as a cumulative frequency polygon (or cumulative frequency
diagram) whereas one drawn as a curve may be referred to as a cumulative frequency curve.

Histograms
A frequency distribution can be represented pictorially by means of a histogram. The number of observations in
a class is represented by the area covered by the bar, rather than by its height.
2. Averages

The arithmetic mean
The arithmetic mean is the best known type of average and is widely understood.

Arithmetic mean of ungrouped data = Sum of values of items ÷ Number of items

The arithmetic mean of a variable x is shown as x̄ (x bar).

2.1.3. Example: The arithmetic mean of grouped data
The arithmetic mean of grouped data is x̄ = Σfx ÷ n (equivalently Σfx ÷ Σf), where n is the number of values
recorded, or the number of items measured. In this case the data have been collected into classes (ie it is now
grouped data). To calculate the arithmetic mean of grouped data we therefore need to decide on a value which
best represents all of the values in a particular class interval. This value is known as the mid-point.

Daily demand    Mid-point    Frequency
                x            f            fx
> 0 ≤ 5         3            4            12
> 5 ≤ 10        8            8            64
> 10 ≤ 15       13           6            78
> 15 ≤ 20       18           2            36
                             Σf = 20      Σfx = 190

Arithmetic mean = Σfx ÷ Σf = 190 ÷ 20 = 9.5 units
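The grouped mean calculation can be written out directly from the table. This is a small sketch using the mid-points and frequencies given in the example; the variable names are illustrative.

```python
# Mid-points (x) and frequencies (f) from the daily demand table
mid_points = [3, 8, 13, 18]
frequencies = [4, 8, 6, 2]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, mid_points))
mean = sum_fx / sum_f

print("sum f =", sum_f)
print("sum fx =", sum_fx)
print("arithmetic mean =", mean)
```

Because each class is represented by its mid-point, the result is an estimate of the mean of the underlying ungrouped data rather than an exact figure.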


The mode
The mode or modal value is an average which means 'the most frequently occurring value'.

2.2.2. Example: The mode of a grouped frequency distribution
The mode of a grouped frequency distribution can be calculated from a histogram.

Class interval         Frequency
0 and less than 10     0
10 and less than 20    50
20 and less than 30    150
30 and less than 40    100

1. The modal class (the one with the highest frequency) is '20 and less than 30'. But how can we find a single
   value to represent the mode?
2. What we need to do is draw a histogram of the frequency distribution.
3. We can estimate the mode graphically as follows.


The median
The median is the value of the middle member of an array. The middle item of an odd number of items is
calculated as the (n + 1)/2 th item.

2.3.3. Example: The median of a grouped frequency distribution
The median of a grouped frequency distribution can be established from an ogive.

Class ($)       Frequency    Cumulative frequency
> 340, < 370    17           17
> 370, < 400    9            26
> 400, < 430    9            35
> 430, < 460    3            38
> 460, < 490    2            40
                40

The median is at the 1/2 × 40 = 20th item. Reading off from the horizontal axis on the ogive, the value of the
median is approximately $380.
Note that, because we are assuming that the values are spread evenly within each class, the median calculated is
only approximate.

3. Dispersion

Standard deviation
The standard deviation (σ), which is the square root of the variance, is the most important measure of spread
used in statistics. It measures the spread of data around the mean. Make sure you understand how to calculate
the standard deviation of a set of data.

TERM The variance is the square of the standard deviation (variance = σ²).

Coefficient of variation
The spreads of two distributions can be compared using the coefficient of variation.

Coefficient of variation (coefficient of relative spread) = Standard deviation ÷ Mean

The bigger the coefficient of variation, the wider the spread.
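The sketch below compares the relative spread of two small data sets. Both data sets are invented for illustration, and the population form of the standard deviation (dividing by n, as computed by `statistics.pstdev`) is an assumption made here; the course material may specify a different form.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population standard deviation assumed)."""
    return pstdev(values) / mean(values)


# Hypothetical data sets (illustrative only)
distribution_a = [2, 4, 4, 4, 5, 5, 7, 9]
distribution_b = [10, 12, 14]

for name, values in (("A", distribution_a), ("B", distribution_b)):
    sd = pstdev(values)
    print(f"Distribution {name}: mean = {mean(values)}, "
          f"standard deviation = {sd:.3f}, variance = {sd ** 2:.3f}, "
          f"coefficient of variation = {coefficient_of_variation(values):.3f}")
```

Dividing by the mean puts the two spreads on a common scale, so distributions measured in different units or at different levels can be compared.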

4. Probabilities and expected values

An expected value is a weighted average value of the different possible
outcomes from a decision, where weightings are based on the probability
of each possible outcome. Expected values indicate what an outcome is
likely to be in the long term, if the decision can be repeated many
times over. Fortunately, many business transactions do occur over and
over again.

Although the outcome of a decision may not be certain, the probabilities
of different possible outcomes can often be estimated; for example, on
the basis of historical experience of similar circumstances.

FORMULA
Expected value (EV) = Σpx, where p is the probability of an outcome and
x is the value of that outcome.

4.1.1. Example

Option A                        Option B
Probability    Profit ($)       Probability    Profit ($)
0.8            5,000            0.1            (2,000)
0.2            6,000            0.2            5,000
                                0.6            7,000
                                0.1            8,000

Option A                        Option B
0.8 * 5,000 = 4,000             0.1 * (2,000) = (200)
0.2 * 6,000 = 1,200             0.2 * 5,000   = 1,000
EV          = 5,200             0.6 * 7,000   = 4,200
                                0.1 * 8,000   =   800
                                EV            = 5,800
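The same arithmetic can be written as a short sketch. The probabilities and profits are taken from the example; the function name expected_value and the list-of-pairs layout are hypothetical choices for illustration.

```python
def expected_value(outcomes):
    """Weighted average of outcomes; each item is a (probability, value) pair."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in outcomes)


# Figures from the example; the loss of 2,000 is entered as a negative profit.
OPTION_A = [(0.8, 5000), (0.2, 6000)]
OPTION_B = [(0.1, -2000), (0.2, 5000), (0.6, 7000), (0.1, 8000)]

ev_a = expected_value(OPTION_A)
ev_b = expected_value(OPTION_B)
print(round(ev_a), round(ev_b))
```

On expected value alone, Option B (about $5,800) is preferred to Option A (about $5,200), although B also carries a 10% chance of a loss.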

5. Normal distribution

A probability distribution is an analysis of the proportion of times each particular value occurs in a set of items.
There are a number of different probability distributions but the only one that you need to know about for
Management Accounting is the normal distribution.

Properties of the normal distribution are as follows:


It is symmetrical and bell-shaped
It has a mean, μ (pronounced mew)
The area under the curve totals exactly 1
The area to the left of μ = area to the right of μ = 0.5

The normal distribution is important because in the practical application of statistics, it has been found that many
probability distributions are close enough to a normal distribution to be treated as one without any
significant loss of accuracy. This means that the normal distribution can be used as a tool in business
decision making involving probabilities.

6. The standard normal distribution

For any normal distribution, the dispersion around the mean of the frequency
of occurrences can be measured exactly in terms of the standard
deviation.
The standard normal distribution has a mean of 0 and a standard deviation of 1.
The entire frequency curve represents all the possible outcomes and
their frequencies of occurrence. Since the normal curve is symmetrical,
50% of occurrences have a value greater than the mean value, and 50%
of occurrences have a value less than the mean value.

Although there is an infinite number of normal distributions, depending on values of the mean μ and the standard
deviation σ, the relative dispersion of frequencies around the mean, measured as proportions of the total
population, is exactly the same for all normal distributions. In other words, whatever the normal distribution,
47.5% of outcomes will always be in the range between the mean and 1.96 standard deviations below the mean,
49.5% of outcomes will always be in the range between the mean and 2.58 standard deviations below the mean
and so on.
A normal distribution table, shown on the next slide, gives the proportion of the total between the mean and a
point above or below the mean for any multiple of the standard deviation.
Example: Normal distribution tables

• What is the probability that a randomly picked item will be in the shaded area of the diagram?
Look up 1.96 in the normal distribution table and you will obtain the value .475. This means
there is a 47.5% probability that the item will be in the shaded area.
• Since the normal distribution is symmetrical, 1.96 σ below the mean will also correspond to an
area of 47.5%.
• The total shaded area = 47.5% x 2 = 95%

• 95% of the frequencies in a normal distribution lie in the range ± 1.96 standard deviations
from the mean. This is based on the corresponding value in the normal distribution tables
(when z = 1.96) as shown above.
• We can also show that 99% of the frequencies occur in the range ± 2.58 standard deviations
from the mean.
• Using the normal distribution table, a z score of 2.58 corresponds to an area of 0.4951 (or
49.5%). Because the normal distribution is symmetrical, the total area is 49.5% x 2 = 99%.
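The table values used in this example can be reproduced without a printed table. The sketch below is an illustration only: it assumes the standard normal distribution (mean 0, standard deviation 1) and uses the error function from Python's standard library; the function name area_mean_to_z is hypothetical.

```python
import math


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean and z standard deviations."""
    return 0.5 * math.erf(z / math.sqrt(2))


for z in (1.96, 2.58):
    one_side = area_mean_to_z(z)
    both_sides = 2 * one_side  # symmetry: the same area lies below the mean
    print(z, round(one_side, 4), round(both_sides, 2))
```

The one-sided areas correspond to the entries read from the normal distribution table, and doubling them gives the 95% and 99% ranges quoted above.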

Chapter summary

• Grouped data is data where the frequency is shown in terms of a range. Ungrouped data is data where the
frequency is shown in terms of a specific measure or value. Discrete data can only take on a countable
number of values. Continuous data can take on any value.
• Frequency diagrams are used if values of particular variables occur more than once.
• A cumulative frequency distribution can be graphed as an ogive.
• A frequency distribution can be represented pictorially by means of a histogram. The number of
observations in a class is represented by the area covered by the bar, rather than by its height.
• The arithmetic mean is the best known type of average and is widely understood.
• The median of a grouped frequency distribution can be established from an ogive.
• The standard deviation, which is the square root of the variance, is the most important measure of
spread used in statistics. Make sure you understand how to calculate the standard deviation of a set of data.
• The spreads of two distributions can be compared using the coefficient of variation.
• An expected value is a weighted average value of the different possible outcomes from a decision, where
weightings are based on the probability of each possible outcome. Expected values indicate what an
outcome is likely to be in the long term, if the decision can be repeated many times over. Fortunately,
many business transactions do occur over and over again.
• A probability distribution is an analysis of the proportion of times each particular value occurs in a
set of items. There are a number of different probability distributions but the only one that you need to
know about for Management Accounting is the normal distribution.
• The normal distribution can be used to calculate probabilities. Sketching a graph of a normal
distribution curve often helps in normal distribution problems.
• If you are given the variance of a distribution, remember to first calculate the standard deviation by
taking its square root.
Thank you for your kind attention.
