
Business Statistics T3_S2, 2017

TUTORIAL 3 SOLUTIONS PART A


(To be completed for homework):
A3.1
The data required for this question is provided in Bulbs.xls. It is available in the Tutorial
Material Folder under the Week 3 section on Moodle. It provides information on the life (in
hours of usage) of samples of forty 15 watt compact fluorescent (CFL) light bulbs produced
by two manufacturers, A and B. The data file can be used for your own practice in determining
the following five-number summary and mean for each manufacturer (life in hours):

            A         B
Minimum     5647      6722
Q1          6622      7457.25
Median      7274.5    8102.5
Q3          8060.5    9061
Maximum     8776      9741
Mean        7369      8246

a. Construct a box plot manually (you are not required to use Excel) for each
manufacturer, using these five-number summaries. Both box plots need to be plotted
on the same set of axes. Be sure to label the mean on each box plot.


b. Compare the two distributions in terms of shape.

The distributions of bulb life for manufacturers A and B are both positively skewed, as shown
by the mean exceeding the median in each case (A: 7369 > 7274.5; B: 8246 > 8102.5). This means
that, for both manufacturers, a small number of bulbs have a much longer life than is usual.
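
As a rough check on parts a and b, the box-plot quantities can be derived directly from the summary table. The sketch below is illustrative only: the 1.5 × IQR fence rule is a common convention assumed here (the tutorial does not state one), and the names `summaries` and `describe` are made up for this example.

```python
# Five-number summaries and means from the A3.1 table (life in hours).
summaries = {
    "A": {"min": 5647, "q1": 6622, "median": 7274.5, "q3": 8060.5, "max": 8776, "mean": 7369},
    "B": {"min": 6722, "q1": 7457.25, "median": 8102.5, "q3": 9061, "max": 9741, "mean": 8246},
}


def describe(s):
    """Derive box-plot quantities and a skew direction from one summary."""
    iqr = s["q3"] - s["q1"]
    # Assumption: the common 1.5 * IQR rule for outlier fences (not stated in the tutorial).
    lower_fence = s["q1"] - 1.5 * iqr
    upper_fence = s["q3"] + 1.5 * iqr
    if s["mean"] > s["median"]:
        skew = "positive"
    elif s["mean"] < s["median"]:
        skew = "negative"
    else:
        skew = "none indicated"
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # True when min and max both lie inside the fences, so the whiskers run to them.
        "whiskers_reach_extremes": lower_fence <= s["min"] and s["max"] <= upper_fence,
        "skew": skew,
    }


results = {name: describe(s) for name, s in summaries.items()}
for name, r in results.items():
    print(name, r)
```

Under that assumed rule, neither manufacturer has a minimum or maximum outside the fences, so each whisker would simply run to the minimum and maximum given in the table.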

A3.2
Hytex Company
Hytex Company is a direct marketer of electronic equipment and wants to investigate the
efficacy of catalogue mailings to its 1,000 mail order customers. Catalogue Marketing.xlsx
contains customer demographic attributes including the following:

• Gender        1 if the customer is male; 0 if the customer is female
• AmountSpent   Amount spent ($) for most recent transaction

Discuss the distribution of Amount Spent for males and females.



The mean Amount Spent is higher for males than for females. The median Amount Spent is
also higher for males than for females. Overall, male customers spend more on average than
female customers.

The range of Amount Spent is larger for males than for females. The interquartile ranges of
Amount Spent are very similar for both males and females.

PART B (To be completed during the tutorial):


B3.1
Continuing the Hytex example, provided below are boxplots, a table of summary statistics
and histograms of salaries for male and female customers:

SALARY
                            Female         Male
Mean                        $48,197.43     $64,202.43
Median                      $42,650.00     $62,800.00
Quartile 1                  $22,175.00     $43,700.00
Quartile 3                  $69,475.00     $84,225.00
IQR                         $47,300.00     $40,525.00
Min                         $10,100.00     $10,200.00
Max                         $135,700.00    $168,800.00
Range                       $125,600.00    $158,600.00
Standard deviation          $29,533.11     $29,599.33
Coefficient of variation    61.28%         46.10%
Count                       506            494

[Figure: Histogram of Female Salaries. Frequency against Salaries ($); axis labels run from 25000 to 175000 in steps of 15000.]

[Figure: Histogram of Male Salaries. Frequency against Salaries ($); axis labels run from 25000 to 175000 in steps of 15000.]

a) Discuss the distributions for the male and female salaries of Hytex customers.

Central Location:
The mean Salary for male customers ($64,202.43) is higher than that for female customers
($48,197.43). The median Salary for male customers ($62,800) is higher than that for female
customers ($42,650). The modal class Salary for male customers ($55,000 - $70,000) is
higher than that for female customers ($10,000 - $25,000). In conclusion, it can be seen that
male customers earn a higher Salary on average compared to their female counterparts.

Variability:
The range of Salary for male customers ($158,600) is greater than that for female
customers ($125,600). This is due to a higher maximum salary amongst the male customers.
The standard deviations of Salary for male and female customers are quite similar
($29,599.33 and $29,533.11 respectively). The middle 50% of salaries (the interquartile
range) is more spread out for female customers ($47,300) than for male customers ($40,525).
The coefficient of variation shows that the salaries for females have a higher relative
variability (61.28%) compared to the salaries for males (46.10%). Overall, using the measure
of relative variability as the basis for our conclusion, we see that female salaries are more
variable than male salaries.
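
The variability figures quoted above can be reproduced from the other rows of the summary table. This is a minimal sketch, assuming the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage; the names `salary` and `variability` are illustrative only.

```python
# Values copied from the SALARY summary table in B3.1 (dollars).
salary = {
    "Female": {"mean": 48197.43, "sd": 29533.11, "q1": 22175.00,
               "q3": 69475.00, "min": 10100.00, "max": 135700.00},
    "Male": {"mean": 64202.43, "sd": 29599.33, "q1": 43700.00,
             "q3": 84225.00, "min": 10200.00, "max": 168800.00},
}


def variability(s):
    """Return the range, interquartile range, and coefficient of variation (%)."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv_percent": 100 * s["sd"] / s["mean"],
    }


measures = {group: variability(s) for group, s in salary.items()}
for group, m in measures.items():
    print(group, m["range"], m["iqr"], round(m["cv_percent"], 2))
```

The two standard deviations differ by less than $70, so the gap in the coefficient of variation comes almost entirely from the lower female mean.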

Shape:
The distribution of Salary for females is positively skewed as seen in both the histogram and
boxplot. This is confirmed with the mean ($48,197.43) being higher than the median
($42,650). In context, this means that only a few females have a high salary with most of the
females earning lower salaries. The distribution of Salary for males is slightly positively
skewed, with the mean ($64,202.43) only slightly higher than the median ($62,800). This can
be seen in the histogram and boxplot for males, which are more symmetric than those for the
female salary distribution. Both distributions of Salary are unimodal.

b) Using information from the following table, can you conclude that the distribution of
Salary is more variable than that of AmountSpent?

OVERALL
                            AMOUNT SPENT    SALARY
Mean                        $1,216.77       $56,104
Standard deviation          $961.08         $30,616.31
Coefficient of variation    78.99%          54.57%

Whilst the distribution of Salary has a standard deviation which is much higher than that for
the distribution of Amount Spent, I cannot conclude that it is more variable. The table
indicates that the means for these two distributions are very different, so the coefficient of
variation is the best measure to use when comparing variability. This measure of relative
variability indicates that the distribution of Amount Spent is in fact more variable than the
distribution of Salary.
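
One way to see why the standard deviation alone is a poor basis for this comparison is that it depends on the units of measurement, while the coefficient of variation does not. The sketch below is a hypothetical illustration using the table values: it re-expresses Salary in thousands of dollars and recomputes both measures. The function name is made up for this example.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean


# Values from the OVERALL table (dollars).
amount_spent_mean, amount_spent_sd = 1216.77, 961.08
salary_mean, salary_sd = 56104, 30616.31

amount_spent_cv = coefficient_of_variation(amount_spent_mean, amount_spent_sd)
salary_cv = coefficient_of_variation(salary_mean, salary_sd)

# Re-express Salary in $000s: the mean and the standard deviation both shrink
# by a factor of 1000, so their ratio (the CV) is unchanged.
salary_sd_thousands = salary_sd / 1000
salary_cv_thousands = coefficient_of_variation(salary_mean / 1000, salary_sd_thousands)

print(round(amount_spent_cv, 2), round(salary_cv, 2), round(salary_cv_thousands, 2))
```

Measured in $000s, the Salary standard deviation (about 30.6) would look smaller than the Amount Spent standard deviation (961.08 in dollars), yet the coefficient of variation for Salary is the same in either unit.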

c) Using information from exhibits in both A3.2 and B3.1, which measure of central
location would be the most appropriate to use for both distributions?

The boxplots for AmountSpent in A3.2 show skewness and outliers, which distort the mean.
The most appropriate measure of central location to use is therefore the median, as it is
robust to outliers.

B3.2
Hytex Company
Hytex Company is a direct marketer of electronic equipment and wants to investigate the
efficacy of catalogue mailings to its 1,000 mail order customers. Catalogue Marketing.xlsx
contains customer demographic attributes including the following:

• Region        South or East
• Gender        1 if the customer is male; 0 if the customer is female
• AmountSpent   Amount spent ($) for most recent transaction

Use the following pivot table to analyse the data and report on how AmountSpent is related
to these demographic attributes. Is Hytex sending catalogues to the right customers? If not,
to whom should the catalogues be sent?

Exhibit 1

Average of AmountSpent
                 Region
Gender           East      South     Grand Total
Male             $1,125    $913      $1,021
Female           $1,241    $1,502    $1,374
Grand Total      $1,182    $1,214    $1,198

SOLUTION
Step One:
Identify the dependent and independent variables
Dependent variable (DV):      Amount Spent ($)
Independent variables (IV):   Gender (IV1), Region (IV2)

For all answers in each step, refer to Exhibit 1

Step Two:
Describe any relevant overall features of AmountSpent.

Describe the overall behaviour of mean AmountSpent.

The average amount spent by customers is $1,198.

Step Three:
Describe the overall relationship between AmountSpent and Gender (IV1).

AmountSpent and Gender (IV1)


The mean AmountSpent is greater for females ($1,374) than for males ($1,021).

This is also true for each region. Female customers from the East region have a mean
AmountSpent ($1,241) which is greater than male customers from the East region
($1,125).

Female customers from the South region have a mean AmountSpent ($1,502) which is
greater than male customers from the South region ($913).

Step Four:
Describe the overall relationship between AmountSpent and Region (IV2).

AmountSpent and Region (IV2)


The mean AmountSpent is greater for customers from the South region ($1,214) than
for customers who are from the East region ($1,182).

This holds true for female customers. Female customers from the South region have a
mean AmountSpent ($1,502) which is greater than female customers from the East
region ($1,241).

However, this does not hold true for male customers. Male customers from the East
region have a mean AmountSpent ($1,125) which is greater than male customers from
the South region ($913).
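
The checks in Steps Three and Four (does the overall pattern hold within each subgroup?) can be sketched as a short routine over the cell means in Exhibit 1. This is an illustrative sketch only; the names `cell_means` and `higher_group` are made up for this example.

```python
# Cell means from Exhibit 1 (mean AmountSpent in $), keyed by (gender, region).
cell_means = {
    ("Male", "East"): 1125,
    ("Male", "South"): 913,
    ("Female", "East"): 1241,
    ("Female", "South"): 1502,
}


def higher_group(means, fixed_index, fixed_value):
    """Hold one attribute fixed and return the level of the other attribute
    that has the larger cell mean."""
    other_index = 1 - fixed_index
    candidates = {
        key[other_index]: value
        for key, value in means.items()
        if key[fixed_index] == fixed_value
    }
    return max(candidates, key=candidates.get)


# Step Three: which gender spends more within each region?
gender_effect_by_region = {r: higher_group(cell_means, 1, r) for r in ("East", "South")}
# Step Four: which region spends more within each gender?
region_effect_by_gender = {g: higher_group(cell_means, 0, g) for g in ("Male", "Female")}
# The single segment with the highest mean spend.
top_segment = max(cell_means, key=cell_means.get)

print(gender_effect_by_region)
print(region_effect_by_gender)
print(top_segment)
```

The gender comparison points the same way in both regions, while the region comparison reverses between males and females, which is why the overall regional difference should not be read as applying to every customer group.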

Now, it is important to address Hytex’s question. Is it sending the catalogues to the right
customers? If not, to whom should Hytex send the catalogues?

Overall Conclusion:
• Female customers from the South region have a higher mean spend ($1,502) than
  any other customer group.
• Female customers from the East region have the next highest mean spend ($1,241).
• So, it seems to make good sense for Hytex to send catalogues to these customer
  segments.

EXTRA QUESTIONS
B3.3
PlanFinan Pty Ltd
PlanFinan is a financial planning organisation. The data in PlanFinan.xlsx was obtained from
PlanFinan’s database of a particular group of clients. Definitions are given for the following
variables:

• Sex         1 if the client is male; 0 for female
• EducLevel   1: high school incomplete   2: high school complete
              3: undergraduate degree     4: postgraduate degree
• Salary      Annual salary ($,000)

Use the following pivot table to analyse the data and report on how sex and educational
level are associated with annual salary for this group of clients.

Exhibit 1

Average of Salary ($,000)
                 EducLevel
Sex              1-2       3-4        Grand Total
Female           $90.07    $109.17    $102.35
Male             $83.98    $98.91     $92.69
Grand Total      $87.17    $104.92    $98.12

SOLUTION
Step One:
Identify the dependent and independent variables
Dependent variable (DV):      Salary ($,000)
Independent variables (IV):   Sex (IV1), EducLevel (IV2)

For all answers in each step, refer to Exhibit 1


Step Two:
Describe any relevant overall features of Salary.

Describe the overall behaviour of mean Salary.

The average salary of clients is $98,120.



Step Three:
Describe the overall relationship between Salary and Sex (IV1).

Salary and Sex (IV1)


The mean Salary is greater for female clients ($102,350) than for male clients ($92,690).

This is also true for each level of education. Female clients with a high level of education
have a mean salary ($109,170) which is greater than their male counterparts ($98,910).

Female clients with a lower level of education have a mean salary ($90,070) which is
greater than their male counterparts ($83,980).

Step Four:
Describe the overall relationship between Salary and Education Level (IV2).

Salary and Education Level (IV2)


The mean Salary is greater for clients with a high level of education ($104,920) than for
clients with a lower level of education ($87,170).

This is also true for each sex. Female clients with a high level of education have a mean
salary ($109,170) which is greater than the female clients who have a lower level of
education ($90,070).

Male clients with a high level of education have a mean salary ($98,910) which is
greater than the male clients who have a lower level of education ($83,980).

Overall Conclusion:
• Female clients with a high level of education have a higher mean salary ($109,170)
  than any other client group.
• Male clients with a low level of education have the lowest mean salary ($83,980) of
  all client groups.

B3.4
A study of Melbourne’s climate compared annual rainfall values for the 75 years from 1861
to 1935 (“Historical”) with annual rainfall values for the following 75 years 1936 to 2010
(“Recent”).

Using the exhibits below, compare the distribution of yearly rainfall totals for the “Historical”
period with the distribution for the “Recent” period by making appropriate references to the

(i) measures of central location,
(ii) measures of variability, and
(iii) shapes

of the distributions.

[Figure: Histogram of "Historical" Annual Rainfall. Frequency against Annual Rainfall (mm), axis from 200 to 1100 mm.]

[Figure: Histogram of "Recent" Annual Rainfall. Frequency against Annual Rainfall (mm), axis from 200 to 1100 mm.]

[Figure: Box plots of "Recent" and "Historical" Annual Rainfall (mm), axis from 200 to 1100 mm, with a legend entry for the mean.]

SOLUTION:

Measures of central location (L)

• Mean: Historical 653.6 vs Recent 642.1 mm
• Median: Historical 647.3 vs Recent 625.7 mm
• Modal class: 600-700 mm for both periods.


Of course, modal class is dependent on the bins (class intervals) which are used.

With a lower mean and median, we can conclude that RECENT average annual rainfall is lower
than HISTORICAL average annual rainfall.

(As usual, for a numeric variable with many possible values, the mode is of no interest. By
the way, Historical has no mode. Recent has 1 mode: 757.9, with 2 occurrences.)

Measures of variability (V)

• Range Historical 570.4 vs Recent 542.6 mm.

HISTORICAL rainfall range is larger than the RECENT rainfall range.

(Historical has a higher minimum and a higher maximum. This fact does not, by itself, ensure
that the Historical range will be larger.

Example: A: min 1, max 19 → range = 18. B: min 4, max 20 → range = 16. B has higher
max and min, but smaller range.)
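
The A and B example above can be checked in a couple of lines. This is only a sketch of the arithmetic; the helper name `value_range` is made up for this example.

```python
def value_range(minimum, maximum):
    """Range = maximum - minimum."""
    return maximum - minimum


# The illustrative data sets A and B from the example in the text.
a_min, a_max = 1, 19
b_min, b_max = 4, 20

range_a = value_range(a_min, a_max)
range_b = value_range(b_min, b_max)

# B has the higher minimum and the higher maximum, yet the smaller range.
print(range_a, range_b)
```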

• Standard deviation Historical 127.7 vs Recent 138.1 mm.


The annual rainfall values are more spread out in RECENT times compared to HISTORICAL
times.

• Interquartile range Historical 159.2 vs Recent 243.0 mm.


The spread of the middle 50% of annual rainfall is larger (more variable) for RECENT times
compared to HISTORICAL times.

• Coefficient of variation Historical 19.5% vs Recent 21.5%.

The coefficient of variation measures the relative variability (standard deviation as a
percentage of the mean) and is higher for RECENT rainfall than for HISTORICAL.

Shape (S)

Both distributions are unimodal and very close to being symmetrical (mean ≈ median),
although RECENT is slightly positively skewed, with the mean being slightly greater than the
median, indicating that there must have been an unusually high rainfall at some point.

Overall then, the above evidence shows that whilst HISTORICAL annual rainfall has a larger
range, RECENT annual rainfall has greater variability.

B3.5

As part of an Australian Household Expenditure Survey (1988-89), the following data was
collected for 1000 households:

INCOME = Weekly household income (in dollars)
CONSUME = Consume alcohol (1 = yes, 0 = no)

The variable INCOME was studied for the two groups: “Consume alcohol”, and “Do not
consume alcohol”, and the following graphs and summary statistics were obtained.

Exhibit 1: [Figure: Percentage frequency for income of households that consume alcohol. Percentage frequency (0.0% to 50.0%) against Weekly income ($).]

Exhibit 2: [Figure: Percentage frequency for income of households that DO NOT consume alcohol. Percentage frequency (0.0% to 50.0%) against Weekly income ($).]

Exhibit 3: Summary Statistics for Weekly Income ($)

                            DO NOT consume alcohol    Consume alcohol
Mean                        456.9                     708.4
Median                      353                       638.5
Modal class                 $0-$250                   $500-$750
Standard deviation          403.0                     461.3
Coefficient of variation    88.2%                     65.1%
Minimum                     12                        12
Maximum                     3846                      3696
Range                       3834                      3684
Lower quartile              173.75                    356.75
Upper quartile              632.25                    936
Interquartile range         458.5                     579.25
Count                       234                       766

Using the above results, compare the distribution of the variable “Income” for the two groups,
discussing typical values (i.e. “central tendency”), how spread out the values are (“variability”),
and the shape of the distributions.
Comment on what this tells us about the association between income and the consumption
of alcohol.

SOLUTION:

Measures of central location (L) - refer to Exhibit 3

• Mean: $708.4 Consume vs $456.9 Do Not Consume
• Median: $638.5 Consume vs $353 Do Not Consume
• Modal class: $500-$750 Consume vs $0-$250 Do Not Consume

With a lower mean and median, we can conclude that the average weekly household
income for the group that does not consume alcohol is much lower than that for the group
that consumes alcohol.

The group that does not consume alcohol has a clear modal class of $0-$250, while the modal
class for the group that does consume alcohol is less clear ($500-$750 if pushed).

Measures of variability (V) – refer to Exhibit 3

• Range $3684 Consume vs $3834 Do Not Consume

The income range of the group that does not consume alcohol is larger than that of the
group that consumes alcohol, by $150.

(Both minimums are the same but Do Not Consume has a higher maximum, hence a larger
range.)

• Interquartile range: Consume $579.25 vs Do Not Consume $458.5

The spread of the middle 50% of the income for the group that consumes alcohol is larger
(more variable) than that for the group that does not consume alcohol.

• Standard deviation: Consume $461.3 vs Do Not Consume $403

The distribution of incomes for households that do not consume alcohol has a lower standard
deviation and interquartile range than the distribution for those that do consume alcohol.

• Coefficient of variation: Consume 65.1% vs Do Not Consume 88.2%.

The coefficient of variation measures the relative variability (standard deviation as a
percentage of the mean) and is higher for the group that does not consume alcohol. This means
that this group has greater relative variability in income.

Overall, although the measures of absolute variability (interquartile range and standard
deviation) are higher for the households that consume alcohol, the relative variability as
measured by the coefficient of variation is considerably higher for the income distribution of
households that do not consume alcohol. This is because, for these households, the standard
deviation is about 88% of the mean, whereas for the income distribution of alcohol-consuming
households the standard deviation is only about 65% of the mean.

Shape (S) – refer to Exhibits 1, 2 and 3.

The distribution of weekly income is skewed to the right and unimodal for both groups. In
both cases the mean is greater than the median, which suggests that there are a few very
large incomes.

Among those who do not consume alcohol, the distribution is more strongly skewed given
that the difference between mean and median is larger.
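
The comparison of mean-median gaps can also be put on a common scale. The sketch below uses Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. This measure is not used in the tutorial itself and is offered only as an illustrative cross-check on the Exhibit 3 figures; the function name is made up for this example.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd.
    Positive values indicate positive (right) skew."""
    return 3 * (mean - median) / sd


# Values from Exhibit 3 (weekly income, $).
skew_do_not_consume = pearson_median_skewness(456.9, 353, 403.0)
skew_consume = pearson_median_skewness(708.4, 638.5, 461.3)

print(round(skew_do_not_consume, 2), round(skew_consume, 2))
```

Both coefficients are positive, and the one for households that do not consume alcohol is the larger of the two, which agrees with the conclusion drawn from the raw mean-median differences.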

Overall then, the above evidence shows that the group that does not consume alcohol has a
larger range and greater relative variability in weekly income.
