a. Construct a box plot manually (you are not required to use Excel) for each
manufacturer, using these five-number summaries. Both box plots need to be plotted
on the same set of axes. Be sure to label the mean on each box plot.
Business Statistics T3_S2, 2017
The distributions of bulb life for manufacturers A and B are both positively skewed, as
shown by the mean exceeding the median in each case. This means that each manufacturer
produces a small number of bulbs that last much longer than usual.
A3.2
Hytex Company
Hytex Company is a direct marketer of electronic equipment and wants to investigate the
efficacy of catalogue mailings to its 1,000 mail order customers. Catalogue Marketing.xlsx
contains customer demographic attributes including the following:
The mean Amount Spent is higher for males than for females. The median Amount Spent is
also higher for males than for females. Overall male customers spend more on average than
female customers.
The range of Amount Spent is higher for males than for females. The interquartile ranges of
Amount Spent are very similar for both males and females.
SALARY
              Female        Male
Mean          $48,197.43    $64,202.43
Median        $42,650.00    $62,800.00
Quartile 1    $22,175.00    $43,700.00
Quartile 3    $69,475.00    $84,225.00
IQR           $47,300.00    $40,525.00
Min           $10,100.00    $10,200.00
[Figure: two histograms of Frequency against Salaries ($)]
a) Discuss the distributions for the male and female salaries of Hytex customers.
Central Location:
The mean Salary for male customers ($64,202.43) is higher than that for female customers
($48,197.43). The median Salary for male customers ($62,800) is higher than that for female
customers ($42,650). The modal class Salary for male customers ($55,000 - $70,000) is
higher than that for female customers ($10,000 - $25,000). In conclusion, it can be seen that
male customers earn a higher Salary on average compared to their female counterparts.
Variability:
The range of Salary for male customers ($158,600) is greater than that for female
customers ($125,600). This is due to a higher maximum salary amongst the male customers.
The standard deviations of Salary for male and female customers are quite similar
($29,599.33 and $29,533.11 respectively). The middle 50% of salaries (the interquartile
range) is more spread out for female customers ($47,300) than for male customers ($40,525).
The coefficient of variation shows that the salaries for females have a higher relative
variability (61.28%) than the salaries for males (46.10%). Overall, using the measure of
relative variability as the basis for our conclusion, we see that female salaries are more
variable than male salaries.
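The interquartile ranges and coefficients of variation quoted above can be checked from the summary statistics alone. The following is a minimal sketch in Python; the function and variable names are illustrative and do not come from the tutorial, and the figures are copied from the Salary table and the text.

```python
# Summary statistics copied from the Salary table and the text above.
salary = {
    "Female": {"mean": 48_197.43, "std_dev": 29_533.11, "q1": 22_175.00, "q3": 69_475.00},
    "Male": {"mean": 64_202.43, "std_dev": 29_599.33, "q1": 43_700.00, "q3": 84_225.00},
}


def interquartile_range(q1, q3):
    """Spread of the middle 50% of the data: Q3 minus Q1."""
    return q3 - q1


def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


spread = {
    gender: {
        "iqr": interquartile_range(s["q1"], s["q3"]),
        "cv": coefficient_of_variation(s["std_dev"], s["mean"]),
    }
    for gender, s in salary.items()
}

for gender, measures in spread.items():
    print(f"{gender}: IQR = ${measures['iqr']:,.2f}, CV = {measures['cv']:.2f}%")
```

Because the two standard deviations are almost equal, the difference in the coefficient of variation comes almost entirely from the difference in the means.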
Shape:
The distribution of Salary for females is positively skewed as seen in both the histogram and
boxplot. This is confirmed with the mean ($48,197.43) being higher than the median
($42,650). In context, this means that only a few females have a high salary with most of the
females earning lower salaries. The distribution of Salary for males is slightly positively
skewed with the mean ($64,202.43) only slightly higher than the median ($62,800). The
histogram and boxplot for males are accordingly more symmetric than those for the
female salary distribution. Both distributions of Salary are unimodal.
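The mean-versus-median comparison used above can be sketched as a small Python check. This is only the rule of thumb applied in the text, not a formal skewness statistic; the helper names and the idea of expressing the gap as a percentage of the median are assumptions made for illustration.

```python
# Means and medians copied from the Salary table above.
salary = {
    "Female": {"mean": 48_197.43, "median": 42_650.00},
    "Male": {"mean": 64_202.43, "median": 62_800.00},
}


def skew_direction(mean, median):
    """Rule of thumb: mean above the median suggests positive (right) skew."""
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric"


def relative_gap(mean, median):
    """Gap between mean and median as a percentage of the median (illustrative)."""
    return (mean - median) / median * 100


direction = {g: skew_direction(s["mean"], s["median"]) for g, s in salary.items()}
gap = {g: relative_gap(s["mean"], s["median"]) for g, s in salary.items()}

for g in salary:
    print(f"{g}: {direction[g]} skew, mean exceeds median by {gap[g]:.1f}%")
```

The much smaller gap for males is consistent with describing that distribution as only slightly skewed.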
b) Using information from the following table, can you conclude that the distribution of
Salary is more variable than that of AmountSpent?
OVERALL
                            AMOUNT SPENT    SALARY
Mean                        $1,216.77       $56,104
Standard deviation          $961.08         $30,616.31
Coefficient of variation    78.99%          54.57%
Whilst the distribution of Salary has a standard deviation which is much higher than that for
the distribution of Amount Spent, I cannot conclude that it is more variable. The table
shows that the means of these two distributions are very different, so the coefficient of
variation is the best measure to use when comparing variability. This measure of relative
variability indicates that the distribution of Amount Spent is in fact more variable than the
distribution of Salary.
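The disagreement between the absolute and relative measures can be reproduced from the table. The sketch below, in Python with illustrative names that are not part of the tutorial, picks the more variable distribution first by standard deviation and then by coefficient of variation.

```python
# Figures copied from the OVERALL table above.
overall = {
    "Amount Spent": {"mean": 1_216.77, "std_dev": 961.08},
    "Salary": {"mean": 56_104.00, "std_dev": 30_616.31},
}


def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv = {
    name: coefficient_of_variation(s["std_dev"], s["mean"])
    for name, s in overall.items()
}

# Absolute variability: compare standard deviations directly.
most_variable_absolute = max(overall, key=lambda name: overall[name]["std_dev"])
# Relative variability: compare coefficients of variation.
most_variable_relative = max(cv, key=cv.get)

print(f"Largest standard deviation: {most_variable_absolute}")
print(f"Largest coefficient of variation: {most_variable_relative}")
```

The two measures point to different variables, which is why the answer relies on the coefficient of variation when the means differ this much.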
c) Using information from exhibits in both A3.2 and B3.1, which measure of central
location would be the most appropriate to use for both distributions?
The boxplots for AmountSpent in A3.2 show skewness, and there are outliers that distort
the mean. The most appropriate measure of central location is therefore the median, as it
is robust to outliers.
B3.2
Hytex Company
Hytex Company is a direct marketer of electronic equipment and wants to investigate the
efficacy of catalogue mailings to its 1,000 mail order customers. Catalogue Marketing.xlsx
contains customer demographic attributes including the following:
Use the following pivot table to analyse the data and report on how AmountSpent is related
to these demographic attributes. Is Hytex sending catalogues to the right customers? If not,
to whom should the catalogues be sent?
Exhibit 1
SOLUTION
Step One:
Identify the dependent and independent variables
Dependent variable (DV):      Amount Spent ($)
Independent variables (IV):   Gender (IV1), Region (IV2)
Step Two:
Describe any relevant overall features of AmountSpent.
Step Three:
Describe the overall relationship between AmountSpent and Gender (IV1).
This is also true for each region. Female customers from the East region have a mean
AmountSpent ($1,241) which is greater than male customers from the East region
($1,125).
Female customers from the South region have a mean AmountSpent ($1,502) which is
greater than male customers from the South region ($913).
Step Four:
Describe the overall relationship between AmountSpent and Region (IV2).
This holds true for female customers. Female customers from the South region have a
mean AmountSpent ($1,502) which is greater than female customers from the East
region ($1,241).
However, this does not hold true for male customers. Male customers from the East
region have a mean AmountSpent ($1,125) which is greater than male customers from
the South region ($913).
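The within-gender comparison in Step Four can be sketched in Python using only the four cell means quoted in the text. The dictionary layout and the function name are assumptions made for illustration; the pivot table in Exhibit 1 may contain further detail that is not reproduced here.

```python
# Mean AmountSpent ($) for each Gender/Region cell, as quoted in the text.
mean_spend = {
    ("Female", "East"): 1_241,
    ("Male", "East"): 1_125,
    ("Female", "South"): 1_502,
    ("Male", "South"): 913,
}


def higher_spending_region(gender):
    """Return the region with the higher mean AmountSpent for one gender."""
    east = mean_spend[(gender, "East")]
    south = mean_spend[(gender, "South")]
    return "South" if south > east else "East"


by_gender = {gender: higher_spending_region(gender) for gender in ("Female", "Male")}

for gender, region in by_gender.items():
    print(f"{gender} customers: higher mean spend in the {region} region")
```

The direction of the region effect differs between the two genders, which is why the relationship has to be described separately for each.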
Now, it is important to address Hytex's question. Is it sending the catalogues to the right
customers? If not, to whom should Hytex send the catalogues?
Overall Conclusion:
Female customers from the South region have a higher mean spend ($1,502) than
any other customer group.
Female customers from the East region have the next highest mean spend ($1,241).
So, it seems to make good sense for HyTex to send catalogues to these customer
segments.
EXTRA QUESTIONS
B3.3
PlanFinan Pty Ltd
PlanFinan is a financial planning organisation. The data in PlanFinan.xlsx was obtained from
PlanFinan’s database of a particular group of clients. Definitions are given for the following
variables:
Use the following pivot table to analyse the data and report on how sex and educational
level are associated with annual salary for this group of clients.
Exhibit 1
SOLUTION
Step One:
Identify the dependent and independent variables
Dependent variable (DV):      Salary ($,000)
Independent variables (IV):   Sex (IV1), EducLevel (IV2)
Step Three:
Describe the overall relationship between Salary and Sex (IV1).
This is also true for each level of education. Female clients with a high level of education
have a mean salary ($109,170) which is greater than their male counterparts ($98,910).
Female clients with a lower level of education have a mean salary ($90,070) which is
greater than their male counterparts ($83,980).
Step Four:
Describe the overall relationship between Salary and Education Level (IV2).
This is also true for each sex. Female clients with a high level of education have a mean
salary ($109,170) which is greater than the female clients who have a lower level of
education ($90,070).
Male clients with a high level of education have a mean salary ($98,910) which is
greater than the male clients who have a lower level of education ($83,980).
Overall Conclusion:
Female clients with a high level of education have a higher mean salary ($109,170)
than any other client group.
Male clients with a low level of education have the lowest mean salary ($83,980) of
all client groups.
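The ranking behind this conclusion can be sketched in Python from the four group means quoted above. The labels "High" and "Low" are shorthand chosen for this example, not the category names used in PlanFinan.xlsx.

```python
# Mean salary ($) for each Sex/Education group, as quoted in the solution.
# "High" and "Low" are illustrative labels for the two education levels.
mean_salary = {
    ("Female", "High"): 109_170,
    ("Male", "High"): 98_910,
    ("Female", "Low"): 90_070,
    ("Male", "Low"): 83_980,
}

# Sort the groups from highest to lowest mean salary.
ranked = sorted(mean_salary.items(), key=lambda item: item[1], reverse=True)

for (sex, education), salary in ranked:
    print(f"{sex}, {education} education: ${salary:,}")

highest_group = ranked[0][0]
lowest_group = ranked[-1][0]
```

The first and last entries of the ranking correspond to the two groups singled out in the overall conclusion.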
B3.4
A study of Melbourne’s climate compared annual rainfall values for the 75 years from 1861
to 1935 (“Historical”) with annual rainfall values for the following 75 years 1936 to 2010
(“Recent”).
Using the exhibits below, compare the distribution of yearly rainfall totals for the “Historical”
period with the distribution for the “Recent” period by making appropriate references to the
central location, variability and shape of the distributions.
[Figure: two histograms of Frequency against Annual Rainfall (mm), and box plots for Recent
and Historical on a common Annual Rainfall (mm) axis from 200 to 1100]
SOLUTION:
With a lower mean and median, we can conclude that RECENT average annual rainfall is lower
than HISTORICAL average annual rainfall.
(As usual, for a numeric variable with many possible values, the mode is of no interest. By
the way, Historical has no mode. Recent has 1 mode: 757.9, with 2 occurrences.)
(Historical has a higher minimum and a higher maximum. This fact does not, by itself, ensure
that the Historical range will be larger.
Example: A has min 1 and max 19, so range = 18. B has min 4 and max 20, so range = 16.
B has the higher max and min, but the smaller range.)
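The A and B example can be checked with a short Python sketch; the function name is illustrative.

```python
def value_range(minimum, maximum):
    """Range of a data set: maximum minus minimum."""
    return maximum - minimum


range_a = value_range(1, 19)   # A: min 1, max 19
range_b = value_range(4, 20)   # B: min 4, max 20

print(f"Range of A = {range_a}, range of B = {range_b}")
print(f"B has the higher min and max but the larger range: {range_b > range_a}")
```

The range depends on the gap between the two extremes, not on how high they sit, so both extremes being higher says nothing about which range is larger.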
The coefficient of variation measures relative variability (standard deviation as a
percentage of the mean) and is higher for RECENT rainfall than for HISTORICAL.
Shape (S)
Both distributions are unimodal and very close to being symmetrical (mean ≈ median),
although RECENT is slightly positively skewed, with the mean slightly greater than the
median, indicating that there must have been unusually high rainfall at some point.
Overall then, the above evidence shows that whilst HISTORICAL annual rainfall has a larger
range, RECENT annual rainfall has greater relative variability.
B3.5
As part of an Australian Household Expenditure Survey (1988-89), the following data was
collected for 1000 households:
Exhibit 1:
Exhibit 2:
Using the above results, compare the distribution of the variable “Income” for the two groups,
discussing typical values (i.e. “central tendency”), how spread out the values are (“variability”),
and the shape of the distributions.
Comment on what this tells us about the association between income and the consumption
of alcohol.
SOLUTION:
With a lower mean and median, we can conclude that the average weekly household
income for the group that does not consume alcohol is much lower than that for the group
that consumes alcohol.
The group that does not consume alcohol has a clear modal class of $0-$250, while the modal
class for the group that does consume alcohol is less clear; the best candidate is $500-$750.
The income range of the group that does not consume alcohol is larger than that of the group
that consumes alcohol by $150.
(Both minimums are the same but Do Not Consume has a higher maximum, hence a larger
range.)
The spread of the middle 50% of incomes is larger (more variable) for the group that
consumes alcohol than for the group that does not consume alcohol.
The distribution of incomes for households that do not consume alcohol has a lower standard
deviation and interquartile range than the distribution for those that do consume alcohol.
Overall, although the measures of absolute variability (interquartile range and standard
deviation) are higher for the households that consume alcohol, the relative variability as
measured by the coefficient of variation is considerably higher for the income distribution of
households that do not consume alcohol. For these households the standard deviation is
about 88% of the mean, whereas for the income distribution of alcohol-consuming households
the standard deviation is only about 65% of the mean.
The distribution of weekly income is skewed to the right and unimodal for both groups. This
means that in both cases, the mean is greater than the median. This suggests that there are
a “few” very large incomes.
Among those who do not consume alcohol, the distribution is more strongly skewed, given
that the difference between the mean and the median is larger.
Overall then, the above evidence shows that the group that does not consume alcohol has a
larger range and greater relative variability in weekly income.