BUS105 Self-Practices

These self-practices give students a feel for how to answer questions in the
examination. They are in no way an indication of the focus of the examination.
Students are encouraged to go through all the course materials and not rely
solely on these exercises for their study. Solutions to the self-practices are
provided to guide students on how to answer the questions.
Question 1
(a) The marketing director at Ace Realty Company has collected selling price
information on the houses sold in the last month for his study on the
market trend. Selling price is reported in thousands of dollars and the
following chart was constructed.
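[Chart: Number of homes sold (vertical axis, 0 to 30) against Selling Price
(Thousands S$), with classes <150, [150,175), [175,200), [200,225), [225,250),
[250,275), [275,300) and >=300. Bar heights shown on the chart (not in class
order): 26, 20, 16, 12, 13, 7, 7, 4.]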
i. Name the type of statistical chart used above.
ii. Determine the percentage of homes that were at least $250,000.
iii. Discuss the skewness observed in the chart.
iv. If this chart is included in a business report, provide a brief comment
that could be used to accompany this chart.
(b) The summary statistics of the selling prices produced by Excel are
shown below:
i. Explain the meaning of a measure of location and state the values of
three (3) measures of location shown in the above table.
ii. Explain the meaning of a measure of dispersion and state the values of
three (3) measures of dispersion shown in the above table.
iii. Discuss which measure of location would be most appropriate in this
case.
Question 2
(a) The following table shows a recent study on the relationship between
gender and interest in coffee:
          Likes Coffee    Does Not Like Coffee    TOTAL
Male      230             70                      300
Female    110             90                      200
TOTAL     340             160                     500
If a person is randomly selected, what is the probability that the person
i. is male,
ii. is female and likes coffee,
iii. is male or likes coffee,
iv. likes coffee given that he is male?

(b) Mary is a housing agent. Over the years, she has developed the following
probability distribution for the number of houses she expects to sell in a
typical week:
Number of Houses Sold, X    Probability, P(X)
0                           0.10
1                           0.15
2                           0.45
3                           0.20
4                           0.10
More than 4                 0.00
i. Compute the probability that Mary can sell at least two houses in a
typical week.
ii. How many houses on average can Mary sell in a typical week?
iii. Calculate the variance and standard deviation of the distribution.
(c) The weight of a bag of rice is normally distributed with a mean of 500g and a
standard deviation of 25g.
i. If a bag of rice is randomly selected, what is the probability that its
weight is more than 510g?
ii. A box contains 10 such bags of rice. What is the probability that the
mean weight is between 490g and 515g?
Question 3
(a) A manufacturer of mobile phones wishes to investigate the lifespan of its
new model J-phone. A sample of 12 J-phones was monitored and the
lifespan (in years) was as follows:
4.15, 4.3, 4.25, 4.3, 4.3, 4.5, 4.5, 4.25, 4.45, 4.2, 4.1, 4.55
Using Excel, the manufacturer created the following report:
Lifespan
Mean                       4.320833
Standard Error             0.042399
Median                     4.3
Mode                       4.3
Standard Deviation         0.146874
Sample Variance            0.021572
Kurtosis                   -1.12413
Skewness                   0.256023
Range                      0.45
Minimum                    4.1
Maximum                    4.55
Sum                        51.85
Count                      12
Confidence Level(95.0%)    0.093319
i. Construct a 95% confidence interval for the mean lifespan.
ii. Provide an interpretation of this interval and explain the significance of
95%.
iii. What are the assumptions required for the construction of the
confidence interval, and how could we verify them?
(b) The Wawa University wanted to investigate whether it is true that its
engineering graduates earn more than its business graduates. As such,
the University conducted a graduate survey on its recently graduated
students. Unfortunately, due to poor response, only ten engineering and
eight business graduates responded. These students provided information
on their starting salary and Wawa proceeded with its analysis by
generating the following Excel table.
t-Test: Two-Sample Assuming Equal Variances
                                Engineering    Business
Mean                            30000          29000
Variance                        4000000        2285714.286
Observations                    10             8
Pooled Variance                 3250000
Hypothesized Mean Difference    0
df                              16
t Stat                          1.169410692
P(T<=t) one-tail                0.129682486
t Critical one-tail             1.745883676
P(T<=t) two-tail                0.259364971
t Critical two-tail             2.119905299
i. At the 0.05 significance level, conduct an appropriate hypothesis test
to determine whether engineering graduates earn more than
business graduates.
ii. Discuss two (2) statistical concerns you might have in conducting
this statistical analysis.
Question 4
A major portion of the business of Cranberry Food Delivery Pte Ltd involves
delivering lunch boxes to its customers. In order to schedule the deliveries
efficiently, the managers need to estimate the total travel time for the
company's drivers on assignments carried out during the day.
The daily total travel time is believed to depend on the number of
delivery stops and the distance travelled (in km). Data were collected and
are displayed in the following table:
              Total Travel Time    Number of Delivery    Distance Travelled
Assignment    in hours (Y)         Stops (X1)            in km (X2)
1             16.46                3                     100
2             15.26                5                     70
3             16.48                4                     100
4             19.17                4                     100
5             12.37                2                     80
6             6.5                  1                     40
7             12.56                4                     50
8             13.06                4                     70
9             10.07                3                     55
10            17.46                5                     75
Multiple regression analysis was run in Excel to produce the following report:

Use the Excel report to answer the following questions:
(a) Describe the linear relationship underlying the data by writing down the linear
equation that relates the Total Travel Time to the Number of Deliveries and
Distance Travelled. Interpret the coefficients obtained as they relate to the
managers’ scheduling problem.
(b) State the coefficient of determination and the adjusted coefficient of
determination. Interpret them and explain how they are different from each
other.
(c) A new route is planned on which there will be 3 delivery stops and the length
of the route is 85 km. Estimate the total travel time in hours.
(d) Conduct a global test of hypothesis on the regression model at the 5%
significance level to investigate whether or not all the independent variables
have zero regression coefficients. Lay down your steps carefully.
(e) Conduct individual t-tests to determine which variable, if any, is
insignificant and needs to be removed from the regression equation.
(f) What are the assumptions made when the method of regression is applied to
study data? Discuss these assumptions in the context of the delivery scheduling
problem of Cranberry Food Delivery Pte Ltd.
Solutions to BUS105 Self-Practices
Question 1
a)
i. The above chart is a histogram.
ii. Total number of homes = 105
Number of homes that were at least $250,000 = 13 + 7 + 7 = 27
Hence, percentage of homes that were at least $250,000
= 27/105 x 100 = 25.7%
iii. From the chart, we can observe that the distribution is positively skewed
(or right-skewed) since it has a longer tail to the right.
iv. The above is a histogram showing the selling prices of 105 houses sold by
Ace Realty Company during the last month. Generally, most houses were sold
in the range of $150,000 to $275,000, and the most common price range was
$175,000 to $200,000. Nevertheless, a substantial number of houses were also
sold at prices of $275,000 and above. Very few houses were sold at a price
below $150,000. (The answer to this question may vary.)
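The percentage in part (ii) can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the three class counts and the total of 105 homes are taken from the solution above.

```python
# Counts of homes in the classes at or above $250,000 (from the solution above):
# [250,275), [275,300) and >=300.
counts_at_least_250k = [13, 7, 7]
total_homes = 105

share = sum(counts_at_least_250k) / total_homes
percentage = round(share * 100, 1)
print(f"{sum(counts_at_least_250k)} of {total_homes} homes = {percentage}%")
```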
b)
i. A measure of location is a single value that summarizes a set of data. It
locates the centre of the values. The three common measures of location
are the mean, median and mode. In this case, the mean is $221,102.9, the
median is $213,600 and the mode is $188,300.
ii. A measure of dispersion is a measure of the spread of the data. A small value
for a measure of dispersion indicates that the data are clustered closely.
The common measures of dispersion are the range, variance, and
standard deviation. In this case, the range is $220,300 and the standard
deviation is $47,105.4. The variance is the square of the standard deviation,
about 2,218.9 when prices are measured in thousands of dollars.
iii. In this case, the mean would be the most appropriate measure of location
since the standard deviation is small (about 20% of the mean).
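The "about 20%" figure is the ratio of the standard deviation to the mean, often called the coefficient of variation. A minimal sketch of that check follows, using the two values quoted in the solution; the variable names are illustrative.

```python
# Values quoted in the solution to Question 1(b), in dollars.
mean_price = 221_102.9
std_dev = 47_105.4

# Ratio of standard deviation to mean (coefficient of variation).
cv = std_dev / mean_price
print(f"Standard deviation as a share of the mean: {cv:.1%}")
```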
Question 2
a)
(i) P(male) = 300/500 = 3/5 = 0.6
(ii) P(female and likes coffee) = 110/500 = 11/50 = 0.22
(iii) P(male or likes coffee)
= P(male) + P(likes coffee) - P(male and likes coffee)
= 300/500 + 340/500 - 230/500 = 41/50 = 0.82
(iv) P(likes coffee given male)
= (230/500) / (300/500) = 23/30 = 0.767
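The four probabilities in part (a) can be reproduced directly from the cell counts. The sketch below is one possible way to do it, using exact fractions; the dictionary layout and the helper name `prob` are illustrative choices, not part of the original solution.

```python
from fractions import Fraction

# Cell counts from the gender/coffee table in Question 2(a).
table = {
    ("male", "likes"): 230, ("male", "dislikes"): 70,
    ("female", "likes"): 110, ("female", "dislikes"): 90,
}
total = sum(table.values())  # 500 respondents

def prob(event):
    """Probability of all cells for which event(gender, taste) is True."""
    favourable = sum(n for (g, t), n in table.items() if event(g, t))
    return Fraction(favourable, total)

p_male = prob(lambda g, t: g == "male")
p_female_and_likes = prob(lambda g, t: g == "female" and t == "likes")
p_male_or_likes = prob(lambda g, t: g == "male" or t == "likes")
p_male_and_likes = prob(lambda g, t: g == "male" and t == "likes")
# Conditional probability: P(likes | male) = P(male and likes) / P(male).
p_likes_given_male = p_male_and_likes / p_male

for label, p in [("P(male)", p_male),
                 ("P(female and likes)", p_female_and_likes),
                 ("P(male or likes)", p_male_or_likes),
                 ("P(likes | male)", p_likes_given_male)]:
    print(label, "=", p, "=", round(float(p), 3))
```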
b)
(i) P(at least two houses) = 0.45 + 0.20 + 0.10 = 0.75
(ii) The average number of houses she can sell
= 0*0.1 + 1*0.15 + 2*0.45 + 3*0.20 + 4*0.1 = 2.05 houses
(iii) Variance = (0 - 2.05)^2*0.1 + (1 - 2.05)^2*0.15 + (2 - 2.05)^2*0.45
+ (3 - 2.05)^2*0.2 + (4 - 2.05)^2*0.1 = 1.1475
Standard deviation = sqrt(1.1475) = 1.07 houses
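The mean, variance and standard deviation in part (b) follow the usual formulas for a discrete distribution: the mean is the sum of x*P(x), and the variance is the sum of (x - mean)^2*P(x). A minimal sketch, with illustrative variable names:

```python
import math

# Probability distribution of weekly house sales from Question 2(b).
dist = {0: 0.10, 1: 0.15, 2: 0.45, 3: 0.20, 4: 0.10}

p_at_least_two = sum(p for x, p in dist.items() if x >= 2)
mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
std_dev = math.sqrt(variance)

print(f"P(X >= 2) = {p_at_least_two:.2f}")
print(f"mean = {mean:.2f}, variance = {variance:.4f}, sd = {std_dev:.2f}")
```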

c) Let X be the weight of a bag of rice
µ = 500, σ = 25
(i) P(X > 510) = P(Z > (510 - 500)/25)
= P(Z > 0.4) = 0.5 - 0.1554 = 0.3446
(ii) X is normally distributed, so the sample mean X-bar of 10 bags is also
normally distributed, with mean 500 and standard deviation 25/sqrt(10)
P(490 < X-bar < 515)
= P((490 - 500)/(25/sqrt(10)) < Z < (515 - 500)/(25/sqrt(10)))
= P(-1.26 < Z < 1.90)
= 0.3962 + 0.4713 = 0.8675
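The table look-ups in part (c) can be cross-checked with the `NormalDist` class from Python's standard `statistics` module. This is a sketch only; the software result for (ii) differs slightly from 0.8675 because the table method rounds the z-values to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 25, 10   # grams, grams, bags per box

# (i) Weight of a single bag.
single_bag = NormalDist(mu, sigma)
p_more_than_510 = 1 - single_bag.cdf(510)

# (ii) Mean weight of n bags: same mean, standard deviation sigma / sqrt(n).
mean_of_bags = NormalDist(mu, sigma / sqrt(n))
p_between = mean_of_bags.cdf(515) - mean_of_bags.cdf(490)

print(f"P(X > 510) = {p_more_than_510:.4f}")
print(f"P(490 < X-bar < 515) = {p_between:.4f}")
```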

Question 3
a)
(i) Using the Excel table,
Upper Limit of CI: 4.3208 + 0.0933 = 4.4141
Lower Limit of CI: 4.3208 - 0.0933 = 4.2275
Therefore, the 95% confidence interval for the mean lifespan of J-phones
is between 4.23 years and 4.41 years.
(ii) The confidence interval constructed above is an interval estimate of
the population mean lifespan of all J-phones. That is, we estimate
that the population mean lifespan is between 4.23 years
and 4.41 years. As we are using a sample for estimation, this
particular interval may or may not contain the population mean.
However, if we repeated the sampling a very large number of times
with the same sample size, about 95% of the resulting intervals would
contain the population mean.
(iii) As we are using the t-distribution for the construction of the
confidence interval, we require that the sample is a
random sample and that it comes from a normally distributed population.
To ensure a random sample, the method of sampling is essential.
For example, the manufacturer could adopt simple random
sampling or systematic random sampling across time.
To verify normality, we could plot a histogram and observe the shape of
the distribution. If the histogram shows a bell-shaped distribution,
then we could infer that the population is approximately normally distributed.
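The interval in part (a)(i) can also be rebuilt from the 12 raw observations rather than read off the Excel report. The sketch below uses only the standard library; the critical value 2.201 is an assumption taken from a standard t table (two-tailed 5%, 11 degrees of freedom), since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Lifespans (in years) of the 12 sampled J-phones from Question 3(a).
lifespans = [4.15, 4.3, 4.25, 4.3, 4.3, 4.5, 4.5, 4.25, 4.45, 4.2, 4.1, 4.55]

n = len(lifespans)
x_bar = mean(lifespans)
std_error = stdev(lifespans) / sqrt(n)   # sample standard deviation / sqrt(n)

# Assumption: t critical value for 95% confidence with n - 1 = 11 degrees
# of freedom, read from a standard t table.
T_CRITICAL = 2.201

margin = T_CRITICAL * std_error
lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.4f}, margin = {margin:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f} years")
```

The margin should agree with the "Confidence Level(95.0%)" row of the Excel report up to rounding of the critical value.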
b) (i)
Let μ1: population mean starting salary of Engineering graduates
μ2: population mean starting salary of Business graduates
Claim: engineering graduates earn more than business graduates,
i.e. μ1 > μ2
Therefore, the hypotheses are
H0: μ1 - μ2 ≤ 0
H1: μ1 - μ2 > 0 (claim)
(right-tailed)
Significance level: α = 0.05
The two samples are independent. The sample variances are close enough to
infer equal population variances. Assuming that both populations are normally
distributed, we can perform a pooled t-test.
From the Excel table, the test statistic is 1.17, which gives a one-tailed
p-value of 0.13.
Decision Rule: Reject H0 if p-value < α
Conclusion:
Do not reject H0 (we cannot accept H1), since p-value = 0.13 > 0.05.
Therefore, there is insufficient evidence that engineering graduates earn
more than business graduates.
(ii) Statistical Concerns:
1. As the sample sizes were small, there is a high chance that the samples are
biased. Besides, those who replied may be those who are doing well and hence
are more forthcoming in responding to the survey. The samples may not be
representative of the cohort.
2. In order to conduct the pooled t-test, we need both populations to be normally
distributed. We need to verify this by plotting a histogram for each sample to see
whether there is a bell-shaped distribution.
(There may be other reasonable answers besides these)
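The pooled variance and t statistic used in part (b)(i) can be recomputed from the summary figures in the Excel table. This is a minimal sketch with illustrative variable names; the one-tail critical value is copied from the Excel output rather than computed.

```python
from math import sqrt

# Summary statistics from the Excel t-test table in Question 3(b).
n1, mean1, var1 = 10, 30000, 4000000        # Engineering
n2, mean2, var2 = 8, 29000, 2285714.286     # Business

df = n1 + n2 - 2
# Pooled variance: sample variances weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-tail critical value at the 0.05 level, taken from the Excel output.
T_CRITICAL_ONE_TAIL = 1.745883676
reject_h0 = t_stat > T_CRITICAL_ONE_TAIL

print(f"df = {df}, pooled variance = {pooled_var:.0f}, t = {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Comparing the t statistic with the critical value leads to the same decision as comparing the p-value with α.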

Question 4
(a)
Estimated Mean Travel Time
= -0.19 + 1.53 (No. of Delivery Stops) + 0.12 (Distance Travelled)
Interpretation of Coefficients:
For every additional delivery stop, the estimated mean travel time increases
by 1.53 hours, holding the distance travelled constant.
For every additional km travelled, the estimated mean travel time
increases by 0.12 hours, holding the number of delivery stops constant.
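The coefficients in (a) come from the Excel report, but they can be reproduced from the ten assignments in the data table by ordinary least squares. The sketch below solves the two-variable normal equations in plain Python; the helper names `mean` and `cross` are illustrative, and this is one possible way to carry out the fit rather than a description of what Excel does internally.

```python
# Data from the table in Question 4.
y = [16.46, 15.26, 16.48, 19.17, 12.37, 6.5, 12.56, 13.06, 10.07, 17.46]
x1 = [3, 5, 4, 4, 2, 1, 4, 4, 3, 5]                 # delivery stops
x2 = [100, 70, 100, 100, 80, 40, 50, 70, 55, 75]    # distance in km

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Solve the 2x2 normal equations for the two slopes (Cramer's rule),
# then recover the intercept from the means.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(f"Travel time = {b0:.2f} + {b1:.2f} * stops + {b2:.2f} * km")
```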
(b) The coefficient of determination (R^2) is 0.939 and the adjusted coefficient of
determination (adjusted R^2) is 0.922.
R^2 means that 93.9% of the variation in travel time can be explained by
the independent variables, Number of Delivery Stops and Distance Travelled.
Adjusted R^2 gives 92.2% after allowing for the number of independent
variables in the model.
R^2 never decreases when a new independent variable is added to the
regression model, even if the variable is unimportant.
However, when a new variable is added, one degree of freedom is lost. To
compensate for this, the original R^2 formula is adjusted to penalize excessive
use of unimportant independent variables.
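The adjustment described above is commonly written as adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. A short sketch applying it to the reported R^2; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Reported R^2 = 0.939, with n = 10 assignments and k = 2 variables.
adj = adjusted_r_squared(0.939, n=10, k=2)
print(f"Adjusted R^2 = {adj:.3f}")
```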
(c) Estimated Mean Travel Time
= -0.19 + 1.53 x 3 + 0.12 x 85
= 14.6
Therefore, the estimated mean travel time is 14.6 hours for the new route.
(d) Global F-test
H0: β1 = β2 = 0
H1: Not all β's equal 0
where β1 = Coefficient of No. of Delivery Stops
β2 = Coefficient of Distance Travelled
α = 0.05
We will perform an F-test.
According to the statistical report, the test statistic is F = 54.0, which
results in a p-value = 5.56 x 10^-5.
Since p-value < 0.05, we reject H0 and accept H1.
Therefore, at least one of the independent variables is useful in explaining
differences in the dependent variable.
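The global F statistic is tied to R^2 through F = (R^2/k) / ((1 - R^2)/(n - k - 1)). The sketch below applies this to the rounded R^2 of 0.939, so it gives roughly 53.9 rather than the 54.0 in the report. The critical value 4.74 is an assumption read from a standard F table (5% level, 2 and 7 degrees of freedom).

```python
def f_from_r_squared(r_squared, n, k):
    """Global F statistic implied by R^2, n observations, k variables."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

f_stat = f_from_r_squared(0.939, n=10, k=2)

# Assumption: upper 5% point of F with (2, 7) degrees of freedom,
# read from a standard F table.
F_CRITICAL = 4.74
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.1f}; reject H0: {reject_h0}")
```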
(e) Individual t-tests
H0: β1 = 0        H0: β2 = 0
H1: β1 ≠ 0        H1: β2 ≠ 0
where β1 = Coefficient of No. of Delivery Stops
β2 = Coefficient of Distance Travelled
α = 0.05
We will perform individual t-tests.
According to the statistical report,
the test statistic for X1 is t = 5.18, which results in a p-value = 0.00128;
the test statistic for X2 is t = 6.84, which results in a p-value = 0.00024.
Since both p-values are less than 0.05, we reject H0 and accept H1 for both
hypotheses.
Therefore, the independent variables Number of Delivery Stops and Distance
Travelled are both significant and neither needs to be removed.
(f) Assumptions for the method of regression:
1. There is a linear relationship between the dependent variable and the set of
independent variables.
>> We can check this through the scatter plots of Travel Time vs No. of
Delivery Stops and Travel Time vs Distance Travelled.
2. The variation in the residuals is the same for both large and small values of
the estimated dependent variable.
>> This can be checked by plotting the residuals against the Estimated Mean
Travel Time. The residuals should be scattered randomly in an even,
horizontal band around 0 and show no obvious pattern.
3. The residuals follow the normal probability distribution.
>> We can plot a histogram of the residuals. We should observe that the
histogram fits the normal probability distribution reasonably well.
4. The independent variables should not be highly correlated with one another.
>> We can compute the VIF for each variable. If all the VIFs are less than 10,
we conclude that multicollinearity is not a concern for this model.
5. The residuals are independent.
>> The independence of the residuals depends on the design of the study and
the way the data have been collected. In this case, if the 10 assignments in
the dataset were randomly selected and are independent, we may conclude
that the assumption is satisfied.
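The VIF check in assumption 4 can be illustrated with the Question 4 data. With only two independent variables, each VIF reduces to 1 / (1 - r^2), where r is the correlation between X1 and X2. This is a minimal sketch under that two-variable simplification; the helper name `pearson_r` is illustrative.

```python
from math import sqrt

# Independent variables from the table in Question 4.
x1 = [3, 5, 4, 4, 2, 1, 4, 4, 3, 5]                 # delivery stops
x2 = [100, 70, 100, 100, 80, 40, 50, 70, 55, 75]    # distance in km

def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

r = pearson_r(x1, x2)
# With two predictors, regressing one on the other gives R^2 = r^2,
# so both variables share the same VIF.
vif = 1 / (1 - r ** 2)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

A VIF this close to 1 is well below the threshold of 10 mentioned above.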
