
BUS105 Self-Practices

The self-practices are intended to give students a feel for how to answer
questions in the examination. They are in no way an indication of the focus of the
examination. Students are encouraged to go through all materials and not rely
solely on these exercises for their study. Solutions to the self-practices are
provided to guide students on how to answer questions.
Question 1

(a) The marketing director at Ace Realty Company has collected selling price
information on the houses sold in the last month for his study of the
market trend. Selling prices are reported in thousands of dollars, and the
following chart was constructed.

[Chart: Number of homes sold (vertical axis) against Selling Price (Thousands S$) (horizontal axis), with price classes <150, [150,175), [175,200), [200,225), [225,250), [250,275), [275,300) and >=300.]

i. Name the type of statistical chart used above.

ii. Determine the percentage of homes that sold for at least $250,000.

iii. Discuss the skewness observed in the chart.

iv. If this chart is included in a business report, provide a brief comment
that could be used to accompany this chart.
(b) The summary statistics of the selling prices produced by Excel are
shown below:

i. Explain the meaning of a measure of location and state the values of
three (3) measures of location shown in the above table.

ii. Explain the meaning of a measure of dispersion and state the values of
three (3) measures of dispersion shown in the above table.

iii. Discuss which measure of location would be most appropriate in this case.
Question 2

(a) The following table shows the results of a recent study on the relationship
between gender and interest in coffee:

          Likes Coffee   Does Not Like Coffee   TOTAL
Male      230            70                     300
Female    110            90                     200
TOTAL     340            160                    500

If a person is randomly selected, what is the probability that the person

i. is male,

ii. is female and likes coffee,

iii. is male or likes coffee,

iv. likes coffee given that he is male?


(b) Mary is a housing agent. Over the years, she has developed the following
probability distribution for the number of houses she expects to sell in a
typical week:

Number of Houses Sold, X    Probability, P(X)
0                           0.10
1                           0.15
2                           0.45
3                           0.20
4                           0.10
More than 4                 0.00

i. Compute the probability that Mary can sell at least two houses in a
typical week.

ii. How many houses, on average, can Mary sell in a typical week?

iii. Calculate the variance and standard deviation of the distribution.

(c) The weight of a bag of rice is normally distributed with a mean of 500 g and a
standard deviation of 25 g.

i. If a bag of rice is randomly selected, what is the probability that its
weight is more than 510 g?

ii. A box contains 10 such bags of rice. What is the probability that the
mean weight of the 10 bags is between 490 g and 515 g?
Question 3

(a) A manufacturer of mobile phones wishes to investigate the lifespan of its
new model J-phone. A sample of 12 J-phones was monitored, and their
lifespans (in years) were as follows:

4.15, 4.3, 4.25, 4.3, 4.3, 4.5, 4.5, 4.25, 4.45, 4.2, 4.1, 4.55

Using Excel, the manufacturer created the following report:

Lifespan

Mean 4.320833
Standard Error 0.042399
Median 4.3
Mode 4.3
Standard Deviation 0.146874
Sample Variance 0.021572
Kurtosis -1.12413
Skewness 0.256023
Range 0.45
Minimum 4.1
Maximum 4.55
Sum 51.85
Count 12
Confidence Level(95.0%) 0.093319

i. Construct a 95% confidence interval for the mean lifespan.

ii. Provide an interpretation of this interval and explain the significance of
95%.

iii. What are the assumptions required for the construction of the
confidence interval, and how could we verify them?
(b) Wawa University wanted to investigate whether its engineering graduates
earn more than its business graduates. To do so, the University conducted a
graduate survey of its recently graduated students. Unfortunately, due to a
poor response rate, only ten engineering and eight business graduates
responded. These students provided information on their starting salaries,
and Wawa proceeded with its analysis by generating the following Excel table.

t-Test: Two-Sample Assuming Equal Variances

Engineering Business
Mean 30000 29000
Variance 4000000 2285714.286
Observations 10 8
Pooled Variance 3250000
Hypothesized Mean Difference 0
df 16
t Stat 1.169410692
P(T<=t) one-tail 0.129682486
t Critical one-tail 1.745883676
P(T<=t) two-tail 0.259364971
t Critical two-tail 2.119905299

i. At the 0.05 significance level, conduct an appropriate hypothesis test
to determine whether engineering graduates earn more than
business graduates.

ii. Discuss two (2) statistical concerns you might have in conducting
this statistical analysis.
Question 4

A major portion of Cranberry Food Delivery Pte Ltd's business involves delivering
lunch boxes to its customers. In order to schedule the deliveries efficiently, the
managers need to estimate the total travel time of the company's drivers on
the assignments carried out each day.
It is believed that the total daily travel time depends on the number of
delivery stops and the distance travelled (in km). Data were therefore collected
and are displayed in the following table:

Assignment   Total Travel Time   Number of Delivery   Distance Travelled
             in hours (Y)        Stops (X1)           in km (X2)
1            16.46               3                    100
2            15.26               5                    70
3            16.48               4                    100
4            19.17               4                    100
5            12.37               2                    80
6            6.5                 1                    40
7            12.56               4                    50
8            13.06               4                    70
9            10.07               3                    55
10           17.46               5                    75

Multiple regression analysis was performed in Excel to produce the following report:

Use the Excel report to answer the following questions:

(a) Describe the linear relationship underlying the data by writing down the linear
equation that relates the Total Travel Time to the Number of Deliveries and
Distance Travelled. Interpret the coefficients obtained as they relate to the
managers’ scheduling problem.

(b) State the coefficient of determination and the adjusted coefficient of
determination. Interpret them and explain how they differ from each
other.

(c) A new route is planned on which there will be 3 delivery stops and the length
of the route is 85 km. Estimate the total travel time in hours.

(d) Conduct a global test of hypothesis on the regression model at the 5%
significance level to investigate whether or not all the independent variables
have zero regression coefficients. Set out your steps carefully.

(e) Conduct individual t-tests to determine which variable, if any, is
insignificant and needs to be removed from the regression equation.

(f) What are the assumptions made when the method of regression is applied to
study data? Discuss these assumptions in the context of the delivery scheduling
problem of Cranberry Food Delivery Pte Ltd.
Solutions to BUS105 Self-Practices

Question 1

a)

i. The above chart is a histogram.

ii. Total number of homes = 105
Number of homes that sold for at least $250,000 = 13 + 7 + 7 = 27
Hence, percentage of homes that sold for at least $250,000
= 27/105 x 100 = 25.7%

iii. From the chart, we can observe that the distribution is positively skewed
(or right-skewed) since it has a longer tail to the right.

iv. The above is a histogram showing the selling prices of the 105 houses sold by
Ace Realty Company last month. Most houses were sold in the range of
$150,000 to $275,000, and the most common price range was $175,000 to
$200,000. Nevertheless, a substantial number of houses were also sold at
prices above $275,000. Very few houses were sold at a price below
$150,000. (Answers to this question may vary.)

b)

i. A measure of location is a single value that summarizes a set of data by
locating the centre of the values. The three common measures of location
are the mean, median and mode. In this case, the mean is $221,102.9, the
median is $213,600 and the mode is $188,300.

ii. A measure of dispersion measures the spread of the data. A small value
for a measure of dispersion indicates that the data are clustered closely
around the centre. The common measures of dispersion are the range, variance,
and standard deviation. In this case, the range is $220,300 and the standard
deviation is $47,105.4; the variance is the square of the standard deviation,
about 2,218.92 (in thousands of dollars, squared).

iii. In this case, the mean would be the most appropriate measure of location
since the distribution is only mildly skewed (the mean of $221,102.9 is close
to the median of $213,600) and the standard deviation is relatively small
(about 21% of the mean).
Question 2

a)

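(i) $P(\text{male}) = \frac{300}{500} = \frac{3}{5} = 0.6$

(ii) $P(\text{female and likes coffee}) = \frac{110}{500} = \frac{11}{50} = 0.22$

(iii) $P(\text{male or likes coffee}) = P(\text{male}) + P(\text{likes coffee}) - P(\text{male and likes coffee})$
$= \frac{300}{500} + \frac{340}{500} - \frac{230}{500} = \frac{41}{50} = 0.82$

(iv) $P(\text{likes coffee} \mid \text{male}) = \frac{P(\text{likes coffee and male})}{P(\text{male})} = \frac{230/500}{300/500} = \frac{23}{30} \approx 0.767$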
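As a cross-check, a minimal Python sketch recomputes the four probabilities from the table counts using exact fractions. The cell labels such as `"likes"` and `"dislikes"` are our own, not part of the question:

```python
from fractions import Fraction

# Cell counts from the gender-and-coffee table (labels are our own)
counts = {
    ("male", "likes"): 230,
    ("male", "dislikes"): 70,
    ("female", "likes"): 110,
    ("female", "dislikes"): 90,
}
total = sum(counts.values())  # 500 people in the study


def prob(cells):
    """Probability of an event given as a set of (gender, preference) cells."""
    return Fraction(sum(counts[c] for c in cells), total)


male = {("male", "likes"), ("male", "dislikes")}
likes = {("male", "likes"), ("female", "likes")}

p_male = prob(male)                                # (i)
p_female_and_likes = prob({("female", "likes")})   # (ii)
p_male_or_likes = prob(male | likes)               # (iii) union of the cells
p_likes_given_male = prob(male & likes) / p_male   # (iv) P(likes and male) / P(male)

for label, p in [("(i)", p_male), ("(ii)", p_female_and_likes),
                 ("(iii)", p_male_or_likes), ("(iv)", p_likes_given_male)]:
    print(label, p, round(float(p), 3))
```

Working with sets of cells makes the addition rule in (iii) automatic: the union counts the "male and likes coffee" cell only once.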

b)

(i) P(at least two houses) = 0.45 + 0.20 + 0.10 = 0.75

(ii) The average number of houses she can sell
= 0*0.1 + 1*0.15 + 2*0.45 + 3*0.20 + 4*0.1 = 2.05 houses

(iii) $\text{Variance} = (0 - 2.05)^2(0.10) + (1 - 2.05)^2(0.15) + (2 - 2.05)^2(0.45) + (3 - 2.05)^2(0.20) + (4 - 2.05)^2(0.10) = 1.1475$

$\text{Standard deviation} = \sqrt{1.1475} \approx 1.07$ houses
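The expected value and variance formulas above can be checked with a short Python sketch; the dictionary `dist` simply restates Mary's distribution from the question:

```python
import math

# Mary's weekly house-sales distribution; "more than 4" has probability 0
dist = {0: 0.10, 1: 0.15, 2: 0.45, 3: 0.20, 4: 0.10}

p_at_least_two = sum(p for x, p in dist.items() if x >= 2)    # P(X >= 2)
mean = sum(x * p for x, p in dist.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # E[(X - mu)^2]
std_dev = math.sqrt(variance)

print(f"P(X >= 2) = {p_at_least_two:.2f}")
print(f"E(X) = {mean:.2f} houses")
print(f"Var(X) = {variance:.4f}, SD = {std_dev:.2f} houses")
```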


c) Let X be the weight of a bag of rice

µ = 500, σ = 25

(i) $P(X > 510) = P\left(Z > \frac{510 - 500}{25}\right) = P(Z > 0.4) = 0.5 - 0.1554 = 0.3446$

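(ii) Since X is normally distributed, the mean weight $\bar{X}$ of the 10 bags is
also normally distributed, with mean 500 and standard error $25/\sqrt{10}$.

$P(490 < \bar{X} < 515) = P\left(\frac{490 - 500}{25/\sqrt{10}} < Z < \frac{515 - 500}{25/\sqrt{10}}\right) = P(-1.26 < Z < 1.90)$

$= 0.3962 + 0.4713 = 0.8675$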
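A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`, reproduces both probabilities. Because it uses exact z-values rather than values rounded to two decimals for a normal table, part (ii) comes out at about 0.868 instead of 0.8675:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 25   # bag weight in grams
n = 10                # bags in a box

# (i) One bag: X ~ N(500, 25^2)
bag = NormalDist(mu, sigma)
p_over_510 = 1 - bag.cdf(510)

# (ii) Mean of 10 bags: X-bar ~ N(500, (25 / sqrt(10))^2)
box_mean = NormalDist(mu, sigma / sqrt(n))
p_between = box_mean.cdf(515) - box_mean.cdf(490)

print(f"P(X > 510) = {p_over_510:.4f}")
print(f"P(490 < X-bar < 515) = {p_between:.4f}")
```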


Question 3

a)

(i) Using the Excel table, the margin of error is the "Confidence Level(95.0%)"
value, 0.093319.

Upper Limit of CI: 4.320833 + 0.093319 = 4.414152
Lower Limit of CI: 4.320833 - 0.093319 = 4.227514

Therefore, the 95% confidence interval for the mean lifespan of J-phones is
between 4.23 years and 4.41 years.
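The Excel "Confidence Level(95.0%)" figure is the margin of error: the t critical value multiplied by the standard error. A Python sketch, assuming the critical value t(0.025, 11) ≈ 2.201 is read from a t-table, rebuilds it from the raw lifespans:

```python
import statistics
from math import sqrt

lifespans = [4.15, 4.3, 4.25, 4.3, 4.3, 4.5, 4.5, 4.25, 4.45, 4.2, 4.1, 4.55]

n = len(lifespans)
mean = statistics.mean(lifespans)
s = statistics.stdev(lifespans)  # sample standard deviation (divisor n - 1)
se = s / sqrt(n)                 # standard error, Excel's "Standard Error"

# Assumed t-table value t(0.025, df = 11); the standard library has no
# t-distribution (SciPy users could use scipy.stats.t.ppf(0.975, 11)).
t_crit = 2.201

margin = t_crit * se             # Excel's "Confidence Level(95.0%)"
lower, upper = mean - margin, mean + margin
print(f"mean = {mean:.4f}, margin = {margin:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) years")
```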

(ii) The confidence interval constructed above is an interval estimate of
the population mean lifespan of all J-phones. That is, we estimate, with 95%
confidence, that the population mean lifespan is between 4.23 years and 4.41
years. As we are using a sample for estimation, this particular interval may
or may not contain the population mean. The 95% refers to the method: if we
were to repeat the sampling a very large number of times with the same sample
size and construct an interval each time, about 95% of these intervals would
contain the population mean.
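The "repeat the sampling" interpretation can be illustrated with a small simulation. This Python sketch assumes a hypothetical normal population (mean 4.3 years, standard deviation 0.15 years) chosen only for illustration; it draws many samples of 12, builds a 95% interval from each, and reports the fraction of intervals that contain the true mean, which should be close to 0.95:

```python
import random
import statistics
from math import sqrt

# Hypothetical population, for illustration only
TRUE_MEAN, TRUE_SD = 4.3, 0.15   # years
N = 12                           # same sample size as the J-phone study
T_CRIT = 2.201                   # t(0.025, df = 11) from a t-table
REPS = 4000                      # number of repeated samples

random.seed(105)  # fixed seed so the run is reproducible
hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    margin = T_CRIT * statistics.stdev(sample) / sqrt(N)
    if m - margin <= TRUE_MEAN <= m + margin:   # does this interval capture mu?
        hits += 1

coverage = hits / REPS
print(f"{hits} of {REPS} intervals ({coverage:.1%}) contain the true mean")
```

Any single interval either contains the true mean or does not; the 95% describes the long-run success rate of the procedure.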

(iii) As we are using the t-distribution for the construction of the
confidence interval, we require that the sample is a random sample drawn
from a normally distributed population.

To ensure a random sample, the method of sampling is essential.
For example, the manufacturer could adopt simple random sampling or
systematic random sampling across time.

To verify normality, we could plot a histogram and observe the shape of
the distribution. If the histogram shows a roughly bell-shaped distribution,
then it is reasonable to assume that the population is normally distributed.
b) (i)

Let μ1: population mean starting salary of Engineering graduates
μ2: population mean starting salary of Business graduates

Claim: engineering graduates earn more than business graduates,
i.e. μ1 > μ2

Therefore, the hypotheses are

H0: μ1 - μ2 ≤ 0
H1: μ1 - μ2 > 0 (claim)

(right-tailed)

Significance level: α = 0.05

The two samples were independent. The sample variances were close enough to
assume equal population variances. Assuming that both populations are normally
distributed, we can perform a pooled t-test.

From the Excel table, the test statistic is t = 1.17, which gives a one-tailed
p-value of 0.13.

Decision Rule: Reject H0 if p-value < α

Conclusion:

Do not reject H0, since p-value = 0.13 > 0.05; there is insufficient evidence
to accept H1.

Therefore, we cannot conclude that engineering graduates earn more than
business graduates.
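The Excel figures can be reproduced from the summary statistics with the pooled-variance formula. A Python sketch (the one-tailed critical value is copied from the Excel table rather than computed):

```python
from math import sqrt

# Summary statistics from the Excel output
mean_eng, var_eng, n_eng = 30000, 4000000, 10
mean_bus, var_bus, n_bus = 29000, 2285714.286, 8

df = n_eng + n_bus - 2
pooled_var = ((n_eng - 1) * var_eng + (n_bus - 1) * var_bus) / df
se_diff = sqrt(pooled_var * (1 / n_eng + 1 / n_bus))  # SE of the mean difference
t_stat = (mean_eng - mean_bus) / se_diff              # hypothesized difference is 0

t_crit_one_tail = 1.745883676          # copied from "t Critical one-tail"
reject_h0 = t_stat > t_crit_one_tail   # right-tailed test at alpha = 0.05

print(f"df = {df}, pooled variance = {pooled_var:,.0f}")
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```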

(ii) Statistical Concerns:

1. The response rate was poor, and those who replied may be those who are doing
well and hence more forthcoming in responding to the survey. The sample may
therefore be biased and not representative of the cohort. In addition, the
small sample sizes make the test less able to detect a real difference in
salaries.

2. In order to conduct the pooled t-test, we need both populations to be normally
distributed. We need to verify this by plotting a histogram for each sample to see
whether there is a bell-shaped distribution.

(There may be other reasonable answers besides these.)


Question 4

(a)
Estimated Mean Travel Time
= -0.19 + 1.53 (No. of Deliveries) + 0.12 (Distance Travelled)

Interpretation of Coefficients:
For every additional delivery stop, the estimated mean travel time increases
by 1.53 hours, holding the distance travelled constant.
For every additional km travelled, the estimated mean travel time increases
by 0.12 hours, holding the number of delivery stops constant.
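To show where the coefficients come from, this Python sketch solves the least-squares normal equations for two predictors directly from the data table. It is a stand-in for Excel's regression tool, not its actual implementation. It also evaluates the route in part (c); with unrounded coefficients the estimate is about 14.48 hours, slightly below the 14.6 hours obtained from the rounded equation:

```python
from statistics import mean

# Data table for Cranberry Food Delivery (10 assignments)
y = [16.46, 15.26, 16.48, 19.17, 12.37, 6.5, 12.56, 13.06, 10.07, 17.46]  # hours
x1 = [3, 5, 4, 4, 2, 1, 4, 4, 3, 5]                                      # stops
x2 = [100, 70, 100, 100, 80, 40, 50, 70, 55, 75]                         # km


def sxy(a, b):
    """Sum of products of deviations from the means of a and b."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
s1y, s2y = sxy(x1, y), sxy(x2, y)

# Normal equations in deviation form, solved by Cramer's rule:
#   s11*b1 + s12*b2 = s1y
#   s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(f"Travel time = {b0:.2f} + {b1:.2f} X1 + {b2:.2f} X2")

# Part (c): 3 stops and 85 km, using the unrounded coefficients
prediction = b0 + b1 * 3 + b2 * 85
print(f"Predicted travel time: {prediction:.2f} hours")
```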

(b) The coefficient of determination ($R^2$) is 0.939 and the adjusted coefficient
of determination (adjusted $R^2$) is 0.922.
This means that 93.9% of the variation in travel time is explained by the
independent variables, Number of Deliveries and Distance Travelled. The adjusted
$R^2$ of 0.922 leads to a similar conclusion after accounting for the number of
independent variables used.

$R^2$ never decreases when a new independent variable is added to the regression
model, even if the variable is unimportant. However, each new variable uses up
one degree of freedom. To offset this, the adjusted $R^2$ modifies the original
$R^2$ formula to penalize excessive use of unimportant independent variables.
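The adjustment can be written as $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ the number of independent variables. A short Python sketch applies it to this question; the second call uses a hypothetical $R^2$ of 0.940 for a model with a third, unimportant variable, to show how the penalty can lower adjusted $R^2$ even though $R^2$ rose:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values from this question: R^2 = 0.939, n = 10 assignments, k = 2 predictors
print(round(adjusted_r2(0.939, n=10, k=2), 3))

# Hypothetical: a third, unimportant variable nudges R^2 up to 0.940
print(round(adjusted_r2(0.940, n=10, k=3), 3))
```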

(c) Estimated Mean Travel Time
= -0.19 + 1.53 x 3 + 0.12 x 85
= 14.6
Therefore, the estimated mean travel time for the new route is 14.6 hours.
(d) Global F-test

H0: β1 = β2 = 0
H1: Not all β's equal 0

where β1 = Coefficient of No. of Delivery Stops
β2 = Coefficient of Distance Travelled

α = 0.05

We will perform an F-test.

According to the Excel report, the test statistic is F = 54.0, which results in
a p-value of $5.56 \times 10^{-5}$.

Since p-value < 0.05, we reject H0 and accept H1.

Therefore, at least one of the independent variables is useful in explaining
the variation in total travel time.
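The F statistic can also be recovered from $R^2$ via $F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$ with $(k, n - k - 1) = (2, 7)$ degrees of freedom. In this Python sketch, the rounded $R^2 = 0.939$ gives about 53.9, matching Excel's 54.0 up to rounding; the 5% critical value $F_{0.05}(2, 7) \approx 4.74$ is taken from an F-table:

```python
def global_f(r2: float, n: int, k: int) -> float:
    """Global F statistic from R^2, with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 10, 2
f_stat = global_f(0.939, n, k)
f_crit = 4.74   # F(0.05; 2, 7) from an F-table (assumed value)

print(f"F = {f_stat:.1f} with ({k}, {n - k - 1}) df")
print("Reject H0" if f_stat > f_crit else "Do not reject H0")
```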

(e) Individual t-tests

For X1: H0: β1 = 0 versus H1: β1 ≠ 0
For X2: H0: β2 = 0 versus H1: β2 ≠ 0

where β1 = Coefficient of No. of Delivery Stops
β2 = Coefficient of Distance Travelled

α = 0.05

We will perform individual t-tests.

According to the Excel report,
the test statistic for X1 is t = 5.18, which results in a p-value of 0.00128, and
the test statistic for X2 is t = 6.84, which results in a p-value of 0.00024.

Since both p-values are less than 0.05, we reject H0 and accept H1 for both
hypotheses.
Therefore, the independent variables Number of Delivery Stops and Distance
Travelled are both significant, and neither needs to be removed.
(f) Assumptions for the method of regression:

1. There is a linear relationship between the dependent variable and the set of
independent variables.

>> We can check this with scatter plots of Travel Time vs No. of
Deliveries and Travel Time vs km Travelled.

2. The variation in the residuals is the same for both large and small values of
the estimated dependent variable.

>> This can be checked by plotting the residuals against the Estimated Mean
Travel Time. The residuals should be scattered randomly in an even, horizontal
band around 0 and show no obvious pattern.

3. The residuals follow the normal probability distribution.

>> We can plot a histogram of the residuals. The histogram should be
approximately bell-shaped, consistent with a normal distribution.

4. The independent variables should not be highly correlated (no multicollinearity).

>> We can compute the VIF for each variable. If all the VIFs are less than 10,
we conclude that multicollinearity is not a concern for this model.

5. The residuals are independent.

>> The independence of the residuals depends on the design of the study and
the way the data have been collected. In this case, if the 10 assignments
were randomly selected and are independent of one another, we may conclude
that the assumption is satisfied.
