
Full name: Mai Thanh My - student ID: 2153588

PART 1. DATA DESCRIPTION (30 points)


1) Collect a real-world data set that satisfies the following characteristics:
- having at least 30 elements/observations
- having at least one qualitative variable and one quantitative variable, and these two
variables must have a meaningful relationship (see the suggestion in APPENDIX 1)
2) Implement the following:
a) Present all data in a table
No. Name Gender Age Education Devices Quantity
1 An Female 12 Grade 6 Samsung 1
2 Minh Male 19 Freshman Macbook 2
3 Hạ Female 10 Grade 4 Samsung 1
4 Khánh Male 15 Grade 9 iPhone 1
5 Thư Female 27 PhD Macbook 3
6 Khôi Male 11 Grade 5 iPad 1
7 Luật Male 26 Master Macbook 3
8 Như Female 18 Grade 12 iPhone 2
9 Tâm Male 21 Junior Macbook 2
10 Phương Female 9 Grade 3 Samsung 1
11 Khanh Male 20 Sophomore iPad 2
12 Huyền Female 19 Freshman Macbook 3
13 Phát Male 22 Senior Macbook 2
14 Thy Female 24 Bachelor Macbook 3
15 Quang Male 19 Freshman iPad 3
16 Ngân Female 14 Grade 8 iPhone 2
17 Bảo Male 29 Master Macbook 3
18 Lập Male 11 Grade 5 Samsung 2
19 Hạnh Female 15 Grade 9 iPad 2
20 Nguyên Male 16 Grade 10 iPhone 1
21 Trình Male 23 Bachelor Macbook 2
22 Nhã Female 10 Grade 4 Samsung 1
23 Thịnh Male 21 Junior Macbook 2
24 Đạt Male 11 Grade 5 iPhone 1
25 Hậu Male 19 Freshman Macbook 2
26 Nhật Male 22 Senior Macbook 3
27 Hân Female 28 PhD iPad 4
28 Thảo Female 24 Bachelor Macbook 2
29 Sang Male 17 Grade 11 iPhone 2
30 Uyên Female 14 Grade 8 iPhone 1

b) Give a short paragraph (40 to 100 words) to explain why these variables are related
Quantitative and qualitative data answer different questions, and they are often used together to
get a full picture of a population. In the data above:
- Age (quantitative) is paired with education level (qualitative), so the typical age at each
education level can be compared.
- The quantity of study devices (quantitative) is paired with the type of learning device
(qualitative), so the number of devices owned can be compared across device types. For
example, every Macbook user (all university students or graduates) owns 2 or 3 devices, while
four of the five Samsung users, pupils aged 9 to 12, own only 1.
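One quick way to check the device-type/quantity relationship is to compare the average quantity of devices within each device type. The sketch below is a minimal Python illustration, not part of the original Excel workflow; the lists are transcribed from the data table in part (a), and the names `devices`, `quantity` and `mean_quantity` are hypothetical.

```python
from collections import defaultdict

# Device type and quantity of learning devices for students 1-30 (table in part a)
devices = ["Samsung", "Macbook", "Samsung", "iPhone", "Macbook", "iPad", "Macbook", "iPhone", "Macbook", "Samsung",
           "iPad", "Macbook", "Macbook", "Macbook", "iPad", "iPhone", "Macbook", "Samsung", "iPad", "iPhone",
           "Macbook", "Samsung", "Macbook", "iPhone", "Macbook", "Macbook", "iPad", "Macbook", "iPhone", "iPhone"]
quantity = [1, 2, 1, 1, 3, 1, 3, 2, 2, 1,
            2, 3, 2, 3, 3, 2, 3, 2, 2, 1,
            2, 1, 2, 1, 2, 3, 4, 2, 2, 1]

# Accumulate the total quantity and the number of students for each device type
totals = defaultdict(int)
counts = defaultdict(int)
for device, q in zip(devices, quantity):
    totals[device] += q
    counts[device] += 1

# Average quantity of learning devices per device type
mean_quantity = {d: totals[d] / counts[d] for d in totals}
for device in sorted(mean_quantity, key=str.lower):
    print(f"{device:<8} {mean_quantity[device]:.2f}")
```

With this data, Macbook and iPad users average roughly 2.4 to 2.5 devices, while Samsung and iPhone users average roughly 1.2 to 1.4.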

c) For the qualitative variable, construct a table including frequency distributions and percent
frequency. Give a short description for the table.
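(The type of learning device is chosen as the qualitative variable.)
Categories Frequency Percent frequency (%)
Samsung 5 16.6667
iPhone 7 23.3333
iPad 5 16.6667
Macbook 13 43.3333
Total 30 100.0000
-> The frequency distribution shows that, of the 30 students, 5 (16.67%) use a Samsung device,
7 (23.33%) an iPhone, 5 (16.67%) an iPad and 13 (43.33%) a Macbook. Macbook is the most
common learning device, used by almost twice as many students as the iPhone, the next most common.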
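The frequency and percent-frequency columns can be reproduced outside Excel as a cross-check. This is a minimal Python sketch, assuming the device list is transcribed from the table in part (a); `freq` is a hypothetical name.

```python
from collections import Counter

# Type of learning device for students 1-30 (table in part a)
devices = ["Samsung", "Macbook", "Samsung", "iPhone", "Macbook", "iPad", "Macbook", "iPhone", "Macbook", "Samsung",
           "iPad", "Macbook", "Macbook", "Macbook", "iPad", "iPhone", "Macbook", "Samsung", "iPad", "iPhone",
           "Macbook", "Samsung", "Macbook", "iPhone", "Macbook", "Macbook", "iPad", "Macbook", "iPhone", "iPhone"]

freq = Counter(devices)   # frequency of each category
n = len(devices)

for category in ["Samsung", "iPhone", "iPad", "Macbook"]:
    percent = 100 * freq[category] / n   # percent frequency
    print(f"{category:<8} {freq[category]:>9} {percent:>10.4f}")
print(f"{'Total':<8} {n:>9} {100:>10.4f}")
```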

d) For the quantitative variable, compute the descriptive statistics including mean, mode,
median, variance, standard deviation, and range (you may use Data Analysis tool pack, or
statistical functions in MS Excel). Give a short description for this variable.
(The quantity of learning devices is chosen as the quantitative variable.)
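- Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{60}{30} = 2$
- Mode: the most frequent value $= 2$
- Median $= 2$
- Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{20}{30-1} = \frac{20}{29} \approx 0.6897$
- Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{20}{30-1}} \approx 0.8305$
- Range: largest value $-$ smallest value $= 4 - 1 = 3$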
-> Mean: on average, each student owns 2 learning devices.
-> Median: when the quantities are sorted, the middle value is 2, so at least half of the students
own 2 or fewer devices and at least half own 2 or more.
-> Mode: 2 is the most common quantity (13 of the 30 students, 43.33%), although it is not held
by a majority of students.
-> Range: the quantities span 3 devices, from a lowest value of 1 device to a highest value of
4 devices.
-> Variance: 0.6897 (in squared devices) measures how spread out the quantities are around the mean.
-> Standard deviation: 0.8305 devices, expressed in the same unit as the data, is the typical
distance of a student's quantity from the mean of 2.
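The same descriptive statistics can be checked with Python's standard `statistics` module, used here in place of the Excel Data Analysis tool pack. This is a sketch assuming the quantity column is transcribed from the table in part (a). `statistics.variance` and `statistics.stdev` divide by $n - 1$, matching the sample formulas above.

```python
import statistics

# Quantity of learning devices for students 1-30 (table in part a)
quantity = [1, 2, 1, 1, 3, 1, 3, 2, 2, 1,
            2, 3, 2, 3, 3, 2, 3, 2, 2, 1,
            2, 1, 2, 1, 2, 3, 4, 2, 2, 1]

mean = statistics.mean(quantity)
mode = statistics.mode(quantity)
median = statistics.median(quantity)
variance = statistics.variance(quantity)   # sample variance, divides by n - 1
std_dev = statistics.stdev(quantity)       # sample standard deviation
value_range = max(quantity) - min(quantity)

print(f"Mean: {mean}")
print(f"Mode: {mode}")
print(f"Median: {median}")
print(f"Variance: {variance:.4f}")
print(f"Standard deviation: {std_dev:.4f}")
print(f"Range: {value_range}")
```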

e) Construct a pivot table in MS Excel:


o Qualitative variable in rows: count number of elements/observations
o Quantitative variable in column headings; the quantitative variable may be grouped into
classes if it has too many values.

Count of Quantity   Column Labels
Row Labels          1    2    3    4    Grand Total
iPad                1    2    1    1    5
iPhone              4    3              7
Macbook                  7    6         13
Samsung             4    1              5
Grand Total         9    13   7    1    30

o Give a comment for this pivot table


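- Each device type (iPad, iPhone, Macbook and Samsung) appears only once in the first column of
the pivot table, so duplicate entries are grouped together.
- The Grand Total column gives the frequency of each device type (5, 7, 13 and 5), matching the
frequency table in part (c), and the Grand Total row gives the frequency of each quantity
(9, 13, 7 and 1).
- The cross-tabulation shows how the two variables are related: all Macbook users own 2 or 3
devices, iPhone and Samsung users own only 1 or 2, and the only student with 4 devices is an
iPad user.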
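The pivot table is a cross-tabulation of device type against quantity. As a cross-check, a minimal Python sketch can count each (device, quantity) pair with `collections.Counter`; the data lists are transcribed from part (a), and the case-insensitive row order is an assumption chosen to mimic Excel's default sort.

```python
from collections import Counter

# Device type and quantity of learning devices for students 1-30 (table in part a)
devices = ["Samsung", "Macbook", "Samsung", "iPhone", "Macbook", "iPad", "Macbook", "iPhone", "Macbook", "Samsung",
           "iPad", "Macbook", "Macbook", "Macbook", "iPad", "iPhone", "Macbook", "Samsung", "iPad", "iPhone",
           "Macbook", "Samsung", "Macbook", "iPhone", "Macbook", "Macbook", "iPad", "Macbook", "iPhone", "iPhone"]
quantity = [1, 2, 1, 1, 3, 1, 3, 2, 2, 1,
            2, 3, 2, 3, 3, 2, 3, 2, 2, 1,
            2, 1, 2, 1, 2, 3, 4, 2, 2, 1]

pivot = Counter(zip(devices, quantity))        # (device, quantity) -> count
columns = sorted(set(quantity))                # column labels: 1, 2, 3, 4
rows = sorted(set(devices), key=str.lower)     # row labels in case-insensitive order

print("Row Labels", *columns, "Grand Total", sep="\t")
for device in rows:
    counts = [pivot[(device, q)] for q in columns]
    # Leave a blank cell where a combination never occurs, as Excel does
    print(device, *[c if c else "" for c in counts], sum(counts), sep="\t")

col_totals = [sum(pivot[(d, q)] for d in rows) for q in columns]
print("Grand Total", *col_totals, sum(col_totals), sep="\t")
```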

PART 2. PROBABILITY DISTRIBUTION (10 points)


A pizza store has recorded its daily sales in a district. The mean and standard deviation of
daily sales are $1200 and $180 respectively. Assume that the daily sales are normally
distributed. In MS Excel:
a) In cell A1 of new sheet, key text Daily Sales. Then using Data > Data Analysis >
Random Number Generation, generate 100 random numbers for the daily sales from
cell A2.
b) Using Insert > Chart > Histogram, create a histogram chart for the data in Cell
A2:A101. Place the chart in the area of D2:H24. Give a chart title as Daily sales (like
the below image). Remember to format the horizontal axis with Number of bins = 10;
Decimal places=2
c) Using Insert > Chart > Box and Whisker, create a Box plot for the daily sales data.
Place the chart in the area of J2:N24. Give the chart title as Daily sales (like the above
image).
d) Give a short comment for the box plot.
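e) Use the COUNTIF function to calculate the probability that a random day has a sale
greater than $1400.
=COUNTIF(A2:A101,">1400") returns 13, the number of the 100 simulated days with sales above
$1400, so the estimated probability is 13/100 = 0.13.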
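The Excel steps in Part 2 (normal random numbers, a box plot, and a COUNTIF-based probability) can be mirrored in a short Python sketch. This is an illustration only: it uses `random.gauss` instead of Excel's Random Number Generation tool, the seed 42 is arbitrary, and `statistics.quantiles` uses Python's default quartile method, which may differ slightly from Excel's box-and-whisker calculation. Because the sample is random, the count above $1400 will generally not equal 13. The exact normal-model probability is shown for comparison.

```python
import random
import statistics

MEAN, SD, N = 1200, 180, 100   # daily-sales parameters stated in the problem
random.seed(42)                # arbitrary seed so the sketch is reproducible

# (a) 100 normally distributed daily sales
sales = [random.gauss(MEAN, SD) for _ in range(N)]

# (c) five-number summary that a box plot is drawn from
q1, median, q3 = statistics.quantiles(sales, n=4)
print(f"min={min(sales):.2f} Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} max={max(sales):.2f}")

# (e) empirical probability, the analogue of =COUNTIF(A2:A101,">1400")/100
empirical_p = sum(1 for x in sales if x > 1400) / N

# Exact probability under the normal model, for comparison
theoretical_p = 1 - statistics.NormalDist(MEAN, SD).cdf(1400)
print(f"empirical P(X > 1400) = {empirical_p:.2f}")
print(f"theoretical P(X > 1400) = {theoretical_p:.4f}")
```

The theoretical value is about 0.133, so the 0.13 obtained from the Excel sample is close to what the normal model predicts.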
PART 3 – ESTIMATION AND HYPOTHESES TESTING (60 points)
Question 1 (20 points)
Complaints about rising prescription drug prices caused the U.S. Congress to consider laws
that would force pharmaceutical companies to offer prescription discounts to senior citizens
without drug benefits. The House Government Reform Committee provided data on the
prescription cost for some of the most widely used drugs. Assume the following data show a
sample of the prescription cost in dollars for Zocor, a drug used to lower cholesterol.
110 112 115 99 100 98 104 126
a. Calculate the sample mean and sample standard deviation
- Sample mean: $\bar{x} = \frac{110+112+115+99+100+98+104+126}{8} = \frac{864}{8} = 108$
- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{654}{8-1}} \approx 9.6658$
b. Construct a 90% confidence interval estimate of the population mean cost for
prescription of Zocor
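$1 - \alpha = 0.90 \Rightarrow \alpha = 0.10$; $df = n - 1 = 7$
$t_{\alpha/2,\,df} = t_{0.05,\,7} = 1.895 \Rightarrow E = t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}} = 1.895 \times \frac{9.6658}{\sqrt{8}} \approx 6.4759$
- Lower limit: $\bar{x} - E = 108 - 6.4759 = 101.5241$
- Upper limit: $\bar{x} + E = 108 + 6.4759 = 114.4759$
=> $101.5241 < \mu < 114.4759$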
c. Construct a 95% confidence interval estimate of the population mean cost for
prescription of Zocor
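$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$; $df = n - 1 = 7$
$t_{\alpha/2,\,df} = t_{0.025,\,7} = 2.365 \Rightarrow E = t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}} = 2.365 \times \frac{9.6658}{\sqrt{8}} \approx 8.0821$
- Lower limit: $\bar{x} - E = 108 - 8.0821 = 99.9179$
- Upper limit: $\bar{x} + E = 108 + 8.0821 = 116.0821$
=> $99.9179 < \mu < 116.0821$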
d. Discuss why the 90% and 95% confidence intervals are different.
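Different confidence levels (90% vs 95%) give different critical values ($t_{0.05,\,7} = 1.895$
vs $t_{0.025,\,7} = 2.365$), hence different margins of error (6.4759 vs 8.0821). A higher
confidence level requires a larger critical value, so the 95% interval is wider than the 90% interval.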
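e. State the assumption about the population when constructing the confidence intervals in
parts (b) and (c).
Because the sample is small (n = 8) and the population standard deviation σ is unknown, both
intervals use the t distribution. This assumes that the population of Zocor prescription costs is
approximately normally distributed and that the 8 prices form a random sample.
The confidence level also sets how willing we are to be wrong:
- With a 95% confidence interval, about 5% of intervals built this way would miss the true mean μ.
- With a 90% confidence interval, about 10% of intervals built this way would miss μ.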
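Parts (b) to (d) can be reproduced in a few lines of Python. This sketch assumes the critical values are taken from the t table, as in the worked solution (1.895 and 2.365 for df = 7), since Python's standard library has no t quantile function. Because it does not round s or E along the way, the limits may differ from the hand-computed ones in the fourth decimal place.

```python
import math
import statistics

costs = [110, 112, 115, 99, 100, 98, 104, 126]   # Zocor prescription costs ($)
n = len(costs)
x_bar = statistics.mean(costs)
s = statistics.stdev(costs)                      # sample standard deviation (n - 1)

# Critical t-values with df = n - 1 = 7, from the t table
t_crit = {0.90: 1.895, 0.95: 2.365}

intervals = {}
for level, t in t_crit.items():
    margin = t * s / math.sqrt(n)                # E = t * s / sqrt(n)
    intervals[level] = (x_bar - margin, x_bar + margin)
    print(f"{level:.0%} CI: ({x_bar - margin:.4f}, {x_bar + margin:.4f}), E = {margin:.4f}")
```

The larger critical value at 95% produces the larger margin of error and the wider interval, which is the point made in part (d).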

Question 2 (20 points)


Last year, the population mean earnings per share for financial services corporations
including American Express, E*TRADE Group, Goldman Sachs, and Merrill Lynch was $3.
This year, a sample of 10 financial services corporations provided the following earnings per
share data:
1.92 2.16 3.63 3.16 4.02 3.14 2.20 2.34 3.05 2.38
You want to determine whether or not the population mean earnings per share this year
differs from the $3 reported last year.
a. State the null and the alternative hypotheses.

$H_0: \mu = 3$
$H_1: \mu \neq 3$
b. Compute the standard error of the mean. Construct a 95% confidence interval for
population mean. Using the confidence interval, test whether or not the mean of the
population significantly differs from $3.
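- Sample mean: $\bar{x} = \frac{28}{10} = 2.8$
- Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{4.417}{10-1}} \approx 0.7006$
- Standard error of the mean: $\frac{s}{\sqrt{n}} = \frac{0.7006}{\sqrt{10}} \approx 0.2215$
- Critical value: $t_{\alpha/2,\,df} = t_{0.025,\,9} = 2.262$
=> $E = t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}} = 2.262 \times \frac{0.7006}{\sqrt{10}} \approx 0.5011$
- Lower limit: $\bar{x} - E = 2.8 - 0.5011 = 2.2989$
- Upper limit: $\bar{x} + E = 2.8 + 0.5011 = 3.3011$
=> $2.2989 < \mu < 3.3011$
=> Since the hypothesized mean of $3 lies inside this interval, we do not reject H0: at 95%
confidence, the population mean earnings per share does not differ significantly from $3.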
c. Determine the test statistic and at 95% confidence, test whether or not the mean of the
population significantly differs from $3.
- t-statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.8 - 3}{0.7006/\sqrt{10}} \approx -0.9027$
- Critical value: $t_{\alpha/2,\,df} = t_{0.025,\,9} = 2.262$
- Decision rule: reject $H_0$ if $|t| > 2.262$. Here $|t| = 0.9027 < 2.262$.
=> We do not reject H0: at 95% confidence, there is no significant evidence that the population
mean earnings per share this year differs from $3.
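Both approaches in parts (b) and (c), the confidence-interval check and the two-tailed t-test, can be sketched in Python as a cross-check. The critical value 2.262 is assumed from the t table (df = 9), as in the worked solution. Since s is not rounded here, t may differ from the hand calculation in the fourth decimal place.

```python
import math
import statistics

eps = [1.92, 2.16, 3.63, 3.16, 4.02, 3.14, 2.20, 2.34, 3.05, 2.38]  # earnings per share ($)
mu0 = 3.0          # last year's population mean
t_crit = 2.262     # t_{0.025, 9} from the t table

n = len(eps)
x_bar = statistics.mean(eps)
s = statistics.stdev(eps)
se = s / math.sqrt(n)                      # standard error of the mean

# (b) 95% confidence interval: reject H0 only if mu0 falls outside it
margin = t_crit * se
lower, upper = x_bar - margin, x_bar + margin
ci_rejects = not (lower < mu0 < upper)

# (c) two-tailed t-test: reject H0 only if |t| exceeds the critical value
t_stat = (x_bar - mu0) / se
t_rejects = abs(t_stat) > t_crit

print(f"mean={x_bar:.4f}, s={s:.4f}, SE={se:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f}), reject H0: {ci_rejects}")
print(f"t = {t_stat:.4f}, reject H0: {t_rejects}")
```

The two decisions always agree for a two-tailed test at the same α, which is why parts (b) and (c) reach the same conclusion.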

Question 3 (20 points)


A group of young businesswomen wish to open a high fashion boutique in a vacant store but
only if the average income of households in the area is at least $25,000. A random sample of
9 households showed the following results.
$28,000 $24,000 $26,000 $25,000 $24,000
$23,000 $22,000 $21,000 $20,000
a. State the hypotheses for this problem.

$H_0: \mu \geq 25{,}000$
$H_1: \mu < 25{,}000$
b. Calculate the p-value and at 95% confidence, test the hypothesis and give the
conclusion?
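- $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$
- Sample mean: $\bar{x} = \frac{213000}{9} \approx 23666.6667$
- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{50000000}{9-1}} = 2500$
- t-statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{23666.6667 - 25000}{2500/\sqrt{9}} = -1.6$
- p-value (left-tailed test, $df = n - 1 = 8$): $p = P(T_8 < -1.6) \approx 0.0741$
=> Since p ≈ 0.0741 > α = 0.05 (equivalently, t = -1.6 > -t_{0.05,8} = -1.860), we do not reject H0.
=> At 95% confidence, there is not enough evidence to conclude that the average income of
households in the area is below $25,000.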
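Because Python's standard library has no t distribution, this sketch computes the left-tailed p-value by integrating the Student's t density numerically with Simpson's rule. That is a substitute for a statistical table or Excel's T.DIST, not the method used in the worked solution; `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math
import statistics

incomes = [28000, 24000, 26000, 25000, 24000, 23000, 22000, 21000, 20000]
mu0, alpha = 25000, 0.05
n = len(incomes)
x_bar = statistics.mean(incomes)
s = statistics.stdev(incomes)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T <= x): 0.5 plus Simpson's-rule integral of the density from 0 to x (steps even)."""
    h = x / steps                      # negative when x < 0, giving a negative area
    area = t_pdf(0, nu) + t_pdf(x, nu)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + area * h / 3


p_value = t_cdf(t_stat, df)            # left-tailed test: H1 is mu < 25,000
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"t = {t_stat:.4f}, df = {df}, p-value = {p_value:.4f} -> {decision}")
```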
