
In part 1 of this assignment, five populations are generated using Excel. First, the
rows and columns are labelled. Type the first row or column number, which is 1, in a
cell, then use the “Fill” tool followed by “Series” to generate labels for 10 rows and
50 columns: set the step value to 1 for both, and the stop value to 10 for the rows
and 50 for the columns.

Random values are generated using the NORM.INV function for the normal
distribution and the CHISQ.INV function for the chi-squared distribution. The
formulas used to generate random samples from the five distributions are listed
below:

Distribution    Formula
N(10,2)         =NORM.INV(RAND(),10,SQRT(2))
N(10,4)         =NORM.INV(RAND(),10,SQRT(4))
χ²(1)           =CHISQ.INV(RAND(),1)
χ²(10)          =CHISQ.INV(RAND(),10)
χ²(30)          =CHISQ.INV(RAND(),30)
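
For readers who want to reproduce the populations outside Excel, the following is a minimal Python sketch of the same idea. It is not the procedure used in this assignment: it draws values with the standard library's direct samplers instead of the inverse-CDF formulas, and the function names, dictionary keys and seed are assumptions made for illustration.

```python
import random

N_ROWS, N_COLS = 10, 50  # 10 x 50 = 500 values per population


def normal_value(mean, variance, rng):
    # Same distribution as =NORM.INV(RAND(), mean, SQRT(variance))
    return rng.gauss(mean, variance ** 0.5)


def chi_squared_value(df, rng):
    # Same distribution as =CHISQ.INV(RAND(), df):
    # chi-squared(df) is a gamma distribution with shape df/2 and scale 2
    return rng.gammavariate(df / 2, 2)


def make_population(draw, rng):
    # Fill a 10-row by 50-column grid, one random draw per cell
    return [[draw(rng) for _ in range(N_COLS)] for _ in range(N_ROWS)]


rng = random.Random(2024)  # fixed seed (an assumption) so the grids are reproducible
populations = {
    "N(10,2)": make_population(lambda r: normal_value(10, 2, r), rng),
    "N(10,4)": make_population(lambda r: normal_value(10, 4, r), rng),
    "chi2(1)": make_population(lambda r: chi_squared_value(1, r), rng),
    "chi2(10)": make_population(lambda r: chi_squared_value(10, r), rng),
    "chi2(30)": make_population(lambda r: chi_squared_value(30, r), rng),
}
```

Unlike RAND() in Excel, a seeded generator returns the same grid on every run, so there is no need for a paste-values step to freeze the numbers.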
    
After the first cell is generated, fill the remaining cells by dragging the formula
across the rows and columns so that every cell holds a random value. Then find the
mean using the formula =AVERAGE(number1, [number2], …). After that, find the
variance using the formula =VAR.P(number1, [number2], …). Before proceeding to the
next step, make sure the mean and variance are close to the required values, for
example:

Distribution                                Mean    Variance
Chi-squared with 10 degrees of freedom      10      20

If the values are not close to the required values, use the “Calculate Now” tool on
the “Formulas” tab to regenerate the data until the mean and variance are
satisfactory. Once they are, copy the data and paste it back using “Paste Values”.
This pastes only the values of the copied range, without formulas or formatting, so
the numbers no longer change and can be extracted from the cells later. After that,
the range of the data in the population is determined by finding the minimum value
with the formula =MIN(number1, [number2], …) and the maximum value with the formula
=MAX(number1, [number2], …). Lastly, the upper limits of the class intervals are
listed, for example:

Minimum        5.198212
Maximum        13.86957

Upper limit
10
11
12
13
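
The checks described above (mean, variance, minimum and maximum) can be sketched in Python as follows. The acceptance rule is an assumption: this report only asks for values that are "close" to the targets, so the 10% tolerance and the helper names are hypothetical.

```python
import statistics


def summarise(values):
    # Counterparts of =AVERAGE, =VAR.P, =MIN and =MAX over the whole population
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # population variance, like VAR.P
        "minimum": min(values),
        "maximum": max(values),
    }


def is_close_enough(summary, target_mean, target_variance, rel_tol=0.1):
    # Hypothetical acceptance rule: both statistics within 10% of their targets
    mean_ok = abs(summary["mean"] - target_mean) <= rel_tol * target_mean
    var_ok = abs(summary["variance"] - target_variance) <= rel_tol * target_variance
    return mean_ok and var_ok


# Tiny illustrative data set, not data from the assignment
example = [8.0, 9.0, 10.0, 11.0, 12.0]
summary = summarise(example)
accepted = is_close_enough(summary, target_mean=10, target_variance=2)
```

In the spreadsheet workflow, a rejected data set corresponds to pressing “Calculate Now” again.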

Next, the probability distribution of each population is illustrated graphically
using a histogram. The following steps are used in Microsoft Excel to generate the
histogram.

Step 1: Click on “Data”.
Step 2: Click on “Data Analysis”.
Step 3: Select “Histogram” and click “OK”. Select all the generated data as the input
range and select the bin range, which holds the upper limits of the class intervals.
Then select the empty cell beside the bin range as the output range, tick “Chart
Output”, and click “OK”.
Step 4: Calculate the probability of each class, which is its frequency divided by the
total frequency (500 in this study).
Step 5: Calculate the total frequency using the formula =SUM(number1, [number2],
…), then drag the formula across to obtain the total probability.
Step 6: Right-click on a histogram bar and choose “Select Data”. Change the “Chart
data range” to include the probabilities calculated in step 4. Remove “Series 1”,
which is the frequency, from “Legend Entries (Series)”, then click “OK”.
Step 7: Change the bin values in the table into intervals.
Step 8: In the graph, change the title to the respective distribution (for example,
“Normal distribution with mean 10 and variance 2”). Label the x-axis as the random
variable and the y-axis as the probability.
Step 9: Double-click on a bar in the chart and make sure all the bars are selected. In
the “Series Options” pane, reduce the gap width to 0%, which leaves no gap at all. In
the “Fill” option, select “No fill”. For the border, select “Solid line” and set the
colour to black.
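
The frequency and probability columns behind the histogram can be sketched in Python as below. This is an illustration under assumptions: the function name and the sample values are made up, and the binning rule (a value is counted in the first bin whose upper limit is greater than or equal to it, with a final overflow bin like Excel's “More” row) is how the Histogram tool is understood to behave.

```python
from bisect import bisect_left


def histogram_probabilities(values, upper_limits):
    # One count per upper limit plus a final overflow bin for values above the last limit
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        # bisect_left returns the index of the first upper limit that is >= v
        counts[bisect_left(upper_limits, v)] += 1
    total = len(values)
    probabilities = [c / total for c in counts]  # frequency divided by total frequency
    return counts, probabilities


# Illustrative values only, with the upper limits 10, 11, 12, 13 from the example above
values = [9.5, 10.0, 10.2, 11.7, 12.4, 13.9]
counts, probabilities = histogram_probabilities(values, [10, 11, 12, 13])
```

As in steps 4 and 5, the counts should add up to the number of data points and the probabilities to 1.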

In part 2, a random sample of n = 10 observations is generated from each
distribution. To draw the data from a population generated in part 1, the INDEX
function is used, whose syntax is =INDEX(array, row_num, [column_num]). Random row
and column numbers are supplied to it as =INDEX(array,
CEILING.MATH(RAND()*number of rows), CEILING.MATH(RAND()*number of columns)),
where CEILING.MATH rounds the value up to the next integer. The dollar sign fixes a
cell reference so that it remains unchanged wherever the formula is moved. In other
words, using the $ sign in cell references allows the formula to be copied in Excel
without changing those references. Next, drag the formula across 10 columns to
generate a sample of size 10.
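
The INDEX and CEILING.MATH lookup can be mirrored in Python as a rough sketch. The function names and the small test grid are assumptions for illustration; the guard against a random value of exactly 0 is an addition that the Excel formula does not have.

```python
import math
import random


def draw_from_population(population, rng):
    n_rows = len(population)
    n_cols = len(population[0])
    # ceil(u * n) gives an integer from 1 to n; max(1, ...) covers the rare case u == 0
    row = max(1, math.ceil(rng.random() * n_rows))
    col = max(1, math.ceil(rng.random() * n_cols))
    # INDEX counts from 1, Python lists count from 0
    return population[row - 1][col - 1]


def draw_sample(population, n, rng):
    # Each cell is picked independently, so the same cell can be drawn more than once
    return [draw_from_population(population, rng) for _ in range(n)]


# Illustrative 10 x 50 grid whose values encode their own position
grid = [[100 * r + c for c in range(50)] for r in range(10)]
sample = draw_sample(grid, 10, random.Random(0))
```

Because every draw is independent, this is sampling with replacement from the 500 stored values.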

For a 95% confidence interval, α = 0.05, so Zα/2 = Z0.025. From the standard
normal distribution table, the z-score is found by looking up the cumulative
probability 0.975, which gives Z = 1.96. With the sample size n = 10, the formula
used to calculate the confidence interval is X̄ ± Zα/2 × (S ÷ √n), where X̄ is the
sample mean and S is the sample standard deviation, the square root of the unbiased
variance estimate.
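
As a quick check of the table lookup, the critical value can also be computed with Python's standard library. This is only a sketch; the variable names are arbitrary.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence
# Two-sided interval: look up the cumulative probability 1 - alpha/2 = 0.975
z_score = NormalDist().inv_cdf(1 - alpha / 2)
```

Rounded to two decimal places, this agrees with the tabulated value of 1.96.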

Find the sample mean using the formula =AVERAGE(number1, [number2], …). Next, find
the sample variance using the formula =VAR.P(number1, [number2], …), and from it
the unbiased estimate of the population variance, σ̂², using the formula
=n/(n-1)*sample variance. Then find the lower limit of the confidence interval using
the formula =Mean - z score*SQRT(σ̂²/n) and the upper limit using the formula
=Mean + z score*SQRT(σ̂²/n). Lastly, to determine whether the population mean falls
between the lower and upper limits of the confidence interval, the formula used is
=IF(AND(logical1,[logical2]), [value_if_true], [value_if_false]), where the value if
true is labelled “YES” and the value if false is labelled “NO”.
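
The interval calculation above can be sketched in Python as follows. The function names and the five-value sample are illustrative assumptions; the arithmetic follows the spreadsheet formulas step by step.

```python
import math
import statistics

Z_SCORE = 1.96  # critical value for a 95% confidence interval


def confidence_interval(sample, z=Z_SCORE):
    n = len(sample)
    mean = statistics.fmean(sample)                 # =AVERAGE(...)
    sample_variance = statistics.pvariance(sample)  # =VAR.P(...)
    sigma2_hat = n / (n - 1) * sample_variance      # unbiased variance estimate
    half_width = z * math.sqrt(sigma2_hat / n)
    return mean - half_width, mean + half_width


def contains_mean(interval, mu):
    # Counterpart of =IF(AND(mu>lower limit, mu<upper limit),"YES","NO")
    lower, upper = interval
    return "YES" if lower < mu < upper else "NO"


# Illustrative sample, not data from the assignment
interval = confidence_interval([8.0, 9.0, 10.0, 11.0, 12.0])
label = contains_mean(interval, 10)
```

The product n/(n-1) × VAR.P is the same quantity that Excel's VAR.S or Python's statistics.variance returns directly.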

The coverage probability of the confidence interval for the population mean is
calculated as the proportion of intervals that contain the true mean. For the
confidence intervals that contain the population mean, calculate the frequency using
the formula =COUNTIF(range, "YES") and the proportion using the formula
=frequency of "YES"/200, where 200 is the number of intervals generated. Likewise,
for the confidence intervals that do not contain the population mean, calculate the
frequency using the formula =COUNTIF(range, "NO") and the proportion using the
formula =frequency of "NO"/200. Next, calculate the total frequency using the formula
=SUM(frequency of "YES":frequency of "NO") to check that the frequencies sum to 200,
then calculate the total proportion using the formula
=SUM(proportion of "YES":proportion of "NO") to check that the proportions sum to 1.
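
The coverage calculation can be simulated in Python as a rough sketch. One simplification is assumed here: each sample is drawn directly from the distribution, whereas the assignment draws from the stored population of 500 values. The function name, seed and result keys are hypothetical.

```python
import math
import random
import statistics


def coverage_probability(draw, mu, n, n_samples=200, z=1.96, seed=1):
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    labels = []
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n)]
        mean = statistics.fmean(sample)
        # statistics.variance is the unbiased estimate, equal to n/(n-1) * VAR.P
        half_width = z * math.sqrt(statistics.variance(sample) / n)
        inside = mean - half_width < mu < mean + half_width
        labels.append("YES" if inside else "NO")
    yes = labels.count("YES")  # =COUNTIF(range, "YES")
    no = labels.count("NO")    # =COUNTIF(range, "NO")
    return {"YES": yes, "NO": no, "p_yes": yes / n_samples, "p_no": no / n_samples}


# Normal population with mean 10 and variance 2, sample size 10
result = coverage_probability(lambda r: r.gauss(10, math.sqrt(2)), mu=10, n=10)
```

The value of result["p_yes"] plays the role of the coverage probability, and the two counts add up to 200 as in the spreadsheet check.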

The steps above are repeated for sample sizes 30 and 50. For each distribution,
the following formulae are used to determine whether the true population mean
falls between the lower and upper limits of the confidence interval.
Contain µ                                                      Formula
i.   Normal distribution with mean 10 and variance 2           =IF(AND(10>lower limit, 10<upper limit),"YES","NO")
ii.  Normal distribution with mean 10 and variance 4           =IF(AND(10>lower limit, 10<upper limit),"YES","NO")
iii. Chi-squared distribution with 1 degree of freedom         =IF(AND(1>lower limit, 1<upper limit),"YES","NO")
iv.  Chi-squared distribution with 10 degrees of freedom       =IF(AND(10>lower limit, 10<upper limit),"YES","NO")
v.   Chi-squared distribution with 30 degrees of freedom       =IF(AND(30>lower limit, 30<upper limit),"YES","NO")

For part 3, tabulate the coverage probability for each sample size and
distribution, grouped by type of distribution (normal and chi-squared), to ease
comparison between the distributions. The relationship between the sample size and
the coverage probability is then observed, as is the relationship between the
underlying population distribution and the coverage probability.
Introduction

A sampling distribution is the probability distribution of a statistic computed from
a random sample. The confidence intervals in this assignment rely on the sampling
distribution of the sample mean being normal. In theory, when a sample is drawn from
a normal distribution, the sampling distribution of the sample mean is exactly
normal for any sample size. When the population is not normal but the sample size is
sufficiently large, the central limit theorem states that the sampling distribution
of the sample mean is approximately normal. The standard deviation of the sampling
distribution of the sample mean (the standard error) is σ/√n, so its variance is
σ²/n. A confidence interval (CI) is a range of estimates for an unknown parameter.
In this assignment, it is constructed with the population variance treated as
unknown, so the variance estimated from the sample, σ̂², is used as an estimate of
the population variance. In statistics, the coverage probability of a confidence
interval procedure is the proportion of the time that the interval contains the
true value of interest.

The normal distribution is a continuous probability distribution for a real-valued
random variable. Its mean and variance are separate parameters: changing the mean
shifts the curve and changing the variance changes its spread, but the distribution
always remains bell-shaped and symmetrical. The chi-squared distribution is a
continuous probability distribution that is used in many hypothesis tests. In
theory, as the degrees of freedom increase, the chi-squared distribution approaches
a normal distribution. However, the chi-squared distribution with 1 degree of
freedom is strongly positively skewed. When the sample size is only 10, which is
small (not more than 30), the central limit theorem cannot be relied on, so the
normal-based interval is not valid and the coverage probability is expected to be
low compared with the confidence level.
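
To put numbers on the skewness argument, the sketch below uses the standard closed-form results for a chi-squared distribution with k degrees of freedom: mean k, variance 2k and skewness √(8/k). These formulas are textbook facts and not results from this assignment, and the function name is hypothetical.

```python
import math


def chi_squared_moments(df):
    # Theoretical mean, variance and skewness of a chi-squared distribution
    return {"mean": df, "variance": 2 * df, "skewness": math.sqrt(8 / df)}


# The three chi-squared populations used in this assignment
moments = {df: chi_squared_moments(df) for df in (1, 10, 30)}
```

The skewness falls from about 2.83 at 1 degree of freedom to about 0.52 at 30, which is why the normal approximation is weakest for the population with 1 degree of freedom.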

The objectives of this assignment are listed below:

(i) To investigate the effect of sample size on coverage probability.
(ii) To determine the effect of underlying distribution on coverage probability.
(iii) To identify the relationship between coverage probability and confidence level.
