ASSIGNMENT
Educational Statistics
Course Code: 8614
Roll No.____CA654221 Registration No.____ 20 KBR00360
ASSIGNMENT No.2
Question#01
What do you know about an independent samples t-test and a paired samples t-test?
Test | Purpose | Example
Paired t | Tests whether the mean of the differences between dependent or paired observations is equal to a target value | If you measure the weight of male college students before and after each subject takes a weight-loss pill, is the mean weight loss significant enough to conclude that the pill works?
t-test in regression output | Tests whether the values of coefficients in the regression equation differ significantly from zero | Are high school SAT test scores significant predictors of college GPA?
How big is “big enough”? Every t-value has a p-value to go with it. A p-value is the probability of obtaining results at least as extreme as those observed in your sample, assuming the null hypothesis is true. P-values range from 0% to 100% and are usually written as a decimal; for example, a p-value of 5% is 0.05. Low p-values are good: they indicate that your results are unlikely to have occurred by chance alone if the null hypothesis were true. For example, a p-value of .01 means there is only a 1% probability of getting results this extreme by chance if the null hypothesis holds. In most cases, a p-value of 0.05 (5%) or less is taken to mean the result is statistically significant, i.e. the null hypothesis is rejected.
There are three main types of t-test:
• An Independent Samples t-test compares the means for two groups.
• A Paired sample t-test compares means from the same group at different
times (say, one year apart).
• A One sample t-test tests the mean of a single group against a known mean.
You probably don’t want to calculate the test by hand (the math can get very messy), although the steps for an independent samples t-test can be worked through by hand if you insist.
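To illustrate the arithmetic behind the three t-tests listed above, here is a minimal Python sketch using only the standard library. The function names (`one_sample_t`, `paired_t`, `independent_t`) and the weight data are hypothetical, and the independent samples version assumes equal variances (the pooled, Student's form). It returns only the t statistic and its degrees of freedom; the p-value would still come from a t table or statistical software.

```python
import math
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """One sample t statistic: t = (mean - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: a one sample t-test on the differences against target 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0.0)


def independent_t(group1, group2):
    """Independent samples t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


# Hypothetical weights (kg) of five students before and after a weight-loss pill
before = [80, 85, 90, 75, 88]
after = [78, 84, 87, 75, 85]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)  # -3.087 4
```

The paired test is written as a one sample test on the differences because that is exactly what it is: the same group measured twice, reduced to one column of changes.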
Reference:
Ans No.1 is written from text book (8614) Unit No.6 Page
No.65,68,69
END Q#01
Question#02
Why do we use regression analysis? Write down the types of
regression.
Regression model.
In simple linear regression, the model used to describe the relationship between a single dependent variable y and a single independent variable x is y = a0 + a1x + ε, where a0 and a1 are referred to as the model parameters and ε is a probabilistic error term that accounts for the variability in y that cannot be explained by the linear relationship with x. If the error term were not present, the model would be deterministic; in that case, knowledge of the value of x would be sufficient to determine the value of y.
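A minimal sketch of how the parameters a0 and a1 are usually estimated from data, assuming the ordinary least-squares method (the text above does not name an estimation method). The function name `fit_simple_linear_regression` and the data are hypothetical; b0 and b1 denote the estimates of a0 and a1, and the residuals play the role of the error term.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) of a0 and a1 in y = a0 + a1*x + error."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept: the line passes through the means
    return b0, b1


# Hypothetical data, roughly following y = 1 + 2x plus noise
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(x, y)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # estimates of the error term
print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```

If every residual were zero the data would fit the deterministic model exactly; in real data the residuals are nonzero but sum to zero under least squares.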
Correlation.
Correlation and regression analysis are related in the sense that both deal
with relationships among variables. The correlation coefficient is a measure of
linear association between two variables. Values of the correlation coefficient
are always between -1 and +1. A correlation coefficient of +1 indicates that two
variables are perfectly related in a positive linear sense, a correlation coefficient
of -1 indicates that two variables are perfectly related in a negative linear sense,
and a correlation coefficient of 0 indicates that there is no linear relationship
between the two variables. For simple linear regression, the sample correlation coefficient is the square root of the coefficient of determination, with the sign of the correlation coefficient being the same as the sign of b1, the coefficient of x in the estimated regression equation.
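The relationship between r, the coefficient of determination and the sign of b1 can be checked numerically. This is a minimal sketch on hypothetical data; the helper names `pearson_r` and `r_squared_and_slope` are illustrative, and the fit assumes ordinary least squares.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_and_slope(x, y):
    """Fit y-hat = b0 + b1*x by least squares; return (R^2, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - y_bar) ** 2 for b in y)                      # total variation
    return 1 - sse / sst, b1


# Hypothetical data with a negative linear trend
x = [1, 2, 3, 4, 5]
y = [11.0, 8.8, 7.2, 4.9, 3.1]
r = pearson_r(x, y)
r2, b1 = r_squared_and_slope(x, y)
# r should equal the square root of R^2, carrying the sign of b1
print(round(r, 4), round(math.copysign(math.sqrt(r2), b1), 4))
```

Both printed values agree, and both are negative because the slope b1 is negative. A strong r here still says nothing about cause and effect.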
Neither regression nor correlation analyses can be interpreted as
establishing cause-and-effect relationships. They can indicate only how or to
what extent variables are associated with each other. The correlation coefficient
measures only the degree of linear association between two variables. Any
conclusions about a cause-and-effect relationship must be based on the
judgment of the analyst.
Reference:
Ans No.2 is written from text book (8614) Unit No.7 Page
No.72,76-78
END Q#02
Question#03
Write a short note on one-way ANOVA. Write down the main
assumptions underlying one-way ANOVA.
Example of ANOVA
Ventura is an FMCG company selling a range of products. Its outlets are spread over the entire state. For administrative and planning purposes, Ventura has subdivided the state into four geographical regions (Northern, Eastern, Western and Southern). Random sample data on sales were collected from different outlets spread over the four geographical regions.
Variation, being a fundamental characteristic of data, will always be present. Here, the total variation in sales may be measured by the sum of squared deviations from the mean sales. If we analyze the sources of variation in sales, in this case we may identify two sources:
• Sales within a region would differ, and this would be true for all four regions (within-group variation).
• The region might have an impact, so the mean sales of the four regions might not all be the same, i.e. there might be variation among regions (between-group variation).
So, the total variation present in the sample data may be partitioned into two components, between regions and within regions, and their magnitudes may be compared to decide whether there is a substantial difference in sales across regions. If the two variations are in close agreement, there is no reason to believe that sales differ among the four regions; if they are not, it may be concluded that there is a substantial difference between some or all of the regions.
Here, it should be kept in mind that ANOVA is the partitioning of variation
as per the assignable causes and random component and by this partitioning
Conceptual Background
The fundamental concept behind the Analysis of Variance is the “Linear Model”. Let X1, X2, …, Xn be observable quantities. All the values can be expressed as:
Xi = µi + ei
where µi is the true value, which is due to some assignable causes, and ei is the error term, which is due to random causes. It is assumed that all error terms ei are independently and normally distributed variates with mean zero and common variance σe².
Further, the true value µi can be assumed to consist of a linear function of t1, t2, …, tk, known as “effects”.
Total Sum of Squares = Between-Group Sum of Squares + Within-Group Sum of Squares
TSS = SSB + SSE
Further, the mean sums of squares are given by:
MSB = SSB/(k-1) and MSE = SSE/(n-k),
where k is the number of groups, n is the total number of observations, (k-1) is the degrees of freedom (df) for SSB and (n-k) is the df for SSE.
Note that SSB and SSE add up to TSS, and the corresponding df’s (k-1) and (n-k) add up to the total df (n-1), but MSB and MSE do not add up to a total MS.
Thus, by partitioning TSS and the total df into two components, we are able to test the hypothesis:
H0: µ1 = µ2 = … = µk
H1: Not all µ’s are the same, i.e. at least one µ is different from the others.
or alternatively:
H0: α1 = α2=……….= αk =0
H1: Not all α’s are zero i.e. at least one α is different from zero.
MSE is always an unbiased estimate of σe², and if H0 is true, MSB is also an unbiased estimate of σe².
Further, under H0, SSB/σe² follows a Chi-square (χ2) distribution with (k-1) df and SSE/σe² follows a Chi-square (χ2) distribution with (n-k) df. These two χ2 variates are independent, so the ratio of the two, each divided by its df, F = MSB/MSE, follows the variance-ratio distribution (F distribution) with (k-1) and (n-k) df.
Here, the F test is right-tailed (a one-tailed test). Accordingly, the p-value may be estimated to decide whether or not to reject the null hypothesis H0.
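The partition TSS = SSB + SSE and the F ratio can be sketched for the Ventura example. The sales figures below are invented for illustration (five outlets per region, so k = 4 and n = 20), and the critical value 3.24 is the approximate 5% point of F with (3, 16) df taken from a standard F table; the helper name `one_way_anova` is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA table entries for a list of k groups of observations."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group SS: group size times squared deviation of group mean from grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: squared deviations of observations from their own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    tss = sum((x - grand_mean) ** 2 for x in values)
    df_between, df_within = k - 1, n - k
    msb, mse = ssb / df_between, sse / df_within
    return {"SSB": ssb, "SSE": sse, "TSS": tss,
            "df_between": df_between, "df_within": df_within,
            "MSB": msb, "MSE": mse, "F": msb / mse}


# Hypothetical sales (in thousands) from five outlets in each region
sales = {
    "Northern": [20, 22, 19, 21, 23],
    "Eastern": [25, 27, 24, 26, 28],
    "Western": [18, 17, 19, 20, 16],
    "Southern": [22, 24, 23, 21, 25],
}
table = one_way_anova(list(sales.values()))
F_CRITICAL = 3.24  # approx. F(0.05; 3, 16) from a standard F table
reject_h0 = table["F"] > F_CRITICAL  # right-tailed test
print(table["SSB"], table["SSE"], table["TSS"], round(table["F"], 2), reject_h0)
# 170 40 210 22.67 True
```

Here SSB + SSE equals TSS (170 + 40 = 210) and the df add up (3 + 16 = 19 = n - 1), while MSB (about 56.7) and MSE (2.5) do not add up to anything meaningful, as noted above.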
If H0 is rejected, i.e. not all µ’s are the same, the rejection does not tell us which group means differ from the others. So a post hoc analysis is performed to identify which group means are significantly different. The post hoc test takes the form of multiple comparisons, testing the equality of two group means at a time, i.e. H0: µp = µq, either by using a two-group independent samples test or by comparing the difference between the two sample means with the least significant difference (LSD), also called the critical difference (CD):
$$LSD = t_{\alpha/2,\,n-k}\sqrt{MSE\left(\frac{1}{n_p} + \frac{1}{n_q}\right)}$$
where $t_{\alpha/2,\,n-k}$ is the critical t value for the error df (n-k), and $n_p$, $n_q$ are the sizes of groups p and q.
If the observed difference between two means is greater than the LSD/CD, the corresponding null hypothesis is rejected at the alpha level of significance.
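A minimal sketch of the LSD comparisons, assuming the same kind of hypothetical four-region sales data (five outlets per region, error df = 16). The critical value 2.120 is the two-tailed 5% t value for 16 df from a standard t table, and `lsd_post_hoc` is an illustrative name, not a library function.

```python
import math
from itertools import combinations
from statistics import mean


def lsd_post_hoc(groups, t_crit):
    """LSD comparison for every pair of groups in a dict {name: observations}."""
    n = sum(len(g) for g in groups.values())
    k = len(groups)
    means = {name: mean(g) for name, g in groups.items()}
    sse = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    mse = sse / (n - k)  # error mean square with n - k df
    results = []
    for p, q in combinations(groups, 2):
        lsd = t_crit * math.sqrt(mse * (1 / len(groups[p]) + 1 / len(groups[q])))
        diff = abs(means[p] - means[q])
        results.append((p, q, diff, lsd, diff > lsd))  # True means reject H0: mu_p = mu_q
    return results


# Hypothetical sales (in thousands) from five outlets in each region
sales = {
    "Northern": [20, 22, 19, 21, 23],
    "Eastern": [25, 27, 24, 26, 28],
    "Western": [18, 17, 19, 20, 16],
    "Southern": [22, 24, 23, 21, 25],
}
T_CRIT = 2.120  # two-tailed t(0.05) with 16 df, from a standard t table
res = lsd_post_hoc(sales, T_CRIT)
for p, q, diff, lsd, significant in res:
    print(f"{p} vs {q}: |difference| = {diff}, LSD = {lsd:.3f}, significant = {significant}")
```

With these invented numbers, every pair except Northern vs Southern differs by more than the LSD.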
ANOVA vs T-test
We employ the two-independent-samples t-test to examine whether there is a significant difference between the means of two categories, i.e. whether the two samples have come from the same or different populations. This could be extended by performing multiple t-tests (two groups at a time) to examine the significance of the differences among the means of k samples in place of ANOVA. If this is attempted, however, the errors involved in testing the hypotheses (type I and type II errors) cannot be controlled at the intended level, and the overall type I error will be much larger than alpha (the significance level). So, in this situation, ANOVA is preferred over multiple independent samples t-tests.
For example, in our case there are four regions: Northern (N), Eastern (E), Western (W) and Southern (S). If we want to compare the population means using the two-independent-samples t-test, taking two regions (groups) at a time, we have to make 4C2 = 6 comparisons, i.e. six independent samples tests have to be performed (N with S, N with E, N with W, E with W, E with S and W with S). If each of the six t-tests uses the 5% level of significance then, treating the tests as independent, the overall type I error will be 1 - (0.95)^6 = 1 - 0.735 = 0.265, i.e. 26.5%.
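The arithmetic above can be reproduced in a few lines. This sketch, with the hypothetical helper name `familywise_error`, counts the pairwise comparisons with `math.comb` and, like the text, treats the tests as independent.

```python
import math


def familywise_error(k_groups, alpha):
    """Number of pairwise t-tests and the chance of at least one type I error."""
    m = math.comb(k_groups, 2)     # kC2 pairwise comparisons
    return m, 1 - (1 - alpha) ** m  # assumes the m tests are independent


m, fwer = familywise_error(4, 0.05)
print(m, round(fwer, 3))  # 6 0.265
```

Raising the number of groups makes the problem worse quickly: the number of comparisons grows roughly with the square of k.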
Reference:
Ans No.3 is written from text book (8614) Unit No.8 Page
No.82,86,87
END Q#03
Question#04
What do you know about the chi-square (χ2) goodness
of fit test? Write down the procedure for the goodness of
fit test.
A. Null hypothesis:
In the Chi-Square goodness of fit test, the null hypothesis assumes that there is no significant difference between the observed and the expected frequencies.
B. Alternative hypothesis:
In the Chi-Square goodness of fit test, the alternative hypothesis assumes that there is a significant difference between the observed and the expected frequencies.
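Compute the value of the Chi-Square goodness of fit statistic using the following formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency (expected value) of each category.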
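Degree of freedom:
In the Chi-Square goodness of fit test, the degrees of freedom depend on the number of categories and on the distribution being fitted. With k categories, df = k - 1 - m, where m is the number of parameters of the hypothesized distribution estimated from the sample (m = 0 when the expected frequencies are fully specified in advance). The following table shows the distribution and an associated degree of freedom: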
Hypothesis testing:
Hypothesis testing in the Chi-Square goodness of fit test follows the same logic as in other tests, such as the t-test and ANOVA. The calculated value of the Chi-Square statistic is compared with the table (critical) value for the chosen level of significance and degrees of freedom. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies. If the calculated value is less than the table value, we fail to reject the null hypothesis and conclude that there is no significant difference between the observed and expected frequencies.
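The procedure above can be sketched as follows, assuming a hypothetical example of 60 rolls of a die tested against a fair-die hypothesis (expected frequency 10 per face). No parameters are estimated, so df = 6 - 1 = 5, and 11.070 is the 5% critical value of chi-square with 5 df from a standard table. The function name `chi_square_gof` is illustrative.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness of fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of faces 1..6 in 60 rolls
observed = [8, 9, 12, 11, 6, 14]
expected = [sum(observed) / len(observed)] * len(observed)  # fair die: 10 per face
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # no distribution parameters estimated from the data
CRITICAL_05_DF5 = 11.070  # chi-square table value, alpha = 0.05, df = 5
decision = "reject H0" if chi2 > CRITICAL_05_DF5 else "fail to reject H0"
print(round(chi2, 2), df, decision)  # 4.2 5 fail to reject H0
```

Since 4.2 is below the table value, these invented counts give no evidence that the die is unfair.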
Reference:
Ans No.4 is written from text book (8614) Unit No.9
Page No.92-94
END Q#04
Question#05
What is the chi-square (χ2) independence test? Explain in detail.
Let p1, p2, ..., pk denote the probabilities hypothesized for k possible outcomes. In n independent trials, let Y1, Y2, ..., Yk denote the observed counts of each outcome, which are to be compared with the expected counts np1, np2, ..., npk. The chi-square test statistic is
$$q_{k-1} = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i}$$
Reject H0 if this value exceeds the upper critical value of the chi-square distribution with (k-1) degrees of freedom.
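A minimal sketch of computing q(k-1) from the counts Yi and the hypothesized probabilities pi. The 1:2:1 probabilities and the counts are hypothetical, 5.991 is the 5% critical value of chi-square with 2 df from a standard table, and `q_statistic` is an illustrative name.

```python
import math


def q_statistic(counts, probs):
    """q_{k-1} = sum over i of (Y_i - n*p_i)^2 / (n*p_i); returns (q, df = k - 1)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probabilities must have the same length")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(counts)  # number of independent trials
    q = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
    return q, len(counts) - 1


# Hypothetical: 100 trials, three outcomes hypothesized in the ratio 1:2:1
Y = [30, 45, 25]
p = [0.25, 0.5, 0.25]
q, df = q_statistic(Y, p)
CRITICAL_05_DF2 = 5.991  # chi-square table value, alpha = 0.05, df = 2
print(q, df, q > CRITICAL_05_DF2)  # 1.5 2 False
```

Because q = 1.5 does not exceed 5.991, H0 would not be rejected for these counts.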
Reference:
Ans No.5 is written from text book (8614) Unit No.9 Page
No.94
END Q#05