
# UL(Inferences From Two Samples)-Salva

Applied Statistics- PhD


04/22/2013


# INFERENCES FROM TWO SAMPLES

SALVACION M. VINLUAN DR. IMELDA E. CUARTEL

UNIVERSITY OF LUZON
Doctor of Philosophy in Development Education

Professor

- Definitions
- Testing Two Means, Dependent Case: The Mean of the Differences
- Testing of Two Variances
- Testing of Two Means, Independent Case: The Differences of the Means
- Testing Two Proportions

What is Statistical Inference?
This refers to the process of drawing conclusions from data that is subject to random variation, for example, observational errors or sampling variation.

More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation, such as observational errors, random sampling, or random experimentation.

Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations.

The outcome of statistical inference may be an answer to the question “what should be done next?”, where there might be a decision about making further experiments or surveys, or about drawing conclusions before implementing some organizational or governmental policy.

Statistical inference refers to the statistical method concerned with making estimates of population value. This particular method and process will help us determine how accurate and acceptable our generalizations are.

Testing Two Means, Dependent Case: The Mean of the Difference
There are two possible cases when testing population means: the dependent case and the independent case.

The t-Test for Dependent Samples
The t-Test for dependent samples is applied to matched pairs or correlated samples. The samples are supposedly taken from one population. For example, in the research study on the degree of seriousness of problems encountered by college freshmen, data were taken before and after their individual counseling sessions.

From the 15-item problem checklist, the corresponding degree of seriousness of problems encountered by the students before and after their individual sessions comprises the data to be compared. This is also referred to as repeated measures.

To compute the t-value for dependent samples, the formula is as follows:

$$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n-1}}}$$

where:
t = t-value
D = difference
n = number of cases

75 3.50 2.50 .60 3.45 2.60 After 4.75 3.25 3.00 3.50 3.50 3.00 2.75 2.Example: Student 1 2 3 4 5 6 7 8 9 10 Before 3.75 2.50 3.00 3.50 2.75 4.51 4.00 3.

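Solution:
Step 1: Ho : P1 = P2
H1 : P1 ≠ P2
Step 2: α = 0.05 (two-tailed test)
Step 3: Reject the null hypothesis if the computed t-value is greater than the critical t-value of 1.833 at α = 0.05 with nine degrees of freedom. (1.833 is the one-tailed critical value; the two-tailed critical value at α = 0.05 with nine degrees of freedom is 2.262.)
Step 4: t-test for dependent samples

$$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n-1}}}$$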

90 6.0625 4.50 3.50 ∑ D -1.50 2.10 0.00 3.0625 0.75 3.00 0.45 2.96 D2 1.25 -2.00 3.0000 0.0625 0.75 3.75 4.8100 7.2500 1.9476 .50 2.00 0.0000 0.50 1.70 0.75 4.4900 0.2100 0.01 0.0001 0.25 3.50 3.00 2.Student 1 2 3 4 5 6 7 8 9 10 Before (X1) 3.00 3.50 3.75 2.25 0.60 3.25 0.51 4.60 After(X2) 4.

$$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n-1}}} = \frac{6.96}{\sqrt{\dfrac{10(7.9476) - (6.96)^2}{10-1}}} = 3.74$$

The degree of freedom:
df = 10 - 1
df = 9
Step 5: Reject the null hypothesis since the computed value of 3.74 is greater than the critical value of 1.833 at α = 0.05 with 9 degrees of freedom. (It is also greater than the two-tailed critical value of 2.262.)

Conclusion: There is a significant difference between the degrees of seriousness of problems encountered by the students before and after their exposures to individual counseling sessions.
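The computation in Step 4 can be checked with a short script. This is only a sketch: the function names are illustrative and not part of the presentation, and only the two sums reported in the worked example (∑D = 6.96 and ∑D² = 7.9476, with n = 10) are used as inputs.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """Dependent-samples t from the sum of differences and sum of squared differences.

    Implements t = sum(D) / sqrt((n * sum(D^2) - sum(D)^2) / (n - 1)).
    """
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))


def paired_t(before, after):
    """Same statistic computed from the raw before/after scores."""
    diffs = [a - b for b, a in zip(before, after)]
    return paired_t_from_sums(sum(diffs), sum(d * d for d in diffs), len(diffs))


# Sums taken from the worked example above (n = 10 students).
t_value = paired_t_from_sums(6.96, 7.9476, 10)
print(f"t = {t_value:.3f}, df = {10 - 1}")
```

Carried to three decimals the statistic is about 3.748, which the slide reports as 3.74; the decision to reject the null hypothesis is the same either way.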

Testing Two Variances
A technique commonly used under the F-test is referred to as the Analysis of Variance (ANOVA), since it deals with the ratio between the variability occurring among the different groups or treatments against the variability occurring within the members of each of the groups or treatments.

Moreover, it is assumed that the different groups must have equal variances. The following is an example of the data layout for this test.

| Replicates | Group 1 | Group 2 | Group 3 | … | Group n |
|---|---|---|---|---|---|
| 1 | X11 | X12 | X13 | … | X1n |
| 2 | X21 | X22 | X23 | … | X2n |
| 3 | X31 | X32 | X33 | … | X3n |
| … | … | … | … | … | … |
| m | Xm1 | Xm2 | Xm3 | … | Xmn |
| Totals | T1 | T2 | T3 | … | Tn |
| Means | X̄1 | X̄2 | X̄3 | … | X̄n |
| Sum of Squares | SS1 | SS2 | SS3 | … | SSn |

where:
m = the number of replicates
n = the number of groups or treatments
Xjk = the observation belonging to the jth replicate of the kth treatment or group, 1 ≤ j ≤ m and 1 ≤ k ≤ n
Tk = the total of the kth treatment over the m replicates, i.e.

$$T_k = \sum_{j=1}^{m} X_{jk}$$

X̄k = the mean of the kth treatment over the m replicates, i.e.

$$\bar{X}_k = \frac{1}{m}\sum_{j=1}^{m} X_{jk}$$

SSk = the sum of the squares of the observations of the kth treatment over the m replicates, i.e.

$$SS_k = \sum_{j=1}^{m} X_{jk}^2$$

To perform the test of hypothesis comparing more than two sample means, the following steps and needed information must be noted.
Ho: All the means of the different groups or treatments are the same.
Ha: At least one mean is different from the other means.

The rejection criterion for this test is stated as follows: “Reject Ho if Fc > Fα(df1,df2). Do not reject Ho otherwise.” where: Fα(df1,df2) is the critical value, α is the level of significance, df1 = n-1, df2 = p-n, p = mn; m = number of replicates and n = number of groups or treatments. Fc is the computed test statistic.

In computing for the test statistic, we construct the ANOVA table as follows:

Analysis of Variance

| Sources of Variation | df | Sum of Squares | Mean Squares | F Ratio |
|---|---|---|---|---|
| Between Groups | df1 = n-1 | SSB | MSB | Fc |
| Within Groups | df2 = p-n | SSW | MSW | |
| Total | p-1 | SST | | |

where:
SSB - sum of squares between groups
SSW - sum of squares within groups
MSB - mean squares between groups
MSW - mean squares within groups
SST - total sum of squares
Fc - test statistic
n - number of groups or treatments
m - number of replicates
p - mn

a) the correction factor (CF), given by

$$CF = \frac{\left(\sum_{k=1}^{n} T_k\right)^2}{p}$$

b) the total sum of squares (SST)

$$SST = \sum_{k=1}^{n} SS_k - CF$$

c) the sum of squares between groups (SSB)

$$SSB = \frac{\sum_{k=1}^{n} T_k^2}{m} - CF$$

d) the sum of squares within groups (SSW)

$$SSW = SST - SSB$$

e) the mean squares between groups (MSB)

$$MSB = \frac{SSB}{df_1} = \frac{SSB}{n-1}$$

f) the mean squares within groups (MSW)

$$MSW = \frac{SSW}{df_2} = \frac{SSW}{p-n}$$

g) the Fc, referred to as the test statistic, is defined by

$$F_c = \frac{MSB}{MSW}$$

Example: Five bus companies were selected in order to determine if there is a difference in the number of hours travelling a 200 kilometer path from place A to place B at 5% level of significance. The data sets were given as follows:

2 4.60 81.8 4.5 21.40 88.80 y 4.7 2.8 4.62 3.6 4.80 88.9 2.9 2.5 3.1 3.0 22.2 22.2 4.81 3.73 Totals Sum of squares Means .5 3.3 4.6 2.6 5.66 3.70 81.0 2.6 3.18 3.0 4.9 21.60 time of buses 1 2 3 4 5 6 2.6 3.1 3.4 3.70 83.83 3.1 3.3 21.88 3.2 2.0 3.62 4.8 4.u Travel Bus Companies v w x 3.8 5.

Solution:
Statement of Hypothesis:
Ho: The five bus companies have the same traveling time in hours from place A to place B.
Ha: At least one of the five bus companies has a different traveling time in hours.
Critical Region and Criteria for rejection
Level of significance: α = 5%
Test: F-test
df: df1 = n-1 = 5-1 = 4
df2 = p-n = 30-5 = 25, where p = mn

Criteria for rejection: Reject Ho if Fc > F0.05(4,25). Do not reject Ho otherwise.
Computation:
Test Statistic: Fc

$$CF = \frac{\left(\sum T_k\right)^2}{p} = \frac{(21.60 + 21.70 + 21.70 + 22.40 + 22.80)^2}{30} = \frac{(110.2)^2}{30} = 404.80$$

$$SST = \sum SS_k - CF = 424.36 - 404.80 = 19.56$$

$$SSB = \frac{\sum T_k^2}{m} - CF = \frac{21.60^2 + 21.70^2 + 21.70^2 + 22.40^2 + 22.80^2}{6} - 404.80 = 404.99 - 404.80 = 0.19$$

$$MSB = \frac{0.19}{5-1} = 0.0475$$

$$SSW = SST - SSB = 19.56 - 0.19 = 19.37$$

$$MSW = \frac{19.37}{30-5} = 0.7748$$

$$F = \frac{0.0475}{0.7748} = 0.0613$$

Analysis of Variance

| Sources of variation | df | Sum of squares | Mean squares | F Ratio |
|---|---|---|---|---|
| Between Bus Company | 4 | 0.19 | 0.0475 | 0.0613 |
| Within Bus Company | 25 | 19.37 | 0.7748 | |
| Total | 29 | 19.56 | | |

F0.05(4,25) = 2.76
Conclusion: Since the test statistic does not exceed the critical value, Ho is not rejected. Thus, the traveling times in hours of the five bus companies are the same from place A to place B at 5% level of significance.
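Steps a) to g) can be run directly from group totals and the raw sum of squares. The sketch below is illustrative only: the function name is hypothetical, and the five totals (21.6, 21.7, 21.7, 22.4, 22.8) and the overall raw sum of squares (424.36) are assumptions read back from the worked computation rather than from the original data table.

```python
def anova_from_summary(totals, raw_sum_of_squares, m):
    """One-way ANOVA from per-group totals and the overall raw sum of squares.

    totals: total of each group (n groups, m replicates each)
    raw_sum_of_squares: sum of all squared observations (the sum of the SSk)
    """
    n = len(totals)
    p = m * n
    cf = sum(totals) ** 2 / p                      # a) correction factor
    sst = raw_sum_of_squares - cf                  # b) total sum of squares
    ssb = sum(t * t for t in totals) / m - cf      # c) between groups
    ssw = sst - ssb                                # d) within groups
    msb = ssb / (n - 1)                            # e) mean square between
    msw = ssw / (p - n)                            # f) mean square within
    return {"CF": cf, "SST": sst, "SSB": ssb, "SSW": ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}  # g) test statistic


# Assumed summary values for the bus example (6 trips per company).
bus = anova_from_summary([21.6, 21.7, 21.7, 22.4, 22.8], 424.36, m=6)
for name, value in bus.items():
    print(f"{name} = {value:.4f}")
```

Without intermediate rounding the F ratio comes out near 0.061, in line with the 0.0613 in the ANOVA table and far below the critical value of 2.76.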

Example: Three teachers taught statistics to three sections. The final grades of six students per section and some descriptive statistics are shown below. Can we say that there is a difference between the scores of each section? Test at 5% level of significance.

| Sections | A | B | C |
|---|---|---|---|
| Student's final grade | 87 | 75 | 90 |
| | 81 | 84 | 82 |
| | 86 | 83 | 89 |
| | 91 | 86 | 86 |
| | 87 | 79 | 80 |
| | 90 | 83 | 82 |
| Total | 522 | 490 | 509 |
| SS | 45,476 | 40,096 | 43,265 |
| Mean | 87.00 | 81.67 | 84.83 |

Solution:
Statement of Hypothesis:
Ho: The three sections have the same mean final grades in Statistics
Ha: At least one of the three sections has different mean final grades in Statistics
Critical Region and Criteria for rejection
Level of significance: α = 5%
Test: F-test
df: df1 = n-1 = 3-1 = 2
df2 = p-n = 18-3 = 15

Criteria for rejection: Reject Ho if Fc > F0.05(2,15). Do not reject Ho otherwise.
Computation:
Test Statistic: Fc

$$CF = \frac{\left(\sum T_k\right)^2}{p} = \frac{(522 + 490 + 509)^2}{18} = \frac{(1521)^2}{18} = 128{,}524.50$$

$$SST = \sum SS_k - CF = (45{,}476 + 40{,}096 + 43{,}265) - 128{,}524.5 = 128{,}837 - 128{,}524.5 = 312.5$$

$$SSB = \frac{\sum T_k^2}{m} - CF = \frac{522^2}{6} + \frac{490^2}{6} + \frac{509^2}{6} - 128{,}524.5 = (45{,}414 + 40{,}016.67 + 43{,}180.17) - 128{,}524.5 = 128{,}610.84 - 128{,}524.5 = 86.34$$

$$MSB = \frac{86.34}{3-1} = 43.1700$$

$$SSW = SST - SSB = 312.5 - 86.34 = 226.16$$

$$MSW = \frac{226.16}{18-3} = 15.0773$$

$$F = \frac{43.1700}{15.0773} = 2.8632$$

Analysis of Variance

| Sources of Variation | df | Sum of Squares | Mean Squares | F Ratio |
|---|---|---|---|---|
| Between Sections | 2 | 86.34 | 43.1700 | 2.8632 |
| Within Sections | 15 | 226.16 | 15.0773 | |
| Total | 17 | 312.50 | | |

F0.05(2,15) = 3.68
Conclusion: Since the test statistic does not exceed the critical value, we do not reject Ho. Thus, the final grades of the students in the three sections are the same at 5% level of significance.
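The same ANOVA can be computed from the raw grades instead of the summary rows. The following is a minimal sketch, assuming equal group sizes as in the data layout described earlier; the function name `one_way_anova` is illustrative, and the grades are the eighteen values listed for Sections A, B and C.

```python
def one_way_anova(groups):
    """One-way ANOVA for equally sized groups, using the correction-factor method."""
    n = len(groups)              # number of groups or treatments
    m = len(groups[0])           # replicates per group (assumed equal)
    p = m * n
    totals = [sum(g) for g in groups]
    raw_ss = [sum(x * x for x in g) for g in groups]
    cf = sum(totals) ** 2 / p
    sst = sum(raw_ss) - cf
    ssb = sum(t * t for t in totals) / m - cf
    ssw = sst - ssb
    msb = ssb / (n - 1)
    msw = ssw / (p - n)
    return {"totals": totals, "raw_ss": raw_ss, "SST": sst, "SSB": ssb,
            "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}


sections = [
    [87, 81, 86, 91, 87, 90],   # Section A
    [75, 84, 83, 86, 79, 83],   # Section B
    [90, 82, 89, 86, 80, 82],   # Section C
]
result = one_way_anova(sections)
print(result["totals"], result["raw_ss"])
print(f"F = {result['F']:.4f}")
```

The totals and sums of squares reproduce the Total and SS rows of the data table, and the F ratio agrees with the 2.8632 in the ANOVA table to about three decimals.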

Testing Two Means, Independent Case: The Difference of the Means
The independent t-test is used when the two sample means are taken from separate groups of respondents or populations. For example, the researcher is interested to compare the research skills of assistant professors and associate professors in a certain university. The assistant professors comprise one group and the associate professors another group.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1}{n_1} + \dfrac{S_2}{n_2}}}$$

where:
X̄1 = Mean of the first group
X̄2 = Mean of the second group
S1 = Variance of the first group
S2 = Variance of the second group
n1 = Number of cases in the first group
n2 = Number of cases in the second group

Example: In a study conducted to determine the research skills of assistant and associate professors in state universities and colleges.75 4.50 4.00 Associate Prof.50 3.(X1) 3.(X2) 3.60 3.50 3.50 2.50 .50 4.0 4. the following data represent the mean scores of the professors: Assistant Prof.00 4.2 2.75 3.00 2.

Solution:
Step 1: State the hypotheses
Ho: There is no significant difference between the research skills of assistant and associate professors.
H1: There is a significant difference between the research skills of assistant and associate professors.

Step 2: Set the level of significance
α = 0.05, two-tailed test
Step 3: Reject Ho if the absolute value of the computed value is greater than 1.782. (1.782 is the one-tailed critical value at α = 0.05 with 12 degrees of freedom; the two-tailed critical value is 2.179.)
Step 4: Compute the value of the test statistics from the given data.

75 S2 =0.20 2.26 X2 = 3.Asst.00 3.00 4.64 ___________ ∑X2 = 26.50 4.23 .05 X1 = 3.50 2.75 4. Professor (X2) 3.50 2.29 S1 = 0.00 __________ ∑X1 = 23.50 3.75 3.00 4. Professor (X1) 3.60 Assoc.50 3.50 4.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1}{n_1} + \dfrac{S_2}{n_2}}} = \frac{3.29 - 3.75}{\sqrt{\dfrac{0.64}{7} + \dfrac{0.23}{7}}} = \frac{-0.46}{\sqrt{0.12}} = -1.3143$$

The degree of freedom:
df = n1 + n2 – 2
df = 7 + 7 – 2
df = 14 – 2
df = 12
Step 5: Inasmuch as the absolute value of the computed t-value, 1.3143, is lower than the critical value, there is no sufficient evidence to reject the null hypothesis.
Step 6: State the conclusion. Since the computed t-value is lower than the critical t-value, it means that there is no significant difference between the research skills of assistant and associate professors. Thus, they manifest comparable research skills.
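The independent-samples statistic can be recomputed from the summary values used in Step 4. This is a sketch under stated assumptions: the function name is illustrative, and S1 and S2 are taken to be variances, as the formula's definitions state.

```python
import math


def independent_t(mean1, var1, n1, mean2, var2, n2):
    """t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2), with df = n1 + n2 - 2."""
    t = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    return t, n1 + n2 - 2


# Assistant professors (group 1) versus associate professors (group 2).
t_value, df = independent_t(3.29, 0.64, 7, 3.75, 0.23, 7)
print(f"t = {t_value:.4f}, df = {df}")
```

Without rounding the square root, the statistic is about -1.30 rather than -1.3143; the small gap comes from intermediate rounding and does not change the decision.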

Example: A researcher wishes to compare the post-test performance of two groups of students classified according to their learning styles – assimilators and convergers. Ten students were assigned per group. After their exposures to simplified instructional materials on Basic Statistics, the assimilators' mean post-test performance was 20, with a standard deviation of 8.92, while the convergers' mean post-test performance was 26.26, with a standard deviation of 8.67. Test the significant difference between the two group means.

Solution:
Step 1: Ho: There is no significant difference in the mean post-test performance of the two groups of students classified according to their learning styles.
H1: There exists a significant difference in the mean post-test performance of the two groups of students classified according to their learning styles.

Step 2: α = 0.05 (two-tailed test)
Step 3: Reject the null hypothesis (Ho) if the computed t-value is greater than the critical t-value of 1.734 at α = 0.05 with 18 degrees of freedom. (1.734 is the one-tailed critical value; the two-tailed critical value at α = 0.05 with 18 degrees of freedom is 2.101.)
Step 4: t-test for independent samples

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1}{n_1} + \dfrac{S_2}{n_2}}}$$

| Convergers | Assimilators |
|---|---|
| X̄1 = 26.26 | X̄2 = 20 |
| S1 = 8.67 | S2 = 8.92 |
| n1 = 10 | n2 = 10 |

$$t = \frac{26.26 - 20}{\sqrt{\dfrac{8.67}{10} + \dfrac{8.92}{10}}} = \frac{6.26}{\sqrt{1.759}} = \frac{6.26}{1.33} = 4.71$$

The degree of freedom:
df = n1 + n2 – 2
df = 10 + 10 – 2
df = 20 – 2
df = 18
Step 5: Reject the null hypothesis since the computed t-value of 4.71 is greater than 1.734.
Conclusion: There exists a significant difference between the mean post-test performance of the two groups of students classified according to their learning styles after their exposures to simplified instructional materials in Basic Statistics.
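One point worth checking in this example is how the dispersion values enter the formula. The worked computation places 8.67 and 8.92 directly under the square root, that is, it treats them as variances, while the problem statement calls them standard deviations. The sketch below (the function name and the `as_variance` flag are illustrative assumptions) computes the statistic under both readings so the difference is visible.

```python
import math


def t_from_summary(mean1, s1, n1, mean2, s2, n2, as_variance=True):
    """Independent-samples t from summary statistics.

    If as_variance is False, s1 and s2 are treated as standard deviations
    and squared before use.
    """
    var1 = s1 if as_variance else s1 ** 2
    var2 = s2 if as_variance else s2 ** 2
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


# Convergers (group 1) versus assimilators (group 2), ten students each.
t_var = t_from_summary(26.26, 8.67, 10, 20, 8.92, 10, as_variance=True)
t_sd = t_from_summary(26.26, 8.67, 10, 20, 8.92, 10, as_variance=False)
print(f"treating 8.67 and 8.92 as variances:           t = {t_var:.2f}")
print(f"treating 8.67 and 8.92 as standard deviations: t = {t_sd:.2f}")
```

The first reading reproduces the slide's value of about 4.7. The second gives a t of about 1.6, which would not exceed the critical value, so the conclusion depends on which quantity was actually reported.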

Testing Two Proportions
Remember that the normal distribution can be used to approximate the binomial distribution in certain cases. Specifically, the approximation was considered good when np and nq were both at least 5. Now, we're talking about two proportions, so np and nq must be at least 5 for both samples.

We don't have a way to specifically test two proportions for values; what we have is the ability to test the difference between the proportions. So, much like the test for two means from independent populations, we will be looking at the difference of the proportions. We will also be computing an average proportion and calling it pbar. It is the total number of successes divided by the total number of trials. With x1 and x2 successes in n1 and n2 trials, the definitions which are necessary are:

$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

The test statistic used here is similar to that for a single population proportion, except the difference of proportions is used instead of a single proportion, and the value of pbar is used instead of p in the standard error portion. Since we're using the normal approximation to the binomial, the difference of proportions has a normal distribution. The test statistic is given by:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$

Some people will be tempted to try to simplify the denominator of this test statistic incorrectly. It can be simplified, but the correct simplification is not to simply place the product of p-bar and q-bar over the sum of the n's. Remember that to add fractions, you must have a common denominator; that is why this simplification is incorrect.

The correct simplification would be to factor a p-bar and q-bar out of the two expressions. This is usually the formula given, because it is easier to calculate.
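Written out, and assuming the standard error is the square root of p-bar times q-bar over each sample size (with q-bar = 1 − p-bar), the correct factoring and the tempting but incorrect shortcut look like this:

$$\sqrt{\frac{\bar{p}\,\bar{q}}{n_1} + \frac{\bar{p}\,\bar{q}}{n_2}} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \neq \sqrt{\frac{\bar{p}\,\bar{q}}{n_1 + n_2}}$$

For instance, with n1 = n2 = 100 the factor in parentheses is 1/100 + 1/100 = 0.02, whereas the incorrect version would use 1/200 = 0.005.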

Comparing Two Proportions
A commonly posed question is "Are two proportions different?" For example, is the end-of-course passing rate this year significantly different from the end-of-course passing rate last year? This type of question involves comparing two proportions, both of which are estimates, and thus the necessary formula will be different.

In this case we will develop the theory that allows us to apply the hypothesis testing procedure to this problem. The procedure involves the use of a test statistic that involves two proportions. But this statistic will still be a z-statistic and we will compare it to the normal scores, i.e. z-scores.

Theoretical Background
Given two independent variables that are both normal, their sum or difference will be normally distributed. This idea can readily be proven. The important point is that the difference of two sample proportions is an example of the difference of two normally distributed variables. Thus we can standardize them and obtain a z-score.

For example, if you have two estimated proportions, p̂1 and p̂2, and you happen to know that one has a mean of 0.6 and the other a mean of 0.4, then you know their difference will have a mean of p1 − p2 = 0.6 − 0.4 = 0.2. What we now need is the standard deviation of this difference.

Assuming that each estimate is based on 300 observations you would also be able to determine the variance of the difference. Since the variance of each estimated proportion is 0.6×0.4 ⁄ 300 = 0.0008, the variance of the difference would be the sum of the two variances or 0.0016. Which means the standard deviation of the difference is √0.0016 = 0.04. All this may seem complicated but the bottom line is that we know the distribution of (p̂1 − p̂2). Thus we can calculate confidence intervals and do hypothesis tests on this variable.

Distribution of p̂1 − p̂2
Let p̂1 be a proportion based on a sample of size n1 and p̂2 be based on a sample of size n2. Assume that both sample sizes are large enough (>100); then we can assume that the estimates of p are normally distributed. If we assume that the underlying proportions from the two samples are the same, i.e. p1 = p2, then (p̂1 − p̂2) will have:
- a normal distribution (i.e. the variable can be standardized to a z-score)
- a mean = 0
- an approximate variance = p̂(1 − p̂)(1 ⁄ n1 + 1 ⁄ n2), where p̂ is the estimate of the common proportion from the two samples combined.

In this case 95% of the time the corresponding z-score should not deviate more than 1.96 units from 0. This fact will be useful in deriving a test statistic for proportions. This z-score, also called the test statistic for comparing two proportions, is:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

If we do not assume that the two proportions are the same then (p̂1 − p̂2) can estimate their difference. In this case this expression will have:
- a normal distribution (i.e. we can use z-scores to find the error margin)
- a mean, μ = p1 − p2
- an approximate variance of σ² = p̂1(1 − p̂1) ⁄ n1 + p̂2(1 − p̂2) ⁄ n2.

This means if we want to estimate the difference (or gap) between two proportions, p1 and p2, then the 95% confidence interval will be:

(p̂1 − p̂2) − 1.96·σ ≤ p1 − p2 ≤ (p̂1 − p̂2) + 1.96·σ
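The interval can be computed in a few lines. The sketch below reuses the illustrative figures from the theoretical background (estimated proportions of 0.6 and 0.4, each based on 300 observations); the function name is an assumption made for this example.

```python
import math


def diff_proportion_ci(p1_hat, n1, p2_hat, n2, z=1.96):
    """Confidence interval for p1 - p2 using the unpooled variance.

    sigma^2 = p1_hat(1 - p1_hat)/n1 + p2_hat(1 - p2_hat)/n2
    """
    sigma = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * sigma, diff + z * sigma, sigma


low, high, sigma = diff_proportion_ci(0.6, 300, 0.4, 300)
print(f"sigma = {sigma:.4f}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

With these inputs the standard deviation is 0.04, matching the value derived above, and the interval runs from roughly 0.12 to 0.28.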

Example: The five-step hypothesis test for two proportions.
Example: Test the assertion that the EOG Mathematics test passing rate of female fifth grade students is no different than that of the male students. (Data for Asheville schools, 2008, NC DPI)

STEP I: Set up two opposing hypotheses. Let p1 be the proportion of female students passing and p2 be the proportion of male students passing.
H0 : p1 = p2 VS Ha : p1 ≠ p2

STEP II: Get data. According to NC Department of Public Instruction 89 of 126 (or 70.6%) female students passed the test and 119 of 142 (or 83.8%) male students passed the test. Even though all students in each grade are tested, you still only have a sample since your population is all students that could potentially be taught in the Asheville school system.

| | # Passed | Sample Size | Estimated Proportion |
|---|---|---|---|
| Female Students | 89 | n1 = 126 | p̂1 = 0.706 |
| Male Students | 119 | n2 = 142 | p̂2 = 0.838 |
| All Students | 208 | n = 268 | p̂ = 0.776 |

STEP III: Decide on a statistical test. We will use the two proportion z test given in this lesson. Every test has a formula that standardizes the estimator; in this case that estimator is p̂1 − p̂2.

STEP IV: Calculate your test statistic. The statistic is:
Z-stat = (0.706 − 0.838) ⁄ √(0.776×0.224(1 ⁄ 126 + 1 ⁄ 142))
Z-stat ≈ −0.132 ⁄ √0.0026 ≈ −2.587

STEP V: Arrive at a conclusion and state it in clear English. Remember that the two critical values for the different levels of certainty are: z0.025 = 1.96 or z0.005 = 2.576. Since |z-stat| > z* (the critical value) we reject the Null Hypothesis.
Conclusion: With 99% certainty we can state that there is a statistically significant larger proportion of boys passing the fifth grade EOG in mathematics in the Asheville school district.

Thank you
