Unit5 - Inference For Numerical Data
By
Aparna Kumari
Aparna.kumari@nirmauni.ac.in
Statistical inference
• Statistical inference is the process of using data analysis to
infer properties of an underlying distribution of probability.[1]
• Inferential statistical analysis infers properties of
a population, for example by testing hypotheses and deriving
estimates.
• It is assumed that the observed data set is sampled from a larger
population.
Statistical Analysis
• Descriptive Statistics
• Inferential statistics
Sample Data in Statistics
• Independent Samples
• Paired/Dependent Samples
Paired Samples
• Paired data is data where natural matching or coupling is possible.
Using the significance level of 0.05, we reject the null hypothesis if z is greater
than 1.96 or less than -1.96.
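The cutoff of 1.96 can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name reject_null_z and the sample z values are illustrative assumptions, not part of the original slides.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level used in the text

# Two-sided test: alpha/2 goes in each tail of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def reject_null_z(z, z_critical=z_crit):
    """Return True when z falls in either tail of the rejection region.

    Hypothetical helper for illustration only.
    """
    return abs(z) > z_critical

print(round(z_crit, 2))     # 1.96
print(reject_null_z(2.3))   # True: beyond the upper cutoff
print(reject_null_z(-1.2))  # False: inside the acceptance region
```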
If the standard deviation is unknown:
Using the significance level of 0.05, we reject the null hypothesis if |t| is greater
than the critical value from a t-distribution with df = n-1.
Note: The shaded area is referred to as
the critical region or rejection region.
For a two-sided 95% confidence interval, use the table of the t-distribution (found at the end
of the section) to select the appropriate critical value of t for the two-sided α=0.05.
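As a rough sketch of how the interval is assembled once the critical value has been read from the table: the summary statistics below (mean 130, standard deviation 15, n = 25) are illustrative assumptions, and 2.064 is the standard table value for a two-sided α = 0.05 with 24 degrees of freedom. The function name is hypothetical.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean: mean +/- t_crit * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin

# Assumed table value: two-sided alpha = 0.05, df = n - 1 = 24.
T_CRIT_DF24 = 2.064

# Illustrative summary statistics (assumptions for this sketch).
low, high = t_confidence_interval(mean=130.0, sd=15.0, n=25, t_crit=T_CRIT_DF24)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The critical value is passed in as a parameter because the standard library has no t-distribution; in practice it comes from the t-table or a statistics package.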
Example
A company wants to improve sales. Past sales data indicate that
the average sale was $100 per transaction. After the sales force
was trained, recent sales data (taken from a sample of 25
salesmen) indicate an average sale of $130, with a standard
deviation of $15. Did the training work? Test your hypothesis at a
5% alpha level.
Hypothesis Testing using t-Test
• Step 1: Write your null hypothesis statement. The accepted hypothesis is that there is no
difference in sales, so:
H0: μ = $100.
• Step 2: Write your alternate hypothesis. This is the one you’re testing in the one sample t
test. You think that there is a difference (that the mean sales increased), so:
H1: μ > $100.
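• Step 3: Identify the following pieces of information you’ll need to calculate the test
statistic. The question should give you these items:
1. The sample mean (x̄): $130.
2. The population mean (μ): $100.
3. The sample standard deviation (s): $15.
4. The number of observations (n): 25.
• Step 4: Calculate the test statistic:
t = (x̄ − μ) / (s / √n) = (130 − 100) / (15 / √25) = 30 / 3 = 10.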
• Step 5: Find the t-table value. You need two values to find this:
1. The alpha level: given as 5% in the question.
2. The degrees of freedom, which is the number of items in the sample
(n) minus 1: 25 – 1 = 24.
• Look up 24 degrees of freedom in the left column and 0.05 in the top
row. The intersection is 1.711. This is your one-tailed critical t-value.
• In a one-tailed t test, this critical value means that, if the null
hypothesis is true, we would expect 95% of t-values to fall below 1.711.
If our calculated t-value (from Step 4) falls below 1.711, we fail to
reject the null hypothesis.
• Step 6: Compare Step 4 to Step 5. The value from Step 4
(t = 10) is greater than the critical value from Step 5 (1.711),
so we can reject the null hypothesis. The value of 10 falls into
the rejection region (the right tail).
• In other words, it’s highly likely that the mean sale is greater.
The one sample t test has told us that sales training was
probably a success.
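The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch, assuming the usual one-sample formula t = (x̄ − μ) / (s / √n); the function name is hypothetical, and the critical value 1.711 is the table value quoted in the steps above rather than something the code computes.

```python
import math

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """One-sample t statistic: (sample mean - hypothesised mean) / standard error."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Figures from the sales example.
t_stat = one_sample_t(sample_mean=130, pop_mean=100, sample_sd=15, n=25)

# One-tailed critical value for alpha = 0.05 and df = 24, taken from the t-table.
T_CRIT = 1.711

# H1 is mu > 100, so the rejection region is the right tail.
reject_null = t_stat > T_CRIT

print(t_stat)       # 10.0
print(reject_null)  # True
```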
Practice Problem
ANOVA model: xij = μi + εij
where x are the individual data points (i denotes the group and j
the individual observation), ε is the unexplained variation and the
parameters of the model (μ) are the population means of each
group. Thus, each data point (xij) is its group mean plus error.
If the average difference between groups is similar to that within groups, the F ratio is about 1. As
the average difference between groups becomes greater than that within groups, the F ratio
becomes larger than 1.
To obtain a P-value, the F ratio is tested against the F-distribution with the degrees
of freedom associated with the numerator and denominator of the ratio. The P-value is the probability
of getting that F ratio or a greater one if the null hypothesis is true. Larger F-ratios give smaller P-values.
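To make the F ratio concrete, here is a small sketch that computes it from raw data as the between-groups mean square divided by the within-groups mean square. The three groups are made-up numbers chosen for easy arithmetic, and the function name is hypothetical. The P-value step is left out because the standard library has no F-distribution.

```python
from statistics import mean

def one_way_f_ratio(groups):
    """F ratio for one-way ANOVA: between-groups mean square / within-groups mean square."""
    n = sum(len(g) for g in groups)  # total number of observations
    k = len(groups)                  # number of groups
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_f_ratio(groups)
print(round(f_ratio, 2))  # 13.0
```

Here the group means (2, 3 and 6) differ far more than the observations within each group do, so the ratio is well above 1.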
ANOVA – Degrees of Freedom
• Within groups comparisons: df = n – k
• Between groups comparisons: df = k – 1
• Total df: df = n – 1
• n = total number of individuals
• k = number of groups
Example: n = 75, k = 3
• Within groups comparisons: df = 75 – 3 = 72
• Between groups comparisons: df = 3 – 1 = 2
• Total df: df = 75 – 1 = 74
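The three formulas can be wrapped in a small helper to check the example. This is only a sketch; the function name and the dictionary keys are illustrative choices.

```python
def anova_degrees_of_freedom(n, k):
    """Degrees of freedom for one-way ANOVA with n individuals in k groups."""
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more individuals than groups")
    return {
        "between": k - 1,  # between groups comparisons
        "within": n - k,   # within groups comparisons
        "total": n - 1,    # total
    }

df = anova_degrees_of_freedom(n=75, k=3)
print(df)  # {'between': 2, 'within': 72, 'total': 74}
```

The between and within values always add up to the total, which is a quick check when reading an ANOVA table.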
ANOVA – Output (Example)
Example
F-Table for 5% significance level
Practice Problem
• A researcher conducted a study to investigate the effect of
teaching method on the reading ability of schoolchildren.
Fifteen children were randomly allocated to each of three
teaching methods (45 children in total) and their reading
ability was measured on a validated score. The ANOVA table
below shows that the F-ratio is 52.022, which gives a p-value
of P < 0.001 (with degrees of freedom 2 and 42). Figure 1
presents the data, showing the between-group variation and
within-group variation.
Here, p < 0.001, so we reject the null hypothesis.
References
• https://brownmath.com/stat/anova1.htm
• OneWayANOVA_LectureNotes.pdf (westga.edu)
• https://learning.edanz.com/research-manuscript-figures/