
Inference for numerical data

By
Aparna Kumari
Aparna.kumari@nirmauni.ac.in
Statistical inference
• Statistical inference is the process of using data analysis to
infer properties of an underlying probability distribution.[1]
• Inferential statistical analysis infers properties of
a population.
• Example: testing hypotheses and deriving estimates. It is
assumed that the observed data set is sampled from a larger
population.
Statistical Analysis

• Descriptive Statistics

• Inferential statistics
Sample Data in Statistics
• Independent Samples

• Paired/Dependent Samples
Paired Samples
• Paired data is data where natural matching or coupling is possible.

Figure: Line plot for Paired Data


Example of Paired Data

• An individual who completed 10 assignments earned 95% on his or her test.
(10, 95%)
• An individual who completed 5 assignments earned 80% on his or her test.
(5, 80%)
• An individual who completed 9 assignments earned 85% on his or her test.
(9, 85%)
• An individual who completed 2 assignments earned 50% on his or her test.
(2, 50%)
• An individual who completed 5 assignments earned 60% on his or her test.
(5, 60%)
• An individual who completed 3 assignments earned 70% on his or her test.
(3, 70%)
Practice
• You are running a study on the average amount of time spent
looking at smartphones and work with a randomly selected
sample of 10 men and 10 women. Which kind of sample is this?
• A) Independent Sample
• B) Paired Sample

• Your sample consists of 10 couples, and you compare the
amount of time a husband and wife spend looking at their phones.
Which kind of sample is this?
• A) Independent Sample
• B) Paired Sample
Types of Paired Data
• Duplicate (double) measurements on the same samples,
meant to account for within-subject variability.
• Sequential measurements (pre-test/post-test), that is,
measuring some factor both before and after a time period
passes or before and after an intervention.
• Cross-over trials: individuals are randomly assigned to one of
two treatments and then afterward assigned to the second
treatment.
• Matched samples: where individuals are matched on similar or
identical personal characteristics, such as age and sex (this
method might be used to assign each test individual a control).
Analysing Paired Data
• The statistical techniques of correlation and regression are used
to analyze paired data. The correlation coefficient quantifies
how closely the data lie along a straight line, that is, it
measures the strength of the linear relationship.
• Regression, on the other hand, is used for several applications,
including determining which line best fits our set of data.
This line can then, in turn, be used to estimate or
predict y values for values of x that were not part of our original
data set.
• There is a special type of graph that is especially well suited for
paired data called a scatterplot. In this type of graph, one
coordinate axis represents one quantity of the paired data while
the other coordinate axis represents the other quantity of the
paired data.
• A scatterplot for the above data would have the x-axis denote
the number of assignments turned in while the y-axis would
denote the scores on the unit test.
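As a rough sketch, the correlation coefficient and the least-squares line for the six (assignments, score) pairs listed in the example can be computed directly from their definitions. The variable names and the `predict` helper are illustrative and not part of the slides.

```python
import math

# (assignments completed, test score in %) pairs from the paired-data example
pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]
n = len(pairs)

mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Sums of squares and cross-products around the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
syy = sum((y - mean_y) ** 2 for _, y in pairs)

# Pearson correlation: strength of the linear relationship
r = sxy / math.sqrt(sxx * syy)

# Least-squares regression line y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    """Estimate a test score for x completed assignments (hypothetical helper)."""
    return intercept + slope * x


print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted score for 7 assignments: {predict(7):.1f}")
```

The prediction for 7 assignments uses an x value that is not in the original data, which is the use of the regression line described above.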
T-distribution
• The t-distribution, also known as Student’s t-distribution, is a way of
describing data that follow a bell curve when plotted on a graph, with
the greatest number of observations close to the mean and fewer
observations in the tails.
• It resembles the normal distribution but has heavier tails, and it
is used for smaller sample sizes where the population variance is unknown.
T-distribution
• The t-distribution is used when data are approximately normally
distributed, which means the data follow a bell shape, but the
population variance is unknown.
• The variance in a t-distribution is estimated based on the degrees of
freedom of the data set (total number of observations minus 1).
• This means that it gives a lower probability to the center and a
higher probability to the tails than the standard normal distribution.
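To make the "lower center, heavier tails" point concrete, the sketch below evaluates the standard formulas for the t density and the standard normal density at the center and in the tail. The choice of 5 degrees of freedom and the function names are assumptions made for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


df = 5  # illustrative small-sample case (n = 6 observations)
for x in (0.0, 3.0):
    print(f"x = {x}: t density = {t_pdf(x, df):.4f}, normal density = {normal_pdf(x):.4f}")
```

At x = 0 the t density is lower than the normal density, and at x = 3 it is higher.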
One-Sample Test of Means

• A one sample test of means compares the mean of a sample to
a pre-specified value and tests for a deviation from that value.
• Example: we might know that the average birth weight for white
babies in the US is 3,410 grams and wish to compare the
average birth weight of a sample of black babies to this value.
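If the standard deviation is known:

z = (x̄ – μ0) / (σ / √n)

where x̄ is the sample mean, μ0 is the pre-specified value, σ is the known
standard deviation and n is the sample size.
Using the significance level of 0.05, we reject the null hypothesis if z is greater
than 1.96 or less than -1.96.
If the standard deviation is unknown:

t = (x̄ – μ0) / (s / √n)

where s is the sample standard deviation.
Using the significance level of 0.05, we reject the null hypothesis if |t| is greater
than the critical value from a t-distribution with df = n-1.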
Note: The shaded area is referred to as
the critical region or rejection region.

We can also calculate a 95% confidence interval around the mean. The
general form for a confidence interval around the mean, if σ is
unknown, is

x̄ ± t × (s / √n)

where t is the critical value from a t-distribution with df = n-1.

For a two-sided 95% confidence interval, use the table of the t-distribution (found at the end
of the section) to select the appropriate critical value of t for the two-sided α=0.05.
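A minimal sketch of the interval calculation follows. The inputs (mean 130, standard deviation 15, n = 25) are illustrative figures, and 2.064 is the two-sided α = 0.05 critical value for 24 degrees of freedom as read from a standard t-table; the function name is hypothetical.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper) bounds of the interval mean ± t * s / sqrt(n)."""
    margin = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Illustrative inputs; t_critical is the table value for df = 24, two-sided alpha = 0.05
low, high = mean_confidence_interval(sample_mean=130, sample_sd=15, n=25, t_critical=2.064)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```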
Example
A company wants to improve sales. Past sales data indicate that
the average sale was $100 per transaction. After training your
sales force, recent sales data (taken from a sample of 25
salesmen) indicates an average sale of $130, with a standard
deviation of $15. Did the training work? Test your hypothesis at a
5% alpha level.
Hypothesis Testing using t-Test
• Step 1: Write your null hypothesis statement. The accepted hypothesis is that there is no
difference in sales, so:
H0: μ = $100.

• Step 2: Write your alternate hypothesis. This is the one you’re testing in the one sample t
test. You think that there is a difference (that the mean sales increased), so:
H1: μ > $100.

• Step 3: Identify the following pieces of information you’ll need to calculate the test
statistic. The question should give you these items:

1. The sample mean (x̄). This is given in the question as $130.
2. The population mean (μ). Given as $100 (from past data).
3. The sample standard deviation (s) = $15.
4. Number of observations (n) = 25.
• Step 4: Insert the items from above into the t score formula.

t = (130 – 100) / (15 / √25)
t = (30 / 3) = 10
This is your calculated t-value.

• Step 5: Find the t-table value. You need two values to find this:
1.The alpha level: given as 5% in the question.
2.The degrees of freedom, which is the number of items in the sample
(n) minus 1: 25 – 1 = 24.

• Look up 24 degrees of freedom in the left column and 0.05 in the top
row. The intersection is 1.711. This is your one-tailed critical t-value.
• What this critical value means in a one-tailed t test is that, if the
null hypothesis were true, we would expect 95% of t-values to fall below
1.711. If our calculated t-value (from Step 4) falls below 1.711, we fail
to reject the null hypothesis.
• Step 6: Compare Step 4 to Step 5. The value from Step 4 (10) is
greater than the critical value from Step 5 (1.711), so we can reject the
null hypothesis. The value of 10 falls into the rejection region
(the right tail).
• In other words, the data give strong evidence that the mean sale is
greater than $100. The one sample t test has told us that sales training
was probably a success.
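The steps above can be sketched in a few lines. The critical value 1.711 is taken from the t-table lookup in the example (one-tailed, α = 0.05, df = 24) rather than computed; the function name is an illustrative choice.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for a one-sample test of the mean against mu0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


# Values from the sales example: x̄ = 130, μ = 100, s = 15, n = 25
t_value = one_sample_t(sample_mean=130, mu0=100, sample_sd=15, n=25)

# One-tailed critical value for alpha = 0.05 and df = 24, read from the t-table
t_critical = 1.711

# H1: μ > 100 is a right-tailed test, so reject H0 when t exceeds the critical value
reject_null = t_value > t_critical
print(f"t = {t_value:.3f}, critical = {t_critical}, reject H0: {reject_null}")
```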
Practice Problem

*Note : use Two Tailed Student's t-Distribution Table


Here, the t-value is less than the critical t-value, so we fail to reject the null hypothesis.
ANOVA (ANalysis Of VAriance)

• Mathematically, ANOVA can be written as:


x_ij = μ_i + ε_ij

where x are the individual data points (i and j denote the group and
the individual observation), ε is the unexplained variation and the
parameters of the model (μ) are the population means of each
group. Thus, each data point (x_ij) is its group mean plus error.

• ANOVA is a procedure used by statisticians to test for differences
in a scale-level dependent variable across the categories of a
nominal-level variable having two or more categories. It was developed by
Ronald Fisher in 1918 and extends the t-test and z-test, which
can only compare a nominal-level variable having just two categories.
Types of ANOVA
• There are three main types of ANOVA:
• One-way ANOVA - One-way ANOVA has only one independent
variable, which can have any number of levels. For example, to
assess differences in IQ by country, you can compare data from
two, three, or more countries.
• Two-way ANOVA - Two-way ANOVA uses two independent
variables. For example, to assess differences in IQ by country
(variable 1) and gender (variable 2). Here you can examine the
interaction between the two independent variables. Such interactions
may indicate that differences in IQ are not uniform across an
independent variable. For example, females may have higher IQ
scores than males overall, with a much larger difference in Europe
than in America.
• N-way (multi-factor) ANOVA - N-way ANOVA has multiple
independent variables. For example, to assess differences in IQ by
country, gender, age etc. simultaneously, N-way ANOVA is to be
deployed.
• The following approaches can be used to handle
unbalanced groups:
• Hierarchical approach (Type 1) - If the data were not intentionally
unbalanced and there is some type of hierarchy between the factors.
• Classical experimental approach (Type 2) - If the data were not
intentionally unbalanced and there is no hierarchy between the factors.
• Full regression approach (Type 3) - If the data were intentionally
unbalanced because of the population.
ANOVA Test Procedure

Following are the general steps to carry out ANOVA.


• Set up the null and alternative hypotheses. The null hypothesis states
that there is no significant difference among the groups; the
alternative hypothesis states that there is a significant difference
among the groups.
• Calculate the F-ratio and the probability of F.
• Calculate the degrees of freedom (df).
• Compare the p-value of the F-ratio with the established alpha or
significance level.
• If the p-value of F is less than the significance level (commonly 0.05),
reject the null hypothesis.
• If the null hypothesis is rejected, conclude that the group means are
not all equal.
Calculation of the F ratio
• Step 1: Variation between groups

• The between-group variation (or between-group sums of squares, SS)
is calculated by comparing the mean of each group with the overall
mean of the data.
• Step 2: Variation within groups

• The within-group variation (or the within-group sums of
squares) is the variation of each observation from its group
mean.
The F ratio
• The F ratio is then calculated as:

F = between-group mean square / within-group mean square

where each mean square is the corresponding sum of squares divided by its degrees of freedom.

If the average difference between groups is similar to that within groups, the F ratio is about 1. As
the average difference between groups becomes greater than that within groups, the F ratio
becomes larger than 1.

To obtain a P-value, the F ratio is tested against the F-distribution with the degrees
of freedom associated with the numerator and denominator of the ratio. The P-value is the probability
of getting that F ratio or a greater one. Larger F-ratios give smaller P-values.
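As an illustration of the two sums of squares and the resulting ratio, the sketch below computes F for three small groups. The data are made up for the example, and the function name is hypothetical; obtaining the P-value would additionally need an F-distribution table or a statistics library, which is left out here.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio from a list of groups (each a list of numbers)."""
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: each group mean compared with the overall mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: each observation compared with its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    # Mean squares: sums of squares divided by their degrees of freedom
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up data: three groups of three observations each
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_ratio(example_groups)
print(f"F = {f_value:.3f}")
```

For these made-up groups the between-group SS is 26 and the within-group SS is 6, giving F = (26 / 2) / (6 / 6) = 13.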
ANOVA – Degrees of Freedom
• Within groups comparisons: df = n - k
• Between groups comparisons: df = k - 1
• Total df: df = n - 1
• n = total number of individuals
• k = number of groups

Example: n=75, k=3
• Within groups comparisons: df = 75 – 3 = 72
• Between groups comparisons: df = 3 – 1 = 2
• Total df: df = 75 – 1 = 74
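The three degrees-of-freedom formulas can be checked with a short helper. The function name `anova_df` is an illustrative choice; the inputs are the n = 75, k = 3 example from this slide.

```python
def anova_df(n, k):
    """Return (between, within, total) degrees of freedom for one-way ANOVA."""
    df_between = k - 1
    df_within = n - k
    df_total = n - 1
    return df_between, df_within, df_total


df_between, df_within, df_total = anova_df(n=75, k=3)
print(df_between, df_within, df_total)  # 2 72 74
```

Note that the between and within degrees of freedom always add up to the total.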
ANOVA – Output (Example)
Example
F-Table for 5% significance level
Practice Problem
• A researcher conducted a study to investigate the effect of
teaching method on the reading ability of schoolchildren.
Fifteen children were randomly allocated to each of three
teaching methods and their reading ability was measured on a
validated score. The ANOVA table below shows that the F-ratio
is 52.022, which gives a p-value of P<0.001 (with
degrees of freedom 2 and 42). Figure 1 presents the data,
showing the between-group variation and within-group
variation.
Here, p < 0.001, so we reject the null hypothesis.
References
• https://brownmath.com/stat/anova1.htm
• OneWayANOVA_LectureNotes.pdf (westga.edu)
• https://learning.edanz.com/research-manuscript-figures/
