
Sample designing and

T Test
MARKETING RESEARCH
WHAT IS A T-TEST?
• A t-test is a type of inferential statistic used to determine whether there is a
significant difference between the means of two groups, which may
be related in certain features. It is mostly used when the data sets,
such as the outcomes recorded from flipping a coin 100
times, are expected to follow a normal distribution and may have unknown
variances. A t-test is used as a hypothesis testing tool, which allows
testing of an assumption applicable to a population.
• A t-test looks at the t-statistic, the t-distribution values, and the
degrees of freedom to determine the statistical significance. 
Calculating T-Tests
• T-Distribution Tables: The t-distribution table is available in one-tailed and two-tailed
formats. The former is used for assessing cases which have a fixed value or range
with a clear direction (positive or negative). For instance, what is the probability
of the output value remaining below -3, or of getting more than seven when rolling a
pair of dice? The latter is used for range-bound analysis, such as asking if the
coordinates fall between -2 and +2.
• T-Values: The t-value is the ratio of the difference between the means of the two
sample sets to the variation that exists within the sample sets.
• Degrees of Freedom: Degrees of freedom refers to the values in a study that have
the freedom to vary, and they are essential for assessing the importance and the
validity of the null hypothesis. Computation of these values usually depends upon
the number of data records available in the sample set.
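The t-value and degrees of freedom described above can be sketched in Python. The text does not say which two-sample formula is intended, so this sketch assumes the equal-variance (pooled) form, with n1 + n2 - 2 degrees of freedom. The function name `pooled_t_value` and the sample values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def pooled_t_value(sample_a, sample_b):
    """Return (t-value, degrees of freedom) for two independent samples.

    Assumption: both groups share a common variance (pooled form).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Numerator: difference between the two sample means.
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Degrees of freedom depend only on the number of records.
    df = n_a + n_b - 2
    # Denominator: variation within the samples, pooled across both groups.
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, df


# Hypothetical data for two groups.
t_value, df = pooled_t_value([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_value, df)
```

The resulting t-value would then be compared against a t-distribution table entry for the computed degrees of freedom.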
Hypotheses

There are two kinds of hypotheses for a one sample t-test, the null hypothesis and the alternative hypothesis. The alternative
hypothesis assumes that some difference exists between the true mean (μ) and the comparison value (m0),
whereas the null hypothesis assumes that no difference exists. The purpose of the one sample t-test is to determine if the null
hypothesis should be rejected, given the sample data. The alternative hypothesis can assume one of three
forms depending on the question being asked. If the goal is to measure any difference,
regardless of direction, a two-tailed hypothesis is used. If the direction of the difference between the sample mean and the
comparison value matters, either an upper-tailed or lower-tailed hypothesis is used.
The null hypothesis remains the same for each type of one sample t-test. The hypotheses are formally defined below:
• The null hypothesis (H0) assumes that the difference between the true mean (μ) and the comparison value (m0) is equal to zero.
• The two-tailed alternative hypothesis (H1) assumes that the difference between the true mean (μ) and the comparison value (m0)
is not equal to zero.
• The upper-tailed alternative hypothesis (H1) assumes that the true mean (μ) of the population is greater than the comparison value
(m0).
• The lower-tailed alternative hypothesis (H1) assumes that the true mean (μ) of the population is less than the comparison value (m0).
The mathematical representations of the null and alternative hypotheses are defined below:
• H0: μ = m0
• H1: μ ≠ m0    (two-tailed)
• H1: μ > m0    (upper-tailed)
• H1: μ < m0    (lower-tailed)
Note. It is important to remember that hypotheses are never about data; they are about the processes which produce the data. If
you are interested in knowing whether the mean weight of a sample of laptops is equal to five pounds,
the real question being asked is whether the process that produced those laptops has a mean weight of five pounds.
ASSUMPTIONS
As a parametric procedure (a procedure which estimates unknown parameters), the one sample t-test makes several assumptions. Although t-tests are quite
robust, it is good practice to evaluate the degree of deviation from these assumptions in order to assess the quality of the results. The one sample t-test has
four main assumptions:
• The dependent variable must be continuous (interval/ratio).
• The observations are independent of one another.
• The dependent variable should be approximately normally distributed.
• The dependent variable should not contain any outliers.
• Level of Measurement
• The one sample t-test requires the sample data to be numeric and continuous, as it is based on the normal distribution. Continuous data can take on any
value within a range (income, height, weight, etc.). The opposite of continuous data is discrete data, which can only take on a few values (Low, Medium,
High, etc.). Occasionally, discrete data can be used to approximate a continuous scale, such as with Likert-type scales.
• Independence
• Independence of observations is usually not testable, but can be reasonably assumed if the data collection process was random without replacement. In our
example, we would want to select laptop computers at random rather than following any systematic pattern. This minimizes the risk of collecting a biased
sample that would yield inaccurate results.
• Normality
• To test the assumption of normality, a variety of methods are available, but the simplest is to inspect the data visually using a histogram or a Q-Q
scatterplot. Real-world data are almost never perfectly normal, so this assumption can be considered reasonably met if the shape looks approximately
symmetric and bell-shaped. The data in the example figure below is approximately normally distributed.
• Outliers
• An outlier is a data value which is too extreme to belong in the distribution of interest. Let’s suppose in our example that the assembly machine ran out of a
particular component, resulting in a laptop that was assembled at a much lower weight. This is a condition that is outside of our question of interest, and
therefore we can remove that observation prior to conducting the analysis. However, just because a value is extreme does not make it an outlier. Let’s
suppose that our laptop assembly machine occasionally produces laptops which weigh significantly more or less than five pounds, our target value. In this
case, these extreme values are absolutely essential to the question we are asking and should not be removed. Box-plots are useful for visualizing the
variability in a sample, as well as locating any outliers. The boxplot on the left shows a sample with no outliers. The boxplot on the right shows a sample
with one outlier.
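As a rough illustration of how a box-plot locates candidate outliers, the sketch below applies the 1.5 × IQR fence rule that box-plots conventionally use. The rule, the function name `tukey_fence_outliers`, and the laptop weights are assumptions added here and do not come from the original example. As noted above, a flagged value is only a candidate: whether it is removed depends on whether it falls outside the question of interest.

```python
import statistics


def tukey_fence_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences.

    Assumption: fences are placed k * IQR beyond the first and third
    quartiles (k = 1.5 is the usual box-plot convention).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical laptop weights in pounds; 3.2 mimics a laptop that was
# assembled with a component missing.
weights = [4.9, 5.0, 5.1, 5.0, 4.95, 5.05, 3.2]
flagged = tukey_fence_outliers(weights)
print(flagged)
```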
PROCEDURE
The procedure for a one sample t-test can be summed up in four steps. The symbols to be used are defined below:
• Y = Random sample
• yi = The ith observation in Y
• n = The sample size
• m0 = The hypothesized value
• $\bar{y}$ = The sample mean
• $\hat{\sigma}$ = The sample standard deviation
• T = A random variable following a t-distribution with (n − 1) degrees of freedom
• t = The t-statistic (t-test statistic) for a one sample t-test
• p = The p-value (probability value) for the t-statistic
The four steps are listed below:
• 1. Calculate the sample mean.
• $\bar{y} = \dfrac{y_1 + y_2 + \cdots + y_n}{n}$
• 2. Calculate the sample standard deviation.
• $\hat{\sigma} = \sqrt{\dfrac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}}$
• 3. Calculate the test statistic.
• $t = \dfrac{\bar{y} - m_0}{\hat{\sigma} / \sqrt{n}}$
• 4. Calculate the probability of observing a test statistic at least as extreme as t under the
null hypothesis. This value is obtained by comparing t to a t-distribution with (n − 1) degrees of freedom. This can be done by
looking up the value in a table, such as those found in many statistical textbooks, or with statistical software for more accurate
results.
• p = 2 ⋅ Pr(T > |t|) (two-tailed)
• p = Pr(T > t) (upper-tailed)
• p = Pr(T < t) (lower-tailed)
• Once the assumptions have been verified and the calculations are
complete, all that remains is to determine whether the results
provide sufficient evidence to reject the null hypothesis in favour
of the alternative hypothesis.
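The four steps can be sketched in Python using only the standard library. This is a minimal illustration, not a substitute for statistical software: the p-value is obtained by numerically integrating the t-distribution density with Simpson's rule, which is an implementation choice made here. The function names (`t_pdf`, `t_cdf`, `one_sample_t_test`) and the laptop weights are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Pr(T <= x), via Simpson's rule on [0, |x|] plus symmetry.

    Assumption: steps is even, as Simpson's rule requires.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(sample, m0, tail="two"):
    """Return (t, degrees of freedom, p) for a one sample t-test."""
    n = len(sample)
    y_bar = statistics.mean(sample)           # step 1: sample mean
    sigma_hat = statistics.stdev(sample)      # step 2: n - 1 denominator
    t = (y_bar - m0) / (sigma_hat / math.sqrt(n))  # step 3: test statistic
    df = n - 1
    # Step 4: p-value for the chosen alternative hypothesis.
    if tail == "two":
        p = 2 * (1 - t_cdf(abs(t), df))
    elif tail == "upper":
        p = 1 - t_cdf(t, df)
    elif tail == "lower":
        p = t_cdf(t, df)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return t, df, p


# Hypothetical laptop weights in pounds, tested against m0 = 5.
weights = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.3, 5.0]
t, df, p = one_sample_t_test(weights, m0=5.0, tail="two")
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

The `tail` argument selects which of the three p-value formulas is applied, so the same function covers the two-tailed, upper-tailed and lower-tailed alternatives.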
Interpretation
There are two types of significance to consider when interpreting
the results of a one sample t-test, statistical significance and
practical significance.
Statistical Significance
• Statistical significance is determined by looking at the p-value. The p-value gives the probability of
observing the test results under the null hypothesis. The lower the p-value, the lower the probability of
obtaining a result like the one that was observed if the null hypothesis was true. Thus, a low p-value
indicates decreased support for the null hypothesis. However, the possibility that the null hypothesis is
true and that we simply obtained a very rare result can never be ruled out completely. The cutoff value
for determining statistical significance is ultimately decided on by the researcher, but usually a value
of .05 or less is chosen. This corresponds to a 5% (or less) chance of obtaining a result like the one
that was observed if the null hypothesis was true.
Practical Significance
• Practical significance depends on the subject matter. In general, a result is practically
significant if the size of the effect is large (or small) enough to be relevant to the research
questions being investigated. It is not uncommon, especially with large sample sizes,
to observe a result that is statistically significant but not practically significant. Returning to the
example of laptop weights, an average difference of .002 pounds might be statistically
significant. However, a difference this small is unlikely to be of any interest. In most cases,
both practical and statistical significance are required to draw meaningful conclusions.
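To see why a .002 pound difference can become statistically significant, the sketch below evaluates the one sample t-statistic, (sample mean − m0) / (standard deviation / √n), for several sample sizes. The standard deviation of 0.05 pounds and the sample sizes are hypothetical values chosen for illustration. Only the .002 pound difference comes from the example above.

```python
import math


def t_statistic(sample_mean, m0, sigma_hat, n):
    """One sample t-statistic computed from summary values."""
    return (sample_mean - m0) / (sigma_hat / math.sqrt(n))


m0 = 5.0             # target weight in pounds
sample_mean = 5.002  # observed mean, .002 pounds above target
sigma_hat = 0.05     # hypothetical sample standard deviation

# The same tiny difference yields a larger t-statistic as n grows,
# because the standard error shrinks with the square root of n.
for n in (25, 10_000, 1_000_000):
    print(n, round(t_statistic(sample_mean, m0, sigma_hat, n), 2))
```

The effect size stays at .002 pounds throughout. Only the sample size changes, which is why statistical significance alone says little about practical relevance.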
