
What is a T-Distribution?

A T-Distribution is also known as Student’s T-Distribution. It is a type of
probability distribution that is similar to the normal distribution with its bell
shape but has heavier tails. It is typically used when the sample size is small,
for example when only a certain number of people from a given population are
chosen as the sample. As a result, T-Distributions have a greater chance of
producing extreme values than the normal distribution and therefore have fatter
tails.

What does a T-Distribution tell you?


The heaviness of the tails is determined by a parameter of the T-Distribution
called the Degrees of Freedom. Degrees of Freedom are the maximum number of
logically independent values that are free to vary in a data set. It is calculated
by subtracting one from the total number of items in the sample. In a
generalized manner, Degrees of Freedom = (n-1), where n = sample size. For
example, take a data set consisting of five positive integers whose mean must
be 6, so that their sum must be 30. If four items within the data set are
{3,8,5,4}, which sum to 20, then the fifth number has to be 10. Because only the
first four numbers can be chosen freely, the Degrees of Freedom is 4.
Smaller values of the Degrees of Freedom give heavier tails, and higher values
make the distribution resemble the standard normal distribution.
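The link between the degrees of freedom and the heaviness of the tails can be checked numerically. The sketch below is only an illustration: it uses the standard density formula of Student's t-distribution, and the function names `t_pdf` and `normal_pdf` as well as the chosen point x = 3 are assumptions made for this example, not part of the original report.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # Work on the log scale so that large df values do not overflow gamma().
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare how much density sits far from the mean (x = 3 is an arbitrary choice).
x = 3.0
for df in (2, 5, 30, 200):
    print(f"df={df:>3}: t density at x={x} is {t_pdf(x, df):.5f}")
print(f"normal : density at x={x} is {normal_pdf(x):.5f}")
```

The printed densities shrink towards the normal value as df grows, which is the "heavier tails for smaller df" behaviour described above.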
How to Calculate the T-Score?
T = [(x̄-µ) *√n] ÷s
Where
T is the T-Score
x̄ is the sample mean
µ is the population mean
n is the sample size
s is the sample standard deviation
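The T-Score formula above translates directly into a few lines of code. This is a minimal sketch: the function name `t_score` and the sample figures (mean 52, population mean 50, standard deviation 4, sample size 16) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def t_score(sample_mean: float, population_mean: float,
            sample_sd: float, n: int) -> float:
    """T = [(x-bar - mu) * sqrt(n)] / s, as defined in the text."""
    if n < 2:
        raise ValueError("need at least two observations")
    if sample_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (sample_mean - population_mean) * math.sqrt(n) / sample_sd

# Hypothetical numbers: (52 - 50) * sqrt(16) / 4 = 2 * 4 / 4
t = t_score(sample_mean=52.0, population_mean=50.0, sample_sd=4.0, n=16)
print(f"T-Score = {t:.3f} with {16 - 1} degrees of freedom")
```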
How the topic has shaped over the years
In statistics, the t-distribution was first derived as a posterior distribution in
1876 by Helmert and Lüroth. The t-distribution also appeared in a more
general form as the Pearson Type IV distribution in Karl Pearson's 1895 paper.
As a statistical technique, it was developed by the English statistician William
Sealy Gosset in 1908, essentially to control the quality of dark beers. He was
interested in quality control based on small samples at various stages of the
production process. A t test used to test whether there is a difference between
two independent sample means works on the same principle as a t test used
when there is only one sample.
The t distribution describes the variability of the distances between sample means
and the population mean when the population standard deviation is unknown
and the data approximately follow the normal distribution. It is also known as
Student's t distribution because Gosset published under the pen name
"Student": his employer preferred staff to use pen names when publishing
scientific papers.

CHARACTERISTICS:
• SYMMETRICAL
• BELL-SHAPED DISTRIBUTION
• SMOOTH SHAPE
• T-DISTRIBUTION HAS A MEAN OF 0
The distribution approximates the Standard Normal Distribution,
with a mean of 0 and a standard deviation of 1, when the degrees of
freedom (df) are high.
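How quickly the spread of the t-distribution settles towards that of the standard normal can be seen from its variance. The sketch below relies on the standard result that, for df > 2, the variance equals df / (df − 2); the function name `t_variance` and the list of df values are assumptions made for illustration.

```python
import math

def t_variance(df: float) -> float:
    """Variance of Student's t-distribution; only finite for df > 2."""
    if df <= 2:
        raise ValueError("variance is not finite for df <= 2")
    return df / (df - 2)

# The variance is always above 1 and approaches 1 as df grows.
for df in (3, 5, 10, 30, 100, 1000):
    var = t_variance(df)
    print(f"df={df:>4}: variance={var:.4f}, standard deviation={math.sqrt(var):.4f}")
```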
T-DISTRIBUTION FORMULA;

The t-distribution is the set of t values measured for all random samples
of a specific sample size, that is, for a particular degree of freedom. It
approximates the shape of a normal distribution.
Let x have a normal distribution with mean ‘μ’. For a sample of size
‘n’ with mean x̄ and standard deviation ‘s’, the t variable has
Student’s t-distribution with degrees of freedom d.f = n – 1. The
formula for the t variable is as follows:
t = (x̄ – μ) / (s / √n)

T-DISTRIBUTION PROPERTIES;
• RANGES FROM −∞ TO +∞.
• SHAPE OF T-DISTRIBUTION VARIES WITH THE CHANGE IN THE
DEGREE OF FREEDOM
• VARIANCE OF T-DISTRIBUTION IS ALWAYS GREATER THAN “1”
(IT EQUALS df/(df − 2) FOR df > 2).

T-TEST;
The t-test is any statistical hypothesis test in which the test statistic follows
Student’s t-distribution under the null hypothesis.

USES;

* It can be used to determine whether two sets of data are significantly
different from each other, and is most commonly applied when the
test statistic would follow a normal distribution if the value of the
scaling term in the test statistic were known.
BENEFITS
◦ The t distribution has fatter tails, so it can be used as a model for financial
returns displaying excess kurtosis. It enables a more realistic calculation of the
Value at Risk (VaR) in many cases.
◦ With a t distribution, the probability of getting values far from the mean is
higher.
◦ The t distribution is most useful for small sample sizes when the population
standard deviation is not known.
◦ Data collection becomes easier because only a small sample is needed.
◦ Because less data is involved, the computation, calculation and formulation
become easier.
◦ Its assumptions provide robustness to the analysis.
◦ The total amount of data to be collected is small, hence it is easy to
calculate.
◦ The t distribution becomes more similar to a normal distribution as the sample
size grows.
◦ The t distribution, also known as Student’s t-distribution, is used when the
given sample sizes are small, in order to estimate confidence intervals or to
determine critical values that tell whether an observation is a given distance
from the mean.
◦ The smaller the sample size, the more it differs from the normal distribution.
◦ The t-distribution is a way of describing data that follow a bell curve when
plotted on a graph, with the greatest number of observations close to the mean
and few observations in the tails.
◦ The t-distribution is used when data are approximately normally distributed,
which means the data follow a bell shape, but the population variance is
unknown. The variance is estimated from the sample, and the degrees of freedom
of the data set determine the shape of the distribution.
◦ The t-distribution is used in many cases in place of the standard normal (z)
distribution; compared with it, the t-distribution gives a lower probability to
the centre and a higher probability to the tails.
◦ The t-distribution is primarily used to find t scores, and the t score is used
in two ways:

• the upper and lower bounds of a confidence interval when the data are
approximately normally distributed.
• the p-value of the test statistic for t-tests and regression tests.
◦ The t-score used to generate the upper and lower bounds is also known as the
critical value of t, or t*.
◦ Using a two-tailed t-test, you generate an estimate of the difference between
the two groups and a confidence interval around that estimate.
◦ The t-distribution is used when making inferences about a mean when the
standard deviation is not known or given.
◦ The t-distribution appears as a flatter and shorter bell-shaped distribution
than the normal distribution.
◦ The higher the degrees of freedom, the more closely this distribution
approximates a standard normal distribution with a mean of 0 and a standard
deviation of 1.
◦ With the help of the t-distribution we can approximate the shape of the normal
distribution.
◦ The t-distribution is used to determine the proportions connected with t
scores. We use a t-distribution table to find the probability for a t-statistic;
it shows the probability of t exceeding a given value. The obtained probability
is the area under the t-curve between the given value and infinity.
◦ The t-distribution is used in a variety of statistical studies, including
Student's t-test for determining the statistical significance of a difference
between two sample means, the construction of confidence intervals for a
difference between two population means, and linear regression analysis.
◦ The t-distribution has longer tails, which means it is more likely to produce
values that are far from the mean. This makes it useful for analysing the
statistical behaviour of particular types of ratios of random quantities, in
which volatility in the denominator is amplified and can lead to outlying values
when the ratio’s denominator approaches zero.
◦ In the t-distribution, “bell-shaped” or “inverted U-shaped” curves indicate
normally distributed data.
◦ Use the t-distribution when you need to assess the mean and do not know the
population standard deviation.
◦ The t-distribution is specifically designed for use with small sample sizes,
because it takes into account the additional uncertainty that arises when the
sample size is small.
◦ It provides more accurate inference than the normal distribution in such
situations.
◦ The t-distribution can be used when the population standard deviation is
unknown. This is often the case in real-world applications, where we only have
access to a sample of data and not the entire population.
◦ The t-distribution has heavier tails, meaning that extreme values are more
likely to occur.
◦ The t-distribution is widely applicable in many fields, including business,
economics, engineering, social science and many more.
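The tail areas and critical values t* mentioned in the points above are normally read from a printed t table. As a rough sketch of where those table entries come from, the code below integrates the t density numerically (Simpson's rule) and searches for t* by bisection. The function names, the step sizes and the example of df = 10 are assumptions made for illustration; a statistics library would normally be used instead.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float) -> float:
    """P(T <= x): integrate the density over [0, |x|] and use symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    steps = max(200, 2 * math.ceil(a * 100))  # even count, step width <= 0.01
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # Simpson's rule
    return 0.5 + area if x > 0 else 0.5 - area

def upper_tail_area(t_value: float, df: float) -> float:
    """Area under the t-curve between `t_value` and infinity."""
    return 1.0 - t_cdf(t_value, df)

def critical_value(alpha: float, df: float) -> float:
    """Two-tailed critical value t*, i.e. P(|T| > t*) = alpha, by bisection."""
    target = 1.0 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    while hi - lo > 1e-9:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 10  # arbitrary example
t_star = critical_value(0.05, df)
print(f"t* for a 95% confidence interval with df={df}: {t_star:.4f}")
print(f"area beyond t* in one tail: {upper_tail_area(t_star, df):.4f}")
```

A confidence interval for a mean would then be x̄ ± t* · s/√n, using the same symbols as the T-Score formula.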

DISADVANTAGES
◦ The t-distribution can skew accuracy relative to the normal distribution.
◦ This shortcoming only arises when perfect normality is required.
◦ An independent t-test can help you detect the difference between sample
groups, but it will not help you control for the effects of the environment.
Changes in the environment may affect the output of the t-test.
◦ The t-test should not be used for multiple comparisons, because repeating it
inflates the chance of Type I errors.
◦ When considering a paired t-test among a group of samples, it can be difficult
to reject the null hypothesis.
◦ When the data collected violate the assumptions of the t-test, the output is
unreliable.
◦ Paired-sample t-tests rely on repeated measures rather than on differences
between groups, and this leads to carry-over effects.
◦ A reduced number of degrees of freedom is a severe disadvantage for an
independent t-test.
◦ Obtaining subjects for a t-test can be difficult.
◦ The repeated-measures problem creates carry-over effects.
◦ The t-distribution is heavily dependent on the sample size.
◦ If there are outliers or extreme values in the sample, the t-distribution may
not be an accurate representation of the population.
◦ The t-distribution provides only limited information and has only limited
applicability.
◦ It is sensitive to outliers.
◦ The t-distribution only provides information about the mean of the
population.
◦ If the observations are not independent, such as in time-series or clustered
data, the t-distribution may not be appropriate or exact.
◦ When the degrees of freedom of a test are lower, a higher t-value is needed to
reach significance, which creates a trade-off between power and degrees of
freedom.
◦ It makes it difficult to find subjects.
◦ A t-test cannot measure differences between more than two groups, because the
error structure for a t-test will underestimate the actual error when many
groups are being compared.
◦ The potential sample size tends to be reduced in a t-test.
◦ It does not justify the use of a very small sample size unless larger samples
are impossible.
◦ When the sample size is large relative to the population size, we can almost
always reject the null hypothesis.
◦ It can only be used to compare two groups; if we use it to compare more than
two groups, we may incur a Type I error.
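The warning about comparing more than two groups can be made concrete. The sketch below assumes, purely for illustration, that every pairwise t-test is independent and run at α = 0.05; under that simplifying assumption the chance of at least one Type I error across m tests is 1 − (1 − α)^m. The function names are hypothetical.

```python
from math import comb

def pairwise_tests(num_groups: int) -> int:
    """Number of two-group comparisons needed to cover every pair of groups."""
    return comb(num_groups, 2)

def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

for groups in (2, 3, 4, 5, 6):
    m = pairwise_tests(groups)
    rate = familywise_error_rate(m)
    print(f"{groups} groups -> {m:>2} t-tests -> P(at least one Type I error) = {rate:.3f}")
```

With two groups the rate stays at the nominal 0.05, but it grows quickly as groups are added, which is why a single t-test is limited to two groups.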

Application of the theorem:


A statistical test called a t test is employed to compare the means of
two groups. It is frequently employed in hypothesis testing to
establish whether a procedure or treatment actually affects the
population of interest or whether two groups differ from one
another. The research paper we are citing, authored by Joseph F,
Hamburg, Davis, attempts to explore the reasons why so many Chinese
students choose to study abroad and why the United States is their
preferred destination. This population is small; however, it
makes up a vital component of university life at many colleges and
contributes greatly to a university’s revenue. A total of 380
students completed a questionnaire, yielding 138 usable responses.
Specifically, the paper explores Chinese students’ rationales for
attending colleges in the U.S. The results indicate that
Chinese students are seeking an education with a worldview, and opt to
break from the Chinese system of learning.

The following are the reasons why they want to leave their home
country:

1). Gain a new perspective on my own country;

2). Can attend a better school overseas, but not able to attend the
schools I want due to China National College

3). It is easy to be admitted by a foreign school;

4). It is also costly if I study in my home country;

5). My friends have gone abroad, so I would;


6). I want to be away from my country;

7). I must study abroad because of my parents’ wish;

8). There are more fields of studies offered by foreign schools;

9). Political easiness in programs abroad;

10). Better living conditions, housing, eating, and environment, i.e.
clean air, etc;

11). The educational system is better overseas.

Hypothesis, Test of Hypothesis, and Sampling

The hypotheses for this research test whether there are any
significant differences in the Chinese students’ reasons for choosing
to study in the U.S. rather than in China.

Ho: There is no significant difference for Chinese students in any of
the reasons between China and the USA.

Hα: There is a significant difference for each of these reasons.

Variable   Mean difference        t     df   p-value

1)                   0.629    5.490    131     0.000
2)                   0.039    0.302    126     0.764
3)                   -0.29   -2.698    129     0.008
4)                   -0.55   -5.070    129     0.000
5)                   -0.66   -5.627    130     0.000
6)                   -0.65   -5.274    129     0.000
7)                   -0.61   -5.342    129     0.000
8)                   0.008    0.063    129     0.950
9)                   -0.33   -2.971    128     0.004
10)                  -0.12   -0.996    130     0.321
11)                  0.382    3.072    130     0.003

This table shows that respondents consider the variables whose mean
difference is positive to be more valid reasons to move to the USA for
higher studies. For variables with a negative mean difference,
respondents are less swayed by those factors and do not consider them
important reasons to study abroad, although some respondents did
argue for the factors with a negative mean.

For the hypothesis test, we take the level of significance as 0.05,
which corresponds to a 95% confidence level. Each variable has its own
t-statistic and p-value. Where the p-value is greater than 0.05 we
fail to reject the null hypothesis; this happens in cases 2, 8 and 10,
whose p-values are greater than 0.05, while all the others are below
it. The evidence therefore suggests that Chinese students do not view
these three issues differently between China and the USA.
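The decision rule applied to the table above can be written out in a few lines. This is a small sketch using the mean differences, t values, degrees of freedom and p-values exactly as reported in the table; the variable names are chosen here for illustration.

```python
ALPHA = 0.05  # level of significance used in the text

# (variable number, mean difference, t, df, p-value) as reported in the table
results = [
    (1, 0.629, 5.490, 131, 0.000),
    (2, 0.039, 0.302, 126, 0.764),
    (3, -0.29, -2.698, 129, 0.008),
    (4, -0.55, -5.070, 129, 0.000),
    (5, -0.66, -5.627, 130, 0.000),
    (6, -0.65, -5.274, 129, 0.000),
    (7, -0.61, -5.342, 129, 0.000),
    (8, 0.008, 0.063, 129, 0.950),
    (9, -0.33, -2.971, 128, 0.004),
    (10, -0.12, -0.996, 130, 0.321),
    (11, 0.382, 3.072, 130, 0.003),
]

# Reject the null hypothesis when p <= alpha; otherwise fail to reject it.
significant = [var for var, _, _, _, p in results if p <= ALPHA]
not_significant = [var for var, _, _, _, p in results if p > ALPHA]

print("Null hypothesis rejected for variables:    ", significant)
print("Null hypothesis not rejected for variables:", not_significant)
```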

RESEARCH; Due to China’s emergence as a global economy and the growing
understanding of global business, non-academic reasons are
primarily driving Chinese students to study in the USA.

Real life example of Independent T-Test (two sample):
Suppose a businessman wants to introduce some new techniques, so
he compares the time taken to complete two products: one made
using traditional techniques and another made using new ways, such
as an improved inventory management system. With the help of the
independent T-Test we will be able to answer the question: is the
difference statistically significant?

Null hypothesis: there is no difference between the time taken in
completing the product with different techniques.

Alternate hypothesis: there is a difference between the time taken
in completing the product with different techniques.

PRODUCT A (traditional technique):

Sample mean X1: 84 Hrs, SD (S1): 15 Hrs, n1: 20

PRODUCT B (new ways):

Sample mean X2: 95 Hrs, SD (S2): 20 Hrs, n2: 20

We will use a 95% confidence level and α = 0.05.

The two-sample t-statistic is calculated as follows, assuming
that the standard deviations of the two populations are not the same
and that, under the null hypothesis, the population means are the
same (μ2 - μ1 = 0). The difference is taken as Product B minus Product A.

t = [(x̄2 - x̄1) - (μ2 - μ1)] / SQRT{(s1²/n1) + (s2²/n2)}

t = [(95 - 84) - 0] / SQRT{(15*15/20) + (20*20/20)}

t = 11 / SQRT(31.25)

t = 1.967739

Degree of freedom is n1 + n2 - 2: 20 + 20 - 2 = 38

The critical value for a two-tailed test with 38 degrees of freedom
and a level of significance of 0.05 comes out to be 2.0244. As the
calculated value of t is lower than the critical value, one fails to
reject the null hypothesis that there is no difference between the
time taken in completing the two products with different techniques.
Thus, based on the given evidence, the alternate hypothesis is not
supported.
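The arithmetic of the worked example above can be reproduced with a short script. It is a sketch only: the function name `two_sample_t` is hypothetical, the sample figures are those given for Product A and Product B, and the critical value 2.0244 is taken from the text rather than computed.

```python
import math

def two_sample_t(mean_a: float, sd_a: float, n_a: int,
                 mean_b: float, sd_b: float, n_b: int,
                 hypothesised_diff: float = 0.0) -> float:
    """t = [(mean_b - mean_a) - hypothesised_diff] / sqrt(sd_a^2/n_a + sd_b^2/n_b)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return ((mean_b - mean_a) - hypothesised_diff) / standard_error

# Product A (traditional technique) and Product B (new ways), from the text.
t_stat = two_sample_t(mean_a=84, sd_a=15, n_a=20, mean_b=95, sd_b=20, n_b=20)
df = 20 + 20 - 2            # degrees of freedom as used in the text
CRITICAL_VALUE = 2.0244     # two-tailed, alpha = 0.05, df = 38 (quoted in the text)

reject_null = abs(t_stat) > CRITICAL_VALUE
print(f"t = {t_stat:.6f}, df = {df}")
print("Reject the null hypothesis" if reject_null
      else "Fail to reject the null hypothesis")
```

Because both samples contain 20 observations, the pooled-variance and unequal-variance forms of the statistic give the same t value here, so the choice between them only affects the degrees of freedom.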

Conclusion
Tests based on the T distribution use null and alternative hypotheses.
Depending on the situation, hypothesis tests about a population
parameter may take one of three forms: two use inequalities in the
null hypothesis; the third uses an equality in the null hypothesis.
Through a research paper, we have looked at a real-life application
of t tests, using the p-value approach to show the willingness of
Chinese students to move abroad. The p-value approach and the critical
value approach will always lead to the same rejection decision; that
is, for a two-tailed test, whenever the p-value is less than or equal
to α, the absolute value of the test statistic will be greater than or
equal to the critical value. The advantage of the p-value approach is
that the p-value tells us how significant the results are (the
observed level of significance). If we use the critical value
approach, we only know that the results are significant at the stated
level of significance.

We also found that in China a new generalized t (new Gt) distribution
based on a distribution construction approach has been proposed and
shown to be suitable for fitting data with both high kurtosis and
heavy tails. The generalized t (Gt) distribution was first proposed by
McDonald and Newey to implement partially adaptive regression models.
Galbraith and Zhu introduced a new class of asymmetric Student t
distributions and illustrated their applications in financial
econometrics. Harvey and Lange applied the generalized t distribution,
and the generalized t distribution and its extensions have been
involved in extensive research in the fields of robust estimation,
robust statistical modelling and autoregressive conditional
heteroscedasticity models.

In their paper, the authors first investigated the main properties of
the new distribution, including moments, skewness coefficients,
kurtosis coefficients and random number generation. Secondly, they
derived the explicit expression for the moments of order statistics as
well as the corresponding variance–covariance matrix through
recurrence relations and the distribution transformation technique.
The method so obtained has high efficiency and can greatly reduce the
computation time. After that, they focused on the parameter estimation
of this new Gt distribution. Several estimation methods, including
MMOM, MLE using the EM algorithm, MLE using a new iterative algorithm
and IPWM, were introduced. Among all these estimation methods, the
IPWM performs best on the whole, and this novel method means that
parameter estimation for this distribution is not limited to MLE and
MMOM. Furthermore, the new iterative algorithm to acquire the MLE is
more suitable than the EM algorithm when the sample kurtosis is more
than 2.7. For the four-parameter new Gt distribution, the authors
established an EM-type algorithm through the profile maximum
likelihood approach and discovered that the variation of the shape
parameter α has a significant effect on the estimation performance of
the scale parameter σ and the shape parameter α.

However, according to the authors, there are still some limitations
and areas to be improved. The parameter estimation method is limited
to PLA using the EM algorithm when the distribution has a heavier
tail. Also, the type of data that previous distributions can be fitted
to is not wide enough; for the sake of flexibility of shape, the
distribution function is often very complex, imposing limitations on
the parameter estimation. Therefore, proposing more efficient and
accurate estimation methods will be the focus of future research.
Besides, the new Gt distribution is suitable for fitting data with
good symmetry. In future work, it can be applied to the asymmetric
situation by adding a skew parameter. Citation: Guan, R.; Zhao, X.;
Cheng, W.; Rong, Y. A New Generalized t Distribution Based on a
Distribution Construction Method.

Through this research paper we have explained and introduced the T
distribution, its characteristics, properties, rules and formulas,
and the T tests and their uses. We have also learnt about the
advantages and disadvantages of the T distribution. We have tried to
demonstrate applications of the theorem using data sets and provided
a real-life example for better clarity of the topic and to give
insight into how the null and alternative hypothesis approach works.
Lastly, we have cited the research paper A New Generalized t
Distribution Based on a Distribution Construction Method to get a
better perspective on recent developments using the T distribution
model.
