b). Dispersion: A measure of dispersion shows the scattering of the data. It tells how much the observations vary from one another and gives a clear idea about the distribution of the data. A measure of dispersion shows the homogeneity or heterogeneity of the distribution of the observations.
Suppose you have four datasets of the same size with the same mean, say m. In all four cases the sum of the observations will be the same, so a measure of central tendency alone does not give a clear and complete idea of how the four sets are distributed.
Can we get an idea of the distribution if we know how the observations are dispersed within and between the datasets? The main idea behind a measure of dispersion is to describe how the data are spread out, that is, how much the data vary from their average value.
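To make this concrete, here is a minimal Python sketch (NumPy is an illustrative choice and the two datasets are made up, not from the source): both samples share the same mean, yet their dispersions differ sharply.

    import numpy as np

    # Two hypothetical datasets with the same mean (6) but different spread
    a = np.array([2, 4, 6, 8, 10])
    b = np.array([5, 6, 6, 6, 7])

    print(np.mean(a), np.mean(b))  # both 6.0
    print(np.std(a), np.std(b))    # ~2.83 vs ~0.63: very different dispersion

The means are identical, so only a measure of dispersion reveals the difference between the two samples.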
Characteristics of Measures of Dispersion
• A measure of dispersion should be rigidly defined.
• It should be easy to calculate and understand.
• It should not be affected much by fluctuations in the observations.
• It should be based on all the observations.
Classification of Measures of Dispersion
Measures of dispersion are categorized as:
(i) An absolute measure of dispersion:
• Measures that express the scattering of the observations in terms of distances, e.g., the range and the quartile deviation.
• Measures that express the variation in terms of the average deviation of the observations, e.g., the mean deviation and the standard deviation.
(ii) A relative measure of dispersion:
We use relative measures of dispersion for comparing the distributions of two or more datasets and for unit-free comparison. They are the coefficient of range, the coefficient of mean deviation, the coefficient of quartile deviation, the coefficient of variation, and the coefficient of standard deviation.
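As a rough illustration (the dataset and the NumPy-based computation are my own, not from the source), the absolute measures above and one relative measure can be computed like this:

    import numpy as np

    data = np.array([12, 15, 11, 18, 14, 16, 13])  # hypothetical observations

    # Absolute measures of dispersion (in the units of the data)
    data_range = data.max() - data.min()              # range
    q1, q3 = np.percentile(data, [25, 75])
    quartile_dev = (q3 - q1) / 2                      # quartile deviation
    mean_dev = np.mean(np.abs(data - data.mean()))    # mean deviation
    std_dev = data.std(ddof=1)                        # sample standard deviation

    # Relative (unit-free) measure: coefficient of variation
    coeff_var = std_dev / data.mean()

    print(data_range, quartile_dev, mean_dev, std_dev, coeff_var)

Because the coefficient of variation divides out the units, it is the kind of relative measure suited to comparing datasets measured on different scales.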
Nonparametric tests do not rely on any assumed distribution. They can thus be applied even if the parametric conditions of validity are not met. Nonparametric tests are more robust than parametric tests; in other words, they are valid in a broader range of situations (fewer conditions of validity).
A parametric test is a hypothesis test that provides generalisations for making statements about the mean of the parent population. A t-test, based on Student's t-statistic, is often used in this regard.
The t-statistic rests on the underlying assumptions that the variable is normally distributed and that the mean is known or assumed to be known; the population variance is estimated from the sample. It is also assumed that the variables of interest in the population are measured on an interval scale.
A nonparametric test is a hypothesis test that is not based on such underlying assumptions, i.e. it does not require the population's distribution to be characterized by specific parameters.
The test is mainly based on differences in medians; hence, it is alternatively known as a distribution-free test. The test assumes that the variables are measured on a nominal or ordinal level. It is used when the independent variables are non-metric.
When performing a hypothesis test, if the information about the population is completely known by way of its parameters, the test is said to be a parametric test, whereas if nothing is known about the population and the hypothesis must still be tested on it, the test conducted is a nonparametric test.
4). Explain the following tests with their usage and examples.
a). T-Test: A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. It is mostly used when the datasets, like the set of outcomes recorded from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis-testing tool, which allows testing of an assumption applicable to a population.
A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to determine the probability of a difference between two sets of data. To conduct a test with three or more groups, one must use an analysis of variance.
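For illustration only (the data are simulated and scipy is an assumed library choice, not something the source prescribes), a minimal two-sample t-test in Python might look like this; Welch's variant is used because, as noted above, the variances may be unknown and unequal:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=5.0, scale=1.0, size=30)  # hypothetical sample A
    group_b = rng.normal(loc=5.5, scale=1.2, size=30)  # hypothetical sample B

    # Welch's t-test: does not assume equal variances
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    # If p is below the chosen significance level (e.g., 0.05),
    # reject the null hypothesis that the two group means are equal.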
b). ANOVA: Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample. ANOVA was developed by the statistician and evolutionary biologist Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
An ANOVA test is a way to find out if survey or experiment results are significant. In other words, it helps you figure out whether you need to reject the null hypothesis or accept the alternate hypothesis. Basically, you're testing groups to see if there's a difference between them. Examples of when you might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling,
medication and biofeedback. You want to see if one therapy is better than the
others.
• A manufacturer has two different processes to make light bulbs. They want to
know if one process is better than the other.
• Students from different colleges take the same exam. You want to see if one
college outperforms the others.
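As a sketch of the first example above (the therapy scores are invented for demonstration, and scipy.stats.f_oneway is an assumed tool choice), a one-way ANOVA comparing the three therapies might look like:

    from scipy import stats

    # Hypothetical outcome scores for the three therapies
    counseling  = [24, 27, 21, 25, 23]
    medication  = [30, 28, 31, 27, 29]
    biofeedback = [22, 20, 25, 23, 21]

    f_stat, p_value = stats.f_oneway(counseling, medication, biofeedback)
    print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
    # A small p-value suggests at least one therapy's mean outcome
    # differs from the others; a follow-up (post hoc) test would be
    # needed to say which one.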
c). Chi-Square: The Chi-Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent. An example research question that could be answered using a Chi-Square analysis would be: Is there a significant relationship between voter intent and political party membership?
The Chi-Square statistic is most commonly used to evaluate Tests of Independence
when using a crosstabulation (also known as a bivariate table). Crosstabulation
presents the distributions of two categorical variables simultaneously, with the
intersections of the categories of the variables appearing in the cells of the table.
The Test of Independence assesses whether an association exists between the two
variables by comparing the observed pattern of responses in the cells to the pattern
that would be expected if the variables were truly independent of each other.
Calculating the Chi-Square statistic and comparing it against a critical value from the
Chi-Square distribution allows the researcher to assess whether the observed cell
counts are significantly different from the expected cell counts.
The calculation of the Chi-Square statistic is quite straightforward and intuitive: χ² = Σ (O − E)² / E, where O is an observed cell count, E is the corresponding expected count under independence, and the sum runs over all cells of the table.
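A minimal sketch with made-up counts (using scipy's chi2_contingency, which computes the expected counts and the statistic for a crosstabulation; the table below is hypothetical, echoing the voter-intent example above):

    from scipy.stats import chi2_contingency

    # Hypothetical crosstabulation: rows = party membership,
    # columns = voter intent (will vote / won't vote)
    observed = [[30, 10],
                [20, 40]]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
    print(expected)  # cell counts expected if the variables were independent
    # A small p-value suggests voter intent and party membership
    # are associated rather than independent.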
d). Regression: Regression analysis is used in statistics to find trends in data. For example, you might guess that there's a connection between how much you eat and how much you weigh; regression analysis can help you quantify that. Regression analysis will provide you with an equation for a graph so that you can make predictions about your data. For example, if you've been putting on weight over the last few years, it can predict how much you'll weigh in ten years' time if you continue to put on weight at the same rate. It will also give you a slew of statistics (including a p-value and a correlation coefficient) to tell you how accurate your model is. Most elementary stats courses cover very basic techniques, like making scatter plots and performing linear regression. However, you may come across more advanced techniques like multiple regression.
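As a hedged sketch of simple linear regression (the calorie/weight numbers are invented, and scipy.stats.linregress is one of several reasonable tools), fitting the equation and using it for a prediction might look like:

    from scipy import stats

    # Hypothetical data: daily calorie intake vs body weight (kg)
    calories = [1800, 2000, 2200, 2400, 2600, 2800]
    weight   = [62.0, 64.5, 66.0, 69.5, 71.0, 74.0]

    result = stats.linregress(calories, weight)
    print(f"weight ~ {result.slope:.4f} * calories + {result.intercept:.2f}")
    print(f"r = {result.rvalue:.3f}, p = {result.pvalue:.4f}")

    # Predict weight at a hypothetical intake of 3000 kcal/day
    print(result.slope * 3000 + result.intercept)

The slope and intercept give the equation for the graph mentioned above, while the correlation coefficient r and the p-value indicate how well the line fits the data.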