
Emerging Trends & Analysis

1. What do the following statistical tools indicate in research?


A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data. As such, measures
of central tendency are sometimes called measures of central location. They are
also classed as summary statistics.
a). Mean: The mean (or average) is the most popular and well-known measure of
central tendency. It can be used with both discrete and continuous data, although
its use is most often with continuous data. The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. So, if we
have n values in a data set and they have values x1, x2, ..., xn, the sample mean,
usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek
capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
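As a minimal sketch, the formula can be computed directly in Python (the values
below are hypothetical):

    # Sample mean: x-bar = (x1 + x2 + ... + xn) / n
    data = [4, 8, 15, 16, 23, 42]   # hypothetical values x1..xn
    n = len(data)                   # number of values in the data set
    mean = sum(data) / n
    print(mean)                     # 18.0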

b).Dispersion: The measure of dispersion shows the scatterings of the data. It tells
the variation of the data from one another and gives a clear idea about the
distribution of the data. The measure of dispersion shows the homogeneity or the
heterogeneity of the distribution of the observations.
Suppose you have four datasets of the same size, and the mean is also the same,
say, m. In all the cases the sum of the observations will be the same. Here, the
measure of central tendency alone does not give a clear and complete idea about
the distribution of the four given sets.
Can we get an idea about the distribution if we get to know about the dispersion of
the observations from one another within and between the datasets? The main idea
about the measure of dispersion is to get to know how the data are spread. It shows
how much the data vary from their average value.
Characteristics of Measures of Dispersion
• It should be rigidly defined.
• It should be easy to calculate and understand.
• It should not be affected much by fluctuations of observations.
• It should be based on all observations.
Classification of Measures of Dispersion
The measure of dispersion is categorized as:
(i) An absolute measure of dispersion:
• Measures which express the scattering of observations in terms of
distances, i.e., the range and the quartile deviation.
• Measures which express the variation in terms of the average of
deviations of observations, like the mean deviation and the standard deviation.
(ii) A relative measure of dispersion:
We use a relative measure of dispersion for comparing distributions of two or more
data sets and for unit-free comparison. They are the coefficient of range, the
coefficient of mean deviation, the coefficient of quartile deviation, the coefficient of
variation, and the coefficient of standard deviation.
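As a minimal sketch in Python (the observations are hypothetical): an absolute
measure such as the range or standard deviation carries the units of the data,
while a relative measure such as the coefficient of variation is unit-free:

    import statistics

    data = [12, 15, 11, 18, 14, 20]      # hypothetical observations
    data_range = max(data) - min(data)   # absolute: range
    sd = statistics.stdev(data)          # absolute: sample standard deviation
    cv = sd / statistics.mean(data)      # relative: coefficient of variation
    print(data_range, round(sd, 2), round(cv, 2))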

c). Skewness: Skewness, in statistics, is the degree of distortion from the


symmetrical bell curve, or normal distribution, in a set of data. Skewness can
be negative, positive, zero or undefined. A normal distribution has a skew of
zero, while a lognormal distribution, for example, would exhibit some degree
of right-skew.

Probability distributions can show increasing levels of right (or positive)
skewness; distributions can also be left (negative) skewed.
Skewness is used along with kurtosis to better judge the likelihood of events
falling in the tails of a probability distribution.
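As a minimal sketch, assuming SciPy is available, a symmetric sample should give
a skewness coefficient near zero and a lognormal sample a clearly positive one:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    print(skew(rng.normal(size=10_000)))      # near 0: symmetric
    print(skew(rng.lognormal(size=10_000)))   # positive: right-skewed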

d). Normal Distribution: Normal distribution, also known as the Gaussian


distribution, is a probability distribution that is symmetric about the mean, showing
that data near the mean are more frequent in occurrence than data far from the
mean. In graph form, a normal distribution appears as a bell curve.
A normal distribution, sometimes called the bell curve, is a distribution that occurs
naturally in many situations. For example, the bell curve is seen in tests like the
SAT and GRE. The bulk of students will score the average (C), while smaller
numbers of students will score a B or D. An even smaller percentage of students
score an F or an A. This creates a distribution that resembles a bell (hence the
nickname). The bell curve is symmetrical. Half of the data will fall to the left of the
mean; half will fall to the right.
Many groups follow this type of pattern. That’s why it’s widely used in business,
statistics and in government bodies like the FDA:
• Heights of people.
• Measurement errors.
• Blood pressure.
• Points on a test.
• IQ scores.
• Salaries.
The empirical rule tells you what percentage of your data falls within a certain
number of standard deviations from the mean:
• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.
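As a minimal sketch, the empirical rule can be checked on simulated normal data
(assuming NumPy is available; the mean and standard deviation are invented):

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(loc=100, scale=15, size=100_000)   # hypothetical IQ-like scores
    mu, sigma = x.mean(), x.std()
    for k in (1, 2, 3):
        share = np.mean(np.abs(x - mu) <= k * sigma)
        print(f"within {k} sd: {share:.1%}")          # ~68%, ~95%, ~99.7%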

2. Differentiate between Parametric and Non-Parametric Tests with examples.


Parametric tests assume underlying statistical distributions in the data. Therefore,
several conditions of validity must be met so that the result of a parametric test is
reliable. For example, Student’s t-test for two independent samples is reliable only if
each sample follows a normal distribution and if sample variances are
homogeneous. Parametric tests often have nonparametric equivalents. The
advantage of using a parametric test instead of a nonparametric equivalent is that
the former will have more statistical power than the latter. In other words, a
parametric test is more able to lead to a rejection of H0. Most of the time, the
p-value associated with a parametric test will be lower than the p-value associated
with a nonparametric equivalent that is run on the same data.

Nonparametric tests do not rely on any distribution. They can thus be applied even
if parametric conditions of validity are not met. Nonparametric tests are
more robust than parametric tests. In other words, they are valid in a broader range
of situations (fewer conditions of validity).
The parametric test is the hypothesis test which provides generalisations for making
statements about the mean of the parent population. A t-test, based on Student's
t-statistic, is often used in this regard.
The t-statistic rests on the underlying assumption that the variable is normally
distributed and the mean is known or assumed to be known. The population
variance is estimated from the sample. It is assumed that the variables of interest
in the population are measured on an interval scale.
The nonparametric test is defined as the hypothesis test which is not based on
underlying assumptions, i.e. it does not require the population's distribution to be
characterised by specific parameters.
The test is mainly based on differences in medians. Hence, it is alternately known as
the distribution-free test. The test assumes that the variables are measured on a
nominal or ordinal level. It is used when the independent variables are non-metric.
For performing a hypothesis test, if the information about the population is
completely known by way of parameters, then the test is said to be a parametric
test, whereas if there is no knowledge about the population and the hypothesis
must still be tested, then the test conducted is considered a nonparametric test.
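As a minimal sketch, assuming SciPy, the same two hypothetical samples can be
analysed with a parametric test (Student's t-test) and its nonparametric
equivalent (the Mann-Whitney U test); when the parametric assumptions hold, the
parametric p-value will typically be the lower of the two:

    from scipy import stats

    group_a = [23, 25, 28, 30, 32, 35, 36]   # hypothetical scores
    group_b = [20, 22, 24, 26, 27, 29, 31]
    t_stat, t_p = stats.ttest_ind(group_a, group_b)      # assumes normality
    u_stat, u_p = stats.mannwhitneyu(group_a, group_b)   # distribution-free
    print(f"t-test p = {t_p:.3f}, Mann-Whitney p = {u_p:.3f}")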

3. Write a note on SPSS highlighting its importance in research.


SPSS, which stands for Statistical Package for the Social Sciences, is an application
that can aid in quantitative data handling. Before SPSS, researchers had to run
statistical tests on data sets by hand. However, SPSS automates this process. Not
only does SPSS allow you to run statistical tests, you can use SPSS for other
purposes as well.
Data Collection and Organization
SPSS is often used as a data collection tool by researchers. The data entry screen in
SPSS looks much like any other spreadsheet software. You can enter variables and
quantitative data and save the file as a data file. Furthermore, you can organize your
data in SPSS by assigning properties to different variables. For example, you can
designate a variable as a nominal variable, and that information is stored in SPSS.
The next time you access the data file, which could be weeks, months or even years,
you'll be able to see exactly how your data is organized.
Data Output
Once data is collected and entered into the data sheet in SPSS, you can create an
output file from the data. For example, you can create frequency distributions of your
data to determine whether your data set is normally distributed. The frequency
distribution is displayed in an output file. You can export items from the output file
and place them into a research article you're writing. Therefore, instead of recreating
a table or graph, you can take the table or graph directly from the data output file
from SPSS.
Statistical Tests
The most obvious use for SPSS is to use the software to run statistical tests. SPSS
has all of the most widely used statistical tests built into the software. Therefore, you
won't have to do any mathematical equations by hand. Once you run a statistical
test, all associated outputs are displayed in the data output file. You can also
transform your data by performing advanced statistical transformations. This is
especially useful for data that is not normally distributed.

4). Explain the following tests with their usage and examples.
a). T – Test: A t-test is a type of inferential statistic used to determine if there is
a significant difference between the means of two groups, which may be related
in certain features. It is mostly used when the data sets, like the data set
recorded as the outcome from flipping a coin 100 times, would follow a normal
distribution and may have unknown variances. A t-test is used as a hypothesis
testing tool, which allows testing of an assumption applicable to a population.
A t-test looks at the t-statistic, the t-distribution values, and the degrees of
freedom to determine the probability of difference between two sets of data. To
conduct a test with three or more variables, one must use an analysis of
variance.
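As a minimal sketch, assuming SciPy, an independent two-sample t-test on
hypothetical scores from two groups:

    from scipy import stats

    method_a = [68, 72, 75, 70, 74, 69, 71]   # hypothetical test scores
    method_b = [74, 78, 80, 73, 79, 75, 77]
    t_stat, p_value = stats.ttest_ind(method_a, method_b)
    print(t_stat, p_value)   # small p-value -> reject H0 of equal means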
b). ANOVA: Analysis of variance (ANOVA) is a collection of statistical models
and their associated estimation procedures (such as the "variation" among
and between groups) used to analyze the differences among group means in
a sample. ANOVA was developed by statistician and evolutionary biologist
Ronald Fisher. The ANOVA is based on the law of total variance, where the
observed variance in a particular variable is partitioned into components
attributable to different sources of variation. In its simplest form, ANOVA
provides a statistical test of whether two or more population means are equal,
and therefore generalizes the t-test beyond two means.
An ANOVA test is a way to find out if survey or experiment results are
significant. In other words, it helps you to figure out if you need to reject the
null hypothesis or accept the alternate hypothesis. Basically, you're testing
groups to see if there’s a difference between them. Examples of when you
might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling,
medication and biofeedback. You want to see if one therapy is better than the
others.
• A manufacturer has two different processes to make light bulbs. They want to
know if one process is better than the other.
• Students from different colleges take the same exam. You want to see if one
college outperforms the others.
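As a minimal sketch, assuming SciPy, a one-way ANOVA on the therapy example
above (all scores are invented for illustration):

    from scipy import stats

    counseling  = [14, 15, 13, 16, 15]   # hypothetical improvement scores
    medication  = [18, 17, 19, 20, 18]
    biofeedback = [15, 16, 14, 17, 15]
    f_stat, p_value = stats.f_oneway(counseling, medication, biofeedback)
    print(f_stat, p_value)   # small p -> at least one group mean differs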
c). Chi – Square: The Chi Square statistic is commonly used for testing
relationships between categorical variables. The null hypothesis of the Chi-Square
test is that no relationship exists between the categorical variables in the population; they
are independent. An example research question that could be answered using a
Chi-Square analysis would be: Is there a significant relationship between voter
intent and political party membership?
The Chi-Square statistic is most commonly used to evaluate Tests of Independence
when using a crosstabulation (also known as a bivariate table). Crosstabulation
presents the distributions of two categorical variables simultaneously, with the
intersections of the categories of the variables appearing in the cells of the table.
The Test of Independence assesses whether an association exists between the two
variables by comparing the observed pattern of responses in the cells to the pattern
that would be expected if the variables were truly independent of each other.
Calculating the Chi-Square statistic and comparing it against a critical value from the
Chi-Square distribution allows the researcher to assess whether the observed cell
counts are significantly different from the expected cell counts.
The calculation of the Chi-Square statistic is quite straightforward and intuitive:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ = the observed frequency (the observed counts in the cells)
and $f_e$ = the expected frequency if NO relationship existed between the variables.
As depicted in the formula, the Chi-Square statistic is based on the difference
between what is actually observed in the data and what would be expected if there
was truly no relationship between the variables.
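As a minimal sketch, assuming SciPy, a Test of Independence on a hypothetical
2x2 crosstabulation of voter intent by party membership:

    from scipy.stats import chi2_contingency

    observed = [[30, 10],   # hypothetical counts: party A (intends to vote / not)
                [20, 40]]   # party B
    chi2, p, dof, expected = chi2_contingency(observed)
    print(chi2, p, dof)     # test statistic, p-value, degrees of freedom
    print(expected)         # fe: counts expected under independence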

d). Correlation: Correlation, in the finance and investment industries, is a statistic that


measures the degree to which two securities move in relation to each other.
Correlations are used in advanced portfolio management, computed as the
correlation coefficient, which has a value that must fall between -1.0 and +1.0.
Correlation is a statistical technique that can show whether and how strongly pairs
of variables are related. For example, height and weight are related; taller people
tend to be heavier than shorter people. The relationship isn't perfect. People of the
same height vary in weight, and you can easily think of two people you know where
the shorter one is heavier than the taller one. Nonetheless, the average weight of
people 5'5'' is less than the average weight of people 5'6'', and their average weight
is less than that of people 5'7'', etc. Correlation can tell you just how much of the
variation in peoples' weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected
correlations. You may also suspect there are correlations, but don't know which are
the strongest. An intelligent correlation analysis can lead to a greater understanding
of your data.
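As a minimal sketch, assuming SciPy, the height-weight relationship can be
quantified with a Pearson correlation coefficient (the data below are invented):

    from scipy.stats import pearsonr

    heights = [63, 64, 66, 68, 69, 71, 72]          # hypothetical heights (inches)
    weights = [127, 135, 140, 152, 157, 168, 175]   # hypothetical weights (pounds)
    r, p_value = pearsonr(heights, weights)
    print(r, p_value)   # r near +1 -> strong positive relationship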
e). Regression: Regression analysis is a powerful statistical method that allows you
to examine the relationship between two or more variables of interest.
While there are many types of regression analysis, at their core they all examine
the influence of one or more independent variables on a dependent variable.
Regression analysis provides detailed insight that can be applied to further improve
products and services.
Regression analysis is a reliable method of identifying which variables have impact on
a topic of interest. The process of performing a regression allows you to confidently
determine which factors matter most, which factors can be ignored, and how these
factors influence each other.
In order to understand regression analysis fully, it’s essential to comprehend the
following terms:
• Dependent Variable: This is the main factor that you’re trying to
understand or predict.
• Independent Variables: These are the factors that you hypothesize have
an impact on your dependent variable.

Regression analysis is used in stats to find trends in data. For example, you might
guess that there’s a connection between how much you eat and how much you
weigh; regression analysis can help you quantify that. Regression analysis will
provide you with an equation for a graph so that you can make predictions about your
data. For example, if you’ve been putting on weight over the last few years, it can
predict how much you'll weigh in ten years' time if you continue to put on weight at the
same rate. It will also give you a slew of statistics (including a p-value and a
correlation coefficient) to tell you how accurate your model is. Most elementary stats
courses cover very basic techniques, like making scatter plots and performing linear
regression. However, you may come across more advanced techniques like multiple
regression.
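As a minimal sketch, assuming SciPy, a simple linear regression of weight on food
intake (hypothetical data) returns the equation of the fitted line along with the
correlation coefficient and p-value mentioned above:

    from scipy.stats import linregress

    calories_per_day = [1800, 2000, 2200, 2500, 2800, 3000]   # hypothetical
    weight_kg        = [62, 65, 68, 72, 77, 80]
    result = linregress(calories_per_day, weight_kg)
    print(result.slope, result.intercept)           # fitted line y = a + b*x
    print(result.rvalue, result.pvalue)             # fit quality
    print(result.intercept + result.slope * 2600)   # prediction for a new x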

5). Explain Multivariate analysis with an example.


It refers to any statistical technique used to analyse more complex sets of data.
There are more than 20 different methods to perform multivariate analysis and
which method is best depends on the type of data and the problem you are trying to
solve. Essentially you build models that reflect an actual product or process and
optimise it using different methods.
Multivariate analysis is typically used for:
• Quality control and quality assurance
• Process optimisation and process control
• Research and development
• Consumer and market research
Multivariate analysis is used to study more complex sets of data than what
univariate analysis methods can handle. This type of analysis is almost always
performed with software (e.g., SPSS or SAS), as working with even the smallest of
data sets can be overwhelming by hand.
Multivariate analysis can reduce the likelihood of Type I errors. Sometimes,
univariate analysis is preferred as multivariate techniques can result in difficulty
interpreting the results of the test. For example, group differences on a linear
combination of dependent variables in MANOVA can be unclear. In addition,
multivariate analysis is usually unsuitable for small sets of data.
There are more than 20 different ways to perform multivariate analysis. Which one
you choose depends upon the type of data you have and what your goals are. For
example, if you have a single data set you have several choices:
• Additive trees, multidimensional scaling, and cluster analysis are appropriate
when the rows and columns in your data table represent the same units and the
measure is either a similarity or a distance.
• Principal component analysis (PCA) decomposes a data table with correlated
measures into a new set of uncorrelated measures.
• Correspondence analysis is similar to PCA. However, it applies to
contingency tables.
Although there are fairly clear boundaries with one data set (for example, if you
have a single data set in a contingency table your options are limited to
correspondence analysis), in most cases you’ll be able to choose from several
methods.
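As a minimal sketch, assuming scikit-learn is available, principal component
analysis decomposes a table of correlated measures into uncorrelated components
(the data are simulated):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = 0.8 * x1 + 0.2 * rng.normal(size=200)   # deliberately correlated with x1
    x3 = rng.normal(size=200)
    data = np.column_stack([x1, x2, x3])
    pca = PCA(n_components=3).fit(data)
    print(pca.explained_variance_ratio_)   # first component captures most variance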
