
Types of Analytical Procedure

Having ensured that our survey was carried out successfully, that our research hypotheses were formulated correctly, that our sampling was scientifically designed, that our survey instruments were properly constructed, tested, and validated, and, finally, that the data were correctly entered onto the computer, our next task is to plan the analysis of the data. Data analysis is necessary for testing hypotheses, for answering our research questions, and for furthering our overall goal of understanding social phenomena.
Our data may be interpreted and presented in entirely verbal terms, particularly in observational and document studies. However, when we are dealing with quantitative data, we prefer to employ statistical techniques to analyse them. Our goal in statistical analysis can be achieved through the processes of description, explanation, and prediction. Of the three tasks, descriptive analysis refers to the transformation of raw data into a form that makes them easy to understand and interpret. Describing responses or observations is typically the first form of analysis. Descriptive analysis simply attempts to tell what the data “look like”: for example, how many cases were analysed, what the range of scores was, what the mean score was, how individual scores differ from each other, and so forth. This is often conducted for one variable at a time, which is why it is called univariate analysis. Explanation and prediction are generally more complicated than description and require more comprehension as well as more interpretation. Explanatory statistical analysis can take several forms but generally consists of the analysis of the relationship between two or more variables. This is usually accomplished through a variety of statistical techniques: tests of significance, correlation analysis, regression analysis, and the like.
The following sections provide a brief overview of the methods of data analysis under:
1) Univariate, 2) Bivariate, 3) Tri-variate, and 4) Multivariate Analysis.
1) Univariate Analysis
The first step in seeing what your data look like is to examine each variable separately. This
can be accomplished by getting the distribution of each variable one by one. Such single-
variable analysis is called univariate analysis, that is, analysis based on one
variable. The simplest form of single-variable analysis is to count the number of cases in each
category. The resulting count is called a frequency distribution. We can form frequency
distributions of such single variables as religion (which is measured on a nominal scale),
level of education (ordinal scale), temperature (interval scale) and age (ratio scale).
A frequency distribution is, however, not usually very interesting and informative without
additional statistical manipulations. Several statistical measures can be obtained from a
frequency distribution. Still, the precise nature of the permissible measures will depend on the type of variable, or, more accurately, on the level of measurement. The commonly used levels of measurement are nominal, ordinal, interval, and ratio. The accompanying table shows the level of education of a group of women as obtained in the 1993-94 BDH Survey.
The education level shown in column 1 is the variable, measured on an ordinal scale. The distribution is a univariate one. The second and third columns represent the absolute and percentage frequencies, respectively. The frequencies are absolute numbers and do not lend themselves to meaningful interpretation unless they are standardized for size. This is all the more so when two or more distributions are to be compared. By forming proportions or percentages, this problem can be removed. Percentages serve two purposes in data analysis. First, they simplify by reducing all numbers to a range from 0 to 100. Second, they translate the data into a standard form, with a base of 100, for relative comparison. Caution must, however, be exercised in using percentages. Note that all percentage values must add to 100 (unless multiple responses are allowed), and percentage values cannot ordinarily be averaged.
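As an illustration of these two purposes, the sketch below shows how a frequency distribution with absolute and percentage frequencies might be produced. It uses Python with pandas rather than SPSS; the education categories and counts are invented, not the actual survey figures:

```python
import pandas as pd

# Invented education levels for ten respondents (illustrative only)
education = pd.Series([
    "No education", "Primary", "No education", "Secondary",
    "Primary", "No education", "Higher", "Secondary",
    "No education", "Primary",
])

counts = education.value_counts()                        # absolute frequencies
percents = education.value_counts(normalize=True) * 100  # base-100 percentages

table = pd.DataFrame({"Frequency": counts, "Percent": percents.round(1)})
print(table)
print("Total percent:", table["Percent"].sum())  # adds to 100 (barring rounding)
```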
Note that the variable ‘education level’ is measured on an ordinal scale, so we cannot go much beyond the type of analysis presented in the above table. However, we can attempt to obtain the median, a measure of central tendency, since it is possible to rank the women according to their level of education. The median category is the ‘no education’ category, since, on cumulating the frequencies, the (100/2 =) 50th woman belongs to this category when the women are arranged in ascending order of education level. The mode, another measure of central tendency, also falls in the same category in this particular case. Graphical presentation of the data can also work well in the present instance to describe the data in hand. Pie and bar diagrams appear to be the best choices in this instance.
When quantitative data (interval or ratio) are at hand, other descriptive measures, such as the mean, standard deviation, and coefficient of variation, in addition to the median and mode, can be attempted within the purview of univariate analysis. Consider the ‘Sex Workers’ data, where we have collected such data as age, height, weight, income, and BMI. We can compute the mean, median, and mode for each of these variables directly from the raw data, either with a calculator or with a computer using SPSS. These statistics, however, do not tell us much about the data unless we analyse them from a comparative perspective.
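For instance, a minimal sketch (in Python with pandas rather than SPSS; the age values are invented) of how these univariate summaries might be computed:

```python
import pandas as pd

# Invented ages for a small sample (illustrative only)
age = pd.Series([22, 25, 19, 31, 27, 22, 35, 29, 24, 22])

mean = age.mean()
median = age.median()
mode = age.mode().iloc[0]   # mode() returns every modal value; take the first
sd = age.std()              # sample standard deviation (n - 1 in the denominator)
cv = 100 * sd / mean        # coefficient of variation, as a percentage

print(f"mean={mean:.1f}, median={median}, mode={mode}, sd={sd:.2f}, CV={cv:.1f}%")
```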
The above estimates have been made from two separate univariate distributions: those of the brothel-based sex workers and the street-based sex workers. The standard deviation gives the average distance, or variability, of individual observations from the group mean. Other measures of variability are the range, quartile deviation, and coefficient of variation. Do the two groups of sex workers differ significantly with respect to their age, income, and so on? To answer this question, one can perform the ‘equality of two means’ test to assess whether the differences are significant.
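A minimal sketch of such a comparison, using scipy’s two-sample t test (scipy.stats.ttest_ind); the two groups and their ages are invented for illustration:

```python
import pandas as pd
from scipy import stats

# Invented ages for two groups of respondents (illustrative only)
df = pd.DataFrame({
    "group": ["brothel"] * 5 + ["street"] * 5,
    "age":   [24, 27, 22, 30, 26, 21, 19, 23, 20, 22],
})

# Compare the two univariate distributions descriptively
print(df.groupby("group")["age"].agg(["count", "mean", "std"]))

# Equality-of-two-means test
brothel = df.loc[df["group"] == "brothel", "age"]
street = df.loc[df["group"] == "street", "age"]
t, p = stats.ttest_ind(brothel, street)
print(f"t = {t:.2f}, p = {p:.3f}")  # a small p suggests a significant difference
```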
Bivariate Analysis
Bivariate presentation places two variables together in a single table in such a manner that their interrelationships can be examined. The table may be based on two nominal-scale variables, two ratio-level variables, or any combination of them. Such tables are called bivariate tables or cross tables. Cross tables based on numerical data (interval or ratio) are sometimes called correlation tables. Tables that are constructed solely on the basis of nominal data are called contingency tables.
By tradition and convention, one variable, called the column variable, is usually labeled across the top so that its categories form columns running vertically down the page. The second variable, or row variable, is labeled on the left margin, with its categories forming the rows running horizontally across the page. Since it is always possible to interchange the rows and columns of any table, general rules about when to use row and column percentages cannot be given. It is, however, generally advisable to percentage along the independent variable. If the independent variable is the row variable, select row percentages; if the independent variable is the column variable, select column percentages.
As an example, imagine that we are analysing a survey question that asks: Do you approve of abortion? (Yes/No). We find from a preliminary analysis that gender is an important variable in determining the response to this question and decide to construct a bivariate table containing these two variables. A person’s opinion cannot affect his or her sex, but sex can affect opinion. Thus sex is the independent variable, and opinion on abortion is the dependent variable. The table below shows the results of this investigation. By percentaging on the independent variable (sex), we can see whether a change in the independent variable (e.g., from male to female) results in a different distribution of yes/no (i.e., favor/do not favor) scores on the dependent variable. Here is a possible analysis of the table:
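The original table is not reproduced here, but the sketch below shows how such a bivariate table, percentaged on the independent variable, could be built in Python with pandas; the individual responses are invented, not the survey’s actual figures:

```python
import pandas as pd

# Invented responses to "Do you approve of abortion?" (illustrative only)
df = pd.DataFrame({
    "sex":     ["Male", "Male", "Female", "Female", "Male", "Female", "Female", "Male"],
    "opinion": ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
})

# Raw bivariate (contingency) table: sex as the row variable
print(pd.crosstab(df["sex"], df["opinion"]))

# Percentaging on the independent variable: sex is the row variable,
# so row percentages are taken and each row sums to 100
print(pd.crosstab(df["sex"], df["opinion"], normalize="index") * 100)
```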
The types of analytical techniques that are appropriate for studying bivariate relationships depend on the nature of the variables: whether they are nominal, ordinal, or interval. We present below a brief overview of the type of data needed to accomplish different types of bivariate analyses, indicating the possible statistical tests that can be applied.
When the data are measured on a nominal scale:
Most often, we are interested in determining whether the observed differences within the data could have occurred by chance alone. In the example above, where both variables are nominal, 40% of the males, in contrast to 60% of the females, are in favor of abortion. Is this difference statistically significant, or could it have happened by chance alone? The chi-square test is probably the most commonly used statistical test for answering this question. However, the chi-square statistic does not measure the strength of the relationship. For that purpose, a “measure of association” is needed: we can employ such measures as the phi coefficient and Cramer’s V, which are derived from the chi-square value.
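A minimal sketch of the chi-square test and a chi-square-based measure of association, assuming 100 respondents of each sex so that the counts match the 40% versus 60% figures above:

```python
import numpy as np
from scipy import stats

# 2 x 2 table of counts: rows are males/females, columns are yes/no (illustrative)
observed = np.array([[40, 60],    # males:   40 in favor, 60 against
                     [60, 40]])   # females: 60 in favor, 40 against

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

# Cramer's V, derived from the chi-square value; for a 2 x 2 table it
# coincides with the phi coefficient
n = observed.sum()
k = min(observed.shape) - 1
print(f"Cramer's V = {np.sqrt(chi2 / (n * k)):.2f}")
```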
When the data are measured on the ordinal scale:
There are several different measures of association for cross-tabulations of ordinally measured variables. Perhaps the most commonly used measure of association for such tables is called gamma. In its fourfold (2 x 2) form, gamma is called Yule’s Q; when the table has more than four cells, the coefficient is simply called gamma. The chief disadvantage of gamma as a measure of association is that there is no simple significance test for evaluating it.
When the data are measured on an interval scale:
Relationships between interval variables may be studied with or without cross-tabulation. If a cross-tabulation of interval variables is made, one can attempt to compute gamma or Cramer’s V to examine the apparent nature of the relationship between the variables. It is, however, more common to measure the relationship between pairs of interval variables without reference to any cross-tabulation, using Pearson’s product-moment correlation coefficient, denoted by r. A t-test can assess the statistical significance of r. The correlation coefficient tells us how strongly two variables measured on at least an interval scale are related. Still, it does not enable us to predict an individual’s value or score on one variable from knowledge of his or her score on the second variable.
Regression analysis is a technique that allows us to make such a prediction. In this case, the
measure of association is the zero-order regression coefficient, which indicates the average
amount of change in the dependent variable associated with a unit change in the independent
variable.
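For example, a minimal sketch of both steps in Python with scipy (the paired values are invented): scipy.stats.pearsonr gives r with its t-test-based p-value, and scipy.stats.linregress gives the zero-order regression coefficient:

```python
import numpy as np
from scipy import stats

# Invented paired interval data: years of schooling and income (illustrative)
schooling = np.array([0, 2, 5, 8, 10, 12, 14, 16])
income = np.array([50, 55, 70, 80, 95, 110, 120, 140])

# Pearson's product-moment correlation; the p-value comes from a t test on r
r, p = stats.pearsonr(schooling, income)
print(f"r = {r:.3f}, p = {p:.4f}")

# Simple regression: the slope is the average change in income
# associated with a one-unit change in schooling
result = stats.linregress(schooling, income)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
```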
Multivariate Analysis
Analyses that permit a researcher to study the effect of controlling for one or more variables are called multivariate analyses, since they involve multiple (more than two) variables. Most multivariate techniques also permit the measurement of the degree of relationship between a dependent variable and two or more independent variables considered simultaneously. The most commonly used multivariate techniques include, among others, multiple regression analysis, multiple classification analysis (MCA), discriminant analysis, multivariate analysis of variance (MANOVA), logistic regression analysis, and hazard analysis. Other methods in multivariate settings are factor analysis, cluster analysis, and multidimensional scaling. Multivariate techniques can be very powerful analytical tools, but they must be used with great caution. They are all based on numerous assumptions that are very difficult to meet in most social science research. As a result, findings are often not valid. Your plan of analysis should not include any multivariate technique without ensuring its applicability. Before concluding this section, we emphasize that, as a first step in multivariate analysis, readers are advised to start with regression analysis.
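Following that advice, a minimal multiple-regression sketch using statsmodels (the data are simulated, purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: income predicted from age and years of schooling (illustrative)
rng = np.random.default_rng(0)
age = rng.uniform(18, 60, 100)
schooling = rng.uniform(0, 16, 100)
income = 20 + 0.8 * age + 3.0 * schooling + rng.normal(0, 10, 100)

# Two independent variables considered simultaneously, plus an intercept
X = sm.add_constant(np.column_stack([age, schooling]))
model = sm.OLS(income, X).fit()
print(model.summary())  # coefficients, t tests, R-squared, diagnostics
```

The usual caveat applies: the validity of the fitted coefficients rests on the regression assumptions (linearity, independent observations, homoscedastic and approximately normal errors) being at least approximately met.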
Hypothesis (Null and Alternate Hypothesis)
Generating a hypothesis is the beginning of the scientific process. A hypothesis is a supposition based on reasoning and evidence. The researcher examines it through observations and experiments, which provide facts and forecast possible outcomes. A hypothesis can be inductive or deductive, simple or complex, null or alternative. The null hypothesis is the hypothesis that is actually tested, whereas the alternative hypothesis gives an alternative to the null hypothesis. In statistical hypothesis testing, the null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.
Comparison Chart

| Basis for comparison | Null hypothesis | Alternative hypothesis |
| --- | --- | --- |
| Meaning | A statement in which there is no relationship between two variables. | A statement in which there is some statistical significance between two measured phenomena. |
| Represents | No observed effect | Some observed effect |
| What is it? | It is what the researcher tries to disprove. | It is what the researcher tries to prove. |
| Acceptance | No changes in opinions or actions | Changes in opinions or actions |
| Testing | Indirect and implicit | Direct and explicit |
| Observations | Result of chance | Result of real effect |
| Denoted by | H0 (H-zero) | H1 (H-one) |
| Mathematical formulation | Equal sign | Unequal sign |

Definition
Null hypothesis
A null hypothesis is a statistical hypothesis which states that no significant difference exists between the set of variables. It is the original or default statement, with no effect, and is often represented by H0 (H-zero). It is always the hypothesis that is tested. A null hypothesis can be rejected, but it cannot be accepted on the basis of a single test alone.
Alternate hypothesis
A statistical hypothesis used in hypothesis testing which states that there is a significant difference between the set of variables. It is often referred to as the hypothesis other than the null hypothesis and is often denoted by H1 (H-one). It is what the researcher seeks to prove in an indirect way, by using the test. The acceptance of the alternative hypothesis depends on the rejection of the null hypothesis; that is, until and unless the null hypothesis is rejected, an alternative hypothesis cannot be accepted.
Their key differences
1. A null hypothesis is a statement in which there is no relationship between two variables. An alternative hypothesis is a statement that is simply the inverse of the null hypothesis, i.e., there is some statistical significance between two measured phenomena.
2. A null hypothesis is what the researcher tries to disprove, whereas an alternative hypothesis is what the researcher wants to prove.
3. A null hypothesis represents no observed effect, whereas an alternative hypothesis reflects some observed effect.
4. If the null hypothesis is accepted, no changes will be made in opinions or actions. Conversely, if the alternative hypothesis is accepted, it will result in changes in opinions or actions.
5. As the null hypothesis refers to a population parameter, the testing is indirect and implicit. On the other hand, the alternative hypothesis relates to a sample statistic, wherein the testing is direct and explicit.
6. A null hypothesis is labelled H0 (H-zero), while an alternative hypothesis is represented by H1 (H-one).
7. The mathematical formulation of a null hypothesis uses an equal sign, while that of an alternative hypothesis uses a not-equal (or inequality) sign.
8. In the null hypothesis, the observations are the outcome of chance, whereas, in the case of the alternative hypothesis, the observations are an outcome of a real effect.
Thus, there are generally two outcomes of a statistical test: first, the null hypothesis is rejected and the alternative hypothesis is accepted; second, the null hypothesis is accepted on the basis of the evidence. In simple terms, a null hypothesis is just the opposite of the alternative hypothesis.
T-test & ANOVA
When it comes to comparing the means of two or more population groups, ANOVA (analysis of variance) and the t-test are the two preferred methods. Both the t-test and ANOVA examine whether group means differ from one another: the t-test compares two groups, while ANOVA can compare more than two groups.
Both the t-test and ANOVA are statistical methods of testing a hypothesis, and they share the following assumptions (their similarities):
 The sample drawn from the population is normally distributed
 Homogeneous variance
 Random data sampling
 Observations are independent
 The dependent variable is measured on an interval or ratio level
Their Difference
The t test is used to compare the means of two independent samples/groups, whereas ANOVA is used to compare the means of more than two independent samples/groups. The t test is used for comparisons between two groups with a sample size (n) of less than 30 in each group, while ANOVA is used to compare three or more groups. The t-test is a parametric inferential statistical method used for comparing the means of two different groups (two-sample t-test) or comparing a mean with a specific value (one-sample t-test). It provides more conservative results for small sample datasets.
T-Test
The Student's t test (also called the t test) is used to compare the means between two groups, and there is no need for multiple comparisons, as a unique P value is observed. There are three types of t test: the one-sample t test, the independent samples t test, and the paired samples t test.
One-sample t-test
It is a statistical procedure used to determine whether the mean value of a sample is statistically the same as, or different from, the mean value of the parent population from which the sample was drawn. To apply this test, the sample mean, standard deviation (SD), and sample size (test variable), together with the population mean or hypothetical mean value (test value), are used. The sample should be a continuous variable and normally distributed. It is used when the sample size is <30.
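A minimal sketch of the one-sample t test with scipy (the sample values and the test value are invented):

```python
from scipy import stats

# Invented sample and a hypothetical population mean (test value)
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
test_value = 12.0

# H0: the population mean equals the test value; H1: it differs
t, p = stats.ttest_1samp(sample, popmean=test_value)
print(f"t = {t:.2f}, p = {p:.3f}")  # reject H0 when p falls below the chosen alpha
```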
Independent samples t-test
Also called the unpaired t test, this is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated (independent) groups. To apply this test, a continuous, normally distributed variable (test variable) and a categorical variable with two categories (grouping variable) are used.
Paired samples t-test 
The paired samples t test, sometimes called the dependent samples t-test, is used to determine whether the change in means between two paired observations is statistically significant. In this test, the same subjects are measured at two time points or observed by two different methods.
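A minimal sketch with scipy’s paired t test (the before/after measurements are invented):

```python
from scipy import stats

# Invented measurements on the same six subjects at two time points
before = [80, 85, 78, 90, 84, 88]
after = [76, 82, 75, 85, 80, 86]

# H0: the mean of the paired differences is zero
t, p = stats.ttest_rel(before, after)
print(f"t = {t:.2f}, p = {p:.3f}")
```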
ANOVA
ANOVA first gives a common P value. A significant P value from the ANOVA test indicates that there is at least one pair between which the mean difference is statistically significant. To identify that significant pair (or pairs), we use multiple comparisons. In ANOVA, when one categorical independent variable is used, it is called one-way ANOVA, whereas with two categorical independent variables, it is called two-way ANOVA.
One-way ANOVA 
 One-way ANOVA is an extension of the independent samples t test
 Used for independent observations
 Uses one categorical independent variable
 Means are compared among three or more independent groups
 In this test, one continuous dependent variable and one categorical independent variable are used, where the categorical variable has at least three categories (see the sketch below)
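A minimal one-way ANOVA sketch with scipy (the three groups’ scores are invented):

```python
from scipy import stats

# Invented scores for three independent groups (illustrative only)
group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 32, 29, 31]
group_c = [22, 24, 23, 25, 21]

# H0: all three group means are equal
f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p:.4f}")
# A significant p only says that at least one pair of means differs; a
# multiple-comparison procedure (e.g., Tukey's HSD) identifies which pair(s).
```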
Two-way ANOVA
 It is an extension of one-way ANOVA
 Two independent variables are used
 Its primary purpose is to understand whether there is any interaction between the two independent variables in their effect on a dependent variable
 In this test, a continuous dependent variable (approximately normally distributed) and two categorical independent variables are used (see the sketch below)
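A minimal two-way ANOVA sketch using statsmodels’ formula interface, with an interaction term between the two factors (the data are invented):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented balanced data: score by sex and treatment (illustrative only)
df = pd.DataFrame({
    "sex":       ["M", "M", "F", "F"] * 5,
    "treatment": ["A", "B", "A", "B"] * 5,
    "score":     [20, 24, 22, 30, 21, 25, 23, 31, 19, 23,
                  22, 29, 20, 26, 24, 32, 21, 24, 23, 28],
})

# 'C(sex) * C(treatment)' fits both main effects and their interaction
model = ols("score ~ C(sex) * C(treatment)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F tests for main effects and interaction
```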
Definition of outlier in research: An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Generally, they are bad data points.
| Comparison variable | T-test | ANOVA |
| --- | --- | --- |
| Definition | A statistical hypothesis test used to compare the means of two population groups. | A statistical technique used to compare the means of more than two population groups. |
| Test statistic | t-value | F-value (a ratio of variances) |
| Utilization | t-tests are used for pure hypothesis-testing purposes. | ANOVA is used to examine variances among groups. |
| Feature | The t-test compares two samples, each of size (n) below 30. | ANOVA compares three or more such groups. |
| Error | The t-test is less likely to commit an error. | ANOVA has more error risks. |
| Example | Samples of class A and class B students who have taken a mathematics course may have different means and standard deviations. | When one crop is being cultivated from various seed varieties. |
| Test | The t-test can be performed as a two-sided or one-sided test. | ANOVA is a one-sided test, since variance cannot be negative. |
| Population | The t-test is used when the sample is less than 30. | ANOVA is used for large population counts. |

Generally, Student's t test, ANOVA, and ANCOVA are the statistical methods frequently used to analyze data. Two things are common among these methods: the dependent variable must be on a continuous scale and normally distributed, and the comparisons are made between means. All the above methods are parametric methods.[2] When the sample size is small, the mean is greatly affected by outliers, so it is necessary to keep a sufficient sample size while using these methods.
