Chapter 9

RESEARCH MATTERS: CHOICE OF APPROPRIATE STATISTICS, DATA INTERPRETATION AND
EXPLANATION OF OFTEN MISUSED TERMINOLOGIES

Professor Emmanuel E. Achor, PhD


Department of Curriculum & Teaching, Faculty of Education,
Benue State University, Makurdi; nuelachor@yahoo.com

Abstract
This paper addresses what matters most in research, including the choice of
an appropriate statistic and data interpretation. It also explains some
confusing terminologies, including those whose use in some contexts is a
subject of debate.

Key words: Choice of statistic; data interpretation; scales; Likert scale; Chi
square; difference between achievement and performance; influence, impact,
effect and perceptual studies.

Introduction
In recent times, many researchers the world over have raised eyebrows at
some of the choices of statistics made, the patterns of data analysis and
interpretation, and even the way tables are presented (not in line with the
6th edition of the APA style). Many claim that what is in use is far
outdated. Even among experts, there are sometimes disagreements over the
manner in which data are analysed and interpreted with reference to the
statistical tool used. Sometimes it is alleged that some lecturers are old
school.

Electronic copy available at: https://ssrn.com/abstract=2952685

In view of these complaints and the confusion, there is a need for
agreement on what should be accepted or rejected, keeping in mind what is
current and generally acceptable (in line with the 6th edition of the APA
style). This is precisely what informed this topic.
In this discussion, emphasis is on the appropriate choice of statistic, data
analysis and interpretation, decision making on hypotheses, conclusion,
recommendation, limitation and contribution to knowledge. Other issues
treated are the use of the Likert scale and its interpretation, the use of
chi square, the difference between achievement and performance, and the
distinction between effect, influence and impact studies, among others.

Choice of Statistics and Data Analysis


The choice of statistics is guided primarily by the nature of the data
to be collected (discrete or continuous). Other factors include all or some
of the following: the design of the study, the type of instrument used in
collecting data, the statement of your hypotheses, what is acceptable, etc.
Some general rules and examples are presented in Table 1.

The necessary conditions for the use of parametric statistics

1. The distribution of the population must be normal.
2. The observations on the variable must be independent.
3. The groups must have equal variances, i.e. homogeneity of variance.


Table 1: Choice of statistics and conditions

Statistic: Mean & standard deviation
Possible design(s): Applicable to all designs.
Likely instrument(s): Achievement test, Likert scales & other similar types.
Remarks: Possible with continuous data and Likert scales, but not with head
counts or data from inventory instruments, etc.

Statistic: Frequency counts, percentages & graphs
Possible design(s): Applicable to survey, anecdotal study & demographic data
or biodata of all instruments.
Likely instrument(s): Generally, inventory and scales.
Remarks: Generally the data are discrete or non-continuous. Percentage has a
more robust use, even with continuous data.

Statistic: Chi square (two basic types)
Possible design(s): Surveys, ex post facto, demographic data in any design.
Likely instrument(s): Inventory ratings (categorical data), counts (numerical
data). Generally deals with discrete data.
Remarks: This is used where the assumptions of parametric statistics about
the distribution are not satisfied. We have the chi square test of goodness
of fit/contingency and the chi square test of difference/independence. It
must be noted that any given data, no matter what, can fit into one form of
chi square or the other, and sometimes some have no interpretation. Watch!
The chi square test generally is about the weakest statistic for testing
hypotheses and should be used with caution.

Statistic: t-test (two main types: independent t-test and dependent or
related t-test)
Possible design(s): Surveys, ex post facto, true experiment, etc.
Likely instrument(s): Achievement test, measurements. Data from Likert scales,
transformed. Note that it must be the means that will be used.
Remarks: A parametric statistic. It tests the difference between means, and
this must reflect in the way you state your hypotheses (one or two tailed).
The source of data (different groups or the same group of people) determines
whether you should use independent or dependent t-tests (that is, unrelated
or related t-tests). Watch! You can use the t-test in a quasi-experimental
study with caution, i.e. by first establishing equivalence among subjects
with the pretest results.

Statistic: ANOVA (Analysis of Variance)
Possible design(s): Surveys, ex post facto, true experiment, etc.
Likely instrument(s): Achievement test, measurements. Data from Likert scales
(transformed), converted, or means used can fit in.
Remarks: A parametric statistic; it has the same conditions as the
independent t-test and is used when you have more than two groups to compare.
When such happens, posthoc analysis using the Scheffe or LSD test, etc. is
necessary to know the direction of significance, by pairing the groups in
twos for comparison.

Statistic: ANCOVA (Analysis of Covariance)
Possible design(s): Quasi-experimental study, mainly the pretest posttest
type. Assumes that the study groups are not equivalent.
Likely instrument(s): Achievement test, measurements. Data from Likert scales,
transformed, or means used can fit in.
Remarks: Can be used for 2 or more groups. If above 2 groups, pairwise
analysis will be necessary to know the direction of significance. However, if
the data do not satisfy the assumptions of ANCOVA, ANOVA of the posttest
scores using the pretest measure as a blocking variable will be appropriate.
Also, gain mean could be used.

Statistic: Other statistics exist and we need to read before choice.
Remarks: Regression and correlation (r) could give the direction as well as
the magnitude of the relationship between two or more variables.

Reliabilities

Statistic: Cronbach Alpha
Possible design(s): Surveys, ex post facto, experiments, etc.
Likely instrument(s): Polytomously scored instruments, e.g. Likert scales &
their kinds generally.
Remarks: Good for Likert scales & their kinds, where generally there is no
one right or wrong answer.

Statistic: Kuder Richardson (two types)
Possible design(s): Any design.
Likely instrument(s): Dichotomously scored instruments. K-R20 is best for an
instrument with items which vary in terms of their difficulty level. K-R21 is
best for an achievement test with items of the same difficulty level.
Remarks: Achievement test, short answer questions, e.g. true or false, yes or
no types. Can be used for other instruments with caution. **Many people do
not mind this difference in application or sometimes exchange them. Please
support its use with a source, e.g. Nworgu (2015) is in line with the
position in this paper.

Statistic: Spearman Rank Order correlation
Possible design(s): Any design.
Likely instrument(s): Essay tests, etc. Can be used for two judges or scorers
only.
Remarks: If there are more than two judges or scorers, use other statistics;
some use ANOVA, but the result does not give you “r”. Simply, if the
differences among the scorers (3 & above) are significant then it is
reliable.

Statistic: Spearman Brown Prophecy
Possible design(s): Any design.
Likely instrument(s): Any instrument split into two, e.g. odd and even, or
first and second half.
Remarks: Split half type of reliability. Best with achievement tests, and the
total item number should be EVEN.

Statistic: Pearson Product Moment
Possible design(s): Experimental designs, mainly with a retention test.
Likely instrument(s): Any instrument used, but preferably an achievement test.
Remarks: Can be used for two judges or scorers only.

Statistic: For other statistics please read before making choices.
Remarks: If more than two scorers are used in any case above, then the
Kendall Coefficient of Concordance W would be applicable.

Table 2 should be considered as a guide only, and each case should be
considered on its merits.

Table 2: Choice of statistics

(a) If data are censored.
(b) The Kruskal-Wallis test is used for comparing ordinal or non-Normal
variables for more than two groups (it is used when the distribution
violates the assumptions of one-way ANOVA), and is a generalisation of the
Mann-Whitney U test (the non-parametric counterpart of the independent
t-test for two groups). The details of the technique are described in more
advanced books, and it is available in common software (Epi-Info, Minitab,
SPSS).
(c) If the outcome variable is the dependent variable, then provided the
residuals (see) are plausibly Normal, the distribution of the independent
variable is not important.
(d) There are a number of more advanced techniques, such as Poisson
regression, for dealing with these situations. However, they require certain
assumptions and it is often easier to either dichotomise the outcome variable
or treat it as continuous.

Presentation and Interpretation of Results


We shall use the following tables for presentation and interpretation of
results.

Table 3: ANOVA Test of Mean Difference among JS 1, JS 3 and SS 3.

                  Sum of Squares   Df    Mean Square   F       Sig.
Between Groups    1.276            2     .638          3.615   .028
Within Groups     52.440           297   .177
Total             53.717           299

Table 4: Scheffe Posthoc Test on Mean Rating of JS 1, JS 3 and SS 3.

(I) Class   (J) Class   Mean Difference (I-J)   Std. Error   Sig.
JS 1        JS 3        -.07691                 .05930       .432
JS 1        SS 3        -.15901*                .05914       .028
JS 3        SS 3        -.08210                 .05988       .392

For the ANOVA, F(2, 297) = 3.62, p = .028 (approximately 0.03), which is
less than 0.05. This is significant and we therefore reject the null
hypothesis. You either reject or do not reject a null hypothesis. When p is
less than or equal to 0.05 the result is significant, but it is not
significant if p is greater than 0.05. This is the reverse of the old style
in terms of t-crit and t-cal; e.g. if t-cal ≥ t-crit, it is significant, but
if t-cal < t-crit, it is not significant. The old style of t-calculated and
t-critical is out of use, though correct. (This is because the output from
electronic analysis with SPSS and other packages gives both the value of the
test statistic and the associated level of significance, unlike the manual
method, where you need to read the critical value from a book.) An alpha
value of 0.05 means that we accept a 5 in 100 chance of rejecting a null
hypothesis that is actually true; loosely, in every 100 decisions there is
a chance that 95 are correct and 5 wrong. Put differently, for 100 patients
treated with a new drug under test, there is the chance that 5 persons
will die. This informs the choice of an alpha value of 0.01 in the health
sciences, engineering, etc.: we want the drug tested such that no person
dies, or at most only one person dies out of 100. In education and the
social sciences we need to note that when we reduce the probability of
making a Type I (α) error (rejecting a null hypothesis that should not have
been rejected, which is more likely when the alpha level is set high, e.g.
0.1), we increase the probability of making a Type II (β) error (failing to
reject a null hypothesis that should have been rejected, which is more
likely when the alpha value is set very low). It is therefore desirable to
strike a balance on an alpha level that is neither too small nor too large,
since we are dealing with objects, human opinions and events.
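
As a rough check on Table 3, the F ratio can be recomputed from the sums of squares and degrees of freedom, and the decision rule stated above can be written as a small function. This is only an illustrative sketch in Python; the names `f_ratio` and `decide` are assumptions introduced here and are not part of the SPSS output.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return F = (mean square between) / (mean square within)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


def decide(p_value, alpha=0.05):
    """Apply the rule in the text: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Sums of squares and degrees of freedom copied from Table 3.
f_value = f_ratio(1.276, 2, 52.440, 297)
print(round(f_value, 2))   # close to the 3.615 reported in Table 3
print(decide(0.028))       # p-value from Table 3
print(decide(0.896))       # p-value from Table 5
```

The recomputed value differs slightly from the tabled 3.615, presumably because the sums of squares printed in the table are themselves rounded.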

Table 5: t-Test on mean difference between Male and Female Creativity rating

Students' Mean Rating on Creativity
Gender   N     Mean     Std. Deviation   df    t       Sig. (2-tailed)
Female   157   3.0916   .4576            296   0.131   0.896
Male     143   3.1565   .3822

t(296) = 0.13, p = 0.896 > 0.05; it is not significant. Do not reject the null hypothesis.

Table 6: Influence of Fear Caused by Clashes on Educational Development

Items (Description; N; Mean; Std. Deviation; Remarks)

1. Some parents do not allow their children to go to school during communal
conflict for fear of being killed.
N = 600; Mean = 3.2567; Std. Deviation = 1.19156; Remarks: Agree
2. It has been alleged that during conflict, most parents do not allow their
children to go to school in their neighbourhood.
N = 600; Mean = 3.2867; Std. Deviation = 1.14163; Remarks: Agree
3. Teachers and students abandon their schools to safer places during
communal conflict.
N = 600; Mean = 3.1233; Std. Deviation = 1.17207; Remarks: Agree
4. Communal conflict usually affects students’ school attendance.
N = 600; Mean = 3.1617; Std. Deviation = 1.11995; Remarks: Agree
5. I prefer to remain indoors with my children than risk their lives to
school during conflict.
N = 600; Mean = 3.2250; Std. Deviation = 1.12835; Remarks: Agree
Cluster mean: N = 600; Mean = 3.2100; Std. Deviation = -; Remarks: Agree

Upper and lower boundaries: Strongly Disagree 1.00 – 1.49; Disagree 1.50 – 2.49;
Undecided 2.50 – 3.00; Agree 3.10 – 4.49; Strongly Agree 4.50 – 5.00
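
The way the cluster mean and the remarks in Table 6 are obtained can be sketched as follows. This is a minimal illustration in Python, not the author's procedure: the function name `remark` is hypothetical, the bands are entered exactly as printed under the table, and returning `None` for a value that falls between two printed bands (for example between 3.00 and 3.10) is an assumption made here because the table does not say how such values are treated.

```python
from statistics import mean

# Item means copied from Table 6.
ITEM_MEANS = [3.2567, 3.2867, 3.1233, 3.1617, 3.2250]

# Decision bands exactly as printed under Table 6: (lower, upper, label).
BANDS = [
    (1.00, 1.49, "Strongly Disagree"),
    (1.50, 2.49, "Disagree"),
    (2.50, 3.00, "Undecided"),
    (3.10, 4.49, "Agree"),
    (4.50, 5.00, "Strongly Agree"),
]


def remark(value):
    """Return the band label for a mean rating, or None if no band covers it."""
    rounded = round(value, 2)  # the bands are stated to two decimal places
    for lower, upper, label in BANDS:
        if lower <= rounded <= upper:
            return label
    return None  # assumption: values in a gap between bands are left unclassified


cluster_mean = mean(ITEM_MEANS)
print(round(cluster_mean, 2), remark(cluster_mean))
```

Each item mean, and the cluster mean of about 3.21, falls in the printed Agree band, which matches the Remarks column.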

Table 7: Correlations

                                            Test1      Test2
Test1   Pearson Correlation                 1          .969**
        Sig. (2-tailed)                                .000
        Sum of Squares and Cross-products   3095.800   2840.800
        Covariance                          162.937    149.516
        N                                   20         20
**. Correlation is significant at the 0.01 level (2-tailed).

The Scales
The construction of the scales involves generating a list of statements
(questions and items) about what is being measured and providing a set of
graduated response options. Using these graduated options, a respondent is
expected to indicate a degree of agreement or disagreement with each
statement.
The advantage of Likert scales is that they are the most universal
method for survey data collection and are therefore easily understood. The
responses are easily quantifiable and amenable to mathematical analysis.
Since the scale does not require the participant to provide a simple and
concrete yes or no answer, it does not force the participant to take a
stand on a particular topic, but allows them to respond with a degree of
agreement; this makes answering questions easier for the respondent. Also,
the responses presented accommodate neutral or undecided feelings of
participants. These responses are very easy to code when collating data since
a single number represents the participant’s response. Likert surveys are also
quick, efficient and inexpensive methods for data collection. They have high
versatility and can be sent out through mail, over the internet, or given in
person.
Attitudes of the population for one particular item in reality exist on a vast,
multi-dimensional continuum. However, the Likert scale is uni-dimensional
and only gives 5-7 options of choice, and the space between each choice
cannot possibly be equidistant. Therefore, it fails to measure the true
attitudes of respondents. Also, it is not unlikely that people’s answers will
be influenced by previous questions, or will heavily concentrate on one
response side (agree/disagree). Frequently, people avoid choosing the
“extreme” options on the scale, because of the negative implications
involved with “extremists”, even if an extreme choice would be the most
accurate. While these remain strong criticisms of the use of the Likert scale
to measure attitude, they only call for its use with caution, as the scale is
still relevant.

Table 10: Likert Scale

Response options: Strongly Disagree (SD) = 1; Disagree (D) = 2; Neutral (N) = 3;
Agree (A) = 4; Strongly Agree (SA) = 5

Likert Type Items
Prayer and fasting has been a good experience for me.         SD  D  N  A  SA
My parents have provided support for my prayer and fasting
programme.                                                    SD  D  N  A  SA

Likert Items
I eat healthy foods on a regular basis.                       SD  D  N  A  SA
When I purchase food at the grocery store, I ignore "junk"
food.                                                         SD  D  N  A  SA

The difference between Likert-type items and Likert scale items is that
Likert-type items are single questions that use some aspects of the original
Likert response alternatives, e.g. the prayer and fasting items in Table 10.
Here, multiple questions may be used in a research instrument, but there is
no attempt by the researcher to combine the responses from the items into a
composite scale. Each item stands for an idea (e.g. experience, and provision
of support by parents). A Likert scale, on the other hand, is composed of a
series of four or more items that are combined into a single composite
score/variable during the data analysis process. Combined, the items are used
to provide a quantitative measure of a character or personality trait (in
Table 10 it is healthy eating; e.g. ignoring junk food still means eating
healthy foods).
The statements are framed such that half are positively cued whereas the
other half are negatively cued. To avoid response set, the positive and
negative statements are placed in alternate positions. The response options
or categories are weighted or scored in such a way that a higher value
indicates a more positive/intense response or attitude. Thus, for a
positively cued statement, the options are weighted or scored as follows:

SA = 5; A = 4; U = 3; D = 2; SD = 1
And for a negatively cued item, the weighting/scoring is reversed thus:
SA = 1; A = 2; U = 3; D = 4; SD = 5
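
The scoring and reversal just described, and the combination of items into a composite Likert scale score, can be sketched as below. This is a minimal illustration in Python; the function names `score_item` and `scale_score` and the four sample responses are hypothetical and do not come from any data in this paper.

```python
# Weights for a positively cued statement, as given in the text.
POSITIVE_WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def score_item(response, positively_cued=True):
    """Score one response; reverse the weight for a negatively cued item."""
    weight = POSITIVE_WEIGHTS[response]
    # On a 1-5 scale, reversing maps 5->1, 4->2, 3->3, 2->4, 1->5.
    return weight if positively_cued else 6 - weight


def scale_score(responses, cues):
    """Sum the item scores into one composite Likert scale score."""
    return sum(score_item(r, c) for r, c in zip(responses, cues))


# Hypothetical respondent answering four items; items 2 and 4 are negatively cued.
responses = ["SA", "D", "A", "SD"]
cues = [True, False, True, False]
total = scale_score(responses, cues)  # 5 + 4 + 4 + 5
print(total)
```

Reverse scoring before summing means that a higher composite score always indicates a more positive attitude, whichever way an individual statement is worded.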

Several variations of the Likert scale, with varying numbers of points and
descriptions (i.e. adjectives), can be found in the literature.
The response options are not always expressed in terms of degree of
agreement or disagreement. Other appropriate terms may be used in place of
agreement. For instance, other expressions (adjectival labels) or descriptions
such as those indicating degree of importance or adequacy could be used.
The use and interpretation of the ‘undecided’ or ‘neutral’ response category
in the Likert-type scale has become quite controversial. The main issues here,
which border on the weighting and position of the ‘undecided’ response
category, are examined below with proposals on how these could be resolved.

Weighting and Position of Neutral or Undecided in the Likert-Type Scale
The Likert-type scale, no doubt, enjoys a good measure of popularity among
educational and other social science researchers. This scale is constructed
(after Likert) to have a midpoint labeled either ‘undecided’ or ‘neutral’ or
‘uncertain’ on an opinion/attitude continuum ranging from “strongly
disagree” to “strongly agree”. On a five point scale ranging from 1 to 5, a
response of ‘undecided’ (or ‘neutral’ or ‘uncertain’) attracts a score of 3. In
other words, the scale point labeled ‘undecided’ or ‘neutral’ or ‘uncertain’ is
interpreted to represent some measure of opinion or attitude which has a
magnitude of 3 on a scale yielding a maximum score of 5. This practice and
interpretation seem to be illogical. This is because when a person is said to
be ‘neutral’ or ‘uncertain’ about an issue, then such a person has no
opinion/attitude on such an issue. A response of ‘undecided’ is therefore not
interpretable as a measure of opinion. Hence it would be absurd to assign a
weight or value of 3 to such a response category. By the same line of
argument, the point undecided cannot lie half-way on the continuum.
Perhaps the logic behind this practice is that as one progresses from one
pole of the continuum to the other, there is a point of transition, a mid-point
between the two opinion/attitude extremes. This sounds logical. Such a point
is conceivable and must correspond to some measure of opinion/attitude. But
to label such a point ‘undecided’ or ‘neutral’ or ‘uncertain’ is a logical
absurdity. ‘Undecided’ (or ‘neutral’ or ‘uncertain’) means having no
opinion/attitude. It corresponds to no measure of opinion/attitude. Therefore,
having no opinion cannot attract a score of 3 on a scale of maximum value
of 5. In fact, no matter the scale value, it should not attract any score.
Needless to say, this misinterpretation has supported the collection and
use of incorrect (spuriously high) data in opinion/attitude studies, a situation
which casts serious doubt on the validity of the results of such studies.

Resolving the Problem


How then can this basic scaling issue be resolved? One proposal is to drop
the undecided option from the scale and use a 4-point scale from strongly
agree to strongly disagree (SA-4, A-3, D-2 & SD-1). However, nothing could be
more real and practical than encountering people who are undecided, uncertain
or neutral about an issue, event or object. If an opinion/attitude instrument
fails to make provision for this real and practical situation, respondents
will be forced to indicate an opinion/attitude even though they truly have no
opinion/attitude on the particular issue. This proposal therefore raises a
new issue, namely that of forced response. Notwithstanding, the modified
4-point Likert-type scale as proposed here is recommended, as many experts in
measurement prefer this option.
The second proposal involves a modification in the position and weighting
of the undecided response category. This is based on the fact that the point
undecided (uncertain or neutral) is not located at the middle of the
continuum. Rather, it represents a situation or condition of the mind prior to
the formation of any opinion/attitude on the issue in question. It is not a
transition state from one type of opinion/attitude (say negative) to another
(say positive), as has been misconstrued by some people. Based on the
foregoing, the undecided point should mark the beginning of the scale and
be interpreted as absolute zero, since it corresponds to a complete absence of
opinion/attitude in the particular circumstance. This modification can be
represented as (SA-4, A-3, D-2, SD-1 & U-0). This is being used by some
people, though it is not very popular.
First, the modification recognizes that respondents could actually
be undecided, uncertain or neutral about certain issues, events and objects.
Second, this condition of no opinion is properly reflected on the scale
both in terms of position and weighting.
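
To see why the weight given to ‘undecided’ matters, the sketch below compares the mean of one item under the conventional 5-point weights and under the modified weights of the second proposal (U = 0). It is only an illustration in Python: the ten responses are hypothetical, and `item_mean` is a helper name introduced here.

```python
from statistics import mean

# Conventional 5-point weights versus the modified weights of the second proposal.
CONVENTIONAL = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}
MODIFIED = {"SA": 4, "A": 3, "D": 2, "SD": 1, "U": 0}


def item_mean(responses, weights):
    """Mean score of one item under a given weighting scheme."""
    return mean(weights[r] for r in responses)


# Hypothetical responses of ten people to a single item; four are undecided.
responses = ["U", "U", "U", "U", "A", "A", "D", "D", "SA", "SD"]

conventional_mean = item_mean(responses, CONVENTIONAL)  # 30 / 10
modified_mean = item_mean(responses, MODIFIED)          # 15 / 10
print(conventional_mean, modified_mean)
```

With these made-up responses the conventional weighting gives a mean of 3, while the modified weighting gives 1.5; the four undecided respondents contribute 12 points under the first scheme and nothing under the second, which is the inflation the text describes.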

Table 11. Suggested Data Analysis Procedures for Likert-Type and Likert Scale
Data

                     Likert Type Data        Likert Scale Data
Central Tendency     Median or mode          Mean
Variability          Frequencies             Standard deviation
Associations         Kendall tau B or C      Pearson’s r
Other statistics     Chi Square              ANOVA, t-test, regression

Chi Square Statistic

Chi-square test for independence is applied when you have
two categorical variables from a single population. It is used to
determine whether there is a significant association between the
two variables, that is, whether or not they are independent.
For example, in an election survey, voters might be classified by
gender (male or female) and voting preference (Democrat,
Republican, or Independent). We could use a chi-square test for
independence to determine whether voting preference is related
to gender.
Chi square test of independence is appropriate when the following
conditions are met:
• The sampling method is simple random sampling.
• The variables under study are each categorical, i.e. each
person, item or entity contributes to only one cell of the
contingency table.
• If sample data are displayed in a contingency table, the
expected frequency count for each cell of the table is at least
5.
Suppose that Variable A has r levels, and Variable B has c levels.
The null hypothesis states that knowing the level of Variable A
does not help you predict the level of Variable B. That is, the
variables are independent. H0: Variable A and Variable B are
independent.

Degrees of freedom. The degrees of freedom (DF) is equal to:
$DF = (r - 1) \times (c - 1)$
where r is the number of rows and c is the number of columns.
Test statistic. The test statistic is a chi-square random
variable ($\chi^2$) defined by the following equation:
$\chi^2 = \sum \left[ (O_{r,c} - E_{r,c})^2 / E_{r,c} \right]$
where $O_{r,c}$ is the observed frequency count at level r of
Variable A and level c of Variable B, and $E_{r,c}$ is the
expected frequency count at level r of Variable A and
level c of Variable B.

Example
A public opinion poll surveyed a simple random sample of
1000 voters. Respondents were classified by gender (male
or female) and by voting preference (Republican,
Democrat, or Independent). Results are shown in
the contingency table below.

                Voting Preferences
                Republican   Democrat   Independent   Row total
Male            200          150        50            400
Female          250          300        50            600
Column total    450          450        100           1000

Is there a gender gap? Do the men's voting preferences differ significantly
from the women's preferences? Use a 0.05 level of significance.

Analyze sample data. Applying the chi-square test for independence
to sample data, we compute the degrees of freedom, the expected
frequency counts, and the chi-square test statistic. Based on the
chi-square statistic and the degrees of freedom, we determine the P-value.

$DF = (r - 1) \times (c - 1) = (2 - 1) \times (3 - 1) = 2$

$E_{r,c} = (n_r \times n_c) / n$
$E_{1,1} = (400 \times 450) / 1000 = 180000/1000 = 180$
$E_{1,2} = (400 \times 450) / 1000 = 180000/1000 = 180$
$E_{1,3} = (400 \times 100) / 1000 = 40000/1000 = 40$
$E_{2,1} = (600 \times 450) / 1000 = 270000/1000 = 270$
$E_{2,2} = (600 \times 450) / 1000 = 270000/1000 = 270$
$E_{2,3} = (600 \times 100) / 1000 = 60000/1000 = 60$

$\chi^2 = \sum \left[ (O_{r,c} - E_{r,c})^2 / E_{r,c} \right]$
$\chi^2 = (200 - 180)^2/180 + (150 - 180)^2/180 + (50 - 40)^2/40 + (250 - 270)^2/270 + (300 - 270)^2/270 + (50 - 60)^2/60$
$\chi^2 = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60$
$\chi^2 = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2$

where DF is the degrees of freedom, r is the number of levels
of gender, c is the number of levels of the voting preference,
$n_r$ is the number of observations from level r of gender, $n_c$ is
the number of observations from level c of voting preference,
n is the number of observations in the sample, $E_{r,c}$ is the
expected frequency count when gender is level r and voting
preference is level c, and $O_{r,c}$ is the observed frequency count
when gender is level r and voting preference is level c.
The P-value is the probability that a chi-square statistic
having 2 degrees of freedom is more extreme than 16.2.
We use the Chi-Square Distribution Calculator to find
$P(\chi^2 > 16.2) = 0.0003$.

Interpret results. Since the P-value (0.0003) is less than the
significance level (0.05), we reject the null hypothesis. Thus, we
conclude that there is a relationship between gender and voting
preference.
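
The hand computation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `chi_square_independence` is introduced here for illustration. The P-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2); for other degrees of freedom a statistical package or table would be needed.

```python
import math

# Observed counts from the contingency table
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [
    [200, 150, 50],
    [250, 300, 50],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell: (row total * column total) / n.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)

# Closed form valid only because df == 2 in this example.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 1), df, round(p_value, 4))
```

The sketch gives a chi-square of about 16.2 with 2 degrees of freedom and a P-value of about 0.0003, in agreement with the worked example.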

Chi-square goodness of fit test is applied when you have
one categorical variable from a single population. It is used to
determine whether sample data are consistent with a hypothesized
distribution. There are two variants here: in the first, there is no
policy or theory stipulating how the frequencies should be
distributed; in the second, there is a policy or theory stipulating
how the frequencies should be distributed, e.g. a university
admission policy of a science to arts ratio of 60:40, a quota system,
federal character, etc.

For example, suppose a company distributed textbooks. It claimed
that 30% of its books were English texts; 60%, sciences; and 10%,
social sciences. We could gather a random sample of textbooks
distributed and use a chi-square goodness of fit test to see whether
our sample distribution differed significantly from the distribution
claimed by the company. The sample problem at the end of the
lesson considers this example.

When to Use the Chi-Square Goodness of Fit Test
The chi-square goodness of fit test is appropriate when the
following conditions are met:
• The sampling method is simple random sampling.
• The variable under study is categorical.
• The expected value of the number of sample observations in
each level of the variable is at least 5.
For a chi-square goodness of fit test, the null hypothesis takes one
of the following forms, depending on the variant:

H0: The distribution of the observed frequencies of a given
phenomenon or event is a good fit to that stipulated by a given
policy or theory (tested at the 0.05 level of significance).

H0: The pattern of occurrence of the phenomenon as shown by the
observed frequencies is due to chance (tested at the 0.05 level of
significance).
Test method. Use the chi-square goodness of fit test to
determine whether observed sample frequencies differ
significantly from expected frequencies specified in the null
hypothesis.

Using sample data, find the degrees of freedom, expected
frequency counts, test statistic, and the P-value associated with the
test statistic.
• Degrees of freedom. The degrees of freedom (DF) is equal to
the number of levels (k) of the categorical variable minus 1:
$DF = k - 1$.
• Expected frequency counts. The expected frequency counts
at each level of the categorical variable are equal to the
sample size times the hypothesized proportion from the null
hypothesis:
$E_i = n p_i$
where $E_i$ is the expected frequency count for the ith level of
the categorical variable, n is the total sample size, and $p_i$ is
the hypothesized proportion of observations in level i.
• Test statistic. The test statistic is a chi-square random
variable ($\chi^2$) defined by the following equation:
$\chi^2 = \sum \left[ (O_i - E_i)^2 / E_i \right]$
where $O_i$ is the observed frequency count for the ith level of
the categorical variable, and $E_i$ is the expected frequency
count for the ith level of the categorical variable.
• P-value. The P-value is the probability of observing a sample
statistic as extreme as the test statistic. Since the test statistic
is a chi-square, use the Chi-Square Distribution Calculator to
assess the probability associated with the test statistic. Use
the degrees of freedom computed above.

Interpret Results
If the sample findings are unlikely, given the null hypothesis, the
researcher rejects the null hypothesis. Typically, this involves
comparing the P-value to the significance level, and rejecting the
null hypothesis when the P-value is less than the significance level.
Note that if p > 0.05 (the level of significance) we do not reject the
null hypothesis, but if p < 0.05 we reject the null hypothesis.
Test Your Understanding

Example
Heinemann Company distributes books. The company policy is
that 30% of the books are English texts, 60% science, and 10% are
social sciences.
Suppose a random sample of 100 books has 50 English texts, 45
sciences, and 5 social sciences. Is this consistent with Heinemann’s
policy? Use a 0.05 level of significance.
The solution to this problem takes four steps: (1) state the
hypotheses, (2) formulate an analysis plan, (3) analyze sample
data, and (4) interpret results. We work through those steps below:

• Analyze sample data. Applying the chi-square goodness of
fit test to sample data, we compute the degrees of freedom,
the expected frequency counts, and the chi-square test
statistic. Based on the chi-square statistic and the degrees of
freedom, we determine the P-value.

$DF = k - 1 = 3 - 1 = 2$

$E_i = n \times p_i$
$E_1 = 100 \times 0.30 = 30$
$E_2 = 100 \times 0.60 = 60$
$E_3 = 100 \times 0.10 = 10$

$\chi^2 = \sum \left[ (O_i - E_i)^2 / E_i \right]$
$\chi^2 = [ (50 - 30)^2 / 30 ] + [ (45 - 60)^2 / 60 ] + [ (5 - 10)^2 / 10 ]$
$\chi^2 = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58$

where DF is the degrees of freedom, k is the number of levels of the
categorical variable, n is the number of observations in the sample,
$E_i$ is the expected frequency count for level i, $O_i$ is the observed
frequency count for level i, and $\chi^2$ is the chi-square test
statistic.
The P-value is the probability that a chi-square statistic
having 2 degrees of freedom is more extreme than 19.58.
We use the Chi-Square Distribution Calculator to find
$P(\chi^2 > 19.58) = 0.0001$.

Interpret results. Since the P-value (0.0001) is less than the significance
level (0.05), we reject the null hypothesis. Note that if p > 0.05 (the level
of significance) we do not reject the null hypothesis, but if p < 0.05 we
reject the null hypothesis.
This rigour may not be necessary if you use SPSS for the analysis. However,
the interpretation is the same.
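
For readers without SPSS, the goodness of fit computation in the Heinemann example can be checked with a short Python sketch that uses only the standard library. The function name `chi_square_goodness_of_fit` is introduced here for illustration, and the P-value line again assumes exactly 2 degrees of freedom, for which the upper-tail probability is exp(-x/2).

```python
import math

observed = [50, 45, 5]             # English texts, sciences, social sciences
proportions = [0.30, 0.60, 0.10]   # proportions stated in the company policy


def chi_square_goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in proportions]   # E_i = n * p_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                    # DF = k - 1
    return chi2, df, expected


chi2, df, expected = chi_square_goodness_of_fit(observed, proportions)

# Closed form valid only because df == 2 in this example.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), df, p_value < 0.05)
```

The statistic comes to about 19.58 with 2 degrees of freedom. The computed P-value is roughly 0.00006, which agrees with the 0.0001 quoted above when rounded to four decimal places, and the decision to reject the null hypothesis is unchanged.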

Academic Achievement or Academic Performance?


Attempts to differentiate between these two educational
terminologies have often resulted in confusion, but it should not be so.
Academic achievement represents performance outcomes that
indicate the extent to which a person has accomplished specific
goals that were the focus of activities in instructional
environments, specifically in school, college, and university.
Achievement is therefore the accumulation of all performances from
all measures and could be indicated by a terminal grade or CGPA.
School systems mostly define cognitive goals that either apply
across multiple subject areas (e.g., critical thinking) or include the
acquisition of knowledge and understanding in a specific
intellectual domain (e.g., numeracy, literacy, science, history).
Therefore, academic achievement should be considered to be a
multifaceted construct that comprises different domains of
learning. Because the field of academic achievement is very wide-
ranging and covers a broad variety of educational outcomes, the
definition of academic achievement depends on the indicators used
to measure it. Among the many criteria that indicate academic
achievement, there are very general indicators such as procedural
and declarative knowledge acquired in an educational system,
more curricular-based criteria such as grades or performance on an
educational achievement test, and cumulative indicators of
academic achievement such as educational degrees and certificates.
All criteria have in common that they represent intellectual
endeavours and thus, more or less, mirror the intellectual capacity
of a person.
In developed societies, academic achievement plays an important
role in every person’s life. Academic achievement as measured by
the GPA (grade point average) or by standardized assessments
designed for selection purposes such as the SAT (Scholastic
Assessment Test) determines whether a student will have the
opportunity to continue his or her education (e.g., to attend a
university). Therefore, academic achievement defines whether one
can take part in higher education, and based on the educational
degrees one attains, it could influence one’s vocational career after
education. Besides the relevance for an individual, academic
achievement is of utmost importance for the wealth of a nation and
its prosperity. The strong association between a society’s level of
academic achievement and positive socioeconomic development is
one reason for conducting international studies on academic
achievement, such as PISA (Programme for International Student
Assessment), administered by the OECD (Organisation for
Economic Co-operation and Development). The results of these
studies provide information about different indicators of a nation’s
academic achievement; such information is used to analyze the
strengths and weaknesses of a nation’s educational system and to
guide educational policy decisions. Given the individual and
societal importance of academic achievement, it is not surprising
that academic achievement is the research focus of many scientists;
for example, in psychology or educational disciplines.
When people hear the term “academic performance” they often
think of a person’s GPA. However, several factors indicate a
student’s academic success. Academic performance may be measured by
the final grade earned in a course, or defined by
students’ reports of their past semester CGPA/GPA and
their expected GPA for the current semester. The grade point
average or GPA is now used by most tertiary institutions as
a convenient summary measure of the academic performance of
their students. The GPA is a better measurement because it
provides greater insight into the relative level of performance of
individuals and different groups of students. This implies that even
termly results in subjects and experimental studies carried out over
6 to 12 weeks indicate performance and not achievement.
Consider a study by Musa (2017) titled “Effects of
Motivation-Enhanced Activity-Based Learning of Difficult Physics
Concepts and Cognitive Load on Senior Secondary Two Students’
Achievement and Academic Engagement”. Measurement experts are
of the view that it should read “Effects of
Motivation-Enhanced Activity-Based Learning of Difficult Physics
Concepts and Cognitive Load on Senior Secondary Two Students’
Performance and Academic Engagement”.

Discussion, Conclusion, Recommendations, Limitation and Contribution to Knowledge
These are discussed briefly in this section.

a. Discussion of findings
In discussion of findings it is necessary to note the following:
1. It is not necessary to present the results afresh. Simply state
the finding and move on to discussion. Readers are not
comfortable seeing figures while you discuss your results.
Figures stop at the level of result presentation.
2. It is appropriate that only empirical studies are used to
discuss empirical findings.
3. If you want people to cite your work and ideas, it is
appropriate to adduce reason/s for your findings while
discussing. Maturity is displayed when convincing reason/s
are given for a finding. It is necessary.
4. Until you have linked your finding/s to previous studies
and/or suggested and explained reasons for the finding,
discussion has not taken place, only mere presentation of
results.
b. Conclusion
Conclusions are categorical statements on your findings.
Simply put, a conclusion is your position in the paper, especially
if it is a non-empirical paper. By implication, it is based on both
the finding and the convincing discussion that enables you to take a
stand. For instance, a finding in a study is “there is no
significant difference in mean achievement between male and
female students taught basic science using activity method”.
The conclusion from this finding could be “gender is not a
factor in students’ achievement in basic science when activity
method is used in teaching”. Put differently, “activity method
is a good strategy that could be used to eliminate gender
differences in students’ achievement in basic science”. The
conclusion from here is not “male and female students do not
differ significantly in basic science achievement” as often
presented by many.

c. Recommendations
Recommendations must be an offshoot of a finding in a study.
Do not make recommendations on what you did not find, and do not
extend the coverage beyond your geographic and content scope,
especially for non-experimental studies. Even for experimental
studies (quasi-experimental inclusive), a study on methods using concept
mapping cannot be generalized to include discussion method or
a study on primary school pupils cannot be generalized to
include university students. From a finding on attitude of
students towards science in Gwer West LGA, you cannot freely
recommend what is to be done in the entire Benue State.
Though debatable, there is a school of thought that says there is
an exception to this for experimental studies that are not
culture biased, that is, one can generalize the recommendation
beyond the immediate scope. This is also linked to why we
accept small sample sizes for such studies. For instance, what was
the sample size, and in how many countries/continents did Pavlov
conduct his experiment on salivating dogs, before the
generalized conclusion given as a theory of classical
conditioning? Consider this finding “there is no significant
difference in mean achievement between male and female
students taught basic science using activity method” as
example. A recommendation could be “science teachers in
Gwer West LGA are encouraged to use activity method in
teaching in coeducational secondary schools since it is gender
friendly”.

d. Limitation of study
Limitation is different from delimitation or scope. Limitations are
not the limits or setbacks experienced in your study due to finance,
time, distance, interest, inconvenience, etc. which you had
control over and could have avoided, or which you were not compelled
to accept. Rather, limitations are those things which at the start
of the study were not envisaged but came up suddenly and were beyond
your control. These may include attrition of subjects in the
study due to sudden Fulani invasion in Gwer West LGA; strike
action embarked upon by teachers which affected your original
plan and timelines; use of a small sample in a survey where the
sample is the entire population and you cannot get more subjects
with the required sample characteristics; use of a quasi-experimental
design in a study where a true experimental design
would have been better but could not be used because it is a school
setting, etc. In the last two examples the researcher was
aware of the limitations from the start of the study but went for
the best available option; in such cases steps are taken to forestall
their effects, like use of ANCOVA for data analysis, use of the entire
population as sample, etc.

e. Contribution to knowledge
It is better to first identify where one could derive contribution
to knowledge from in a study. Usually it is from the statement
of problem and summary of literature review. Though the
significance of the study may imply that, it is more about utility
than novelty and is therefore not often used to determine
contribution to knowledge. Note that if you cannot identify those
to whom a study will be significant, the essence of that study
stands questionable, and the same applies to contribution to
knowledge. The following could guide in deciding whether a study
has a contribution to knowledge:
1. What is new in it? What is the need gap that your findings
will close? Is the study the first to address such an issue, such
that your finding is entirely new and novel? Such studies
may lack relevant empirical literature for review but this is
allowed for novelty’s sake.
2. Is it the first in the study area/location though studied in other
locations? This kind of study could be a replication or
modified form but not significantly different from previous
studies.
3. Are you doing it because of time lag, generation gap or
noticeable changes after many years (say 10 years and
above)? This of course must be stressed in the background and
statement of problem.
4. Are you introducing new variables into a previous study?
This could be a dependent, independent or moderator variable.
This gives a fairly new direction to the study in purpose,
scope and findings.
5. Is it action research? In general, research targeted at
finding a way of solving an identified/existing
problem/challenge on ground within the shortest possible
time is action research. Here it does not matter if such a study
was carried out previously in other areas or same area but it
is determined by what issue is currently on ground which
has not been addressed. Remember that if previous studies
have adequately addressed it there may be no need for
another. Any research that is directed by the teacher in the
classroom to address his/her challenge is also called action
research. Notice that there may be ethical issues to consider
with action research, such as, "Is it fair for one group to get a
type of instruction that may be more effective than another?"
This aspect is seriously abused by the way people use lecture
method in control group even when they know it is not a
comparable method to that used in experimental class and
will place the control group at a disadvantage.
6. In general, not all findings in a study form part of
contribution to knowledge; rather, it is those that are novel,
striking, fill an observed gap in literature, or raise surprises.

f. Effect, impact, influence and perceptual studies

People use effect, impact and influence interchangeably to define
their research topics even when it should not be
so. The design in most cases determines which operator to
select. Perception cannot be used alone but could be
perceived influence, perceived impact or perceived effect,
the last not being very common. Debate on how they should be
used is intense and not yet resolved. Each is
explained briefly here from the perspective of the writer:

1. Effect is often used in cause and effect studies, ex post
facto and causal comparative studies, or
experimental/quasi-experimental design studies, e.g., Effect of
cognitive reasoning ability and prior exposure to content on Upper
Basic two students’ achievement/performance in Basic
Science.

2. Influence is used mainly in survey, ex post facto and
causal comparative studies, e.g., Influence of school
location and type on senior secondary two physics
students’ achievement/performance.

3. Impact is often used in survey, ex post facto and causal
comparative studies and sometimes experimental designs,
depending on the variables being investigated. If there is
an intervention (be it a new way of teaching such as group
project, trial testing of a newly introduced programme, PTA
assistance in staff employment on the teaching staff strength
in a named school or location, etc.), impact could be used as
an operator word. Most people believe that impact studies
should result in something causing an effect or
influence that is visible and measurable. Consider this topic:
The influence of international collaboration on the impact
of research results. Though an influence study, the
meaning of impact is shown clearly.

4. Perception is the process of recognizing and interpreting
sensory stimuli. Perception is our sensory experience of the
world around us and involves both the recognition of
environmental stimuli and actions in response to these
stimuli. Through the perceptual process, we gain information
about properties and elements of the environment that are
critical to our survival. Perception not only creates our
experience of the world around us; it allows us to act within
our environment. There is a school of thought that claims
that all opinion pooling studies are also perceptual studies.
The question is, must perception be reflected in the topic
before one could know that it is a perception study? E.g.,
perceived influence of …. This again is debatable.

Conclusion
This paper centred on what matters most in research, including the
choice of appropriate statistics and data interpretation. It also
explained some confusing terminologies, including
those whose use in some contexts is a subject of debate. The fact
remains that research is dynamic and knowledge about research
must keep changing to remain relevant. Though this
chapter is not all-inclusive, there is much to be consulted once you
know what you are looking for.

References/Bibliography
Achor, E. E. & Ejigbo, M. A. (2006). A guide to writing research
report. Lagos-Nigeria: Sam Artrade

Andy, F. (2006). Discovering statistics using SPSS. London: Sage
Publications.

Boone, H. N. & Boone, D. A. (2012). Analyzing Likert data.
Journal of Extension, 50(2). http://www.joe.org/joe/2012april/tt2p.shtml

Harbor-Peters, V. F. (1999). Noteworthy points in measurement
and evaluation. Enugu: SNAAP Press Ltd.

Likert, R. (1932). A technique for the measurement of
attitudes. Archives of Psychology, 140.

Musa, S. A. A. (2017). Effects of motivation-enhanced activity-
based learning of difficult physics concepts and cognitive load
on senior secondary two students’ achievement and academic
engagement. Unpublished PhD thesis, Benue State University,
Makurdi, Nigeria.

Nworgu, B. G. (2015). Educational research: Basic issues and
methodology. Nsukka: University Trust Publisher.

Pallant, J. (2001). A step by step guide to data analysis using SPSS for
Windows (Versions 10 and 11). Australia: Allen & Unwin.
