net/publication/299859913
All content following this page was uploaded by Alsedig Abdalgadr Al-Shahomee on 07 April 2016.
Acknowledgments
I begin by praising ALLAH Almighty. I praise him and seek his help and pleasure. I
wish to express my grateful appreciation to Prof. Richard Lynn and my supervisor
Prof. Peter Eachus and my co-supervisor Dr Simon Cassidy. My thanks should also
go to all the participants who took part in this study, and all those who helped me
during this study, especially my colleagues Prof. A. Attashani, Prof. S. Elghmary, Dr.
M. Hammad and Mr. K. Khelifa. Finally, special thanks go to my parents, my wife and my children,
Abubaker, Ashraf, Alamin and Zahra, who make my life worthwhile. Thanks are also due to my sister
and brothers for their understanding, support and faithfulness during the years of my
study in England.
Contents
Page
Tables.......................................................................................................................... vii
Figures......................................................................................................................... xi
2.6.1.2 Classification of tests according to procedure of administration…………. 36
2.6.1.3 Classification of tests according to content……………………………….. 37
2.7 Use of Mental Tests……………………………………………………….. 37
2.8 Use of Intelligence Tests………………………………………………….. 38
2.9 Culture-Free and Culture-Fair Tests………………………………………. 41
2.10 Achievement Tests………………………………………………………… 44
2.11 Intelligence and academic achievement…………………………………… 47
2.12 Increase in IQ with time…………………………………………………… 50
2.13 Chapter Summary………………………………………………………….. 57
4.7.1 Content Validity…………………………………………………………… 105
4.7.2 Construct Validity…………………………………………………………. 106
4.7.2.1 Factor analysis…………………………………………………………….. 107
4.7.2.2 Internal consistency……………………………………………………….. 110
4.7.3 Criterion-related Validity…………………………………………………. 111
4.7.3.1 Correlation of SPM test with Intelligence Tests…………………………... 112
4.7.3.2 Correlation of SPM test with Achievement Tests…………………………. 120
4.8 Item analysis of the SPM test……………………………………………… 130
4.8.1 Item difficulty…………………………………………………………….... 130
4.8.2 Item discrimination………………………………………………………... 131
4.9 Review of previous studies that employed SPM test……………………… 132
4.9.1 Studies on SPM test in developed countries………………………………. 134
4.9.2 Studies on SPM test in developing countries……………………………… 146
4.10 Chapter Summary………………………………………………………….. 157
5.13 Chapter Summary………………………………………………………….. 186
Chapter seven: META-ANALYSIS
7.1 Introduction………………………………………………………………... 240
7.2 Advantages of Meta-analysis……………………………………………… 241
7.3 Disadvantages of Meta-analysis…………………………………………… 242
7.4 Literature review…………………………………………………………... 243
7.5 Method…………………………………………………………………….. 244
7.5.1 Criteria for studies selection………………………………………………. 244
7.5.2 Strategy of analysis………………………………………………………... 246
7.6 Results……………………………………………………………………... 248
7.6.1 SPM means and standard deviations according to the independent
variables…………………………………………………………………… 251
7.6.2 Differences in SPM scores………………………………………………… 252
7.6.2.1 Difference according to development status………………………………. 252
7.6.2.2 Difference according to age groups……………………………………….. 253
7.6.2.3 Difference according to gender……………………………………………. 255
7.6.2.4 Difference according to development status and age……………………… 256
7.6.2.5 Difference according to development status and gender………………….. 260
7.6.2.6 Difference according to age groups and gender…………………………… 262
7.6.3 Multiple Regressions according to the independent variables…………….. 266
7.7 Chapter Summary………………………………………………………….. 267
8.5.3 SPM and gender…………………………………………………………… 292
8.5.4 SPM and region……………………………………………………………. 297
8.5.5 SPM and age (study level)………………………………………………… 298
8.5.6 SPM and academic discipline……………………………………………... 301
8.5.7 Relationship and prediction of SPM………………………………………. 301
8.5.8 SPM percentiles…………………………………………………………… 302
8.6 Study conclusions…………………………………………………………. 305
8.7 Study contributions………………………………………………………... 308
8.8 Limitations of the Study…………………………………………………… 308
8.9 Recommendations of the Study…………………………………………… 313
8.10 Further research……………………………………………………………. 315
Tables
Page
Table 4.1 SPM standardization studies……………………………………………… 96
Table 4.2 Summary of the studies performed on the SPM test reliability…………... 103
Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher’s
transformation results…………………………………………………….. 118
Table 4.4 The average of the correlation between SPM test with intelligence tests... 119
Table 4.5 Summary of the studies on SPM test predictive validity and with r to z
Fisher’s transformation results…………………………………………… 127
Table 4.6 The average of correlation between the SPM test and achievement tests... 129
Table 4.7 Shows a sample of worldwide studies that utilised the SPM test as a …. 132
Table 5.1 Principles of selecting sample in schools………………………………… 175
Table 5.2 The target sample size for selecting the pre-university students in the two
cities in proportion to their real numbers…………………………………. 175
Table 5.3 The target sample size for selecting the pre-university students in the
nine villages in proportion to their real numbers…………………………. 176
Table 5.4 The target sample size for selecting the undergraduate university students
in Omar El-Mukhtar University in proportion to their real numbers…….. 176
Table 6.1 Descriptive statistics of overall collected data and tests of normality……. 188
Table 6.2 SPM score means and standard deviations……………………………….. 191
Table 6.3 SPM test-retest reliabilities according to age, gender and study levels…... 193
Table 6.4 SPM split-half reliabilities according to gender, age and total Sample…... 194
Table 6.5 SPM Alpha reliabilities according to gender, age and total sample……… 195
Table 6.6 Correlations matrix between the five sets of the SPM test among Libyan
male and female students (N=2600, 8 to 21 years) and extracted factor….. 196
Table 6.7 Correlations matrix between the five sets of the SPM test among Libyan
male students (N=1300, 8 to 21 years) and extracted factor……………... 198
Table 6.8 Correlations matrix between the five sets of the SPM test among Libyan
female students (N=1300, 8 to 21 years) and extracted factor……………. 199
Table 6.9 Correlations coefficients between the five sets and the total scores of the
SPM test (n=2600, age 8 to 21 years)…………………………………….. 200
Table 6.10 Correlations coefficients between the five sets and the total scores of the
SPM test (males n=1300 and females n=1300, age 8 to 21 years)……….. 201
Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample…………….. 202
Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)……………………………………………... 203
Table 6.13 Index of Discrimination and Items Evaluation…………………………… 205
Table 6.14 Point biserial and significant level for each SPM item…………………… 205
Table 6.15 Summary of item analysis of the five SPM sets………………………….. 206
Table 6.16 Comparison of gender…………………………………………………….. 208
Table 6.17 Comparison of regions……………………………………………………. 209
Table 6.18 Comparison of academic discipline………………………………………. 210
Table 6.19 Comparison of geographic areas…………………………………………. 211
Table 6.20 Post Hoc Tukey (HSD) Test……………………………………………… 211
Table 6.21 Comparison according to age…………………………………………….. 212
Table 6.22 Post Hoc Tukey (HSD) Tests…………………………………………….. 213
Table 6.23 Comparison according to study levels…………………………………… 214
Table 6.24 Post Hoc Tukey (HSD) Test……………………………………………… 214
Table 6.25 Comparison of the region according to study levels……………………... 215
Table 6.26 Levene's Test of Equality of Error Variances of SPM scores……………. 215
Table 6.27 Tests of Between-Subjects Effects of SPM scores……………………….. 215
Table 6.28 Post Hoc Tukey (HSD) Test……………………………………………… 216
Table 6.29 Comparison of the regions according to gender………………………….. 217
Table 6.30 Levene's Test of Equality of Error Variances of SPM scores……………. 217
Table 6.31 Tests of Between-Subjects Effects of SPM scores……………………….. 217
Table 6.32 Comparison of age according to region…………………………………... 218
Table 6.33 Levene's Test of Equality of Error Variances of SPM scores……………. 218
Table 6.34 Tests of Between-Subjects Effects of SPM scores………………………. 219
Table 6.35 Post Hoc Tukey (HSD) test………………………………………………. 219
Table 6.36 Comparison of the geographic areas according to gender………………... 221
Table 6.37 Levene's Test of Equality of Error Variances of SPM scores……………. 221
Table 6.38 Tests of Between-Subjects Effects of SPM scores……………………….. 221
Table 6.39 Post Hoc Tukey (HSD) Test……………………………………………… 222
Table 6.40 Comparison of academic discipline according to gender………………… 223
Table 6.41 Levene's Test of Equality of Error Variances of SPM scores……………. 223
Table 6.42 Tests of Between-Subjects Effects of SPM scores……………………….. 223
Table 6.43 Comparison of age according to gender………………………………….. 224
Table 6.44 Levene's Test of Equality of Error Variances…………………………….. 225
Table 6.45 Tests of Between-Subjects Effects of SPM scores……………………….. 225
Table 6.46 Post Hoc Tukey (HSD) test………………………………………………. 225
Table 6.47 Comparison of academic discipline according to age……………………. 227
Table 6.48 Levene's Test of Equality of Error Variances of SPM scores……………. 227
Table 6.49 Tests of Between-Subjects Effects of SPM scores……………………….. 228
Table 6.50 Post Hoc Tukey (HSD) test………………………………………………. 228
Table 6.51 Magnitude of gender differences in means score and variability on SPM
as functions of age, geographic areas and discipline……………………... 229
Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores…… 232
Table 6.53 Detailed percentile 2007-2008 Norms for Libyan students according to age 233
Table 6.54 Detailed percentile 2007-2008 Norms for the Libyan students according
to age and gender…………………………………………………………. 234
Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline……………………………………………… 235
Table 7.1 Studies included in the meta-analysis…………………………………….. 245
Table 7.2 Descriptive statistics for means scores of overall collected data and tests
of normality………………………………………………………………. 249
Table 7.3 Showing SPM score means and standard deviations according to
independent variables…………………………………………………….. 251
Table 7.4 Comparison of the SPM Mean according to development status………… 252
Table 7.5 Post hoc tests multiple comparisons of SPM scores (Tukey HSD)………. 252
Table 7.6 Comparison of the SPM Mean scores according to age groups………….. 253
Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 253
Table 7.8 Comparison of the gender mean scores of SPM test……………………... 255
Table 7.9 Comparison of the development status mean scores of SPM test
according to age…………………………………………………………... 256
Table 7.10 Levene's Test of Equality of Error Variances of SPM scores……………. 256
Table 7.11 Tests of Between-Subjects Effects of SPM scores……………………….. 256
Table 7.12 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 257
Table 7.13 Magnitude of the development status of countries (developed and
developing countries) in mean scores and variability on SPM as
functions of age and total sample………………………………………… 258
Table 7.14 Comparison of the development status mean scores of SPM test
according to gender……………………………………………………….. 260
Table 7.15 Levene's Test of Equality of Error Variances of SPM scores……………. 260
Table 7.16 Tests of Between-Subjects Effects of SPM scores……………………….. 261
Table 7.17 Comparison of the age groups mean scores of SPM test according to
gender…………………………………………………………………….. 262
Table 7.18 Levene's Test of Equality of Error Variances of SPM scores……………. 262
Table 7.19 Tests of Between-Subjects Effects of SPM scores……………………….. 262
Table 7.20 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 263
Table 7.21 Magnitude of gender differences in mean scores and variability on SPM
as a function of age and development status……………………………... 264
Table 7.22 Stepwise Regression for Independent Variable and the SPM Score
Means……………………………………………………………………... 266
Table 8.1 Mean IQs and average for some developed and developing countries…... 283
Figures
Page
Figure 4.1 Typical items from the SPM Test. A5 presents an easy item whereas E1
presents a difficult item …………………………………………….. 92
Figure 5.1 Summary of the sampling method and theory………………………….. 171
Figure 5.2 Sampling process……………………………………………………….. 177
Figure 6.1 Histogram showing normal distribution for means scores……………... 188
Figure 6.2 Normal Q-Q plot……………………………………………………….. 189
Figure 6.3 Detrended normal Q-Q plot…………………………………………….. 189
Figure 6.4 Box plot of scores distribution…………………………………………. 189
Figure 6.5 Scree Plot for the five Factors………………………………………… 197
Figure 6.6 Scree Plot for the five Factors………………………………………… 198
Figure 6.7 Scree Plot for the five Factors………………………………………… 199
Figure 6.8 Means score differences of age and region…………………………….. 220
Figure 6.9 Means score difference of age and gender……………………………... 226
Figure 7.1 The distribution for means scores……………………………………… 249
Figure 7.2 Box plot of scores distribution…………………………………………. 249
Figure 7.3 Normal Q-Q plot……………………………………………………….. 250
Figure 7.4 Detrended normal Q-Q plot…………………………………………….. 250
Figure 7.5 Means score differences of age group and gender……………………... 263
Figure 8.1 Urbanisation development in Libya 1954-1995………………………... 297
Chapter One: INTRODUCTION
Humans differ from one another in their ability to understand complex ideas, adapt
effectively to the surrounding environment, learn from experience, engage in various forms
of reasoning and overcome obstacles through thinking. Although individual differences can
be substantial, they are never entirely consistent over time. A given person's intellectual
performance will vary on different occasions, in different domains and as judged by different
criteria. The concept of "intelligence" is an attempt to represent and organize this complex set
of phenomena. Such conceptualization has achieved great success in clarifying some areas.
Nonetheless, it has not yet answered all the important questions, nor does it command
universal assent. Indeed, when two dozen prominent theorists in the field were asked to
define intelligence, they gave two dozen somewhat different definitions (Sternberg & Detterman,
1986). Such disagreement is not a cause for dismay. Scientific research rarely begins with
Intelligence tests play a vital role at all stages and in every aspect of a person's life. From
pre-school days through to postgraduate years, tests are administered for grouping, course
selection and placement in special classes or special institutions, as well as for career
orientation, college entrance and admission to professions. A person's Intelligence
Quotient (IQ) score largely determines the type of education he or she receives and,
ultimately, the type of position he or she might occupy within society. Therefore the concept of
Though Libya has witnessed a huge development in education within the last decades, some
areas still lack the benefits of such advancements. To date, no single test of intellectual ability
has been officially adopted for the measurement of intelligence. Schools and
universities alike use examination grades as the primary and only method of determining who
should be accepted for study at various academic establishments. Similar procedures take
place in the vocational sector. These grades might be considered a good criterion for
such purposes. Additional criteria, however, are essential for reliable and valid judgements.
One such criterion is the application of mental tests, particularly intelligence tests, in
decision-making processes. The lack of intelligence tests in Libya in the selection of students
for different educational programs has caused many problems. Failure to allocate students
according to their abilities and interests has deprived Libya of one of its most valuable
resources. This has also had an adverse effect on business and commerce. Employees scoring well in tests might not
The health service system is another affected sector. Mental tests currently employed in
Libya are either misused or used in an incomplete form. The use of incomplete tests has
serious negative implications for educational and clinical decisions. The chief drawback is
the bias of the test predictions. In the clinical case, the use of incomplete test scores for the
estimation of mental ability might result in invalid assessment, with grave
consequences for individuals' lives. Intelligence tests are useful tools for accomplishing the
desired goals and avoiding unwanted side-effects. Their effectiveness will depend on the skills
A relevant and accurate selection procedure is now required in Libya more than ever
before, not only in the fields of education, health and vocation but across the whole agenda of the
government. Indeed, a clear failing of the current system can be seen, for example, in the
job market. Many university graduates were posted to office work which could be done by
In response to the current gaps, this book aims to introduce to Libya one of the best-known
intelligence tests in the world: the classic form of the Standard Progressive
Matrices (SPM) test. Moreover, the current study attempts to develop norms for the SPM test
and identify the distribution of IQ scores in a Libyan sample. The study objectives include:
2. Study the relationship between SPM mean scores and student’s academic
SPM test according to gender, region (cities and villages), academic discipline
(science and arts), geographical areas (main city, secondary city, coastal, mountain
test according to region and gender, age and region, region and study levels,
geographic areas and gender, academic discipline and gender, age and gender and age
5. Investigate variability of SPM means score gender based on age, and gender based on
6. Examine the contribution of the independent variables gender, age and regions and
7. Compute the percentile ranks for the SPM scores according to the sample age levels.
8. Compare performance on the SPM test for a Libyan sample with that of other
The book begins, in chapter two, with a historical review of the literature. First, the definition of
the concept of intelligence, its evolution and means of testing are presented. Some of the
important theories of intelligence developed over the past century are then briefly
reviewed. After that, the definitions, classification and uses of mental tests, including
culture-fair tests and achievement tests, and the relationship between intelligence and
academic achievement are discussed in depth. The evolution of the Intelligence Quotient (IQ)
over time in different countries is also examined.
Chapter three introduces the statement of problem and the study rationale. It provides a short
description of the education system and intelligence testing in Libya. It also includes the
After setting the context of the research, the focus shifts, in chapter four, to
general information regarding the Progressive Matrices tests. A description of the SPM test
and its standardization are presented. After that, the reliability, validity and item analysis of
the SPM test are rigorously investigated. Towards the end, a brief review of previous studies
Chapter five is concerned with methodological issues such as research design, ethical approval,
the pilot study, sampling and data collection. It also covers the statistical methods to be used and the
modification and administration of the SPM test. The tests are performed in Libya on a
sample of students.
Once the test has been performed and the data are available, the results are examined and analysed
in chapter six. The initial step in the data analysis is the standardisation of the
SPM test. The primary reason is to determine whether the SPM test can be effectively used in
Libya. The next step is the analysis of the remaining study objectives, such as the relationship
between SPM test scores and Students' Academic Achievement (SAA). The outcomes of this
chapter are compared to those found in other studies in chapter seven (meta-analysis). These
studies are sampled from both developed and developing countries. Also covered in this
chapter are a literature review of meta-analysis applications on SPM tests, methodology, data
The final part of the book, chapter eight, brings together the key research findings and
discusses them in the context of the wider existing literature. Intelligence testing and IQ
distribution in Libya are discussed and evaluated in the context of the available facilities. The
methods of data collection, the SPM test and meta-analysis, are highlighted. The major
conclusions of the whole book and its contribution to the field of intelligence testing in Libya
are outlined. Moreover, strengths and weaknesses of the study are presented. Finally,
recommendations for practice and future research that emerge from the study findings
are suggested.
Chapter two: INTELLIGENCE LITERATURE REVIEW
2.1 Introduction
and Rothman (in Li: 1996), social scientists and educators were questioned on the
important, and 96% indicated that capacity to acquire knowledge was important. This
elements of intelligence (Marais, 2007). In another study nearly 500 laypeople and 24
experts were asked to define intelligence; Sternberg (2000) found that their responses
because it can be seen as the ability to gain knowledge, to think about abstract
constructed the first intelligence test, in 1905, is that although intelligence is relatively
The purpose of this chapter is to provide a historical review of the literature. First, the definition
of the concept of intelligence, its evolution and means of testing are presented. Some
of the important theories of intelligence that have been developed over
the past century are then briefly reviewed. After that, the definitions, classification and uses
of mental tests, including culture-fair tests and achievement tests, and the relationship between
intelligence and academic achievement are discussed in depth. Finally, the evolution of the
Intelligence Quotient (IQ) over time in different countries is examined.
Intelligence, a word in common use today, was almost unknown in popular speech a
century ago. After intelligence tests had been invented to measure intelligence,
scientists felt the urge to define it. They reintroduced the ancient Latin term
Sternberg (1990) mentions that today, as in the past, there seem to be as many
definitions of intelligence as there are investigators of it. Wechsler (1975) also stated
that intelligence has been viewed by educators as the ability to learn, by biologists as
relationships.
intelligence is reflected in two symposia to define intelligence; the first was in 1921,
take part in a symposium (Intelligence and its Measurement). The contributors were
or flexibility of association; facility in imagination, span of attention, quickness or
(Haggerty) P.212.
Intelligence involves two factors; the capacity for knowledge and knowledge
of stimuli are brought together and given a somewhat unified effect in behaviour
(Peterson) P.198.
Intelligence is the power of good responses from the point of view of truth or fact
(Thorndike) P.124
instinctive adjustment in the light of imaginably experienced trial and error, and c) the
The most famous definition of intelligence, which illustrates the absence of agreement
among psychologists, was made by Boring in 1923, who claimed that intelligence is
what intelligence tests test. Spearman (1927) pointed out that intelligence had become
These psychologists gave different views about the nature of intelligence, although
there was much in common in their definitions (Sattler, 1982). In 1975, Samuda
commented on the ambiguity and lack of agreement among psychologists in the 1921
Symposium, stating:
Sixty-five years after the 1921 Symposium to define intelligence, Sternberg and
Detterman (1986) noticed that the effort to define intelligence had not been repeated.
In 1986 they asked experts in the field of intelligence to respond to the very same
question that was posed in the 1921 Symposium, to see what theorists of intelligence
today believed intelligence to be. The following are some of the 1986 Symposium
definitions:
Intelligence is a set of whatever abilities make people successful at achieving their
Intelligence is adaptive for a given cultural group in permitting members of the group,
Intelligence is the sum total of all cognitive processes, including planning, coding of
(Detterman) P.57.
Intelligence consists of three capacities: (a) the capacity to manipulate symbols, (b)
the capacity to evaluate the consequences of alternative choices, and (c) the capacity
(Glaser) P.79.
Intelligence provides a means to govern ourselves so that our thought and action are
organised, coherent, and responsive both to our internally driven needs and to the
needs of the environment (Sternberg) P.141.
(Zigler) P.149.
After the two symposia to define intelligence, no single definition of intelligence was
agreed upon by psychologists. Viewed broadly, however, two themes seemed to run
through at least several of the definitions in the complete set: the capacity to learn
Again Sternberg (1990) found that some general agreement exists across the two
adaptation to the environment, basic mental processes, and higher order thinking like
reasoning, problem solving and decision making were prominent in both symposia.
Charles Spearman defined intelligence as the ability to recognise relations and related
Lynn and Vanhanen (2006) reported that a useful definition of intelligence was
proposed by Neisser in 1996; intelligence is the ability "to understand complex ideas,
adapt effectively to the environment, learn from experience, engage in various forms
Also a similar definition by Gottfredson was published in the Wall Street Journal in
1994 as "Intelligence is a very general mental capacity which, among other things,
involves the ability to reason, plan, solve problems, think abstractly, comprehend
complex ideas, learn quickly and learn from experience. It is not merely book
learning, a narrow academic skill, or test taking smarts. Rather, it reflects a broader
and deeper capability for comprehending our surroundings - 'catching on', 'making
More recently Schmidt and Hunter (2004, p. 162) have taken stock of the results of a
century’s research on intelligence: “the accumulated evidence has become very strong
that general intelligence is correlated with a wide variety of life outcomes, ranging
from risky health-related behavior to criminal offenses, to the ability to use a bus or a
subway system”. Among the numerous tasks that intelligent people do more
effectively than less intelligent people are to acquire complex skills and work more
In general two themes seem to run through at least several of the definitions in the
complete set: the capacity to learn from experience and the capacity to adapt to one’s
environment.
civilization. In 400 BC, the Greeks used the term “nous” to express intelligence.
Plato in his “Republic” claimed that “nous” is mostly inherited and that offspring
can be bred for it selectively from parents who had the most “nous”
In addition 500 BC in China the Sui dynasty used tests of ability for the
mathematics and astronomy, were still employed until the 20th century (Lynn
intelligence. In 1594 the book was translated into English in which the term
“wits” was used to express intelligence. The book evaluates the various types
During the nineteenth century the study of mental retardation witnessed a strong
awakening of interest in the humane treatment, training and education of the mentally
retarded. Anastasi (1988) stated that one of the first problems that stimulated the
Marks (1981) and Rust and Golombok (1989) observed that the rapid scientific and
social progress in Europe during the nineteenth century led to the development of
several assessment techniques, most notably in medical diagnosis of the mentally ill.
Empirical support for the theoretical basis of intelligence as a unitary construct
essentially began with the development of factor analysis (Ittenbach, Esters, &
Wainer, 1997). The historical antecedents for factor analysis originated with the work
single entity and that intelligence determines the level of civilization. He studied the
number of geniuses compared to the size of their populations and reached the
being the most intelligent and the Australian Aborigines the lowest (Galton,
1869).
Galton was the first researcher to utilize empirically objective devices to measure
of assessing mental ability. Galton analyzed the scores and applied statistical
reasoning to the study of those with high ability. He was the first to identify "general
One of Galton's followers, Spearman, was the first to assert that all individual
ability (Jensen, 1980). Spearman introduced factor analysis, in part, to ascertain the
degree to which a test measures a general factor (Jensen, 1980). Spearman used
specific factors (Gould, 1996). Spearman believed each test of mental abilities had a
single general factor, g, as well as specific factors (s) unique to the test. These
and many scholars (Carroll, 1993; Herrnstein & Murray, 1994; Jensen, 1980;
Rushton, 1997) continued to believe scores on intelligence tests are reflected best
one's intelligence and thus to use when examining mean IQ differences between
Factor analysis soon became one of the most important techniques in modern
statistical technique that allows one to analyze the sources of variance of a particular
measure by examining the pattern of correlations between two measures and other
measures. The technique is useful to reduce a complex set of correlations into fewer
variables that were most highly correlated were combined to form the first principal
component by placing an axis through all the points. Other axes, drawn to account for
the other variables, are labeled second and third (etc.) order factors (Edwards, 2003).
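As a rough illustration of the reduction step described above, the following Python sketch extracts the first principal component from a small correlation matrix by power iteration. The five-variable matrix R and the function name first_principal_component are hypothetical and chosen only for illustration. The values are not taken from this study's data, and power iteration is simply a stand-in for whatever extraction routine a statistics package would use.

```python
import math

# Hypothetical correlation matrix for five subtests (illustrative values only,
# not taken from the study's data). All off-diagonal correlations are positive.
R = [
    [1.00, 0.60, 0.55, 0.50, 0.40],
    [0.60, 1.00, 0.65, 0.55, 0.45],
    [0.55, 0.65, 1.00, 0.60, 0.50],
    [0.50, 0.55, 0.60, 1.00, 0.55],
    [0.40, 0.45, 0.50, 0.55, 1.00],
]


def first_principal_component(corr, iterations=500, tol=1e-12):
    """Power iteration: return (eigenvalue, loadings) of the largest component."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from a uniform unit vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the current vector by the correlation matrix and renormalise.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient of the unit vector w estimates the eigenvalue.
        new_eigenvalue = sum(
            w[i] * sum(corr[i][j] * w[j] for j in range(n)) for i in range(n)
        )
        converged = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = w, new_eigenvalue
        if converged:
            break
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


eigenvalue, loadings = first_principal_component(R)
# The trace of a correlation matrix equals the number of variables, so the
# share of total variance carried by the first component is eigenvalue / n.
share = eigenvalue / len(R)
print("First eigenvalue:", round(eigenvalue, 3))
print("Share of variance:", round(share, 3))
print("Loadings:", [round(x, 3) for x in loadings])
```

When all correlations are positive, as in this made-up matrix, every variable loads positively on the first component, which is the pattern usually read as evidence of a general factor.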
In intelligence testing, factor analysis has been applied to show positive
correlations among different mental tests (Gould, 1996). Because most correlation
coefficients among mental tests are positive, factor analysis yielded a reasonably strong
demonstrate the viability of g as the first factor to emerge when analyzing factor
scores for intelligence tests. Other theorists used factor analysis to suggest that IQs
depend on a number of independent factors, not a large general factor (Gardner,
Although researchers may disagree about the structure of intelligence, they agree that
IQs arise as a function, at least to some degree, from a general factor as well as
1998; Urbach, 1974). To reiterate, g is important because it is considered the best way
The history of mental measurement development during the nineteenth and early
Seguin, Esquirol, Galton, Cattell, Binet, and Spearman. Detailed descriptions of these
contributions are voluminous and, moreover, beyond the scope of this study, so we will
The French physician Seguin started his career as an assistant to Jean Itard, who was
working with a wild boy found by hunters in the forest of Aveyron. In 1837 Seguin
established the first school for training and education of mentally retarded children. In
1844 he emigrated to America where his ideas gained wide recognition. Guilford
(1967) mentioned that Seguin pioneered the training of mentally retarded
individuals by exercising their sensory and motor functions. In 1866 he developed the
first non-verbal test, the Seguin Form Board, in which the individual is required to put
variously shaped blocks back in their closely fitting spaces as quickly as possible.
Corsini (1984) mentioned that Seguin's test was the first to be used as some measure
of intellectual functioning. Domino & Domino (2006) reported that Eduard Seguin
developed many procedures to enhance muscular control and sensory abilities for the
mentally deficient. Some of these procedures were later incorporated into tests of
intelligence.
In 1838 Esquirol, another French physician, was the first person to make a clear
distinction between mental retardation and mental illness. He pointed out that mentally
retarded individuals may never have developed their intellectual capacity, whereas mentally ill
people had lost the abilities they once possessed. He also pointed out that the
individual's use of language and therefore language tests provided the most
differentiating mental retardation from mental illness (Anastasi & Urbina, 1997;
Some commentators have suggested that the testing movement began with the English
biologist Galton who was interested in human heredity. Anastasi (1988) believed that
(1991) also believed that the first person to seriously attempt to measure intelligence
was Galton. Galton realised the need for measuring characteristics of related and
offspring. Galton was the first scientist who undertook statistical measurement of
individual differences.
For seven years, from 1884 to 1890, Galton ran an anthropometric laboratory at the
South Kensington Museum in London, where, for a small fee, visitors could have
strength, reaction time, and other simple sensorimotor functions (Anastasi & Urbina,
1997; Snyderman & Rothman, 1988; Virgolim, 2005; Domino & Domino, 2006).
Herrnstein & Murray (1994) stated that Galton had the idea that intelligence would
measures of sight, hearing, sensitivity to light, skin pressure, and speed of reaction to
simple stimuli. He therefore concluded that the more perceptive the senses, the larger
the range of information would be on which intelligence could act. Jensen, as reported
by Corsini (1984), pointed out that Galton's contributions to statistics and psychometrics
included percentile ranks, the use of measures of central tendency, and rating scales.
The American-born psychologist James Cattell went to Germany and studied with
Wilhelm Wundt at Leipzig where the first psychological laboratory was founded in
1879. The first psychologists at Leipzig studied the same processes that physiologists
did, namely seeing, hearing and speed of response (Attashani and Abdalla, 2005).
Anastasi (1988) claimed that the principal focus of early experimental psychology in
time. He lectured at Cambridge University, where he met Galton, who shared Cattell’s
interests. He was also active in the spread of the testing movement in the USA (
and motor nature, and differing little from those designed by Galton. In an article
published in 1890 in Mind, entitled "Mental Tests and Measurements", Cattell was the
first to use the term "mental test" in psychological literature (Freeman, 1962; Eysenck
et al., 1972; Sattler, 1982; Fancher, 1985; Anastasi, 1988; Sternberg, 1990).
Freeman (1962) and Jensen (1981) both concluded that the Galton-Cattell approaches
testing, did nonetheless strongly affect the course taken by test experimenters until
about 1900 when the influence of Alfred Binet was first felt.
The history of mental testing is widely considered to have begun with the work of
Binet. Binet, Simon, and Henri spent many years in research on ways of measuring
intelligence. Anastasi (1988) stated that in 1895 Binet and Henri published an article
in which they criticised most available tests (Galton type tests) as being too sensory
and concentrating on simple specialised abilities. Their research suggested that the
Binet assumed that intelligence was not much involved in sensory-motor tasks but in
tasks calling for more complex mental processes, especially judgement (Jensen, 1980).
Binet and Simon believed that essential activities of intelligence were to judge well, to
comprehend well and to reason well. Binet found that children who were best in
the procedures for the education of mentally retarded children. A member of this
commission was Binet. In 1905 Binet, in collaboration with Simon, prepared the first
Binet-Simon Scale. The scale consisted of 30 items arranged in order of difficulty and
was designed for children aged 3 to 12 years. Improved versions came out in 1908 and 1911,
in which unsatisfactory items were eliminated, the number of items was increased, items
were grouped into age levels, and the test was extended to adult level (Roid and Barram, 2004).
reported in terms of mental age (MA). A mental age below the child's chronological
psychologist, Wilhelm Stern, proposed the use of the ratio of mental age to
chronological age to yield the "intelligence quotient" (IQ). Mental age was the level
of ability of the average child of a certain age; e.g. a mental age of 12 is defined by the mental
tests that the average child of 12 years would pass. IQ was mental age divided by chronological
age, multiplied by 100. So a child aged 10 years who functions as a child of 5 years would
scores to a metric with a mean set at 100 and a standard deviation of 15. This would
mean that about 95% of the population have an IQ between 70 and 130. About 2% of the population
are under 70 and considered mentally retarded, while about 2% are above 130 and considered
gifted.
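The two ways of expressing IQ described above can be checked with a short calculation. This is a minimal sketch: the function names are illustrative, and the population shares assume that IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15.

```python
import math

def ratio_iq(mental_age, chronological_age):
    """Stern's ratio IQ: mental age over chronological age, times 100."""
    return 100.0 * mental_age / chronological_age

def normal_cdf(x, mean=100.0, sd=15.0):
    """Proportion of a normal population scoring at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# A 10-year-old functioning at the level of an average 5-year-old.
example_iq = ratio_iq(5, 10)

# Shares of the population under the normality assumption.
within_70_130 = normal_cdf(130) - normal_cdf(70)
below_70 = normal_cdf(70)
above_130 = 1.0 - normal_cdf(130)

print(example_iq)
print(round(within_70_130, 4), round(below_70, 4), round(above_130, 4))
```

Under these assumptions the band from 70 to 130 covers two standard deviations on either side of the mean, which is roughly 95% of the population, with a little over 2% in each tail.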
Many researchers believe that the testing movement began to flourish after the
introduction of the Binet-Simon Scale in 1905. For example Herrnstein and Murray
test met a key criterion that Galton's test could not. Sattler (1982) mentioned that the
Binet-Simon Scale served the purpose of objectively diagnosing a degree of mental
retardation, and became the prototype of subsequent scales for mental ability
assessment.
Within a few years translations and adaptations of the Binet-Simon Scale appeared in
many countries. The most rapid development took place in the USA, where in 1916
Lewis M. Terman developed the Stanford revision of the Binet-Simon Scale
(SB1), now familiar as the Stanford-Binet Intelligence Scale. Terman added more
items and made other improvements to the test. The test was revised in 1937 (SB2; L
and M forms), in 1960-1973 (SB3), and again in 1986, when Thorndike, Hagen &
Sattler developed the SB4, based on a four-factor hierarchical model with general
ability “g” as the overarching summary score. More recently, Roid (2003) constructed
The Stanford-Binet Intelligence Scale very quickly became the "standard" IQ test on both
sides of the Atlantic. For more than half a century the Stanford-Binet test has been
one of the most widely used individual tests of intelligence and has often served as a
standard for the construction of other tests (Jensen, 1980; Richardson, 1991).
2.3.6 The First World War and the Development of Group Tests
In spite of the success of the Stanford-Binet test, there was one problem in that it was
an individual test administered to one subject by one examiner. As the USA entered
the First World War, the need arose for rapid testing of large numbers of subjects in
a short time (Anastasi & Urbina, 1997; Kaufman & Kaufman, 2004).
substantially help to win the war and shorten the necessary period of conflict. He
recruits in order that they would be properly placed in the military service and to
screen all army recruits for mental defectiveness (Anastasi & Urbina 1997; Kaufman
A major contribution to group tests during the First World War was made by Arthur S.
Otis, whose group intelligence test "The Scale for the Group Measurement of
Intelligence" was used by the committee, becoming the basis of the Army Alpha Test
The committee quickly developed two tests: the Army Alpha for literate recruits, and the
Army Beta for illiterate recruits and for non-English speakers who were unable to take the test in English. The
sequences. The Beta test included mazes, finding the missing element in pictures and
coding. By the end of the war in 1918 about 1,750,000 men had been given the Army
Alpha or Beta tests (Freeman, 1962; Guilford, 1967; Noll and Scannell, 1979; Ebel,
Shortly after the First World War the tests were released for civilian use and served as
models for most group intelligence tests. Concurrently, their development gave rise to
In a summary of the misuse of scores in the United States after the development of
immigration quotas, and early racism. Tyler and Walsh (1979) stated that after the
factor in his two-factor theory was the object of the Standard Progressive Matrices
(SPM) test. Kline (1979) believed that the first contribution from psychometrics to
psychological insight into the nature and structure of human abilities emerged from
the work of Spearman. Eysenck et al. (1972) were also of the view that Spearman’s
two factor theory of intelligence together with Binet-Simon's Scale represented the
starting point for the development of the theory and measurement of intelligence in
Spearman's two-factor theory was based on analysis of empirical data from test
scores. Spearman's first investigation was with children in a village school (N=24), to
school" having the two oldest children rank the members of their class for "sharpness
and common sense out of school", and Spearman's rank of children's performance on
Spearman found a correlation of 0.55 between the three intellectual variables, the
correlation between the three sensory measures was 0.25, and a correlation of 0.38
His second investigation was with boys from an upper class preparatory school
(N=22). This time he took examination grades in Classics, French, English and Maths
as measures of "intelligence" and correlated them with a pitch discrimination task and
with the music teacher’s ranking of the boys on musical proficiency. Spearman found that
music and pitch correlated with the four intelligence scores at an average of 0.56,
while music and pitch correlated with each other at 0.40 and the correlation between
the four examination grades was on average 0.71 (Fancher, 1985; Richardson, 1991).
Spearman also discovered that the correlations between the six variables (Classics,
French, English, Maths, Pitch and Music) were not only all positive, but also arranged
themselves in a nearly perfect hierarchy. This was one of the observations that led to
the formulation of the “g”-theory, which will be presented in the next section.
Spearman further identified two components of the "g" factor: (a) eductive ability, that
is, the mental activity of making meaning out of confusion, developing new insight,
going beyond the given to perceive that which is not immediately obvious, and
generating high level schemata, which make it easy to handle complex events.
Eductive ability is largely non-verbal. (b) Reproductive ability, that is, the ability
Brody (1992) identified at least five important contributions of Spearman's theory to
emphasized that intelligence tests should contain subscales or measures that have high
g-to-s ratios. Second, his methods for analyzing correlation matrices were the
foundation of factor analysis. It can be said that his method was the precursor of the
not be identified with any particular measure or subset of measures. Fourth, his theory
contained a strong empirical claim that all measures of intelligence were measures of
contemporary research. Finally, Spearman may have been correct when he assumed
(Columbia University students), and second because of the lack of ideal conditions of
One of the most important contributions to the study of intelligence emerged from
the work of Jean Piaget, who sought to explain intellectual development as a result
of changes in the cognitive function (Piaget, 1961). Piaget began his inquiry in a
non-scientific way, selecting only three subjects to study (his own children) without
a control group. However, he described the results of his observations in such a clear
and detailed manner, that his evidence permitted him to explain important principles
reported his principles as viable and useful (Clark, 1992; Wadsworth, 1993).
interaction of a child with his/her environment. The interaction among the critical
(Wadsworth, 1993). The Piagetian tests, unlike the traditional psychometric tests
used so far, aimed to assess not what we know (the product), but rather how we
know or think (the process), and how people obtain and use information to solve
Piaget was also one of the first theorists to establish an interactive theory of
contributions as well as quality of environment where the child lived. This position
has numerous followers and, as pointed out by Plomin (1989), the most recent
researchers support the notion that genetic influences on behavior are multifactorial,
factors, in general, account for no more than half of the variance of behavioral traits,
1997). However, as pointed out by Neisser and his collaborators (1996), the pathways
largely unknown. Similarly, the exact way the environment contributes to those
2.4 Theories of Intelligence
(1904) in the early twentieth century. Spearman showed that all cognitive abilities are
positively inter-correlated, i.e. people who do well on some tasks tend to do well on
others. He invented the statistical method of factor analysis to show that the efficiency
designated this common factor “g” for "general intelligence" and defined it as "the
the common factor, Spearman proposed the presence of some general mental power
determining performance on all cognitive tasks and responsible for their positive
different abilities are not perfect (Lynn and Vanhanen 2006). To explain this he
proposed that in addition to “g”, there were a number of specific abilities that
determined performance on particular types of tasks, over and above the effect of “g”.
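The two-factor idea, in which each test score reflects a shared general factor plus a factor specific to that test, can be illustrated with a small simulation. This is a sketch under stated assumptions, not Spearman's own procedure: the loadings 0.8, 0.7 and 0.6 are hypothetical, and both the general and the specific factors are drawn as independent standard normal variables.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_people, loadings, seed=0):
    """Generate scores as loading * g + sqrt(1 - loading^2) * s,
    where g is shared by all tests and s is specific to each test."""
    rng = random.Random(seed)
    tests = [[] for _ in loadings]
    for _ in range(n_people):
        g = rng.gauss(0.0, 1.0)  # general factor for this person
        for scores, a in zip(tests, loadings):
            s = rng.gauss(0.0, 1.0)  # specific factor for this test
            scores.append(a * g + math.sqrt(1.0 - a * a) * s)
    return tests

loadings = [0.8, 0.7, 0.6]  # hypothetical g loadings
tests = simulate(2000, loadings)
r12 = pearson(tests[0], tests[1])
r13 = pearson(tests[0], tests[2])
r23 = pearson(tests[1], tests[2])
print(round(r12, 2), round(r13, 2), round(r23, 2))
```

In this model the expected correlation between two tests is the product of their loadings, so all correlations are positive but well below 1, and the pair with the highest loadings correlates most strongly.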
Spearman identified three major laws of cognitive activities associated with “g”.
The first was the Law of Apprehension, that is, the fact that a person
approaches the stimulation he receives from all external and internal sources
via the ascending nerves.... Next we have the eduction of Relations. Given two
stimuli, ideas, or impressions, we can immediately discover any relationship
existing between them-one is larger, simpler, stronger or whatever than the
other. And finally, we have the eduction of Correlates-given two stimuli,
joined by a given relation, and a third stimulus, we can produce a fourth
stimulus that bears the same relation to the third as the second bears to the
first.... If Spearman is right, then tests constructed on these principles, that is,
using apprehension, eduction of relations and eduction of correlates, should be
the best measures of g; that is, correlate best with all other tests. This has been
found to be so; the Matrices test... has been found to be just about the purest
measure of IQ. (Eysenck, 1998, p. 57).
By the end of the twentieth century Spearman’s basic theory had become virtually
universally accepted in the academic discipline of differential psychology. The
principal elaboration of the theory has been the development of what is called the
there are numerous narrow specific abilities at the base, eight “second order or group
mathematical, cultural knowledge and cognitive speed in the middle of the structure
and a single general factor - Spearman’s “g” - at the apex. This model was widely
accepted among contemporary experts such as the American Task Force chaired by
Ulrich Neisser (1996), Jensen (1998), Mackintosh (1998), Carroll (1994), Deary
Matrices tests such as the Raven's Progressive Matrices employed Spearman's theory
and have been widely used as measures of intelligence (Eysenck, 1998). Matrices
tests contained substantial loadings of “g” and demanded conscious and complex
(Sattler, 1988). Conversely, tests that require less conscious and complex mental
effort are low in g (Sattler, 1988). Intelligence tests with lower g emphasize specific
factors such as recognition, recall, speed, visual-motor abilities, and motor abilities
(Sattler, 1988).
Louis Thurstone (1938) disagreed with the idea that intelligence comprised a general
Thurstone was intent on showing how intelligence could be separated into the noted
multiple factors, each of which had equivalent significance (Sattler, 1998). In his
1935 book, The Vectors of Mind, he hypothesized that intelligence consists of a small
number of independent factors, corresponding to different cognitive domains, each of
factors were: verbal ability, general reasoning (inductive and deductive), numerical
ability, memory, perceptual speed, word fluency, and spatial ability. These factors are
students and came to the conclusion that there were seven primary mental abilities
that made up a person’s intelligence. The abilities or factors were: Spatial (S): the
ability to form spatial and visual images. Perceptual (P): the ability to find or
recognise particular items in a perceptual field. Numerical (N): the ability to perform
simple numerical calculations. Verbal relations (V): the ability to conceptualize ideas
and meanings in language. Word fluency (W): the ability to deal with single and isolated
words in a fluent manner. Memory (M): the ability to recognize and recall words,
numbers and figures after having memorized them. Inductive Reasoning (I): the ability
to find a rule or principle and apply it. Restrictive reasoning (R): the ability to
limited to one correct solution. Deductive Reasoning (D) the ability to draw a logical
pointed out that the differences between Spearman's and Thurstone's theories seemed
compelled to recognize the existence of group factors, while Thurstone was forced to
to the primary mental abilities (Snyderman & Rothman, 1988). In 1941, Cattell
proposed a reconciliation between the two theories by postulating the existence of a
hierarchical structure of ability (Snyderman & Rothman, 1988; Brody, 1992). The
“g” factor would be a general, common factor, present in all measures of
ability, derivable from the relationships that exist among the more specialized factors
postulated by Thurstone.
Guilford (1967, 1985) identified many different factors which together make up the
information. Memory: ability to recall and recognise information that has been
making a decision concerning criterion satisfaction. Visual: the visual category refers
to information that is visually perceived, e.g. correct perception of words that have
missing letters. Auditory: refers to information that is heard and therefore auditory
information: in the form of tokens or signs and stands for something else, e.g. printed
classified according to the processes and content but also according to the form in
which the information was processed. The form of information is classified into
product categories. The products identified were Units: the most basic form of
information is units or parts of wholes. Units can be seen as chunks of information,
e.g. single words. Classes: a class is a set of objects with one or more common
properties, e.g. in number classification, the number 22 fits in the class formed by the
numbers 44, 55 and 33. Relations: a relation is a connection between two things. An
item testing the cognition of relations e.g. may require the identification of a relation
spatial orientation tasks may be used, where visual rotation and consideration of many
different parts and their changing relationships to each other are involved.
product of information in one state changed over into another state involves
have to explain the many different ways in which two common objects, such as an
apple and an orange, are alike. This involves the redefinition of the objects by
other in the manner of a crossword so that words may be read down or across.
Considering position of letters gives rise to the expectation that one of the other words
meaningful areas. Gardner did not agree with the concept of a general intelligence
factor (g) and held that eight different intelligences were found to a greater or lesser
numerical patterns; ability to handle long chains of logical reasoning. Musical: ability
to produce and appreciate pitch, rhythm (or melody) and aesthetic-sounding tones;
ability to use the body skillfully for expressive as well as goal-directed purposes;
ability to handle objects skillfully. Naturalist: to recognize and classify all varieties of
to discriminate complex inner feelings and to use them to guide one’s own behaviour;
knowledge of one’s own strengths, weaknesses, desires and intelligences. Only a few
factor-analytic studies support the existence of multiple intelligences as Gardner saw
Cattell proposed a theory that intelligence consisted of two major types of cognitive
acquired skills and knowledge that were dependent on exposure to a particular culture,
as well as formal and informal education, for example, vocabulary. The abilities that
independent of any specific instruction, for example, memory for digits (Cohen &
Swerdlik 2002).
Tests that measured the ability to manipulate information and solve problems were
considered measures of fluid ability whereas tests that require simple recall or
1998).
Carroll (1994) used exploratory factor analysis to test his belief that human cognitive
as conceptualized by Spearman, at the top (Berk, 2000). Eight broad abilities occupied
the second stratum, arranged from left to right in terms of their decreasing correlation
with ‘g’. The eight abilities were Fluid Intelligence, Crystallised Intelligence, General
Spearman's theory of g, the fluid and crystallized intelligence theories of Cattell and
Horn, and the factor-analytic work of Carroll. The Cattell-Horn theory of intelligence
of human cognitive abilities that many scientists would agree on (Cohen & Swerdlik
2002).
In the Cattell-Horn-Carroll (CHC) model, there were ten broad stratum abilities and over
seventy narrow stratum abilities. Each broad stratum ability included two or more
narrow stratum abilities. The ten broad stratum abilities were: Fluid Intelligence (Gf),
Crystallised Intelligence (Gc), Quantitative Knowledge (Gq), Reading and Writing Ability
(Grw), Short-term Memory (Gsm), Visual Processing (Gv), Auditory Processing (Ga),
Long-term Storage and Retrieval (Glr), Processing Speed (Gs) and Decision/Reaction Time or Speed (Gt).
Recent studies showed that the CHC model offered a better representation of the
standardisation is that it gives the test scores psychological meaning and thus makes
guidance or personnel selection where decisions about individuals are made (Kline
2000).
Kline (2000) argued that, as norms are essential for understanding the
measurements (test scores), they must be accurate. To ensure this, he mentioned that
some requirements for a good standardisation should be met. These include sampling
and the expression of results, which will be discussed in detail later (chapters 5 and 6).
Cronbach (1990) stated that a test is a systematic procedure for observing behaviour
and describing it with the aid of numerical scales or fixed categories. Anastasi (1988)
Jensen (1981) defined a mental test as a small sample of behaviour used to predict
more extensive or important behaviour or capability. He added that mental tests were
essentially similar to other tests. Tyler and Walsh (1979) defined tests as standardised
From the above definitions it is clear that a test is a tool used to measure a sample of
behaviour, not a complete inventory. Psychological tests are standardised, that is, each
test is administered under a prescribed set of procedures, and objective which implies
judgement or evaluation of test scores. Scarr (1981) mentioned that the sampling
rationale was that an individual who can repeat six digits backwards can also
Mental tests are a subset of psychological tests. Psychological tests can be divided
into:
a) Mental tests which are used to measure general intellectual ability of individuals
a) Speed tests: measure speed and efficiency with which a subject can perform test
items. In a speed test the items are so easy and simple that almost anyone could get
them all right if given sufficient time. Such a test identifies who works faster (Jensen,
1980; 1981; Brown, 1983).
b) Power tests: determine highest level of knowledge, skill or reasoning the subject
can demonstrate without time pressure. They consist of items graded in difficulty or
complexity. In a power test there is no time limit or a very liberal time limit which
allows individuals to complete all items they can answer correctly. Scores in a power
test reflect the level of difficulty of items the test taker can answer correctly. (Jensen,
allows the examiner to observe the subject's performance on the test items, which
helps in evaluating test scores. Common examples of individual intelligence tests are
b) Group tests: administered to a number of subjects at the same time. Often referred
to as paper and pencil tests because they require subjects to write answers or make
marks on specially prepared answer sheets. Because of their simplicity and low cost,
this type of test is more popular than individual tests (Ahmann & Glock, 1976).
Group intelligence tests are more often used for initial screening in schools and
businesses because they can be administered quickly and economically by people with
clinical and other settings where clinical diagnoses are made and where they serve as
2.6.1.3 Classification of tests according to content
a) Verbal tests: involve the use of language, spoken or written, but they may or may
not require reading or writing. Typical verbal tests are general information, verbal
b) Non-verbal tests: paper and pencil tests that involve no explicit use of language, in
some cases not even for giving instructions for taking the test. These tests consist of
such things as figural analogies, matrices, and embedded figures. The SPM test is an
c) Performance tests: non-verbal tests that require the subject to perform certain
figure copying, block design, and picture completion or picture arrangement. The
Sternberg 2000).
Classification, training, and education of mentally retarded individuals were the initial
sparks for the development of mental tests. In general, mental tests have been used for
aptitude. For example, mental tests are used for diagnostic purposes to estimate the
performance of individuals in the future on the basis of their present ability (Anastasi
Brown (1983) noted that there were three situations where tests were used as aids in
decision making about an individual, a group or some hypothesis. The first use was
selection, where the test is used to select the most promising applicants,
those with the greatest probability of success. The second use of tests was for
ability. A third use of tests was in diagnosis to identify the individual's strengths and
The purpose of using mental tests in schools was to estimate the mental ability of
students and provide them with educational or vocational guidance. Anastasi (1988,
p.4) stated:
At present, schools are among the largest test users. The classification
of children with reference to their ability to profit from different types of
school instruction, the identification of the intellectually retarded on the one hand and
the gifted on the other, the diagnosis of academic failures, the educational and
vocational counselling of high school and college students, and the selection
of applicants for professional and other special school programs are among the
many educational uses of tests.
intelligence tests. Therefore, the use of each intelligence test must be guided by
address hypotheses that guide this study have the potential of adding to the research
The use of intellectual and other forms of psychological and mental tests with
different children frequently are uncertain which test instruments provide the most
valid, relevant and equitable results. Interest in providing fair and equitable mental
test results extends back several decades, but what is considered fair and objective
has changed as values in our culture have changed (Oakland, 1976; Oakland & Laosa, 1976).
tests are designed carefully and deliberately to produce score variance (Wesson,
to acquire knowledge and make judgments about, between, and within group
led to various decisions (e.g., eligibility for placement in special education and
Summarising uses of intelligence tests after the Second World War in the United
Intelligence tests play a vital role at all stages and in every aspect of a
person's life. From pre-school days through postgraduate years, tests are
administered for grouping and course selection purpose, for placement in
special education classes or special institutions, for career orientation, college
entrance, and admission to professions. A person's IQ score largely determined
the type of education he/she received and, ultimately, the type of position
he/she might occupy within society. Therefore, the concept of intelligence was
central to an individual's life.
It should be stressed that intelligence tests should be used alongside other methods, such as
interviews, history records or other test scores, before reaching a decision regarding
any test taker. Layman (1968, p.8) pointed out the problem of using intelligence tests
According to Urbina (2004), the current uses of tests, which take place in a wide
• Decision making:
The primary use of psychological tests has been as decision making tools. This
particular application of testing invariably involved value judgment on the part of one
or more decision makers who needed to determine the bases upon which to select
programs.
When tests are used for making significant decisions about individuals or programs,
strategy that takes into account a particular context in which the decisions are made,
the limitations of the tests, and other sources of data in addition to tests.
information- tests have been made to bear the responsibility for flawed decision-making
processes that placed too much weight on test results and neglected other
pertinent information.
• Psychological research:
Tests have often been used in research in the fields of differential, developmental,
provide a well-recognized method for studying the nature, development, and internal
relationships of cognitive, affective, and behavioral traits. It should be noted that the
advantages that psychological tests offer pertain to their characteristic efficiency and
objectivity.
Most humanistic psychologists and counselors have traditionally perceived the field
individuals in terms of rigid numerical criteria. Constance Fisher (1984) began using
tests in an individualized manner. This practice has evolved into the therapeutic model
of assessment espoused by Finn and Tonsager (1997). One of the most pertinent
The use of tests in cultures other than the one for which they were originally designed,
and the issue of cultural bias have led psychologists to develop what they thought at
The term culture fair test refers to tests that are not biased toward a particular cultural
group. Culture bias exists in a test when a member from one culture is discriminated
against in his or her ability to answer questions solely on the basis of the culture in
which he or she grew up (Corsini, 1984; Anastasi, 1988). Anastasi & Urbina (1997)
mentioned that the concern with cross-cultural testing was recognised at least as early
To overcome the cultural bias in ability tests, psychologists have tried to develop
culture-free tests that have no such bias. Their first attempt to develop a test of
intelligence which would be free of cultural influences was to minimise the use of
language if cultural groups spoke different languages. However they noticed that the
direct translation of test items from one language to another did not eliminate the
cultural differences, nor produce comparable tests (Anastasi & Urbina 1997).
content that a person raised in different culture may lack the experience to understand
and furthermore may seem pointless. Anastasi (1988) pointed out that non-verbal
tests are often used in hope of obtaining culture fair tests, but many researchers
Kline (1979) argued that non-verbal tests in non-western cultures avoided the
language problem but encountered another perhaps more serious problem, when these
tasks seem pointless to subjects. Kline gave an example of this problem based upon
"when an old African who was tested was asked to trace the maze,
imagining he was asked to lead his cattle into the kraal, the old African replied
that he preferred not to, since any one who built a kraal like that was mad"
p.309.
A culture-free test is meant to consist of items that are unfamiliar to all subjects.
completely free from cultural bias (Biesheuvel, 1969; Brislin et al., 1973; Noll and
Anastasi (1988, p.357) reviewed the problem of culture free tests and concluded that:
Since all behaviors are affected by the cultural milieu in which the
individual is reared and since psychological tests are but samples of behavior,
cultural influences will and should be reflected in test performance. It is
therefore futile to try to devise a test that is "free" from cultural influences.
Noll and Scannell (1979) had the same opinion. They stated that no test could be
culture free, since the only way to respond to it is in terms of what has been learned,
fair tests. They believed that to have a culture-fair test, all test items should be equally
familiar to all subjects. Biesheuvel (1969) defined culture-fair tests as tests which
Brown (1983) believed that culture-fair tests, though not eliminating culture effects,
attempted to make the tests equally fair to all persons by controlling certain critical
variables, such as language, speed in responding within limited time, and differences
Summarising the problem of speed in responding within limited time, Samuda (1975)
reported that many researchers found that the attitude toward speed varies greatly in
different cultures and not all people will work on the test with equal interest in getting
it done in the shortest time possible. For example, they found that the injunction to
"do this as quickly as you can" seemed to make no impression whatsoever on the
Anastasi (1988) also mentioned that the present objective in cross-cultural testing is to
develop tests that presuppose only experiences that are common to different cultures.
Kline (1979) concluded that for cross-cultural test construction it was best to use our
knowledge and experience of the culture as a guideline to writing items, and to retain
those that show themselves to be criteria-based or valid in factor analysis. Such tests
influencing the major ability factors which is one of the stated aims of cross-cultural
psychologists.
The following are examples of culture-fair tests that have been used in cross-cultural
testing: Porteus Maze Test 1913, Kohs Block Design Test 1923, Goodenough-Harris
Drawing Test 1926, Raven's Progressive Matrices 1938, Cattell's Culture Free Test
1940 (in the late 1950s, Cattell changed the term "Culture-Free Test" to "Culture-Fair
Test"), D48 Test (dominoes) 1948, and Witkin's Embedded Figures Test 1945.
Brislin et al. (1973), Kline (1979), Raven (1989) and Murphy and Davidshofer (1991)
believed that Raven's Progressive Matrices was one of the most widely used
educational subject matter after a period of instruction. They were not designed for
prediction. Instead, they measured what has been learned or the mastery of school
Achievement tests served many functions. Aiken (1988, p.125) outlined the
following: (a) to determine how much people knew about certain topics or how well
they can perform certain skills; (b) to inform students, as well as their teachers and
students to learn; (d) to provide teachers and school administrators with information
to plan or modify the curriculum; and (e) to serve as a means of evaluating the
instructional program and staff.
The distinction between achievement and intelligence or aptitude tests is not simple.
Anastasi (1988, p.412) believed that differences between achievement and aptitude
aptitude tests measured the effects of learning under uncontrolled and unknown
conditions, whereas achievement tests measured the effects of learning that occurred
Jensen (1980, p.239) also argued that all performance was a form of achievement, and
intelligence or aptitude and achievement tests, Jensen outlined the following points;
a) Intelligence tests are much broader and more heterogeneous based on a wide
variety of experiences than are achievement tests which have specific types of
b) Intelligence tests sample cumulated knowledge and skills from the individual's past
past.
c) Intelligence tests predict future intellectual achievement, even though the contents
d) Most intelligence measures are more stable across time and are less susceptible to
the influence of instruction or training than most achievement tests.
Aiken (1988) believed that the distinction between achievement tests and intelligence
tests can be made in terms of focus. Achievement tests focus more on the present,
what the person knows or can do now, whereas intelligence tests focus on the future
Sattler (1982) pointed out that intelligence tests and achievement tests have
However, intelligence tests are broader in coverage than achievement tests and sample
mathematical tests, are heavily dependent on formal learning experiences that are
acquired in school or at home which make them more culture bound than are
intelligence tests. Sattler added that intelligence tests stress the ability to apply
information in new and different ways, while achievement tests stress mastery of
factual information. Thus, intelligence tests measure less formal achievement than do
achievement tests.
Achievement tests can be divided into standardised and teacher-made tests. The
former mainly differ from teacher-made tests in that they are intended to be used over
a period of many years, and cover a broader range of skills and educational objectives
common to many schools. The term standardised refers to specific instructions for
administration and scoring. Teacher-made tests are tests designed to assess the
comparisons across schools. Teacher-made tests are sometimes called classroom tests
or "informal" tests, and are constructed by classroom teachers for use in their
particular classes under conditions of their choosing (Ahmann and Glock, 1976).
Brown (1983) distinguished between teacher-made and standardised achievement
tests. Brown stated that for the teacher-made tests, the teacher will refer to textbook
sources of items. Standardised tests developed by test publishers will consider not
one text, but the most commonly used material covered, not by one teacher, but by a
Aiken (1988) believed that teacher-made and standardised tests are complementing
standardised achievement tests. He stated that a teacher-made test is more specific to a
particular teacher, classroom, and a unit of study and is easier to keep up to date.
Standardised tests, on the other hand, are built around a core of general educational
testing was born from the need to develop a test that would predict children’s school
The study of intelligence and education provides an example for the fruitful
interaction between the practical demands of educators and the basic research focus of
observation: students of the same chronological age displayed a range of individual
The study of intelligence has been motivated by the practical problems of education.
By 1905, Binet and his colleagues achieved a solution that was innovative,
intelligence scale. In this scale if a child failed to answer correctly questions that most
other children of the same age could answer, the child was considered below average
in the ability to learn. Likewise, if a child was able to answer questions that most
other same-aged children could not answer, the child could be considered above
average in the ability to learn. These were based on the assumption that all children at
the same age level had the same opportunities to learn. Binet’s test was successful to
some extent in predicting children’s ability to learn in school. This test has served as
Academic achievement at school is the result of learning and problem solving ability
(Bester, 1998). Intelligence is seen to be the ability to think and learn and is therefore
academic performance were reported as being usually close to 0.50 (Brody 1992;
Neisser, Boodoo, Bouchard, Boykin, Brody, Ceci, Halpern, Loehlin, Perloff,
Sternberg & Urbina, 1996) but can be as high as 0.75 (Jensen, 1998).
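To give the correlations quoted above a more concrete reading, the short sketch below squares each coefficient to obtain the proportion of variance in achievement that is shared with IQ. Squaring a correlation is a standard statistical convention and is an assumption added here for illustration; it is not an analysis reported by the cited authors, and the function name is hypothetical.

```python
# Illustrative sketch only: converts a correlation coefficient r into the
# proportion of shared variance (r squared). The values 0.50 and 0.75 are the
# correlations quoted in the text; the helper name is hypothetical.


def shared_variance(r: float) -> float:
    """Return the proportion of variance shared by two variables correlated at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


if __name__ == "__main__":
    for r in (0.50, 0.75):
        print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.4f}")
```

On this reading, a correlation of 0.50 corresponds to about a quarter of the variance in achievement, and a correlation of 0.75 to a little over half.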
Studies have shown that IQs predict educational achievement. IQs predict subsequent
determines the efficiency of learning and comprehension of all cognitive tasks. The
correlations between IQ and subsequent educational attainment were not perfect
substantial and show that intelligence tests measured real cognitive abilities that are
Many empirical investigations have shown that intelligence is the best single predictor
students, developed a path model to show the relative influence of different variables
on achievement. They found that when compared to other factors, such as previous
knowledge and motivational factors, general intelligence was found to have a highly
significant direct effect on achievement, independent of any other variable in the
Chen, Lee and Stevenson (1996) carried out a study investigating the relative
achievement in Chinese, Japanese and American cultures. It was concluded that there
were similar correlations between intelligence and academic achievement for each
their achievement was tested 10 years later in grade 11. The single most predictive
was general intelligence. The study found a correlation of between 0.48 and 0.53 for
mathematics achievement, between 0.28 and 0.51 for reading and 0.35 and 0.44 for
general knowledge. Gagne and St Pere (2002) in a study comparing the predictive
abilities were by far the best predictor of school achievement. In this test, it was found
and achievement in reading, mathematics and general language tasks from grade 1 to
6. The researchers found that the correlation between verbal ability and achievement
The abovementioned study showed the importance of verbal intelligence with regard
to academic achievement, but the results revealed that other measures of intelligence
are also important in predicting scholastic success. In the study carried out by
and a hidden patterns test, was found to be a good predictor of scholastic success in
reading and mathematics. Spatial intelligence was, however, a less powerful predictor
than verbal ability of achievement in the general language area. In the study carried
out by Marais (1992) it was shown that the ability to do mathematics, accountancy
and general science appeared to require the contribution of both verbal and nonverbal
abilities.
scores during the last 60 years. Scores on measures of intellectual functioning have
risen, and in some cases rather sharply, during this period (Flynn, 1999; Neisser,
1998). Analysis of intelligence data from several countries (e.g., Belgium, France,
Canada, Britain, and the United States of America) found, without exception, large
gains in IQs over time (Flynn, 1998). The pattern of gains corresponded with the
1987, 1994, 1999; Raven, Raven, & Court, 1993). Average IQs have risen by about
three points a decade during the last 50 years (Flynn, 1999). These IQ gains across
decades, referred to as the "Flynn effect," provided evidence that gains in average IQ
were part of a persistent and perhaps universal phenomenon (Flynn, 1999; Herrnstein
& Murray, 1994). Gains were most dramatic on tests that assessed a general factor, g,
and most widely used culture-free test of intelligence (Jensen, 1980). Many scholars
believe the test measures "g" and might be the most reliable measure to identify
Raven's scores are highly influenced by environmental variables. To illustrate, all 18-
year-old males in the Netherlands took an adaptation of the Raven's upon entrance
into the military. Data available from this population revealed the mean scores of
those tested between 1952 and 1982 rose 21 IQ points. Genetic changes within
populations could not occur in such a short time span (Flynn, 1999). Therefore, the
increase in Raven's IQs must have been a function of changes in the environment
nutrition, the acquisition of information as a result of computers and the internet) led
generations may also occur within generations and lead to IQ differences among
subgroups (Flynn, 1987). Thus, the finding of substantial changes in population IQs
over time raises the question as to whether the historically observed pattern of mean
Most of these IQ increases have been reported in economically developed nations but
Lawless, Lambert & Newton, 2005), Kenya (Daley, Whaley, Sigman, Espinosa &
Have increases been greater for fluid IQ (non-verbal & reasoning abilities) than for
crystallized intelligence (verbal and educational abilities) and if so, why? Wheeler
(1942) appeared to be the first to find greater gains in non-verbal than in verbal
abilities in a report regarding the increase in IQs in East Tennessee children aged 6-16
over the years 1930-40. The average gain was considerably greater for non-verbal
ability (6.0 IQ points per decade) than for verbal ability (2.6 IQ points per decade).
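The gains quoted above are reported partly as totals and partly as rates, which makes them hard to compare directly. The following minimal sketch converts between the two, using only figures already cited in this section (the Dutch military data, Flynn's general estimate and Wheeler's Tennessee data). The helper names are hypothetical, and the arithmetic assumes a constant rate of gain across the period, which the cited studies do not necessarily claim.

```python
# Illustrative sketch only: expresses IQ gains as a rate per decade and back.
# All figures are those quoted in the text; the helper names are hypothetical
# and a constant rate of gain over each period is assumed.


def gain_per_decade(total_gain: float, start_year: int, end_year: int) -> float:
    """Average IQ-point gain per decade between two testing years."""
    years = end_year - start_year
    if years <= 0:
        raise ValueError("end_year must be later than start_year")
    return total_gain / (years / 10)


def cumulative_gain(rate_per_decade: float, years: float) -> float:
    """Total IQ-point gain implied by a constant per-decade rate."""
    return rate_per_decade * years / 10


# Dutch military data: 21 IQ points between 1952 and 1982.
dutch_rate = gain_per_decade(21, 1952, 1982)

# Flynn's general estimate: about 3 points per decade over 50 years.
flynn_total = cumulative_gain(3, 50)

# Wheeler (1942): non-verbal versus verbal gain rates in East Tennessee.
wheeler_ratio = 6.0 / 2.6

if __name__ == "__main__":
    print(f"Dutch rate: {dutch_rate:.1f} points per decade")
    print(f"Flynn estimate over 50 years: {flynn_total:.1f} points")
    print(f"Wheeler non-verbal to verbal ratio: {wheeler_ratio:.2f}")
```

Under these figures the Dutch gain works out at 7 points per decade, more than double the general estimate of about three points per decade, and Wheeler's non-verbal gain is more than twice his verbal gain.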
Lynn (1982) showed that IQs had increased in Japan over the preceding three
decades. The result of Lynn’s study was confirmed in many other studies in a number
of countries (Flynn, 1987, 2007; Lynn & Hampson, 1986; Lynn, 1990b). Lynn
& Hampson (1986) showed that in Britain fluid intelligence measured by the Standard
Progressive Matrices in children aged 7-15 years increased by 1.86 IQ points a decade
for the years 1938 to 1979. Lynn (2009) has shown that approximately the same gain
Has the amount of increase been the same at all ability levels or greater among lower
IQ groups? This question was addressed by Cattell (1951) in his study on the IQ
increase in Britain (1936-49) in which he reported that the gain was only present in
the lower half of the distribution. In an early study, Elley (1969) reported that IQ
gains in New Zealand (1936-68) were smallest in children of professional parents and
greatest in children of unskilled parents. Other studies finding greater gains among
those at lower levels of ability have been reported for Denmark (Teasdale & Owen,
1987, 1989, 2008), Norway (Sundet, Barlaug & Torjussen, 2004) and Spain (Colom,
Lluis-Font & Andres-Pueyo, 2005). However, gains have been equally great among
those at higher levels of ability in France, the Netherlands and the United States (Flynn,
2007, p.104), while Spitz (1989) has reported that gains in the United States have
been greatest at the average IQ level. A number of studies noted in the introduction
have reported that the IQ increase has been greater among lower IQ groups but there
have also been some studies finding that the increases have been the same at all ability
levels. Lynn’s (2009) data confirmed the previous studies showing greater IQ
What factor or factors have been responsible for the IQ increase? Nine principal
(1) Increased test sophistication. Flynn has recorded that when he began working on
the effect, he canvassed expert opinion and reported that “scholarly correspondents of
causes of IQ gains over time, increased test sophistication and a rising level of
educational achievement” (Flynn, 1984, p.47). These two factors had been advanced
some decades earlier by Tuddenham (1948) in another early report of the effect, while
increased test sophistication has subsequently been endorsed by Jensen (1998, p.327)
who wrote of “increasing test wiseness from more frequent use of tests”.
(2) Improvement in educational achievement was the other factor cited by scholars of
high competence from whom Flynn sought advice. This had also been advanced some
decades earlier by Tuddenham (1948, p.56) who stated “the superior performance of
the World War II group can be accounted for largely in terms of education”. Flynn
Many others have favoured the ‘improvement in education theory’ of the Flynn effect,
research of Teasdale and Owen (1994, p.333), Jensen (1998, p.324), Meisenberg,
Lawless, Lambert and Newton (2006, p. 273), Weede and Kampf (2002, p.365),
Stelzl, Merz, Ehlers and Remer (1995, p.294), Flieller (1999, p.1056), Garlick (2002),
Blair, Gamson, Thorne, and Baker (2005), all supported the following statement taken
from Meadows, Herrick, Feiler, et al. (2007, p.58) which stated: “its likeliest cause
(3) The greater complexity of more recent environments provides greater cognitive
stimulation arising from, for example, television, media and computer games. The
following quotes are all taken from research that broadly agrees with this point:
p.15)
• “Growing exposure to and awareness of the kinds of problems found in
• “Television and other mass media may have left their mark” (Elley, 1969)
• The reasons given are: “Wider exposure to mass media” (Jensen, 1998,
p.326)
• The reasons given are: “TV, video games and computers” (Greenfield, 1998,
p.91).
(4) Improvements in child rearing, e.g. “Better educated parents have more
enlightened views on child rearing” (Elley, 1969), and “…better child rearing
(5) More confident test-taking attitudes have been advanced by Brand (1987) and
Brand, Freshwater and Dockrell (1989). They suggested that increasing liberalism,
permissiveness, and risk-taking promoted speed and guessing, which in turn increased
test scores.
(6) Reduction in family size. This has been advanced by Flynn (2007, p.356) who
dismissed nutrition and wrote “better education and smaller families are much more
plausible (reasons)”.
(7) The “individual multiplier” and the "social multiplier" theories have been
proposed by Dickens and Flynn (2001) and elaborated by Flynn (2007). The concept
of the “individual multiplier” was that intelligent people have a thirst for cognitive
stimulation and this increased their intelligence through positive feedback. The "social
multiplier" posited that “other people are the most important feature of our cognitive
development and the mean IQ of our social environs is a potent influence on our own
IQ” (Flynn, 2007). This led Flynn to predict that children brought up in a university
town should have higher intelligence than those without this advantage, because the
(8) Heterosis: Jensen (1998, p.327) has suggested that the genetic factor of heterosis
(hybrid vigor) could have contributed to the Flynn effect. Heterosis resulted from the
mating of two persons from different ancestral lines. Jensen argued this has probably
countries. Further arguments for the heterosis theory have been advanced by Mingroni
(2004).
(9) Improvements in nutrition have been advanced as a reason by Lynn (1990, 1993,
1998), who has pointed out that nutrition affected intelligence, and that the quality of
nutrition had improved over the course of the twentieth century. This has been
responsible for increases in height and brain size of about the same magnitude as have
occurred for intelligence. This theory has been endorsed by Jensen (1998, p.325) and
factors.
Endorsed as one causal factor by Arija, Esparo, Fernandez-Ballart et al. (2006),
Colom, Lluis-Font & Andres-Pueyo (2005), and Jensen (1998, p.325) was better able
to explain the large IQ gains of 4 year olds and the larger gains of fluid intelligence
than of crystallized intelligence. The nutrition theory posited that the crucial effect of
improvement in nutrition impacted on fetuses and infants when the brain is growing,
and had little subsequent effect. Hence the IQ gains should be fully present in 4 year
olds and should not show increased effects in older children. The improvement in
nutrition theory can also explain the greater improvement in fluid than in crystallized
intelligence, because numerous studies have shown that fluid ability is more
1998). Hence, as sub-optimal nutrition has declined during the last century, fluid
In addition, Lynn (2009) showed greater IQ gains among those with lower ability
which also might be explained by the improvement in nutrition theory. Those at the
lower ability levels are more likely to have had sub-optimal nutrition in earlier times
and have benefited more from the improvements in nutrition that have followed rising
living standards during the last century. It is doubtful whether any prediction
regarding the size of gains at different ability levels can be made from the increases
stimulation theory (Lynn, 2009). However, Flynn (2007) had argued against the
theory on the grounds that increases in height have ceased in the United States
intelligence was discussed. This chapter introduced the concept of intelligence and
difficult to define. In addition, the chapter has presented an overview of the evolution
of intelligence and intelligence testing, the contribution of scholars in this field and
retardation was the problem that stimulated Seguin, Esquirol and Binet to develop
psychological tests. Galton and Cattell both had the idea that intelligence would be
expressed in the form of sensitivity of perception, so they used tests to measure this.
In 1905 Binet and Simon prepared the first IQ test which has been the most widely
used test of intelligence in many countries. The need for rapid testing of a large group
of subjects came with the First World War when in 1917 a group of American
Binet’s test of intelligence and Spearman’s two-factor theory were the starting point for
This chapter has also presented the definitions, classification and use of mental tests.
content. In general, tests are used for selection, placement and diagnosis purposes.
The problem of culture bias arose when intelligence tests were used in cultures other
than the one for which they were designed. Researchers explored culture free tests
which minimized the use of language, and then they developed the culture fair test in
which test content is familiar to all subjects. Other researchers believed that there is
no such thing as a free or fair test. Issues surrounding the definitions of intelligence
and the differences between intelligence and achievement tests have been covered.
Finally, the chapter discussed the issue of IQ increase with time and evaluated the
The next chapter will introduce Libya, the educational system and intelligence testing
in Libya. In addition, the study aims, objectives and rationales will be evaluated.
Chapter three: RATIONALE AND STATEMENT OF PROBLEM
3.1 Introduction
Libya is a country in northern Africa. The name "Libya" is derived from the Egyptian
term "Libu", which refers to one of the tribes of Berber peoples living west of the
Nile. In Greek this became "Libya", although in ancient Greece the term had a
broader meaning, encompassing all of North Africa west of Egypt, and sometimes
referring to the entire continent of Africa. Bordering the Mediterranean Sea to the
north, Libya lies between Egypt to the east, Sudan to the southeast, Chad and Niger to
the south, and Algeria and Tunisia to the west. With an area of almost 1.8 million
square kilometres (700,000 sq mi), Libya is the fourth largest country in Africa by
Most of Libya’s people are descended from a mixture of Berbers, the country’s
original inhabitants, and Arabs, who arrived in the 7th century AD. Small numbers of
Berbers still live in the far south of the country. Libyan people are Muslims, and
Islam is the official state religion. Arabic is the official language. The southern
mountains and deserts occupy two thirds of the country, the remaining third are the
Urbanisation refers to the rise in the proportion of the total population living in urban
areas. Urban population increases: 1) when the number of births exceeds the number of deaths, and 2)
phenomenon has been clearly described by Ravbar (1997, p. 70) in these words:
migration, the upward mobility of the population, and the growth of
city function.
Urbanisation is not a new phenomenon in the Libyan society as many old civilisations
had, at different periods of time, their impacts on Libya and built towns and large
According to the General Authority of Information in the 2006 census, Libya has a
population of about 5.3 million with a growth rate of 1.9%. One third of the
population are under 15 years of age, and 89.03% are urban. The literacy rate for both
sexes (10 years and above) was 88.5% (males 93.7% and females 83.11%). The gap
mentioned that at independence in 1951 the overall literacy rate among the Libyans
over the age of ten years did not exceed 20 percent. By 1977 the overall rate had risen
to 51% (73% for males and 31% for females). The Libyan economy depends mainly on oil
The following section, section two, provides a short description of the education
system in Libya, whilst the third section is concerned with intelligence testing in
Libya. The fourth and fifth sections are about adoption of intelligence tests and
the statement of the problem and study rationale. The seventh, eighth and ninth
sections deal with study aim, research questions and objectives. The final section
A detailed and comprehensive report about the educational system in Libya has been
general framework of educational system. Education in Libya is free for all
individuals at all educational levels and compulsory for elementary, preparatory and
secondary school age children (6-15 years). The Ministry of Education supervises the
textbooks, and method of teaching. Preparatory and high schools are segregated by
The school year begins in September and ends in May, and classes are held six days
a week, from Saturday to Thursday, from 8:00 a.m. to 1:00 p.m. The school system
1. Elementary education level: this level covers the first six years of study (age
6-11 years). In the first three years students study courses in Arabic language,
2. Preparatory education level: from grade 7 to 9 (age 12-14 years). In this level
education.
From grade 4 up to grade 8, at the end of each school year, students sit for an exam to
transfer to the next grade. These exams are prepared by teachers at school level.
At the end of grade 9 students sit for a local exam prepared by a committee of
which in turn is required for admission to secondary level. Students must pass this
examination.
student’s interest, students are allocated into one of the four different
specialities. Because of higher pay, status and salary enjoyed by engineers and
At the end of grade 12 students sit for the General Secondary Certification Exam, a
centralised national exam. These exams are run by the Ministry of Education and are
The student's progress depends upon his/her passing the national exams which include
a two- to three-hour written examination in each subject in the final year. The General
system in the final examination depends on the total scores in all subjects, as follows;
Usually, students who successfully finish high school enrol directly in the
universities, because work opportunities are extremely limited for high school
obtaining a job.
The selection of students for universities is done by the Ministry of Education
studies, postgraduate studies including master and PhD degrees and advanced
diplomas in various specialisation areas are offered (Said lagga et al., 2004).
During recent decades, due largely to concerted efforts in economic and social
Hundreds of schools have been built, many universities have been established, a great
number of students have studied at home and have ventured further afield into Europe
and other parts of the Western hemisphere to study higher education in different
Whilst all of these events have occurred, some areas have not benefited from the
single test of intellectual ability has been officially adopted or developed to be used
for the measurement of intelligence in Libya. Many sectors in Libya use examination
grades as the primary method in determining who should be accepted for study at
various academic establishments and for various jobs in the vocational sector. These
grades were used in some cases as the primary criterion for identifying both gifted and
mentally retarded children and in addition were used for guidance and counseling
purposes. There is no reason, however, to believe that all examination grades have a
guidance and counseling purposes. Although it might be considered a good
Testing experts feel ambivalent about school-based assessment. Such assessments are
not standardised; criteria vary from school to school and from teacher to teacher
(Heyneman, 1987). Durojaiye (1984) gave one reason for using examination grades
for selection in Africa. Durojaiye believed that school leaving results are more often
used in Africa for selection instead of ability tests because of the shortage of
developing countries makes the adoption of tests from western countries necessary
Owing to the lack of any prevailing local intelligence tests researchers have
because there were some colloquial Egyptian Arabic translations and adaptations for
these types of test. Students viewed personality tests as easier to administer and
interpret than intelligence tests (Attashani and Abdalla 2005). After graduation most
of these students became teachers at secondary schools, and a few of them act as
psychologists even though not qualified. Although they had theoretical knowledge
about psychology and psychological testing, they did not have access to a wide range
Mahdawi and Al-Roey (1991) in their study of the mental health program in Libya
mentioned that the mental health services suffered from a shortage of staff,
psychological services and a lack of facilities. They concluded that as the main
more health personnel and community members such as teachers to deliver
psychological and psychiatric services. From the above mentioned, it would appear
that at present the academic system in Libya fails to provide what is essential and
psychological testing
Kline (1979) pointed out that intelligence was a variable which is important and has a
definite meaning to Western people. However, the general public in Libya knows
little about the usefulness, purposes, or functions of intelligence and aptitude tests.
For some people IQ testing is something that was associated with psychological or
mental testing. This may point towards a stigma attached to this type of testing
Psychologists have taken many precautions in developing tests but there was
have misgivings about tests and their use in decision-making. Part of this
people who use them. Alexopoulos (1979) noted that misuse of tests may cause harm
Many researchers have studied the problems of misuse of test scores or use of
incomplete test scores for selection and prediction purposes. For example, Parmar
(1989) in India found that the information subtest of Wechsler Intelligence Scale for
Children-Revised (WISC-R) was simply deleted when testing Indian subjects and this
scale was not considered when computing IQ scores. He concluded that the use of the
incomplete test was likely to bias predictions based on test results and had serious
Georgas & Georgas (1972) in their study of the use and misuse of intelligence tests in
Greece argued that the use of incomplete test scores for estimation of mental ability
individuals. Bertrand and Cebula (1980) believed that tests in themselves are not bad
and do not hurt children. However, they become bad only in the hands of those who
Sattler (1982, p.4) concluded that intelligence tests are tools which may be useful in
accomplishing goals, and their effectiveness will depend on the skill and knowledge
When they are used wisely and cautiously, they will assist us in helping
children, parents, teachers and other professionals obtain valuable
insight. When used inappropriately, they may mislead and cause harm
and grief.
It is interesting to note that the first IQ test (the Binet-Simon Scale) was constructed
in France in 1905 to help identify mentally retarded children who did not
profit from regular classroom instruction. Failure to achieve a good assessment of
the mental ability of a retarded child at an early age made the problem worse in the
should be well standardised for the local population, also they have to be reliable,
Other areas affected by the lack of intelligence tests in Libya included the
selection of students for different educational programs (e.g. gifted and special needs
programs). Intelligence tests play an important role in the educational and economic
system of a society because they prevent waste of human resources due to
failure to allocate students according to their abilities and interests deprived the
country of one of its most valuable resources. In addition, this also had an adverse
effect on business and commerce where employees scoring well in tests might not
In Libya today, a relevant and accurate selection procedure is required more than ever
before, not only in the field of education but also as an intermediate level of training
for skilled manpower. Indeed, a clear failing of the current system could be seen
whereby many university graduates were posted to office work which could be done
Durojaiye (1984) believed that selection of students for educational purposes is very
university education is not compulsory, and a large number of students aspire to the
few places in the limited number of schools and universities. He stated that, for this reason,
the best testing apparatus had to be devised for selecting students who will benefit
from their education and later meet the high demand for manpower requirements of
Jensen (1981, p.19) believed that using standardised tests for selection was necessary
and unavoidable when the number of applicants for university far exceeds the number
The problem of adapting intelligence tests to a new setting was by no means
uncommon as this was a general problem for many developing countries in the past.
In addition, if the aim was to assess the mental ability of people in a culture that has
yet to develop its own testing scheme or system, it was necessary to assess what was
important in and for that culture (Brislin and Thorndike, 1973). Ortar (1972), for
example, mentioned that most countries did not produce their own psychological tests
and had to adapt and modify instruments developed elsewhere to make them suitable
Schwarz & Krug (1972, p.3) in their book about ability testing in developing
countries pointed out that educators and researchers in developing countries held
At one extreme there are those who look mainly at the vast
environmental differences between the developing countries and the
highly industrialised nations, and conclude that any test designed for
one ipso facto cannot serve the other. At the other extreme there are
those who attach greater importance to the fact that the skills needed
in both developed and developing countries are exactly the same, and
who fear that "simplified" tests will hamper them in producing
equally high levels of skill in their own population.
Schwarz & Krug concluded that neither view was correct because one view would
exclude all classic testing procedures from use in developing countries, since they
were designed in and for the Western culture, and the other view would oppose the
use of anything else, since this would be a tacit acceptance of lower performance
standards.
In this regard, Ezeilo (1978) suggested that African researchers and psychologists
• Design their own tests for the local environment; this involves a great deal of time
and effort
• Modify a widely used international test by introducing some changes in its items,
norms
The third choice was the most frequently applicable in the field of the measurement of
mental abilities and personality traits. It required less time and effort than the first two
alternatives. Therefore, this approach was applied in this study. The Raven’s
Progressive Matrices test was employed because it has been widely used and enjoys
moderately high indices of validity and reliability when used in a wide range of
cultures.
Kline (1979) concluded that for cross-cultural test construction it was best to use
one’s knowledge and experience of the culture as a guideline to writing items, and
retain those that show themselves to be criteria-based or valid in factor analysis. Such
influencing major ability factors. This was one of the stated aims of cross-cultural
psychologists.
Raven's Progressive Matrices test is an example of a culture-fair test that has been
used in cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and
Murphy and Davidshofer (1991) held that Raven's Progressive Matrices was one of
3.5 Standard Progressive Matrices (SPM) test
The present study investigated intelligence tests with special interest in the British
mental ability test, Raven's Standard Progressive Matrices (SPM), as a measure
of general ability. It consists of 60 problems in 5 sets of 12. The test is called
progressive because each problem within a set, and each successive set, is progressively more
difficult. Each problem consists of a geometric design with a missing piece; the
respondent selects the missing piece from six or eight choices given (Domino and
Domino, 2006). A more extensive description of the SPM test is given in the
next chapter.
The SPM test was selected because it has been regarded not only by its author, but
also by many researchers (e.g. Burke, 1958; Anastasi, 1988; Raven, 1989; Carpenter
et al., 1990; Arthur & Woehr, 1993; and Arthur & Day, 1994) as a useful non-verbal
measure of ability which was easy to administer and score. It is a group test, which
can be used with subjects of all language backgrounds and does not depend to any
The Progressive Matrices (RPM; Raven, Raven & Court, 2000; Lynn & Vanhanen,
2006) is the most widely used test of intelligence in numerous countries throughout
the world. One reason for the popularity of the test is that it is non-verbal and can
therefore be applied cross-culturally, while verbal tests are more culture specific and
preclude cross-cultural comparisons. Another reason for the popularity of the test is
that it is considered to be the best test of g, the general factor present in all cognitive
tasks that was first identified by Spearman (1904) and which is largely a measure of
reasoning ability (e.g. Carroll, 1993; Jensen, 1998; McGrew and Flanagan, 1998). The
test was constructed by Raven (1939) and consisted of a series of 5 or 7 designs that
progressed according to some rule. The problem was to identify the rule and
extrapolation and had to select the correct one. Items were scored either right or
wrong. A participant’s score was the number of right answers. Maximum possible
score was 60. The right answers were provided in the SPM manual.
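The scoring rule described above (each item is right or wrong, and the score is the number of right answers out of 60) can be sketched as follows. This is a minimal illustration only: the function name, the item labels and the three-item answer key are hypothetical, and the real key is the one supplied in the SPM manual.

```python
from typing import Mapping

MAX_SCORE = 60  # 5 sets of 12 items, as stated in the text


def spm_raw_score(responses: Mapping[str, int], answer_key: Mapping[str, int]) -> int:
    """Return the number of items answered correctly.

    An item that is missing from `responses` or answered wrongly scores 0;
    an item answered correctly scores 1.
    """
    return sum(
        1
        for item, correct_choice in answer_key.items()
        if responses.get(item) == correct_choice
    )


# Hypothetical three-item key for illustration (NOT the real SPM key).
toy_key = {"A1": 4, "A2": 5, "B1": 2}
# The participant answered A1 correctly, A2 wrongly, and skipped B1.
toy_responses = {"A1": 4, "A2": 1}

toy_score = spm_raw_score(toy_responses, toy_key)
```

With a full 60-item key the same function would return a value between 0 and `MAX_SCORE`.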
The Raven’s Standard Progressive Matrices (SPM) test was constructed to measure
(Raven & Court, 1998, updated 2003). Kaplan and Saccuzzo (1997) stated that
intelligence, or Spearman’s g factor. In fact, the Raven may be the best available
single measure of g.
In the same vein, Jensen (1998) maintained that in numerous factor analyses, the
Raven tests, when compared with many others, had the highest g loading and the
lowest loadings on any of the group factors. The total variance of Raven scores in fact
comprised virtually nothing besides g and random measurement error. He also added
that Raven’s Progressive Matrices was often used as a “marker” test of Spearman’s g.
That is, if it was entered into a factor analysis with other tests of unknown factor
composition, and if the Matrices had a high loading on the general factor of the matrix
of unknown tests, its g loading served as a standard by which the g loadings of the
By the same token, Lynn et al. (2004) stated that the Progressive Matrices was
widely regarded as the best test of abstract or nonverbal reasoning ability, and this
itself was widely regarded as the essence of “fluid intelligence” and of Spearman’s g.
Mackintosh (1996) described it as the paradigm test of non-verbal, abstract
reasoning ability.
This view is not, of course, universally accepted. Indeed, Raven and Court (2000)
referred to several studies which emphasised a spatial ability loading, and a review of
the extensive literature dealing with this topic from the point of view of researchers
keen to distinguish “Working Memory” from “g” was provided by Ackerman, Beier,
Court & Raven (1995); Kline (2000); Murphy & Davidshofer (1998) noted the
Gregory (1992) also noted that a huge body of published research has shown the
validity of this test. Therefore, as Irvine & Berry (1988) noted, it has gained
widespread acceptance and use in many countries over the world. No other test had
(2002) summarized an extensive number of studies based on normative data for the test
which had been collected in 61 countries. For all these reasons, Kaplan and Saccuzzo
(1997) concluded that with its new worldwide norms and updated test manual, the
Raven was regarded as one of the major authorities in the psychological testing field
Some tests seemed to be more appropriate than others for use with literate children
and adults in developing countries. For example, at middle primary level there was
the Raven's Coloured Progressive Matrices (CPM) test. From eight years of age
upwards there was the Raven's Standard Progressive Matrices (SPM) test (Ord, 1972).
The Progressive Matrices tests (Standard, Coloured, and Advanced) were the best
known and most widely used as measures of individual differences in cognitive ability
and as culture-reduced tests (Powers et al., 1986a; DeShon et al., 1995). According
to Thorndike and Hagen (1977) and Ogunlade (1978) the SPM test's freedom from
Jensen (1980, p.648) examined the usefulness of the SPM test and made the following
observations:
Given all of the abovementioned advantages of the SPM, including its wide use as a
cross-cultural test, the researcher chose the SPM test as a measure of mental ability for the
Measuring mental ability accurately and objectively has been a major concern of
needs by providing a sound assessment of intelligence and this is a gap worth closing.
ability, has to adopt one intelligence test which is suitable for the measurement of the
Thus, in summary, the problem is related to the adoption of one of the appropriate
Libyan setting, where no single test of intelligence had been officially adapted or
developed to give better judgment and evaluation for the Libyan samples
part, since tests were statistically structured to distinguish between individuals, and
knowledge and make judgments about, between, and within group differences. This
knowledge allowed for the interpretation of the distribution of scores that led to
various decisions (e.g., eligibility for placement in special education and gifted
Not much is known of the intelligence of populations of North Africa (Lynn and
Vanhanen, 2002, 2006). Libya as a developing country faces the same problems
that have been and are being faced by many of its Arab neighbors. It lacks the
programs and during their university study they received some theoretical knowledge
about intelligence and personality tests, still there is a lack of intelligence test
Psychologists and scientific research related to educational and psychological issues
in Libya lack the knowledge about IQ tests among the population in general. There is
to suit Libyan samples. All these lead to misuse, misunderstanding and unwise
application of the few intelligence tests which are available in Libya and which have
been used in Libya during the past years (Mahdawi and Al-Roey, 1991; Attashani and
Abdalla, 2005).
Abdalla (2002) noted that in 1988, during his work as an educational psychologist at
the Massa Institution for Mentally Retarded Children in Libya, he translated the short
Form (L-M) of the Stanford-Binet Intelligence Scale from English into colloquial
Libyan Arabic, with little modification, and administered it in order to measure the mental
ability of retarded and normal children aged 6 to 12 years. The project failed because
the sample was too small (N=54), the test required too much time to administer and
score, and there were no test experts to analyse the data, which were mainly verbal.
Furthermore, such standardisation for an individual test like the Stanford-Binet can be
done only through professional organisations which have a great deal of time, effort and
money. These findings prompted the researcher to study and use the Raven's Standard
Progressive Matrices (SPM) test, as a tool to measure mental ability in the present
Lack of intelligence test adaptation or development and the misuse of the few tests
available now in Libya created problems in the areas of mental measurement and
school selection. One of the major problems facing Libyan psychology researchers
now is the lack of accurate measurement of mental abilities. This type of
intelligence tests. For example, only a few institutions such as the Benghazi Children's
Hospital or the Tripoli Centre for Mentally Retarded Children were currently using
some items, but not the whole test, from the Stanford-Binet Intelligence Scale or from
the Wechsler Intelligence Scale for Children-Revised (WISC-R) for the measurement
of intelligence.
Unfortunately these test items were used in these institutions without suitable
modification and adaptation to estimate some aspects of mental ability of the children
who were referred by parents or schools for diagnosis or treatment. It is clear that
such methods of assessment may have limited the application of test results or led to
wrong classification of a child's mental ability. Again, this appeared to point to a lack
of understanding about these tests based upon a lack of knowledge in their application
and how to adapt such tests to suit the intended target groups.
At the Second Family Conference in Beida city in May 1991, the problems of testing
of children with special needs were discussed in a paper presented by Abdalla. One
of the recommendations was to stop testing and labelling deaf and mentally retarded
intelligence tests. Shelley and Cohen (1986) stated that attaching numbers to people
Previous studies that administered the SPM in Libya included Aboujaafer in 1983,
Majdub in 1991, Attashani and Abdalla in 2005, and Ahlam in 2005. These studies
were carried out without the prior standardization of the test. The present study
carried out the necessary standardization. Standardization of a test means obtaining
average scores and distributions from a representative population (Kline, 2000).
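As an illustration of what obtaining average scores and distributions can involve, the sketch below summarises raw scores by age group and converts a raw score into a percentile rank within its group. It is a minimal sketch under stated assumptions, not the procedure used in this study: the sample data are invented, the function names are hypothetical, and the percentile rank uses the common mid-rank convention (the percentage of scores below, plus half of those equal).

```python
from collections import defaultdict
from statistics import mean, stdev


def group_by_age(records):
    """Group (age, raw_score) pairs into a dict of age -> list of scores."""
    groups = defaultdict(list)
    for age, score in records:
        groups[age].append(score)
    return dict(groups)


def norm_summary(scores):
    """Sample size, mean and sample standard deviation for one age group."""
    return {"n": len(scores), "mean": mean(scores), "sd": stdev(scores)}


def percentile_rank(score, scores):
    """Mid-rank percentile: % of scores below, plus half of those equal (assumed convention)."""
    below = sum(s < score for s in scores)
    equal = sum(s == score for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Invented (age, raw score) pairs, for illustration only.
sample = [(10, 30), (10, 34), (10, 38), (11, 36), (11, 40), (11, 44)]

groups = group_by_age(sample)
norms = {age: norm_summary(scores) for age, scores in groups.items()}
rank_of_34_at_age_10 = percentile_rank(34, groups[10])
```

A real norming exercise would use a much larger, representative sample for each age level; the structure of the calculation stays the same.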
mental testing. Thus, the main purpose was to develop norms for the Classic form of
the Standard Progressive Matrices (SPM) in Libya and to find out the distribution of IQ
scores within a Libyan setting. Norms of this group were compared to norms of other
their age, sex and regions. This was done to examine the conclusion advanced by
Lynn (2006) that average scores are somewhat lower in economically developing
nations than in the economically developed nations of Europe and North America.
This study determined the psychometric characteristics: validity, reliability, and item
Matrices (SPM) test in a Libyan setting and computed the percentile ranks for (SPM)
test scores according to sample age levels (Standardization of the Raven's Standard
The last century has marked the success of the means of measurement, in testing in
general and intelligence testing in particular. Group standardised tests, however, have
come to the fore together with individual tests, practical tests, written tests and verbal
and non-verbal tests. Measuring intelligence as a general intellectual ability has been
Attashani and Abdalla (2005) mentioned that in 1905 Alfred Binet in collaboration
with Simon in France constructed the first intelligence test and improved versions
came out in 1908 and 1911. This was when intelligence measurements found their
way into many countries and were being widely used for many purposes e.g.
weakness diagnosis and helping in the decision making process. These types of
measurement were critical and provided many benefits, especially in countries where
There remained no serious doubt about the potential usefulness of testing procedures
Whether tests would be adapted and how they were best applied were no longer major
issues; likewise whether such tests needed to be culture free or culture fair. Major
issues centred on such matters as the long-term validity of selection measures; the
prospects for further, as yet relatively untried, measures, as part of the selection
may be operating in the selection situation; the possibility of adopting more
efficient strategies of selection than traditional ones from the viewpoint of fitting the
job to the man as well as the man to the job; and, perhaps most important of all, the
means of building locally appropriate, efficient selection institutions that would prove
Intelligence tests have been used in many areas in both the USA and the UK. The results
have been used in making decisions for entering schools, colleges, and universities,
have been used for vocational guidance and psychological diagnosis. In the USSR,
intelligence tests have been used in the educational sector as well as in vocational
deserved primacy within any culture, for the wealth of any nation, whether developed,
developing, or “primitive”, was the ability of its people. Once properly identified as
having requisite abilities for differential placement, each person can then conceivably
contribute more to the health, well-being and productivity of his country (Brislin
and Thorndike, 1973). It is axiomatic that the great nations have become great,
industrial, and prosperous because mental energies were tapped (Brislin and
Thorndike, 1973).
Since most developing countries were keen to make use of these tests, and since they
did not have sufficient scientific and technical abilities to help them design suitable
cultural tests, they opted for a standardising process. They were in need of different
tests of this type to satisfy the needs of human and social development plans, which
were usually adopted in these countries. To reach such goals, they needed to apply
these tests, and to conduct scientific research on these tests that represented part of the
measurement. They needed to do such research in order to adjust these tests to their
societies, and to help them reach an appropriate interpretation for the score that a
In this respect, the importance of standardising these tests and measurements became
more apparent by the day. This was reflected in the interest of developed
countries in designing and standardising these tests and using them in different life
sectors, such as educational, health and other institutions. Moreover, there were now
Needless to say, intelligence tests are mainly used in the educational sector. They
are also helpful in determining whether students in a class or school have learnt at the level
that was expected of them, and they also help teachers to predict what students can
achieve (Alwakfi, 1998). Generally there was a need for intelligence tests to discover
talented individuals. Such students do not differ in appearance from other students.
Unless these tests were conducted, such students had no chance to be recognised
(Rajha, 1970).
There were also many other contributions of testing to society, such as better
distribution of educational and professional opportunities based upon merit and good
judgment, not on luck or personal judgment. Alexopoulos (1979, p.18) in his research
into standardization of the Wechsler Intelligence Scale in Greece, mentioned the help
Eells et al. (1971), Drenth (1972), Miron (1977), Drenth et al. (1979) and Heyneman
(1987) argued that testing had contributed to more effective use of manpower, and
These comprehensive tests which recognised skilled young students from others were
widely used. They were used to such an extent that scores in some studies were
considered a scale for a student’s or child’s skills. These tests as well (intellectual
skills in particular) were used to distinguish students with special skills in science, arts
or other skills such as human relation skills. They also helped in distinguishing
students with special skills and with high intelligence skills (Shafile,
).
In Libya we can use these intelligence tests to recognise the intellectual abilities of
students. Depending on the test results, students with high or low scores can receive
Zahran (1990) identified the importance of intelligence tests in particular for children
that may be classified according to their levels. Majdub (1991, p.215) who studied the
that
The research found that Libya is now in urgent need, more so than at any other time,
and universities. In Libya, we do not need a large number of graduates; rather, we
Undoubtedly, the proper use of mental and other ability tests and measurements
within the local environment would provide the indigenous local market with
workers, especially when they are classified according to their skills. Issawi (1973)
found that these tests were widely used in filling empty jobs and in choosing the best
person for the best place (vocational, industrial, or even the military sector).
Attashani and Abdalla (2005) stated that it was harmful for the country’s economy to
select a person for a job that did not agree with his or her intellectual abilities.
Heyneman (1987, p.251) pointed out the importance of educational selection to the
Abdalla (2002) mentioned that for school selection, in many western countries, it is
developing and western countries (for example, Sinha, 1968; Rao, 1974; Maqsude,
1980 and 1983; Carver, 1990; Andrich and Styles, 1994) made use of intelligence
tests especially the Raven's Standard Progressive Matrices (SPM) test for school
Depending solely on students' grades of the last year in secondary school to gain
secondary school grades could minimize two principal errors, for example, admitting
students who might fail in the university and rejecting students who might succeed
(Majdub, 1991).
The study highlights the following aspects also:
(1991) reported that psychological tests are seriously neglected in Libya. They
have not been standardized or introduced to the Libyan society. Lynn and
Vanhanen (2006) stated that not much is known of the intelligence of the
• Providing norms for the (SPM) test for use, in conjunction with examination
grades, to help the authority in implementing appropriate decisions related to
the future of individuals, and to guide them to educational programs that will
suit their abilities. Also, for use in job selection to match applicants to suitable
jobs in the vocational sector. Attashani and Abdalla (2005) mentioned
that no single test of intellectual ability or aptitude has been officially adapted
• Providing the means to estimate levels of intelligence since our society lacks
these tests, to be able to recognize high IQ in the society as well as low IQ.
From the above mentioned points and in view of the present situation in Libya it is
clear that there is a great demand and need for adapting at least one test in each of the
researchers, psychologists and policy makers with effective tests for evaluation,
selection, and diagnostic purposes. For a developing country like Libya such tests
which give accurate measures of intelligence, achievement and personality are crucial
To develop norms for the classic form of the Standard Progressive Matrices (SPM)
test in Libya and to identify the distribution of IQ scores within a sample of Libyan
students.
“What are the norms for a Libyan sample when the SPM test is applied as an
2. To study the relationship between SPM mean scores and students' academic
city, secondary city, coastal, mountain and desert), age and study levels.
on the SPM test according to region and gender, age and region, region and
study levels, geographic areas and gender, academic discipline and gender,
5. To investigate the variability of SPM mean scores based on age and gender
6. To examine the contribution of the independent variables gender, age and
7. To compute the percentile ranks for the SPM scores according to the sample
age levels.
8. To compare performance on the SPM test for a Libyan sample with that of
single test of mental ability has been officially constructed or adopted for the
in Libya is mainly due to a lack of test experts and information and knowledge
regarding the usefulness and effectiveness of these tests among people who were
The lack and misuse of some intelligence tests to estimate the mental ability has some
students who underwent the test. Also guidance, counselling and direction of
students towards universities and colleges and of personnel to various types of jobs
have been affected by the absence and misuse of intelligence and personality tests. It
is believed that intelligence tests are important and vital to the educational and
The present study tried to remedy and rectify the above problems. It is an attempt to
provide an intelligence test that best suits a Libyan setting. It will investigate and
The focus of the study was to standardize the British mental ability test, the Raven's
The study aims to develop norms for the classic form of the SPM test to identify the
In the next chapter we give a complete description of the SPM test. This will mainly
include past studies along with their findings. A detailed review of the available
Chapter four: REVIEW OF STANDARD PROGRESSIVE MATRICES LITERATURE
4.1 Introduction
The aim of this study was to develop norms for the classical form of the Standard
Progressive Matrices (SPM) test and identify the distribution of IQ scores for a
sample of Libyan students. This chapter presents this review in detail and sheds
light on prominent studies that have extensively employed the SPM test and related
subjects.
To achieve the desired aim, a comprehensive review was undertaken to identify and
appraise the available literature that described psychological and mental testing.
Greater emphasis was on the SPM test in particular. Studies in this review were
Web of Science, Dissertation Abstracts, the British Index to Theses, and Cambridge
Scientific Abstracts. In addition, the following active researchers in the field were
contacted: John Raven, Richard Lynn, Ahmed Abdal-Khalek and Omar Khelefeeh.
The earliest article published on SPM testing dates back to 1948. The first
step in the searching process was the identification of key concepts and location of
Data were extracted using the following categories: author, country, year of
publication, population sampled, age, SPM means and standard deviations, and
sample size. Many papers published between 1948 and 2009 were identified and
In addition, the SPM 1988, 1996, 2000, 2003, 2004 and 2008 manuals were included
with the papers and were utilized in this study (Raven et al., 1988; 1996; 2000; 2003;
2004; 2008).
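The extraction categories listed above map naturally onto a simple record structure. The sketch below is only one possible way to hold such data: the class name, field names and the example entry are hypothetical and do not reproduce any study cited in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpmStudyRecord:
    """One extracted study, using the categories named in the text."""
    author: str
    country: str
    year: int          # year of publication
    population: str    # population sampled
    age: str           # age or age range of the sample
    spm_mean: float    # mean SPM raw score
    spm_sd: float      # standard deviation of SPM raw scores
    sample_size: int


def in_review_window(record, first_year=1948, last_year=2009):
    """True if the study was published within the period covered by the review."""
    return first_year <= record.year <= last_year


# Invented entry for illustration only (not a real study).
example = SpmStudyRecord(
    author="Example Author",
    country="Exampleland",
    year=1995,
    population="secondary school students",
    age="13-15",
    spm_mean=41.2,
    spm_sd=8.5,
    sample_size=300,
)
example_is_in_window = in_review_window(example)
```

Keeping the records immutable (`frozen=True`) is a design choice made here so that extracted values cannot be altered by accident during later analysis.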
This chapter has been divided into nine sections. The first section provides general
information regarding the Progressive Matrices tests. The second section describes the
SPM test. The third section talks about reporting SPM results. Section four deals with
standardization of the SPM test. Sections five, six and seven investigate reliability,
validity and item analysis of the SPM test. Section eight briefly reviews relevant
previous studies which have employed the SPM test. Last but not least, section nine
The Progressive Matrices tests resulted from the work of the British psychologist
John C. Raven and the geneticist Lionel Penrose. They were first published in 1938. Their
work was based on Spearman’s two-factor theory. In fact, the Progressive Matrices
tests are among very few tests which are based on a theory of intelligence (Raven,
2004).
Sinha (1950), a student of Cyril Burt, claimed that the Progressive Matrices tests were
not an original idea of Raven’s, as was often thought. He argued that they were
developed slowly out of the non-verbal analogy test constructed by Burt. Burke
(1958) also attributes the origins of the Progressive Matrices to the work and thinking
Spearman (1946) reported that the measurement of the “g” factor had been achieved
by the use of the Matrices test. He went further by considering the Progressive
Matrices test as the best of all nonverbal tests of “g”. Anastasi and Urbina (1997)
stated that the Raven's Progressive Matrices and vocabulary tests were developed to
evaluate the two components of “g”: eductive ability and reproductive ability.
Eductive ability, on one hand, is mostly a nonverbal ability measured by the matrices.
On the other hand, reproductive ability is mostly verbal and measured by vocabulary
tests.
Lewis (1974) wrote that the Progressive Matrices test was a test of reasoning, based
on non-verbal data. Items were devised especially to evaluate the ability to perceive
Murphy and Davidshofer (1991) noted that a number of factor analyses of Raven’s
Progressive Matrices suggested that Spearman’s “g” is the only variable that is
reliably measured by the test. Little evidence can be drawn to indicate any significant
effects of spatial visualization or perceptual ability on the test scores. Carpenter et al.,
Powers et al. (1986a) pointed out that Progressive Matrices were designed to measure
According to the 2004 SPM manual, Raven published the first version of the SPM test
in 1938. The current version of the SPM test is essentially the same. In 1947, small
adjustments to item (B.8) were made to improve the absolute order of difficulty.
a) Standard Progressive Matrices (SPM) test for use with individuals over six
years of age, within the normal adult range of ability. The 1938 published SPM
b) Coloured Progressive Matrices (CPM) test was developed for use with
children aged five to eleven, the elderly, and the mentally retarded.
c) Advanced Progressive Matrices (APM) test sets I and II for individuals above
The CPM and APM tests were both published in 1947 for the first time. All three
tests were designed to be used in association with a vocabulary scale, so that
verbal ability can be measured when required. There are two versions of the
vocabulary scales according to age: the Crichton Vocabulary Scale for children and
the Mill Hill Vocabulary Scale for adults. The latter is available in senior and junior
The SPM test was adopted as the basic intelligence test by the USA Army and Navy
personnel selection departments in 1941. It was the main test for military
classification in Great Britain. It was utilized to ensure that recruits of normal intelligence
were not rejected due to poor education. Before the end of the Second World War, it
had already been administered to several million recruits (Vernon, 1960; Cronbach,
1970).
In addition to the above characteristics, the Raven's Progressive Matrices test is probably
one of the most widely used culture-fair tests. Raven et al. (1996) mentioned that for
comparative purposes the SPM test came to be used internationally, and no general
The SPM test is a non-verbal ability test consisting of a series of geometrical designs;
a 3x3 "matrix" grouped into five sets lettered A, B, C, D and E. Each set consists of
12 matrices. These Matrices are presented in black and white pictorial context. The
first matrix in each set is easy so as to be self-evident then it is followed by more and
Jensen (1980) showed that each set involves different principles of varying matrix
patterns. Also, within each set the items become progressively more difficult. Thus
after every 12 items, the subject is always faced by a quite simple item. This prevents
The earlier matrices serve to teach one how to solve the later ones. Thus the test appears to
be a measure of a person’s ability to learn and apply new material, at least in the
In each matrix, a part located in the lower right-hand of the geometrical design is
missing. Six alternatives (sets A and B) or eight alternatives (sets C, D, and E) are
given below each matrix. All of these alternatives fit in the missing part. Only one,
The test instructs the participants to look across the rows and then down the
columns to identify the rules that determine the missing part. The items are scored
either right or wrong. The subject's score on the SPM test is the total number of correct answers.
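The right/wrong scoring rule described above may be sketched as follows. This is only an illustrative sketch: the function name `score_spm`, the six-item answer key and the response sheet are hypothetical and do not reproduce real SPM content.

```python
def score_spm(responses, answer_key):
    """Return the number of items answered correctly (each item is right or wrong)."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical 6-item key and response sheet (not real SPM content).
answer_key = [4, 5, 1, 2, 6, 3]
responses = [4, 5, 1, 3, 6, None]  # None marks an unanswered item, scored as wrong
total = score_spm(responses, answer_key)
print(total)
```

An unanswered item is treated here as incorrect, which is an assumption of the sketch rather than a rule stated in the text.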
Progressive Matrices problems are usually easier to solve than to describe (Hunt,
1975). An example of the Progressive Matrices problem is shown in Figure 4.1. The
pattern on the top is missing a piece, and the subjects must determine which numbered
Figure 4.1 Typical items from the SPM Test. A5 is an easy item whereas E1
is a difficult item (reproduced from Anastasi and Urbina, 1997, p. 263).
Raven et al. (1988) described the SPM test as a test of a person's capacity, at the time
of the test, to apprehend meaningless figures presented for his observation, see the
relations between them, conceive the nature of the figure completing each
reasoning.
Researchers have investigated various methods in an attempt to understand the most
efficient process for determining the missing part. For example, an
answer which fits may, as Raven et al. (1988) put it: (a) complete a pattern, (b)
Hunt (1975) suggested that there were two quite different solution algorithms; a)
Gestalt algorithm, which deals with a problem by using the operations of visual
perception, such as the continuation of lines through blank areas and the
superimposition of visual images upon each other. The gestalt algorithm relies upon
analytic algorithm deals with abstracted features of displays, by operations such as,
Anastasi (1988) held that the easier items require accuracy of discrimination
whereas the more difficult items involve analogies, permutations and alternations of
pattern, and other logical relations. Moreover, Carpenter et al. (1990) concluded that
the following five types of rules are used when attempting the SPM test to
determine the missing part; 1) Constant in a row: the same value occurs throughout a
subtraction: a figure from one column is added to or subtracted from another figure to
4) Distribution of three values: three values from a categorical attribute are distributed
through a row. 5) Distribution of two values: two values from a categorical attribute are
The Progressive Matrices test is usually administered with no time limit and can be
follow once the method is understood. But since there is no time limit, time taken to
According to the 2003 SPM manual (p. 69), the most effective and convenient method
percentage frequency. Where, a similar score is found to occur among people of the
the population and group people’s scores accordingly. In this way, it is possible to
GRADE I: “intellectually superior”, if the score lies at or above the 95th percentile
for people of that same age group.
GRADE II: “definitely above the average in intellectual capacity”, if the score lies
at or above the 75th percentile of that same age group.
    II+: if the score lies at or above the 90th percentile of that same age group.
GRADE III: “intellectually average”, if the score lies between the 25th and 75th
percentiles.
    III+: if the score is greater than the median (50th percentile) of that same
    age group.
    III-: if the score is less than the median of that same age group.
GRADE IV: “definitely below average in intellectual capacity”, if the score lies at or
below the 25th percentile of that same age group.
GRADE V: “intellectually impaired”, if the score lies at or below the 5th percentile
for that age group.
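The grading scheme listed above can be expressed as a simple lookup on the percentile rank within the age group. The following is a minimal sketch under stated assumptions: the function name `spm_grade` is hypothetical, the percentile rank is assumed to have been obtained already from the appropriate age norms, and a score exactly at the median is labelled plain "III" because the list does not say how that boundary is treated.

```python
def spm_grade(percentile):
    """Map a percentile rank (0-100) within an age group to an SPM grade label."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 95:
        return "I"      # intellectually superior
    if percentile >= 90:
        return "II+"
    if percentile >= 75:
        return "II"     # definitely above average
    if percentile <= 5:
        return "V"      # intellectually impaired
    if percentile <= 25:
        return "IV"     # definitely below average
    if percentile > 50:
        return "III+"   # average, above the median
    if percentile < 50:
        return "III-"   # average, below the median
    return "III"        # exactly at the median (assumption, see lead-in)


for p in (97, 92, 80, 60, 40, 25, 3):
    print(p, spm_grade(p))
```

The checks run from the extremes inward so that the overlapping cut-offs in the list (for example, 95 also satisfies "at or above the 75th percentile") resolve to the most extreme grade that applies.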
The SPM test was first fully standardised by Raven in 1938 on a sample of 1407
was performed and the test was re-standardised on school children from Colchester.
The Mill Hill Vocabulary Scale was also standardised in that study. During the fifties
and sixties, several checks were run to determine the accuracy of the norms. The following
Table 4.1 SPM standardization studies
Country | Year | N | Age | Results | Other comments
China | 1986 | 5108 | 6 to 79 | Percentile norms for each half-year interval (6 to 16), for three-year intervals (17 to 19) and for ten-year intervals (aged 20 to 97) | SPM standardization (Raven et al., 2003)
UK | 1979 | 3500 | 8 to 18 | Percentile norms for each half-year interval (6 to 16) | SPM standardization (Raven et al., 2003)
Belgium | 1984 to 1990 | 952 | 25 to 89 | Percentile norms for each ten-year interval (aged 25 to 89) | SPM standardization (Raven et al., 2003)
Scotland | 1992 | 629 | 20 to 75 | Percentile norms for five-year intervals (aged 20 to 65) | SPM and MHV standardization (Raven et al., 2003)
Turkey | 1993 | 2485 | 6 to 14 | Percentile norms for each half-year interval (aged 6 to 14) | SPM standardization (Raven et al., 2008)
Slovenia | 1998 | 1556 | 6 to 18 | Percentile norms for each year interval (8 to 18); also mean scores for each year (aged 8 to 18) | SPM standardization (Boben, 2007)
Pakistan | 2004 to 2006 | 1662 | 11 to 18 | Percentile norms for each year interval (aged 11 to 18) | SPM standardization (Ahmad et al., 2008)
Syria | 2004 | 2489 | 7 to 18 | Mean scores for each year (aged 7 to 18) | SPM standardization in Rahmn's (2004) PhD, reported by Keleefa and Lynn (2008a)
Sudan | 1999 | 6,202 | 9 to 25 | Mean scores for each year (aged 9 to 25) | SPM standardization (Keleefa et al., 2008b)
Qatar | 2001 | 1135 | 6 to 11.6 | Mean scores for each year (aged 6 to 11.6) | SPM standardization (Keleefa and Lynn, 2008a)
Kuwait | 2006 | 6529 | 8 to 15 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2006)
Oman | 2003 | 5212 | 9 to 21 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2009)
4.6 Reliability of the SPM Test
measuring. The more reliable a test is the more confidence we have about the
obtained. It assures that the scores obtained from the test are identical to the scores
that would be obtained if the test were re-administered to the same test takers. In other
words, reliability means that a test is stable in measuring a trait, i.e. the results of
measuring the same trait do not differ from one time to another (Domino and Domino,
2006).
There are two ways to build consistency into a test: one concerns the test
environment, the other test construction. The test environment can be
divided into physical and psychological factors. Physical factors, such as room
temperature, lighting and setting, are relatively easy to keep constant. By contrast,
psychological factors such as emotional stress, anxiety and physical illness are
Test construction, or test nature, is another factor which affects reliability. A test
participants will rank about the same each time they attempt it. The length of the test
and the quality of its items are two important factors in test construction. The longer
the test, the more reliable it will be. The less ambiguous the questions, the more likely
the answers will be the same on two different occasions (Bertrand & Cebula, 1980).
It is essential that a test should have a high level of reliability. Raven et al. (1996)
mentioned that several studies dealing with the reliability of the SPM test have
reported positive results. These studies covered a wide range of ages, cultural groups
and populations.
There are several methods to determine reliability. The three most commonly used
are: split-half, test-retest and internal consistency (Cronbach’s Alpha) (Anastasi and
Urbina 1997; Kenneth 1998; Kline 2000; Langdridge 2004; Domino and Domino
2006). All of these methods have been employed in the current study.
Kline (2000) stated that test-retest reliability is the correlation between the scores
obtained when a test is administered on two separate occasions. The test is first
administered to a certain group. It is then repeated with the same group after an
interval ranging from one week to several years. Several factors determine whether the
interval should be long or short. For example, if the test items can be remembered
easily then the interval should be long. However, if the sample consists of children
then the interval needs to be short.
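The test-retest coefficient described above is a Pearson correlation between the two sets of total scores. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the eight pairs of scores are hypothetical illustration data, not results from any study cited here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of the same eight participants on two occasions.
first_administration = [32, 41, 27, 45, 38, 50, 29, 36]
second_administration = [34, 43, 30, 44, 41, 52, 28, 39]
retest_r = pearson_r(first_administration, second_administration)
print(round(retest_r, 2))
```

A coefficient close to 1 indicates that participants keep roughly the same rank order across the two administrations, which is the sense of stability discussed in this section.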
In general, the shorter the interval, the higher the test-retest reliability.
According to the 2004 SPM test manual, test-retest correlations range from as low as
0.46 for an 11-year interval, in a study carried out in Germany in 1983 (N=1000
school children) tested from sixth grade, to as high as 0.93 for a two-week interval,
From the original studies of the SPM test, Raven provided a test-retest reliability
ranging from 0.83 to 0.93 for several age groups. The results were: 0.88 for 13 years
and over, 0.93 for 30 years and below, 0.88 for 30 to 39 years, 0.87 for 40 to 49 years
In India, Rao (1974) reported that the SPM retest reliability at a two-week interval
was 0.93 for a group of college students. Abdel-Khalek (1987), in his
Nkaya et al. (1994) administered the SPM test three times at two-week intervals to
88 students from Congo and 68 students from France. The French mean age was 12.3
years and the Congolese was 13.3 years. For the French students the reliability
between test 1 and 2 was 0.81, between test 2 and 3 was 0.74 and between test 1 and 3
was 0.75. For the Congolese students the reliability between test 1 and 2 was 0.91,
between test 2 and 3 was 0.92 and between test 1 and 3 was 0.87. They concluded that
According to the 1996 SPM test manual, the test-retest reliability in the 1986 Chinese
standardisation was 0.82 at a 15-day interval and 0.79 at a 30-day interval. More recently,
Abdel-Khalek (2005), with Kuwaiti school students (N=968), found retest reliability
correlations ranging between 0.69 (age 12) and 0.85 (age 9). The time interval between
Khelefeeh and Lynn (2009) conducted a study to evaluate the SPM test norms in a
Qatari standardization sample, 1135 students aged 6 to 11.5 years (517 males and 618
females). The test-retest correlation coefficients of 0.89 for males, 0.95 for females
and 0.93 for the total sample were reported. From the above studies it was concluded
4.6.2 Split-half reliability
The split-half reliability method was first devised by Spearman in 1907 as an alternative
to the test-retest method. It avoids the memory effect associated with retesting.
In this method the test items are split into two halves, and the scores on the two halves
are correlated with each other. The test can be split into its first and second halves or,
more commonly, into its odd and even items. The odd-even split is particularly important
for ability tests, where items are often arranged in order of difficulty. Clearly,
where this is the case, there might be poor correlation between the first and second
exceeded 0.90. The lowest reliability was 0.86 with 174 Iranian children aged 9
years. The highest reliability was 0.96 in a study with 91 psychiatric male patients
Burke and Bingham (1969) found a split-half corrected reliability coefficient of 0.96.
This was in a study with 91 male patients with a mean age of 35.1 years who were
Baraheni (1974) found a split-half correlation that ranged from 0.86 to 0.95 with
Iranian subjects aged 9 to 18 attending primary and secondary schools. The lowest
correlation, 0.86, was with 174 girls aged 9 and the highest correlation, 0.95, was with
291 boys and 425 girls aged 15 years. For subjects aged 18, split-half correlation was
0.93 (N=304). Sinha (1977) found a total split-half reliability coefficient (odd-even
split) of 0.90 with an Indian sample consisting of 140 students aged 11 to 15, who
were studying in grades 8, 9, 10 and 11. Sinha stated that the SPM test had a high
reliability for the Indian sample. Another high split-half reliability, 0.94, with a
sample of 194 psychiatric patients in Germany in 1983, was reported in the 2004 SPM
test manual.
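The odd-even split-half procedure discussed in this section may be sketched as follows. The sketch correlates odd-item and even-item half scores and then applies the Spearman-Brown formula, which is the usual way of "correcting" a half-test correlation to full test length; whether each study cited above used exactly this correction is not stated in the text. The function names and the small 0/1 response matrix are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Return (half-test r, Spearman-Brown corrected r) for an odd-even split.

    item_scores: one row per participant, one 0/1 entry per item.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# Hypothetical responses: 5 participants x 6 items ordered by difficulty.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

Because the items are ordered by difficulty, an odd-even split gives the two halves a similar difficulty mix, which a first-half/second-half split would not.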
Bart et al. (1986) used the SPM test to study the development of proportional
reasoning in Qatar and the United States. The American sample (N=281) ranged from 10
to 13 years of age. The Qatari sample (N=273) ranged in age from 10 to 16 years.
Participants were students in the fifth, sixth and seventh grades. The SPM test
reliability, as indexed by coefficient alpha, was 0.95. They stated that the value of
Comparing two cultural groups in Arizona, Powers et al. (1986a) found a reliability
of 0.87 with 127 Hispanic sixth grade students (69 boys and 58 girls). The same
reliability was found with 103 Anglo-American sixth grade students (53 boys and 50 girls).
In 1994, Duzen et al., in a study carried out on 2277 Turkish students (6 to 15 years)
sample of 1662 adolescents aged (12 to 19) years and 2016 adults aged (18 to 45),
Qatari sample of 1135 students aged 6-11.5 (517 males and 618 females) confirmed a
split-half reliability of 0.84 for males, 0.88 for females and 0.87 for the total sample.
The above stated studies showed a high reliability of the SPM test. The average value
consistency reliability by determining how items of a test relate to each other and to
the total test. The KR-20 formula is a special case of the more general Cronbach’s Alpha.
It provides reliability estimates that are equivalent to the average of the
split-half reliabilities computed for all possible halves. KR-20 is suited to multiple-
choice items that are scored as right or wrong. Where items can take
more than two score values, Cronbach’s Alpha should be used (Anastasi,
0.95. Dey (1984) with 136 talented Indian students, obtained a Kuder-Richardson
Rushton and Skuy (2000) administered the SPM test to 309 students aged 17 to 23 years in
South Africa (173 Africans, 136 whites; 104 men, 205 women). The study aimed to
compare the performance of African and white students. It showed
internal consistencies based on Cronbach's alpha of 0.83 for white males, 0.73 for
white females, 0.89 for African males, and 0.92 for African females.
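The relationship between KR-20 and Cronbach's alpha described in this section can be illustrated numerically. The sketch below uses the standard textbook formulas, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with KR-20 replacing each item variance by p(1-p) for right/wrong items. The function names and the small response matrix are hypothetical, and population variances (dividing by n) are used throughout so that the two coefficients coincide exactly for dichotomous items.

```python
def variance(values):
    """Population variance (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of participants and columns of item scores."""
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 0 (wrong) or 1 (right)."""
    n = len(item_scores)
    k = len(item_scores[0])
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]  # proportion correct
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


# Hypothetical responses: 5 participants x 6 right/wrong items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
kr = kr20(responses)
print(round(alpha, 3), round(kr, 3))
```

For right/wrong items the two functions return the same value, which is the sense in which KR-20 is a special case of Cronbach's alpha; `cronbach_alpha` additionally accepts items with more than two score values.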
In 2002, Rushton et al, carried out an SPM test on 342 university students (198
computed by Cronbach’s Alpha were 0.88 for the sample as a whole, 0.61 for whites,
0.82 for Indians, and 0.87 for Africans. Moreover, Abdel-Khalek (2005), with a sample
of 6529 Kuwaiti school students, found that Cronbach’s alpha coefficients ranged
between 0.88 (age 14) and 0.93 (age 9). Similarly, in 2007 Taylor carried out a study
in South Africa with 144 female and 199 male job applicants, of whom 46.9% were black
and 41.8% white. A very good internal consistency reliability (0.96) of the SPM was
reported. In the same year, Boben (2007) administered the SPM test to 1,556 children
and adolescents aged 7.5 to 18 years in Slovenia. Male students made up 53% of the
sample. The calculated Cronbach’s alpha ranged from 0.89 (age group of 12 years) to
The following table (table 4.2) summarizes the above studies about the SPM test three
Table 4.2 Summary of the studies performed on the SPM test reliability

SPM TEST-RETEST RELIABILITY
Researcher | Country | Year | N | Reliability value
Abdel-Khalek | Egypt | 1987 | 44 | 0.82
Nkaya et al. | Congo | 1994 | 88 | 0.91
 | France | | 86 | 0.81
Abdel-Khalek | Kuwait | 2005 | 968 | 0.78
Khelefeeh & Lynn | Qatar | 2009 | 517 | 0.89
 | | | 618 | 0.95
 | | | 1135 | 0.93

SPM SPLIT-HALF RELIABILITY
Researcher | Country | Year | N | Reliability value
Burke & Bingham | USA | 1969 | 91 | 0.96
Baraheni | Iran | 1974 | 174 | 0.86
 | | | 425 | 0.95
Sinha | India | 1977 | 140 | 0.90
Raven et al. | Germany | 1983 | 194 | 0.94
Bart et al. | Qatar & USA | 1986 | 554 | 0.95
Powers et al. | USA | 1986 | 127 | 0.87
 | | | 103 | 0.87
Duzen et al. | Turkey | 1994 | 2277 | 0.91
Ahmad et al. | Pakistan | 2008 | 1662 | 0.89
Khelefeeh & Lynn | Qatar | 2009 | 517 | 0.84
 | | | 618 | 0.88
 | | | 1135 | 0.87

SPM TEST ALPHA RELIABILITY
Researcher | Country | Year | N | Reliability value
Dey | India | 1984 | 136 | 0.91
Bart et al. | Qatar & USA | 1986 | 554 | 0.95
Duzen et al. | Turkey | 1994 | 2277 | 0.95
Rushton and Skuy | South Africa | 2000 | 309 | 0.84
Rushton | South Africa | 2002 | 342 | 0.88
Abdel-Khalek | Kuwait | 2005 | 6529 | 0.91
Taylor | South Africa | 2007 | 243 | 0.96
Boben | Slovenia | 2007 | 1556 | 0.92
It can be concluded that the SPM test shows a high degree of reliability on all three
methods: test-retest, split-half and internal consistency. Taken together, these results
indicate that the test is highly reliable. The studies reviewed cover
a large part of the world, including both developing and developed countries.
The fact that the reliability of the test was relatively constant implies that the SPM test
Validity denotes the extent to which a test measures what it is supposed to measure
other hand, as discussed in the previous section indicates the consistency of the scores
produced. The validity of a test depends on its reliability. A valid test is always
can consistently measure the wrong thing and hence be rendered invalid. Suppose an
instrument that is intended to measure social studies concepts actually measures only
social studies facts. It would not be a valid measure of concepts, even though it may
measure the facts very consistently (Mills and Airasian, 2006; Langdridge, 2004;
Anastasi and Urbina, 1997). Therefore reliability of a test is necessary but not
sufficient for establishing its validity. Reliability and validity are specific to the
interpretation being made and the group being tested. As a result we cannot simply say
that a certain test is reliable and/or valid. Rather, we must say that the test is
reliable and/or valid for this particular
Validity is the paramount characteristic of a psychological test, to the extent
that without empirical data regarding the validity of a test we have no evidence,
possible to provide meaning to or interpret the test scores (Brown, 1983, Anastasi and
There are three types of validity used in educational and psychological measurements:
content validity, criterion-related validity and construct validity (Anastasi and Urbina
1997).
Content validity refers to the extent to which a test measures a sample of the
content of a measuring instrument, one is concerned with the question of how well the
content of the instrument represents the entire universe of the content being measured
analysis of the measured objects in terms of partial elements. If the items of the test
cover those elements in typical portions and the test appropriately samples the whole
measured content then the content validity is considered to be high. Content validity is
evaluated objectively and determined by logical analysis of the test content. However
it cannot be expressed in terms of a numerical index (Anastasi and Urbina 1997 and
the face validity. Although the meanings of the two often overlap they are quite
distinct. Face validity is essentially the apparent measurement of the test and not the
actual one. In other words, face validity refers to the degree to which the test appears
to be valid to non-technical observers such as examinees and test administrators. Its
main role in the process of validation is the initial scanning in test selection
As an example, the SPM test meets an important requirement for use in cross-cultural
contexts. It has face validity in the sense that it appears to those who take and
Construct validity of a given test is the extent to which the test is said to measure a
variables that have evolved either informally or from psychological theory. Intelligence,
anxiety, aptitude, musical ability, critical thinking, ego strength, dominance and
the systematic analysis of test scores designed to assess whether there is a basis for
validity. The questions to be answered by construct validity are: what traits are
measured by the test? And to what degree? The process of construct validation
involves identifying and clarifying the factors that have an effect on the test scores.
The test performance can then be interpreted most meaningfully. This process
(Gronlund, 1981, and Ary et al., 1985). Anastasi and Urbina (1997) stated that factor
Factor analysis provides research information regarding the extent to which a set of
evaluates the extent to which the individual items on a scale truly cluster together
around one or more dimensions. Items constructed to measure the same dimension
should load on the same factor; those constructed to measure different dimensions
should load on different factors (Anastasi, 1988; Anastasi and Urbina, 1997; Nunnally
and Bernstein, 1993). In addition, Geri and Judith (2006) reported that this analysis
shows whether the items in the instrument reflect a single construct or several constructs.
The SPM test was designed to be a measure of the general intellectual ability “g”, as
It had been universally accepted for over half a century that the test was an
appropriate measure of “g”. This position was endorsed by Emmett (1949) based on
factor analysis of the SPM items in a sample of 11-year-old children. More recently,
Jensen (1998, p. 541) contended that “the total variance of Raven scores in fact
comprised virtually nothing besides g and random measurement error”. Raven, Raven
& Court (2000, p.34) stated that “The Progressive Matrices has been described as one
The SPM test (2004) manual reports several factor-analytic studies involving a large
number of children and adults. For example, investigations of British children showed
a high loading of up to 0.83 on the “g” factor (Raven et al., 2004). Burke and Bingham
(1969) found a very high loading of up to 0.76 on “g” with adults. Also, as reported in
the SPM 1996 manual, Zager et al. (1980) obtained a loading of .080 with “g”.
students (205 males and 247 females). A principal component factor-analysis with
unities inserted in the diagonals was carried out to determine if the items contained a
general factor and possibly other factors. Analysis showed a significant factor
(eigenvalue >1.0) that was extracted from both groups. This factor accounted for
79.6% and 72.6% of the total variance for male and female undergraduates
respectively. Another study carried out by the same author in Kuwait (2005), on a
sample of 6529 students aged 8-15 years (3278 boys and 3251 girls), investigated
was carried out to identify the factors present. Results showed only one significant
factor, which had a large eigenvalue of 3.46 and accounted for 69.2% of the variance.
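The link between an eigenvalue and the percentage of variance it explains can be checked with a short calculation. In a principal component analysis of a correlation matrix with unities in the diagonal, each variable contributes one unit of variance, so a component explains eigenvalue / (number of variables) of the total. The reported figures, 3.46 and 69.2%, are consistent with an analysis of five variables (3.46 / 5 = 0.692), presumably the five set scores A to E; that reading is an inference, not something stated in the text. The sketch below is illustrative only: the function name is hypothetical, and the equicorrelated matrix with r = 0.615 was chosen purely so that its leading eigenvalue equals 3.46. It is not data from the study.

```python
from math import sqrt


def leading_eigenvalue(corr, iterations=100):
    """Largest eigenvalue of a correlation matrix by power iteration.

    Suitable for matrices whose correlations are all positive, as assumed here.
    """
    n = len(corr)
    vec = [1.0 / sqrt(n)] * n  # start from a unit vector with equal weights
    for _ in range(iterations):
        prod = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in prod))
        vec = [v / norm for v in prod]
    prod = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(vec[i] * prod[i] for i in range(n))  # Rayleigh quotient


# Hypothetical correlation matrix for five set scores, all inter-correlations 0.615.
n_sets = 5
r = 0.615
corr = [[1.0 if i == j else r for j in range(n_sets)] for i in range(n_sets)]

eigenvalue = leading_eigenvalue(corr)
proportion = eigenvalue / n_sets  # share of total variance explained
retained = eigenvalue > 1.0       # eigenvalue-greater-than-one criterion
print(round(eigenvalue, 2), round(100 * proportion, 1), retained)
```

For real data a full eigen-decomposition routine from a numerical library would normally be used; power iteration is shown here only because it keeps the sketch free of external dependencies.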
Despite the above findings, a dispute was raised on the issue of whether the
Progressive Matrices are really a pure measure of “g”. A number of scholars have
contended that while the Progressive Matrices were largely a measure of “g” they also
contained a small visualization or spatial factor. Among them were Adcock (1948),
Keir (1949), Banks (1949), Vernon (1950), Gabriel (1954) and Gustafsson (1984, 1988).
They concluded that the SPM test measures a reasoning factor and another factor
which was called “cognition of figural relations”. Hertzog and Carter (1988)
contended that the SPM contained two further factors named: verbal intelligence and
spatial visualization.
In agreement with the previous studies, Rimoldi (1948), Banks and Sinha (1951)
and Sinha (1968) reported that “g” accounted for only 36% to 37% of the total
variance of the test scores. They suggested that the SPM test measures other factors
overlap between skills on the Raven and other tests of mental abilities. These studies,
which have most often been conducted with adult or older adolescent participants,
have provided evidence that the Raven test evaluates perceptual and spatial abilities as
On a sample of 920 Mexican primary school children, factor analysis of the SPM test
results showed a strong reasoning factor and a weaker visualization ability factor. This
result runs contrary to the view that the SPM measures only “g” (Lynn
et al., 2004). Furthermore, Lynn et al. (2004) administered the SPM test in 2001 in
Estonia to a sample of 2735 adolescents aged 12 to 18 years. They
identified a general factor and three further factors, which they reported as the gestalt
continuation factor found by Van der Ven and Ellis (2000), verbal-analytic reasoning and
visuo-spatial ability. Further analysis of this study showed a higher order factor
identified as “g”.
The question that arises here is how “g” relates to the other three
factors. Currently, the widely accepted theory that accounts for this relation is
• Stratum 1: “g”
Stratum 3: around fifty factors. These are approximately the same as what are called
4.7.2.2 Internal consistency
One of the methods used to identify a construct is the internal consistency method.
The chief criterion of this method is the total score of the test. Correlation methods are
often employed in this validation process. These involve item-test score correlations
and subtest-test score correlations (Anastasi, 1988; Anastasi and Urbina, 1997). The
latter may be used in intelligence tests that consist of separately administered
subtests. The score on each subtest is correlated with the total score of
the test, and only those subtests which show a correlation of 0.3 or higher are
retained (Tabachnick & Fidell, 2007). The test is then said to be validated by internal
consistency.
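The subtest-total screening rule described above may be sketched as follows. Each subtest score is correlated with the total score and subtests correlating below 0.3 are flagged for removal. The function names, the subtest labels (three coherent subtests "A" to "C" plus an unrelated subtest "X") and all scores are hypothetical illustration data. Note that in this simple version the total includes the subtest being examined, so the correlations are uncorrected; this is an assumption of the sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subtest_total_correlations(scores):
    """Correlate each subtest with the total score across participants.

    scores: dict mapping subtest name -> list of scores, one per participant.
    """
    names = list(scores)
    n_people = len(scores[names[0]])
    totals = [sum(scores[name][i] for name in names) for i in range(n_people)]
    return {name: pearson_r(scores[name], totals) for name in names}


# Hypothetical subtest scores for six participants.
scores = {
    "A": [12, 11, 10, 9, 8, 7],
    "B": [11, 10, 8, 7, 5, 4],
    "C": [10, 8, 7, 5, 3, 2],
    "X": [5, 9, 4, 10, 6, 8],  # deliberately unrelated to the other subtests
}
correlations = subtest_total_correlations(scores)
retained = [name for name, r in correlations.items() if r >= 0.3]
print(retained)
```

In this illustration the three coherent subtests correlate strongly with the total and are kept, while the unrelated subtest falls below the 0.3 threshold.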
As stated above, the internal consistency plays a role in determining the characteristic
of a trait or domain behaviour represented by the test. This can be easily seen by the
fact that highly correlated items and subtests with the test strongly suggest that the test
is measuring what it is meant to measure. In this sense, internal consistency shares
some features with construct validity (Anastasi, 1988; Anastasi and Urbina, 1997).
It should be noted that no single validation process can establish the construct validity
consistency of the five sets of the SPM test. Pearson’s product-moment correlation was
employed. All of the inter-correlations between the sets were positive and statistically
significant. They ranged for the male group from 0.32 to 0.67 (N = 205) and for the
investigated the internal consistency of the SPM. The Pearson correlation coefficients
were statistically significant (p < 0.001) and ranged from 0.43 to 0.77.
performance on another test or measure. The second test measure is the criterion
against which the validity of the initial test is evaluated (Mills, Airasian (2006) and
b) Predictive validity: correlation between test scores and a criterion that occurs
at a later point in time (Ary et al., 1985; Domino and Domino, 2006).
Anastasi (1988) stated their definitions and distinguished between them in the
following:
Domino and Domino (2006) mentioned that the SPM concurrent validity with
ranging from 0.50 to 0.80. Predictive validity, especially of academic achievement,
generally fell in the region of 0.20 to 0.60 (Raven, 2004). Powers and Barkan (1986a)
reported that the SPM scores had a correlation of 0.40 with reading achievement
Anastasi and Urbina (1997) mentioned that specific indices used as criterion measures
teachers’ or instructors’ rating for intelligence. Such ratings given within an academic
Likewise they may be properly classified with the criterion of academic achievement.
The correlations of the SPM test with intelligence tests, standardised achievement tests
and school examinations varied with age, gender and sample homogeneity. Some
The SPM test manual (2003) reported correlations in the range of 0.54 to 0.86
between the SPM and other IQ tests e.g. Stanford-Binet and Wechsler Scales for
research with non-English speaking children and adolescents, as reported in the SPM
test manual (1996), tend to be lower. Generally they range from 0.30 to 0.68. Also as
for students from non-English speaking cultures (e.g. Southern and Eastern European
and Middle Eastern countries) and those with non-professional fathers to score lower.
The following is a brief review of the studies conducted to determine the
relationship of the SPM test scores with more widely used intelligence tests such as
the Lorge-Thorndike Test, Wechsler Scales (WISC-R for children, WAIS for adults),
Army General Classification Test (AGCT), Cohen Test, General Mental Ability
(GMA), Minnesota Paper Form Board (MPFB), Otis Gamma, Revised Beta, Quick
Test, Orange Juice Test (OJT), Stanford-Binet, AH2 tests, Otis-Lennon, Primary
Mental Abilities (PMA), Cattell's Culture Fair Test (CCFT), Arabic Verbal Reasoning
Test (AVRT), San Diego Test of Reasoning Ability (SANTRA), and Draw-a-Man
test.
Tulkin and Newbrough (1968) administered the SPM test and the Lorge-Thorndike test to
356 fifth- and sixth-grade students, black and white, of high and low social class.
The correlation between SPM test scores and Lorge-Thorndike Verbal IQ was 0.45 for the
white high-class group (N=128); 0.33 for the white low-class group (N=75); 0.40 for the black high-class group
The correlation between the SPM test and Non-verbal IQ was 0.53 for the white high-class
group; 0.52 for the white low-class group; 0.40 for the black high-class group and 0.45
for the black low-class group. It was concluded that all correlations between the SPM
test and the Lorge-Thorndike IQ test were significantly different from zero. For the
white groups the SPM test score was somewhat more related to Non-verbal IQ than to
Verbal IQ. This pattern was not
In India, Mehrotra (1968), with a small sample (N=45) of students with a mean age of
14.2 years, found a correlation of 0.68 between the SPM test and WISC-R Full Scale,
0.60 with the Verbal and 0.61 with the Performance sub-tests. Burke and Bingham (1969)
found a significant correlation between SPM scores and the Army General Classification
Test (AGCT). Similar results were found with the Cohen Test in a sample of 91 male
patients (mean age 35.1 years) who were referred for vocational counselling services.
The correlation between the SPM and Cohen Verbal was 0.59; with Cohen Memory
0.49; with Cohen Perceptual Organization was 0.61. The correlation between the SPM
and AGCT Verbal was 0.60; with AGCT Numerical 0.66 and with AGCT Total was
0.67.
Mohan (1972) in India investigated the relationship between verbal and non-verbal
ability tests. He found a correlation of 0.65 between the SPM test and General Mental
Ability (GMA). The sample consisted of 310 college and university students ranging
McLaurin and Farrar (1973) administered both the SPM test and the WAIS to 201
correlation between the SPM test and the WAIS were 0.57 for Full Scale, 0.45 for
Verbal and 0.54 for Performance. In the same study they investigated the validity of
the SPM test by correlating it with grade point average (GPA) and Minnesota Paper
Form Board (MPFB). Correlation between SPM test and MPFB test was 0.45.
Correlation between the SPM test and GPA was 0.21. This correlation was as good as
the correlation between GPA and WAIS-Full Scale which was .28 (N=201). The
Three studies that evaluated the use of the SPM test with psychiatric patients in the USA
reported reasonable correlations between SPM test scores and WAIS Full Scale,
Verbal and Performance IQs. Burke and Bingham (1969), with 91 American male
patients at a veterans’ hospital referred for vocational counselling (mean age
35.1 years), found a correlation of 0.75 between the SPM and the WAIS Full Scale,
0.65 with the WAIS Verbal IQ and 0.76 with the WAIS Performance IQ.
In another investigation with psychiatric patients in Texas, Vincent and Cox (1974)
found that the SPM test correlated reasonably well with the WAIS Scale. Correlations
were 0.85 with Full Scale, 0.84 with Verbal and 0.75 with Performance. The sample
(N=131) was taken from psychological files of the Texas Vocational Rehabilitation
Unit. Most patients suffered physical, emotional or mental disability. It was concluded
that the SPM test is a viable tool for measuring intelligence in such a population.
In the same study, Vincent and Cox (1974) also correlated the SPM scores of a
sample of 226 psychiatric patients with three IQ tests. Most patients had a physical,
emotional, or mental disability. The sample had a mean age of 28.7 years and consisted of
scores and Otis Gamma scores was .70 (N=97), with Revised Beta .38 (N=58) and the
The third study with psychiatric patients (N=256) was done by Burke (1985) who
correlated the SPM scores with WAIS score and found that the correlation between
the SPM and WAIS Full scale was .66, with Verbal scale .61, and with Performance
Bart et al. (1986) administered the SPM and a test of proportional reasoning, the Orange
Juice Test (OJT), to a sample of 273 American and 281 Qatari fifth, sixth and seventh
grade students. They found a significant correlation of .49 between the SPM and the OJT.
According to the 1996 SPM test manual, Zhang & Wang (1989) in China found that
the SPM correlated .71 with Full scale WISC-R, .54 with Verbal and .70 with
Performance (no age level or sample size was reported). In India, Narayanan
and Paramesh (1978) administered the SPM test and Cattell's Culture Fair Test
to Tamil subjects, and reported a correlation of .58.
Horton and Karees (1987) administered the SPM test to a small sample (N=20) of
students participating in a gifted students program in the United States. They found a
correlation of .72 between the SPM test and Stanford-Binet. Correlation between
Helms (1987), with 130 Canadian university students (65 females and 65 males, with an
average age of 19.3 years), reported low correlations ranging from .22 to .36 between
the AH2 Scales (a general ability test) and the SPM test: .22 for Verbal,
.28 for Numerical, .31 for Perceptual and .36 for the AH2 total score. Helms
noted that, according to Jensen (1980), the SPM test correlates with other mental
ability tests in a range of .50 to .70. The AH2 correlations are therefore somewhat
lower than the usual values for correlations among tests of general ability, but the
In the US, the SPM test was administered by Jensen et al., (1988) with a time limit of
Advanced Progressive Matrices (APM) and Otis-Lennon Mental Ability Test form.
Correlation between SPM and APM was .58 and correlation with Otis-Lennon was
.47.
In a study in Mississippi by Karnes and Whorton (1988), the SPM and the Culture-fair
Intelligence Test were administered to 625 students (441 white and 211 black) in a rural
county elementary school (grades 3-8). The mean age was 8.10 years. 410 students
were on free or reduced lunches and 245 students on paid lunches. The Pearson
correlation between the SPM and Culture-fair Intelligence Test was a moderate .46
and significant.
In a study carried out in Libya on two groups from Tripoli University, Majdub (1991)
found significant correlation between SPM and an Arabic Verbal Reasoning Test
(AVRT). For the Arabic major group correlation between SPM and AVRT was .53
(N=78). For the Education major group correlation between SPM and AVRT was .25
(N=111).
In a study by Johnson et al., (1994), a sample of 449 second, fifth and seventh grade
students in San Diego city schools was given the SPM test. In this group, 77 were
African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American. Of
these, 215 were boys and 234 were girls. The mean age of the children was 11 years (age
range from 6 years 8 months to 13 years 10 months). The researchers administered the SPM
and an alternate form of the SPM called the San Diego Test of Reasoning Ability
(SANTRA). The correlation between the SPM and SANTRA tests was high and significant
(.90).
Khaleefa and Lynn (2009) in a Qatari sample of 1135 students aged 6-11.5 (male N
The correlation of the SPM with both general intelligence test (full score) and a total
doing so the Fisher’s z transformation was employed (Garret and Woodworth 1966).
Garret and Woodworth (1966) note that this transformation is more
stable and has open limits (not from -1 to +1 as for r). Each sample r is converted into
a new equivalent statistic z. The averaged z is then converted back to r. The following
table summarises the above studies about the SPM test concurrent validity and r to z
Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher’s
transformation results
Researcher            Country  Year  N     IQ test                           r     z
Tulkin & Newbrough    USA      1968  128   Lorge-Thorndike (Verbal)          0.45  0.45
                                           Lorge-Thorndike (Non-Verbal)      0.53  0.53
                                     75    Lorge-Thorndike (Verbal)          0.33  0.33
                                           Lorge-Thorndike (Non-Verbal)      0.52  0.52
                                     50    Lorge-Thorndike (Verbal)          0.40  0.40
                                           Lorge-Thorndike (Non-Verbal)      0.40  0.40
                                     103   Lorge-Thorndike (Verbal)          0.48  0.48
                                           Lorge-Thorndike (Non-Verbal)      0.45  0.45
Vincent & Cox         USA      1974  131   WAIS (Verbal)                     0.84  0.84
                                           WAIS (Performance)                0.75  0.75
                                           WAIS (Full Scale)                 0.85  0.85
Vincent & Cox         USA      1974  97    Otis Gamma                        0.70  0.70
                                     58    Revised Beta                      0.38  0.38
                                     71    Quick Test                        0.60  0.60
Narayanan & Paramesh  India    1978  ----  Cattell’s Culture Fair            0.58  0.66
Karnes & Whorton      USA      1988  649   Culture Fair Intelligence Test    0.46  0.66
Zhang & Wang          China    1989  ----  WAIS (Verbal)                     0.54  0.60
                                           WAIS (Performance)                0.70  0.87
                                           WAIS (Full Scale)                 0.71  0.89
The mean correlations between the SPM test and general intelligence tests and the
three intelligence subtests are given in Table 4.4.
Table 4.4 Average correlations between the SPM test and intelligence tests
Sub-Tests               N       Z’      Mean (r)
General intelligence    3623    0.80    0.66
Non-verbal              3726    0.68    0.59
Verbal                  1904    0.54    0.49
Numerical               218     0.54    0.49
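The averaging procedure described for Table 4.4 (convert each r to Fisher's z, average the z values, then convert the mean z back to r) can be sketched in Python as follows. This is only a minimal illustration and not the computation actually used in the study: the function names are hypothetical, and the optional weighting by N - 3 is an assumed convention, since the text does not state whether the means were weighted. The example values are the three WAIS Full Scale correlations for patient samples reported above (.75, .85 and .66).

```python
import math


def fisher_z(r):
    """Convert a correlation r (-1 < r < 1) to Fisher's z (inverse hyperbolic tangent)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Convert a Fisher z value back to a correlation r."""
    return math.tanh(z)


def mean_correlation(rs, ns=None):
    """Average correlations through Fisher's z.

    rs: list of sample correlations.
    ns: optional list of sample sizes; if given, each z is weighted by n - 3
        (an assumed convention, not stated in the text).
    Returns (mean z, mean z converted back to r).
    """
    zs = [fisher_z(r) for r in rs]
    weights = [1.0] * len(zs) if ns is None else [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return mean_z, inverse_fisher_z(mean_z)


# WAIS Full Scale correlations reported above for three patient samples
mean_z, mean_r = mean_correlation([0.75, 0.85, 0.66])
print(round(mean_z, 2), round(mean_r, 2))
```

As a check against Table 4.4, `inverse_fisher_z(0.80)` rounds to 0.66, which matches the tabled mean r for general intelligence.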
It can be seen in Table 4.4 that the SPM test correlates more highly with general intelligence
and non-verbal tests than with verbal and numerical tests. Since the SPM test is a
non-verbal test containing no verbal items, it is expected to have a high correlation with
General intelligence is an ambiguous term. On the one hand, it can mean the sum of all
cognitive abilities. This is the meaning when it is said that the Wechsler tests measure
general intelligence. On the other hand, it can be considered as the common factor in
all cognitive tests, i.e. “g”. There are other cognitive factors in addition to “g”. The
SPM test measures the “g” factor common to all cognitive abilities. This explains
why the SPM test correlates to a high degree with general intelligence tests
(Lynn, 2008).
According to the SPM test manual (2004), the external criterion usually adopted in
correlations with academic achievement tests generally fall in the region 0.20 to 0.60
with higher correlations being found with mathematics and science. Language and
Achievement Test (CAT) served as the criterion to relate the SPM test scores.
Correlation with CAT Reading, Language, Arithmetic and overall achievement
Tulkin and Newbrough (1968), with 356 black and white fifth and sixth grade students
from high and low social classes, correlated the SPM test scores with the Iowa Test of Basic
Skills (ITBS) achievement test. They found that for white high class (N=128) the
correlation was 0.30 with Vocabulary; 0.40 with Reading; 0.31 with Language; 0.39
with Work-study; and 0.39 with Arithmetic. For white low class (N=75) the
correlation was 0.25 with Vocabulary; 0.26 with Reading; 0.27 with Language; 0.41
The correlation between the SPM test and ITBS for black high social class (N=50)
was 0.39 with Vocabulary; 0.14 with Reading; 0.32 with Language; 0.36 with Work-
study; and 0.40 with Arithmetic. For black low class (N=103) the correlation was
0.32 with Vocabulary; 0.26 with Reading; 0.38 with Language; 0.33 with Work-study
and 0.39 with Arithmetic. In comparison, the correlation of SPM test to achievement
Sinha (1968) reported a correlation of 0.32 between SPM scores and grade point
average (GPA) with 220 students from art and science branches and a correlation of
0.36 with 204 engineering students from India. Dosajh, in his study in India as
reported by Sinha (1968), found that the score on the SPM could safely be taken as a
criterion for selection of students for technical and science courses. Dosajh’s
observation was based on the correlation of SPM test scores with examination scores
Mclaurin and Farrar (1973) reported a low correlation between the SPM test and
grade point average (GPA). The correlation was .21 in a sample of 201 university
students in the USA. Though low, this correlation is still within the range (.20-.60)
given by Domino and Domino (2006) and Raven (2004), as mentioned above.
GPA may be based on course work and is partly determined by motivation and essay writing
ability. Since the SPM is a non-verbal test it is no surprise that it will weakly
Baraheni (1974) evaluated the validity of the SPM test in primary and secondary schools
in Iran by calculating the correlation between scores on the SPM test and end of year
average school marks. A correlation of .44 was found with grade 6 (N=472), .29 for
grade 7 (N=360), .61 for grade 8 (N=203) and a correlation of .51 for grade 9
(N=643). Baraheni reported that the indices of the SPM test in predicting average
school marks in Iranian schools appeared to be as high as or even higher than the
Sinha (1977) in India found significant correlations between the SPM test and school
examination grades, .46 with grade eight (N=46), .47 with grade nine (N=5) and .38
with grade ten (N=35). The total correlation was .45 (N=86). Students’ ages ranged
from 11 to 15 years. Sinha found that the SPM test scores correlated significantly
with school examination grades in all groups except grade nine, which consisted
of only 5 students. As for the validity of the SPM test, he concluded that the results did
In another study in Nigeria, Maqsud (1980) investigated the validity of SPM test with
two different groups of primary school boys. A correlation which ranged from .19 to
.65 between the SPM test, English and Arithmetic was reported. He found a
correlation of .19 between the SPM test and English, and .38 with Arithmetic (N=60)
among primary school boys in traditional schools, and a correlation of .65 between the
SPM test and English, and .49 with Arithmetic (N=60) for primary school boys in
homes, whereas students from traditional schools came from lower-middle and lower
Maqsud concluded that a significant positive link between subjects' scores on the
SPM test and their achievement scores generally supported the theory that mental
ability is perhaps the best predictor of school achievement. He also suggested that the
SPM test could be used for the selection of secondary school intakes in Nigeria. In
addition, Chan (1982) found that the SPM test correlates well with non-verbal subtests
but rather poorly with numerical and verbal subtests of comprehensive scholastic
Powers et al., (1986.b) in their study with 426 students (225 boys and 201 girls), from
sixth and seventh grades, reported the following correlation between the SPM test and
CAT. For sixth grade boys (N=116) the correlation was .34 with Reading, .41 with
Language, and .39 with Math. For sixth grade girls (N=96) the correlation was .36
with Reading, .50 with Language, and .60 with Math. Total sample correlation for
sixth grade (N=212) was .35 for Reading, .45 with Language and .48 for Math.
The correlation for seventh grade boys (N=109) was .45 with Reading, .50 with
Language, and .52 with Math. For seventh grade girls (N=105) the correlation was
.54 with Reading, .55 with Language, and .56 with Math. Total sample correlation for
seventh grade (N=214) was .49 for Reading, .51 with Language and .54 for Math.
Correlations ranged from .34 to .60 for sixth grade and from .45 to .57 for seventh
grade students. For the sixth grade the lowest correlation (.34) was for boys in Reading,
and the highest (.60) was for girls in Maths. For the seventh grade the
lowest correlation (.45) was for boys in Reading and the highest correlation of .57
It was concluded that the validity coefficients were higher for the seventh grade than
for the sixth grade students. It was higher for females than males. Further, it was clear
that the coefficients increased from reading to mathematics. The result of the study
indicated that the SPM test had a moderate predictive validity that varied depending
Sidles and Avoy (1987) administered the SPM test and Comprehensive Test of Basic
Skills (CTBS), a standardised achievement test, to 124 Navajo (one of the largest
Indian tribes in America) seventh and eighth grade students ranging in age from 14 to
16 years old. They found a correlation of .38 with Spelling, .39 with Reading, .46 with
Mathematics, and .47 with Language. Correlations were also computed between SPM
test and CTBS for female and male subjects. Correlations for male subjects (N=62)
were .28 with Reading, .34 with Spelling, .34 with Mathematics and .39 with
Language. For female subjects (N=62) correlations were .51 with Reading, .52 with
Spelling, .56 with Mathematics and .58 with Language. They concluded that the
correlation between the SPM test and CTBS was higher for females than males.
Carver (1990) studied the relationship between reading ability and the SPM test. He
found a correlation between the National Reading Standards Test (NRST) and the
SPM test that ranged from .36 to .68. The sample consisted of 486 students from
grades 2 to 12 in a small-town, rural school system in the Midwest of the USA. The
correlation was .45 with grade 2 (N=42), .36 with grade 3 (N=44), .42 with grade 4
(N=42), .68 with grade 5 (N=52), .51 with grade 6 (N=54), .39 with grade 7 (N=62),
.55 with grade 8 (N=42), .59 with grade 9 (N=53), .36 with grade 10 (N=50), .54 with
grade 11 (N=19) and .51 with grade 12 (N=26). The lowest correlation (.36) was found in
grades 3 and 10, whereas the highest (.68) was found in grade 5. The mean of the
five correlations for grades 2 to 6 was .48, and the mean of the six correlations for
grades 7 to 12 was .49. Carver found no evidence that the relationship
between reading ability and the SPM test increased with age. Also, he concluded that
general intelligence, as measured by the SPM test, had a strong and consistent
significant correlation between SPM and academic achievement. For the Arabic major
group, correlation between SPM and academic achievement was 0.39 (N=75). For the
Education major group, correlation between SPM and academic achievement was .34
(N=110).
Andrich and Styles (1994) believed that the progressive matrices test contained
material not taught directly in schools and yet showed substantial relationship with
Comprehensive Test of Basic Skills (CTBS) in a small sample (N=32) from second,
fifth and seventh grade students in San Diego city school. The correlation between
SPM and Language was .48; with Reading .42 and with Math .56.
Pind et al., (2003) examined the criterion-related validity of the SPM test in relation
to the results of the Icelandic National Examination (INE) for students in the 4th, 7th,
and 10th grades. Generally, the SPM sample average lay close to the INE average. In
addition, the correlation of the SPM scores with the INE scores was calculated and was
found to be variable. In the fourth grade (N=53) the correlation was 0.38 with Icelandic
and 0.50 with Mathematics. These correlations were appreciably higher in the
seventh grade (N=59), being, respectively, 0.64 and 0.75. The correlations were
slightly lower in the tenth grade (N=51): 0.53 with Icelandic and 0.64 with
Mathematics. Finally, the two foreign languages, English and Danish, showed
correlations of 0.48 and 0.59, respectively, with the SPM. This supported the view that
the SPM test shows a higher correlation with mathematics than with language
subjects. In general, these correlations are at the higher end of those found in similar
studies.
Laidra et al., (2007) administered the SPM test to 3618 students (1746 boys and 1872
girls) from all over Estonia in grades 2, 3, 4, 6, 8, 10, and 12 to investigate the
Pearson correlations were computed between SPM test scores and GPA.
The correlations were 0.54 for grade 2 (p = 0.001; N=364), 0.46 for grade 3 (p =
0.001; N=388), 0.49 for grade 4 (p = 0.001; N=430), 0.53 for grade 6 (p = 0.001;
N=609), 0.48 for grade 8 (p = 0.001; N=697), 0.43 for grade 10 (p = 0.001; N=642)
and 0.32 for grade 12 (p = 0.001; N=488). The analysis showed that the SPM mean
score increased with increasing age. It was concluded that there did not appear to be
large differences in the way intelligence and personality dispositions related to the
some traits had more effect in elementary school (e.g., Agreeableness) and others
relied most strongly on their cognitive abilities through all grade levels. Intelligence,
as measured by SPM test was found to be the best predictor of GPA in all grades.
For the correlations of the SPM test with achievement tests (Vocabulary, Reading, Language,
Math, Work-Study and Spelling), Fisher’s z transformation was employed. The
above studies about the SPM test predictive validity are shown in table 4.5. A detail
Table 4.5 Summary of the studies on SPM test predictive validity with r to z Fisher’s
transformation results
Researcher Country Year N Achievement r Z
Tulkin & USA 1968 128 ITBS test; Vocabulary 0.30 0.31
Newbrough ITBS test; Reading 0.40 0.42
ITBS test; Language 0.31 0.32
ITBS test; Work-study 0.39 0.41
ITBS test; Arithmetic 0.39 0.41
75 ITBS test; Vocabulary 0.25 0.26
ITBS test; Reading 0.26 0.27
ITBS test; Language 0.27 0.28
ITBS test; Work-study 0.41 0.44
ITBS test; Arithmetic 0.27 0.28
50 ITBS test; Vocabulary 0.39 0.41
ITBS test; Reading 0.41 0.44
ITBS test; Language 0.32 0.33
ITBS test; Work-study 0.36 0.38
ITBS test; Arithmetic 0.40 0.42
103 ITBS test; Vocabulary 0.32 0.33
ITBS test; Reading 0.26 0.27
ITBS test; Language 0.38 0.40
ITBS test; Work-study 0.33 0.34
ITBS test; Arithmetic 0.39 0.41
Mclaurin & Farrar USA 1973 201 Academic Achievement 0.21 0.21
Powers et al., USA 1986 116 CAT test; Reading 0.34 0.35
CAT test; language 0.41 0.44
CAT test; Math 0.39 0.41
96 CAT test; Reading 0.36 0.38
CAT test; language 0.50 0.55
CAT test; Math 0.60 0.69
Powers et al., USA 1986 212 CAT test; Reading 0.35 0.37
CAT test; language 0.45 0.48
CAT test; Math 0.48 0.52
109 CAT test; Reading 0.45 0.48
CAT test; language 0.50 0.55
CAT test; Math 0.52 0.58
105 CAT test; Reading 0.54 0.60
CAT test; language 0.55 0.62
CAT test; Math 0.56 0.63
214 CAT test; Reading 0.49 0.54
CAT test; language 0.51 0.56
CAT test; Math 0.54 0.60
Sidles & Avoy USA 1987 62 CTBS test; Spelling 0.28 0.29
CTBS test; Reading 0.34 0.35
CTBS test; Math 0.34 0.35
CTBS test; Language 0.39 0.41
62 CTBS test; Spelling 0.51 0.56
CTBS test; Reading 0.52 0.58
CTBS test; Math 0.56 0.63
CTBS test; Language 0.58 0.66
124 CTBS test; Spelling 0.38 0.40
CTBS test; Reading 0.39 0.42
CTBS test; Math 0.46 0.50
CTBS test; Language 0.47 0.51
Johnson et al., USA 1994 32 CTBS test; Reading 0.42 0.44
CTBS test; Math 0.56 0.63
CTBS test; Language 0.48 0.52
The correlation between the SPM test and both academic achievement and a total of 6
Table 4.6 Average correlations between the SPM test and achievement tests
Sub-Tests               N        Z’ Means   z to r
Academic achievement    6148     0.44       0.41
Vocabulary              356      0.33       0.32
Reading                 1364     0.46       0.43
Language                1535     0.41       0.39
Maths                   1298     0.54       0.49
Work-Study              356      0.39       0.37
Spelling                124      0.41       0.39
Total                   11181    0.43       0.41
The highest correlations of the SPM test were with mathematics. This was in
agreement with the findings of most earlier studies. Carpenter, Just & Shell (1990)
showed that the SPM is largely a mathematical problem solving test in design format.
arithmetical and geometrical progression. Note, on the other hand, that the lowest
value of the correlations was with the vocabulary tests. This was due to the fact that
4.8 Item analysis of the SPM test
Item analysis indicates which items may be too easy or too difficult and which may fail
for other reasons. It thus makes clear how well each item discriminates between the better
and the poorer examinees (Ebel 1972). Brown (1971) mentioned that item analysis
has two purposes. First, by identifying defective items, it enables us to improve our
test and evaluation procedures. Second, by indicating which items or material
students have and have not mastered, it allows us to plan, revise, and improve our
instruction.
It is worthwhile knowing that both the validity and reliability of any test depend
ultimately on the characteristics of its items. High reliability and validity can be built
into a test in advance through item analysis (Anastasi and Urbina 1997).
between students who have greater aptitude with the material tested (Brown,
1981).
In item difficulty, if most students answered an item correctly then the item was an
easy one. If most students answered an item incorrectly then it was a
difficult one (Brown, 1983). The higher the value of the difficulty index, the easier
the item. This definition is somewhat illogical and has led some researchers to refer
(Ebel, 1972 and Nunnally, 1972). Nunnally (1972) and Burroughs (1975) argued that
their order of difficulty. The easiest is administered first so as to give a sense of
Item discrimination shows whether the test items differentiate between people of
varying degrees of knowledge and ability. It may be defined as the percentage of the
“high” group passing the item minus the percentage of the “low” group passing the
correct answers is higher in the upper group than in the lower group. A negatively
one in which the percentage of correct answers is about the same for the upper and
the total test score, was used to explore the SPM item discrimination (Brown, 1983;
Anastasi 1988 and Anastasi, Urbina 1997; Roid and Barram 2004; Kline, 2000; Kline,
2005). The greater the correlation of the item the more discriminating it is. That is, it
discriminates between higher and lower groups more effectively. For an item to be
valid, its correlation with the total score should be fairly high.
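The three item statistics discussed in this section (the difficulty index, the upper-minus-lower discrimination index, and the item-total correlation) can be illustrated with a short Python sketch. It is not the procedure used in the present study: the function names and the small response matrix are hypothetical, the choice of the top and bottom 27% as the "high" and "low" groups is an assumption, and the item-total correlation here is uncorrected (the item is included in the total score).

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns NaN when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_analysis(responses, group_fraction=0.27):
    """responses: one row per examinee, each a list of 0/1 item scores.

    group_fraction: share of examinees in the high and low groups
    (0.27 is an assumed convention, not taken from the text).
    """
    n_people = len(responses)
    totals = [sum(row) for row in responses]
    # Rank examinees by total score, highest first
    order = sorted(range(n_people), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n_people * group_fraction))
    upper, lower = order[:k], order[-k:]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        results.append({
            # Proportion answering correctly: higher value means an easier item
            "difficulty": sum(item) / n_people,
            # Proportion passing in the high group minus that in the low group
            "discrimination": sum(item[i] for i in upper) / k
                              - sum(item[i] for i in lower) / k,
            # Correlation of the item score with the total test score
            "item_total_r": pearson_r(item, totals),
        })
    return results


# Hypothetical data: 6 examinees, 5 items
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
results = item_analysis(data)
for number, stats in enumerate(results, start=1):
    print(number, stats)
```

In this made-up data set the first item is answered correctly by everyone, so it has no discriminating power, while the last item is passed only by the lower scorers and is therefore negatively discriminating.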
Ebel and Frisbie (1991, p.232) believed that the more items classified as highly or
moderately discriminating the better the test. Burroughs (1975) showed that an item
which does not discriminate between these groups, upper and lower, contributes
nothing to the establishment of an order of merit. It may be useful for warming-up
purposes though. An item which is easier for weaker students than it is for good
students would not only be a very curious item, but also one that detracts from the
The present study is making use of the SPM test as a measure of non-verbal reasoning
used the SPM test in a variety of settings including education, vocation, clinic and
15 developing, between 1948 and 2009. The developed country with the highest
number of SPM studies conducted was the United States, with 15 studies. Its
counterpart in the developing countries was India with a total of 5 studies. The earliest
study was in the USA (1948) while the latest was in Qatar (2009). For clarity and easy
reference, the above studies are organised in Table 4.7. A thorough description of
each of the studies mentioned in the table is given below it. After presenting the
Table 4.7 A sample of worldwide studies that utilised the SPM test
COUNTRY        YEARS                                 REFERENCES
Congo          1994                                  Nkaya et al.
Denmark        1968                                  Vejleskov
Egypt          1987                                  Abdel-Khalek
Estonia        2004                                  Lynn et al.
France         1994                                  Nkaya et al.
Hong Kong      1988                                  Lynn et al.
Iceland        2003                                  Pind et al.
India          1968; 1968; 1972; 1974 and 1977       Sinha, Mehot, Mohan, Rao and Sinha
Iran           1974                                  Baraheni
Israel         1991                                  Kaniel & Fisherman
Italy          1962                                  Young et al.
Kuwait         2006                                  Abdel-Khalek and Lynn
Libya          1983; 1991; 2005 and 2005             Aboujaafer, and Majdub, Attashan and
                                                     Abdalla and Ahlam
Mexico         2004                                  Lynn et al.
Nigeria        1980                                  Maqsud
Oman           2009                                  Abdel-Khalek and Lynn
Qatar          1986; 2009                            Bart et al.; Khaleefa & Lynn
Pakistan       2006                                  Ahmad et al.
Slovenia       2007                                  Boben
South Africa   2000; 2002; 2007                      Rushton and Skuy, Rushton et al., Taylor
Sudan          2008.b                                Khaleefa et al.
Syria          2008.a                                Khaleefa & Lynn
Tanzania       1967                                  Klingelhofer
Turkey         1993                                  Duzen et al.
UK             1962; 1962; 1963; 1988; 1989          Foulds & Dixon, Foulds et al., King, Lynn et al.,
               and 1994                              Egan and van den Broek and Bradshaw
USA            1948; 1966; 1968; 1969; 1972;         Rimoldi, Bingham et al., Tulkin & Newbrough,
               1973; 1986.a.b; 1987; 1988;           Burke & Bingham, Burke, Mclaurin & Farrar,
               1988; 1986; 1986; 1994 and 1994       Powers et al., Sidles & Avoy, Jensen et al.,
                                                     Karnes & Whorton, Bart et al., Whorton & Karnes,
                                                     Johnson et al., and Blennerhassett et al.
The objectives of the investigation of these studies include the effects of the following
independent variables on the SPM test results: age, gender, variability, study levels,
region (cities and villages) and academic discipline (sciences and arts) and a
Since each study may investigate more than one variable, it was quite difficult to
group them under a certain variable. Alternatively the studies outlined in Table 3.2
and GDP per capita. The HDI is claimed to be a standard means of measuring human
(UNDP), refers to the process of widening the options of people, giving them greater
opportunities for education, health care, income, employment, etc. The basic use of
HDI is to rank countries by level of "human development". The index was developed
in 1990 by the Pakistani economist Mahbub ul Haq and Sir Richard Jolly with help
from Gustav Ranis of Yale University and Lord Meghnad Desai of the London School
of Economics. It has been used since then by the UNDP in its annual Human
Development Report. Nowadays the HDI is a pathway for researchers into the wide
longevity.
2007/2008).
countries are evaluated. Similarly comments and analysis are given at the end.
Rimoldi (1948) administered the SPM test to US children aged 9 to 15 years. The
mean time for attempting the test for a population of 1680 subjects was 38 minutes
11.60); 11 (M = 28.82, SD = 10.49); 12 (M = 33.45, SD = 9.98); 13 (M = 35.90, SD =
that SPM mean scores increased with age and there was a drop in the mean number of
problems solved from Set A through C, there was no significant difference between
the means for Set C and D, and there was a final drop in Set E. In addition, analysis
showed one factor common to all of the sets of the SPM test.
Two earlier studies carried out in the UK by Foulds and Dixon (1962) and Foulds et
al., (1962) with adult psychiatric patients concluded that males were significantly
superior to females in SPM test results. Another early study was that of Young et al.,
(1962) in Italy who applied the SPM test to a random sample of elementary school
children in two regions. The children's ages ranged from 9 years and 6 months to 14
years and 6 months. Results showed that boys obtained higher scores than girls in the
city (mean percentiles: boys 59.06, girls 49.39), while in rural areas, girls scored
King (1963), in another study in the UK, found significant sex differences in
favour of girls on the SPM test. The boys' mean age was 10.6 years and their mean SPM
score was 35.5 (SD = 11.5). The girls' mean age was 11.2 years and their mean SPM
score was 38.5 (SD = 12.0). In the total sample the SPM mean score was 37.1 (SD =
11.9). Bingham et al., (1966) studied a small sample of patients (N=39) referred to a
Vocational Counselling and Psychological Service in the USA. The subjects ranged
in age from 20 to 52 (mean age 36.1 years, SD = 7.7). The SPM mean score was
40.6, SD = 11.80.
Tulkin and Newbrough (1968) administered the SPM test to 356 fifth and sixth grade
students, from the suburban Maryland school system in the USA, to determine the
effect of past experiences related to race, social class, and gender on performance in
the SPM test. They found the following SPM test means for the eight groups: high class
white females (N=64), 41.1 (SD = 8.18); high class white males
(N=64), 42.2 (SD = 5.81); low class white females (N=32), 30.6 (SD
= 10.48); low class white males (N=43), 30.7 (SD = 9.94); high class black
females (N=23), 39.7 (SD = 6.71); high class black males (N=27), 39.0 (SD
= 8.43); low class black females (N=53), 26.3 (SD = 10.98); and low class black
They concluded that: (a) gender differences were not significant, (b) higher social
class and white subjects showed significantly higher SPM test scores and (c)
significant differences between races on the SPM test were found only in the lower
class students. The black low class scored significantly below the white low class.
Vejleskov (1968) in Denmark with 628 fifth grade children from two cities found that
boys (N = 174) and girls (N = 192) in Gentofte city had the same score (39.9) on
SPM, while Esbjerg city girls (N = 137) scored slightly better than boys (N = 125).
The boys' mean score was 37.4 whereas the girls' mean score was 38.2. Also Vejleskov
noticed that boys, in general, worked faster than girls on SPM test. The SPM mean
Burke and Bingham (1969) in the USA reported an SPM mean score of 41.2, SD =
11.5 for a sample of 91 male patients referred for vocational counselling (mean age =
35.1 years).
Another study by Burke (1972) investigated 567 SPM answer sheets of veterans
(black and white) who had taken the SPM test when referred for vocational
counselling. The veterans' mean age was 35.5, SD = 9.1 months (age range 16 to 64
Mclaurin and Farrar (1973) in their study on 96 male and 105 female university
students in America concluded that the SPM did not have sufficient ceiling for
university students as indicated by the closeness of the SPM mean score 50.39, SD =
6.50 to the maximum score possible. Vincent and Cox (1974) studied a sample of 380
psychiatric patients which was taken from psychological files of the Texas Vocational
Rehabilitation Unit. Most of the sample either had a physical, emotional, or mental
disability. The sample mean age was 28.7 years and consisted of 57 % white, 36 %
black and 7 % Latin Americans. The SPM mean score for the total sample was 39.25,
SD = 12.00. They concluded that the SPM test is a viable tool for measuring
Bart et al., (1986) compared the performance of 273 Qatari students (151 boys with a
mean age of 12.97 years and 122 girls with a mean age of 12.63 years) on the SPM
test to that of 281 American students (150 boys with a mean age of 12.37 years and
131 girls with a mean age of 12.70 years) in the fifth, sixth and seventh grades.
American students scored higher (M=43.39) than the Qatari students (M=30.24). The authors
added that male students performed better than female students and that older students
tended to perform better than younger students. They did not report any data
level.
Powers et al., (1986.a) carried out a study in the USA on 127 Hispanic (69 boys and
58 girls) and 103 Anglo-American (53 boys and 50 girls) students. The mean age of the
students was 11.6 years. Students were enrolled in grade 6 of four elementary schools of a large
urban school district in the Southwest of the USA. Hispanic and Anglo-American
students were compared for their overall scores on the SPM test. When the total mean
score of Hispanic students (M=38.43, SD = 7.45) was compared to that of the Anglo-
Powers et al., concluded that these results support the continued use of the SPM test
In another study by Powers et al., (1986.b) in the USA to examine gender differences
in performance on the SPM test, they administered the SPM test to 212 sixth grade
students (116 boys and 96 girls) and 214 seventh grade students (109 boys and 105
girls). The ethnic background of the students consisted of Native American, Black,
Hispanic, and non-Hispanic Caucasian. The students were from four schools that
ranged in socio-economic status (SES) from lower middle to upper middle in an urban
school district in the Southwest of the USA. Sex differences in performance on the SPM
test were examined at each grade level. The sixth grade boys' mean (38.81, SD = 6.84) did
not differ significantly from the girls' mean (39.26, SD = 7.35). The seventh grade boys'
mean score of 39.48 (SD = 8.06) and the girls' mean of 38.88 (SD = 8.21) also did not differ
significantly.
Sidles and Avoy (1987) administered the SPM test to 124 Navajo students (62 boys
and 62 girls, age 14 and 15 years), in seventh and eighth grade, in Arizona and New
Mexico. They reported that the raw scores mean for females was 39.85 and for males
was 39.88. Mean score for seventh grade students was 38.83, while the mean for
eighth grade students was 40.11. Mean score of SPM test for total students was 39.86.
They noticed that this mean was lower than that obtained for the United Kingdom
students of similar age group during the 1981 standardisation of the SPM test. They
concluded that the SPM test had potential for being included by school psychologists
Lynn et al. (1988) carried out a study in the UK and Hong Kong with 120 boys and
77 girls from Hong Kong and 75 boys and 95 girls from the UK. The students' mean
age was 10.5 years, and the British students were Caucasian. They found that the
Hong Kong boys and girls both obtained significantly higher SPM means than their
British counterparts. The Hong Kong boys' mean SPM percentile was 71.48 (SD =
20.00) and the Hong Kong girls' mean SPM percentile was 68.44 (SD = 21.34). The higher
mean obtained by Hong Kong boys compared with Hong Kong girls was not
significant. British boys and girls in this study obtained identical means equivalent to
In the USA, Jensen et al. (1988) administered the SPM test with a time limit of
40 minutes to a total of 261 undergraduate students. The overall SPM mean was
51.32, SD = 4.69.
In Mississippi, USA, Whorton and Karnes (1988) found that the SPM mean for the total
sample was 32.2, SD = 11.2. The sample consisted of 70 black and 237 white
students; 142 were girls and 165 boys. The mean age was 10.8 years, with a range
from 8.3 to 15.7 years. For black students the SPM mean score was 25.4, SD = 9.9
(N=70). The SPM mean score for white students was 34.3, SD = 10.7 (N=237). The
mean difference between students on the basis of race was significant. In another
study, also in Mississippi, the same researchers (1988) administered the SPM to
625 students in a rural county elementary school (grades 3 to 8). A total of 441 white students
and 211 black students with a mean age of 8.10 years took the test. Of these, 410
students were on free or reduced lunches and 245 students were on paid lunches. The SPM
mean for students on free lunches was 29.7, SD = 10.9, whereas for students on paid
Egan (1989), in the UK, administered the SPM with a 30-minute time limit to a sample
of 94 trainees (43 males and 51 females) with a mean age of 16.7 years (SD = 9.7
months) who had been unemployed for 6 months after leaving school. The
SPM mean for the total sample was 36.5, SD = 9.9; the SPM mean for males was
38.4, SD = 9.8, and for females 34.6, SD = 9.8. The gender difference was not
significant.
The second investigation of the SPM in Libya was carried out by Majdub (1991),
who administered the SPM to two groups totalling 193 students (68 males and
125 females) from Tripoli University. He found that the Education major group had a
significantly higher mean than the Arabic major group. For the Arabic major group
the SPM mean was 34.40, SD = 9.13 (N=81). For the Education major group the SPM
mean was 39.14, SD = 9.08 (N=112). Majdub concluded that the difference between the
two groups on the SPM, in favour of the Education group, may be due to the
Nkaya et al. (1994) claimed that comparisons of intelligence test scores of individuals
showed large disparities in favour of western subjects regardless of the type of test.
For example, they administered the SPM test three times to students in France and
Congolese (45 boys and 43 girls with a mean age of 13.3 years) and 68 French (36
boys and 32 girls with a mean age of 12.3 years) who were in the sixth year of
schooling. Neither the French nor the Congolese students had ever been administered
an intelligence test. The test situation, however, was much more familiar to French
students due to exposure to material and educational games similar to materials used
The SPM test was administered to the same students three times (T1, T2 and T3) at
two-week intervals. The test was self-paced but students were encouraged to work
rapidly. The time taken and the items solved correctly after 20 minutes were recorded.
Under the self-paced condition, the SPM mean score for French students in test 1 was 46.9,
SD = 5.9; in test 2, 49.4, SD = 4.9; and in test 3, 49.1, SD = 4.6. For the Congolese
students the SPM mean in test 1 was 29.6, SD = 11.6; in test 2, 33.0, SD = 11.9; and
in test 3, 32.5, SD = 12.0. Under the timed condition, the SPM mean
for French students in test 1 was 40.4, SD = 5.2; in test 2, 48.0, SD = 5.2; and in test 3,
48.5, SD = 5.0. For Congolese students the timed SPM mean in test 1 was 23.5, SD =
9.3; in test 2, 29.5, SD = 11.1; and in test 3, 32.0, SD = 12.1.
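The practice gains between sessions follow directly from the means quoted above. The short sketch below simply recomputes the differences between test sessions from those reported means; the dictionary layout and function name are illustrative choices, not part of the original study.

```python
# Mean SPM scores quoted above from Nkaya et al. (1994), keyed by
# (group, condition); each tuple holds the means for tests 1, 2 and 3.
MEANS = {
    ("French", "self-paced"): (46.9, 49.4, 49.1),
    ("French", "timed"): (40.4, 48.0, 48.5),
    ("Congolese", "self-paced"): (29.6, 33.0, 32.5),
    ("Congolese", "timed"): (23.5, 29.5, 32.0),
}


def gains(scores):
    """Return the (test 1->2, test 2->3, test 1->3) differences, to one decimal."""
    t1, t2, t3 = scores
    return (round(t2 - t1, 1), round(t3 - t2, 1), round(t3 - t1, 1))


if __name__ == "__main__":
    for (group, condition), scores in MEANS.items():
        first, second, total = gains(scores)
        print(f"{group:9s} {condition:10s} "
              f"T1->T2 {first:+.1f}  T2->T3 {second:+.1f}  T1->T3 {total:+.1f}")
```

Laying the four group-by-condition rows side by side makes it easy to see where most of the retest gain occurs.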
They concluded that students' scores increased more rapidly from test 1 to test 2 than
from test 2 to test 3, especially when the test was timed (an increase of 7.6 points for
French and 6 points for Congolese students). There was no improvement in the French
self-paced mean between test 2 and test 3 (-0.3 points) and a 3.4 point increase for
Congolese students. There was little improvement (0.5 points) in the timed mean
for French students between test 2 and test 3, and for the Congolese there was an
increase of 3.4 points. From test 1 to test 3 under the timed condition there was an
increase of 8.1 points for French and 8.5 points for Congolese students. In general the
performance on SPM test was higher for French students than for Congolese students
In a study by Johnson et al. (1994), a sample of 449 second-, fifth- and seventh-grade
students in San Diego city schools was given the SPM test. In this group, 77 students
were African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American.
Of these, 215 were boys and 234 were girls. The mean age of the children was 11
years (range 6 years 8 months to 13 years 10 months). The SPM mean
In the UK, van den Broek and Bradshaw (1994) administered the SPM to normal and
patient samples. The normal sample comprised 77 subjects (58 females and 19 males), all of
whom were native English speakers with no history of psychiatric or
neurological disorder. The patient sample comprised 75 native English speakers (42 males
and 33 females). Each patient was allocated to one of three groups: left-
age for the normal sample was 35.2 years, SD = 12.8 months; for the left-hemisphere
group 48.3 years, SD = 16.7 months; for the right-hemisphere group 48.8 years, SD = 17.1
months; and for the bilateral-lesion group 60.4 years, SD = 12.4 months. The SPM mean
score for the normal sample was 47.3, SD = 8.2; for the bilateral group 21.2, SD = 11.2;
for the left-hemisphere group 33.8, SD = 12.6; and for the right-hemisphere group 30.0,
SD = 14.5.
Regarding the use of the SPM with deaf subjects, in Levine's (1974) survey the
Raven's Matrices test ranked in the top ten for frequency of use with deaf subjects.
Armfield (1985) administered the SPM to 240 deaf/mute students from South China
and concluded that the SPM appeared to be helpful as a tool for teachers making
A study by Blennerhassett et al. (1994) of 102 deaf residential adolescents found an
SPM test mean of 33.98, SD = 10.80. The mean age was 14.7 years, with a range
from 10 to 19 years. They concluded that the SPM test appeared to be suitable for
especially useful when a quick screening technique was needed for deaf adolescents.
Pind et al. (2003) administered the SPM test to Icelandic school children aged 6 to 16
years. A total of 665 children were tested and the standardization sample consisted of
550 of the 665 children. The median total score rose from 23 in the first grade to 50 in
the tenth grade. Scores increased regularly with increasing age. Icelandic norms were
2 to 3 points higher than UK norms. The performance of girls and boys on the SPM was
compared. The average score of girls in the standardisation sample was 40.1 with boys
gender was not significant, F (1,530) = 0.61, P=0.434, nor was the interaction of gender
and grade, F (9,530) = 0.65, P= 0.759. The effect of geographical district was also not
significant, F (7,542) = 0.89, P=0.516. It was concluded that grade, or age, was the
only factor in this study which had a significant effect on the children’s SPM score.
Lynn et al. (2004) administered the SPM test to an Estonian sample to investigate
sex differences. A total of 2738 adolescents (1250 males and 1439 females) in the 6th,
8th, 10th, 11th and 12th grades took the test. Overall, females obtained a higher mean than
males. Females scored 3.8 IQ points higher than males at ages 12
to 15 years, whereas males scored 1.6 IQ points higher than females at
ages 16 to 18 years. Overall, males had statistically significantly larger
variance than females. Irwing and Lynn (2005) also established sex differences on the
PM among university students: men obtained significantly higher scores than women.
In 2007, Duzen et al. began the process of standardizing the SPM test in Turkey
with the aim of identifying gifted children. A total of 2458 students were tested (1170 girls,
1288 boys; aged between 6½ and 14½ years); 1341 students were from rural areas
while the remaining 1117 were from urban areas. The results showed that students from
urban areas obtained significantly higher scores than students from rural areas;
they also showed that grade predicts SPM scores more accurately than age.
In 2007, Boben administered the SPM test to 1,556 children and adolescents aged 7.5 to
18 years in Slovenia; 53% were male. Nine items were shown to be misplaced in
difficulty (A6, A9, A10, B9, B10, B11, C5, C7, C9). Both Cronbach's alpha and the
split-half method gave a reliability of 0.95. This study showed that subgroups differed in
statistically significant ways in relation to sex (F = 13.13, p = 0.00) and age group
(one-year intervals from 8 to 18 years; F = 76.48, p = 0.00), but not in the interaction
between them (F = 0.65, p = 0.77). A more detailed analysis showed that sex
differences occurred only in older age groups. T-tests revealed statistically significant
differences for the age groups of 16-year-olds (p = 0.02), 17-year-olds (p = 0.01) and
18-year-olds (p = 0.04). Nevertheless, statistically significant differences regarding sex
Some important features of these studies should be noted. First, most of
the studies selected their samples randomly and with adequate sizes. A few studies
did not mention their selection procedures, such as Mclaurin and Farrar 1973 in the
USA; Vancent and Cox 1974 in the USA; and van den Broek and Bradshaw 1994 in the UK. In some
studies, neither the sample size nor the selection criteria were reported. Examples of
such studies are: Young et al., 1962 in Italy; King 1963 in the UK; and Mclaurin and
Farrar 1973 in the USA. Since a larger sample is more representative of
the behaviour domain, a total of 5 studies took advantage of this fact. These
include: Lynn et al., 2004 in Estonia; Duzen et al., 2007 in Turkey; and Boben 2007
analysis.
So far the analysis has concerned sample selection and size. Attention now
turns to the characteristics of the samples themselves. Along with
healthy people, a number of SPM tests were conducted on patients with physical and
patients. These types of studies were not included in the meta-analysis chapter
(chapter 6). Other studies took into account variables such as the economic
status of the subjects. The criteria by which lower and upper classes were
distinguished were not, however, among those adopted in the field of economics. For
example, the study conducted in the USA by Whorton and Karnes (1988) classified a sample
of students into two categories: those on paid lunches, representing the
upper class, and those on free lunches, representing the lower class.
As a final remark, only 2 out of 10 studies administered the SPM test to both rural and
urban residents. These were carried out by Duzen et al., 2007 in Turkey and Young
1962 in Italy. The difference between urban and rural life had a noticeable
effect on SPM test scores. Ignoring it renders a sample unrepresentative.
4.9.2 Studies on SPM test in developing countries:
Klingelhofer (1967) administered the SPM test with a time limit of 30 minutes to
African and Asian secondary school students in Tanzania. The African sample
consisted of 2963 students (2125 males and 838 females) and the Asian sample
consisted of 729 students (415 males and 314 females). The mean ages of the four
groups were: African boys 17.1 years, African girls 16.1 years, Asian boys 14.8 years
and Asian girls 14.3 years. The SPM test mean scores were 34.3 for African boys,
34.1 for African girls, 43.9 for Asian boys and 41.7 for Asian girls. There was no
statistically significant difference in mean scores between African boys and girls, and
performance on the SPM test. There was a significant mean difference between Asian
and African students in favour of Asian students, and Asian boys scored better than
Asian girls. Klingelhofer claimed that the significantly better performance of Asians
than Africans on the SPM was probably associated with a number of cultural factors
that differentiate the two groups; for example, Asian children start school early, have
literate parents and live in towns where they have daily contact with the stimuli of
modern life, whereas African children come from rural environments and low-income families.
Sinha (1968) reported the following means for both sexes from rural and urban
populations in India. For rural boys the SPM mean scores were 22.50 at age 12
years, 26.50 at 13 years and 27.10 at 14 years. For urban boys the SPM mean scores
were 24.00 at 12 years, 27.40 at 13 years and 29.10 at 14 years. For rural girls the
SPM mean scores were 26.83 at 13 years and 30.00 at 14 years (no data for age 12).
For urban girls the SPM mean scores were 25.50 at age 12 years, 28.90 at age 13 years
and 30.10 at age 14 years. Sinha concluded that urban children scored higher than
rural children, and girls scored higher than boys in both rural and urban areas. In the
same study Sinha reported that the SPM mean score for Art-Science students was
47.84, SD = 4.46 (N=220) while the SPM mean score for Engineering students was
In India, Mohan (1972) administered the SPM test to 310 university and college
students (165 females and 145 males) with an age range of 18 to 25 years. Mohan
reported the following means: for males the mean score was 46.48, SD = 7.32; the mean
score for females was 43.88, SD = 7.70. Mohan found that the mean score of 45 on the
SPM test corresponds to the 50th percentile given by Raven for the age range 14 to 25.
There was also a significant difference in SPM test scores favouring male students.
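As an illustration of how a sex difference of this kind can be expressed, the sketch below computes a standardised effect size (Cohen's d) and an equal-variance t statistic from the summary statistics quoted above for Mohan (1972). The choice of a pooled-variance Student t-test is an assumption made here for illustration; the source does not state which test Mohan used, and the function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance t statistic and its degrees of freedom."""
    se = pooled_sd(n1, sd1, n2, sd2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2


if __name__ == "__main__":
    # Summary statistics as quoted in the text: males first, females second.
    males = (46.48, 7.32, 145)
    females = (43.88, 7.70, 165)
    d = cohens_d(*males, *females)
    t, df = student_t(*males, *females)
    print(f"d = {d:.2f}, t({df}) = {t:.2f}")
```

Multiplying d by 15 gives the same difference on the conventional IQ scale, which is how differences are often compared across studies.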
In another study from India, Rao (1974) administered a shortened version of the SPM
college students with a mean age of 18.10 years. Rao found the following means: the
mean for Engineering students (N=452) was 54.14, SD = 3.9; for Agricultural
students (N=207), 46.42, SD = 6.55; for Science students (N=769), 45.18,
SD = 7.82; for Education students (N=219), 42.84, SD = 8.51; for Art students (N=487),
41.28, SD = 8.30; and for Commerce students (N=122), 39.76, SD =
8.19. Rao also compared the SPM test means of high and low academic achievers
and found that the mean of high achievers (N=106) was 53.26, SD = 3.04, while the
mean of low achievers (N=106) was 51.37, SD = 3.87. At the same time, the mean
score of high achievers on the achievement test was 18.32, SD = 3.2, while the mean
score of low achievers on the achievement test was 2.48, SD = 1.3. Comparing the
SPM and achievement tests, Rao concluded that SPM test scores failed to
discriminate between high and low academic achievers. Nevertheless he claimed
that the Standard Progressive Matrices test was as good as any other test of
Baraheni (1974) carried out a study in Iran. The study was designed to cover a
and secondary schools in Tehran. Baraheni found that Iranian boys scored higher on
the SPM test than Iranian girls. The differences were statistically significant from age
9 up to 13 years. He suggested that the slight superiority of boys over girls on the
SPM test might reflect the fact that the Progressive Matrices measure, in addition to a
general factor, a spatial dimension on which boys have been found to outperform girls. He
added that although a steady increase in SPM test scores was observed at
successive age levels in both males and females, the magnitude of the differences at
some age levels was very small, especially after 15 years of age. Baraheni claimed
that this steady increase in average performance, which was significant up to age 15,
was in accordance with data reported by Raven. The SPM mean for age 17 years was
37.93, SD = 11.41, N = 256. The SPM mean for age 18 years was 39.36, SD =
10.34, N = 304. Baraheni concluded on the basis of his data that the SPM test
was an efficient test of general intelligence for use with Iranian children.
Sinha (1977), also in India, administered the SPM test to an Indian sample which
consisted of 100 boys and 100 girls aged 11 to 15 years. Sinha reported the following
total means for students' performance on the SPM test by age: for age 11
years the mean was 27.25, SD = 9.30; for age 12 years, 27.25, SD = 8.90; for age 13
years, 30.30, SD = 10.50; for age 14 years, 33.00, SD = 9.40; and for age 15 years,
32.25, SD = 11.20. Sinha concluded that with increasing age there were
some increases in SPM test means for Indian students from age 11 to 14 years. Also
the means of the Indian students were very low compared with Raven's British norms
for children of the same age. In the same study, Sinha found that science students
scored higher than art students on the SPM test in the Indian sample. In addition, he
reported that Shanthamani (1970) found similar results on Alexander's Battery
intelligence test.
Maqsud (1980) in Nigeria administered the SPM test to 120 primary school students
with an average age of 12.2 years for the students in a modern school and 12.6 years
for the students in a traditional school. Sixty students were randomly drawn from a
modern school (upper-middle class homes), and 60 from a traditional school (lower-
middle and lower class families). The mean SPM score for students from
the traditional school was 23.25, SD = 3.49, while the mean score for students from
the modern school was 20.85, SD = 4.27. The mean SPM score for students
from the traditional school was significantly higher than that for students from the
modern school.
The first investigation of the SPM in Libya was that of Aboujaafer (1983), who
studied pupils' achievement in preparatory schools in Tripoli. The SPM test was
administered to a sample of 201 boys and girls who were in grade 8. The mean age
was 14 years. The boys' SPM mean was 35.40, SD = 10.40 (N=100). The girls' SPM
mean was 33.50, SD = 10.80 (N=101). The SPM mean for the total sample was
34.50, SD = 10.60 (N=201). The difference between the boys' and girls' means was not
significant.
undergraduates, 205 males with a mean age of 24 years and 247 females with a mean
Language and English Literature. The mean score for males was 44.2, SD = 7.8, while
for females it was 40.8, SD = 8.4. Abdel-Khalek claimed that the gender differences which
emerged in the study may be related to social factors in an eastern society, but did not
specify these factors. He stated that, in brief, the SPM test may provide a
Kanil and Fisherman (1991) compared the performance of 250 Ethiopian Jews (115
boys and 135 girls, with an average age of 14.7 years) on the SPM test to that of 1740
Israeli Jews aged 9 to 15 years. The mean for Ethiopian Jews aged 15 and 16 years
was 27.0, whereas the mean for Israeli children aged 9 and 10 years was 28.0, and the mean
for Israelis aged 14 and 15 years was 45.0. They concluded that the SPM test mean for
the Ethiopian Jews aged 15 and 16 years was very similar to the mean of Israelis aged
9 and 10 years. They added that when the two cultural groups were roughly matched
for total score on the SPM test (the mean scores obtained by 9-year-old Israelis and
14-year-old Ethiopians), they exhibited the same pattern of distribution of errors on the SPM
test. They claimed that these results suggested that the performance of Ethiopian
Jews reflected a developmental delay, and not a different cognitive style. They added
that the SPM test scores merely showed how Ethiopian Jews compared to the Israeli
children at that point in time, and did not indicate how they would respond to new
learning situations.
Rushton and Skuy (2000) administered the SPM test to 309 students (17 to 23 years) in
South Africa (173 Africans, 136 Whites; 104 men, 205 women). The test aimed to
(ANOVA) with race and sex as factors showed significant main effects and a
marginally significant interaction, F (1,305) = 131.85, p < 0.001; F (1,305) = 8.89, p <
0.01; and F (1,305) = 3.67, p < 0.10. Men averaged higher scores than women (M =
50.47; SD = 7.9). Against the 1993 US norms for 18- to 22-year-olds, White men,
with 54 out of 60 correct responses, averaged at the 61st percentile; White
women, with 53 correct responses, at the 55th percentile; African
men, with 46 correct responses, at the 19th percentile; and African
women, with 42 correct responses, at the 11th percentile. These SPM scores
and percentile points were converted to IQ equivalents of 105 for Whites and 84 for
Africans. Males also averaged slightly higher than females. In addition, item analysis
(difficulty and discrimination) was carried out. Percentages were used to calculate
item difficulties for Whites and Africans across the 60 items. For all groups, set E
was the most difficult, followed by set C and then D. Sets A and B were the easiest.
item as ``too easy,'' 54 of the 60 items (90%) proved too easy for Whites and
41 of the 60 items (68%) too easy for Africans. Overall, Africans found the items
more difficult than did Whites, as did women compared to men. For calculation of
to Hopkins (1998) Index of Discrimination and Items Evaluation, the number of items
considered to have excellent discriminating value was 41 for
Africans and 13 for Whites; good discriminating value, 10 for Africans and 7
for Whites; and fair discriminating value, 6 for Africans and 18 for Whites.
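The two item statistics discussed above can be sketched on a small made-up data set. Item difficulty is the proportion of examinees passing an item, and an upper-lower discrimination index is the difference in pass rates between the highest- and lowest-scoring examinees. Everything specific in the sketch is an assumption for illustration: the response matrix is invented, the 27% group size is a common convention rather than something stated in the source, the 0.70 cut-off mirrors the 70% pass criterion quoted elsewhere in this section, and the category cut-offs in `label` are illustrative and not taken from Hopkins (1998).

```python
def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (1 = pass)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(n_items)]


def discrimination_index(responses, fraction=0.27):
    """Upper-lower index: pass rate in the top group minus the bottom group."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        sum(r[j] for r in upper) / k - sum(r[j] for r in lower) / k
        for j in range(n_items)
    ]


def label(d):
    """Illustrative (assumed) verbal categories for a discrimination index."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.10:
        return "fair"
    return "poor"


# Hypothetical responses: 10 examinees (rows) by 4 items (columns).
RESPONSES = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
]

if __name__ == "__main__":
    p = item_difficulty(RESPONSES)
    d = discrimination_index(RESPONSES)
    for j, (pj, dj) in enumerate(zip(p, d), start=1):
        flag = "too easy" if pj >= 0.70 else "ok"
        print(f"item {j}: p = {pj:.2f} ({flag}), D = {dj:.2f} ({label(dj)})")
```

With real SPM data the matrix would have 60 columns, one per item, and the same two functions would apply unchanged.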
In 2002, Rushton et al. administered the SPM test to 342 university students (198
Africans, 86 Whites, 58 Indians; 271 men and 71 women). The White, Indian, and
African mean scores were, in order, 56, 53, and 50 out of 60 (S.D. = 2.6, 4.9, 6.4;
ranges = 46–60, 37–60, 11–60). Men averaged similar scores to women (unweighted
means = 52.9, 52.5; S.D. = 5.0, 3.3; ranges = 11–60, 35–60). Analysis of variance
(ANOVA) with race and sex as factors showed a significant main effect only for race,
with no effect for sex either as a main effect or in interaction, F(2,342) = 24.23, P
< .001; F(1,342) < 1.00; and F(2,342) < 1.00. For the total score, the African–White
difference was 1.00 S.D. (based on a total S.D. of 6.05). The 1993 USA norms for 18- to
22-year-olds placed the Whites at the 75th percentile, the Indians at the 55th percentile
and the Africans at the 41st percentile. These translated into IQ equivalents of 110,
102, and 97, respectively. Item difficulty was measured by the proportion giving the
correct answer. The item analyses were very similar for Africans, Indians, and Whites
(r > .90; r >.79, P < .01), suggesting that the test measured the same construct in all
three groups. Using a proportion of 70% of respondents passing as the criterion for
judging an item as ‘‘too easy’’, 57 of the 60 items (95%) proved too easy for Whites,
53 (88%) for Indians, and 50 (83%) for Africans. The item-total correlation for
each item was also calculated using the point-biserial correlation of each item’s pass or fail
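The percentile-to-IQ conversion reported above for Rushton et al. (2002) can be reproduced with a short sketch. It assumes that IQ is normally distributed with a mean of 100 and a standard deviation of 15, which is the usual convention; the source does not spell out the conversion procedure, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Convert a percentile rank (0-100, exclusive) to an IQ equivalent."""
    z = NormalDist().inv_cdf(percentile / 100.0)  # standard normal deviate
    return mean + sd * z


if __name__ == "__main__":
    # Percentiles quoted in the text for the three groups.
    for group, pct in [("White", 75), ("Indian", 55), ("African", 41)]:
        print(f"{group}: percentile {pct} -> IQ {round(percentile_to_iq(pct))}")
```

Under these assumptions the three percentiles map to the rounded IQ equivalents quoted in the text.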
Lynn et al. (2004) investigated sex differences on the SPM test in Mexico. The SPM was
administered to a sample of 920 children aged 7 to 10 years (472 males and 448
females) from three different ethnic groups. Analysis of variance showed a statistically
significant age effect (SPM scores increased with age) and no statistically significant
gender effect. This study showed a very small overall gender difference in the SPM
A third investigation conducted in Libya was carried out by Ahlam (2005) to evaluate
achievement. The SPM test was administered to 240 students aged 16 and 17 years (120
males and 120 females). The mean score obtained for males was (M=38.31 and
SD=8.53) whereas that for females was (M=35.68 and SD=7.73). The total mean score
was (M=37.00 and SD=9.23). The results showed a gender difference in favour of males.
The analysis also showed the correlation between SPM mean scores and students’
A fourth investigation in Libya was carried out by Attashan and Abdalla (2005) to
The mean score obtained for males was (M=40.50 and SD=8.80) whereas that for
the other hand, arts students' mean score was (M=35.82 and SD=8.09) while that of
science students was (M=44.54 and SD=7.73). The significant difference in means was
in favour of science students. The overall mean score was (M=40.36 and
SD=9.21). In addition, analysis showed the correlation between SPM mean scores and
Abdel-Khalek and Lynn (2006) investigated sex differences on the SPM test in
Kuwait, in a sample of 6,529 students aged 8 to 15 years (3,278 boys and 3,251 girls)
from six different districts in Kuwait. In each district, one socially representative
elementary, intermediate and secondary school for boys and one for girls were
randomly chosen from a list of schools. Children were tested in classes which were
randomly selected. The selection of school districts used a stratified random sampling
procedure. The results showed that girls obtained significantly higher means than
boys among 8-, 9-, 10- and 14-year-olds. No statistically significant differences were
found among 11-, 12-, 13- and 15-year-olds. Overall, girls in the total
sample obtained a statistically significantly higher mean score (M = 35.75, SD = 11.49) than boys
(M = 34.81, SD = 12.11), p < 0.001, although the difference was very small at d = 0.08, equivalent to
Taylor (2007) carried out a study in South Africa on 144 female and 199 male job
applicants, of whom 46.9% were Black and 41.8% White. The average age was 33.8
years. The mean SPM score was M = 44.65, SD = 11.94. Scores on the SPM were
compared across gender and ethnic groups using independent-samples t-tests. Males
obtained a mean SPM score of M = 44.69, SD = 12.64, whereas females obtained
M = 44.45, SD = 11.28. The t-test across gender groups showed
no significant difference in SPM scores. The Black group obtained a mean
SPM score of M = 41.20, SD = 13.06, whereas the White group obtained a
mean SPM score of M = 48.21, SD = 9.33. The White group on average scored
significantly higher than the Black group. Although this finding may cause some
concern at first, it is important to consider the context in which the test was
administered.
Khaleefa and Lynn (2008a) carried out a standardization of the Standard Progressive
the test (1739 males and 1750 females). Results showed no sex difference. There was
It has frequently been asserted that there is no sex difference in general intelligence
but that males have greater variability than females. This assertion was made in the
early years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and
When they found that there is no sex difference in general intelligence, a greater
variability among males, entailing more males among those with very high intelligence
(as well as more males with very low intelligence), seemed to provide a solution to this
problem. Khaleefa and Lynn investigated sex differences in variability. There was no
consistent answer. Boys had greater variability in 7 age groups
whereas girls had greater variability in 4 age groups. In
the sample considered as a whole, girls had greater variability than boys. This study
also showed that average SPM scores were lower in developing countries when
Matrices in Sudan for 6202 participants for ages 9 through to 25 years. They analysed
the data for sex difference in mean and variability. The study showed no sex
means from age 14 through to 18. At 19 years, males did not have significantly higher
addition, results showed no consistent sex difference in variability. Males had greater
variability in 7 age groups whereas females had greater variability in 5 age groups,
2004 to 2006. The sample consisted of adolescents aged 12 to 19 years and adults
schools in four provinces into which Pakistan is divided (North West Frontier,
Baluchistan, Sindh and Punjab) and were tested in groups. The adult sample consisted
of 2,016 participants (1,019 females and 997 males). The results overall suggested
addition, in most age groups, females had greater variability than males. The mean
scores of the Pakistani sample were lower than those obtained by standardization
Abdel-Khalek and Lynn (2009) investigated the SPM in 5,139 school students aged
9 to 18 years with approximately equal numbers of males and females, drawn from
representative school students and 92 university students (43 male and 49 female)
in the capital city of Oman (Muscat). They reported an average of 85 for school
students and 93.7 for university students. There were no significant gender
differences among the 9- to 17-year-olds, but at age 18 years males obtained a higher
Khaleefa and Lynn (2009) conducted a study to evaluate the SPM test norms in a
Qatari standardization sample, 1135 students aged 6-11.5 (male N = 517 and female N
this study reported an average IQ of 88. This difference was attributed to possible
the Middle East that failed to show greater male variance in SPM scores. This study
showed that in the total sample females obtained a higher mean score (M = 25.7, SD =
11.34) than males (M = 23.7, SD = 9.98). Furthermore, the analysis showed that SPM
In general, the studies performed in developing countries described their sample
selection procedures in detail, including random selection and large sample sizes.
Abdel-Khalek and Lynn in Kuwait (2006), for example, administered the SPM test to
6529 students; Khaleefa et al. (2008b) tested 6202 subjects including
children and adults. By comparison, among the studies in developed countries, the largest
sample comprised 2738 children in Estonia, tested by Lynn et al. (2004).
Furthermore, the analytical methods employed in many studies were identical to those
used in the studies from developed countries. Lastly, it should be noted that more modern
Although the studies in the developing countries had covered various variables and
Firstly, some studies lacked the description of the sample in terms of sample age
the differences between rural and urban areas, only one study evaluated this
Unlike the studies performed in the developed world, one of the studies
in the developing world employed an incomplete SPM test. Rao
(1974) in India used 45 of the 60 test items, which were designed for the
review of earlier studies has revealed that the SPM test is, without any doubt, a
The Progressive Matrices tests resulted from the work of the British psychologist
John C. Raven and the geneticist Lionel Penrose in the 1930s.
Matrices are probably amongst the most widely used culture-fair tests. They exist
The SPM test is a non-verbal ability test consisting of increasingly difficult sets. It
was first fully standardised by Raven for children. Later on, the test was re-
standardised for adults. Standardisation took place in different countries both in the
developed and developing world. Since its introduction, several checks were run to
Literature showing the reliability, validity and item analysis characteristics of the
SPM were presented and discussed. To determine the reliability of the SPM test
accurately a single technique is not sufficient. Therefore three methods have been
reliability. The average coefficients for the three methods were found to be 0.93 for
test-retest after a two-week interval; 0.90 for split-half; and 0.95 for alpha (Kuder-
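Two of the internal-consistency estimates named above can be sketched for dichotomously scored items such as those of the SPM. The example computes the Kuder-Richardson formula 20 coefficient (which coincides with Cronbach's alpha when items are scored 0 or 1) and an odd-even split-half coefficient stepped up with the Spearman-Brown formula. The response matrix is invented purely for illustration and is not SPM data; the function names are hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a matrix of 0/1 item scores (rows = examinees)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


# Hypothetical responses: 6 examinees (rows) by 4 items (columns).
RESPONSES = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(f"KR-20      = {kr20(RESPONSES):.3f}")
    print(f"split-half = {split_half(RESPONSES):.3f}")
```

Test-retest reliability, the third method, needs scores from two sessions and would simply be `pearson_r` applied to the two sets of total scores.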
Likewise, to firmly establish the validity of the SPM test one should look at the
validity and construct validity. It was found that the SPM test can be used in cross-
cultural contexts due to its culture-fair reliability. The majority of the examined
studies showed that the SPM test is a measure of the intellectual ability “g” only
Furthermore, the literature showed that the concurrent validity correlations of the SPM
with standard intelligence tests ranged from 0.50 to 0.80, whereas its predictive
validity correlations with academic achievement tests generally fell in the region of
0.20 to 0.60.
Studies that focused on item analysis (item difficulty and item discrimination) of
the SPM test were presented. Those which employed the SPM in different cultures
were also mentioned and evaluated. It can be concluded that the SPM test has been
anthropological all over the globe. This is essentially due to its high degree of
The next chapter focuses on the workflow of this study. It sheds light on the field
tests conducted and their related work. In addition, it presents the
methodology adopted in the research, materials such as statistical software and the
Chapter five: MATERIALS AND METHODS
5.1 Introduction
This chapter outlines and critically analyzes methods and approaches employed in this
study. Chosen methodologies were explored and contributions offered were also
subjected to critical appraisal. Statistical techniques for data analysis were justified
and evaluated for their suitability. Ethical issues relating to data collection and data
The intent of any research is to create new knowledge through systematic enquiry.
Research is governed by scientific principles that vary from one discipline to another
(Gomm & Davies, 2000). Quantitative research approaches are applied to describe
quantitative research approach was used in this study due to the numerical nature of
the data and large sample size tested. Qualitative research methods were not
appropriate for this study as the only available method to measure intelligence was by
conducting a test. Quantitative research designs can be divided into experimental and
is manipulated, while the remaining variables are controlled, and the effect on one or
category of non-experimental designs was the survey and correlational designs, which
were employed in this study (Gay, 2006; Lobiondo-Wood & Haber, 2006).
5.3 Methodology
Two main activities were employed in this study. First, a survey using the Standard
Progressive Matrices (SPM) test was conducted to obtain preparatory data from a
Libyan sample. Second, a meta-analysis was performed to compare the SPM test
In survey designs, subjects are selected and an investigator carries out a test,
and other types of information (Creswell, 2000). Usually, research is designed so that
information regarding a large number of people (population) can be inferred from the
addition, correlational designs are useful when exploring new topics, or topics that
frequency distributions, means, standard deviations and charts for the obtained sample
was carried out to present an overview regarding performance in the SPM test and to
compute percentile ranks (norms) according to sample age levels (8 to 21 years old).
according to their gender, age groups and regions (developing and developed
countries, and urban (cities) and rural (villages)). A correlational design was used to
test and Students' Academic Achievement (SAA) of Libyan students aged 8 to 21
years old. Finally, a cross-sectional approach was adopted in this study, as data were
collected from a sample with different age groups in a single time period.
5.4 Methods
In this study, the SPM test was used as a method to measure intelligence objectively.
The SPM resulted from the work of the British psychologist John C. Raven and
British geneticist Lionel Penrose. Their work was based on Spearman's two-factor
theory. The SPM is one of very few tests based on Spearman’s general (g)
factor theory of intelligence. Spearman (1946) felt that the goal of measuring “g” had
been achieved by the use of the Matrices test and considered the Progressive Matrices
Raven et al. (1996) mentioned that the SPM is used internationally, and no general
revision of it has been deemed necessary. Burke (1958), Anastasi (1988), Raven
(1989), Carpenter et al. (1990), Arthur and Woehr (1993), Arthur and Day (1994),
Court and Raven (1995), Murphy and Davidshofer (1998), Raven (2000), Kline (2000)
and Lynn (2006) noted that the SPM was the most widely used test for the
following reasons:
• Being the best test of g, the general factor present in all cognitive tasks.
• Being a popular instrument for use in developing countries (Thorndike &
• Being the first version of the RPM tests to be constructed (Raven, 1939), and
one that can be used with children from the age of 6 years onwards (Yoon,
2006).
Reliability and validity are both important measurements for identifying the suitability
psychological test (Brown, 1983; Urbina, 1997; Kenneth, 1998; Kline, 2000;
Langdridge, 2004; Domino & Domino, 2006; Airasian, 2006; Lobiondo-Wood &
Haber, 2006). To achieve the aim of this study, validity, reliability and item analysis
on the SPM test of a Libyan sample with that of other countries (developed and
developing countries). A review of relevant studies published on the SPM test from
studies. These studies were carried out in various countries between 1948 and 2009.
From each relevant study the following data were recorded and coded: (a) author; (b)
country; (c) year of publication; (d) population sampled; (e) age; (f) SPM means and
These studies were carried out in Congo, Denmark, Egypt, Estonia, France, India, Iran,
Israel, Libya, Nigeria, Mexico, Qatar, Tanzania, Turkey, Syria, Sudan, Pakistan, UK
and the USA between 1948 and 2009. To be included, a study had to provide
5.5 Ethical approval
This study is considered the first attempt to standardise Raven’s Standard
Progressive Matrices (SPM) test and apply it to a sample from Libya. Ethics
appropriateness of your behaviour in relation to the rights of those who become the
subject of your work, or are affected by it” (p.178). Ethics in research is an important
issue and must be taken into consideration in any research design. Ethical approval
was obtained from the Research Governance and Ethics Committee at the University
Education in Libya.
SPM testing was carried out by the researcher and well-trained teaching assistants,
who helped the researcher to distribute and administer the SPM test. The researcher
Mukhtar in 2001 during his study for a Masters degree. Only the researcher knew the
identity of the participants as their details were only accessible to the researcher. All
obtained data were secured in a safe place. The study included students from the age
of 8 to 21 years. The main purpose of this study was to develop norms describing
the distribution of IQ scores among Libyan students. Providing these norms would serve
as a guide in helping people to take appropriate decisions related to their future, and
choose educational programs that will best suit their abilities and assist in matching
Participation in this study was optional. An information sheet was provided and each
participant (or guardian of participant) was asked to sign a consent form. The
researcher also provided a simplified information sheet for children (see the
children's information sheet). Information sheets and consent forms were available in
the native language of the participants (Arabic) and were comprehensive in content
and concepts. Each participant was free not to take part in the study or to withdraw at
any time without stating a reason. Also, participants were assured that their scores in
the SPM test were to be used for research purposes only. The researcher was available
on a given contact number if a participant wanted to discuss any matter that might
arise during the study. Results of the study were made available to all participants
and may be published in the journal Intelligence. Participants who were willing to
attempt the test (children needed the consent of a parent or guardian) were registered and then
A pilot study was first conducted to determine validity and reliability of the SPM test
to ascertain the applicability of the test. In addition, the pilot was done to determine
how clear the instructions of the test were for the participants, and to introduce the
The sample consisted of 200 students (100 males and 100 females). Using the Statistical
Package for the Social Sciences (SPSS, version 16), reliability was
investigated using split-half and alpha (KR-20) methods, and validity was
investigated using correlation coefficients (internal consistency of SPM test sets) and
ranged from 0.87 to 0.88 and internal consistency reliability ranged from 0.93 to
0.94. Validity assessed through correlation coefficients (internal consistency) showed
statistically significant high correlations, ranging from 0.70* to 0.89**, between the
SPM test sets and the total test score. Moreover, validity assessed through the correlation between the
SPM test and the external criterion (SAA) showed a statistically significant moderate
correlation of 0.52**. It was concluded that the SPM provided a promising measure
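The two reliability coefficients named in the pilot study can be illustrated with a minimal sketch. It assumes that each SPM item is scored 0 or 1 and that responses are held as one row per examinee; the function names are hypothetical, an odd-even split with the Spearman-Brown correction is assumed for the split-half coefficient, and the code is an illustration rather than the SPSS procedure used in the study.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson 20 for dichotomous items.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    """
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # total score per examinee
    var_total = pvariance(totals)              # population variance of totals
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    first = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson(first, second)
    return 2 * r / (1 + r)
```

With real data, each row would hold the 60 item scores of one participant.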
The sample size (2600 students) was based on the original SPM test, which was standardized
on a sample of 735 British children aged 6-13 years tested individually, 1,407 British
children aged 8-14 years tested in groups and 629 British adults aged 20-70 years
(Raven, 1960; Raven et al., 1998). Kline (2000) stated that the sample size has to
The researcher aimed to achieve the highest possible number of participants in this
The researcher lacked any sample framework (a record from which to select the candidates)
for Libyan students aged 8 to 21 years, who were spread across different
educational grades, either for those enrolled in the different schools aged from 8 to 17
grades aged from 18 to 21 years. In addition, the research dealt with a large, dispersed
area, the Eastern Libyan Region, which encompassed a large number of cities and villages.
Moreover, the researcher dealt with a wide range of age groups, from 8 to 21
years old. Consequently, the only available way to choose the sample was to employ a
multi-stage sampling technique. Its main advantages included no need for a sample
framework prior to conducting the survey and the ability to prepare it in the field.
In cluster sampling, intact groups, not individuals, are randomly selected. All members
when the population is large or spread out over a wide geographic area. Cluster
sampling can be carried out in stages, involving selection of clusters within clusters.
This process is called multistage sampling (Mills & Airasian, 2006). When Raven
standardized the Irish and British SPM test in 1981, he used this sampling method,
selecting samples from samples, each sample being drawn from within the previously
selected sample. In principle, the multi-stage sampling method, which is a
random probability sampling method, can go on through any number of levels, each
level involving a sample drawn from the previous level (Bryman, 2005).
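The idea of drawing each sample from within the previously selected sample can be sketched in a few lines. The example below is only an illustration under stated assumptions: the nested dictionary `frame` (settlement, then school, then student roster), the function name and the three stages are hypothetical, and they do not reproduce the exact levels used in this study.

```python
import random


def multistage_sample(frame, n_settlements, n_students, seed=None):
    """Three-stage random sample drawn from a nested sampling frame.

    frame: {settlement: {school: [student ids]}} (hypothetical structure).
    Returns {(settlement, school): [selected student ids]}.
    """
    rng = random.Random(seed)
    # Stage 1: randomly select settlements (clusters).
    settlements = rng.sample(sorted(frame), n_settlements)
    sample = {}
    for settlement in settlements:
        schools = frame[settlement]
        # Stage 2: randomly select one school within each chosen settlement.
        school = rng.choice(sorted(schools))
        roster = schools[school]
        # Stage 3: randomly select students from the chosen school's roster.
        size = min(n_students, len(roster))
        sample[(settlement, school)] = rng.sample(roster, size)
    return sample
```

Because every stage draws only from the units chosen at the previous stage, a roster is needed only for the schools actually selected, which is why no complete sample framework is required beforehand.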
whole population and focusing on them, the researcher saved the time and money that
would otherwise have been spent travelling to research sites scattered through the length and
breadth of the region. In addition, it enabled the researcher to prepare the sample
main method for selecting suitable representative samples for this research.
5.7.2.2 Disproportional stratified sampling
Although the stratified sampling method continues to adhere to the underlying
applies the principles of randomness within these boundaries (Denscombe, 1998). The
assert some control over the selection of the sample to guarantee the inclusion of
crucial events or crucial people or social groups in the sample. This sample design
varied the sample fraction between different strata which increased the sample size in
small strata allowing enough cases for analysis, which is important for comparing
2. To select nine villages from the existing thirty. Villages were divided
least one classroom from every grade of the six grades in the elementary
school or from the three grades in both the preparatory and the secondary
in these villages.
4. To select at least five male and five female students from every
5. To select at least five male and five female students from every
6. To randomly select male and female students from either the scientific or
University.
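The selection of equal numbers of male and female students from a classroom, regardless of how many of each are enrolled, can be sketched as follows. This is a minimal illustration assuming a roster of `(student_id, gender)` pairs; the function name and the roster format are hypothetical.

```python
import random


def equal_allocation_sample(roster, per_stratum, seed=None):
    """Draw the same number of students from each gender stratum.

    roster: list of (student_id, gender) pairs for one classroom.
    Returns {gender: [selected student ids]}.
    """
    rng = random.Random(seed)
    # Group the roster into strata by gender.
    strata = {}
    for student_id, gender in roster:
        strata.setdefault(gender, []).append(student_id)
    sample = {}
    for gender, members in strata.items():
        if len(members) < per_stratum:
            raise ValueError(f"stratum {gender!r} has only {len(members)} students")
        # Equal allocation: the sampling fraction differs between strata
        # whenever the strata differ in size (disproportional sampling).
        sample[gender] = rng.sample(members, per_stratum)
    return sample
```

For a classroom with 12 boys and 20 girls, selecting five of each gives sampling fractions of 5/12 and 5/20, which is what makes the design disproportional.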
A main difference between cities and villages was that, in the cities, male and female
students attended separate schools at the preparatory and secondary levels
and mixed schools at the elementary level, whereas in the villages
all schools were mixed and shared by both genders.
than one school to represent the city. This meant that it was impossible to select one
elementary school for example to represent all the elementary schools in the city.
Consequently the researcher decided to divide the main city into six administrative
boundaries and the secondary city into three administrative boundaries. This was
followed by selecting one school for male students and one school for female students
for every educational level located within the selected administrative boundaries. In
addition, only one school was available for each educational level in each village, in
contrast to the availability of many schools for each educational level in the city.
For the purpose of this study, two cities were chosen: a main city (Al-Beida) and a
secondary city (Shahat). Al-Beida is the main city in the eastern region of Libya.
During the monarchy (1951-1969), Al-Beida was the second capital of Libya. Now
the municipality of the eastern region has a university (Omar El-Mukhtar University),
consisting of five campuses situated in the following cities: Al-Beida, Al-Marj,
Al-Gooba, Tobruk and Darnah. Al-Beida is considered an educational, trade and
health centre for neighbouring settlements and small cities (Kezeiri, 1995). According
to the General Authority of Information in 2006, Al-Beida city has been divided into
Shahat city, previously known as Cyrene, was established by the Greeks in 631 B.C.
It was the first city to be formed in Libya. The location of the city played a significant
role in its growth and prosperity as did the availability of water from the Apollo
springs and abundance of rain. Its proximity to Apollonia port provided easy contact
religious, agricultural and industrial centre (Kezeiri, 1995). According to the General
Authority of Information in 2006, Shahat city has been divided into three
representative school was chosen for each administrative boundary in these two cities.
In addition, the eastern region provided the researcher with a wealth of resources:
the researcher was born in Al-Beida city and had good links with fellow
students and researchers. He had also taught in various cities located in the eastern
region of Libya.
A large and more easily accessible sample was chosen from two of the Libyan cities
(Al-Beida and Shahat) and nine villages because of its manageability in terms
of both time and resources, as well as the researcher’s familiarity with the social context.
Figure 5.1 summarizes the rationale and process of the sampling method followed.
Multi-stage stratified probability sample for the selection of students aged between
8 to 21 years old either in the basic educational level or in the university graduating level:
1. Grouping and clustering the six cities and 30 villages to two main urban and rural clusters.
2. Select two cities from urban cluster and nine villages from rural cluster.
3. Selecting one elementary school, one preparatory school, and one secondary school from
every village of the nine villages, then select at least one classroom from every grade
from grade three to grade twelve, followed by selecting at least five male and five female
students from each classroom.
sampling from one higher level unit called the primary sampling unit (Eastern
Libyan Region) and then sampling of secondary sampling units from and within that
higher level unit (cities and villages). This was followed by classifying the cities into
two homogeneous urban area clusters, main and secondary cities, using the criterion of
their administrative boundaries as the third sampling level. The researcher
selected one city from each category. In addition, villages were classified into three
different categories (third clustering sampling level): coastal, desert and mountain
villages. Three villages were selected from each category with different weights or
ratios as the fourth sampling level. This was followed by classifying and counting the
existing schools in the two selected cities and the nine selected villages as the
fifth sampling level, according to their educational levels in Libya: elementary level
(grade three to grade six), preparatory level (grade seven to grade nine), and
The aim was to select one elementary, one preparatory and one secondary school from
each village, where most schools are mixed, serving both male and female students. The
researcher visited 27 schools in the nine villages to select the prospective respondents
(students) randomly from a list (sample framework) that he prepared in the field
(during his visits to these schools). In the two cities, the aim was slightly different due
to the fact that preparatory and secondary schools apply a single-sex policy and due
boundaries on the inability of selecting one school as a representative for the whole
city. Consequently, the researcher needed to select at least two
schools at the preparatory and secondary educational levels, one for male and one for
female students. This resulted in selecting six elementary schools, twelve preparatory
schools, and twelve secondary schools in the main city and three elementary schools,
six preparatory schools, and six secondary schools in the secondary city. Overall, the
researcher visited 72 schools from the existing 124 schools (about 58%) in the
different 11 settlements (two cities and nine villages); 27 schools located in the
selected nine villages and 45 schools located in the selected two cities.
Selection of one classroom from every grade in every school either in the nine
villages or in the two cities was conducted by the researcher. Children in Libya start
elementary school at the age of six years old. The researcher randomly selected
classrooms in the elementary schools from grade three and onwards. The student list
Regarding the respondents aged from 18 to 21 years old enrolled in the universities,
campuses in different settlements, situated in Al-Beida city and Al-Marj. This could
be traced back to the fact that the researcher taught at Omar El-Mukhtar University in
visiting lecturer. Consequently, the researcher had much greater access to the university,
easing the researcher's tasks of collecting a reasonable amount of data, accessing the
available data resources and establishing good links with past and current academic
staff.
The application of the multi-stage stratified sampling method to select the respondents
aged from 18 to 21 years old from this university as the primary sampling level
involved classifying its different specialisations into two main curriculum groups; the
sampling level. The two main specialisations or curricula were divided by the four
academic years or grades as the third sampling level. Finally, the researcher selected
students from every grade within the two curricula. The aim was to select at least
200 students from each grade (100 students from the scientific curriculum and 100
students from the arts curriculum) while ensuring gender balance (100 male
and 100 female students), disproportional to the real numbers of students in these two
main curricula and regardless of the real numbers of male or female
students.
Overall, 2600 respondents aged from 8 to 21 years old, with different fractions,
weights or ratios to the real numbers of prospective respondents in each group, were
• 900 respondents or students from nine villages, aged from 8 to 17 years old,
• 900 students from two cities, aged from 8 to 17 years old, enrolled in three
educational levels.
Table 5.1 shows the principles followed in selecting the respondents from different
educational levels in the rural and urban areas. Tables 5.2 and 5.3 show the
differences in the fractions between the selected sample sizes and the real numbers
of students due to the applied stratified sampling method, either in the two selected
cities (Table 5.2) or in the nine villages (Table 5.3). Finally, Table 5.4 shows the
Table 5.1 Principles of selecting the sample in schools
EDUCATIONAL LEVEL | VILLAGES | CITIES | TOTAL
Elementary school | 9 villages * 1 school * 4 grades * 1 classroom * (5 male students + 5 female students) = 360 students | 2 cities (9 boundaries) * 1 school (shared school) * 4 grades * 1 classroom * (5 male and 5 female students) = 360 students | 720
Preparatory school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male students + 5 female students) = 270 students | 2 cities (9 boundaries) * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Secondary school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male students + 5 female students) = 270 students | 2 cities (9 boundaries) * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Total | 900 | 900 | 1800
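The arithmetic behind Table 5.1 can be checked with a short sketch. The function name `stratum_target` and the dictionary layout are hypothetical; the figures are those given in the table, with the nine administrative boundaries counted across the two cities together.

```python
def stratum_target(units, schools_per_unit, grades, students_per_classroom,
                   classrooms_per_grade=1):
    """Target number of students for one educational level in one area type."""
    return (units * schools_per_unit * grades
            * classrooms_per_grade * students_per_classroom)


# Villages: 9 villages, one mixed school per level, 10 students per classroom.
# Cities: 9 boundaries; elementary schools are mixed (10 per classroom), while
# preparatory and secondary schools are single-sex (2 schools, 5 per classroom).
targets = {
    "elementary": {"villages": stratum_target(9, 1, 4, 10),
                   "cities": stratum_target(9, 1, 4, 10)},
    "preparatory": {"villages": stratum_target(9, 1, 3, 10),
                    "cities": stratum_target(9, 2, 3, 5)},
    "secondary": {"villages": stratum_target(9, 1, 3, 10),
                  "cities": stratum_target(9, 2, 3, 5)},
}

village_total = sum(level["villages"] for level in targets.values())
city_total = sum(level["cities"] for level in targets.values())
grand_total = village_total + city_total
```

Here `village_total` and `city_total` each come to 900 and `grand_total` to 1800, matching the totals row of the table.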
Table 5.2 Target sample size of the pre-university students in the two cities in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/290=15.5% 45/304=14.8% 90/594=15.1%
9 Year four at elementary 45/287=15.7% 45/298=15.1% 90/585=15.3%
10 Year five at elementary 45/284=15.8% 45/296=15.2% 90/580=15.5%
11 Year six at elementary 45/278=16.1% 45/286=15.7% 90/564=15.9%
12 Year one at preparatory 45/256=17.5% 45/274=16.4% 90/530=16.9%
13 Year two at preparatory 45/252=17.8% 45/270=16.6% 90/522= 17.2%
14 Year three at preparatory 45/265=16.9% 45/268=16.7% 90/533= 16.8%
15 Year one at secondary 45/239=18.8% 45/254=17.7% 90/493=18.2%
16 Year two at secondary 45/235=19.1% 45/248=18.1% 90/483=18.6%
17 Year three at secondary 45/243=18.5% 45/252=17.8% 90/495=18.1%
Total 450/2629= 17.1% 450/2750= 16.3% 900/5379= 16.7%
Table 5.3 Target sample size of pre-university students in the nine villages in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/230=19.5% 45/262=17.1% 90/492=18.3%
9 Year four at elementary 45/247=18.2% 45/250=18.0% 90/497=18.1%
10 Year five at elementary 45/236=19.0% 45/242=18.6% 90/478=18.8%
11 Year six at elementary 45/239=18.8% 45/258=17.4% 90/497=18.1%
12 Year one at preparatory 45/231=19.4% 45/251=17.9% 90/482=18.6%
13 Year two at preparatory 45/213=21.1% 45/224=20.0% 90/437=20.5%
14 Year three at preparatory 45/220=20.4% 45/236=19.0% 90/456=19.7%
15 Year one at secondary 45/216=20.8% 45/225=20.0% 90/441= 20.4%
16 Year two at secondary 45/211=21.3% 45/220=20.4% 90/431= 20.8%
17 Year three at secondary 45/217=20.7% 45/229=19.6% 90/446=20.2%
Total 450/2260= 19.9% 450/2397= 18.7% 900/4657= 19.3%
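The percentages in Tables 5.2 and 5.3 are sampling fractions: the number of students selected divided by the real number of students in the group. A minimal sketch of that calculation follows; the function name is hypothetical and rounding to one decimal place is an assumption about how the tabulated values were produced.

```python
def sampling_fraction(sampled, population):
    """Sampling fraction as a percentage, rounded to one decimal place."""
    return round(100 * sampled / population, 1)


# Totals rows of Table 5.2 (two cities) and Table 5.3 (nine villages).
cities_fraction = sampling_fraction(900, 5379)
villages_fraction = sampling_fraction(900, 4657)
```

The larger fraction in the villages reflects the disproportional design: the same target of 900 students was drawn from a smaller population.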
Figure 5.2 Sampling process
- Wide eastern region area
- Lack of sample framework
- Random probability sample has sufficiently accurate results
- Many settlement types, and it is hard to select one case study
- Research limitations, especially limited field work time and cost

Multi-stage stratified sample design with eleven case studies (two cities and nine villages)
Divide existing cities into two clusters according to administrative boundaries, and villages into three clusters according to geographic region
Two categories for cities: main and secondary cities
Three geographic regions for villages: coastal, desert and mountain villages
Main city: Al-Beida
Secondary city: Shahat
Coastal villages: Alhanih, Alhammh and Suasa
Mountain villages: qsarlibya, Garnada and Satih
Desert villages: Maraoh, Aslanth and Gantolah
Selecting one classroom in every grade from grade three in the elementary school to grade twelve in secondary school (in the cities and in the villages)
Graduate students in university aged between 18 and 21 years old, selecting 400 students
from science and 400 students from art specialization; 200 students (100 male and 100
female) from each year in both specializations, from both campuses.
5.8 Field work arrangement
Assistance in the field work was provided by five well-trained psychologists who
after introducing and explaining the SPM test form, purposes and question order to
them.
A request was made to the directors of the education sector to issue a letter to enable
the researcher to carry out the study in the chosen schools and universities.
The researcher contacted each school principal and faculty dean with a letter from
the sector of education explaining the purpose of the study and the procedure to be
followed in selecting and testing the students. At each school and university on the
day of the SPM testing, the researcher arrived one hour early to randomly select
students (males and females) from grades 3 to 12 from the sample framework (a record
of students’ names in the selected classroom), which the researcher prepared in the
field with the help of the student affairs and student admission manager (students aged
from 8 to 17 years), or to select the 200 students in each year of university for both
disciplines (students aged from 18 to 21 years old). All participants were given an
information sheet and were required to sign a consent form before participation in the
study.
A place for testing the students was made available at each school. The place, in most
cases, was either the school theatre or library, where each student had his or her own table and
chair. Due to the large numbers of students in schools and the differences in
their age ranges, fewer than forty students were tested at a time using the SPM test. In
the university tests, the same methodology was adopted using groups of fifty at a time.
Participants were coded. Regarding school students, the code was based on location,
whether city or village. In the case of students from villages, the code was based on the
three types of village, name of village, name of school, grade, gender and finally
number of participant. In the case of students from cities, the code was based on
name of city, name of school, grade, gender and finally number of participant.
Moreover, no two cities or villages had names starting with the same letter.
5= Year level.
2= Participant number.
Regarding university students, the code was based on name of city, name of university,
A= Arts Specialization.
3= Year level.
only by the researcher. Each participant's name was assigned the code present on the
first page of the answer sheet. Only the researcher knew both the name and assigned
code for each participant. The researcher had supervised access to the children. At all
times, the school headmaster and teachers accompanied him and supervised him while
The Standard Progressive Matrices test consisted of 60 items on 60 pages, and was
divided into five sets lettered A, B, C, D and E. Each set consisted of 12 items. Each
page of the booklet contained a matrix with one missing part. Students were asked to
choose the missing part from six or eight options given below each matrix, and
indicate its number on a separate answer sheet. The following modifications were
introduced into the SPM test, to make it more suitable for the Libyan sample
1. English letters (A, B, C, D and E) in the five sets were changed into Arabic letters
3. Page order (direction) of the test booklet was changed from left to right, to suit the
4. A new answer sheet was designed with Arabic letters, and right to left direction for
During September to November 2007, the SPM test was administered to 1800 school
students, and during September to November 2008, the SPM test was administered to
800 university students. The researcher was introduced to the students by the head
followed a fixed sequence of standard steps when administering the SPM test to the
respondents, as follows:
1. Some time was spent at the beginning of each SPM test to establish a good rapport
with students, by discussing the purpose of the study, and why certain students from
the whole school were randomly selected to participate in the study. Also, the students
were assured that their scores in the SPM test would remain anonymous, and would
be used for research purposes only. After the test they were thanked for participating.
2. After the introduction, the SPM test booklets were distributed to the students and
they were asked not to open the booklets, until told to do so.
3. To ensure that the students understood the test and the unfamiliar procedures for
recording their responses on a separate sheet, the standard instruction for group
(a) This is a test of observation and clear thinking. Please open your test booklet at
the first page. You will find problem Number A1. Now look at your answer
sheet, you will see that under the heading set A there is a column of numbers
from 1 to 12.
(b) Now look at item A1, it is a pattern with a part cut out of it. Look at the pattern,
think what is the piece needed to complete the pattern correctly. Then find the
(c) All the pieces are the right size to fill the right space, but only one of them is the
right pattern. Number 1 is the right shape, but is not the right pattern. Number
2 is not a pattern at all. Numbers 3 and 5 are quite wrong. Number 6 is nearly
right, but is wrong here. Number 4 is the right answer because it is correct
(d) Now you write "4" next to number 1 under set A on your answer sheet. Please
(e) On every page of the booklet there is a pattern with a piece missing, you have to
choose which one of the pieces below is the right one to complete the pattern,
and write its number next to the problem number on your answer sheet. Go on
like this by yourself until you reach the end of the booklet.
(f) The problems are simple at the beginning and get harder as you go on. Do not
miss any out. If you are not sure, make a guess. If you get stuck, move on to the
next problem, and then come back to the one you have difficulty with.
(g) Any questions? I will come around to see that you are getting on all right.
(h) You can have as much time as you like. Now turn over to problem 2 and start.
4. The SPM test was administered without a time limit, as recommended by the SPM
test manual. However, the researcher recorded the exact time each student needed to
complete it. When each student had completed the SPM test and handed in his /
her test booklet and answer sheet, the researcher checked the answer sheet to make
sure that it had been filled in correctly and that every item had been answered, then
registered the time that the student needed to complete the test. The longest test time
5. The SPM test scores for the students were obtained by using the scoring key
6. The SPM items were scored by hand and double checked. The items were scored
either right or wrong. The maximum possible score was 60. The score was the number
of correct answers.
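The scoring rule described above (one point per correct item, maximum 60) can be sketched as follows. The item labels, the dictionary format and the function names are hypothetical, and any key used with this sketch is a placeholder, not the published SPM scoring key.

```python
SETS = "ABCDE"        # the five sets of the SPM
ITEMS_PER_SET = 12    # 12 items per set, 60 items in total


def item_labels():
    """Return the 60 item labels A1..A12, B1..B12, ..., E1..E12."""
    return [f"{s}{i}" for s in SETS for i in range(1, ITEMS_PER_SET + 1)]


def score_spm(answers, key):
    """Count correct answers; each item is scored right (1) or wrong (0).

    answers: {item label: option number chosen by the student}
    key:     {item label: correct option number} (placeholder key)
    """
    labels = item_labels()
    if set(key) != set(labels):
        raise ValueError("scoring key must cover exactly the 60 SPM items")
    # A missing or different answer counts as wrong.
    return sum(1 for label in labels if answers.get(label) == key[label])
```

Scoring each sheet twice with a function of this kind and comparing the results would mirror the double check described in the text.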
The researcher succeeded in achieving 100% of the target sample size among both the
pre-university school students and the university students. In the chosen cities and villages, 90
students (45 males, 45 females) in each of the 10 educational levels were chosen. This
led to a total of 1800 students (900 male and 900 female) who took the test (900 from
nine villages and 900 from two cities). Regarding university students, 100 students
(50 male and 50 female) from each discipline in each of the 4 study levels were chosen. This led to a total of
800 students (400 male and 400 female): 200 students in each year of university for
both disciplines, 100 Science students (50 male and 50 female) and 100 Arts students
This section discusses data preparation, data cleaning and the rationale for the statistical
tests used in this study. The data collected were imported into SPSS (version 16).
Afterwards, the data were screened for errors and missing values and then analysis using
deviations and charts for all study variables were conducted to present an overview of
the performance of Libyan participants on the SPM test. Also, normality of the data
was tested using the Kolmogorov-Smirnov test and normal probability plots. Data
Second, to compute differences between SPM test means, an independent-samples t-test
was used when one continuous dependent variable (SPM test scores) was examined
and subjects were divided into two groups, e.g. male and female, science and arts
disciplines, or cities and villages (Pallant, 2007). The analysis based on region and
geographic area was not carried out on university students, because all university
Third, to compute differences between SPM test means, one-way analysis of
variance was used when one continuous dependent variable (SPM test score) was
examined and the sample was divided into more than two groups, e.g. by age (Pallant, 2007).
Variance was used when one continuous dependent variable (SPM test score) was
examined and the sample divided by two independent variables e.g. gender and age
or region and age. This analysis allowed the investigation of the individual and joint
Fifth, the effect size of differences between SPM means was investigated by calculating
Cohen’s d, which equals the difference between the two means divided by the mean of
the two standard deviations. In addition, Cohen’s d was used to calculate the IQ point difference, which was
Seventh, mean SPM scores were converted to IQ scores using British and American
percentile norms and a conversion table from percentiles to IQ scores. The British and
USA norms for the Standard Progressive Matrices were used to calculate the IQ of the
Libyan sample. This method has been used in many recent studies, such as Lynn and
Vanhanen (2006), Abdel-Khalek and Lynn (2006), Keleefa and Lynn (2008a),
Keleefa et al. (2008b), Abdel-Khalek and Lynn (2009) and Lynn (2009). In
addition, Kaplan and Saccuzzo (1997) concluded that Raven was regarded as one of
the major authorities in the psychological testing field in the 21st century.
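The percentile-to-IQ step can be illustrated with a short sketch. It assumes that the conversion table follows the normal curve with an IQ mean of 100 and a standard deviation of 15; the thesis does not reproduce the table itself, so this is an assumption, and the function name `percentile_to_iq` is hypothetical.

```python
from statistics import NormalDist


def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Convert a percentile rank (between 0 and 100, exclusive) to an IQ score.

    Assumes IQ is normally distributed with the given mean and SD.
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)  # standard normal deviate
    return mean + sd * z


# A score at the 50th percentile of the reference norms corresponds to IQ 100.
print(round(percentile_to_iq(50), 1))
# A score at the 16th percentile lies about one SD below the mean.
print(round(percentile_to_iq(16), 1))
```

In practice the percentile would first be read from the British or USA norm table for the participant's age, and only then converted.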
relationships (between SPM test scores and Student's Academic Achievement (SAA))
was investigated following these guidelines; r = 0.10 small effect, r = 0.30 medium
effect and r = 0.50 large effect (Field and Hole, 2005). Also Pearson Product-Moment
(correlation coefficients between SPM test total score and SPM test sets) (Anastasi
Ninth, stepwise multiple regression analysis was used to investigate which
independent variable was the best predictor (gender, age, SAA and region; urban
Tenth, the reliability of SPM test scores was investigated using split-half, alpha
(KR-20) and test-retest methods. In the split-half method, items were divided into odd and
even items, because the items were arranged in order of difficulty (Kline 2000). Alpha
(KR-20) estimated how test items related to each other and to the total test. It is useful
for multiple-choice items that are scored as right or wrong (Anastasi and Urbina 1997;
Mills and Airasian 2006). Test-retest correlated items within a test, when the test was
Eleventh, two different methods were used for validity estimation: the first was
construct validity, assessed using factor analysis and internal consistency, and the second
was criterion-related validity, using SAA as an external criterion. Owing to the lack
of standardized mental tests in Libya, it was not possible in this study to use any other
intelligence test as an external criterion to investigate the validity of the SPM test.
Twelfth, item analysis (item difficulty and item discrimination) was carried out.
(a) Item difficulty: the proportion of respondents who answered an item correctly. If
most respondents answered an item correctly; the item was an easy item. If most
(b) Item discrimination index showed whether items differentiate between people with
varying degrees of knowledge and ability (Brown, 1983). The point biserial
correlation between “pass/fail” on each item and total test score was used to
investigate the SPM item discrimination ability (Anastasi 1988 and Anastasi, Urbina
1997).
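The two item statistics described in (a) and (b) can be sketched as follows. The point biserial coefficient is written in its textbook form, r = (M1 - M0) / s * sqrt(pq), where M1 and M0 are the mean total scores of those who passed and failed the item and s is the population standard deviation of total scores. The function names and the six-person data set are hypothetical and serve only as an illustration.

```python
from math import sqrt


def item_difficulty(item_scores):
    """Proportion of respondents who answered the item correctly (1 = pass, 0 = fail)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point biserial correlation between pass/fail on one item and total test score.

    Returns None when everyone passes or everyone fails the item, or when total
    scores do not vary, because the coefficient is undefined without variance.
    """
    n = len(item_scores)
    p = sum(item_scores) / n
    q = 1.0 - p
    mean_total = sum(total_scores) / n
    sd_total = sqrt(sum((t - mean_total) ** 2 for t in total_scores) / n)
    if p in (0.0, 1.0) or sd_total == 0.0:
        return None
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / sd_total * sqrt(p * q)


# Hypothetical data: six respondents, one item, and their total test scores.
item = [1, 1, 0, 1, 0, 0]
totals = [50, 45, 20, 40, 30, 25]
print(item_difficulty(item))
print(round(point_biserial(item, totals), 2))
```

An item that every respondent answers correctly has no variance, so the sketch returns `None` for it rather than a correlation.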
considerations were considered. A pilot study was conducted and results showed that
the SPM test was valid and reliable and it was subsequently recommended for use for
Libyan students. A sample size of 2600 students (aged between 8-21 years) was based
on two previous British standardized SPM tests. Sampling process included a multi-
schools located in 11 different settlements; nine villages and two cities and two
universities located in two cities; Al-Beida and Al-Marj. The researcher succeeded in
achieving 100% of the target sample size. A meta-analysis was carried out to compare
performance in the SPM test for a Libyan sample with that of other countries. Finally
statistical tests employed and rationales were justified. Next will be the SPM Libyan
Chapter 6 Results
6.1 Introduction
This study represented a preliminary standardization for the SPM test on a Libyan
sample to develop norms for the classical form of the Standard Progressive Matrices
(SPM) test in Libya and to identify the distribution of IQ scores in a sample of Libyan
students. Seven research objectives are addressed and their results analyzed in this chapter.
Discussion of the meaning and significance of the results is deferred
to the next chapter. The analysis, carried out in SPSS (version 16), addressed the following objectives:
2. To study the relationship between SPM mean scores and student’s academic
the SPM test according to gender, region (cities and villages), academic discipline
(science and arts), geographical areas (main city, secondary city, coastal, mountain
SPM test according to region and gender, age and region, region and study levels,
geographic areas and gender, academic discipline and gender, age and gender and age
5. To investigate variability of SPM means score gender based on age and gender
6. To examine the contribution of the independent variables gender, age and regions
7. To compute the percentile ranks for the SPM scores according to the sample age
levels.
In addition, an eighth research objective, which dealt with comparing
performance on the SPM test for the Libyan sample with that of other countries
(meta-analysis), was carried out and is reported in chapter seven. The data obtained were tested
for normality. For this, the Kolmogorov-Smirnov and Shapiro-Wilk tests (table 6.1) and
normal probability plots (figures 6.1, 6.2, 6.3 and 6.4) were employed to investigate
Table 6.1 Descriptive statistics of overall collected data and tests of normality.
Descriptive statistics                  Statistic   Std. Error
Mean                                    32.31       .234
95% Confidence Interval   Lower bound   31.85
for Mean                  Upper bound   32.76
5% Trimmed Mean                         32.40
Median                                  33.00
Variance                                142.670
Std. Deviation                          11.94
Minimum                                 6
Maximum                                 58
Range                                   52
Interquartile Range                     19
Skewness                                -.217       .073
Kurtosis                                -.596       .146
Tests of normality
                      Statistic   df     Sig.
Kolmogorov-Smirnov    .070        2600   .005
Shapiro-Wilk          .971        2600   .005
Figure 6.2 Normal Q-Q plot. Figure 6.3 Detrended normal Q-Q plot.
Figure 6.4 Box plot of the total SPM scores (N = 2600).
Figure 6.1 is a histogram of the SPM scores, which appeared to be normally
distributed. Figure 6.2 shows a normal probability plot (normal Q-Q plot), in which the
observed value of each score is plotted against its expected value under a normal
distribution. A reasonably straight line suggests a normal distribution. Figure 6.3 shows
the detrended normal Q-Q plot, in which the actual deviations of the scores from the
straight line are plotted. Most scores were collected around the zero line with no real
clustering of scores, which indicated a normal distribution. Figure 6.4 shows a box plot.
The box represents the middle 50% of scores, the line inside the box represents the median
value, and the whiskers represent the highest and lowest values.
The results of both tests of normality were statistically significant (p = 0.000). However,
the sample in this study was large, and with large samples these tests often reach
significance even when the departure from normality is minor
(Pallant, 2007). In addition, Pearson’s skewness coefficient was used to verify the
Hildebrand (1986) stated that skewness values above 0.2 or below -0.2 indicate severe
skewness. The skewness coefficient in this sample was -0.05, indicating minor
skewness. Taken together, the above checks indicated that the scores were approximately
normally distributed and that parametric tests could be applied to analyze the
data.
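As a worked check, the skewness coefficient can be recomputed from the mean, median and standard deviation in Table 6.1. The sketch below assumes the form of Pearson's coefficient that divides the difference between mean and median by the standard deviation, without the factor of 3 used in some textbooks. This form is an assumption: the thesis does not state the formula, but it is consistent with the reported value and with the ±0.2 threshold.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's skewness coefficient in the (mean - median) / SD form (assumed)."""
    return (mean - median) / sd


# Values taken from Table 6.1.
coefficient = pearson_skewness(32.31, 33.00, 11.94)
print(round(coefficient, 3))  # -0.058
```

The result lies well inside the ±0.2 band, in line with the conclusion of minor skewness.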
A total of 2600 Libyan students participated in this study. Students were divided into
discipline. A total of 1800 school students (900 males and 900 females) and 800 university
students (400 males and 400 females) took the test. According to region, 900
school students were from cities, whereas the remaining 900 were from villages. They
were chosen from 72 schools located in 11 different settlements: nine villages (27
schools) and two cities (45 schools). The 800 university students were from two
universities located in two cities, Al-Beida and Al-Marj, during the academic year
2007-2008. Of them, 400 students were from the science discipline and 400 from the arts
discipline.
The following tables show descriptive statistics of the SPM score means according
to gender, region, geographic areas, study levels, academic discipline and age. Table
6.2 shows SPM score means and standard deviations according to the independent
variables.
Based on gender, the mean score of males was only slightly higher than that of females.
Based on region, the mean score in cities was only slightly higher than in villages.
Similarly, based on geographic areas, the main city showed a slightly higher mean score
than the other geographic areas. With regard to age, score means increased as age
increased; the highest score means were achieved by 21-year-old students. According to
study levels, score means increased as study levels increased; the highest score means
were achieved at the university level. Based on academic discipline, science students obtained a
characteristics (reliability, validity, difficulty and discrimination) of the SPM test, the
• Test-retest reliability.
• Split-half reliability.
Reliability refers to the consistency of scores obtained by the same person when
retested with the same test or an equivalent form. To establish the reliability of the SPM
test when used with Libyan students, three different methods were employed. The
first method was split-half reliability with the total sample (N = 2600), the second
was coefficient alpha (KR-20), which was also used with the total sample (N =
2600), and the third was test-retest reliability with a sample of 280 students.
6.3.1 Test-retest reliability of the SPM test
The test-retest method was used to evaluate reliability as a measure of the stability of
students’ scores on the SPM test over a period of time. The SPM test was
administered twice to a group of 280 Libyan students (140 males and 140 females).
The interval between test and retest was two weeks. Table 6.3 showed the SPM test-
Table 6.3 SPM test-retest reliabilities according to age, gender and study levels
AGE GROUPS   STUDY LEVELS   MALES        FEMALES      TOTAL
                            N     r      N     r      N     r
8-11         Elementary     40    .86    40    .87    80    .87
12-14        Preparatory    30    .88    30    .87    60    .88
15-17        Secondary      30    .88    30    .91    60    .91
18-21        University     40    .92    40    .91    80    .92
Total Sample                140   .89    140   .89    280   .90
The SPM test-retest reliability ranged from 0.86 for male students aged 8-11
years (N = 40) to 0.92 for university students. The overall test-retest
The split-half method was used to investigate the reliability of the SPM test. The SPM
items were divided into odd and even items, as the items are arranged in order of
difficulty (Kline 2000). The split-half reliability was then corrected by the Spearman-Brown
prophecy formula. Whereas it is a general formula that can be used to assess a
variety of different questions about test length and reliability, it is presented here
(Kline, 2000 and Kline, 2005). The reliability coefficients were computed separately
for male and female students, age and total sample. Table 6.4 showed the SPM split-
Table 6.4 SPM split-half reliabilities according to gender, age and total Sample
AGE     MALES                    FEMALES                  TOTAL
        N      SH (r.)   SB      N      SH (r.)   SB      N      SH (r.)   SB
8       90     .77       .88     90     .85       .92     180    .81       .90
9       90     .84       .91     90     .77       .88     180    .80       .89
10      90     .79       .88     90     .84       .91     180    .83       .91
11      90     .90       .95     90     .88       .94     180    .89       .94
12      90     .80       .89     90     .87       .93     180    .84       .91
13      90     .82       .90     90     .85       .92     180    .84       .91
14      90     .83       .91     90     .84       .91     180    .84       .91
15      90     .81       .90     90     .87       .93     180    .84       .91
16      90     .88       .94     90     .89       .94     180    .89       .94
17      90     .86       .92     90     .89       .94     180    .88       .93
18      100    .87       .93     100    .88       .94     200    .88       .94
19      100    .90       .95     100    .86       .93     200    .88       .94
20      100    .90       .95     100    .88       .94     200    .89       .94
21      100    .91       .96     100    .86       .93     200    .89       .94
Total   1300   .92       .96     1300   .91       .96     2600   .92       .96
SH (r.) = Split-half. SB = Spearman-Brown (SPSS provides SB).
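The Spearman-Brown correction applied to the split-half coefficients can be sketched as below, using the general prophecy formula r_k = k r / (1 + (k - 1) r), with k = 2 when two half-tests are combined into the full test. The function name is hypothetical. SPSS may compute the correction from unrounded coefficients, so values recomputed from the rounded table entries can differ in the second decimal place.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability when test length is multiplied by length_factor.

    With length_factor = 2 this is the usual split-half correction.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Total-sample split-half coefficient from Table 6.4.
print(round(spearman_brown(0.92), 2))  # 0.96
```

The same function answers other questions about test length, for example the expected reliability if the test were shortened to half its length (`length_factor=0.5`).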
Table 6.4 showed that the split-half reliability for the SPM test ranged from 0.77 to
0.92 and its Spearman-Brown (SB) correction ranged from 0.88 to 0.96. In total
determines how items in a test relate to other test items and to the total test. The KR-20
formula provides reliability estimates that are equivalent to the average of the
split-half reliabilities computed for all possible halves. In addition, alpha (KR-20) is useful
for multiple-choice items that are scored as right or wrong (Anastasi and Urbina 1997;
Mills and Airasian 2006). The reliability coefficients were computed separately by
gender, age and total sample. The results obtained are given in table 6.5.
Table 6.5 SPM Alpha reliabilities according to gender, age and total sample
AGE     MALES           FEMALES         TOTAL
        N      Alpha    N      Alpha    N      Alpha
8       90     .85      90     .86      180    .86
9       90     .87      90     .86      180    .87
10      90     .87      90     .90      180    .90
11      90     .92      90     .91      180    .92
12      90     .88      90     .93      180    .91
13      90     .90      90     .90      180    .90
14      90     .89      90     .90      180    .89
15      90     .88      90     .90      180    .90
16      90     .93      90     .91      180    .92
17      90     .91      90     .91      180    .90
18      100    .91      100    .94      200    .93
19      100    .93      100    .90      200    .91
20      100    .89      100    .93      200    .92
21      100    .93      100    .92      200    .93
Total   1300   .96      1300   .94      2600   .94
Table 6.5 showed that alpha reliabilities (KR-20) for the SPM ranged from 0.85 (males
aged 8) to 0.96 (total males). In the total sample the SPM alpha reliability (KR-20) was
0.94 (N = 2600).
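For right/wrong items, KR-20 can be computed directly from the item responses as KR-20 = k / (k - 1) * (1 - sum(p_i q_i) / s^2), where k is the number of items, p_i the proportion passing item i, q_i = 1 - p_i and s^2 the variance of total scores. The sketch below is a minimal illustration with hypothetical data; it uses the population variance, which is one common convention and an assumption here, since SPSS settings are not described in the text.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses: list of rows, one row of 0/1 item scores per examinee.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / var_total)


# Hypothetical data: five examinees, four items of increasing difficulty.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))
```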
Validity is the degree to which a test measures what it is supposed to measure and,
of the SPM test two different methods were employed. The first method was
construct validity with the total sample (N = 2600); the second was criterion-related
validity, which was also used with the total sample (N = 2600).
with the extent to which a test measures a specific trait or construct. The term
construct is used to refer to something that is not itself directly measurable but which
explains observable effects. In other words, construct validation is the systematic
analysis of test scores designed to assess whether there is a basis for validity.
Factor analysis and internal consistency are two approaches to construct validity
(Anastasi and Urbina, 1997).
This procedure shows the extent to which a set of items measures the same underlying
validity of the SPM test scale, the intercorrelations between the five sets of the SPM
test initially were subjected to principal components factor analysis for male and
female separately to ascertain whether the items contained a general factor and
possibly other factors. In this procedure the number of significant factors is normally
taken to be the number with eigenvalues greater than unity. An eigenvalue is the amount of
the total variance (deviation from the average weighted by the sample size) explained
by the corresponding factor (Tabachnick & Fidell 2007). Table 6.6 and figure 6.5
show the results of the factor analysis of the SPM score means for the entire sample.
Table 6.6 Correlation matrix between the five sets of the SPM test among Libyan
male and female students (N = 2600, 8 to 21 years) and extracted factor
SET   CORRELATIONS                              FACTOR 1
      A        B        C        D        E
A                                               0.67
B     0.63**                                    0.84
C     0.57**   0.71**                           0.87
D     0.56**   0.70**   0.76**                  0.85
E     0.46**   0.55**   0.61**   0.60**         0.68
Eigen value                                     3.47
% of variance                                   69.41
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy        0.871
Bartlett's Test of Sphericity   Approx. Chi-Square     7323.359
                                df                     10
                                Sig.                   0.000
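The first eigenvalue and the percentage of variance in Table 6.6 can be approximated from the rounded correlations alone. The sketch below uses power iteration, a simple numerical method chosen here for illustration; it is not the procedure SPSS uses, and because the inputs are rounded to two decimals the result should only be expected to agree with the reported values approximately.

```python
from math import sqrt

# Correlations between sets A to E from Table 6.6 (rounded values).
R = [
    [1.00, 0.63, 0.57, 0.56, 0.46],
    [0.63, 1.00, 0.71, 0.70, 0.55],
    [0.57, 0.71, 1.00, 0.76, 0.61],
    [0.56, 0.70, 0.76, 1.00, 0.60],
    [0.46, 0.55, 0.61, 0.60, 1.00],
]


def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n  # start from a uniform unit vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))


first_eigenvalue = largest_eigenvalue(R)
percent_variance = 100.0 * first_eigenvalue / len(R)
print(round(first_eigenvalue, 2), round(percent_variance, 1))
```

Dividing the eigenvalue by the number of variables (five sets) gives the share of total variance attributed to the first component.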
Figure 6.5 Scree plot for the five factors
Table 6.6 showed all the correlation coefficients that were statistically significant
should be 0.3 or higher (r > 0.3) in the principal component analysis. One highly loaded
factor (loadings from 0.67 to 0.87) was extracted, which accounted for 69.41% of the common
variance and was identified as Spearman’s “g”. These results indicate internal consistency
and factorial validity as a result of the test items’ homogeneity. In addition, the
Kaiser-Meyer-Olkin value was 0.871, exceeding the recommended value of
0.6 (the minimum value for a good factor analysis) (Kaiser 1970, 1974; Tabachnick &
Fidell 2007), and Bartlett’s test of sphericity (Bartlett, 1954) reached statistical
investigation; factor analysis of the SPM test was computed based on gender. The
following tables (Table 6.7 and 6.8) and figures (Figures 6.6 and 6.7) showed factor
Table 6.7 Correlation matrix between the five sets of the SPM test among Libyan
male students (N = 1300, 8 to 21 years) and extracted factor
SET   CORRELATIONS                              FACTOR 1
      A        B        C        D        E
A                                               0.70
B     0.64**                                    0.84
C     0.58**   0.70**                           0.85
D     0.59**   0.70**   0.75**                  0.86
E     0.46**   0.56**   0.60**   0.61**         0.69
Eigen value                                     3.49
% of variance                                   69.76
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy        0.874
Bartlett's Test of Sphericity   Approx. Chi-Square     3683.603
                                df                     10
                                Sig.                   0.000
Table 6.7 showed that all the correlation coefficients were statistically significant
(0.46 to 0.75). One highly loaded factor (loadings from 0.69 to 0.86) was extracted, which
accounted for 69.76% of the common variance and was identified as Spearman’s “g”. These
results indicated internal consistency and factorial validity as a result of the test items’
homogeneity. Also, the Kaiser-Meyer-Olkin value was 0.874, and
Bartlett’s test of sphericity reached statistical significance (0.000), supporting the
Table 6.8 Correlation matrix between the five sets of the SPM test among Libyan
female students (N = 1300, 8 to 21 years) and extracted factor
SET   CORRELATIONS                              FACTOR 1
      A        B        C        D        E
A                                               0.67
B     0.62**                                    0.84
C     0.56**   0.72**                           0.88
D     0.54**   0.69**   0.78**                  0.85
E     0.46**   0.55**   0.62**   0.59**         0.68
Eigen value                                     3.46
% of variance                                   69.22
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy        0.865
Bartlett's Test of Sphericity   Approx. Chi-Square     3689.905
                                df                     10
                                Sig.                   0.000
Table 6.8 showed that all the correlation coefficients were statistically significant
(0.46 to 0.78). One highly loaded factor (loadings from 0.67 to 0.88) was extracted, which
accounted for 69.22% of the common variance and was identified as Spearman’s “g”. These
results indicated internal consistency and factorial validity as a result of the test
items’ homogeneity. Also, the Kaiser-Meyer-Olkin value was
0.865, and Bartlett’s test of sphericity reached statistical significance (0.000),
subscales on the same test and on total score. It measures whether several subscales
that propose to measure the same general construct produce similar scores (Anastasi
coefficients between the five sets and the total scores of the SPM test were computed
for validity estimation. Table 6.9 shows the correlation coefficients between the five sets
and the total scores of the SPM test for the entire sample.
Table 6.9 Correlation coefficients between the five sets and the total scores of the
SPM test (n = 2600, age 8 to 21 years)
SETS      CORRELATIONS
          Total A   Total B   Total C   Total D   Total E
Total A   1.000
Total B   0.64**    1.000
Total C   0.59**    0.71**    1.000
Total D   0.56**    0.69**    0.74**    1.000
Total E   0.50**    0.57**    0.62**    0.64**    1.000
Total     0.72**    0.84**    0.85**    0.85**    0.74**
** Correlation is significant at the 0.01 level
The relationship between sub-scales and total scales scores of the SPM test was
and statistically significant positive correlation coefficients between the five sets (A,
B, C, D and E) and total scores, ranging from 0.50 to 0.85, n = 2600 (p < 0.01). In
addition, the internal consistency of the SPM test was computed by gender.
Table 6.10 shows the correlation coefficients between the five sets and the total scores of
Table 6.10 Correlation coefficients between the five sets and the total scores of the
SPM test (males n = 1300 and females n = 1300, age 8 to 21 years)
MALE N = 1300
SETS      CORRELATIONS
          Total A   Total B   Total C   Total D   Total E
Total A   1.000
Total B   0.65**    1.000
Total C   0.58**    0.69**    1.000
Total D   0.59**    0.69**    0.73**    1.000
Total E   0.51**    0.58**    0.63**    0.65**    1.000
Total     0.71**    0.83**    0.84**    0.85**    0.74**
The relationship between the five sets and the total scores of the SPM test was
strong, positive correlation coefficients, statistically significant between the five sets
(A, B, C, D and E) and total scores ranging from 0.51 to 0.85 (p<0.01) for males and
6.4.2 Criterion-related validity
To evaluate the criterion-related validity of the SPM, Students’ Academic Achievement
(SAA), the total of final examination scores, was used as the criterion (predictive
validity). Predictive validity is the correlation between test scores and a criterion that
occurs at a later point in time. The second research objective also focused on
establishing the relationship between SPM scores and students’ scores in final school
and university exams in all studied courses (SAA), for which Pearson product-moment
correlations were used. Table 6.11 shows the correlation between the SPM scores and
the students’ academic achievement scores in final school and university exams in all
studied courses (SAA) according to age, level of study, gender and total sample.
Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample
Age and level of study variables (N = 2600)
Elementary       Preparatory      Secondary        University
N = 720          N = 540          N = 540          N = 800
Age     r        Age     r        Age     r        Age     r
8       .56**    12      .41**    15      .37**    18      .37**
9       .41**    13      .39**    16      .43**    19      .50**
10      .37**    14      .33**    17      .50**    20      .47**
11      .41**    Total   .38**    Total   .43**    21      .41**
Total   .44**                                      Total   .44**
Results in table 6.11 showed that the validity coefficients between the SPM scores
and students’ SAA ranged from 0.33 to 0.56. For arts students the correlation
between the SPM scores and their SAA was 0.41. The correlation for both samples
(science and arts scores; n = 800) between the students’ SAA and their SPM scores was
0.46, which is statistically significant from 0.41. In general, all correlation coefficients
between SPM scores and students’ SAA were statistically significant for all groups.
Item analysis was used in this study to investigate the difficulty and discrimination
power of the items. The item analysis was performed on the SPM test based upon the
total sample (N = 2600). Table 6.12 showed the difficulty levels of the SPM
items, Table 6.13 showed item discrimination and Table 6.14 exhibited a summary for
The SPM test consisted of 5 sets of items, lettered (A, B, C, D, and E). Each set
consists of 12 items which become progressively more difficult. Furthermore the level
Item difficulty is defined as the percentage of students obtaining the correct answer to
an item. The higher the value of the difficulty index, the easier the item. Table 6.12
showed the item difficulty indices of the five SPM sets for total sample.
Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)
Set     1    2    3    4    5    6    7    8    9   10   11   12
A     100   99   97   95   94   92   74   75   82   70   45   34
B      97   90   82   75   66   64   50   43   49   57   41   33
C      79   76   69   65   63   49   54   40   51   30   23    9
D      84   73   65   61   70   58   54   52   49   39   22    7
E      60   42   40   26   24   23   21   12   11    7    5    4
SPM means of the percent of correct answers.
Set     A      B      C      D      E
Means   0.79   0.62   0.57   0.58   0.35
It was clear from table 6.12 that 11 SPM items which were answered by 80 - 100 % of
the students appeared to be easy and 7 items were from section A. 42 SPM items
difficulty and 7 SPM items which were answered by less than 20 % of the students
In addition, it was evident from table 6.12 that three items in set A (A7, A8 and A9);
four items in set B (B7, B8, B9 and B10); three items in set C (C7, C8, and C9); and
three items in set D (D3, D4 and D5) did not follow an order of increasing in
According to the 2004 SPM manual, items should steadily increase in difficulty
within the series. To test Raven's claim, the degree of difficulty of
the 60 items and five sets of the SPM test was measured by the percentage of
correct answers. Table 6.12 showed the mean percentage of correct answers
for each SPM set. The mean for set D was higher than that for set C, which suggested
that set D was comparatively easier than set C. Inspection of the mean for each item and
set showed that only thirteen items and one set appeared to be misplaced in order of difficulty.
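The tally of easy, moderate and difficult items can be checked against the percentages in Table 6.12. The sketch below uses the bands stated in the text (80 to 100% correct as easy, below 20% as difficult, the remainder as moderate); the dictionary layout and the function name `band` are illustrative choices only.

```python
# Percentage of correct answers per item, from Table 6.12.
difficulty = {
    "A": [100, 99, 97, 95, 94, 92, 74, 75, 82, 70, 45, 34],
    "B": [97, 90, 82, 75, 66, 64, 50, 43, 49, 57, 41, 33],
    "C": [79, 76, 69, 65, 63, 49, 54, 40, 51, 30, 23, 9],
    "D": [84, 73, 65, 61, 70, 58, 54, 52, 49, 39, 22, 7],
    "E": [60, 42, 40, 26, 24, 23, 21, 12, 11, 7, 5, 4],
}


def band(percent_correct):
    """Classify an item by the percentage of students answering it correctly."""
    if percent_correct >= 80:
        return "easy"
    if percent_correct < 20:
        return "difficult"
    return "moderate"


counts = {"easy": 0, "moderate": 0, "difficult": 0}
for items in difficulty.values():
    for pct in items:
        counts[band(pct)] += 1

print(counts)  # {'easy': 11, 'moderate': 42, 'difficult': 7}
```

The counts agree with the 11 easy, 42 moderate and 7 difficult items reported for the total sample.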
The discrimination index showed whether items differentiate between people with
varying degrees of knowledge and ability. It is the percentage of the “high” group
passing the item, minus the percentages of the “low” group passing the item. Also
discrimination. The point biserial correlation between “pass/fail” on each item and
total test score was used to investigate the SPM item discrimination (Brown, 1983;
Anastasi 1988; Anastasi and Urbina 1997; Roid and Barram 2004; Kline, 2000; Kline,
2005). The greater the item-total correlation, the more discriminating the item is, i.e. the
more effectively it discriminates between the higher and lower groups. For an item to be
valid, the correlation between the item and the total score should be fairly high.
Hopkins (1998) suggested that the indices of item discrimination can be evaluated in
Hopkins suggestion was utilized to analyze the point biserial correlation data. The
point biserial correlation between “pass/fail” for each SPM item and total test score
Table 6.14 Point biserial and significant level for each SPM item
Set   1       2       3       4       5       6       7       8       9       10      11      12
A     --      .12**   .35**   .42**   .50**   .46**   .63**   .56**   .61**   .67**   .62**   .52**
B     .24**   .41**   .54**   .54**   .65**   .61**   .57**   .71**   .72**   .74**   .69**   .61**
C     .58**   .57**   .70**   .65**   .71**   .60**   .72**   .60**   .65**   .50**   .49**   .12**
D     .60**   .76**   .76**   .76**   .77**   .73**   .71**   .63**   .66**   .63**   .38**   .14**
E     .60**   .61**   .63**   .63**   .67**   .60**   .49**   .50**   .48**   .33**   .20**   .11**
** Significant at 0.001
Generally, correlations lay between (r = 0.11 and 0.77; p < 0.001) with a general
mean of (r = 0.44; p < 0.001). The 60 correlations calculated were significant and all
were so easy for this sample that they did not generate any variance and hence no
covariance was evident. Also, Table 6.14 showed that the correlations ranged from r =
0.12 to 0.67 with a mean of r = 0.54 for set A; from r = 0.24 to 0.74 with a mean of
r = 0.59 for set B; from r = 0.12 to 0.72 with a mean of r = 0.57 for set C; from
r = 0.14 to 0.77 with a mean of r = 0.63 for set D; and from r = 0.11 to 0.67 with a
mean of r = 0.49 for set E (all p < 0.001).
According to Hopkins (1998) this SPM test had 51 items as having excellent
having fair discriminating value. With the remaining items, correlations ranged from
(r = 0.49 to 0.61; p < 0.001). This indicated that the SPM test showed many
discriminating items.
Table 6.15 showed a summary of tables 6.12 and 6.14. It showed the numbers of difficult
items, discriminating items and items not in order of difficulty, the order of difficulty
of the SPM sets, and the order of the SPM sets by number of excellently discriminating items.
1. As designed, set A is the easiest set whereas set E is the most difficult set. Set A
had 5 items with moderate difficulty level (less than .79); set B had 9 items; set C
had 11 items; set D had 10 items and set E had 7 items. The order of difficulty of
the SPM five sets according to the numbers of difficult items in each set in order
2. 40 out of 60 items had excellent discriminating value. Set A had 8 items, set B and
D had 10 items, set C had 11 items and set E had 9 items of excellent
discriminating value. The excellent discriminated SPM sets in order from high to
3. 13 items were not arranged in order of increasing difficulty. Set D had 4 items,
6.6 Differences in SPM scores
As mentioned in the beginning of this chapter, one of the objectives of this study was
discipline(science and arts), geographic nature (main city, secondary city, coastal,
mountain and desert), age and study levels. In addition, significant differences in
sample performance on the SPM test according to region and gender, age and region,
region and study levels, geographic nature and gender, academic discipline and
gender, age and gender and age and academic discipline was carried out. The
An independent t-test was carried out to compare the SPM score means in regards to
This table showed that there was no significant difference in mean scores between
males and females (male mean = 32.49, SD = 12.06; female mean = 32.12, SD =
11.83; t(2598) = 0.789, p = 0.430). The magnitude of the difference in the means
(mean difference = 0.370, 95% CI: -0.594 to 1.288) was very small (partial eta squared
= 0.019). SPSS did not provide eta squared values for the t-test. It was, however,
An independent t-test was carried out to compare the SPM score means in regards to
As Levene's test was significant, the t value for equal variances not assumed was
used (Pallant, 2007). There was no significant difference in scores for cities (mean
0.556)). The magnitude of the difference in the means (mean difference = -0.309,
95% CI: -1.340 to 0.721) was very small (partial eta squared = -0.028). SPSS did not
provide eta squared values for the t-test. It was, however, calculated using the information
6.6.3 Difference according to academic discipline
An independent t-test was carried out to compare the SPM score means in regards to
Results showed that there was a statistically significant difference in scores between
the arts discipline (mean = 40.16, SD = 7.88) and the science discipline (mean = 42.34,
SD = 8.56; t(798) = -3.76, p = 0.000), in favour of science students. The magnitude of the
difference in the means (mean difference = -2.178, 95% CI: -3.32 to -1.04) was large
(partial eta squared = -0.27). SPSS did not provide eta squared values for the t-test. It
6.6.4 Difference according to geographic areas
A one-way ANOVA was conducted to compare the SPM means for the geographic
areas (table 6.19), followed by a post hoc Tukey test for multiple comparisons (table 6.20).
Participants were from five different geographic areas. The results showed that there
were no statistically significant differences in SPM scores between the five geographic
areas, F(4, 1795) = 0.623, p = 0.646. The effect size was calculated using eta squared,
dividing the between-groups sum of squares (309.571) by the total sum of squares
(223459.320) (Pallant, 2007). The resulting eta squared value was 0.001, which
indicated a very small effect size. Post-hoc comparisons using the Tukey HSD test
indicated that there were no statistically significant differences between the five
A one-way ANOVA was used to compare the differences in SPM score means with regard to
age (table 6.21), followed by a post hoc Tukey (HSD) test (table 6.22).
Table 6.22 Post Hoc Tukey (HSD) Tests
Age 8 9 10 11 12 13 14 15 16 17 18 19 20
8
9 .453
10 .000 .036
11 .000 .000 .000
12 .000 .000 .000 .005
13 .000 .000 .000 .000 .005
14 .000 .000 .000 .000 .000 .962
15 .000 .000 .000 .000 .000 .158 .980
16 .000 .000 .000 .000 .000 .000 .120 .936
17 .000 .000 .000 .000 .000 .000 .000 .000 .140
18 .000 .000 .000 .000 .000 .000 .000 .000 .008 1.000
19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .105 .519
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .007 .082 1.000
21 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .005 .935 1.000
*. The mean difference is significant at the 0.05 level.
Participants were from fourteen different ages. There were statistically significant
differences (at the 0.05 level) in SPM scores for age, F (13, 2586) = 225.846, p < 0.001. The
effect size was calculated using eta squared (the between-groups sum of squares
(3535.138) divided by the total sum of squares (8979.386); Pallant, 2007). The resulting eta
squared value was 0.53, which indicated a large effect size. Post-hoc comparisons
using the Tukey HSD test indicated that there were statistically significant differences
between the different ages except between (8 and 9 years), (13 through 15 years),
(14 through 16 years), (16 and 17 years), (17 through 19 years), (18 through 20 years)
and (19 through 21 years); where differences were significant, older students obtained the higher mean scores.
A one-way ANOVA was conducted to compare the SPM means with regard to study
levels (table 6.23), followed by a post hoc Tukey (HSD) test for multiple comparisons
(table 6.24).
Table 6.23 Comparison according to study levels
Study levels (N) Mean SD
Elementary 720 19.96 8.38
Preparatory 540 31.39 8.77
Secondary 540 36.43 8.68
University 800 41.25 8.29
Total 2600 32.31 11.94
Source Sum of Squares df Mean Squares F. Ratio F. Prob.
Between Groups 183360.732 3 61120.244 846.504 .000
Within Groups 187439.303 2596 72.203
Total 370800.035 2599
Participants were from four study levels. There were statistically significant
differences in SPM scores between the four study levels, F (3, 2596) = 846.504, p <
0.001. The effect size was calculated using eta squared (the between-groups sum of squares
(183360.732) divided by the total sum of squares (370800.035); Pallant,
2007). The resulting eta squared value was 0.49, which indicated a large effect. Post-hoc
comparisons using the Tukey HSD test indicated that there were statistically
significant differences between all study levels in favour of the higher
levels.
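The entries of the ANOVA table for study levels can be recomputed from its sums of squares and degrees of freedom. The sketch below (the function name and dictionary keys are illustrative) derives the mean squares, the F ratio and eta squared from the values in Table 6.23.

```python
def one_way_anova_summary(ss_between: float, df_between: int,
                          ss_within: float, df_within: int) -> dict:
    """Recompute mean squares, F and eta squared from a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Sums of squares and degrees of freedom from the study-levels ANOVA table.
summary = one_way_anova_summary(183360.732, 3, 187439.303, 2596)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```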
A two-way ANOVA was conducted on SPM scores with regard to study levels and regions.
Table 6.28 Post Hoc Tukey (HSD) Test
(I) Study (J) Study MD Std. Sig. 95% Confidence Interval
levels levels Error Lower Upper
Bound Bound
Elementary Preparatory -11.43* .489 .000 -12.58 -10.28
Secondary -16.47* .489 .000 -17.62 -15.32
Preparatory Elementary 11.43* .489 .000 10.28 12.58
Secondary -5.04* .523 .000 -6.27 -3.82
Secondary Elementary 16.47* .489 .000 15.32 17.62
Preparatory 5.04* .523 .000 3.82 6.27
The mean difference is significant at the .05 level. MD= Mean Difference (I-J)
The interaction effect between regions and study levels was not statistically
significant, F (2, 1794) = .757, P = .469. There was no statistically significant main
effect for region, F (1, 1794) = .696, P = 0.404; the magnitude of the effect size was
very small (partial eta squared = .001). The main effect for study levels,
F (2, 1794) = 616.203, P < .001, was statistically significant. Post-hoc comparisons
using the Tukey HSD test showed that there were statistically significant differences
between the different study levels.
It is worth noting that Levene's test was significant, indicating that the group variances
might be unequal. This was examined further by
dividing the largest variance by the smallest variance in each group. A result of 2 or
above means that the variances are unequal. All results were below 2, which indicated
that the variances were equal.
6.6.8 Difference according to regions and gender
Participants were divided into two groups according to region (cities and
villages). The interaction effect between regions and gender was not statistically
significant, F (1, 1796) = 0.091, P = 0.762. There was no statistically significant main
effect for regions, F (1, 1796) = 0.346, P = 0.556; the magnitude of the effect size was
very small (partial eta squared = .001). The main effect for gender, F (1, 1796) =
1.003, P = 0.317, did not exhibit statistical significance. The significant result of
Levene's test was further tested as mentioned earlier. Variance was equal.
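The follow-up check applied after a significant Levene's test (largest variance divided by smallest variance, with a ratio of 2 or more treated as unequal) can be sketched as follows. The cell variances of the two-way designs are not listed in this section, so the example uses the standard deviations of the four study levels from Table 6.23 purely as an illustration; the function name is hypothetical.

```python
def largest_to_smallest_variance_ratio(standard_deviations) -> float:
    """Ratio of the largest group variance to the smallest group variance."""
    variances = [sd ** 2 for sd in standard_deviations]
    return max(variances) / min(variances)


# Illustrative input: standard deviations of the four study levels (Table 6.23).
study_level_sds = {
    "Elementary": 8.38,
    "Preparatory": 8.77,
    "Secondary": 8.68,
    "University": 8.29,
}

ratio = largest_to_smallest_variance_ratio(study_level_sds.values())
variances_equal = ratio < 2  # rule of thumb described in the text

print(f"variance ratio = {ratio:.3f}, treated as equal: {variances_equal}")
```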
Table 6.34 Tests of Between-Subjects Effects of SPM scores
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 103802.844 19 5463.308 81.272 .000 .465
Intercept 1444705.915 1 1444705.915 21491.328 .000 .924
age 103536.949 9 11504.105 171.134 .000 .464
Region 43.031 1 43.031 .640 .424 .000
age * Region 222.864 9 24.763 .368 .950 .002
Error 119656.476 1780 67.223
Total 1668165.234 1800
Corrected Total 223459.320 1799
a. R Squared = .465 (Adjusted R Squared = .459)
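The partial eta squared column of Table 6.34 follows from the sums of squares in the same table. A minimal sketch, assuming the usual definition (effect sum of squares divided by the sum of the effect and error sums of squares); the function name is illustrative.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


SS_ERROR = 119656.476  # error sum of squares from Table 6.34

effect_sums_of_squares = {
    "age": 103536.949,
    "Region": 43.031,
    "age * Region": 222.864,
}

partial = {
    name: round(partial_eta_squared(ss, SS_ERROR), 3)
    for name, ss in effect_sums_of_squares.items()
}

for name, value in partial.items():
    print(f"{name}: {value:.3f}")
```

On these figures the large effect belongs to age, while the effect of region is close to zero.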
Figure 5.8 means score differences of age and region
Participants were divided into two groups according to region (cities and villages).
The interaction effect between region and age was not statistically significant, F (9,
1780) = .368, P = .950. There was no statistically significant main effect for region, F
(1, 1780) = .640, P = .424; the magnitude of the effect size was negligible (partial eta
squared < .001). The main effect for age, F (9, 1780) = 171.134, P < .001, was
statistically significant, with a large effect size (partial eta squared = .464). Post-hoc
comparisons using the Tukey HSD test showed that in
cities, statistically significant differences were found between all age groups except between the
(8 and 9), (9 and 10), (11 and 12), (12 and 13), (13, 14 and 15), (14, 15 and 16) and
(15, 16 and 17) ages. In villages, statistically significant differences were found between
all age groups except between the (8 and 9), (9 and 10), (11 and 12), (13, 14 and 15),
(14, 15 and 16), (15 and 16) and (16 and 17) ages. The significant result of Levene's
test was further tested as mentioned earlier.
6.6.10 Difference according to geographic areas and gender
Table 6.39 Post Hoc Tukey (HSD) Test
(I) (J) MD Std. Sig. 95% Confidence Interval
Geographic Geographic Error Lower Upper
areas areas Bound Bound
Main city Coastal .16 .781 1.000 -1.97 2.29
Mountain 1.16 .781 .572 -.97 3.29
Desert .53 .781 .960 -1.60 2.67
Secondary city .12 .821 1.000 -2.12 2.36
Secondary city Coastal .04 .945 1.000 -2.54 2.63
Mountain 1.04 .945 .804 -1.54 3.63
Desert .42 .945 .992 -2.16 3.00
Main city -.12 .821 1.000 -2.36 2.12
Coastal Mountain 1.00 .911 .808 -1.49 3.49
Desert .37 .911 .994 -2.11 2.86
Main city -.16 .781 1.000 -2.29 1.97
Secondary city -.04 .945 1.000 -2.63 2.54
Mountain Coastal -1.00 .911 .808 -3.49 1.49
Desert -.63 .911 .959 -3.11 1.86
Main city -1.16 .781 .572 -3.29 .97
Secondary city -1.04 .945 .804 -3.63 1.54
Desert Coastal -.37 .911 .994 -2.86 2.11
Mountain .63 .911 .959 -1.86 3.11
Main city -.53 .781 .960 -2.67 1.60
Secondary city -.42 .945 .992 -3.00 2.16
MD= Mean Difference (I-J)
The interaction effect between geographic areas and gender was not statistically
significant, F (4, 1790) = .213, P = .887. There was no statistically significant main
effect for geographic areas, F (4, 1790) = .622, P = 0.647; the magnitude of the effect
size was very small (partial eta squared = .003). Post-hoc comparisons using the Tukey
HSD test showed that there were no statistically significant differences between the
different geographic areas. The main effect for gender, F (1, 1790) = .538, P = .463,
did not exhibit statistical significance. The significant result of Levene's test was
further tested as mentioned earlier.
6.6.11 Difference according to academic discipline and gender
Participants were divided into two groups according to academic discipline (arts and
science). The interaction effect between academic discipline and gender was not
significant main effect for academic discipline, F (1, 796) = 14.050, P < 0.001; the
magnitude of the effect size was small (partial eta squared = .022). The main effect
for gender, F (1, 796) = .001, P = 0.976, did not exhibit statistical significance.
Levene's test of equality was not significant, indicating that the group variances were equal.
Table 6.44 Levene's Test of Equality of Error Variances
F df1 df2 Sig.
3.131 27 2572 .000
Figure 5.9 Means score difference of age and gender
The interaction effect between age groups and gender was statistically significant, F
(13, 2572) = 2.827, P < 0.001. There was a statistically significant main effect for age,
F (13, 2572) = 227.950, P < 0.001; the magnitude of the effect size was large (partial
eta squared = .54). Post-hoc comparisons using the Tukey HSD test indicated that
there were statistically significant differences between the different ages except between
(8 and 9 years), (13 through 15 years), (14 through 16 years), (16 and 17 years),
(17 through 19 years), (18 through 20 years) and (19 through 21 years); where
differences were significant, older students obtained the higher mean scores. As a significant interaction
was obtained, an analysis of simple effects was carried out, in which the sample
is split into groups according to one of the independent variables and
statistical tests are run to explore the effect of the other variable. Thus, to determine whether
there were statistically significant differences between male and female mean scores
at different ages, the sample was split according to age and an independent-samples
t-test was employed to compare the means. Results showed that there were no
statistically significant gender differences at the ages of 8, 9, 12, 13, 14 and 18 through 21.
Females obtained a statistically significantly higher mean than males at the age of 10 years. At
the ages of 11 and 15 through 17, males obtained statistically significantly higher means
than females. However, the main effect for gender, F (1, 2572) = 1.414, P = .234, did
not exhibit statistical significance. The significant result of Levene's test was further
tested as mentioned earlier.
Table 6.49 Tests of Between-Subjects Effects of SPM scores
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 2595.849 7 370.836 5.614 .000 .047
Intercept 1361167.501 1 1361167.501 20605.755 .000 .963
AGE 1187.124 3 395.708 5.990 .000 .022
DISCIPLINE 948.301 1 948.301 14.356 .000 .018
AGE * DISCIPLINE 460.424 3 153.475 2.323 .074 .009
Error 52317.650 792 66.058
Total 1416081.000 800
Corrected Total 54913.499 799
a R Squared = .047 (Adjusted R Squared = .039)
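The R squared and adjusted R squared reported under Table 6.49 can be recovered from the table itself. The sketch below assumes the standard adjustment, 1 - (1 - R²)(N - 1)/(N - k - 1), with N = 800 cases and k = 7 model degrees of freedom; the function names are illustrative.

```python
def r_squared(ss_model: float, ss_corrected_total: float) -> float:
    """Proportion of the corrected total sum of squares explained by the model."""
    return ss_model / ss_corrected_total


def adjusted_r_squared(r2: float, n_cases: int, df_model: int) -> float:
    """Adjusted R squared: 1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - df_model - 1)


# Corrected model and corrected total sums of squares from Table 6.49.
r2 = r_squared(2595.849, 54913.499)
adj_r2 = adjusted_r_squared(r2, n_cases=800, df_model=7)

print(f"R squared          = {r2:.3f}")
print(f"adjusted R squared = {adj_r2:.3f}")
```

The value .047 is therefore the variance explained by the whole model, whereas the partial eta squared for academic discipline alone is the .018 shown in its own row.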
The interaction effect between academic discipline and age was not statistically
significant, F (3, 792) = 2.323, P = .074. There was a statistically significant main
effect for academic discipline, F (1, 792) = 14.356, P < 0.001; the magnitude of the
effect size was small (partial eta squared = .018). There was a statistically
significant main effect for age, F (3, 792) = 5.990, P < 0.001. Post-hoc comparisons
using the Tukey HSD test indicated that only the mean score for age 18 years (M =
38.13, SD = 8.81) was different from those for age 20 years (M = 40.81, SD = 7.84) and age 21
years (M = 40.81, SD = 7.84). The significant result of Levene's test was further tested as
mentioned earlier. Variance was equal. Furthermore, the magnitude of the difference
(Pallant, 2007).
Table 6.51 Magnitude of gender differences in means score and variability on SPM as
functions of age, geographic areas and discipline.
Age (N 2600; male = 1300 and female = 1300).
age t sig Vr d IQ Point Pc IQs
8 -.663 .508 0.93 -0.01 -0.15 16 85
9 -1.767 .079 0.98 -0.26 -3.90 13 83
10 -3.608 .000 0.86 -0.52 -7.80 8 79
11 2.502 .013 1.23 0.37 5.55 4 74
12 .476 .757 0.65 -0.02 -0.30 7 78
13 .169 .634 0.90 0.07 1.05 9 80
14 .169 .866 0.89 0.03 0.45 8 79
15 2.152 .033 0.79 0.32 4.80 10 81
16 2.115 .036 1.24 0.31 4.65 10 81
17 2.106 .037 0.87 0.31 4.65 12 83
18 .851 .396 0.78 0.12 1.80 9 80
19 .051 .959 1.04 0.01 0.15 11 82
20 -.304 .762 0.72 0.04 0.60 11 82
21 -.732 .465 1.26 -0.10 -1.50 4 83
The following were all calculated: t values for the difference between males and females in
each age group, in each geographic area, in each academic discipline and in the total
sample; the level of significance; Cohen's d scores (the difference between the male and
female means divided by the within-group standard deviation; Cohen, 1977); the variance
ratios (Vr, i.e. the variance of the males divided by the variance of the females; Lynn and
Irwing, 2004), where Vr values greater than 1.0 indicate that males had greater variance
than females and Vr values less than 1.0 indicate that females had greater variance than
males (Khaleefa and Lynn 2008); the IQ point differences between males and females in
each age group as well as in the total sample; and the British percentile equivalents of the
combined male and female means on the British norms for the Standard Progressive
Matrices collected in 1979 and given in Raven (1981), together with their conversion to IQs.
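Two of the conversions in Table 6.51 can be illustrated with a short sketch. It assumes an IQ scale with a mean of 100 and a standard deviation of 15 and a normal distribution, so that a percentile maps to an IQ through the inverse normal distribution and a d value maps to IQ points by multiplying by 15. The percentiles themselves were read from published norm tables, which the sketch does not reproduce, and the function names are illustrative.

```python
from statistics import NormalDist


def percentile_to_iq(percentile: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a percentile rank to an IQ, assuming normally distributed scores."""
    z = NormalDist().inv_cdf(percentile / 100)
    return mean + sd * z


def d_to_iq_points(d: float, sd: float = 15.0) -> float:
    """Express a standardised mean difference (d) in IQ points."""
    return d * sd


for pc in (16, 13, 8, 4):
    print(f"percentile {pc:>2} -> IQ {percentile_to_iq(pc):.1f}")

print(f"d = -0.52 -> {d_to_iq_points(-0.52):.2f} IQ points")
```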
The results showed three interesting features. First, the British percentile equivalents
are the 16th PC for the 8 year olds (IQ = 85), the 13th PC for the 9 year olds (IQ = 83),
the 8th PC for the 10 year olds (IQ = 79), and on average the 6.7th PC (IQ = 79.4) for the
11-17 year olds. The American percentile equivalents are the 9th PC for the
18 year olds (IQ = 80), the 11th PC for the 19 and 20 year olds (IQ = 82), the 4th PC for
the 21 year olds (IQ = 83), and on average the 8.75th PC (IQ = 81.75). Overall, the IQs
obtained by the Libyan students range between 74 and 85. The average IQ for the
Second, there was a lack of significant gender differences in the total means and at ages 8, 9, 12, 13,
14 and 18 through 21. At the age of 10 years, females obtained a significantly higher mean
than males. Males obtained significantly higher means than females at the ages of 11 and 15
through 17. In total, males obtained a higher mean than females by 0.03d (0.45 IQ
points). Concerning geographic areas, the analysis also showed a lack of significant gender
differences in the total means and in the means for all geographic areas. In total, males obtained
a higher mean than females by 0.04d (0.60 IQ points). Concerning academic
discipline, the analysis also showed a lack of significant gender differences in the total means
and in the means for each discipline (science and arts). In total, males obtained a higher mean
Third, the gender difference in variability (Vr) in the total sample and within each age
group, geographic area and academic discipline can be seen from the standard
deviations and variance ratios. At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20
years, females had greater variability than males. In the total sample and at the ages
of 11, 16, 19 and 21 years, males had greater variability than females. Concerning
geographic areas, results showed that males had greater variability than females in the total
sample and in each geographic area. Regarding academic discipline, results showed that
females had greater variability than males in the total sample and in each academic
variability.
6.7 Multiple Regression according to independent variables
To investigate the contribution of the independent variables; age, gender, region and
Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores
Model Unstandardised Coeff. Standardised t Sig.
Coeff.
B Std. Error Beta
1- (Constant) 8.838 .545 16.204 .000
age 2.599 .068 .670 38.268 .000
2- (Constant) 7.929 .554 14.324 .000
age, achievement 4.230 .085 .575 26.194 .000
6.218 .001 .404 13.027 .000
Model Summary
Model R R Adjusted Stand. Error of
Square R Square Estimate
1- Age .670 .449 .449 8.276
2- Age, Achievement. .681 .464 .463 8.167
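A few entries of the regression output can be cross-checked against one another. The sketch below uses only relationships that hold for ordinary least squares in general: R squared is the square of R, the standardised coefficient of a single predictor equals R, and each t value is the coefficient divided by its standard error. The function names are illustrative, and small discrepancies are expected because the inputs are rounded to three decimals.

```python
def r_squared_from_r(r: float) -> float:
    """R squared is the square of the multiple correlation R."""
    return r ** 2


def t_statistic(coefficient: float, std_error: float) -> float:
    """t value of a regression coefficient: B divided by its standard error."""
    return coefficient / std_error


model_1_r2 = r_squared_from_r(0.670)   # model with age only
model_2_r2 = r_squared_from_r(0.681)   # model with age and achievement
age_t = t_statistic(2.599, 0.068)      # age coefficient in model 1

print(f"model 1 R squared = {model_1_r2:.3f}")
print(f"model 2 R squared = {model_2_r2:.3f}")
print(f"t for age (model 1) = {age_t:.2f}")
```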
As age was equal in effect to study level, age was used in this analysis. Using the
Gender was not a significant predictor (p = 0.989). Also region was not a significant
predictor (p = 0.986). This showed that both age and achievement were predictors for
6.8 The Percentile Ranks of the SPM Score
The sixth research objective was “to compute the percentile ranks for the SPM scores
according to the significant variables”. Since Raven used percentiles to express
intelligence test scores and to determine the position of an individual among all the
individuals of the same age in the sample, the same scale
(percentiles) was used here. Age, gender and academic discipline were taken into account. As
region was not a significant variable, its percentile ranks were not calculated. Table
6.53 shows detailed percentile 2007-2008 norms for Libyan students according to age.
Table 6.53 detailed percentile 2007-2008 Norms for Libya students according to age
Percentile Age in years
18 1 2
2
0 6 1 47 7 8 49 52 50 53 54
22 8 1 42 3 48 50 48 51 52
18 1 6 2 5 40
0 3 4 46 46 47 48
6 8 20 6 5 39 41 42 43 43
12 2 7 8 8 29 2 33 35 37 37
0 2 4 5 29 29 32 33
5 9 9 0 2 6 9 19 20 20 25 29 30
N 180 180 180 180 180 180 180 180 180 180 200 200 200 200
To explain these results, a ten-year-old child who scores 33 on the SPM test does better
than 95% of the sample at the same age, because this score falls at the 95th
percentile. On the other hand, a 13-year-old who scores 33 on the SPM test
does better than only 50% of the sample of the same age. The same score, 33, puts an 18-year-old
at the 25th percentile and a 21-year-old at the 10th percentile. Table 6.54
shows detailed percentile 2007-2008 norms for the Libyan students according to age
and gender. Full range of the Libya norms according to age and each SPM score (1 to
Table 6.54 detailed percentile 2007-2008 Norms for the Libyan students according to
age and gender.
Age in years
8 9 10 11 12 13 14
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 29 32 35 33 33 37 44 41 42 45 44 48 48 47
90 22 25 27 29 30 37 39 37 39 43 42 42 45 43
75 18 17 18 22 21 28 34 29 35 36 39 37 39 39
50 15 15 16 17 18 23 26 23 29 27 34 32 34 33
25 12 12 13 14 14 18 20 17 25 23 27 27 29 29
10 9 10 11 12 11 12 14 13 17 15 20 19 23 21
5 7 9 10 11 10 10 11 11 14 12 15 16 18 17
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90
Age in years
15 16 17 18 19 20 21
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 48 47 50 47 51 51 52 53 53 52 53 53 55 54
90 46 44 49 46 49 49 50 51 52 49 52 52 53 52
75 41 40 44 42 46 44 47 47 48 47 47 48 48 50
50 37 34 37 34 41 38 40 40 42 43 42 44 42 42
25 32 28 33 30 38 34 34 34 36 36 37 38 36 37
10 24 22 26 22 29 23 30 22 30 29 33 31 33 34
5 21 17 19 19 24 20 23 20 27 24 30 23 29 31
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90
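The gender gap at each percentile can be read directly from Table 6.54. As a minimal sketch (variable names are illustrative), the following computes the female-minus-male difference at each percentile for the 10-year-olds, using the scores in the table.

```python
PERCENTILES = [95, 90, 75, 50, 25, 10, 5]

# SPM scores for 10-year-olds at each percentile, taken from Table 6.54.
age_10_male = [33, 30, 21, 18, 14, 11, 10]
age_10_female = [37, 37, 28, 23, 18, 12, 10]

# Positive values mean the female score is higher at that percentile.
gaps = {
    percentile: female - male
    for percentile, male, female in zip(PERCENTILES, age_10_male, age_10_female)
}

for percentile, gap in gaps.items():
    print(f"{percentile}th percentile: {gap} points")

print(f"range of gaps: {min(gaps.values())} to {max(gaps.values())} points")
```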
It is apparent from Table 6.54 that gender differences at some ages were
significant. For example, at age 10, differences were in favour of females. These
differences ranged from 0 to 7 points across the percentiles: 4
points at the 95th percentile, 7 points at the 90th percentile, 7 points at the 75th percentile, 5 points
at the 50th percentile, 4 points at the 25th percentile, 1 point at the 10th percentile and 0 points at
the 5th percentile. As another example, at age 17, differences were in favour of males. These
differences ranged from 0 to 4 points across the percentiles: 3
points at the 95th percentile, 4 points at the 90th percentile, 2 points at the 75th percentile, 3 points
at the 50th percentile, 3 points at the 25th percentile, 4 points at the 10th percentile and 0 points at
the 5th percentile. Table 6.55 shows detailed percentile 2007-2008 norms for Libyan
students according to age and academic discipline.
Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline
Percentile Age in years
18 19 20 21
Disciplines SC AR SC AR SC AR SC AR
95 55 51 53 53 54 53 55 51
90 53 48 51 51 52 51 53 48
75 48 44 49 47 48 47 52 46
50 42 38 44 42 45 41 45 40
25 37 34 38 35 38 37 39 36
10 25 29 30 29 35 30 34 33
5 22 20 27 24 27 26 32 30
n 100 100 100 100 100 100 100 100
It can be seen that difference between the percentile scores of Libyan science students
and arts students; e.g. (Sciences student 18 years) is from 7 to 14 points. They differ
points at 50th percentile, 3 points at 25th percentile, 4 points at 10th percentile and 2
Percentile ranks indicated that the performance of Libyan students on the SPM test was
lower than that of subjects from other countries. Assessed against the SPM manual (1988,
1996, 2003, 2004 and 2008) data, Libyan students were below the norms given for some
western countries. A comparison of the present data with the SPM norms given for
Taiwan (1989), India (1992), Netherlands (1992), France (1998), Turkey (1993),
Kosice & Slovakia (1987), Britain (1979 & 1992), Australia (1986), China (1986),
United States of America (1979 & 1992) and Slovenia (1998) in the 1988, 1996,
2003, 2004 and 2008 SPM manuals, for the same age groups, indicated
that Libyan students were below the norms of the above countries (Appendix 2).
6.9 Chapter Summary
This chapter presented the results of the statistical analysis performed on the data
collected for this study. The SPM test was administered to 2600 students; 1800 school
students (900 males and 900 females) and 800 university students (400 males and 400
females). According to region, 900 school students were from cities, whereas the
remaining 900 were from villages. The university students (400 science students and
400 art students) were from two universities located in two cities; Al-Beida and
The overall SPM mean score was 32.31, with a standard deviation of 11.94 (minimum
score 6 and maximum 58). Using the British and American percentiles, the SPM
scores were converted to IQ scores. Overall, the IQs obtained by the Libyan students
ranged between 74 and 85. The average IQ for the fourteen tested Libyan age groups
Test-retest, split-half and alpha (KR 20) reliability procedures were used to
investigate the SPM reliability. Test-retest reliability was .90 (N = 280), split-half
reliability for the total sample was .96 (N = 2600) and alpha reliability was .94 (N =
2600). The results, in general, were in agreement with previous research and
supported the validity and reliability of the SPM test with the Libyan sample.
validity methods were used to establish validity of the SPM test; construct validity
factor analysis showed only one significant factor, Spearman’s “g” (eigenvalue =
3.47; 69.41% of the variance). In addition, internal consistency results showed strong
positive correlation coefficients (0.50** to 0.85**) between the five subsets and the
SPM total score. According to criterion-related validity, analysis showed correlations
Item analysis was carried out for the SPM 60 items (N=2600). The SPM item
difficulty, 11 items appeared to be easy and 7 items appeared to be too difficult. Based
on SPM order of difficulty, results indicated that there were 13 items (three items in
set A , four items in set B, three items in set C and three items in set D, whereas set
(E) followed an order of increasing in difficulty) and one set (D) that did not follow
The results of SPM reliability, validity and item analysis indicated that the SPM test
summary it may provide a promising tool for the measurement of mental ability in
Libyan setting.
Normality testing was carried out and showed that the collected data were normally
distributed, which warranted the use of parametric tests. In order to test the differences
between SPM score means, independent-samples t-tests and one-way and two-way ANOVA
statistical tests were used. In addition, the relationships between SPM test scores and
which independent variable was the best predictor of SPM scores. The investigation
1. There were no gender differences in SPM mean scores in the total sample or
at ages 8, 9, 12, 13, 14 and 18 through 21. However, females obtained a significantly
higher SPM mean than males at the age of 10 years, whereas males obtained
significantly higher means than females at the ages of 11 and 15 through 17. In
addition, there were no significant gender differences in the total means or in the means
for each region. There was also a lack of significant gender differences in the total
means and in the means for each discipline (science and arts). Thus, the gender variable was
not an important factor affecting the Libyan students’ scores on the SPM test.
12, 13, 14, 15, 17, 18 and 20 years females had greater variability than males. At
ages of 11, 16, 19 and 21 years males had greater variability than females, as well
in total sample. Also males had greater variability than females in total sample and
in each region. Whereas females had greater variability than males in total sample
according to region. Thus, the region variable was not an important factor
affecting the Libyan students’ scores on the SPM test. There was, however, a
significant difference with regard to age as well as study levels. Thus, age and
study levels were important factors.
4. Students from science discipline had significantly higher SPM mean scores than
students from art discipline. Thus, the academic discipline was an important factor
5. Significant coefficients between the SPM scores and students’ SAA ranged from
0.33 to 0.56. In general, all correlation coefficients between SPM and students
6. A multiple regression for Libyan students indicated that both age and achievement
were predictors for SPM results with the age being a better predictor. Whereas
7. The performance of Libyan students on the SPM can be considered lower than that of
students from other countries. Assessed against the SPM manual (1988, 1996,
2003, 2004 and 2008) data, Libyan students were below the norms given for all
developed countries.
The next chapter presents the meta-analysis method. Moreover the outcomes of this
chapter, which are entirely about the SPM test for a Libyan sample, will be compared
Chapter seven: META-ANALYSIS
7.1 Introduction
It has become widely accepted that the best way to resolve issues on which there are a
large number of studies is to carry out a meta-analysis. The 1980s and 1990s witnessed a
rapid upsurge of this statistical approach (Anastasi and Urbina, 1997). Meta-analysis
summarizes the results of many quantitative studies that have investigated the same
studies. It delineates specific procedures for finding, describing, classifying, and coding
research studies to be included in a meta-analysis review, and for measuring and analysis
approaches is the emphasis placed on making the review as inclusive as possible. This
technique was first proposed by Glass (1976) and by the end of the 1980s it had become
accepted as a useful method for synthesizing the results of many different studies.
Primary analysis is the original analysis of data in a research study. Secondary analysis is
the re-analysis of data for the purposes of answering the original research question with
better statistical techniques, or answering new questions with old data. Meta-analysis
refers to the analysis of analyses; the statistical analysis of a large collection of analysis
results from individual studies for the purposes of integrating the findings. It connotes a
rigorous alternative to the casual, narrative discussion of research studies which typify
It contributes to the creation of new knowledge synthesized from existing studies. The
research results have been used for many years and have received a great amount of
Meta-analysis usually involves three major phases; the three “Ps”: preparation,
performance, and presentation. This sequence is the same as for any other type of
research. The project must be planned in advance, then systematically carried out, then
Any statistical procedure or analytic approach can be misused or abused. As Green and
Hall (1984) aptly stated “Data analysis is an aid to thought, not a substitute”. Most of the
• It increases power and leads to stronger conclusions because more studies can be
this can bring effects into sharper focus, particularly when the results of all studies
effects of research quality on study findings, meta-analysis is likely to be more
• It can answer questions not posed by the individual studies (Higgins and Green,
2006).
• It can settle controversies arising from apparently conflicting studies (Higgins and
Green, 2006).
meta-analyses built potential mediating factors into their designs rather than
Green, 2006).
subjective. In some cases consensus may be hard to reach (Higgins and Green,
2006).
7.4 Literature review
A thorough investigation into the literature revealed three meta-analysis studies carried
out; two published and one unpublished. The two published studies examined the SPM
test in relation to gender differences while the unpublished meta-analysis study examined
Lynn and Irwing (2004) conducted a meta-analysis to investigate sex differences
on the progressive matrices. About 57 studies were included and they studied sex
progressive matrices. Results showed that there was no difference among children aged 6
to 14 years, and that males obtained higher means than females from the age of 15
The same researchers in 2005 carried out a meta-analysis studying the sex differences in
means and variability on the progressive matrices in university students. 22 studies were
identified and analyzed. This meta-analysis disconfirmed the frequent assertion that there
was no sex difference in the mean and that males have greater variability. It showed that
males obtained a higher mean than females. The SPM tests showed greater variability
among females while the APM studies showed no significant difference in variability.
Abdalla et al. (2002) carried out a meta-analysis of sex and age differences in SPM
results. As all collected studies used the SPM test as a measuring tool, they used the
differences between males and females, but showed statistically significant differences
between all age groups; below 13 years group, 13 to 19 years group and 19 to 22 years
group. Higher age groups had higher mean scores than lower age groups.
7.5 Method
Raven’s Standard Progressive Matrices test according to age groups and gender.
Using available databases, an extensive and thorough search for studies to be included in
the meta-analysis was carried out. Criteria for selection of studies included the
following:
• First, the study must investigate the area of interest of the meta-analysis.
• Second, the study must provide information regarding the research design,
• Third, the study must provide sufficient statistical information, such as SPM mean
scores.
A careful review of relevant studies published on the SPM test, drawn from computer databases,
dissertations and bibliographies of review articles, produced 44 studies. These studies were
carried out in various countries between 1948 and 2009. From each relevant study the
following data were recorded and coded: (a) author; (b) country; (c) year of publication;
(d) population sampled; (e) age; (f) SPM means and standard deviations; and (g) sample
size.
Turkey 1993 Duzen, et al.,
UK 1989 and 1994 Egan and van den
USA 1948; 1968; 1969; 1985; 1986.a.b; 1987; 1988; 1988; 1986; 1987 & 1988; 1994 and 1994
Rimoldl, Tulkin & Newbrough, Burke & Bingham, Burke, Powers et al., Sidles & Avoy, Jensen et al., Karnes & Whorton, Bart et al., Whorton & Karnes, Johnson et al., and Blennerhssett et al.,
Data have been organized into three categories; first based on development status either
developed or developing countries, second based on age groups and third based on
gender.
The key feature of meta-analysis is that each study’s results are translated into an effect
size. Effect size is a numerical way of expressing the strength or magnitude of a reported
summarizing literature (Mills & Airasian 2006). Many effect size statistics are available
and choosing which one to be used depends on the nature of data collected.
The nature of data reported in the SPM tests is numerical continuous data and means
were calculated using the same scale, which was the SPM test itself. The term
‘continuous’ in statistics conventionally refers to data that can take any value in a
specified range. When dealing with numerical data, this means that any number may be
In the presence of continuous numerical data obtained using the same scale, the means of
the studies can be used as a measure of effect size (Higgins & Green 2006). SPSS 16.0
statistics software was used to carry out the statistical analysis of the meta-analysis.
The SPSS analysis was carried out in the following manner:
standard deviations.
• Third independent sample t-test was used to compute differences between SPM
SPM test means among different studies according to the development status of
SPM test means among different studies according to both; development status of
countries and age groups variables or development status of countries and gender
variables or age groups and gender variables. In addition, this method was used to
SPM scores.
• Sixth, to investigate the effect size of the SPM means by calculating Cohen’s
d, which is equal to the difference between the means divided by the mean of the
• Seventh To evaluate the variability (variance ratios); Vr + the average of the
• Eighth To convert SPM means score to IQ scores using British and American
which independent variable (development status, age and gender) is the best
7.6 Results
An extensive review of the studies was carried out and data were organized based upon
(a) Development status group; developed countries, developing countries and Libya.
(b) Four age groups; 8-11 years, 12-14 years, 15-17 years and 18-21 years.
Using SPSS, the data collected for the meta-analysis were examined for normality. Both the
Kolmogorov-Smirnov and Shapiro-Wilk tests were carried out. The resulting p values were
0.200 and 0.308 respectively. Both values were well above 0.05, which indicated that the
data did not depart significantly from a normal distribution. This allowed the use of parametric tests to investigate and
The descriptive statistics for the overall data collected for the meta-analysis are as follows.
Table 7.2 Descriptive statistics for means scores of overall collected data and tests of
normality.
Statistic Std Error
Mean 34.9755 .74322
95% confidence Lower Bound 33.5049
Interval for Mean Upper Bound 36.4462
5% Trimmed Mean 35.0786
Median 35.9750
Variance 70.704
Std. Deviation 8.40856
Minimum 12.65
Maximum 52.76
Range 40.11
Interquartile Range 10.4175
Skewness -.271 .214
Kurtosis -.080 .425
Tests of normality
Test                  Statistic   df    Sig.
Kolmogorov-Smirnov    .062        128   .200
Shapiro-Wilk          .988        128   .308
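The figures in Table 7.2 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the SPSS procedure used in the study: it recomputes the standard error of the mean and the 95% confidence interval from the reported mean, standard deviation and N, and expresses skewness and kurtosis as ratios to their standard errors. The critical t value of about 1.979 for 127 degrees of freedom is hard-coded as an assumption, and the ±1.96 rule of thumb for the ratios is a common convention rather than something stated in the text.

```python
import math

# Values reported in Table 7.2
n = 128
mean = 34.9755
sd = 8.40856

# Standard error of the mean: SD / sqrt(N)
se = sd / math.sqrt(n)

# Approximate two-sided 95% critical value of t for df = 127 (assumed constant)
T_CRIT = 1.979
ci_lower = mean - T_CRIT * se
ci_upper = mean + T_CRIT * se

# Skewness and kurtosis divided by their standard errors;
# ratios within about +/-1.96 are usually read as compatible with normality
z_skew = -0.271 / 0.214
z_kurt = -0.080 / 0.425

print(f"SE = {se:.5f}")
print(f"95% CI = {ci_lower:.4f} to {ci_upper:.4f}")
print(f"skewness ratio = {z_skew:.2f}, kurtosis ratio = {z_kurt:.2f}")
```

Both ratios fall well inside ±1.96, which agrees with the non-significant Kolmogorov-Smirnov and Shapiro-Wilk results.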
Figure 7.1 Distribution of mean scores (histogram; x-axis: scores, y-axis: frequency; N = 128).
Figure 7.2 Box plot of the score distribution (N = 128).
Figure 7.3 Normal Q-Q plot.
Figure 7.4 Detrended normal Q-Q plot.
Figure 7.1 is a histogram of the SPM scores, which appeared to be normally
distributed. Figure 7.2 shows a box plot: the box represents the middle 50% of scores,
the line inside the box represents the median, and the
whiskers represent the highest and lowest values. Figure 7.3 shows a normal probability
plot (normal Q-Q plot), in which the observed value of each mean is plotted against its
expected value; a reasonably straight line suggested a normal distribution. Figure 7.4
shows the detrended normal Q-Q plot, in which the actual deviations of the scores from the
straight line are plotted. Most scores clustered around the zero line with no real
7.6.1 SPM means and standard deviations according to the independent variables
Table 7.3 SPM score means and standard deviations according to the independent
variables.
Groups                      (N) sample   Mean    SD     (N) Group
Development status
Developed countries         9514         38.88   8.61   44
Developing countries        19579        33.10   7.31   70
Libya                       2600         32.31   9.02   14
Total                       31693        34.98   8.41   128
Age
8-11 years (primary)        8309         27.33   7.63   35
12-14 years (prep)          9924         34.94   6.71   44
15-17 years (secondary)     8991         40.09   5.31   28
18-21 years (university)    4469         40.97   6.21   21
Total                       31693        34.98   8.41   128
Gender
Males                       11961        33.95   8.95   93
Females                     11423        33.82   9.00   91
Total                       23384        33.88   8.95   184
Based on development status, the developed countries showed the highest mean score
while Libya showed the lowest. Based upon age groups, score means increased as age
increased; the highest score means were achieved by the 18-21 years age group.
According to gender, males were only slightly higher than females when SPM score
Using SPSS, seven meta-analysis procedures were carried out to investigate statistically
significant differences between SPM score means based upon the independent variables,
as follows:
7.6.2 Differences in SPM scores
One-way ANOVA was used to compare the SPM score means for the development status
group.
Table 7.5 Post hoc tests: multiple comparisons of SPM scores (Tukey HSD)
(I) Develop. status   (J) Develop. status   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Developed             Developing            5.7818                  1.53358      .001   1.9825               9.5810
                      Libya                 6.8222                  2.44598      .023   .7626                12.8818
Developing            Developed             -5.7818                 1.53358      .001   -9.5810              -1.9825
                      Libya                 1.0404                  2.33376      .905   -4.7412              6.8220
Libya                 Developing            -1.0404                 2.33376      .905   -6.8220              4.7412
                      Developed             -6.8222                 2.44598      .023   -12.8818             -.7626
*The mean difference is significant at the .05 level.
Tables 7.4 and 7.5 show the effect of development status on SPM mean scores.
Subjects were divided into three groups: developed, developing and Libya. There were
statistically significant differences at the p < .05 level in SPM scores for the three development
status groups: F (2, 125) = 8.157, p < .001. The effect size was calculated using eta squared,
that is, the between-groups sum of squares (1036.658) divided by the total sum of squares
(8979.386) (Pallant, 2007). The resulting eta squared value was 0.12, which indicated a
large effect. Post-hoc comparisons using the Tukey HSD test indicated that the mean
score for the developed group (M = 38.88, SD = 8.61) was significantly different from the
developing group (M = 33.10, SD = 7.31) and from the Libya group (M = 32.31, SD =
9.02). The developing group did not differ significantly from the Libya group. Based
upon these results it was decided to combine Libya with the developing countries group,
so that development status was categorized into developed and developing
countries only.
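The eta squared calculation described above can be written out as a short sketch. It is illustrative only: the function name is hypothetical, and the within-groups sum of squares is obtained by subtraction from the two sums of squares quoted in the text rather than taken from SPSS output.

```python
def one_way_anova_summary(ss_between, ss_total, k_groups, n_total):
    """Return (F ratio, eta squared) from the sums of squares of a one-way ANOVA."""
    df_between = k_groups - 1
    df_within = n_total - k_groups
    ss_within = ss_total - ss_between          # derived by subtraction
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    eta_squared = ss_between / ss_total        # Pallant's (2007) formula
    return f_ratio, eta_squared


# Development status: three groups, 128 study samples
f_dev, eta_dev = one_way_anova_summary(1036.658, 8979.386, k_groups=3, n_total=128)
print(f"F(2, 125) = {f_dev:.3f}, eta squared = {eta_dev:.2f}")
```

With the sums of squares quoted above, the sketch returns the F ratio and the eta squared value reported in the text.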
A one-way ANOVA was conducted to compare the SPM means across the age groups.
Table 7.6 Comparison of the SPM Mean scores according to age groups
Age Groups (N)sample Mean SD (N) Group
8-11 8309 27.33 7.63 35
12-14 9924 34.94 6.71 44
15-17 8991 40.09 5.31 28
18-21 4469 40.97 6.21 21
Total 31693 34.98 8.41 128
Source Sum of Squares df Mean Squares F. Ratio F. Prob.
Between Groups 3535.138 3 1178.379 26.839 .000
Within Groups 5444.248 124 43.905
Total 8979.386 127
Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age groups   (J) Age groups   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
8-11 12-14 -7.6188 1.50076 .000 -11.8723 -3.3652
15-17 -12.7633 1.68002 .000 -17.5249 -8.0016
18-21 -13.6450 1.82898 .000 -18.8288 -8.4611
12-14 8-11 7.6188 1.50076 .000 3.3652 11.8723
15-17 -5.1445 1.60184 .019 -9.6846 -.6045
18-21 -6.0262 1.75743 .010 -11.0072 -1.0451
15-17 8-11 12.7633 1.68002 .000 8.0016 17.5249
12-14 5.1445 1.60184 .019 .6045 9.6846
18-21 -.8817 1.91279 .975 -6.3030 4.5397
18-21 8-11 13.6450 1.82898 .000 8.4611 18.8288
12-14 6.0262 1.75743 .010 1.0451 11.0072
15-17 .8817 1.91279 .975 -4.5397 6.3030
* The mean difference is significant at the .05 level.
Tables 7.6 and 7.7 show the effect of age on SPM mean scores. Subjects were divided
into four age groups. There were statistically significant differences at the p < .05 level in SPM
scores for the four age groups: F (3, 124) = 26.839, p < .001. The effect size was
calculated using eta squared, that is, the between-groups sum of squares (3535.138) divided by
the total sum of squares (8979.386) (Pallant, 2007). The resulting eta squared value was
0.39, which indicated a large effect. Post-hoc comparisons using the Tukey HSD test
indicated that there were statistically significant differences between the different age
groups except between the 15-17 years age group (M = 40.09, SD = 5.31) and the 18-21
7.6.2.3 Difference according to gender
An independent-samples t-test was conducted to compare the SPM mean scores for males
and females. There was no significant difference in scores between males (M = 33.95, SD =
8.95) and females (M = 33.82, SD = 9.00); t (182) = 0.102, p = 0.919. The magnitude
of the difference in the means (mean difference = 0.1349, 95% CI: -2.477 to 2.746) was
very small (eta squared = 0.007). SPSS does not provide eta squared values for
t-tests; the value was calculated using the information provided in the output.
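As a minimal sketch of how these statistics follow from the summary values, the t statistic can be rebuilt from the reported mean difference, standard deviations and numbers of samples (93 male and 91 female samples, which matches the 182 degrees of freedom). The eta squared formula used here, t² / (t² + N1 + N2 − 2), is the one given by Pallant (2007) for an independent-samples t-test; that this is the calculation the author applied is an assumption, and the function name is hypothetical.

```python
import math


def pooled_t_test(mean_diff, sd1, n1, sd2, n2):
    """Independent-samples t-test from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = mean_diff / se_diff
    eta_squared = t ** 2 / (t ** 2 + df)   # Pallant (2007), assumed to apply here
    return t, df, eta_squared


t_gender, df_gender, eta_gender = pooled_t_test(0.1349, 8.95, 93, 9.00, 91)
print(f"t({df_gender}) = {t_gender:.3f}, eta squared = {eta_gender:.5f}")
```

With these rounded inputs the t value matches the one reported. The formula yields an eta squared well below .01, which is consistent with the description of the difference as very small; the thesis does not show its own working, so the exact figure it reports may rest on a different calculation.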
7.6.2.4 Difference according to development status and age
A two-way ANOVA was carried out on the SPM scores for development status according to age.
Table 7.9 Comparison of the development status mean scores of SPM test according to
age.
Age groups   Development status   (N) sample   Mean    SD     (N) Group
8-11         developed            4223         31.98   6.28   18
             developing           4086         22.33   5.61   17
             Total                8309         27.33   7.63   35
12-14        developed            2659         40.50   5.93   14
             developing           7265         32.35   5.39   30
             Total                9924         34.94   6.71   44
15-17        developed            1814         45.92   5.92   8
             developing           7177         37.76   4.37   20
             Total                8991         40.09   5.31   28
18-21        developed            818          50.22   4.21   4
             developing           3651         38.80   3.04   17
             Total                4469         40.97   6.21   21
Total        developed            9514         38.88   8.63   44
             developing           22179        32.93   7.57   84
             Total                31693        34.99   8.41   128
Table 7.12 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD).
Develop. status   (I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Developed 8-11 12-14 -8.52000* 2.10509 .001 -14.1625 -2.8775
15-17 -13.94125* 2.51016 .000 -20.6695 -7.2130
18-21 -18.23750* 3.26544 .000 -26.9902 -9.4848
12-14 8-11 8.52000* 2.10509 .001 2.8775 14.1625
15-17 -5.42125 2.61818 .180 -12.4391 1.5966
18-21 -9.71750* 3.34918 .029 -18.6947 -.7403
15-17 8-11 13.94125* 2.51016 .000 7.2130 20.6695
12-14 5.42125 2.61818 .180 -1.5966 12.4391
18-21 -4.29625 3.61753 .638 -13.9928 5.4003
18-21 8-11 18.23750* 3.26544 .000 9.4848 26.9902
12-14 9.71750* 3.34918 .029 .7403 18.6947
15-17 4.29625 3.61753 .638 -5.4003 13.9928
developing 8-11 12-14 -9.95410* 1.44341 .000 -13.7414 -6.1668
15-17 -15.35826* 1.56851 .000 -19.4738 -11.2427
18-21 -16.39706* 1.63086 .000 -20.6762 -12.1179
12-14 8-11 9.95410* 1.44341 .000 6.1668 13.7414
15-17 -5.40417* 1.37257 .001 -9.0056 -1.8027
18-21 -6.44296* 1.44341 .000 -10.2303 -2.6556
15-17 8-11 15.35826* 1.56851 .000 11.2427 19.4738
12-14 5.40417* 1.37257 .001 1.8027 9.0056
18-21 -1.03879 1.56851 .911 -5.1544 3.0768
18-21 8-11 16.39706* 1.63086 .000 12.1179 20.6762
12-14 6.44296* 1.44341 .000 2.6556 10.2303
15-17 1.03879 1.56851 .911 -3.0768 5.1544
* The mean difference is significant at the .05 level.
Tables 7.9, 7.10, 7.11 and 7.12 show the impact of development status according to age
on SPM mean scores. Subjects were divided into two groups according to
development status. The interaction effect between development status and age was not statistically significant, F (3, 120) = .410, p = .746.
There was a statistically significant main effect for development status, F (1, 120) =
74.180, p < .001; the effect size was large (partial eta squared = .38).
The main effect for age, F (3, 120) = 55.135, p < .001, was also statistically significant.
Post-hoc comparisons using the Tukey HSD test showed that in developing countries statistically
significant differences were found between all age groups except between the 15-17
and 18-21 age groups. In developed countries, statistically significant differences
were found between all age groups except between the 12-14 and 15-17
age groups and between the 15-17 and 18-21 age groups. Levene’s
test of equality of variances was not significant, indicating that the group variances were equal. Moreover, the
magnitude of the difference between groups in terms of standard deviation units (Cohen’s
Table 7.13 Magnitude of the development status of countries (developed and developing
countries) in mean scores and variability on SPM as functions of age and total sample
Age   Development status   (N) Group   (N) sample   Mean   SD   t   sig   d   Vr   IQ Point   IQs
8-11 developed 18 4223 31.98 6.28 -4.75 .000 1.26 1.25 18.90 96
developing 17 4086 22.33 5.61 85
Total 35 8309 27.33 7.63 91
12-14 developed 14 2659 40.50 5.93 -4.52 .000 1.21 1.21 18.15 93
developing 30 7265 32.35 5.39 81
Total 44 9924 34.94 6.71 87
15-17 developed 8 1814 45.92 5.92 -5.10 .000 1.53 1.84 22.95 95
developing 20 7177 37.76 4.37 83
Total 28 8991 40.09 5.31 89
18-21 developed 4 818 50.22 4.21 -4.80 .000 1.84 1.91 27.60 96
developing 17 3651 38.80 3.04 79
Total 21 4469 40.97 6.21 88
Table 7.13 shows the mean scores obtained by developed and developing countries in
each age group, the standard deviations, the t values for the difference between developed and
developing countries in each age group and within the total sample, the level of significance,
Cohen’s d (the difference between the developed and developing countries’ means divided by the
within-group standard deviation; Cohen, 1977) and the variance ratio, Vr (the variance of the
developed countries divided by the variance of the developing countries; Lynn and
Irwing, 2004). Vr values greater than 1.0 indicate that developed countries had greater variance
than developing countries, while Vr values less than 1.0 indicate that developing countries had
greater variance than developed countries (Khaleefa and Lynn 2008). Finally, the table shows the IQ point
differences between developed and developing countries in each age group as well as
within the total sample. The results showed three interesting features. First, the
British percentile equivalent of the average score for developed countries was the 39th percentile for the 8-11
age group (IQ = 96), the 31st percentile for the 12-14 age group (IQ = 93) and the 37th percentile for the 15-17
age group (IQ = 95). The American percentile equivalent was the 39th percentile (IQ = 96)
for the 18-21 age group. For developing countries, the British percentile equivalent was the 16th percentile
for the 8-11 age group (IQ = 85), the 10th percentile for the 12-14 age group
(IQ = 81) and the 12th percentile for the 15-17 age group (IQ = 83). The American percentile
equivalent was the 8th percentile (IQ = 79) for the 18-21 age group. Overall, the highest IQ
obtained was 96, for the 8-11 years age group in developed countries, whereas the lowest
IQ was 79, for the 18-21 years age group in developing countries. The average IQ for the
developed countries was 95, whereas for the developing countries it was 82.
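The link between a percentile and an IQ score can be illustrated with a short sketch. It assumes that IQ is normally distributed with a mean of 100 and a standard deviation of 15, which is the usual convention; the thesis itself worked from the published British and American norm tables, so this is an approximation of that step rather than the procedure actually followed, and the function name is hypothetical.

```python
from statistics import NormalDist

# Conventional IQ scale: mean 100, standard deviation 15 (assumed)
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile):
    """Convert a percentile (0-100, exclusive) to the IQ at that point of the normal curve."""
    return IQ_SCALE.inv_cdf(percentile / 100)


for pc in (39, 31, 37, 16, 10, 8):
    print(f"{pc}th percentile -> IQ {percentile_to_iq(pc):.1f}")
```

Rounded to whole numbers, these percentiles correspond to the IQ values of 96, 93, 95, 85, 81 and 79 quoted above.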
Second, the statistically significant differences by development status, in total and
in every age group, were in favour of developed countries. In total, developed countries
point).
Third, gender difference in variability within the total sample as well as within each age
group (as can be seen from the standard deviations and variance ratios) showed a large
countries.
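The d, Vr and IQ point columns of Table 7.13 can be reproduced from the means and standard deviations in the same table. The sketch below uses the 8-11 age group as a worked case. One point is an inference from the tabulated values and not a statement in the text: the reported d values are recovered when the mean difference is divided by the standard deviation of the combined (Total) row. The IQ point difference is taken as d multiplied by 15. Function names are hypothetical.

```python
def cohens_d(mean_a, mean_b, sd):
    """Mean difference expressed in standard deviation units."""
    return (mean_a - mean_b) / sd


def variance_ratio(sd_a, sd_b):
    """Variance of group A divided by variance of group B (Vr)."""
    return sd_a ** 2 / sd_b ** 2


# 8-11 age group, Table 7.13: developed vs developing countries
d_8_11 = cohens_d(31.98, 22.33, sd=7.63)        # SD of the Total row (inferred choice)
vr_8_11 = variance_ratio(6.28, 5.61)
iq_points_8_11 = round(d_8_11, 2) * 15          # d converted to IQ points (SD = 15)

print(f"d = {d_8_11:.2f}, Vr = {vr_8_11:.2f}, IQ points = {iq_points_8_11:.2f}")
```

A Vr above 1.0, as here, means that scores in the developed countries were more variable than those in the developing countries.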
Two-way ANOVA was conducted on SPM scores for the development status according
to gender.
Table 7.14 Comparison of the development status mean scores of SPM test according to
gender.
Development status Gender (N)sample Mean SD (N) Group
developed Male 2626 39.47 8.72 23
Female 2704 39.57 9.23 22
Total 5330 39.50 8.86 45
developing Male 9335 32.14 8.31 70
Female 8719 31.99 8.19 69
Total 18054 32.07 8.22 139
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184
Table 7.16 Tests of Between-Subjects Effects of SPM scores
Source             Type III Sum of Squares   df    Mean Square   F          Sig.   Partial Eta Squared
Corrected Model    1880.43                   3     626.81        8.825      .000   .128
Intercept          174051.85                 1     174051.85     2450.522   .000   .932
REGION             1879.56                   1     1879.56       26.463     .000   .128
GENDER             5.109E-02                 1     5.109E-02     .001       .979   .000
REGION * GENDER    .391                      1     .391          .006       .941   .000
Error              12784.76                  180   71.03
Total              225925.97                 184
Corrected Total    14665.19                  183
a R Squared = .128 (Adjusted R Squared = .114)
Tables 7.14, 7.15 and 7.16 show the impact of development status according to gender
on SPM mean scores. Subjects were divided into two groups according to
development status. The interaction effect between development status and gender was not statistically significant, F (1, 180) = .006, p =
.941. There was a statistically significant main effect for development status, F (1, 180) =
26.463, p < .001; the effect size was large (partial eta squared = .13).
The main effect for gender, F (1, 180) = .001, p = .979, was not statistically
significant. Levene’s test of equality of variances was not significant, indicating that the group variances
were equal.
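The partial eta squared values quoted for the two-way ANOVAs can be checked directly from each F ratio and its degrees of freedom, using the identity partial η² = (F × df effect) / (F × df effect + df error). The sketch below is illustrative only; the function name is hypothetical, and the inputs are the F ratios reported in the text.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# Main effect of development status in the status-by-age analysis
pes_status_by_age = partial_eta_squared(74.180, 1, 120)
# Main effect of development status in the status-by-gender analysis
pes_status_by_gender = partial_eta_squared(26.463, 1, 180)

print(f"{pes_status_by_age:.2f}, {pes_status_by_gender:.2f}")
```

Both results agree with the reported values of .38 and .13.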
7.6.2.6 Difference according to age groups and gender
Two-way ANOVA was conducted on SPM scores for age groups according to gender.
Table 7.17 Comparison of the age groups mean scores of SPM test according to gender
Age Gender (N)sample Mean SD (N) Group
8-11 Male 3133 26.09 7.87 27
Female 2918 25.67 8.27 27
Total 6051 25.88 7.99 54
12-14 Male 3373 33.12 7.17 31
Female 3267 34.19 6.81 30
Total 6640 33.65 6.95 61
15-17 Male 3871 39.79 5.45 23
Female 3656 38.95 6.14 23
Total 7527 39.37 5.76 46
18-21 Male 1584 42.60 4.20 12
Female 1582 42.07 4.41 11
Total 3166 42.35 4.21 23
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184
Table 7.20 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
8-11 12-14 -7.7645 1.27067 .000 -11.0603 -4.4687
15-17 -13.4903 1.36449 .000 -17.0294 -9.9511
18-21 -16.4696 1.69329 .000 -20.8616 -12.0777
12-14 8-11 7.7645 1.27067 .000 4.4687 11.0603
15-17 -5.7257 1.32799 .000 -9.1702 -2.2813
18-21 -8.7051 1.66401 .000 -13.0211 -4.3891
15-17 8-11 13.4903 1.36449 .000 9.9511 17.0294
12-14 5.7257 1.32799 .000 2.2813 9.1702
18-21 -2.9793 1.73671 .319 -7.4839 1.5252
18-21 8-11 16.4696 1.69329 .000 12.0777 20.8616
12-14 8.7051 1.66401 .000 4.3891 13.0211
15-17 2.9793 1.73671 .319 -1.5252 7.4839
* The mean difference is significant at the .05 level.
Tables 7.17, 7.18, 7.19 and 7.20 show the effect of age group according to gender
on SPM test scores. The interaction effect between age groups and gender was not
statistically significant, F (3, 176) = .213, p = .887. There was a statistically significant
main effect for age groups, F (3, 176) = 46.763, p < .001; the effect
size was large (partial eta squared = .44). Post-hoc comparisons using the Tukey HSD test
showed that there were statistically significant differences between the different age
groups except between the 15-17 years age group (M = 39.37, SD = 5.76) and the 18-21
years age group (M = 42.35, SD = 4.21). The main effect for gender, F (1, 176) = .028, p
= .867, was not statistically significant. Levene’s test of equality of variances was not significant,
indicating that the group variances were equal. Furthermore, the magnitude of the
Table 7.21 Magnitude of gender differences in mean scores and variability on SPM as a
function of age and development status
As a function of age
Age   Gender   (N) Group   (N) sample   Mean   SD   t   sig   d   Vr   IQ Point
8-11 Male 27 3133 26.09 7.87 .194 .847 0.05 0.95 0.75
Female 27 2918 25.67 8.27
Total 54 6051 25.88 7.99
12-14 Male 31 3373 33.12 7.17 -.599 .552 -0.15 1.05 -2.25
Female 30 3267 34.19 6.81
Total 61 6640 33.65 6.95
15-17 Male 23 3871 39.79 6.14 .491 .626 0.14 1.27 2.1
Female 23 3656 38.95 5.45
Total 46 7527 39.37 5.76
18-21 Male 12 1584 42.60 4.20 .294 .772 0.12 0.95 1.8
Female 11 1582 42.07 4.41
Total 23 3166 42.35 4.21
As a function of development status
Status       Gender   (N) Group   (N) sample   Mean    SD     t       sig    d       Vr     IQ Point
Developed    Male     23          2626         39.47   8.72   .104    .917   -0.01   0.89   -0.17
             Female   22          2704         39.57   9.23
             Total    45          5330         39.50   8.86
Developing   Male     70          9335         32.14   8.31   -.026   .980   0.15    1.03   2.25
             Female   69          8719         31.99   8.19
             Total    139         18054        32.07   8.22
As a function of total sample
Score        Gender   (N) Group   (N) sample   Mean    SD     t       sig    d       Vr     IQ Point
             Male     93          11961        33.95   8.95   .102    .919   0.01    0.99   0.15
             Female   91          11423        33.82   9.00
             Total    184         23384        33.88   8.95
Table 7.21 shows the mean scores obtained by males and females in each age group and
in each development status group, the standard deviations, the t values for the difference between
males and females in each age group, in each development status group and
within the total sample, the level of significance, Cohen’s d (the difference between
the male and female means divided by the within-group standard deviation; Cohen,
1977) and the variance ratio, Vr (the variance of the males divided by the variance of the
females; Lynn and Irwing, 2004). Vr values greater than 1.0 indicate that males had greater
variance than females, while Vr values less than 1.0 indicate that females had greater variance
than males (Khaleefa and Lynn 2008). Finally, the IQ point differences between males and
females in each age group, in each development status group and within the total sample
are shown. The results indicated two interesting features. First, there were no significant gender
differences in total, in any age group or in either development status group. In total, males
obtained a higher mean than females by 0.01d (0.15 IQ points). In the 8-11 age group,
males obtained a higher mean than females by 0.05d (0.75 IQ points), while in the
12-14 age group females obtained a higher mean than males by 0.15d (2.25 IQ points).
In the 15-17 age group, males scored a higher mean than females by 0.14d (2.1 IQ
points). In the 18-21 age group, males scored a higher mean than females by 0.12d (1.8
IQ points). In developed countries, females obtained a higher mean than males by 0.01d
(0.17 IQ points). Finally, in developing countries males scored a higher mean than
females by 0.15d (2.25 IQ points). Second, gender differences in variability within the
total sample (as can be seen from the standard deviations and variance ratios), within
each age group and within each development status group were small, except in the
15-17 age group, where males had greater variability than females (Vr = 1.27). In
addition, females showed greater variability than males (Vr = 0.89) in developed
countries.
Table 7.22 Stepwise regression for the independent variables and the SPM score means
Model                     Unstandardised Coeff.      Standardised Coeff.   t        Sig.
                          B         Std. Error       Beta
1- (Constant)             22.889    .887                                   25.793   .000
   Age group              4.889     .352             .603                  13.863   .000
2- (Constant)             12.032    1.257                                  9.576    .000
   Age group              5.175     .305             .638                  16.985   .000
   Development status     7.951     .730             .409                  10.886   .000
Model Summary
Model                            R      R Square   Adjusted R Square   Std. Error of Estimate
1- Development status, Gender    .603   .363       .361                7.00954
2- Gender.                       .727   .529       .526                6.03585
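To show how the coefficients of model 2 in Table 7.22 combine into a predicted SPM mean, a minimal sketch follows. The thesis does not state here how the predictors were coded, so the codes used below (age group 1 to 4 from youngest to oldest, development status 1 = developing and 2 = developed) are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
# Unstandardised coefficients of model 2, Table 7.22
INTERCEPT = 12.032
B_AGE_GROUP = 5.175
B_DEVELOPMENT_STATUS = 7.951


def predict_spm(age_code, development_code):
    """Predicted SPM mean under the ASSUMED coding described above."""
    return INTERCEPT + B_AGE_GROUP * age_code + B_DEVELOPMENT_STATUS * development_code


# Hypothetical codes: youngest group in developing countries, oldest in developed
print(f"{predict_spm(1, 1):.2f}")
print(f"{predict_spm(4, 2):.2f}")
```

Under a different coding scheme the intercept would absorb the change and the predicted values would differ, so the output should be read only as an illustration of the form of the equation.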
Using the Step-Wise method, a significant model emerged (Adjusted R square = 0.526; F
This showed that both age and development status were predictors for SPM results with
7.7 Chapter Summary
The overall mean SPM score was 34.98 with a standard deviation of 8.41 (minimum
12.65 and maximum 52.76). The developed countries showed the highest mean score (M
= 38.88; SD = 8.61), whereas Libya showed the lowest mean score (M = 32.31; SD = 9.02),
slightly lower than that of the developing countries (M = 33.10; SD = 7.31). The
18-21 years age group showed the highest mean score (M = 40.97; SD = 6.21), whereas the
8-11 years age group showed the lowest (M = 27.33; SD = 7.63). Males showed
a slightly higher mean score (M = 33.95; SD = 8.95) than females (M =
33.82; SD = 9.00). The average IQ score for developed countries was 95, whereas the
Normality testing showed that the collected data were normally
distributed, which warranted the use of parametric tests. To test the differences between
SPM score means, independent-samples t-tests and one-way and two-way ANOVA statistical tests
independent variable was the best predictor of SPM scores. The following was
concluded:
1. Significant differences were found between the SPM scores based on development
status. Developed countries achieved higher SPM scores than both developing countries
and Libya. No statistically significant differences were found in SPM scores
between Libya and developing countries. Thus development status was concluded as
2. Significant differences were found between the SPM scores based on age groups.
Differences were in favour of older age groups. In addition, SPM scores of the age
groups were statistically different based on development status but not different based
on gender. Thus age was concluded as being an important factor affecting the SPM.
3. Using the British and American percentiles, SPM scores were converted to IQ scores.
IQ score of the 8-11 age group in developed countries was 96, whereas that in
developing countries was 85. IQ score of the 12-14 age group in developed countries
was 93, whereas that in developing countries was 81. IQ score of the 15-17 age group
in developed countries was 95, whereas that in developing countries was 83. IQ score
of the 18-21 age group in developed countries was 96, whereas that in developing
addition, no gender differences were found among the different age groups or
development status. Thus gender was concluded as not being an important factor
5. Variability difference in SPM mean scores was high in each age group based on
mean scores was low in each age group based on gender, except in the 15-17 age
group where variability was high in favour of males. In addition, females achieved
was low, in favour of males. Extremely low variability was found in the total sample.
Consequently, results indicated no consistent tendency in variability for a gender
difference.
6. Multiple regression step-wise showed age and development status as predictors for
The next chapter brings together the key research findings and discusses them in context
Chapter eight: DISCUSSION AND CONCLUSION
8.1 Introduction
Individuals differ from one another in their ability to understand complex ideas, to adapt
attempts to clarify and organize this complex set of phenomena. Although considerable
clarity has been achieved in some areas, no such conceptualization has yet answered all
the important questions and none commands universal assent (Neisser, 1995).
For historical reasons, the term "IQ" is often used to describe scores on tests of
dividing a so-called mental age by a chronological age, but this procedure is no longer
1930s and 1940s in the United States and Britain to ‘adjust’ test questions to equalize the
scores of boys and girls, because in previous versions of the tests girls had scored higher.
Many tests have been “tailored” to ensure that the scores of boys and girls are equalized
because of the assumption that there are no gender differences in general intelligence
defined as the sum of all cognitive abilities. But this has not been done for the SPM.
The aim of this chapter is to discuss and evaluate the results of the study presented
so far. The next section, section two, discusses intelligence testing in Libya.
The third and fourth sections describe the SPM test and meta-analysis respectively.
Section five presents an analytical discussion of the entire study. The remaining sections,
six to nine, cover the following points: conclusions from the major findings;
contributions of the current study to the domain of intelligence testing; highlights of study
Although Libya has witnessed huge development in education over the last five decades,
some areas have not benefited from the positive effects of this development. To date, no
single test of intellectual ability has been officially adopted or developed to be used for
accepted for study at various academic establishments and for various jobs in the
vocational sector. Although this might be considered as a good criterion for such
Mental health services in Libya suffer from shortage of staff, psychological services and
a lack of facilities. The general public in Libya know very little about the usefulness,
Mental tests currently used in Libya are misused or only partially used. The use of incomplete
tests is likely to bias predictions based on test results and has serious negative
implications for educational or clinical decisions. In addition, the use of incomplete test
scores for estimation of mental ability might result in invalid assessment, leading to grave
Another aspect affected by the lack of intelligence tests in Libya is the
selection of students for different educational programs. In Libya today, a relevant and
accurate selection procedure is essential, not only in the field of education but
also at an intermediate level of training for skilled manpower. Indeed, a clear failing of
the current system can be seen in that many university graduates are posted to
office work which could be performed to a similar level of competence by less qualified
The problem of adapting intelligence tests to a new setting is by no means uncommon;
it has been a general problem for many developing countries in the past. In addition, if
the aim is to assess the “mental ability” of people in a culture that has yet to develop its
own testing scheme or system, it is necessary to assess what is important for that
In this study, an international culture-fair test was adopted, and standardization was
carried out to achieve local norms. This was done because it required less time and effort
than designing a test specifically for Libya (Ezeilo 1978). The Raven’s Standard
Progressive Matrices (SPM) test was employed because it has been widely used and
has shown moderately high indices of validity and reliability across a wide range of
cultures.
Raven's Progressive Matrices test is an example of a culture-fair test that has been used in
cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and Murphy and
Davidshofer (1991) held that Raven's Progressive Matrices was one of the most widely
It is a group test, which can be used with subjects of all language backgrounds and does
not depend to any large extent upon the education or prior knowledge of the subjects. In
addition, it is suitable for all ages from the age of 6 years onwards
The Progressive Matrices (RPM; Raven, Raven & Court, 2000; Lynn & Vanhanen 2006)
is the most widely used test of intelligence in numerous countries throughout the world.
One reason for the popularity of the test is that it is non-verbal and can therefore be
applied cross-culturally. It is also considered to be the best test of g, the general factor
present in all cognitive tasks. The test was constructed by Raven (1939). Lynn, Allik,
Pullman, and Laidra (2004) have stated that the Progressive Matrices is widely regarded
The Progressive Matrices test has good psychometric characteristics. A huge body of
published research has shown the validity of this test. It has gained widespread
acceptance and use in many countries around the world. No other test has been so
extensively used in cross-cultural studies of intelligence. The RPM test is free from
language and apparently has limited dependence on cultural variables which make it a
of this technique are to collect all the studies on the issue, convert the results to a
common metric and average them to give an overall result. Procedures employed in meta-
analysis permit quantitative reviews and syntheses of research literature that address
these issues (Wolf, 1986). An epidemiologist has described meta-analysis as “a boon for
policy makers who find themselves faced with a mountain of conflicting studies” (Mann,
1990).
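The averaging step described above can be illustrated with the group summaries from Table 7.3. The sketch below is a simplified illustration, not the SPSS analysis used in the study: it contrasts an average in which every study sample counts equally with one in which samples are weighted by the number of participants. Because it starts from rounded group means, its results only approximate an analysis of the 128 individual study means.

```python
# Group summaries from Table 7.3:
# (number of study samples, total participants, mean SPM score)
groups = {
    "developed": (44, 9514, 38.88),
    "developing": (70, 19579, 33.10),
    "Libya": (14, 2600, 32.31),
}


def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


means = [g[2] for g in groups.values()]
mean_by_samples = weighted_mean(means, [g[0] for g in groups.values()])
mean_by_participants = weighted_mean(means, [g[1] for g in groups.values()])

print(f"each study sample counted equally: {mean_by_samples:.2f}")
print(f"weighted by number of participants: {mean_by_participants:.2f}")
```

Counting each sample equally gives a value close to the overall mean of 34.98 reported in Chapter seven, whereas weighting by participants gives a slightly lower figure because the larger samples come mainly from developing countries.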
Any meta-analyst has to address three problems that have been identified by Sharpe
(1997) as the “Apples and Oranges”, “File Drawer” and “Garbage in - Garbage out”
problems.
The “Apples and Oranges” problem refers to the idea that different phenomena are
sometimes aggregated and averaged, where disaggregation may show different effects for
different phenomena. The best way of dealing with this problem is to carry out meta-
analyses, in the first instance, on narrowly defined phenomena and populations and then
attempt to integrate these into broader categories. In the present meta-analysis, this
problem has been dealt with by confining the analysis to studies using the Progressive
The “File Drawer” problem means that studies producing significant effects tend to be
published, while those producing non-significant effects tend not to be published and
remain unknown in the file drawer. It is considered that this should not be a problem for
the present inquiry because results of SPM studies are not classified as having a significant
effect or not. Any result, whatever its nature, can be of interest and deemed publishable.
The “Garbage in – Garbage out” problem concerns poor quality studies. Meta-analyses
that include many poor quality studies have been criticized by Feinstein (1995) as
“statistical alchemy” which attempts to turn a lot of poor quality studies into good quality
gold. Poor quality studies are liable to obscure relationships that exist and can be detected
by good quality studies. Meta-analysts differ in the extent to which they judge studies to
be of such poor quality that they should be excluded from the analysis. Some meta-
analysts are “inclusionist” while others are “exclusionist”, in the terminology suggested
in the sense that it included all the studies on the Progressive Matrices among school and
university students that have been located if the strict inclusion criteria apply to them.
The next problem in the meta-analysis was to obtain all the studies on the issue of
concern. This is a difficult problem and one that is rarely, and probably never, possible
to solve completely. An attempt to find all relevant studies of the phenomena being
Information Centre (ERIC), Ingenta, Web of Science, Dissertation Abstracts, the British
Index to Theses, and Cambridge Scientific Abstracts for the years covered up to and
including 2009. In addition, active researchers in the field were contacted. In total, the
review of literature covered the years 1948 to 2009. It was considered that, although
finding all relevant studies was a problem for this and for many other meta-analyses, it
was not a serious problem for our present study because the results were sufficiently
obvious that they are unlikely to be seriously overturned by further studies that have not
been identified. If this should prove incorrect, other researchers will produce these
A careful and thorough search for published and unpublished studies on the SPM test
using the above searching procedures produced 44 studies. They were carried out in 23
countries; 9 developed and 14 developing. The developed country with the highest
number of SPM studies was the United States (14 studies) while the developing country
with the highest number of SPM studies was India (four studies). The earliest study was
in the USA (1948) while the latest were in Qatar and Oman (2009). The overall sample
consisted of 31693 students aged from 8 years (grade 3) to 21 years (final year university
student). Although many studies were found using SPM, some of them did not fulfil the
inclusion criteria. Some studies lacked sufficient information or results. Some studies did
not carry out the test on all desired age groups. Some studies did not report the mean
values of the SPM test but reported norm values only. These studies were excluded.
When studies did not report results based on age, different studies carried out on
After a thorough investigation into the criteria that define social classes, it was not
possible to locate a single criterion that can be used in this context. Income, parents’
occupation, education and culture were all used, and the differences between the various
studies were vast. Researchers have used different criteria when determining social
class. Tulkin and Newbrough (1968) used occupation and education as factors to
determine social class, while Whorton and Karnes (1979) used income as the sole factor.
Also, Nkaya et al. (1994) used occupation, culture and income as determinants of social
class. They reported that criteria applied in one country may not be applicable for defining
social class in other countries, owing to the large social differences between countries. In
addition, the number of SPM studies that reported such criteria was limited. It was
therefore decided not to include social status in the meta-analysis.
The discussion below has been organized according to the objectives of the study
outlined in chapter three. The primary focus is analysing the applicability of the SPM test
scores within the sample is identified and compared with that found in other countries,
(developed and developing). After that, the effects of independent variables on the SPM
test results are presented. Finally, SPM norms of the Libyan sample are discussed and
Until now, no single test of mental ability has been officially constructed or adopted for
the measurement of intelligence in a Libyan setting. The lack of use of intelligence tests
in Libya is mainly due to a shortage of test experts and to limited information and knowledge
regarding the usefulness and effectiveness of these tests among people who are directly
affected by testing.
The present study tried to rectify this problem by investigating and examining the
reported in the literature (Brown 1983; Anastasi and Urbina 1997; Kenneth 1998; Kline
2000; Langdridge 2004; Domino and Domino 2006; Mills and Airasian 2006; Lobiondo-Wood
and Haber 2006) that reliability and validity are both important for judging the
suitability of a test or measuring instrument, and that they are the paramount
characteristics of a psychological test. To test the suitability of the SPM test, its
A) Test-retest
Raven reported test-retest reliabilities ranging from .83 to .93 for several age groups: .88
(13 years and over), .93 (under 30 years), .88 (30-39 years), .87 (40-49 years), and .83
(50 years and over). The results of the present study (0.86 to 0.92) were in accordance with
results reported in the literature, such as Rao (1974), Abdel-Khalek (1988), Nkaya et al.,
B) Split half
exceeded 0.90. The lowest reliability was 0.86, with 174 Iranian children (aged 9 years).
The highest reliability was 0.96 (91 male psychiatric patients) (Raven, 2004). This was in
agreement with the results of this study (0.88 to 0.96) and many other studies such as
(Raven et al., 2003). Burke and Bingham (1969), Baraheni (1974), Bart et al., (1986),
Powers et al., (1986.a), Duzen (1994), Court and Raven (1995), Ahmad et al (2008) and
The majority of alpha consistency coefficients reported in the literature exceeded 0.95.
Our results (0.85 to 0.96) matched those of Dey (1984), Duzen et al, (1994), Rushton and
Skuy (2000), Rushton et al, (2002), Abdel-Khalek (2005) and Taylor (2007).
When the results of this study were compared with those of earlier studies, they appeared
quite similar and provided evidence that the SPM is a reliable measure when used with Libyan
students. These figures indicated satisfactory reliability for the SPM test with the present
Libyan sample and gave strong evidence for the consistency of the SPM test. Anastasi (1988)
and Pallant (2007) held that desirable reliability coefficients should fall in the
.80s or .90s. The present results can generally be considered high reliability
coefficients for the Libyan sample and support the reliability of the SPM test.
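The split-half and KR-20 coefficients discussed above can be illustrated with a short Python sketch. It is not the computation used in this study: the function names and the example data are hypothetical, the KR-20 formula is the standard one for items scored 0 or 1, and the use of the population variance of total scores is an assumption of the sketch.

```python
from statistics import pvariance


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability."""
    return 2 * r_half / (1 + r_half)


def kr20(responses):
    """KR-20 for a matrix of 0/1 item scores (one row per examinee)."""
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # population variance: an assumption here
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Made-up responses for illustration only.
    demo = [[1, 0], [1, 1], [0, 0], [1, 0]]
    print(kr20(demo))
    print(spearman_brown(0.8))
```

Coefficients from either function can then be compared with the .80 to .90 range cited above.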
In addition, one would conclude that the measure of constancy of the reliability is high.
It was particularly noteworthy that the coefficient alpha reliabilities (KR-20) were
higher than the test-retest correlations, which was predictable as a result of the high
8.5.1.2 Validity of the SPM test
A) Construct Validity
This was examined through two analyses. The first was factor analysis. The SPM is considered by
distinct from other kinds of intelligence such as verbal knowledge, memory and spatial
ability. Cross-cultural studies, also, confirm the high ‘g’ saturation of the SPM. Some
factor analytic studies, however, suggest that the SPM measures other factors such as
visuo-spatial or ‘K’ factors, spatial ability, or memory, as well as a large ‘g’ factor
(Raven et al., 1977). A number of scholars have contended that while the Progressive
factor. These include Adcock (1948), Keir (1949), Banks (1949), Vernon (1950), Gabriel
(1954) and Gustafsson (1984, 1988), who concluded that the SPM measures a reasoning
factor and a further factor that he called “cognition of figural relations”. Hertzog and
Carter (1988) have contended that the SPM contained two factors: verbal intelligence and
spatial visualization. Lynn, Allik & Irwing (2004) identified a general factor and three
further factors that they reported as the gestalt continuation found by van der Ven and
Ellis (2000), verbal-analytic reasoning and visuospatial ability. Further analysis of the
Whatever the number, the evidence relating to factors other than “g” is, according to
Jensen (1980), inconclusive and dubious. He reported that the PM measures “g” and little
else, and that the loadings occasionally found on other “perceptual” and “performance”
type factors, independently of “g” are usually trivial and inconsistent from one analysis to
another. In fact, the PM has very meagre loadings on these factors, when “g” is excluded.
Anastasi (1982), on the other hand, stated that the PM is heavily loaded with a factor
psychologists) but that spatial aptitude, inductive reasoning, perceptual accuracy, and
The outcome of the factor analysis in this study showed the presence of only one factor,
which was Spearman’s “g”. This result was in agreement with the 1996 and 2004 SPM test
manuals, Burke and Bingham (1969), Zager et al. (1980), Abdel-Khalek (1987,
2005),
The second was internal consistency. In the present study, there were strong, positive and
statistically significant correlations between the five sets (A, B, C, D and E)
and the total score, ranging from 0.51 to 0.85. This was in agreement with Abdel-Khalek
(1987) and Abdel-Khalek (2005). Overall, construct validity showed good characteristics
B) Criterion-related Validity
This study provided evidence that SPM scores had a moderate, significant
correlation with students’ academic achievement (SAA) when the latter was used as an
external criterion. According to the SPM test manual (2004), the external
generally fall in the region of 0.26 to 0.76. Our results were in agreement with Raven et
al. (2004), Tulkin and Newbrough (1968), McLaurin and Farrar (1973), Sinha (1968),
Baraheni (1974), Sinha (1977), Maqsud (1980), Powers et al., (1986.b), Avoy (1987),
Carver (1990), Majdub (1991) and Laidra et al (2007). The results of the study showed
Nunnally (1972) and Burroughs (1975) argued that item difficulty must be established because
it is almost always necessary to present items in order of difficulty. The easiest items come
first to give a sense of accomplishment and an optimistic start; if this is not done, a blockage
may occur, with many students being unable to progress beyond the first items. The
more difficult items are placed near the end to prevent students from spending undue
Many researchers believe that a test should include some easy and some difficult
items, but that most items should fall in the 20 to 80 percent zone of easiness (Karmel,
1978). Our analysis showed that set A was the easiest set whereas set E was the most
difficult set, although set D was easier than set C (0.01 means percentage
value and 13 items and one set were not arranged in an order of increasing difficulty.
Rushton et al, (2002) and Boben et al. (2007) also showed set D to be easier than set C.
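The item difficulty checks described above can be sketched as follows. This is a minimal illustration: item easiness is taken as the proportion of examinees answering an item correctly, the 20 to 80 percent zone follows the passage above, and the function names and example responses are hypothetical rather than taken from this study.

```python
def item_easiness(responses):
    """Proportion of examinees answering each item correctly (0/1 scores)."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def outside_zone(easiness, low=0.20, high=0.80):
    """Indices of items whose easiness falls outside the low-high zone."""
    return [j for j, p in enumerate(easiness) if p < low or p > high]


def out_of_order(easiness):
    """Indices of items that are easier than the item placed before them."""
    return [j for j in range(1, len(easiness)) if easiness[j] > easiness[j - 1]]


if __name__ == "__main__":
    # Made-up responses for illustration only (rows = examinees, columns = items).
    demo = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
    easiness = item_easiness(demo)
    print(easiness)
    print(outside_zone(easiness))
    print(out_of_order(easiness))
```

Items returned by `out_of_order` are those that break an arrangement of increasing difficulty, which is the kind of departure reported above for some items and for sets C and D.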
Overall results indicated that the difficulty level of the SPM test employed in the present
8.5.2 IQ in Libya
Overall, the mean IQ obtained from the Libyan students was 81 (maximum mean 85,
minimum mean 74). The average IQ score of developing countries was 82,
whereas the average IQ score for developed countries was 95. As there was no
Libya was considered a developing country for the comparison purposes of this study.
Table 8.1 shows mean IQs for some countries in North Africa and
Table 8.1 Mean IQs and averages for some developed and developing countries

IQs of North Africans = 80.71
Location        Age     N       Test  IQ   Reference
North Africa    Adults  90      SPM   84   Raveau et al., 1976
Egypt           6–12    129     SPM   83   Abdel-Khalek, 1988
Sudan           8–12    148     SPM   75   Ahmed, 1989
Sudan           6-9     1683    CPM   81   Khatib et al., 2006
Sudan           9-25    6202    SPM   79   Khaleefa et al., 2008b
Sudan           9       3185    SPM   79   Irwing et al., 2008
Tunisia         20      509     SPM   84   Abdel-Khalek & Raven, 2006

IQs of South Asians = 83.93
Location        Age     N       Test  IQ   Reference
Bahrain         19-29   100     SPM   81   Khaleefa & AlGharaibeh, 2002
Iran            15      627     SPM   84   Valentine, 1959
Iraq            14–17   204     SPM   87   Abul-Hubb, 1972
Iraq            18–35   1185    SPM   87   Abul-Hubb, 1972
Jordan          11-40   2542    APM   86   Lynn & Abdel-Khalek, 2009
Kuwait          6–15    6529    SPM   86   Abdel-Khalek & Lynn, 2006
Oman            5-11    1042    CPM   87   Khaleefa & Lynn, 2009
Oman            9-18    5139    SPM   82   Abdel-Khalek & Lynn, 2008
Qatar           10–13   273     SPM   78   Bart et al., 1987
Qatar           6–11    1135    SPM   88   Khaleefa & Lynn, 2008d
Saudi Arabia    8-14    3967    SPM   80   Abu-Hatab et al., 1977
Syria           7       241     CPM   83   Guthke & Al-Zoubi, 1987
Syria           7-18    3489    CPM   83   Khaleefa & Lynn, 2008a
Yemen           6–11    1000    CPM   85   Al-Heeti et al., 1997
Yemen           6-11    896     CPM   83   Khaleefa & Lynn, 2008c
UAE             6-11    4496    CPM   83   Khaleefa & Lynn, 2008b

Average IQs for developing countries = 82.95

Average IQs of Europeans = 97.77
Location        Age     N       Test  IQ   Reference
Czech Rep.      5-11    832     CPM   96   Raven et al, 1995
Denmark         5-11    628     SPM   97   Vejleskov, 1968
Estonia         12/18   2,689   SPM   100  Lynn et al., 2002
Estonia         7/11    1,835   SPM   98   Lynn et al., 2003
Finland         7       755     CPM   98   Kyostio, 1972
France          6-9     618     CPM   97   Bourdier, 1964
Germany         5-7     563     CPM   99   Winkelman, 1972
Germany         11-15   2,068   SPM   105  Raven, 1981
Germany         11-15   1,000   SPM   99   Raven, 1981
Germany         6-10    3,607   CPM   101  Raven et al., 1995
Germany         5-10    980     CPM   97   Raven et al., 1995
Iceland         6-16    665     SPM   101  Pind et al., 2003
Ireland         6/12    1,361   SPM   93   Carr, 1993
Ireland         9/12    2,029   SPM   87   Carr, 1993
Ireland         9/12    2,029   SPM   91   Carr, 1993
Netherlands     5-10    1,920   CPM   99   Raven et al., 1995
Netherlands     6-12    4,032   SPM   101  Raven et al., 1996
Russia          14-15   432     SPM   97   Lynn, 2001
Slovakia        5-11    823     CPM   96   Raven et al., 1995
Slovenia        8-18    1,556   SPM   96   Raven et al., 2000
Spain           6-9     854     CPM   97   Raven et al., 1995
Spain           11/18   3,271   APM   102  Albade Paz & Monoz, 1993
Switzerland     6-10    200     CPM   101  Raven et al., 1995
Switzerland     9-15    246     SPM   104  Spicher, 1993
Turkey          6/15    2,272   SPM   90   Sahin & Duzen, 1994
United Kingdom  6-15    3,250   SPM   100  Raven et al., 1998

Average IQs of East Asians = 104.42
Location        Age     N       Test  IQ   Reference
China           6/15    5,108   SPM   101  Lynn, 1991
China           6/12    269     SPM   104  Geary et al., 1997
China           17      218     SPM   103  Geary et al., 1999
Hong Kong       6/13    13,822  SPM   103  Lynn, Pagliari & Chan, 1988
Japan           9       444     SPM   110  Shigehisa & Lynn, 1991
Taiwan          6/8     764     CPM   105  Rabinowitz et al., 1991
Taiwan          9/12    2,476   CPM   105  Lynn, 1997

Average IQs of North Americans = 97.50
Location        Age     N       Test  IQ   Reference
Canada          7/12    313     SPM   97   Raven et al., 1996
United States   18/70   625     SPM   98   Raven et al., 1996

Average IQs of Israel, Singapore & Australia = 95.78
Location        Age     N       Test  IQ   Reference
Israel          10/12   268     SPM   95   Globerson, 1983
Israel          11      2,781   SPM   89   Lancer & Rim, 1984
Israel          9-15    1740    SPM   90   Lynn, 1994
Singapore       13      337     SPM   103  Lynn, 1977b
Australia       18      6,700   SPM   100  Craig, 1974
Australia       5/10    700     CPM   98   Raven et al, 1995

Average IQs for developed countries = 98.60
Table 8.1 illustrates that the mean IQ obtained from the Libyan students (81)
was similar to the IQ values of other developing countries in North Africa and South Asia
reported by Lynn and Vanhanen (2002, 2006). This supported the validity and reliability
of the SPM test, which may be considered an appropriate measure of mental ability for
Libyan students. Lynn and Vanhanen (2006) showed the average IQ for developing
countries to be 82.95, which was similar to the value for developing
countries (82) obtained from the present meta-analysis. Similarly, Lynn and
Vanhanen (2006) showed the average IQ for developed countries to be 98.6,
which was similar to the value for developed countries (95) obtained from the
present meta-analysis; this supported the validity and reliability of the meta-analysis.
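As a small worked check, the regional figure for North Africa in Table 8.1 can be reproduced from the seven IQ values listed there. The sketch below assumes a simple unweighted mean across studies; that assumption is inferred only from the fact that it reproduces the printed figure of 80.71, and the variable and function names are illustrative.

```python
# IQ values of the seven North African samples as listed in Table 8.1.
north_africa_iqs = [84, 83, 75, 81, 79, 79, 84]


def unweighted_mean(values):
    """Simple arithmetic mean, giving every study equal weight."""
    if not values:
        raise ValueError("no values supplied")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Rounded to two decimals for comparison with the tabled average.
    print(round(unweighted_mean(north_africa_iqs), 2))
```

Because every sample counts equally here, a study of 90 adults has the same influence on the regional figure as one of 6202 students.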
It is noteworthy that some studies carried out in developed countries reported
the norms used to calculate IQ scores rather than the means. Therefore, as SPM means
were used in this meta-analysis, it was not possible to include such data.
during the last 70 years or so (Flynn, 1984, 2007; Lynn & Hampson, 1986). The reasons
for this are not fully understood. Reasons probably lie in improvements in nutrition and
education that have accompanied rising living standards (Lynn, 1990, Ceci, 1991,
Benton, 2001), and it can be anticipated that as living standards rise in North Africa and
the Middle East, abstract reasoning ability will also rise. Many people from Galton
(1869) onwards have considered that it would be desirable if intelligence could increase.
Although education appears to improve intelligence, the process by which it does this
remains unknown. Presumably, education teaches problem-solving skills which are used
in intelligence tests. Education in Sudan and other Arab countries tends to concentrate on
rote learning and memorization. In Sudan, Irwing et al., (2008) evaluated the effects of
Abacus Training in mental computation on intelligence assessed with the SPM test.
which information is stored in working memory while other mental operations are
performed, and then retrieved. The training procedure has been described by Hatano
(1977) and Hatano & Osawa (1983). Mental arithmetic is required in a number of tests of
fluid intelligence such as the Progressive Matrices. It has been shown by Carpenter, Just
& Shell (1990) that the Progressive Matrices is largely a mathematical problem solving
test in a design format, requiring the application of five mathematical rules involving
addition, subtraction, arithmetical and geometrical progression. The results suggested that
Further, schools in Libya do not promote problem solving abilities in students as well as
do those in the United Kingdom, teachers are not as well trained, and children in Libya
do not have much experience in carrying out intelligence tests (Attashan and Abdalla
2005). It is possible that the observed group differences are attributable, at least in part, to
the relative novelty of the testing process, as suggested by Stanczak et al. (2001).
Lynn & Vanhanen (2002, 2006) proposed three theories in an attempt to explain how
interaction.
The current data are consistent with all three of these. Lynn and Vanhanen presented
arguments that the third hypothesis is the most reasonable. In addition, nine principal
factors have been reported as being responsible for some groups achieving higher IQ
(1) Improvement in education: this has been the most favoured factor, proposed by
Tuddenham (1948), Flynn (1984, 2007), Teasdale and Owen (1994), Flieller (1996,
1999), Greenfield (1998), Jensen (1998), Weede & Kampf (2002), Garlick (2002), Blair,
Gamson, Thorne & Baker (2005), and Meisenberg, Lawless, Lambert & Newton (2006).
Education encompasses many aspects and can be obtained in many ways, but
it is mostly achieved by attending school. Students from developed countries are
expected to receive better schooling than their counterparts. Schools affect
promote and permit the development of significant intellectual skills, which develop to
different extents in different children. Also schooling changes mental abilities, including
those abilities measured on psychometric tests. It has been shown that students who have
been in school longer have higher mean scores, which would explain why higher SPM
scores are achieved as students’ age increases. Also, students who attend school
intermittently score below those who attend regularly (Neisser, 1995). Parents’
education also plays a significant role. Students from families with educated parents obtained
higher SPM scores than those from families with uneducated parents (Abdulla 2002).
(2) Increased test sophistication: Tuddenham (1948), Brand (1987), and Jensen (1998).
Students in developed countries take such psychometric tests from childhood and gain
some familiarity with them, whereas students from developing countries do not
usually take such tests and may show some fear when attempting them (Abdulla
2002).
(3) The greater cognitive stimulation arising from the greater complexity of more recent
environments provided by, e.g., television, media and computer games: Elley (1969),
Essawe (1973), Jensen (1998), Schooler (1998), Williams (1998), and Sundet, Barlaug &
Torjussen (2004). All these would enhance the perception and awareness of
children and improve mental abilities. In addition, cognitive ability increases with age,
probably as a result of learning and brain growth (Lynn, 2008 personal communication).
Abdalla et al., in 2002, Lynn and Irwing 2004 and 2005 studies supported the result
(4) Improvements in child rearing: Elley (1969) and Flieller (1996). Normal child
(5) More confident test taking attitudes: Brand (1987) and Brand, Freshwater & Dockrell
(1989). Usually students in developing countries do not have much experience of taking
(2001), Lynn et al., 2008. In addition, students in developing countries are usually
apprehensive about and afraid of tests. Also, older students would have more confidence
in attempting tests than younger students. This is a very important point. Students
with more experience and confidence would logically score higher on the test, even
though their mental ability might not be higher. This factor might be one of the causes of
(6) The “individual multiplier” and the “social multiplier” (Dickens & Flynn, 2001;
Flynn, 2007). The concept of the “individual multiplier” is that intelligent individuals
have a thirst for cognitive stimulation and this increases their intelligence through
positive feedback. The “social multiplier” posits “that other people are the most important
feature of our cognitive development and that the mean IQ of our social environs is a
potent influence on our own IQ” (Flynn, 2007). This would imply that children brought
up in a university town should have higher intelligence than those without this advantage,
because the high intelligence of the professors will enhance the intelligence of the
population.
(7) Improvements in nutrition: Lynn (1990a, 1993, 1998), Jensen (1998), Colom, Lluis-
Font & Andres-Pueyo (2005), and Arija, Esparo, Fernandez-Ballart et al. (2006).
Prolonged malnutrition during childhood has long-term intellectual effects. The effects
may well be indirect. Malnourished children are typically less responsive to adults, less
motivated to learn, and less active in exploration than their more adequately nourished
(8) Smaller family size (Sundet, Borren & Tambs, 2008). Smaller families mean a smaller
economic burden. Parents would be able to provide better education and nutrition and to
meet their children’s needs. Child rearing would be easier and more focused. In the United
States and Europe it has invariably been found that the relation between intelligence and
family size is negative, i.e. children with large numbers of siblings have lower IQs than
children in small families (Abdel-Khalek & Lynn, 2008). Moreover, Lynn (1996) summarized
the results of 17 studies that reported this negative relationship. The correlations varied
between -0.19 and -0.34, with an average of -0.26. A theory to explain these results, positing
that family size has causal effects on intelligence, was advanced by Lynn (1959). This theory
proposed that parents give more attention to children in small families and this enhances
children’s intelligence.
Two theories have been advanced to explain these results. These are:
• The confluence theory of Zajonc (1976, 1983, 2001a) states that the child’s IQ
is partly determined by the attention the parents and siblings give to it. This
explains the negative relation between family size and intelligence, because the
smaller the number of children in the family, the greater the amount of attention
they are likely to receive from their parents. The result of this will be that children
in small families will have higher average IQs than those from large families.
• The resource dilution theory of Blake (1981) and Downey (2001) proposes that
“parental resources are finite and that as the number of children in the family
increases, the resources accrued by any one child necessarily decline” (Downey,
2001). The theory is similar to the confluence theory but broader in so far as it
material, financial and cultural quality of the home, parental treatment of children,
in so far as it purports to explain the negative relation between sibship size and
(9) Heterosis: Jensen (1998, p.327) suggested heterosis (hybrid vigor) as a possible
contributor to the Flynn effect. Heterosis is the mating of two individuals from
different ancestral lines i.e. the marriage of two individuals that are from different
or Asian American. Jensen argued that this is widespread in the United States as a result
of immigration from many different countries. Mingroni (2004) has argued further
for this theory.
The author agrees with the above mentioned factors and stresses the importance of
education as a major factor. In addition, the economy plays a pivotal role. IQ scores are
countries will increase by about 3 points a decade with further economic development
The above mentioned factors explain why the IQ of students from developed
countries is higher than that of their counterparts. Students from developed countries have
environmental advantages from better nutrition, health, education, and sometimes smaller
family size.
On the other hand, human intelligence, like height, is influenced by numerous genetic
evidence of genetic factors associated with IQ, but the extent is still controversial. In
environment. This has been shown in twin studies and adoption
The Progressive Matrices is a useful test to examine sex differences in intelligence. The
issue of whether there are any sex differences on the Progressive Matrices has frequently
been discussed and it has been virtually universally concluded that there is no difference
in the mean scores obtained by males and females. This has been one of the major
foundations for the conclusion that there is no sex difference in reasoning ability or in g,
The first statement that there is no sex difference on the test came from Raven himself
who constructed the test and wrote that in the standardisation sample “there was no sex
difference, either in the mean scores or the variance of scores, between boys and girls up
to the age of 14 years. There were insufficient data to investigate sex differences in
ability above the age of 14” (Raven, 1939, p.30). The conclusion that there is no sex
The results of the present study and meta-analysis supported this hypothesis and were in
agreement with previous studies of Eysenck (1981), Court (1983), Mackintosh (1996),
Jensen (1998), Rushton et al. (2002), Pind et al. (2003), Lynn et al. (2004), Abdel-Khalek
and Lynn (2006), Taylor (2007), Khaleefa and Lynn (2008), Khaleefa et al. (2008),
Ahmad et al. (2008) and Abdel-Khalek and Lynn (2009). They examined the hypothesis
that there is no gender difference on the Progressive Matrices and that, as Mackintosh
(1998a) put it, the gender difference on the Progressive Matrices is “0.15 to 2.1 IQ points
The assertion that there is no gender difference in average general intelligence has been
made repeatedly since the early decades of the twentieth century. Terman (1916) and
Spearman (1923) asserted that there is no gender difference in g. Jensen (1998) calculated
gender differences in g on five samples and concluded that, “no evidence was found for
gender differences in the mean level of g”. Similarly “there is no gender difference in
Some studies found no sex differences in SPM scores for younger subjects, e.g.
Tulkin & Newbrough (1968) with fifth and sixth grade students; Powers et al. (1986.b)
with sixth and seventh grade students; Sidles and Avoy (1987) with seventh grade
students; and Persaud (1987) and Zeidner (1988) with seventh grade students. Sex differences
in Libya are similar to those found in many economically developed countries, i.e. there
are no significant differences at the ages of 8 and 9 years. Girls obtained a significantly
higher mean than boys at the age of 10 years, supporting the developmental theory that girls
mature more rapidly than boys at this age, advanced in Lynn (1994, 1999, 2004, 2005).
At 11 years, males’ scores were significantly higher than females’ scores. At 12, 13 and 14
years, there were no differences in SPM scores between males and females. At the ages
of 15 through 17, boys obtained consistently higher means than girls, and these higher
means were statistically significant. This again supports the developmental theory that
boys obtain higher average means at these ages. These age trends are consistent with
numerous studies from western developed countries, such as Irwing and Lynn (2005). At
These are interesting results because they show that sex differences in Libya are similar
sometimes been made that girls in traditional societies are socially handicapped and this
impairs their intellectual development, and that as females have become more
their cognitive abilities improve. This theory receives no support from the present results.
This significant gender by age interaction is explained by Lynn (1994) and Lynn &
Irwing (2005). It arises because boys and girls mature at different rates. Boys and girls have
the same development and IQ up to about 11 years. Then girls accelerate in the “growth
spurt”. Then, at about age 16, girls cease to grow but boys continue to grow physically
In the present study, the gender difference in variability (Vr) in the total sample and within
each age, geographic area and academic discipline can be seen from the standard
deviations and variance ratios. At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20 years,
females have greater variability than males. In the total sample and at the ages of 11, 16, 19
and 21 years, males have greater variability than females (note that a Vr greater than 1.0
indicates that males have greater variance than females, while a Vr less than 1.0 indicates
that females have greater variance than males). Concerning geographic areas, results
showed that males have greater variability than females in the total sample and in each
geographic area. Regarding academic discipline, results showed that females have greater
variability than males in the total sample and in each academic discipline.
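The variance ratio can be illustrated with a brief Python sketch. It assumes that Vr is the male variance divided by the female variance, which is consistent with the interpretation given above (values above 1.0 mean greater male variance) but is not stated as a formula in the text. The function names and the example scores are hypothetical.

```python
from statistics import variance


def variance_ratio(male_scores, female_scores):
    """Vr = male sample variance / female sample variance (assumed definition)."""
    return variance(male_scores) / variance(female_scores)


def variance_ratio_from_sd(sd_male, sd_female):
    """The same ratio computed from reported standard deviations."""
    if sd_female == 0:
        raise ValueError("female standard deviation must be non-zero")
    return (sd_male / sd_female) ** 2


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from this thesis.
    males = [10, 20, 30]
    females = [15, 20, 25]
    print(variance_ratio(males, females))
    print(variance_ratio_from_sd(10.0, 5.0))
```

The second function is convenient for meta-analytic work, where studies usually report standard deviations rather than raw scores.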
With regard to variance in the meta-analysis, there were small differences between males
and females in total sample, in favour of males. In the different age groups the variability
was also small except in the 15-17 age groups, in favour of males. In addition, females
had greater variability than males in developed countries. The age groups 12-14 and 18-
showed small variability in favour of males. It has been repeatedly asserted that males
have greater variability of IQs than females, but there are a number of contrary studies.
The present study and meta-analysis results add to these in showing no consistent sex
development status, this study showed a large variance in favour of developed countries
in all age groups and in total sample. These overall results showed no consistent tendency
Gender differences in variance were examined because it has frequently been contended
that males have greater variability than females. This assertion was made in the early
years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and Terman
(1916). This difference in variability was proposed by these early writers to explain why
men are so greatly over-represented among geniuses. As there was no sex difference in
general intelligence, a greater variability among males entailing more males among those
with very high intelligence (as well as more males with very low intelligence) was
Thorndike (1910) put the theory as follows: “The trivial difference between the central
tendency of men and that of women which is a common finding of psychological tests
and school experience may seem at variance with the patent fact that in the great
achievements of the world in science, art, invention, and management, women have been
by far excelled by men. One who accepts the equality of typical representatives of the
two sexes must assume the burden of explaining this great difference in the high ranges
within the male”. Thorndike examined test data on variability and concluded that men are
Terman (1916) also discussed the question and wrote that “it is often said that women are
grouped closely around the average, while men show a wider range of distribution”.
However, in his data for 1000 children aged 6 to 14 years he found no difference between
boys and girls in variability. The greater male variability was reaffirmed by Eysenck
(1981, p. 42) and recently by Deary, Irwing, Der and Bates (2007). However, not all
studies have found greater male variability, including a meta-analysis of the performance
of college students on the Progressive Matrices by Irwing and Lynn (2005). This study
showed that there was no consistency in variability between males and females in SPM
found between cities and villages, or between coastal, mountain and desert villages, or
between main and secondary cities. This can be attributed to the urbanisation process in
Libya. According to the first National General Census of 1954, only 25% of the
total population was classified as urban settlers. However, within just four decades the
proportion of urban population had increased substantially to 90% of the total population
(Figure 1).
This dramatic and rapid increase in the urban population at the expense of the rural
population has led some analysts to classify Libya as one of the most urbanised
countries in the world (Kezeiri, 1995). This situation has also affected the specific
already been replaced by an urban lifestyle. Many rural populations are now engaged in
urban lifestyles, including urban jobs and occupational activities, and using modern household
number of analysts have pointed out that the nature of rural areas and communities is
now being replaced by urban features (Attir and Al-Azzabi, 2002; Kezeiri, 1995). This
present study failed to detect significant differences between rural and urban students.
Both urban and rural students have similar schools, level of teacher training and
facilities. Moreover, all mainstream level schools in Libya follow the same national
curriculum. This fact can be directly associated with a similar level of cognitive
Flynn effect stated that IQ is directly related to education. As both rural and urban
students were receiving the same level of education, no differences in IQ were detected.
For the purposes of this study, age was equivalent to study level. Statistically significant
differences in SPM mean scores were found. In the main study, analysis showed that the
British percentile equivalents of the means of the ages combined on the British norms for
the SPM collected in 1979 and given in Raven (1981) are the 16th PC for the 8 year olds
(IQ=85), the 13th PC for the 9 year olds (IQ=83), the 8th PC for the 10 year olds (IQ= 79),
and average the 6.7th PC (IQ= 79.4) for the 11-17 year olds. The American percentiles
percentile equivalents are the 9th PC for the 18 year olds (IQ=80), the 11th PC for the 19
and 20 year olds (IQ=82), the 4th PC for the 21 year olds (IQ=83), and on average the
8.75th PC (IQ=81.75). Overall, the IQs obtained by the Libyan students ranged between
74 and 85. The average IQ for the fourteen tested Libyan age groups 8 through 21 was
81.
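The percentile-to-IQ equivalents reported above for the school-age groups can be checked with a short sketch. It assumes the conventional IQ metric (mean 100, standard deviation 15) and normally distributed scores in the norming sample; the source does not state the conversion method, so this is an illustration under those assumptions rather than the procedure actually used. The function name is hypothetical.

```python
from statistics import NormalDist

# Assumed IQ metric: mean 100, SD 15 on the reference (British) norms
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile):
    """Convert a percentile rank (0-100, exclusive) to an IQ on the assumed scale."""
    return IQ_SCALE.inv_cdf(percentile / 100)


# Percentile and IQ pairs reported in the text for ages 8, 9 and 10
reported = {16: 85, 13: 83, 8: 79}

for pc, iq in reported.items():
    print(f"{pc}th percentile: computed {percentile_to_iq(pc):.1f}, reported {iq}")
```

Under these assumptions the computed values round to the IQs given in the text for the 16th, 13th and 8th percentiles.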
Similarly, in the meta-analysis, older students achieved higher SPM scores than younger
students. (8-11 age group IQ 91, 12-14 age group IQ 87, 15-17 age group IQ 89, 18-21
As the age of the students increased, the study level naturally increased. All tested students
in a given grade were of the same age, e.g. all tested 3rd grade students were 8 years
of age. This was done to ensure that all students had the same academic experience, since re-sit
students usually have more academic experience than first-time students.
These results were in agreement with other studies. Abdalla et al. (2002) and Lynn and
Irwing (2004, 2005) also reported that IQ scores increased with
age. It is suggested that cognitive ability increases with age, probably as a result of the
In addition, greater cognitive stimulation arises from the greater complexity of more
recent environments provided by e.g. television, media and computer games: Elley
(1969), Jensen (1998), Schooler (1998), Williams (1998), and Sundet, Barlaug &
Torjussen (2004), Essawe (1973). All these would enhance the perception and awareness
In a representative sample for the entire population from childhood to adulthood one
would expect to find a progressive increase in the SPM scores with age groups. Previous
studies reported the increase of SPM scores with younger subjects e.g. Baraheni (1974),
Sinha (1977), Pind et al. (2003), Lynn et al. (2004) and Khelefeeh and Lynn (2009).
that there was a tendency for the SPM scores to vary inversely with age especially 15, 16
and 17 years. Burke and Bingham (1969) found that performance on the SPM was
negatively related to age in a sample of 91 patients aged from 19 to 59 years.
Also, Byrt and Gill (1973), who standardized the SPM test in Ireland, concluded that
intelligence does not remain constant from age 15 throughout adulthood but rises and
falls in different groups depending upon education, training or intellectual activities which
In Iran, Baraheni (1974) reported that intellectual functions tapped by the Progressive
Matrices reached a maximum level in an Iranian group by age 15 and that at a higher age
level the test failed to differentiate age groups. Burke (1985) found that SPM scores
decreased with increasing age; his result was based on the screening of 500
vocational counselling and 2992 psychiatric patients. Finally, in a study carried out in
Jamaica, Persaud (1987) suggested that the decline in the intellectual capacity of women
on the SPM from the age of 26 years onwards can be attributed to age.
An interesting finding in this study was that there was an increase in SPM scores until 19
years of age. After that, an almost steady plateau in SPM results until 21 years of age was
found; there were no differences in SPM scores after 19 years of age. This was consistent
with numerous SPM data sets reviewed in Raven (1939), Raven (1941), Raven (1986),
Raven (1989), Raven, et al., (1995), Raven, et al., (1996), Raven, et al., (1996a), Raven,
(1998), Raven, et al., (2000). Thus, fluid intelligence reached its plateau around the age
of 20.
mean scores in favour of the scientific academic discipline in all four university study
levels. This may be attributed to the familiarity of science students with some courses in
science discipline which deal with abstract reasoning. One of the major problems in the
education system in Libya, particularly in the art discipline, is that the method of learning
in this academic discipline relies heavily on rote memorisation, and little attention is paid
to reasoning or abstract thinking. It seems that rote learning is a factor that the SPM
The findings of this study are similar to those of Shanthamani (1970), who found that science
students scored higher than art students on Alexander’s Battery for intelligence. They also
agree with Sinha (1977), who found that science students scored higher on the SPM in
an Indian sample, and with the unpublished data of Attashani and Abdalla (2005).
According to the SPM test manual (2004), the external criterion commonly adapted in
correlations with academic achievement tests generally fall in the region of 0.20 to 0.60
(Raven et al., 2004). This study showed a correlation of 0.33 to 0.56. This was in
agreement with Tulkine and Newbrough (1968), Mclaurin and Farrar (1973), Sinha (1968),
Baraheni (1974), Sinha (1977), Maqsud (1980), Powers et al. (1986b), Avoy (1987),
Carver (1990), Majdub (1991) and Laidra et al. (2007). The average correlation of these
studies and others was found to range between 0.37 and 0.49 (see table 4.6). A possible
explanation is that of Andrich and Styles (1994), who held that the Progressive
Matrices test contains material not taught directly in schools and yet shows substantial
The results of this study showed that age and achievement were predictors of SPM results,
with age being the best predictor. As age and achievement increased, SPM results
increased. Similarly, in the meta-analysis, results showed that SPM score means were
predicted by age and development status, with age again the best predictor. SPM scores
increased as age increased and as development status improved. Our results were in
agreement with previous studies carried out by Pind et al. (2003) and Taylor (2007). This
confirms earlier results that gender and region in the main study and gender in the meta-
A number of studies have indicated that students from developing countries performed
less well than students from developed countries on the SPM test. According to the SPM
(1996) manual, an Australian study by de Lemose (1989) noted a tendency for students
from non-English speaking cultures, such as Southern / Eastern European and Middle
Raven et al., (2004) reported that some groups lagged behind the British norms such as
groups from Brazil, Ireland and black and Native Americans within the USA. In all
countries, the norms of children from less privileged socio-economic backgrounds and rural
areas are lower than those of their counterparts. They added that the explanation most commonly
offered for these differences was that the test did not engage the concerns of people from
unfamiliar to them.
The difference in scores at each percentile between the Libyan students and
the British sample at age 13 ranged from 7 to 14 points: 7 points at the
95th percentile, 10 points at the 90th percentile, 9 points at the 75th percentile, 10 points at the 50th
percentile, 12 points at the 25th percentile, 14 points at the 10th percentile and 13 points at the 5th
percentile. For example, if a Libyan student aged 13 years scored 33 on the SPM test, he would
be at the 50th percentile according to the Libyan norms. However, according to the
SPM manual (1988, 1996 and 2008) he would be at the 10th percentile of the British
norms. Also, if a Libyan student aged 14 years scored 47, he would be at the 95th
percentile of the Libyan norms but at the 50th percentile according to the Slovenian, Australian
and British norms. These two examples illustrate the misuse and misinterpretation of
intelligence tests now used in Libya, owing to the use of standardised western norms instead
of local norms (please refer back to chapter three for more discussion).
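The practical consequence of the first example can be made concrete by expressing the same raw score as an IQ under each set of norms. This is a minimal sketch that assumes the conventional IQ metric (mean 100, standard deviation 15) and normally distributed scores within each norm group; only the two percentile ranks come from the text, and the function and variable names are hypothetical.

```python
from statistics import NormalDist

# Assumed IQ metric, applied separately within each norm group
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_from_percentile(percentile):
    """IQ equivalent of a percentile rank on the assumed scale."""
    return IQ_SCALE.inv_cdf(percentile / 100)


# A 13 year old scoring 33 on the SPM (percentile ranks as stated in the text)
libyan_norm_pc = 50   # rank against Libyan norms
british_norm_pc = 10  # rank against British norms

iq_local = iq_from_percentile(libyan_norm_pc)
iq_british = iq_from_percentile(british_norm_pc)
gap = iq_local - iq_british

print(f"Local norms: {iq_local:.0f}, British norms: {iq_british:.0f}, gap: {gap:.1f}")
```

The same child is therefore described as average or as well below average depending solely on which norm table is consulted, which is the misinterpretation the paragraph warns against.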
The lower scores of the Libyan sample in the SPM test with respect to developed
countries norms were expected. All studies conducted in developing countries determined
that individuals from developed countries scored higher than individuals from developing
countries in the SPM test. The meta-analysis which was conducted in chapter seven in
this study revealed that there was a significant difference between students from
developed countries and students from developing countries in the SPM mean scores (df
rearing, social income, confidence in test taking, family size, the “individual multiplier”
and “social multiplier” and heterosis. In addition, amount of previous familiarity with test
material and testing situation may have had a role. For almost all of the Libyan students
Regarding education in Libya, the 2002 Human Development Report for Libya noted an
obvious deficiency in teaching skills among teachers. The average is 30 or more students
per teacher. Also, school buildings and facilities were deemed outdated and inappropriate
for carrying out the teaching process. This reached a maximum of about 70% of schools
in some places. Up-to-date computer programs were not available in 89% of schools
(p. 327). Nutrition in Libya shows a lack of strategic planning at the national level, with
a heavy dependence on imported food (p. 378). According to the General Authority of
Information, in 2006 the average family size in Libya was 6 individuals; 18% of families
contained more than 10 individuals, whereas 50% of families had more than 5
individuals. The average income in Libya was 2618 Libyan Dinars (equivalent to £1300)
per year. Also, traditions in Libya dictate that marriages take place within the
The percentile ranks of the SPM scores for the Libyan sample in this study emphasized
the need for separate norms for age groups, male and female students and art and science
discipline students.
In this chapter we have examined and evaluated the findings of this study. The aim was to
adopt a mental ability test suitable for a Libyan population. The lack of such complete
and useful means of testing in the third world generally, and in Libya in particular, is
a sufficient indication of the importance of this research study. As stated in section 7.2, the
mental tests currently employed in Libya share the feature of incompleteness: they do
not cover the whole range of test items that they are meant to cover. As a solution to this
problem, the current study presents the SPM test as an alternative. Its psychometric
characteristics place it at the top of the list of appropriate intelligence tests for Libya.
Since the whole study is made up of two parts: main study and meta-analysis, the
1. It showed that intelligence measured by the SPM has validity in a new country
(Libya) in which the SPM has not been used until now.
2. The overall SPM mean score for the Libyan sample was 32.31 with a standard
deviation of 11.94 (minimum score 6 and maximum 58). This was lower
than that of students from developed countries but similar to that of students from developing
countries.
3. The average IQ was 81 across the fourteen Libyan age groups from 8 to 21.
4. No significant gender differences were found in SPM mean scores in the total sample or
at ages 8, 9, 12, 13, 14 and 18 through 21. However, females obtained
significantly higher SPM means than males at the age of 10 years, whereas males scored
significantly higher means than females at the ages of 11 and 15 through 17. In
addition, there were no significant gender differences in total means or in the means of
each region, nor in total means or in the means of each discipline
(science and art). Thus, the gender variable was not an
important factor affecting the Libyan students’ scores on the SPM test.
Thus, the region variable was not an important factor affecting the Libyan students’
7. Significant differences were found between the SPM scores based on age as well as
study levels. Thus, age and study levels variables were important factors affecting
8. Students from the science academic discipline had significantly higher SPM mean
scores than students from the art discipline. Thus, the academic discipline was an
important factor affecting the Libyan students’ scores on the SPM test.
9. All correlation coefficients between SPM and students (SAA) were statistical
10. Age and achievement were predictors for SPM results with age being a better
B) Meta-analysis conclusions:
1. The SPM test was valid in a different culture (Libya) from economically developed
western nations.
2. Developed countries achieved higher SPM scores than developing countries and than
Libya. No statistically significant differences were found in SPM scores between Libya
and developing countries. Thus development status was concluded as being an important
3. The IQ score was 95 for developed countries and 82 for developing countries.
4. SPM scores increased as age increased. In addition, SPM scores of the age groups
were statistically different based on development status but not different based on gender.
addition, no gender differences were found among the age groups or development status.
7. Age and development status were predictors for SPM results. Age was a better
predictor.
8.7 Study contributions
The following are the contributions of this study to intelligence testing in Libya:
• Providing norms for the (SPM) test for use, in conjunction with examination
• Providing the means to estimate levels of intelligence since our society lacks these
gender, age groups and different locations such as rural and urban areas.
This study was carried out to standardize a British mental ability test by administering
Raven's Standard Progressive Matrices (SPM) to a sample of school and
university students (aged 8 to 21 years) from the eastern province of Libya during the years
2007 to 2009, in order to provide an intelligence test that best suits a Libyan setting.
It should be taken into account that the goal was not to change or underestimate the
achievement, but to offer researchers and psychologists a mental ability test to be used in
procedures.
general cognitive ability) that the SPM measures such as social intelligence, emotional
scientist. All these elude the ‘g’ straightjacket. Also, IQ tests do not measure intelligence
directly but those qualities that are thought to reflect it. As a consequence, within each
IQ tests are criticised on a number of other levels. For example, they are validated
primarily in terms of their correlation with educational achievement. But this ignores the
opportunity, and motivation. Another interesting phenomenon is the fact that a person can
(people who score high on one such subtest are likely to be above average on others as
well), individuals rarely perform equally well on all the different kinds of items included
in a test of intelligence. One person may perform relatively better on verbal than on
spatial items, for example, while another may show the opposite pattern.
These complex patterns of correlation can be clarified by factor analysis, but the results
of such analyses are often controversial themselves. Spearman has emphasized the
importance of a general factor, “g”, which represents what all the tests have in common,
while Thurstone focused on more specific group factors such as memory, verbal
intelligence on test scores alone is to ignore many important aspects of mental ability.
Other mental abilities defined broadly but not measured by intelligence tests include
Proponents of general intelligence posit that intelligence is innate, heritable, single and
measurable, and that it does not change, nor is it affected by culture or environment. The
evidence based testing of “g” using standardized tests validates its use as a reliable
predictor of student success. There is a huge amount of evidence that “g” is a reliable
1992; Lynn & Vanhanen, 2002, 2006; Mackintosh, 1998b).
This suggests that all mental tests of cognition (verbal, mathematical, spatial visual, and
memory) measure “g”, a similar single factor. It is the “g” factor that makes mental tests
intelligence, proponents of plural intelligences (Gardner, 1983, 1993, and 1995) suggest
“g” measures only verbal-linguistic and mathematical-logical intelligences, omitting
argued that intelligence is comprised of three abilities; and Gardner’s original theory
multiple intelligence (MI) theory posits intelligence is plural, culturally bound, varies in
strength, develops at various rates, and is immeasurable using psychometric tests. His
work with retarded and savant children and adults with brain damage led to the
Group comparisons of IQ are problematic. Attempts have been made to make ‘culture-
fair’ or ‘culture-free’ tests, as if such a thing were possible, to allow comparisons of ‘g’
between people from very different societies. But “culture fair” is not valid in all settings
in which the SPM was conducted. When Lev Vygotsky tested Russian peasants back in
the 1930s, he found that answers that seemed logical to an urbanite were responded to
It has become well established that intelligence has increased in a number of countries
during the last 80 years or so. An early study by Tuddenham (1948) reported that the IQ
of American conscripts increased by 4.4 IQ points a decade over the years 1917-1943.
Subsequent studies confirmed that IQ increases have occurred in the United States,
Scotland, England, Japan and several countries in continental Europe (Scottish Council
for Research in Education, 1949; Cattell, 1951; Lynn, 1982; Flynn, 1984, 1987, 2007;
Lynn & Hampson, 1986; Lynn, Hampson & Mullineaux, 1987). Most of these IQ
increases have been reported in the economically developed nations and very few
2007), Dominica (Meisenberg, Lawless, Lambert & Newton, 2005), Kenya (Daley,
Whaley, Sigman, Espinosa & Neuman, 2003), and Sudan (Khaleefa & Lynn, 2009).
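To give a sense of scale, the rate reported by Tuddenham (1948) can be turned into a cumulative gain over the period he studied. This is a back-of-envelope sketch that assumes the gain was linear across the whole interval, which the source does not claim; the function name is hypothetical.

```python
def cumulative_gain(points_per_decade, start_year, end_year):
    """Total IQ gain over a period, assuming a constant (linear) rate of increase."""
    decades = (end_year - start_year) / 10
    return points_per_decade * decades


# Rate and period as cited in the text for American conscripts
total = cumulative_gain(4.4, 1917, 1943)
print(f"Approximate cumulative gain: {total:.1f} IQ points")
```

Gains of this size are why norms collected decades earlier can misplace present-day test takers, a concern that also applies when older British norms are applied to new samples.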
In recent years, it has been noticed that the SPM test fails to discriminate above
the 75th percentile among adolescents and young adults living in societies with a tradition
of literacy. This is due to the dramatic and unexpected international increase in
SPM scores over the years. It is most evident in societies where individuals have been
tested with the SPM several times and are acquainted with such tests. As our tested
sample in Libya had not taken the SPM test before and had no past experience
with mental testing, the SPM test was deemed appropriate for use in this situation. Also,
the ceiling effect exhibited in the developed countries tested was not evident in developing
countries. The highest score obtained in our sample was 58 correct items out of 60. A
ceiling effect means that a number of test takers get all the answers right and have
therefore reached a ceiling. It can be inferred that they would have been able to answer
more difficult items correctly. Ceiling effects have been observed in the Progressive
Matrices as average scores have increased during the last 70 years and increasing
proportions have reached the ceiling. To deal with this problem, Raven has added some
more difficult items to the Standard Progressive Matrices in a new version called the
matrices test in Libya. Many difficulties were faced during this process, in which
difficulties. The wide geographic area covered in this study and the large financial obligations were
not easy to meet. In this respect, the researcher would like to suggest the following:
1. It is hoped that the results of this study will help Libyan researchers and psychologists to
develop a better understanding of mental tests and their use, misuse and limitations
and to stop testing and labelling of children according to scores and norms obtained
effort will stimulate similar studies in the area of psychological testing in Libya today
2. The results of this study are encouraging enough to start the testing movement in
Libya by conducting more studies and adapting more psychological tests. Culture
fair tests for intelligence such as the SPM test which were constructed in developed
Therefore, because of the need for psychological tests, at least one test in each of the
specialized psychological department in the Ministry of Education and Ministry of
3. Due to the significant differences noticed in this study between students according to
academic discipline and age, it is recommended to use separate norms for each group.
measure of mental ability. Since the SPM test is considered a measure of nonverbal
ability, it should always be used in conjunction with other tests of verbal
5. This study indicated that the SPM has high reliability and validity. Therefore, it
seems that the SPM is capable of identifying higher achieving students and thus can
6. It is recommended to use the SPM test in Libya to identify gifted students, and
students with low mental ability or with low academic achievements. The SPM has
been shown to be one of the best predictors of both high and low educational
attainments (Brody, 1992; Lynn & Vanhanen, 2002, 2006; Mackintosh, 1998b). It
education has now largely superseded this approach. It is argued that the advantage of
this is that gifted students can be given accelerated education. Conversely, students
with low mental ability or with low academic achievements can be identified and put
in classes for slow learners and taught at a slower pace suitable for their ability.
7. As Libyan children fail to develop reasoning skills while they are in school, as
compared with British children, it may be that the solution to this problem would be
The SPM is also used for job selection, i.e. to identify those with the ability to perform
well in cognitively demanding occupations, and could usefully be introduced in Libya for
this purpose.
This study has provided a useful basis for further studies. Based on the limitations and the
findings of this study the following related topics are recommended for further research:
1) Carry out the SPM test on age groups that were not tested in this study;
norms for these groups not tested in the present study, especially for a
representative sample of adults of different ages and gender. This would be useful
for job selection, and to see whether among adults men have a higher
(2004).
2) Standardize other mental tests in Libya, such as the SPM Plus, the Coloured Progressive
Matrices (CPM) and the Advanced Progressive Matrices (APM). A standardization of these tests would
provide additional useful norms for Libya. The CPM is suitable for
young children aged 5-10 years, and the SPM Plus and the APM are
more difficult versions of the test suitable for people ranking at the top of the
intelligence tests studies. These tests are based on plural intelligence theories such
as Gardner’s theory.
3) Study the effect on SPM results of other factors, such as parents’ occupations, family size, parents’
education, birth order and experience with the test. The collection
of data on these would provide useful information about the correlates of
intelligence in Libya.
4) Design and develop a mental test in Libya that best suits the local
environment. It would be useful to obtain data for Libya for other kinds of
References:
Ahmad, R. K., S.J., Z. And L, R (2008). Gender differences in means and variance
on the Standard Progressive Matrices in Pakistan . Mankind Quarterly, 49,
50-57.
Ahlam (2003) evaluate the relationship between intelligence and high school
students’ academic achievement. University of Omar El-Mukhtar [in
Arabic].
Aiken, L. (1988). Psychological Testing and Assessment. Boston, Allyn and Bacon,
Inc.
Attashani S. and Abdalla Saleh (2005). Analysis mores of the study and effect extent
of this mores by collection from factors of personality, family and academic
achievement with students of university sample. University of Omar El-
Mukhtar [in Arabic].
Arija, V. Esparo, G., Fernandez-Ballart, J., Murphy, M.M., Biarnes, E. & Canals, J
(2006). "Nutritional status and performance in test of verbal and non-verbal
intelligence in 6 year old children." Intelligence 34: 141-149.
Arthur, W. A. D., D. (1994). "Development of a Short Form for the Raven APM
Test." Educational and Psychological Measurement 54: 394-403.
Arthur, W. A. W., D. (1993). "A Confirmatory Factor Analytic Study Examining the
Dimensionality of Raven’s Progressive Matrices." Educational and
Psychological Measurement 53: 471-478.
Baraheni, M. (1974). "Raven’s Progressive Matrices as Applied to Iranian Children."
Educational and Psychological Measurement 34: 983-988.
Blair, C. Gamson, D., T., S. and B., D (2005). "Rising mean IQ: Cognitive demand
of mathematics education for young children, population exposure to formal
schooling, and the neurobiology of the prefrontal cortex." Intelligence 33:
93 -106.
Blood, D. A. B., W. (1972). Educational and Evaluation. New York, Harper and Row
Publishers.
Bocéréan, C. Fischer, J-P., & Flieller, A. (2003). "Long term comparison (1921-2001)
of numerical knowledge in 3 to five and a half year old children." European
Journal of Psychology of Education 18: 405-424.
Brand, C. R. (1987). "Bryter still and bryter?" Nature 328: 110.
Brand, C. R., Freshwater, S. & Dockrell, W.B. (1989). "Has there been a massive
rise in IQ levels in the West? Evidence from Scottish children." Irish
Journal of Psychology 10: 388-393.
Carpenter, P. J., M. and Shell, P (1990). "What One Intelligence Test Measures: A
Theoretical Account of Processing in (SPM) Test." Psychological Review
97: 404 - 431.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies.
New York, Cambridge University Press.
Carver, R. (1990). "Intelligence and Reading Ability in Grades 2-12." Intelligence 14:
449-455.
Ceci, S. J. (1991). "How much does schooling influence general intelligence and its
cognitive components? A reassessment of the evidence." Developmental
Psychology 27: 703-722.
Chan, J. (1982). The Use of Raven’s Progressive Matrices Test in Hong Kong. 20th
International Congress of Applied Psychology. Edinburgh, Scotland.
Colom, R., Flores-Mendoza, C.E. & Abad, F.J. (2007). "Generational changes on the
Draw-a-Man test: a comparison of Brazilian urban and rural children tested
in 1930, 2002 and 2004." Journal of Biosocial Science 39: 79-89.
Cotton, S. M., Kiely, P.M., Crewther, D.P., Thomson, B., Laycock, R. & Crewther,
S.G, (2005). "A normative and reliability study for the Raven’s Colored
Progressive Matrices for primary school aged children in Australia."
Personality and Individual Differences 39: 647-660.
Court, J. (1983). "Sex Differences in Performance on Raven’s Progressive Matrices:
A Review." The Alberta Journal of Educational Research 29: 54-74.
Cronbach, L. (1970). Essential of Psychological Testing. New York, Harper and Row
Publisher INC.
Cronbach, L. (1990). Essential of Psychological testing. New York, Harper and Row
Publisher INC.
Daley, T. C. Whaley, S. E., Sigman, M. D., Espinosa, M. P., and Neuman, C. (2003).
"IQ on the rise: the Flynn effect in rural Kenyan children." Psychological
Science 14: 215-219.
Denscombe, M. (1998). The Good Research Guide: For Small-scale Social Research.
Buckingham: Open University Press.
Scottish Council for Research in Education (1949). The Trend of Scottish Intelligence. London,
University of London Press.
Eells, K. D., A.; Havighurts, R. and Tyler, R. (1971). Intelligence and Cultural
Differences. Chicago:, University Press.
Ezeilo, B. (1978). "Validating Panga Munthu Test and Porteus Maze Test in
Zambia." International Journal of Psychology, 13: 333- 42.
Fancher, R. (1985). The Intelligence Men: Makers of the IQ Controversy. New York,
Morton and Company.
Flieller, A. (1996). "Trends in child rearing practices as a partial explanation for the
increase in children’s scores on intelligence and cognitive development
tests." Polish Quarterly of Developmental Psychology 2: 51-61.
Flynn, J. R. (1987). "Massive gains in 14 nations: What IQ tests really measure."
Psychological Bulletin 101: 171-191.
Flynn, J. R. (1998). IQ gains over time: Toward finding the causes. In U. Neisser
(Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 25-
66). Washington, DC, American Psychological Association.
Flynn, J. R. (1999). "Searching for justice: The discovery of IQ gains over time."
American Psychologist 54: 5-20.
Flynn, J. R. (2007). What is Intelligence? Beyond the Flynn effect. Cambridge,
Cambridge University Press.
Fontes, P. K., T. Madaus, G.; and Airasian, W (1983). "Opinions of the Irish Public
on intelligence." Journal of Education 17: 55-67.
Freeman, F. (1962). Theory and Practice of Psychology Testing. New York, Henry
Holt and Company.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York,
Basic Books.
Garlick, D. (2002). "Understanding the nature of the general factor of intelligence:
the role of individual differences in neural plasticity as an explanatory
mechanism." Psychological Review 109: 116-136.
Gomm, R. and Davies, C. (2000). Using Evidence in Health and Social Care. London,
Open University/Sage Publications Ltd.
Gould, S. J. (1996). The mismeasure of man (Rev. ed.). New York, Norton.
Green, B. and Hall, J. (1984). "Quantitative methods for literature review." Annual
Review of Psychology 35: 37-53.
Guilford, J. P. (1985). The structure-of-intellect model. In B. B. Wolman (Ed.),
Handbook of intelligence: theories, measurements, and applications. New York,
John Wiley & Sons.
Irwing, P. and Lynn, R. (2005). Sex differences in means and variability on the
Progressive Matrices in university students: A meta-analysis. British Journal
of Psychology, 96, 505–524.
Irwing, P., Hamza, A., Khaleefa, O. and Lynn, R. (2008). "Effects of Abacus training on the
intelligence of Sudanese children." Personality and Individual Differences
45: 694-696.
Herrnstein, R. J. and Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in
American Life. New York, The Free Press.
Jensen, A. (1980). Bias in Mental Testing. London, Methuen and Co., Ltd.
Jensen, A. (1981). Straight Talk about Mental Tests. London, Methuen and Co., Ltd.
Jensen, A., Saccuzzo, D. and Larson, G. (1988). "Equating the Standard and the Advanced
Form of the Raven Progressive Matrices." Educational and Psychological
Measurement 48: 1091-1095.
Laidra, K., Pullmann, H. and Allik, J. (2007). "Personality and intelligence as predictors of
academic achievement: A cross-sectional study from elementary to
secondary school." Personality and Individual Differences 42: 441-451.
Kamin, L. and Eysenck, H. (1981). Intelligence: The Battle for the Mind. London, Pan Books.
Karmel, L. and Karmel, M. (1978). Measurement and Evaluation in the Schools. New York,
Macmillan Publishing Co., Inc.
Khaleefa, O. & Lynn, R. (2008a) Sex differences on the Progressive Matrices: Some
data from Syria. Mankind Quarterly, 48, 345-352.
Khaleefa, O., Khatib, M.A., Mutwakkil, M.M. & Lynn, R. (2008b). Norms and
gender differences on the Progressive Matrices in Sudan. Mankind
Quarterly, 49, 177-183.
Khaleefa, O. & Lynn, R. (2008d). Norms for intelligence assessed by the Standard
Progressive Matrices in Qatar. Mankind Quarterly, 49, 65-71.
Levine, E. (1974). "Psychological Tests and Practices With the Deaf: A Survey of
the State of the Art." The Volta Review 76: 298-319.
Lorge, I. (1945). "Schooling makes a difference." Teachers College Record 46: 483-
492.
Lynn, R. (1982). "IQ in Japan and the United States shows a growing disparity."
Nature 297: 222-223.
Lynn, R. and Hampson, S. L. (1986). "The rise of national intelligence: evidence from
Britain, Japan and the USA." Personality and Individual Differences 7: 323-332.
Lynn, R., Pagliari, C. and Chan, J. (1988). "Intelligence in Hong Kong Measured for Spearman’s g
and the Visuospatial and Verbal Primaries." Intelligence 12: 423-433.
Lynn, R., Hampson, S.L. & Mullineaux, J.C. (1987). "A long term increase in the
fluid intelligence of English children." Nature 328: 797.
Lynn, R. (1994). Sex differences in brain size and intelligence: a paradox resolved.
Personality and Individual Differences, 17, 257-271.
Lynn, R., Allik, J. & Irwing, P. (2004). Sex differences on three factors identified in
Raven’s Standard Progressive Matrices. Intelligence, 32, 411-424.
Lynn, R. and Irwing, P. (2004). Sex differences on the Progressive Matrices: a meta-
analysis. Intelligence, 32, 481-498.
Lynn, R. (2008). The Global Bell Curve. Augusta, GA: Washington Summit
Publishers.
Lynn, R. (2009). What has caused the Flynn effect? Secular increases in the
Development Quotients of infants. Intelligence.
MacAvoy, J., Orr, S. and Sidles, C. (1993). "The Raven Matrices and Navajo Children:
Normative Characteristics and Culture Fair Application to Issues of
Intelligence, Giftedness and Academic Proficiency." Journal of American
Indian Education 33: 32-43.
Mackintosh, N. J. (1996). "Sex differences and IQ." Journal of Biosocial Science 28:
559-571.
Marais, C. A. (2007). Using the Differential Aptitude Test to estimate intelligence and
scholastic achievement at grade nine level, University of South Africa. MSc.
Marks, R. (1981). The Idea of IQ. New York, University Press of America.
Mehryar, A. (1972). "Father’s Education, Family Size and Children’s Intelligence
and Academic Performance in Iran." International Journal of Psychology, 7:
47-50.
Meisenberg, G., Lawless, E., Lambert, E. & Newton, A. (2005). "The Flynn effect in
the Caribbean: generational change in test performance in Dominica."
Mankind Quarterly 46: 29-70.
Mohan, V. (1972). "Raven’s Progressive Matrices and Verbal Test of General
Mental Test." Journal of Psychological Research 16: 67-69.
Neisser, U. (1998). The rising curve: Long-term gains in IQ and related measures.
Washington, DC, American Psychological Association.
Nelson, H. (1979). Area Handbook Series: Libya a Country Study. Washington, D.C,
The American University.
Oakland, T. (1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.
Oakland, T., & Laosa, L. M. (1976). Professional, legislative, and judicial influences
on psychoeducational assessment practices in schools. In T. Oakland (Ed.)
(1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.
Ogunlade, J. (1978). "The Predictive Validity of the (RPM) with some Nigerian
children." Educational and Psychological Measurement 33: 465-467.
Persaude, G. (1987). "Sex and Age difference on the Raven’s Matrices." Perceptual
and Motor Skills 65: 47-52.
Powers, S., Barkan, J. and Jones, P. (1986a). "Reliability of the (SPM) Test for Hispanic
and Anglo-American Children." Perceptual and Motor Skills 62: 348-350.
Raven, J. (1986). "A nation really at risk: A review of Goodlad's 'A Place Called
School'." Higher Education Review 18: 65-79.
Raven, J. and Raven, J. C. (2003). Manual for Raven’s Progressive Matrices and
Vocabulary Scales. Section 3: The Standard Progressive Matrices. San
Antonio, Harcourt Assessment, Inc.
Rust, J. and Golombok, S. (2004). Modern Psychometrics, 2nd ed. New York,
Routledge.
Raven, J. (1986). Manual for Raven's Progressive Matrices and Vocabulary Scales.
London, Lewis.
Raven, J. (1989). "The Raven Progressive Matrices: A Review of National Norming
Studies and Ethnic and Socioeconomic Variation within the United States."
Journal of Educational Measurement 26: 1 - 16.
Raven, J., Raven, J.C., & Court, J.H. (1993). Manual for Raven's Progressive
Matrices and Vocabulary Scales (Section 1). Oxford, England, Oxford
Psychologists Press.
Raven, J., Court, J.H. and Raven, J.C. (1996). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.
Raven, J., Raven, J.C. and Court, J.H. (1998). Coloured Progressive Matrices. Oxford:
Oxford Psychologists Press.
Raven, J., Raven, J.C. & Court, J.H. (1998). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.
Raven, J., Raven, J.C. and Court, J.H. (2000). Standard Progressive Matrices. Oxford,
Oxford Psychologists Press.
Raven, J. and Court, J.H. (1989). Manual for Raven's Progressive Matrices and Vocabulary
Scales. London, Lewis.
Raven, J. C., Court, J.H. and Raven, J. (1996a). Raven Matrices Progresivas.
Madrid, TEA Ediciones, S.A.
Raven, J. C. (1939). "The RECI series of perceptual tests: An experimental survey."
British Journal of Medical Psychology 18: 16-34.
Raven, J. C., Court, J.H. & Raven, J. (1977). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: The Crichton Vocabulary Scale, 1983
Revision. London, H.K.Lewis.
Raven, J. C., Court, J.H. & Raven, J. (1982). The Mill Hill Vocabulary Scale.
London, H.K.Lewis.
Raven, J. C., Court, J.H. & Raven, J. (1983). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: Section 2. London, H.K.Lewis.
Raven, J. C., Court, J.H. and Raven, J. (1995). Coloured Progressive Matrices.
Oxford, UK: Oxford Psychologists Press.
Raven, J. C., Court, J.H. & Raven, J. (1996). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.
Raven, J., Raven, J. and Court, J. (1988). Raven Manual: General Overview. Oxford,
Oxford Psychologists Press.
Raven, J., Raven, J. C., & Court, J. H. (2000, updated 2004). Manual for Raven’s
Progressive Matrices and Vocabulary Scales. Section 3: The Standard
Progressive Matrices. San Antonio, TX: Harcourt Assessment.
Richardson, K. (1991). Understanding Intelligence. Philadelphia, Milton Keynes.
Rimoldi, H. (1948). "A Note on the Raven’s Progressive Matrices Test." Educational
and Psychological Measurement 8: 347-352.
Roid, G.H., & Barram, R.A. (2004). Essentials of Stanford-Binet Assessment. New
York: Wiley.
Rushton, J. P. (1997). "Race, intelligence, and the brain: The errors and omissions of
the 'revised' edition of S.J. Gould's The Mismeasure of Man (1996)."
Personality and Individual Differences 23: 169-180.
Rust, J. (2008b). Standard Progressive Matrices Plus Version and Mill Hill Manual.
London, Pearson.
Rust, J. and Golombok, S. (1989). The Science of Psychological Assessment. New York,
Routledge.
Sahin, N. and Duzen, E. (1994). "Turkish Standardization of the Raven's SPM (Ages
6 to 15)." Paper presented to the 23rd International Conference of Applied
Psychology, Madrid.
Sattler, J. (1982). Children’s Intelligence and Special Abilities. Boston, Allyn and
Bacon Inc.
Scarr, S. (1981). Race, Social Class, and Individual Differences in IQ. New Jersey,
Lawrence Erlbaum Associates Publishers.
Shayer, M., Demetriou, A. & Pervez, M. (1988). "The structure and scaling of
concrete operational thought: three studies in four countries." Genetic,
Social & General Psychology Monographs: 309-375.
Shayer, M. (2007). "30 years on - a large anti-'Flynn effect'? The Piagetian test
Volume & Heaviness norms 1975-2003." British Journal of Educational
Psychology 77: 25-42.
Shelley, D. and Cohen, D. (1986). Testing Psychological Tests. London, Croom Helm Ltd.
Sidles, C. and MacAvoy, J. (1987). "Navajo Adolescents' Scores on (PLQ), (SPM), and (CTBS)."
Educational and Psychological Measurement 47: 703-709.
Sinha, U. (1950). Reliability and Validity of the Progressive Matrices Test. London,
University of London. M.A.
Sinha, U. (1968). "The Use of Raven’s Progressive Matrices Test in India." Indian
Educational Review 3: 75-88.
Singh, U. (1951). "A Study of Reliability and Validity of the Progressive Matrices
Test." British Journal of Educational Psychology 21: 221-226.
Snyderman, M., & Rothman, S. (1988). The IQ Controversy, the Media and Public
Policy. New Brunswick, NJ, Transaction Publishers.
Sokal, M. (1987). Psychological Testing and American Society 1890-1930. New
Brunswick, Rutgers University Press.
Sorokin, B. (1954). "Standardisation and analysis of Progressive Matrices Test by
Penrose and Raven." Unpublished Report from Zagreb, Yugoslavia.
Spearman, C. and Jones, L. L. (1950). Human Ability: A Continuation of "The Abilities of Man".
London: Macmillan.
Sundet, J. M., Barlaug, D.G. & Torjussen, T.M. (2004). "The end of the Flynn effect?
A study of secular trends in mean intelligence test scores of Norwegian
conscripts during half a century." Intelligence 32: 349-362.
Sundet, J. M., Borren, I. & Tambs, K. (2008). "The Flynn effect is partly caused by
changing fertility patterns." Intelligence 36: 183-191.
Teasdale, T. W. and Owen, D. R. (1989). "Continuing secular increases in intelligence and a
stable prevalence of high intelligence levels." Intelligence 13: 255-262.
Teasdale, T. W. and Owen, D. R. (1994). "Thirty year secular trend in the cognitive abilities of
Danish male school leavers at a high educational level." Scandinavian
Journal of Psychology 35: 328-335.
Turner, S. M., DeMers, S. T., Fox, H. R., & Reed, G. M. (2001). "APA's Guidelines
for Test User Qualifications: An Executive Summary." American
Psychologist 56(12): 1099-1113.
Tulkin, S. and Newbrough, J. (1968). "Social Class, Race and Sex Differences on the Raven
(1956) Standard Progressive Matrices." Journal of Consulting and Clinical
Psychology 32: 400-406.
U.S. Department of Education, Office for Civil Rights (2000). The Use of Tests as Part of High-
Stakes Decision-Making for Students: A Resource Guide for Educators and
Policy-Makers.
Urbach, P. (1974). "Progress and degeneration in the 'IQ debate'." British Journal for
the Philosophy of Science 25: 99-135, 235-259.
Vernon, P. E. (1942). The Reliability and Validity of the Progressive Matrices Test.
London, Admiralty Report.
Wheeler, L. R. (1942). "A comparative study of the intelligence of East Tennessee
mountain children." Journal of Educational Psychology 33: 321-334.
Whorton, J. and Karnes, F. (1988). "Comparison of the 1979 and the 1986 Norms on the
Standard Progressive Matrices for Economically Disadvantaged Students:
Implication for Identification of Gifted Children." Perceptual and Motor
Skills 67: 749-750.
Williams, W. M. (1998). Are we raising smarter children today? School and home
related influences on IQ. In U. Neisser (Ed.), The Rising Curve. Washington,
DC, American Psychological Association.
Yoon, S. N. (2005). Comparing the Intelligence and Creativity Scores of Asian
American Gifted Students and Caucasian Gifted Students. Graduate School,
University of Purdue. PhD thesis, pp. 2-3.
Young, H., Tagiuri, R., Tesi, G. and Montemagni, G. (1962). "Influence of Town and
Country Upon Children’s Intelligence." British Journal of Educational
Psychology 32: 151-158.
Yousefi, F. S., A.; Razavich, A.; Mehryar, A.; Hosseini, A. and Alborzi, S. (1992).
"Some Normative Data on the Bender Gestalt Test Performance of Iranian
Children." British Journal of Educational Psychology 62: 410-416.
Appendix 1
Appendix 2
Smoothed 2007-2008 Norms for Libya in the Context of the 1989 Taiwan Data
Age in years
9 10 11 12
Percentile Li TA Li TA Li TA Li TA
95 6 1
90 8 1
75 1 6 32 5
50 8 20 6
25 12 2
10 0 2 4
5 9 9 0 2
n 180 180 180 180
Smoothed 2007-2008 Norms for Libya in the Context of the 1992 India Data
Age in years
11 12 12 14 15
Percentile Li IN Li IN Li IN Li IN Li IN
95 0 6 1 50
90 22 8 1 49
75 18 1 6 32 5 45
50 6 8 20 6 40
25 12 2 31
10 0 2 4 16
5 9 9 0 2 12
n 180 180 180 180 180 131
Smoothed 2007-2008 Norms for Libya in the Context of the 1992 Netherlands Data
Age in years
8 9 10 11 12
Percentile Li HU Li HU Li HU Li HU Li HU
95 0 43 6 1
90 22 41 8 1
75 18 37 1 6 32 5
50 6 29 8 20 6
25 22 12 2
10 17 0 2 4
5 13 9 9 0 2
Smoothed 2007-2008 Norms for Libya in the Context of the 1998 France Data
Age in years
8 9 10 11 12
Percentile Li FR Li FR Li FR Li FR Li FR
95 0 45 47 6 51 1 52 52
90 22 42 44 48 8 49 1 50
75 18 39 1 42 6 45 32 45 5 45
50 6 33 8 36 20 39 6 41 41
25 22 12 27 33 37 2 37
10 15 0 20 28 2 31 4 33
5 12 9 13 9 21 0 27 2 30
n 180 62 180 71 180 64 180 63 180 70
Smoothed 2007-2008 Norms for Libya in the Context of the 1993 Turkey Data
Age in years
8 9 10 11 12 13 14
Percentile Li TR Li TR Li TR Li TR Li TR Li TR Li TR
95 0 37 45 6 47 1 48 49 47 52 7 52
90 22 34 42 45 8 46 1 47 42 51 3 51
75 18 29 1 37 6 40 32 42 5 42 40 44 48
50 6 21 8 27 20 31 6 33 34 36 41
25 17 12 22 25 27 2 28 7 28 8 29
10 12 0 13 14 2 14 4 14 15 18
5 11 9 11 9 12 0 12 2 12 12 6 13
n 180 104 180 186 180 381 180 274 180 168 180 119 180 72
Smoothed 2007-2008 Norms for Libya in the Context of the 1987 Kosice, Slovakia Data
Age in years
11 12 13 14 15 16 17 18
Percentile Li SK Li SK Li SK Li SK Li SK Li SK Li SK Li SK
95 1 51 53 47 54 7 55 56 8 57 49 58 52 58
90 8 49 1 51 42 52 3 53 54 55 48 56 50 56
75 2 46 5 48 40 49 51 0 52 3 53 4 53 46 53
50 6 42 44 45 47 5 49 50 39 50 41 50
25 36 2 38 7 41 8 42 8 44 29 45 2 46 33 47
10 2 29 4 31 34 36 37 39 5 40 29 41
5 0 24 2 27 29 6 31 9 32 19 33 20 34 20 35
n 180 - 180 - 180 - 180 - 180 - 180 - 180 - 200 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1979 & 1992 British Data
Age in years
8 9 10 11 12 13 14
Percentile Li UK Li UK Li UK Li UK Li UK Li UK Li UK
95 0 40 6 48 1 50 52 47 54 7 55
90 22 38 46 8 48 1 50 42 52 3 54
75 18 33 1 6 42 32 44 5 46 40 49 50
50 6 25 8 20 38 6 40 41 43 45
25 17 12 32 34 2 37 7 39 8 42
10 14 0 23 2 29 4 31 33 36
5 12 9 9 17 0 24 2 26 28 6 30
n 180 174 180 166 180 172 180 187 180 164 180 185 180 196
Age in years
15 16 17 18-21
Percentile Li UK Li UK Li UK Li UK
95 57 8 - 49 - 53 59
90 55 - 48 - 51 58
75 0 51 3 - 4 - 47 57
50 5 47 - 39 - 43 54
25 8 42 29 - 2 - 36 49
10 36 - 5 - 31 44
5 9 33 19 - 20 - 26 39
n 180 191 180 - 180 - 800 58
Smoothed 2007-2008 Norms for Libya in the Context of the 1986 Australia Data
Age in years
8 9 10 11 12 13 14
Percentile Li Au Li Au Li Au Li Au Li Au Li Au Li Au
95 0 44 6 1 47 7
90 22 42 8 1 42 3
75 18 39 1 6 32 5 40
50 6 32 8 20 6
25 22 12 2 7 8
10 13 0 2 4
5 11 9 9 0 2 6
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17
Percentile Li Au Li Au Li Au
95 8 49
90 48
75 0 3 4
50 5 39
25 8 29 2
10 5
5 9 19 20
n 180 - 180 - 180 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1986 China Data
Age in years
8 9 10 11 12 13 14
Percentile Li Ch Li Ch Li Ch Li Ch Li Ch Li Ch Li Ch
95 0 44 47 6 50 1 52 53 47 53 7 55
90 22 39 43 48 8 48 1 50 42 52 3 52
75 18 31 1 37 6 42 32 43 5 46 40 50 50
50 6 23 8 33 20 35 6 39 42 45 48
25 15 12 25 27 33 2 37 7 40 8 43
10 13 0 14 17 2 25 4 27 35 36
5 10 9 12 9 13 0 19 2 21 30 6 34
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17-19 18-21
Percentile Li Ch Li Ch Li Ch Li Ch
95 57 8 57 49 58 53 57
90 54 56 48 57 51 56
75 0 51 3 53 4 55 47 54
50 5 48 49 39 52 43 50
25 8 43 29 44 2 47 36 44
10 36 41 5 40 31 38
5 9 34 19 36 20 37 26 33
n 180 - 180 - 180 - 800 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1979 & 1992 United States of America Data
Age in years
8 9 10 11 12 13 14
Percentile Li Us Li Us Li Us Li Us Li Us Li Us Li Us
95 0 38 42 6 1 50 47 7
90 22 36 40 44 8 1 42 3
75 18 31 1 6 40 32 5 40
50 6 23 8 20 6
25 16 12 2 7 8
10 13 0 2 4
5 10 9 9 0 2 6
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17 18-21
Percentile Li Us Li Us Li Us Li Us
95 8 49 - 53
90 48 - 51
75 0 3 4 - 47
50 5 39 - 43
25 8 29 2 - 36
10 5 - 31
5 9 19 20 - 26
n 180 - 180 - 180 - 800
Smoothed 2007-2008 Norms for Libya in the Context of the 1998 Slovenia Data
Age in years
8 9 10 11 12 13 14
Percentile Li SL Li SL Li SL Li SL Li SL Li SL Li SL
95 0 39 44 6 49 1 51 52 47 53 7 54
90 22 37 42 47 8 49 1 50 42 51 3 52
75 18 33 1 39 6 43 32 45 5 47 40 48 49
50 6 24 8 31 20 36 6 40 44 45 46
25 16 12 21 29 33 2 36 7 37 8 38
10 11 0 14 19 2 25 4 30 32 33
5 9 9 12 9 15 0 19 2 22 24 6 24
n 180 48 180 71 180 59 180 59 180 58 180 68 180 72
Age in years
15 16 17 18 19 20 21
Percentile Li SL Li SL Li SL Li SL Li SL Li SL Li SL
95 56 8 57 49 57 52 57 50 53 54
90 53 54 48 54 50 55 48 51 52
75 0 50 3 51 4 52 46 53 46 47 48
50 5 47 47 39 48 41 49 42 43 43
25 8 40 29 41 2 43 33 44 35 37 37
10 34 35 5 35 29 36 29 32 33
5 9 25 19 26 20 28 20 30 25 29 30
n 180 67 180 147 180 127 200 43 200 200 200