A Study of Intelligence in North Africa and the Middle East

A Study of Intelligence in North Africa and the Middle East.
Book · January 2012
Alsedig Abdalgadr Al-Shahomee

University of Tripoli

A Study of Intelligence in North Africa and the Middle East.
Alsedig Abdalgadr Ali Alshahomee

Omar Al-Mukhtar University
El-Beida Libya
Dedication
To my mother (Zahra) and child daughter (Hajar) who both passed away during my
study. I will always remember you and keep praying for you.
Acknowledgments
I begin by praising ALLAH Almighty. I praise him and seek his help and pleasure. I
wish to express my grateful appreciation to Prof. Richard Lynn, my supervisor
Prof. Peter Eachus and my co-supervisor Dr Simon Cassidy. My thanks also
go to all the participants who took part in this study, and to all those who helped me
during it, especially my colleagues Prof. A. Attashani, Prof. S. Elghmary, Dr.
M. Hammad and Mr. K. Khelifa. Finally, special thanks go to my parents, my wife and my children
Abubaker, Ashraf, Alamin and Zahra, who make my life worthwhile, and to my sister
and brothers for their understanding, support and faithfulness during the years of my
study in England.
Contents
Page
Tables.......................................................................................................................... vii
Figures......................................................................................................................... xi
Chapter one: INTRODUCTION

1.1 Introduction……………………………………………………………….. 1
Chapter two: INTELLIGENCE LITERATURE REVIEW

2.1 Introduction……………………………………………………………….. 6
2.2 Definitions of Intelligence………………………………………………… 7
2.2.1 The 1921 Symposium……………………………………………………... 7
2.2.2 The 1986 Symposium……………………………………………………... 9
2.3 Evolution of the Concept of Intelligence and Intelligence Testing……….. 12
2.3.1 Contribution of Edward Seguin (1812-1880). ……………………………. 16
2.3.2 Contribution of Jean Etienne Esquirol (1772-1840)..……………………... 17
2.3.3 Contribution of Sir Francis Galton (1822-1911)…………………………... 17
2.3.4 Contribution of James McKeen Cattell (1860-1944)……………………… 18
2.3.5 Contribution of Alfred Binet (1857-1911)………………………………… 19
2.3.6 The First World War and the Development of Group Tests………………. 21
2.3.7 Contribution of Charles Spearman (1863-1945)…………………………... 23
2.3.8 Contribution of Piaget (1896-1980)……………………………………...... 25
2.4 Theories of Intelligence……………………………………………………. 27
2.4.1 Spearman’s “g” Theory……………………………………………………. 27
2.4.2 Thurstone's Primary Mental Abilities (1938)…………………………….... 28
2.4.3 Guilford’s structure of the intellect theory………………………………… 30
2.4.4 Gardner’s theory of multiple intelligences………………………………… 31
2.4.5 Cattell and Horn’s theory of fluid and crystallized intelligence…………... 32
2.4.6 Carroll’s three-stratum theory of cognitive abilities………………………. 33
2.4.7 The Cattell-Horn Carroll Model…………………………………………… 33
2.5 Definitions of Mental Test………………………………………………… 34
2.6 Classification of Mental Tests…………………………………………….. 35
2.6.1.1 Classification of tests according to timing………………………………... 35
2.6.1.2 Classification of tests according to procedure of administration…………. 36
2.6.1.3 Classification of tests according to content……………………………….. 37
2.7 Use of Mental Tests……………………………………………………….. 37
2.8 Use of Intelligence Tests………………………………………………….. 38
2.9 Culture-Free and Culture-Fair Tests………………………………………. 41
2.10 Achievement Tests………………………………………………………… 44
2.11 Intelligence and academic achievement…………………………………… 47
2.12 Increase in IQ with time…………………………………………………… 50
2.13 Chapter Summary………………………………………………………….. 57
Chapter three: RATIONALE AND STATEMENT OF PROBLEM

3.1 Introduction………………………………………………………………... 59
3.2 Education System in Libya………………………………………………... 60
3.3 Intelligence testing in Libya………………………………………………. 63
3.4 Adoption of intelligence tests……………………………………………… 68
3.5 Standard Progressive Matrices (SPM) test………………………………… 70
3.6 Statement of problem and study rationale………………………………… 73
3.7 Study aim………………………………………………………………….. 84
3.8 Research Question………………………………………………………… 84
3.9 Research objectives……………………………………………………….. 84
3.10 Chapter Summary………………………………………………………….. 85
Chapter four: REVIEW OF STANDARD PROGRESSIVE MATRICES LITERATURE
4.1 Introduction………………………………………………………………... 87
4.2 Progressive Matrices Tests………………………………………………… 878
4.3 Description of the SPM test……………………………………………….. 91
4.4 Reporting SPM Results……………………………………………………. 94
4.5 Standardisation of the SPM test…………………………………………… 95
4.6 Reliability of the SPM……………………………………………………... 97
4.6.1 Test-retest reliability……………………………………………………. 98
4.6.2 Split-half reliability………………………………………………………... 100
4.6.3 Cronbach’s alpha reliability ………………………………………………. 101
4.7 Validity of the SPM test…………………………………………………… 104
4.7.1 Content Validity…………………………………………………………… 105
4.7.2 Construct Validity…………………………………………………………. 106
4.7.2.1 Factor analysis…………………………………………………………….. 107
4.7.2.2 Internal consistency……………………………………………………….. 110
4.7.3 Criterion-related Validity…………………………………………………. 111
4.7.3.1 Correlation of SPM test with Intelligence Tests…………………………... 112
4.7.3.2 Correlation of SPM test with Achievement Tests…………………………. 120
4.8 Item analysis of the SPM test……………………………………………… 130
4.8.1 Item difficulty…………………………………………………………….... 130
4.8.2 Item discrimination………………………………………………………... 131
4.9 Review of previous studies that employed SPM test……………………… 132
4.9.1 Studies on SPM test in developed countries………………………………. 134
4.9.2 Studies on SPM test in developing countries……………………………… 146
4.10 Chapter Summary………………………………………………………….. 157
Chapter five: MATERIALS AND METHODS

5.1 Introduction………………………………………………………………... 160
5.2 Research design……………………………………………………………. 160
5.3 Methodology………………………………………………………………. 161
5.4 Methods……………………………………………………………………. 162
5.5 Ethical approval…………………………………………………………… 164
5.6 Pilot study………………………………………………………………….. 165
5.7 Main Study………………………………………………………………… 166
5.7.1 Sample size……………………………………………………………….... 166
5.7.2 Sample selection…………………………………………………………… 166
5.7.2.1 Multi-stage-cluster sampling design………………………………………. 166
5.7.2.2 Disproportional stratified sampling……………………………………….. 168
5.7.2.3 The multi-stage-cluster sampling process and procedures………………… 171
5.8 Field work arrangement…………………………………………………… 178
5.9 Preparation of the SPM test……………………………………………….. 180
5.10 Administration of the SPM test…………………………………………… 180
5.11 The proposed and achieved sample size…………………………………... 182
5.12 Data Statistical Analysis…………………………………………………... 183
5.13 Chapter Summary………………………………………………………….. 186
Chapter six: RESULTS

6.1 Introduction………………………………………………………………... 187
6.2 Description of students and SPM score means……………………………. 190
6.3 Reliability of the SPM Test………………………………………………... 192
6.3.1 Test-retest reliability of the SPM test……………………………………… 193
6.3.2 Split-half reliability………………………………………………………... 193
6.3.3 Alpha Reliability…………………………………………………………... 194
6.4 Validity of the SPM test…………………………………………………… 195
6.4.1 Construct Validity…………………………………………………………. 195
6.4.1.1 Factor analysis of SPM test………………………………………………... 196
6.4.1.2 Internal consistency validity………………………………………………. 200
6.4.2 Criterion-related validity…………………………………………………... 202
6.5 Item Analysis of the SPM test……………………………………………... 203
6.5.1 Item Difficulty……………………………………………………………... 203
6.5.2 Item Discrimination……………………………………………………….. 204
6.6 Differences in SPM scores………………………………………………… 208
6.6.1 Differences according to gender…………………………………………... 208
6.6.2 Difference according to regions (cities and villages)……………………… 209
6.6.3 Difference according to academic discipline……………………………… 210
6.6.4 Difference according to geographic areas…………………………………. 211
6.6.5 Difference according to age……………………………………………….. 212
6.6.6 Difference according to study levels……………………………………… 213
6.6.7 Difference according to regions and study levels………………………… 215
6.6.8 Difference according to regions and gender………………………………. 217
6.6.9 Difference according to age and region…………………………………… 218
6.6.10 Difference according to geographic areas and gender…………………….. 221
6.6.11 Difference according to academic discipline and gender…………………. 223
6.6.12 Difference according to age and gender…………………………………… 224
6.6.13 Difference according to academic discipline and age……………………... 227
6.7 Multiple Regression according to independent variables…………………. 232
6.8 The Percentile Ranks of the SPM Score…………………………………... 233
6.9 Chapter Summary…………………………………………………………. 236
Chapter seven: META-ANALYSIS
7.1 Introduction………………………………………………………………... 240
7.2 Advantages of Meta-analysis……………………………………………… 241
7.3 Disadvantages of Meta-analysis…………………………………………… 242
7.4 Literature review…………………………………………………………... 243
7.5 Method…………………………………………………………………….. 244
7.5.1 Criteria for studies selection………………………………………………. 244
7.5.2 Strategy of analysis………………………………………………………... 246
7.6 Results……………………………………………………………………... 248
7.6.1 SPM means and standard deviations according to the independent
variables…………………………………………………………………… 251
7.6.2 Differences in SPM scores………………………………………………… 252
7.6.2.1 Difference according to development status………………………………. 252
7.6.2.2 Difference according to age groups……………………………………….. 253
7.6.2.3 Difference according to gender……………………………………………. 255
7.6.2.4 Difference according to development status and age……………………… 256
7.6.2.5 Difference according to development status and gender………………….. 260
7.6.2.6 Difference according to age groups and gender…………………………… 262
7.6.3 Multiple Regressions according to the independent variables…………….. 266
7.7 Chapter Summary………………………………………………………….. 267
Chapter eight: DISCUSSION AND CONCLUSION

8.1 Introduction………………………………………………………………... 270
8.2 Intelligence testing in Libya……………………………………………….. 271
8.3 The SPM test………………………………………………………………. 272
8.4 Meta-analysis……………………………………………………………. 273
8.5 Study discussion…………………………………………………………… 277
8.5.1 Psychometric characteristics of the SPM test in Libya……………………. 277
8.5.1.1 Reliability of SPM test…………………………………………………….. 278
8.5.1.2 Validity of SPM test…………………………………………………….…. 280
8.5.1.3 Item analysis of SPM test………………………………………………….. 282
8.5.2 IQ and Libya………………………………………………………………. 283
8.5.3 SPM and gender…………………………………………………………… 292
8.5.4 SPM and region……………………………………………………………. 297
8.5.5 SPM and age (study level)………………………………………………… 298
8.5.6 SPM and academic discipline……………………………………………... 301
8.5.7 Relationship and prediction of SPM………………………………………. 301
8.5.8 SPM percentiles…………………………………………………………… 302
8.6 Study conclusions…………………………………………………………. 305
8.7 Study contributions………………………………………………………... 308
8.8 Limitations of the Study…………………………………………………… 308
8.9 Recommendations of the Study…………………………………………… 313
8.10 Further research……………………………………………………………. 315
Tables
Page
Table 4.1 SPM standardization studies……………………………………………… 96
Table 4.2 Summary of the studies performed on the SPM test reliability…………... 103
Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher’s
transformation results…………………………………………………….. 118
Table 4.4 The average of the correlation between SPM test with intelligence tests... 119
Table 4.5 Summary of the studies on SPM test predictive validity and with r to z
Fisher’s transformation results…………………………………………… 127
Table 4.6 The average of correlation between the SPM test and achievement tests... 129
Table 4.7 Shows a sample of worldwide studies that utilised the SPM test as a …. 132
Table 5.1 Principles of selecting sample in schools………………………………… 175
Table 5.2 The target sample size for selecting the pre-university students in the two
cities in proportion to their real numbers…………………………………. 175
Table 5.3 The target sample size for selecting the pre-university students in the
nine villages in proportion to their real numbers…………………………. 176
Table 5.4 The target sample size for selecting the undergraduate university students
in Omar El-Mukhtar University in proportion to their real numbers…….. 176
Table 6.1 Descriptive statistics of overall collected data and tests of normality……. 188
Table 6.2 SPM score means and standard deviations……………………………….. 191
Table 6.3 SPM test-retest reliabilities according to age, gender and study levels…... 193
Table 6.4 SPM split-half reliabilities according to gender, age and total Sample…... 194
Table 6.5 SPM Alpha reliabilities according to gender, age and total sample……… 195
Table 6.6 Correlation matrix between the five sets of the SPM test among Libyan
male and female students (N=2600, 8 to 21 years) and extracted factor….. 196
Table 6.7 Correlation matrix between the five sets of the SPM test among Libyan
male students (N=1300, 8 to 21 years) and extracted factor……………... 198
Table 6.8 Correlation matrix between the five sets of the SPM test among Libyan
female students (N=1300, 8 to 21 years) and extracted factor……………. 199
Table 6.9 Correlation coefficients between the five sets and the total scores of the
SPM test (n=2600, age 8 to 21 years)…………………………………….. 200
Table 6.10 Correlation coefficients between the five sets and the total scores of the
SPM test (males n=1300 and females n=1300, age 8 to 21 years)……….. 201
Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample…………….. 202
Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)……………………………………………... 203
Table 6.13 Index of Discrimination and Items Evaluation…………………………… 205
Table 6.14 Point biserial and significant level for each SPM item…………………… 205
Table 6.15 Summary of item analysis of the five SPM sets………………………….. 206
Table 6.16 Comparison of gender…………………………………………………….. 208
Table 6.17 Comparison of regions……………………………………………………. 209
Table 6.18 Comparison of academic discipline………………………………………. 210
Table 6.19 Comparison of geographic areas…………………………………………. 211
Table 6.20 Post Hoc Tukey (HSD) Test……………………………………………… 211
Table 6.21 Comparison according to age…………………………………………….. 212
Table 6.22 Post Hoc Tukey (HSD) Tests…………………………………………….. 213
Table 6.23 Comparison according to study levels…………………………………… 214
Table 6.25 Comparison of the region according to study levels……………………... 215
Table 6.26 Levene's Test of Equality of Error Variances of SPM scores……………. 215
Table 6.27 Tests of Between-Subjects Effects of SPM scores……………………….. 215
Table 6.29 Comparison of the regions according to gender………………………….. 217
Table 6.32 Comparison of age according to region…………………………………... 218
Table 6.34 Tests of Between-Subjects Effects of SPM scores………………………. 219
Table 6.35 Post Hoc Tukey (HSD) test………………………………………………. 219
Table 6.36 Comparison of the geographic areas according to gender………………... 221
Table 6.40 Comparison of academic discipline according to gender………………… 223
Table 6.43 Comparison of age according to gender………………………………….. 224
Table 6.44 Levene's Test of Equality of Error Variances…………………………….. 225
Table 6.47 Comparison of academic discipline according to age……………………. 227
Table 6.51 Magnitude of gender differences in means score and variability on SPM
as functions of age, geographic areas and discipline……………………... 229
Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores…… 232
Table 6.53 Detailed percentile (2007-2008) Norms for Libyan students according to age 233
Table 6.54 Detailed percentile (2007-2008) Norms for Libyan students according
to age and gender…………………………………………………………. 234
Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline……………………………………………… 235
Table 7.1 Studies included in the meta-analysis…………………………………….. 245
Table 7.2 Descriptive statistics for means scores of overall collected data and tests
of normality………………………………………………………………. 249
Table 7.3 Showing SPM score means and standard deviations according to
independent variables…………………………………………………….. 251
Table 7.4 Comparison of the SPM Mean according to development status………… 252
Table 7.5 Post hoc tests multiple comparisons of SPM scores (Tukey HSD)………. 252
Table 7.6 Comparison of the SPM Mean scores according to age groups………….. 253
Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 253
Table 7.8 Comparison of the gender mean scores of SPM test……………………... 255
Table 7.9 Comparison of the development status mean scores of SPM test
according to age…………………………………………………………... 256
Table 7.13 Magnitude of the development status of countries (developed and
developing countries) in mean scores and variability on SPM as
functions of age and total sample………………………………………… 258
Table 7.14 Comparison of the development status mean scores of SPM test
according to gender……………………………………………………….. 260
Table 7.17 Comparison of the age groups mean scores of SPM test according to
gender…………………………………………………………………….. 262
Table 7.21 Magnitude of gender differences in mean scores and variability on SPM
as a function of age and development status……………………………... 264
Table 7.22 Stepwise Regression for Independent Variable and the SPM Score
Means……………………………………………………………………... 266
Table 8.1 Mean IQs and average for some developed and developing countries…... 283
Figures
Page
Figure 4.1 Typical items from the SPM Test. A5 presents an easy item whereas E1
presents a difficult item …………………………………………….. 92
Figure 5.1 Summary of the sampling method and theory………………………….. 171
Figure 5.2 Sampling process……………………………………………………….. 177
Figure 6.1 Histogram showing normal distribution for means scores……………... 188
Figure 6.2 Normal Q-Q plot……………………………………………………….. 189
Figure 6.3 Detrended normal Q-Q plot…………………………………………….. 189
Figure 6.4 Box plot of scores distribution…………………………………………. 189
Figure 6.5 Scree Plot for the five Factors………………………………………… 197
Figure 5.8 Means score differences of age and region…………………………….. 220
Figure 5.9 Means score difference of age and gender……………………………... 226
Figure 7.1 The distribution for means scores……………………………………… 249
Figure 7.2 Box plot of scores distribution…………………………………………. 249
Figure 7.3 Normal Q-Q plot……………………………………………………….. 250
Figure 7.4 Detrended normal Q-Q plot…………………………………………….. 250
Figure 7.5 Means score differences of age group and gender……………………... 263
Figure 8.1 Urbanisation development in Libya 1954-1995………………………... 297
Chapter One: INTRODUCTION
Humans differ from one another in their ability to understand complex ideas, adapt
effectively to the surrounding environment, learn from experience, engage in various forms
of reasoning and overcome obstacles through thinking. Although these individual differences can
be substantial, they are never entirely consistent: a given person's intellectual
performance will vary on different occasions, in different domains and as judged by different
criteria. The concept of "intelligence" is an attempt to represent and organize this complex set
of phenomena. Such conceptualization has achieved considerable success in clarifying some areas.
Nonetheless, it has not yet answered all the important questions, nor has it commanded
universal assent. Indeed, when two dozen prominent theorists in the field were asked to
define intelligence, they gave two dozen somewhat different definitions (Sternberg & Detterman,
1986). Such disagreement is not a cause for dismay. Scientific research rarely begins with
fully agreed definitions, though it may eventually lead to them.
Intelligence tests play a vital role at many stages of a person's life. From
pre-school days through to postgraduate years, tests are administered for grouping, course
selection, and placement in special classes or special institutions, as well as for
career orientation, college entrance and admission to the professions. A person's
Intelligence Quotient (IQ) score largely determines the type of education he or she receives and,
ultimately, the type of position he or she might occupy within society. The concept of
intelligence is therefore central to an individual's life (Samuda, 1975).
Although Libya has witnessed considerable development in education over recent decades, some
areas have yet to benefit from such advances. To date, no test of intellectual ability
has been officially adopted for the measurement of intelligence. Schools and
universities alike use examination grades as the sole criterion in determining who
should be accepted for study at the various academic establishments. Similar procedures are
followed in the vocational sector. Grades may be a good criterion for
such purposes, but additional criteria are essential for reliable and valid judgements.
One such criterion is the use of mental tests, and intelligence tests in particular, in decision-making
processes. The lack of intelligence tests in the selection of students for different
educational programs in Libya has caused many problems. Failure to allocate students according to
their abilities and interests deprives Libya of one of its most valuable resources. It also
has an adverse effect on business and commerce: employees who score well in tests might not
necessarily possess the attributes needed to perform the job effectively.
The health service is another affected sector. Mental tests currently employed in
Libya are either misused or used in an incomplete form. The use of incomplete tests has
serious negative implications for educational and clinical decisions, the chief drawback being
bias in the test predictions. In clinical settings, the use of incomplete test scores for the
estimation of mental ability might result in invalid assessment, with grave
consequences for individuals’ lives. Intelligence tests are useful tools for accomplishing the
desired goals and avoiding unwanted side-effects, but their effectiveness depends on the skills
and knowledge of the psychologist.
A relevant and accurate selection procedure is now required in Libya more than ever
before, not only in the fields of education, health and vocation but across the whole agenda of the
government. A clear failing of the current system can be seen, for example, in the
job market: many university graduates have been posted to office work that could be done by
less qualified people (Attashan and Abdalla, 2005).

In response to these gaps, this book aims to introduce to Libya one of the best-known
intelligence tests in the world, the classic form of the Standard Progressive
Matrices (SPM) test. The study also attempts to develop norms for the SPM test
and to identify the distribution of IQ scores in a Libyan sample. The study objectives are to:
1. Determine the psychometric characteristics (reliability, validity, item difficulty and
item discrimination) of the SPM test when applied to a Libyan sample.
2. Study the relationship between SPM mean scores and students’ academic
achievement (SAA) for a Libyan sample aged 8 – 21 years.
3. Investigate the presence of significant differences in sample performance on the
SPM test according to gender, region (cities and villages), academic discipline
(science and arts), geographic area (main city, secondary city, coastal, mountain
and desert), age and study level.
4. Investigate the presence of significant differences in sample performance on the SPM
test according to the following pairs of variables: region and gender; age and region;
region and study level; geographic area and gender; academic discipline and gender;
age and gender; and age and academic discipline.
5. Investigate the magnitude of gender differences in SPM mean scores and variability
as functions of age, geographic area and academic discipline.
6. Examine the contribution of the independent variables (gender, age, region and
academic achievement) in predicting SPM scores.
7. Compute the percentile ranks for the SPM scores according to the sample age levels.
8. Compare performance on the SPM test for a Libyan sample with that of other
countries (developed and developing countries).
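To make objectives 1 and 7 more concrete, the following minimal Python sketch shows one way an alpha reliability coefficient and a percentile rank might be computed from scored test items. It is an illustration only and not the analysis procedure used in this study: the function names, the 0/1 item scoring and the small data set are hypothetical, and the percentile rank follows the common mid-rank convention (the percentage of scores below a value plus half of those equal to it).

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of item scores per examinee)."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def percentile_rank(scores, x):
    """Percentage of scores below x plus half of the scores equal to x."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical data: four examinees, three items scored 1 (correct) or 0 (wrong).
demo_items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(demo_items))

# Hypothetical total scores for one age group.
demo_totals = [10, 20, 20, 30]
print(percentile_rank(demo_totals, 20))
```

In practice the item matrix would hold the 60 SPM items for each student, and percentile ranks would be tabulated separately for each age level.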
The book begins, in chapter two, with a historical review of the literature. First, the definition of
the concept of intelligence, its evolution and the means of testing it are presented. Some of the
important theories of intelligence developed over the past century are then briefly
reviewed. After that, the definitions, classification and uses of mental tests, including
culture-fair tests and achievement tests, and the relationship between intelligence and
academic achievement are discussed in depth. The change in the Intelligence Quotient (IQ)
over time in different countries is also examined.
Chapter three introduces the statement of problem and the study rationale. It provides a short
description of the education system and intelligence testing in Libya. It also includes the
research questions, study aims and objectives.
Having set the context of the research, the focus shifts, in chapter four, to
general information regarding the Progressive Matrices tests. A description of the SPM test
and its standardization is presented, after which the reliability, validity and item analysis of
the SPM test are rigorously investigated. The chapter ends with a brief review of previous studies
that have employed the SPM test.
Chapter five is concerned with methodological issues such as research design, ethical approval,
the pilot study, sampling and data collection. It also covers the statistical methods used and the
modification and administration of the SPM test. The test was administered in Libya to a
sample of students.
Once the test has been administered and the data are available, the results are examined and analysed
in chapter six. The initial step in the data analysis is the standardisation of the
SPM test, the primary reason being to determine whether the SPM test can be used effectively in
Libya. The next step is the analysis of the remaining study objectives, such as the relationship
between SPM test scores and Students’ Academic Achievement (SAA). The outcomes of this
chapter are compared with those found in other studies in chapter seven (meta-analysis). These
studies are sampled from both developed and developing countries. Chapter seven also covers
a literature review of meta-analysis applications to the SPM test, the methodology, the data
analysis tools and, finally, the meta-analysis results.
The final part of the book, chapter eight, brings together the key research findings and
discusses them in the context of the wider existing literature. Intelligence testing and IQ
distribution in Libya are discussed and evaluated in the context of the available facilities. The
methods of data collection (the SPM test and the meta-analysis) are highlighted. The major
conclusions of the book and its contribution to the field of intelligence testing in Libya
are outlined, and the strengths and weaknesses of the study are presented. Finally,
recommendations for practice and future research that emerge from the study findings
are suggested.

Chapter two: INTELLIGENCE LITERATURE REVIEW
2.1 Introduction
Intelligence is a difficult construct to define. In a survey carried out by Snyderman
and Rothman (in Li, 1996), social scientists and educators were questioned on the
nature of intelligence. Of the respondents, 99.3% indicated that abstract thinking or
reasoning was an important element of intelligence, 97.7% indicated that problem-solving ability was
important, and 96% indicated that the capacity to acquire knowledge was important. This
survey emphasized the importance of thinking, learning and problem solving as
elements of intelligence (Marais, 2007). In another study, in which nearly 500 laypeople and 24
experts were asked to define intelligence, Sternberg (2000) found that their responses
were surprisingly similar. Both groups viewed intelligence as a complex construct
made up of verbal ability, practical problem solving and social competence.
Intelligence is an important component of learning and academic achievement
because it can be seen as the ability to gain knowledge, to think about abstract
concepts, to reason and to solve problems (Li, 1996).
An important consideration, recognised since Alfred Binet
constructed the first intelligence test in 1905, is that although intelligence is relatively
stable, it should not be seen as a fixed characteristic.
The purpose of this chapter is to provide a historical review of the literature. First, the definition
of the concept of intelligence, its evolution and the means of testing it are presented. Some of the
important theories of intelligence that have been developed over
the past century are then briefly reviewed. After that, the definitions, classification and uses
of mental tests, including culture-fair tests and achievement tests, and the relationship between
intelligence and academic achievement are discussed in depth. Finally, the change in the
Intelligence Quotient (IQ) over time in different countries is studied.
2.2 Definitions of Intelligence
Intelligence, a word in common use today, was almost unknown in popular speech a
century ago. After intelligence tests had been invented to measure intelligence,
scientists felt the urge to define it. They reintroduced the ancient Latin term
"intelligence" to refer to individual differences in mental ability (Aiken, 1988).
Sternberg (1990) mentions that today, as in the past, there seem to be as many
definitions of intelligence as there are investigators of it. Wechsler (1975) also stated
that intelligence has been viewed by educators as the ability to learn, by biologists as
the ability to adapt to the environment, and by psychologists as the ability to understand
relationships.
The history of disagreement among psychologists over the definition of
intelligence is reflected in two symposia convened to define it: the first in 1921
and the second in 1986.
2.2.1 The 1921 Symposium
In 1921, the editors of the Journal of Educational Psychology invited psychologists to
take part in a symposium entitled "Intelligence and its Measurement". The contributors were
asked to define “intelligence”. The following are some of their definitions:
Intelligence is equivalent to the capacity to learn or it is the ability to learn to adjust
oneself to environment (Colvin) P.136.
Intelligence is the capacity to learn or profit by experience (Dearbon) P.210.
Intelligence is sensory capacity, capacity for perceptual recognition, quickness, range
7
or flexibility of association; facility in imagination, span of attention, quickness or
alertness in response (Freeman) P.133.
Intelligence is a group of complex mental processes such as sensation, perception,
association, memory, imagination, discrimination, judgement, and reasoning
(Haggerty) P.212.
Intelligence involves two factors; the capacity for knowledge and knowledge
possessed (Henmon) P.195.
Intelligence seems to be a biological mechanism by which the effects of a complexity
of stimuli are brought together and given a somewhat unified effect in behaviour
(Peterson) P.198.
Intelligence is the ability of the individual to adapt himself adequately to relatively
new situations in life (Pintner) P.139.
Intelligence is the ability to carry on abstract thinking (Terman) P.128.
Intelligence is the power of good responses from the point of view of truth or fact
(Thorndike) P.124
Intelligence is the capacity to acquire capacity (Woodrow) P.207.
Intelligence contains at least three psychologically differentiable components: a) the
capacity to inhibit an instinctive adjustment, b) the capacity to redefine the inhibited
instinctive adjustment in the light of imaginably experienced trial and error, and c) the
capacity to realise the modified instinctive adjustment in overt behaviour to the
advantage of the individual as a social animal (Thurstone) P.201-202.
The most famous definition of intelligence, which reflects the absence of agreement
among psychologists, was made by Boring in 1923, who claimed that intelligence is
what intelligence tests test. Spearman (1927) pointed out that intelligence had become
a word with so many meanings that finally it had none.
These psychologists gave different views about the nature of intelligence, although
there was much in common in their definitions (Sattler, 1982). In 1975 Samuda
commented on the ambiguity and lack of agreement among psychologists in the 1921
Symposium, stating:
If the experiment was to be replicated today, the same ambiguity that
existed some 50 years ago would still be apparent, for one need only
look at the more common definition of intelligence in order to realise
that psychologists still have not characterised explicitly and
universally what it means. P.26
2.2.2 The 1986 Symposium
Sixty-five years after the 1921 Symposium to define intelligence, Sternberg and
Detterman (1986) noticed that the effort to define intelligence had not been repeated.
In 1986 they asked experts in the field of intelligence to respond to the very same
question that was posed in the 1921 Symposium, to see what theorists of intelligence
today believed intelligence to be. The following are some of the 1986 Symposium
definitions:
Intelligence is quality of behaviour that is adaptive, representing effective ways of
meeting the demands of environment as they change (Anastasi) P.19.
Intelligence is a construct such as innate intellectual capacity, intellectual reserve
capacity, learning capacity, intellectual abilities, intelligence systems, problem-
solving ability, and knowledge system (Baltes) P.24.
Intelligence is a set of whatever abilities make people successful at achieving their
rationally chosen goals (Baron) P.29.
Intelligence is adaptive for a given cultural group in permitting members of the group,
as well as a whole, to operate effectively in a given ecological context (Berry) P.35.
Intelligence is the sum total of all cognitive processes, including planning, coding of
information and arousal of attention (Das) P.55.
Intelligence is a finite set of independent abilities operating as a complex system
(Detterman) P.57.
Intelligence consists of three capacities: (a) the capacity to manipulate symbols, (b)
the capacity to evaluate the consequences of alternative choices, and (c) the capacity
to search through sequences of symbols (Estes) P.65.
Intelligence is proficiency (or competence) in intellectual cognitive performance
(Glaser) P.79.
Intelligence is the repertoire of intellectual knowledge and skills available to the
person at a particular point in time (Humphreys) P.98.
Intelligence is a general factor obtained from factoring an intercorrelation matrix of a
large number of diverse mental tests (Jensen) P.110.
Intelligence is implicitly determined by the interaction of organisms’ (individuals')
cognitive machinery and their social-culture environment (Pellegrino) P.113.
Intelligence provides a means to govern ourselves so that our thought and action are
organised, coherent, and responsive both to our internally driven needs and to the
needs of the environment (Sternberg) P.141.
Intelligence is a hypothetical construct referring to an individual's cognitive processes
(Zigler) P.149.
After the two symposia to define intelligence, no single definition of intelligence was
agreed upon by psychologists. Viewed broadly, however, two themes seemed to run
through at least several of the definitions in the complete set: the capacity to learn
from experience and the capacity to adapt to one’s environment.
Again Sternberg (1990) found that some general agreement exists across the two
symposia regarding the nature of intelligence. He stated that attributes such as
adaptation to the environment, basic mental processes, and higher order thinking like
reasoning, problem solving and decision making were prominent in both symposia.
Charles Spearman defined intelligence as the ability to recognise relations and related
items (Abdel-Khalek, 2000), which is what John Raven’s test measures.
Lynn and Vanhanen (2006) reported that a useful definition of intelligence was
proposed by Neisser in 1996: intelligence is the ability "to understand complex ideas,
adapt effectively to the environment, learn from experience, engage in various forms
of reasoning, and to overcome obstacles by taking thought" (Neisser, 1996, p. 1).
Also a similar definition by Gottfredson was published in the Wall Street Journal in
1994 as "Intelligence is a very general mental capacity which, among other things,
involves the ability to reason, plan, solve problems, think abstractly, comprehend
complex ideas, learn quickly and learn from experience. It is not merely book
learning, a narrow academic skill, or test taking smarts. Rather, it reflects a broader
and deeper capability for comprehending our surroundings - 'catching on', 'making
sense' of things, or 'figuring out' what to do" (Gottfredson, 1997a, p. 13).
More recently Schmidt and Hunter (2004, p. 162) have taken stock of the results of a
century’s research on intelligence: “the accumulated evidence has become very strong
that general intelligence is correlated with a wide variety of life outcomes, ranging
from risky health-related behavior to criminal offenses, to the ability to use a bus or a
subway system”. Among the numerous tasks that intelligent people do more
effectively than less intelligent people are to acquire complex skills and work more
proficiently (Lynn and Vanhanen, 2006).
In general two themes seem to run through at least several of the definitions in the
complete set: the capacity to learn from experience and the capacity to adapt to one’s
environment.
2.3 Evolution of the Concept of Intelligence and Intelligence Testing
Differences in intelligence have been evident since the beginnings of
civilization. In 400 BC, the Greeks used the term “nous” to express intelligence.
Plato in his “Republic” claimed that “nous” is mostly inherited and that
offspring can be bred for it by selecting parents who had the most “nous”
(Lynn and Vanhanen, 2006).
In China, the Sui dynasty (581-618 AD) used tests of ability to select the
administrative class of mandarins. These tests, in Chinese history, literature,
mathematics and astronomy, remained in use until the twentieth century (Lynn
and Vanhanen, 2006).
In his book “Examen de Ingenios”, Huarte (1575) investigated the nature of
intelligence. In 1594 the book was translated into English, and the term
“wits” was used to express intelligence. The book evaluates the various types
of intelligence needed to succeed in medicine, law, the army, administration
and the church (Lynn and Vanhanen, 2006).
In 1651 Thomas Hobbes wrote in Leviathan:
“Virtue generally, in all sorts of subjects, is somewhat that is valued
for eminence, and consisteth in comparison. For if all things were
equal in all men, nothing would be prized. And by “virtues
intellectual” are always understood such abilities of the mind as men
praise, value and desire should be in themselves; and go commonly
under the name of a “good wit” (pp.38-39).
Hobbes proposed the concept of “natural wit” which:
“Is gotten by use only and experience; without method, culture or
instruction” (p.39)
Here he was distinguishing between intelligence and educational
attainment. He proposed further that:
“This natural wit consisteth in two things, celerity of imagining, that
is swift succession of one thought to another, and steady direction to
some approved end. A slow imagination maketh that defect or fault
of mind that is commonly called dullness, stupidity, and sometimes
by other names that signify slowness of motion” (p.39).
During the nineteenth century the study of mental retardation witnessed a strong
awakening of interest in the humane treatment, training and education of the mentally
retarded. Anastasi (1988) stated that one of the first problems that stimulated the
development of psychological tests was the identification of the mentally retarded.
Marks (1981) and Rust and Golombok (1989) observed that the rapid scientific and
social progress in Europe during the nineteenth century led to the development of
several assessment techniques, most notably in the medical diagnosis of the mentally ill.
Empirical support for the theoretical basis of intelligence as a unitary construct
essentially began with the development of factor analysis (Ittenbach, Esters, &
Wainer, 1997). The historical antecedents for factor analysis originated with the work
of Galton who developed many of the quantitative devices utilized in psychometry
(e.g., the bivariate scatter diagram, regression, correlation, and standardized
measurements) (Jensen, 1980). Galton (1869) further developed the concept of
intelligence in his publications. He claimed that intelligence is a mainly inherited
single entity and that intelligence determines the level of civilization. He studied the
number of geniuses relative to the size of their populations and reached the
conclusion that there is a difference in average intelligence among races, with the Greeks
being the most intelligent and the Australian Aborigines the lowest (Galton,
1869).
Galton was the first researcher to utilize empirically objective devices to measure
individual differences in mental abilities (Jensen, 1980). He administered different
measures of mental functioning to thousands of individuals as he refined his methods
of assessing mental ability. Galton analyzed the scores and applied statistical
reasoning to the study of those with high ability. He was the first to identify "general
mental ability" in humans (Jensen, 1980).
One of Galton's followers, Spearman, was the first to assert that all individual
variance in higher order mental abilities is positively correlated. The
aforementioned contention supported Galton's belief in a general factor of mental
ability (Jensen, 1980). Spearman introduced factor analysis, in part, to ascertain the
degree to which a test measures a general factor (Jensen, 1980). Spearman used
factor analysis to determine whether the shared variance in a matrix of correlation
coefficients resulted in a single general factor or in several independent more
specific factors (Gould, 1996). Spearman believed each test of mental abilities had a
single general factor, g, as well as specific factors (s) unique to the test. These
beliefs led to the development of the two-factor theory of intelligence. Spearman
and many scholars (Carroll, 1993; Herrnstein & Murray, 1994; Jensen, 1980;
Rushton, 1997) continued to believe scores on intelligence tests are reflected best
by g. These theorists consider g to be the most parsimonious method to describe
one's intelligence and thus to use when examining mean IQ differences between
races (Neisser, 1998).
Factor analysis soon became one of the most important techniques in modern
multivariate statistics (Gould, 1996; Kamphaus, Petoskey, & Morgan, 1997). It is a
statistical technique that allows one to analyze the sources of variance of a particular
measure by examining the pattern of correlations between that measure and other
measures. The technique is useful for reducing a complex set of correlations to fewer
dimensions by factoring a matrix of correlation coefficients (Gould, 1981). The
variables that are most highly correlated are combined to form the first principal
component by placing an axis through all the points. Other axes, drawn to account for
the other variables, are labeled second and third (etc.) order factors (Edwards, 2003).
Relative to intelligence testing, factor analysis has been applied to show positive
correlations among different mental tests (Gould, 1996). In that most correlation
coefficients in mental tests are positive, factor analysis yielded a reasonably strong
first principal component (Gould, 1996).
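The emergence of a strong first principal component from uniformly positive correlations can be illustrated with a minimal sketch. The correlation values below are hypothetical (they are not taken from any study cited here), and power iteration is used only as a simple, dependency-free way of finding the largest eigenvalue of the matrix; it is not the procedure Spearman himself used.

```python
import math

# Hypothetical correlations among four mental tests (illustrative values only).
R = [
    [1.00, 0.60, 0.50, 0.40],
    [0.60, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def first_principal_component(matrix, iterations=500):
    """Return (largest eigenvalue, unit eigenvector) found by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # start from equal weights
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue belonging to v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


eigenvalue, weights = first_principal_component(R)

# The trace of a correlation matrix equals the number of tests, so this ratio
# is the share of total variance carried by the first component.
share = eigenvalue / len(R)

# Loadings: correlation of each test with the first component.
loadings = [math.sqrt(eigenvalue) * w for w in weights]

print("share of variance on first component:", round(share, 3))
print("loadings:", [round(x, 3) for x in loadings])
```

Because every entry of this made-up matrix is positive, all four tests load positively on the first component, and that component carries more than half of the total variance.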
General factor theorists such as Spearman used factor analytic techniques to
demonstrate the viability of g as the first factor to emerge when analyzing factor
scores for intelligence tests. Other theorists used factor analysis to suggest that IQs
depend on a number of independent factors, not a large general factor (Gardner,
1983; Spearman, 1923).
Although researchers may disagree about the structure of intelligence, they agree that
IQs arise as a function, at least to some degree, from a general factor as well as
reflecting multidimensional aspects of intellectual functioning (Carroll, 1993; Sattler,
1998; Urbach, 1974). To reiterate, g is important because it is considered the best way
to express one's general mental ability.
The history of mental measurement development during the nineteenth and early
twentieth centuries can be classified through the contributions of scientists such as
Seguin, Esquirol, Galton, Cattell, Binet, and Spearman. Detailed descriptions of these
contributions are voluminous and, moreover, beyond the scope of this study, so we will
confine ourselves to providing a brief summary of each one.
2.3.1 Contribution of Edouard Seguin (1812-1880)
The French physician Seguin started his career as an assistant to Jean Itard, who was
working with a wild boy found by hunters in the forest of Aveyron. In 1837 Seguin
established the first school for training and education of mentally retarded children. In
1844 he emigrated to America where his ideas gained wide recognition. Guilford
(1967) mentioned that Seguin was pioneering in the training of mentally retarded
individuals by exercising their sensory and motor function. In 1866 he developed the
first non-verbal test, the Seguin Form Board, in which the individual is required to put
variously shaped blocks back in their closely fitting spaces as quickly as possible.
Corsini (1984) mentioned that Seguin's test was the first to be used as some measure
of intellectual functioning. Domino & Domino (2006) reported that Edouard Seguin
developed many procedures to enhance muscular control and sensory abilities for the
mentally deficient. Some of these procedures were later incorporated into tests of
intelligence.
2.3.2 Contribution of Jean Etienne Esquirol (1772-1840)
In 1838 Esquirol, another French physician, was the first person to make a clear
distinction between mental retardation and mental illness. He pointed out that mentally
retarded individuals may never have developed their intellectual capacity, whereas mentally ill
people had lost the abilities they once possessed. In developing a method for
differentiating mental retardation from mental illness, he also pointed out that the
individual's use of language, and therefore language tests, provided the most
dependable criterion of his or her intellectual level (Anastasi & Urbina, 1997;
Domino & Domino, 2006).
2.3.3 Contribution of Sir Francis Galton (1822-1911)
Some commentators have suggested that the testing movement began with the English
biologist Galton who was interested in human heredity. Anastasi (1988) believed that
Galton was primarily responsible for launching mental measurement. Richardson
(1991) also believed that the first person to seriously attempt to measure intelligence
was Galton. Galton realised the need for measuring characteristics of related and
unrelated individuals to discover the degree of resemblance between parents and
offspring. Galton was the first scientist who undertook statistical measurement of
individual differences.
For seven years, from 1884 to 1890, Galton ran an anthropometric laboratory at the
South Kensington Museum in London where, for a small fee, visitors could have
themselves measured on a variety of physical traits such as vision, hearing, muscular
strength, reaction time, and other simple sensorimotor functions (Anastasi & Urbina,
1997; Snyderman & Rothman, 1988; Virgolim, 2005; Domino & Domino, 2006).
Herrnstein & Murray (1994) stated that Galton had the idea that intelligence would
surface in the form of sensitivity of perception, so he constructed tests that relied on
measures of sight, hearing, sensitivity to light, skin pressure, and speed of reaction to
simple stimuli. He therefore concluded that the more perceptive the senses, the larger
the range of information on which intelligence could act. Jensen, as reported
by Corsini (1984), points out that Galton's contributions to statistics and psychometrics
included percentile ranks, the use of central tendency and rating scales.
2.3.4 Contribution of James McKeen Cattell (1860-1944)
The American-born psychologist James Cattell went to Germany and studied with
Wilhelm Wundt at Leipzig where the first psychological laboratory was founded in
1879. The first psychologists at Leipzig studied the same processes that physiologists
did, namely seeing, hearing and speed of response (Attashani and Abdalla, 2005).
Anastasi (1988) claimed that the principal focus of early experimental psychology in
Leipzig was on formulating generalised descriptions of human behaviour. Thus
individual differences were either ignored or accepted as a form of error or as a
necessary evil that limited the applicability of generalisation.
For his doctorate Cattell completed a dissertation on individual differences in reaction
time. He lectured at Cambridge University, where he met Galton, who shared his
interests. Cattell was also active in the spread of the testing movement in the USA
(Anastasi & Urbina 1997; Sternberg 2000).
Cattell proposed a series of 50 psychophysical tests, most of which were of a sensory
and motor nature and differed little from those designed by Galton. In an article
published in 1890 in Mind, entitled "Mental Tests and Measurements", Cattell was the
first to use the term "mental test" in psychological literature (Freeman, 1962; Eysenck
et al., 1972; Sattler, 1982; Fancher, 1985; Anastasi, 1988; Sternberg, 1990).
Freeman (1962) and Jensen (1981) both concluded that the Galton-Cattell approaches
to measurement of mental ability, whilst not of major significance in the field of
testing, did nonetheless strongly affect the course taken by test experimenters until
about 1900 when the influence of Alfred Binet was first felt.
2.3.5 Contribution of Alfred Binet (1857-1911)
The history of mental testing is widely considered to have begun with the work of
Binet. Binet, Simon, and Henri spent many years in research on ways of measuring
intelligence. Anastasi (1988) stated that in 1895 Binet and Henri published an article
in which they criticised most available tests (Galton type tests) as being too sensory
and concentrating on simple specialised abilities. Their research suggested that the
key to the measurement of intelligence lay in focusing on higher mental processes
instead of measuring simple sensory functions as in Galton and Cattell tests.
Binet assumed that intelligence was not much involved in sensory-motor tasks but in
tasks calling for more complex mental processes, especially judgement (Jensen 1980).
Binet and Simon believed that essential activities of intelligence were to judge well, to
comprehend well and to reason well. Binet found that children who were best in
judgement tended to be superior in attention, and vocabulary (Sternberg 1990).
In 1904 the Ministry of Public Instruction in France appointed a committee to study
the procedures for the education of mentally retarded children. A member of this
commission was Binet. In 1905 Binet, in collaboration with Simon, prepared the first
Binet-Simon Scale. The scale consisted of 30 items, designed for children aged 3 to
12 years arranged in order of difficulty. Improved versions came out in 1908 and 1911,
in which unsatisfactory items were eliminated, the number of items was increased, items
were grouped into age levels and the test was extended to adult level (Roid and Barram, 2004).
Binet’s test emphasised judgement, comprehension, and reasoning which Binet
regarded as essential components of intelligence. A child's score on the test was
reported in terms of mental age (MA). A mental age below the child's chronological
age (CA) indicated some degree of mental retardation; a higher MA than CA
indicated some degree of accelerated intellectual development. In 1912 a German
psychologist, Wilhelm Stern proposed the use of the ratio of mental age to
chronological age to yield the "intelligence quotient" (IQ). Mental age was the level
of ability of the average child of a certain age; e.g. a mental age of 12 is defined by the mental
tests that the average child of 12 years would pass. IQ was Mental Age divided by Chronological
Age, multiplied by 100. So a child of 10 years who functions as a child of 5 years would
have an IQ of 5/10 × 100 = 50. Nowadays, IQ is calculated by transforming the test
scores to a metric with a mean set at 100 and a standard deviation of 15. If scores are
normally distributed, this means that about 95% of the population have an IQ between 70 and 130.
Roughly 2% of the population are under 70 and considered mentally retarded, while roughly 2% are
above 130 and considered gifted.
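The two ways of computing IQ described above can be written out as a short sketch. The function names and the example norm-group values (a raw-score mean of 30 and standard deviation of 10) are hypothetical and chosen only for illustration; the population shares assume that IQ scores follow a normal distribution with mean 100 and standard deviation 15.

```python
import math


def ratio_iq(mental_age, chronological_age):
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score, norm_mean, norm_sd, iq_mean=100.0, iq_sd=15.0):
    """Modern deviation IQ: rescale a raw score to a mean of 100 and an SD of 15.

    norm_mean and norm_sd describe the raw scores of a (hypothetical) norm group.
    """
    z = (raw_score - norm_mean) / norm_sd
    return iq_mean + iq_sd * z


def normal_cdf(x, mean=100.0, sd=15.0):
    """Proportion of a normal population scoring at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# The worked example from the text: a 10-year-old functioning like a 5-year-old.
example_ratio_iq = ratio_iq(5, 10)

# A raw score one standard deviation above a hypothetical norm-group mean.
example_deviation_iq = deviation_iq(40, norm_mean=30, norm_sd=10)

# Shares of the population below 70, between 70 and 130, and above 130.
below_70 = normal_cdf(70)
between_70_and_130 = normal_cdf(130) - normal_cdf(70)
above_130 = 1.0 - normal_cdf(130)

print(example_ratio_iq, example_deviation_iq)
print(round(below_70, 4), round(between_70_and_130, 4), round(above_130, 4))
```

Under the normality assumption the band from 70 to 130 (two standard deviations either side of the mean) holds about 95.4% of the population, with about 2.3% in each tail, so the percentages quoted in the text are rounded values.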
Many researchers believe that the testing movement began to flourish after the
introduction of the Binet-Simon Scale in 1905. For example Herrnstein and Murray
(1994) mentioned that Binet developed questions that attempted to measure
intelligence by measuring a person's ability to reason. They concluded that Binet’s
test met a key criterion that Galton's test could not. Sattler (1982) mentioned that the
Binet-Simon Scale served the purpose of objectively diagnosing a degree of mental
retardation, and became the prototype of subsequent scales for mental ability
assessment.
Within a few years translations and adaptations of the Binet-Simon Scale appeared in
many countries. The most rapid development took place in the USA, where in 1916
Lewis M. Terman developed the Stanford revision of the Binet-Simon Scale
(SB1), now familiar as the Stanford-Binet Intelligence Scale. Terman added more
items and made other improvements to the test. The test was revised in 1937 (SB2; L
and M forms), in 1960-1973 (SB3), and again in 1986, when Thorndike, Hagen &
Sattler developed the SB4, based on a four-factor hierarchical model with general
ability “g” as the overarching summary score. More recently Roid (2003) constructed
the SB5 on a five-factor hierarchical cognitive model (Roid & Barram 2004).
The Stanford-Binet Intelligence Scale very quickly became the "standard" IQ test on both
sides of the Atlantic. For more than half a century the Stanford-Binet test has been
one of the most widely used individual tests of intelligence and has often served as a
standard for the construction of other tests (Jensen, 1980; Richardson, 1991).
2.3.6 The First World War and the Development of Group Tests
In spite of the success of the Stanford-Binet test, there was one problem in that it was
an individual test administered to one subject by one examiner. As the USA entered
the First World War the need arose for rapid testing of large numbers of subjects in
a short time (Anastasi & Urbina 1997; Kaufman & Kaufman 2004).
In 1917 Robert Yerkes, the president of the American Psychological Association,
claimed that psychology had achieved a position which would enable it to
substantially help to win the war and shorten the necessary period of conflict. He
formed a committee of American intelligence testers to develop a test to classify all
recruits in order that they would be properly placed in the military service and to
screen all army recruits for mental defectiveness (Anastasi & Urbina 1997; Kaufman
& Kaufman 2004).
A major contribution to group tests during the First World War was made by Arthur S.
Otis, whose group intelligence test "The Scale for the Group Measurement of
Intelligence" was used by the committee and became the basis of the Army Alpha Test
(Anastasi & Urbina 1997; Kaufman & Kaufman 2004).
The committee quickly developed two tests: the Army Alpha for literate recruits, and the
Army Beta for illiterate recruits and non-English speakers who were unable to take the
test in English. The Alpha test included arithmetic problems, general information, and number
sequences. The Beta test included mazes, finding the missing element in pictures and
coding. By the end of the war in 1918 about 1,750,000 men had been given the Army
Alpha or Beta tests (Freeman, 1962; Guilford, 1967; Noll and Scannell, 1979; Ebel,
1972; Marks, 1981; Fancher, 1985; Sokal, 1987; Anastasi, 1988).
Shortly after the First World War the tests were released for civilian use and served as
models for most group intelligence tests. Concurrently, their development gave rise to
a number of controversial questions. Amongst these were the relative influence of
heredity and environment and the explanation of racial differences in measured
performance (Anastasi & Urbina 1997; Kaufman & Kaufman 2004).
In a summary of the misuse of scores in the United States after the development of
intelligence tests, Kamin (1981) mentioned as examples: sterilisation laws,
immigration quotas, and early racism. Tyler and Walsh (1979) stated that after the
development of intelligence group tests, attempts to measure personality
characteristics as well as ability became more and more common.
2.3.7 Contribution of Charles Spearman (1863-1945)
Spearman's work focused on determining whether intelligence was a single ability
factor or a combination of various factors. The measurement of Spearman’s "general"
factor in his two-factor theory was the object of the Standard Progressive Matrices
(SPM) test. Kline (1979) believed that the first contribution from psychometrics to
psychological insight into the nature and structure of human abilities emerged from
the work of Spearman. Eysenck et al. (1972) were also of the view that Spearman’s
two-factor theory of intelligence, together with the Binet-Simon Scale, represented the
starting point for the development of the theory and measurement of intelligence in
the twentieth century.
Spearman's two-factor theory was based on analysis of empirical data from test
scores. Spearman's first investigation was with children in a village school (N=24). He
estimated their "intelligence" in three ways: the teacher's ranking of the children's "cleverness in
school", the two oldest children's ranking of the members of their class for "sharpness
and common sense out of school", and Spearman's own ranking of the children's performance on
three sensory tasks involving pitch, light and weight discrimination.
Spearman found a correlation of 0.55 between the three intellectual variables, a
correlation of 0.25 between the three sensory measures, and a correlation of 0.38
between intellectual and sensory measures (Fancher, 1985).
His second investigation was with boys from an upper class preparatory school
(N=22). This time he took examination grades in Classics, French, English and Maths
as measures of "intelligence" and correlated them with a pitch discrimination task and
with the music teacher’s ranking of the boys on musical proficiency. Spearman found
music and pitch correlated with the four intelligence scores at the average of 0.56,
while music and pitch correlated with each other at 0.40 and the correlation between
the four examination grades was on average 0.71 (Fancher, 1985; Richardson, 1991).
In 1904 Spearman published his conclusions in his famous article "General
Intelligence, Objectively Determined and Measured", in which he stated:
On the whole, then, we reach the profoundly important conclusion
that there really exists something that we may provisionally term
"General Sensory Discrimination" and similarly a "General
Intelligence" and further that the functional correspondence between
these two is not appreciably less than absolute (P. 272).
Spearman also discovered that the correlations between the six variables (Classics,
French, English, Maths, Pitch and Music) were not only all positive, but also ranged
themselves in a nearly perfect hierarchy. This was one of the observations that led to
the formulation of the “g”-theory, which will be presented in the next section.
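The hierarchical pattern can be sketched under the assumption that a single common factor is the only source of correlation between the measures. The loadings below are hypothetical illustrations, not Spearman's estimates; under this assumption the correlation between any two measures equals the product of their loadings on the common factor.

```python
from itertools import permutations

# Hypothetical loadings of the six measures on one common factor
# (illustrative values only, not Spearman's data).
loadings = {
    "Classics": 0.90,
    "French": 0.85,
    "English": 0.80,
    "Maths": 0.75,
    "Pitch": 0.65,
    "Music": 0.60,
}

# Order the measures from the highest to the lowest loading.
names = sorted(loadings, key=loadings.get, reverse=True)


def implied_correlation(a, b):
    """Correlation implied by a single common factor: the product of loadings."""
    return 1.0 if a == b else loadings[a] * loadings[b]


matrix = [[implied_correlation(a, b) for b in names] for a in names]


def is_hierarchical(m):
    """True if off-diagonal correlations never increase along any row.

    The matrix is symmetric, so the same then holds down every column.
    """
    n = len(m)
    for i in range(n):
        off_diagonal = [m[i][j] for j in range(n) if j != i]
        if any(x < y for x, y in zip(off_diagonal, off_diagonal[1:])):
            return False
    return True


def columns_proportional(tolerance=1e-12):
    """True if r(a,c) * r(b,d) equals r(a,d) * r(b,c) for all distinct measures."""
    for a, b, c, d in permutations(names, 4):
        difference = (implied_correlation(a, c) * implied_correlation(b, d)
                      - implied_correlation(a, d) * implied_correlation(b, c))
        if abs(difference) > tolerance:
            return False
    return True


print("hierarchical:", is_hierarchical(matrix))
print("columns proportional:", columns_proportional())
```

With real data the hierarchy is only approximate, because each measure also reflects sampling error and influences specific to that measure.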
Spearman further identified two components of the "g" factor: (a) eductive ability, that
is, the mental activity of making meaning out of confusion, developing new insight,
going beyond the given to perceive that which is not immediately obvious, and
generating high level schemata, which make it easy to handle complex events.
Eductive ability is largely non-verbal. (b) Reproductive ability, that is, the ability
to master, recall, and reproduce acquired
information. Reproductive ability is largely verbal (Raven, 1989). According to
Herrnstein (1973), to be clever, for Spearman, meant having lots of "g".
Brody (1992) identified at least five important contributions of Spearman's theory to
our understanding of individual differences in intelligence. First, he provided an
explicit theoretical rationale for the construction of a test of intelligence, and
emphasized that intelligence tests should contain subscales or measures that have high
g-to-s ratios. Second, his methods for analyzing correlation matrices were the
foundation of factor analysis. It can be said that his method was the precursor of the
use of construct validation procedures to assess the validity of a measure. Third,
Spearman conceived intelligence as a construct and a hypothetical entity, which could
not be identified with any particular measure or subset of measures. Fourth, his theory
contained a strong empirical claim that all measures of intelligence were measures of
a single common theoretical entity, a supposition that is still in debate in
contemporary research. Finally, Spearman may have been correct when he assumed
the existence of a relationship between simple sensory discrimination tasks and
intelligence, as hypothesized in previous studies. However, he criticized the results of
Wissler's research, first because of the intellectual homogeneity of his sample
(Columbia University students), and second because of the lack of ideal conditions of
measurement in the experiment: three subjects were tested at once, responding to 22 tests
in 45 minutes (Virgolim, 2005).
2.3.8 Contribution of Piaget (1896-1980)
One of the most important contributions to the study of intelligence emerged from
the work of Jean Piaget, who sought to explain intellectual development as a result
of changes in the cognitive function (Piaget, 1961). Piaget began his inquiry in a
non-scientific way, selecting only three subjects to study (his own children) without
a control group. However, he described the results of his observations in such a clear
and detailed manner, that his evidence permitted him to explain important principles
of growth and development (Virgolim, 2005). Many subsequent studies have
reported his principles as viable and useful (Clark, 1992; Wadsworth, 1993).
According to Piaget (1961), the cognitive processes emerged as a result of the
reorganization of psychological structures that resulted from the dynamic
interaction of a child with his/her environment. The interaction among the critical
variables to cognitive development (such as maturation, experience, social
interaction and equilibration) regulated the direction of the child's development
(Wadsworth, 1993). The Piagetian tests, unlike the traditional psychometric tests
used so far, aimed to assess not what we know (the product), but rather how we
know or think (the process), and how people obtain and use information to solve
problems and acquire knowledge (Weinberg, 1989).
Piaget was also one of the first theorists to establish an interactive theory of
intelligence. According to him, cognitive development depended equally on genetic
contributions and on the quality of the environment in which the child lived. This position
has numerous followers and, as pointed out by Plomin (1989), the most recent
researchers support the notion that genetic influences on behavior are multifactorial,
equally comprising hereditary transmission and the environment. Although genetic
factors, in general, account for no more than half of the variance of behavioral traits,
they affect probabilistic propensities rather than predetermined programming (Plomin,
1997). However, as pointed out by Neisser and his collaborators (1996), the pathways
by which genes make their contributions to individual differences in intelligence were
largely unknown. Similarly, the exact way the environment contributes to those
differences still remains a mystery.
2.4 Theories of Intelligence
2.4.1 Spearman’s “g” Theory
An important advance in the theory of intelligence was made by Charles Spearman
(1904) in the early twentieth century. Spearman showed that all cognitive abilities are
positively inter-correlated, i.e. people who do well on some tasks tend to do well on
others. He invented the statistical method of factor analysis to show that the efficiency
of performance on all cognitive tasks was partly determined by a common factor. He
designated this common factor “g” for "general intelligence" and defined it as "the
eduction of relations and correlates" (Spearman, 1927). To explain the existence of
the common factor, Spearman proposed the presence of some general mental power
determining performance on all cognitive tasks and responsible for their positive
inter-correlation. Nevertheless, he also found that correlations between tests of
different abilities are not perfect (Lynn and Vanhanen 2006). To explain this he
proposed that, in addition to “g”, there were a number of specific abilities that
determined performance on particular types of tasks, over and above the effect of “g”.
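As a rough illustration of the idea of a common factor, the sketch below takes a small, hypothetical correlation matrix for four tests and approximates the loading of each test on the first principal component by power iteration. This is a simplification chosen for brevity and is not Spearman's own procedure; the correlation values and the function name first_factor_loadings are assumptions made for the example.

```python
import math


def first_factor_loadings(corr, iterations=200):
    """Approximate loadings on the first principal component of a
    correlation matrix, using plain power iteration."""
    n = len(corr)
    # Start from a unit vector with equal positive weights.
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Once v has converged, the norm equals the largest eigenvalue.
        eigenvalue = norm
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return loadings, eigenvalue


# Hypothetical positive inter-correlations among four cognitive tests.
corr = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

loadings, eigenvalue = first_factor_loadings(corr)
# Share of total variance associated with the first component.
share = eigenvalue / len(corr)
print([round(x, 2) for x in loadings], round(share, 2))
```

Because every correlation in the matrix is positive, every test loads positively on the first component, which mirrors the observation that people who do well on some tasks tend to do well on others. The part of each test's variance that the component does not capture corresponds loosely to the specific abilities described above.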
Spearman identified three major laws of cognitive activities associated with “g”.
The first was the Law of Apprehension, that is, the fact that a person
approaches the stimulation he receives from all external and internal sources
via the ascending nerves.... Next we have the eduction of Relations. Given two
stimuli, ideas, or impressions, we can immediately discover any relationship
existing between them-one is larger, simpler, stronger or whatever than the
other. And finally, we have the eduction of Correlates-given two stimuli,
joined by a given relation, and a third stimulus, we can produce a fourth
stimulus that bears the same relation to the third as the second bears to the
first.... If Spearman is right, then tests constructed on these principles, that is,
using apprehension, eduction of relations and eduction of correlates, should be
the best measures of g; that is, correlate best with all other tests. This has been
found to be so; the Matrices test... has been found to be just about the purest
measure of IQ. (Eysenck, 1998, p. 57).
By the end of the twentieth century Spearman’s basic theory had become virtually
universally accepted in the academic discipline of differential psychology. The
principal elaboration of the theory has been the development of what is called the
hierarchical model of intelligence. This consists of a hierarchical structure in which
there are numerous narrow specific abilities at the base, eight “second order or group
factors” consisting of verbal comprehension, reasoning, memory, spatial, perceptual,
mathematical, cultural knowledge and cognitive speed in the middle of the structure
and a single general factor - Spearman’s “g” - at the apex. This model was widely
accepted among contemporary experts such as the American Task Force chaired by
Ulrich Neisser (1996), Jensen (1998), Mackintosh (1998), Carroll (1994), Deary
(2000) and many others.
Matrices tests such as Raven's Progressive Matrices were built on Spearman's theory
and have been widely used as measures of intelligence (Eysenck, 1998). Matrices
tests have substantial loadings on “g” and demand conscious and complex
mental effort, often evident in analytical, abstract, and hypothesis-testing tasks
(Sattler, 1988). Conversely, tests that require less conscious and complex mental
effort are low in g (Sattler, 1988). Intelligence tests with lower g emphasize specific
factors such as recognition, recall, speed, visual-motor abilities, and motor abilities
(Sattler, 1988).
2.4.2 Thurstone's Primary Mental Abilities (1938)
Louis Thurstone (1938) disagreed with the idea that intelligence comprised a general
factor. Thurstone viewed intelligence as a multidimensional rather than a unitary trait.
Thurstone was intent on showing how intelligence could be separated into the noted
multiple factors, each of which had equivalent significance (Sattler, 1998). In his
1935 book, The Vectors of Mind, he hypothesized that intelligence consists of a small
number of independent factors, corresponding to different cognitive domains, each of
them contributing in different degrees, depending on the individual's situation. These
factors were: verbal ability, general reasoning (inductive and deductive), numerical
ability, memory, perceptual speed, word fluency, and spatial ability. These factors are
still present in traditional measures of intelligence (Snyderman & Rothman, 1988).
Thurstone initially discounted a general factor as a component of mental functioning.
He analysed the results of 50 intelligence tests which he administered to college
students and came to the conclusion that there were seven primary mental abilities
that made up a person’s intelligence. The abilities or factors were: Spatial (S): the
ability to form spatial and visual images. Perceptual (P): the ability to find or
recognise particular items in a perceptual field. Numerical (N): the ability to perform
simple numerical calculations. Verbal relations (V): the ability to conceptualize ideas
and meanings in language. Word fluency (W): the ability to deal with single and isolated
words in a fluent manner. Memory (M): the ability to recognize and recall words,
numbers and figures after having memorized them. Inductive reasoning (I): the ability
to find a rule or principle and apply it. Restrictive reasoning (R): the ability to
successfully complete tasks that involve restriction in the solution; arithmetical
reasoning utilizes restrictive reasoning, as the answer to an arithmetical calculation is
limited to one correct solution. Deductive reasoning (D): the ability to draw a logical
conclusion from a set of assumptions (Thurstone, 1938). However, Sternberg (1985a)
pointed out that the differences between Spearman's and Thurstone's theories seemed
to be of emphasis rather than of substance. Later in their lives, Spearman was
compelled to recognize the existence of group factors, while Thurstone was forced to
acknowledge the existence of a higher-order general factor, connected, in some way,
to the primary mental abilities (Snyderman & Rothman, 1988). In 1941, Cattell
proposed a reconciliation between the two theories by postulating the existence of a
hierarchical structure of ability (Snyderman & Rothman, 1988; Brody, 1992). The
“g” factor would be a general, common factor, present in all measures of
ability and derivable from the relationships that exist among the more specialized factors
postulated by Thurstone.
2.4.3 Guilford’s structure of the intellect theory
Guilford (1967, 1985) identified many different factors which together make up the
structure of “intellect” or “intelligence”. Intelligent functions were defined according
to three different dimensions: operation, content and product.
The mental processes (operations) identified by Guilford (1967) were: Cognition:
comprehension or understanding of information. Memory: ability to recall and recognise
information that has been memorised. Divergent Production: creative thinking, which
involves fluency, flexibility and elaboration abilities. Convergent Production: thinking in
which a single correct answer to a question is produced. Evaluation: comparing a
product of information with known information according to logical criteria and
making a decision concerning criterion satisfaction.
The content categories were: Visual: information that is visually perceived, e.g. correct
perception of words that have missing letters. Auditory: information that is heard, for
which auditory discrimination is important, e.g. listening to and interpreting a radio
code. Symbolic: information in the form of tokens or signs that stand for something else,
e.g. printed language. Semantic: the meanings of words comprise semantic content.
Behavioural: nonverbal information involved in human interactions.
Abilities were not only classified according to the processes and content but also
according to the form in which the information was processed. The form of information
is classified into product categories. The products identified were: Units: the most basic
form of information is units or parts of wholes. Units can be seen as chunks of
information, e.g. single words. Classes: a class is a set of objects with one or more
common properties, e.g. in number classification, the number 22 fits in the class formed
by the numbers 44, 55 and 33. Relations: a relation is a connection between two things.
An item testing the cognition of relations may, for example, require the identification of
a relation as the movement of a line by 45 degrees in a clockwise direction. This relation
is then applied to another set of figures. Systems: complexes, patterns or organizations
of interdependent or interacting parts form systems. In testing the cognition of systems,
spatial orientation tasks may be used, where visual rotation and consideration of many
different parts and their changing relationships to each other are involved.
Transformations: changes, revisions, redefinitions or modifications by which any
product of information in one state changes over into another state. In testing
cognition of semantic transformations, the respondent may have to explain the many
different ways in which two common objects, such as an apple and an orange, are alike.
This involves the redefinition of the objects by emphasising one attribute or another.
Implications: an implication is something expected, anticipated or predicted from given
information. In an item testing the cognition of symbolic implications, different words
are placed in relation to each other in the manner of a crossword so that words may be
read down or across. Considering the position of letters gives rise to the expectation that
one of the other words would fit in a certain place (Guilford, 1967).
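The three-dimensional classification can be pictured as a grid in which every ability is one combination of an operation, a content and a product. The sketch below simply crosses the categories as they are listed in this section; the variable and function names are illustrative assumptions, and the resulting count applies only to this particular listing of categories.

```python
from itertools import product

# Categories as listed in this section (names are lower-cased for convenience).
OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
CONTENTS = ["visual", "auditory", "symbolic", "semantic", "behavioural"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]


def ability_cells():
    """Return every operation x content x product combination as a dict."""
    return [
        {"operation": op, "content": content, "product": prod}
        for op, content, prod in product(OPERATIONS, CONTENTS, PRODUCTS)
    ]


cells = ability_cells()
print(len(cells))  # 5 operations x 5 contents x 6 products

# One cell mentioned in the text: cognition of symbolic implications.
example = {"operation": "cognition", "content": "symbolic",
           "product": "implications"}
print(example in cells)
```

Each test item described above, such as the crossword-style item for the cognition of symbolic implications, corresponds to exactly one cell of this grid.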
2.4.4 Gardner’s theory of multiple intelligences
Gardner (1993) defined intelligence as comprising different kinds of processing
operations that allow a person to achieve in one or more of eight culturally
meaningful areas. Gardner did not agree with the concept of a general intelligence
factor (g) and held that eight different intelligences were found to a greater or lesser
extent in different individuals. The eight intelligences identified by Gardner were:
Linguistic: sensitivity to sounds, rhythms, meanings of words and different language
functions. Logico-mathematical: sensitivity and capacity to detect logical or
numerical patterns; ability to handle long chains of logical reasoning. Musical: ability
to produce and appreciate pitch, rhythm (or melody) and aesthetic-sounding tones;
understanding forms of musical expressiveness. Spatial: ability to perceive the visual-spatial
world accurately, to perform transformations on those perceptions, and to recreate
aspects of visual experience in the absence of relevant stimuli. Bodily-kinaesthetic:
ability to use the body skillfully for expressive as well as goal-directed purposes;
ability to handle objects skillfully. Naturalist: ability to recognize and classify all varieties of
animals, minerals and plants. Interpersonal: detection and appropriate responding to
the moods, temperaments, motivations and intentions of others. Intrapersonal: ability
to discriminate complex inner feelings and to use them to guide one’s own behaviour;
knowledge of one’s own strengths, weaknesses, desires and intelligences. Only a few
factor-analytic studies support the existence of multiple intelligences as Gardner saw
them (Marais, 2007).
2.4.5 Cattell and Horn’s theory of fluid and crystallized intelligence
Cattell proposed a theory that intelligence consisted of two major types of cognitive
abilities: crystallised and fluid intelligence. Crystallised intelligence (Gc) referred to
acquired skills and knowledge that were dependent on exposure to a particular culture,
as well as formal and informal education, for example, vocabulary. The abilities that
made up fluid intelligence (Gf) were nonverbal, relatively culture-free, and
independent of any specific instruction, for example, memory for digits (Cohen &
Swerdlik 2002).
Tests that measured the ability to manipulate information and solve problems were
considered measures of fluid ability whereas tests that require simple recall or
recognition of information were considered measures of crystallized abilities (Sattler,
1998).
2.4.6 Carroll’s three-stratum theory of cognitive abilities
Carroll (1994) used exploratory factor analysis to test his belief that human cognitive
abilities could be conceptualized hierarchically (McGrew & Woodcock, 2001). He
developed a hierarchically arranged model of cognitive abilities. This model
elaborated on the models proposed by Spearman, Thurstone and Cattell. Carroll
represented the structure of intelligence as a pyramid, with ‘g’, or general intelligence
as conceptualized by Spearman, at the top (Berk, 2000). Eight broad abilities occupied
the second stratum, arranged from left to right in terms of their decreasing correlation
with ‘g’. These abilities included Fluid Intelligence, Crystallised Intelligence, General
Memory and Associative Learning, Broad Visual Perception, Broad Cognitive
Speediness and Processing Speed (Berk, 2000).
2.4.7 Cattell-Horn-Carroll Model
The Cattell-Horn-Carroll theory of intelligence was most closely derived from
Spearman's theory of g, the fluid and crystallized intelligence theories of Cattell and
Horn, and the factor-analytic work of Carroll. The Cattell-Horn theory of intelligence
was combined with the Carroll model, to provide a comprehensive conceptualization
of human cognitive abilities that many scientists would agree on (Cohen & Swerdlik
2002).
In the Cattell-Horn-Carroll (CHC) model, there were ten broad stratum abilities and over
seventy narrow stratum abilities. Each broad stratum ability included two or more
narrow stratum abilities. The ten broad stratum abilities were: Fluid Intelligence (Gf),
Crystallised Intelligence (Gc), Quantitative Knowledge (Gq), Reading/Writing Ability
(Grw), Short-term Memory (Gsm), Visual Processing (Gv), Auditory Processing (Ga),
Long-term Storage and Retrieval (Glr), Processing Speed (Gs) and Decision/Reaction
Time or Speed (Gt).
Recent studies showed that the CHC model offered a better representation of the
structure of intelligence compared to other selected models or theories (Marais, 2007).
2.5 Definitions of Mental Test
Standardisation of a test means setting up norms, that is, obtaining average
scores and distributions from a representative population. The importance of
standardisation is that it gives the test scores psychological meaning and thus makes
interpretation possible. In practice, standardisation of tests is essential in vocational
guidance or personnel selection where decisions about individuals are made (Kline
2000).
Kline (2000) argued that, as norms are essential for understanding the
measurements (test scores), they must be accurate. To ensure this, he noted that
certain requirements for a good standardisation should be met. These include sampling
and the expression of results, which will be discussed in detail later (chapters 5 and 6).
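The way norms give a raw score its meaning can be sketched as follows. The example assumes a small, hypothetical norm sample and expresses a raw score as a standard score on a scale with mean 100 and standard deviation 15. That scale is a common convention for IQ scores and is adopted here as an assumption; the function name standard_score and the sample values are likewise illustrative.

```python
from statistics import mean, pstdev


def standard_score(raw, norm_sample, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score relative to a norm sample.

    The raw score is first converted to a z-score (distance from the
    norm mean in standard deviation units) and then rescaled.
    """
    norm_mean = mean(norm_sample)
    # Population standard deviation of the norm sample (an assumption;
    # a sample estimate could be used instead).
    norm_sd = pstdev(norm_sample)
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical raw scores from a norm sample.
norm_sample = [30, 35, 40, 45, 50]

average_case = standard_score(40, norm_sample)  # raw score at the norm mean
above_case = standard_score(50, norm_sample)    # raw score above the norm mean
below_case = standard_score(30, norm_sample)    # raw score below the norm mean
print(round(average_case, 1), round(above_case, 1), round(below_case, 1))
```

The same raw score would receive a different standard score against a different norm sample, which is why the accuracy and representativeness of the norms matter for interpretation.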
Cronbach (1990) stated that a test is a systematic procedure for observing behaviour
and describing it with the aid of numerical scales or fixed categories. Anastasi (1988)
and Brown (1983) defined a psychological test essentially as an objective and
standardised measure of a sample of behaviour. Anastasi (1988) added that the
diagnostic or predictive value of a psychological test depends on the degree to which
it serves as an indicator of a relatively broad and significant area of behaviour.
Jensen (1981) defined a mental test as a small sample of behaviour used to predict
more extensive or important behaviour or capability. He added that mental tests were
essentially similar to other tests. Tyler and Walsh (1979) defined tests as standardised
situations designed to elicit a sample of an individual's behaviour.
From the above definitions it is clear that a test is a tool used to measure a sample of
behaviour, not a complete inventory. Psychological tests are standardised, that is, each
test is administered under a prescribed set of procedures, and objective, which concerns
the judgement or evaluation of test scores. Scarr (1981) mentioned that the sampling
rationale was that an individual who can repeat six digits backwards can also
manipulate other information in his/her head.
2.6 Classification of Mental Tests
Mental tests are a subset of psychological tests. Psychological tests can be divided
into:
a) Mental tests which are used to measure general intellectual ability of individuals
(intelligence tests) or to measure an individual's ability of a specific kind, like
mechanical, clerical or musical (aptitude tests).
b) Personality tests which are used to evaluate non-intellectual traits of personality by
questionnaires, self-rating inventories or projective techniques.
Mental tests can be classified according to timing, procedure of test administration,
and content (Attashani and Abdalla, 2005).
2.6.1.1 Classification of tests according to timing
a) Speed tests: measure the speed and efficiency with which a subject can perform test
items. In a speed test the items are so easy and simple that almost anyone could get
them all right if given sufficient time. Such a test identifies who works faster (Jensen,
1980; 1981; Brown, 1983).
b) Power tests: determine the highest level of knowledge, skill or reasoning the subject
can demonstrate without time pressure. They consist of items graded in difficulty or
complexity. In a power test there is no time limit or a very liberal time limit, which
allows individuals to complete all items they can answer correctly. Scores in a power
test reflect the level of difficulty of items the test taker can answer correctly (Jensen,
1980; 1981; Brown, 1983).
2.6.1.2 Classification of tests according to procedure of administration
a) Individual tests: administered to one subject by one examiner at a time. This
allows the examiner to observe the subject's performance on the test items, which
helps in evaluating test scores. Common examples of individual intelligence tests are
the Wechsler and Stanford-Binet tests (Kline 2000).
b) Group tests: administered to a number of subjects at the same time. Often referred
to as paper and pencil tests because they require subjects to write answers or make
marks on specially prepared answer sheets. Because of their simplicity and low cost,
this type of test is more popular than individual tests (Ahmann & Glock, 1976).
Group intelligence tests are more often used for initial screening in schools and
businesses because they can be administered quickly and economically by people with
minimum training. Individual intelligence tests are preferred by psychologists in
clinical and other settings where clinical diagnoses are made and where they serve as
measures of general ability and as a means of obtaining insight into personality
functioning and disabilities (Anastasi & Urbina 1997).
2.6.1.3 Classification of tests according to content
a) Verbal tests: involve the use of language, spoken or written, but they may or may
not require reading or writing. Typical verbal tests are general information, verbal
analogies, and vocabulary tests (Kline, 2000).
b) Non-verbal tests: paper and pencil tests that involve no explicit use of language, in
some cases not even for giving instructions for taking the test. These tests consist of
such things as figural analogies, matrices, and embedded figures. The SPM test is an
example of such tests (Kline, 2000; Domino & Domino 2006).
c) Performance tests: non-verbal tests that require the subject to perform certain
actions such as drawing, manipulation or construction. These tests may consist of
figure copying, block design, and picture completion or picture arrangement. The
performance part of the Wechsler test is an example of performance tests (Kline, 2000;
Sternberg 2000).
2.7 Uses of Mental Tests
Classification, training, and education of mentally retarded individuals provided the
initial spark for the development of mental tests. In general, mental tests have been used for
determination and analysis of individual differences in general intelligence and
aptitude. For example, mental tests are used for diagnostic purposes to estimate the
present ability of individuals, and for prognostic purposes to predict ability or
performance of individuals in the future on the basis of their present ability (Anastasi
& Urbina 1997).
Brown (1983) noted that there were three situations where tests were used as aids in
decision making about an individual, a group or some hypothesis. The first use was
selection, where the test is used to select the most promising applicants,
those with the greatest probability of success. The second use of tests was for
placement, to assign one or more individuals to several alternatives according to their
ability. A third use of tests was in diagnosis, to identify the individual's strengths and
weaknesses and to determine a suitable program or treatment for him or her.
The purpose of using mental tests in schools was to estimate the mental ability of
students and provide them with educational or vocational guidance. Anastasi (1988,
p.4) stated:
At present, schools are among the largest test users. The classification
of children with reference to their ability to profit from different types of
school instruction, the identification of the intellectually retarded on the one hand and
the gifted on the other, the diagnosis of academic failures, the educational and
vocational counselling of high school and college students, and the selection
of applicants for professional and other special school programs are among the
many educational uses of tests.
2.8 Use of Intelligence Tests
Psychological assessment often depends heavily on the use of standardized
intelligence tests. Therefore, the use of each intelligence test must be guided by
substantial research, including research on subgroup differences. The results
addressing the hypotheses that guide this study have the potential to add to the research
database in this area (Edwards, 2003).
The use of intellectual and other forms of psychological and mental tests with
students who differ culturally, linguistically, or racially has been subject to
substantial controversy. Professionals responsible for assessment of culturally
different children frequently are uncertain which test instruments provide the most
valid, relevant and equitable results. Interest in providing fair and equitable mental
test results extends back several decades, but what is considered fair and objective
changes as values in our culture change (Oakland, 1976; Oakland & Laosa, 1976).
Differences in intelligence scores between groups are considered
important in part because tests are statistically structured to distinguish between
individuals and, since groups are aggregates of individuals, between groups. Intelligence
tests are designed carefully and deliberately to produce score variance (Wesson,
2000). The generation of a broad range of individual scores permits psychologists
to acquire knowledge and make judgments about between-group and within-group
differences. This knowledge allows interpretation of the distribution of scores, which
leads to various decisions (e.g., eligibility for placement in special education and
gifted programs) (Wesson, 2000; Yoon, 2006).
Summarising uses of intelligence tests after the Second World War in the United
States, Samuda (1975, p.25) reached the following conclusion:
Intelligence tests play a vital role at all stages and in every aspect of a
person's life. From pre-school days through postgraduate years, tests are
administered for grouping and course selection purpose, for placement in
special education classes or special institutions, for career orientation, college
entrance, and admission to professions. A person's IQ score largely determined
the type of education he/she received and, ultimately, the type of position
he/she might occupy within society. Therefore, the concept of intelligence was
central to an individual's life.
It should be stressed that intelligence tests should be used alongside other methods such as
interviews, history records or other test scores before reaching a decision regarding
any test taker. Layman (1968, p.8) pointed out the problem of using intelligence tests
alone for prediction and judgement. He stated:
Intelligence tests are far from perfect indicators of what sort of
schoolwork may be expected of a child, and they should be used thoughtfully
and with caution. Intelligence test scores should not be used as the sole basis
for judgements about a student.
According to Urbina (2004), the current uses of tests, which take place in a wide
variety of contexts, can be classified into three categories: decision making,
psychological research, and self-understanding and personal development.
• Decision making:
The primary use of psychological tests has been as decision-making tools. This
particular application of testing invariably involves value judgments on the part of one
or more decision makers who need to determine the bases upon which to select,
place, classify, diagnose, or otherwise deal with individuals, groups, organizations, or
programs.
When tests are used for making significant decisions about individuals or programs,
testing should be merely a part of a thorough and well-planned decision-making
strategy that takes into account a particular context in which the decisions are made,
the limitations of the tests, and other sources of data in addition to tests.
Unfortunately, very often, for reasons of expediency, carelessness, or lack of
information, tests have been made to bear the responsibility for flawed decision-making
processes that placed too much weight on test results and neglected other
pertinent information.
• Psychological research:
Tests have often been used in research in the fields of differential, developmental,
abnormal, educational, social, and vocational psychology, among others. They
provide a well-recognized method for studying the nature, development, and internal
relationships of cognitive, affective, and behavioral traits. It should be noted that the
advantages that psychological tests offer pertain to their characteristic efficiency and
objectivity.
• Self-understanding and personal development:
Most humanistic psychologists and counselors have traditionally perceived the field
of testing, often justifiably, as overemphasizing the labeling and categorization of
individuals in terms of rigid numerical criteria. Constance Fisher (1984) began using
tests in an individualized manner. This practice has evolved into the therapeutic model
of assessment espoused by Finn and Tonsager (1997). One of the most pertinent
applications of this model was in counseling and psychotherapeutic settings.
2.9 Culture-Free and Culture-Fair Tests
The use of tests in cultures other than the one for which they were originally designed,
and the issue of cultural bias, have led psychologists to develop what they thought at
first to be culture-free tests.
The term culture fair test refers to tests that are not biased toward a particular cultural
group. Culture bias exists in a test when a member from one culture is discriminated
against in his or her ability to answer questions solely on the basis of the culture in
which he or she grew up (Corsini, 1984; Anastasi, 1988). Anastasi & Urbina (1997)
mentioned that the concern with cross-cultural testing was recognised at least as early
as 1910, during the testing of waves of immigrants to the United States.
To overcome the cultural bias in ability tests, psychologists have tried to develop
culture-free tests that have no such bias. Their first attempt to develop a test of
intelligence which would be free of cultural influences was to minimise the use of
language if cultural groups spoke different languages. However, they noticed that the
direct translation of test items from one language to another did not eliminate the
cultural differences, nor produce comparable tests (Anastasi & Urbina 1997).
Psychologists tried another approach: developing non-verbal ability tests. Most of
these non-verbal tests contained information or emphasised pictorial and figural
content that a person raised in a different culture may lack the experience to understand
and that may, furthermore, seem pointless. Anastasi (1988) pointed out that non-verbal
tests are often used in the hope of obtaining culture-fair tests, but many researchers
believed that non-language tests may be more culturally loaded.
Kline (1979) argued that non-verbal tests in non-western cultures avoided the
language problem but encountered another perhaps more serious problem, when these
tasks seem pointless to subjects. Kline gave an example of this problem based upon
performance in the Porteus Mazes test. He stated that:
"when an old African who was tested was asked to trace the maze,
imagining he was asked to lead his cattle into the kraal, the old African replied
that he preferred not to, since any one who built a kraal like that was mad"
p.309.
A culture-free test is meant to be a test with items that are unfamiliar to all subjects.
Technically, it has been shown that it is impossible to develop a test that is
completely free from cultural bias (Biesheuvel, 1969; Brislin et al., 1973; Noll and
Scannell, 1979; Brown, 1983; Anastasi, 1988).
Anastasi (1988, p.357) reviewed the problem of culture free tests and concluded that:
Since all behaviors are affected by the cultural milieu in which the
individual is reared and since psychological tests are but samples of behavior,
cultural influences will and should be reflected in test performance. It is
therefore futile to try to devise a test that is "free" from cultural influences.
Noll and Scannell (1979) held the same opinion. They stated that no test could be
culture free, since the only way to respond to it is in terms of what has been learned,
that is, in terms of one's culture.
Again, to overcome this problem, psychologists shifted to the development of
culture-fair tests. They believed that to have a culture-fair test, all test items should be equally
familiar to all subjects. Biesheuvel (1969) defined culture-fair tests as tests which
avoid culture-bound features such as an emphasis on speed of performance or pictures
presenting objects or situations that lack universality.
Brown (1983) believed that culture-fair tests, though not eliminating culture effects,
attempted to make the tests equally fair to all persons by controlling certain critical
variables, such as language, speed in responding within limited time, and differences
in competitive motivation between cultures.
Summarising the problem of speed in responding within limited time, Samuda (1975)
reported that many researchers found that the attitude toward speed varies greatly in
different cultures and not all people will work on the test with equal interest in getting
it done in the shortest time possible. For example, they found that the injunction to
"do this as quickly as you can" seemed to make no impression whatsoever on the
American Indian children.
Anastasi (1988) also mentioned that the present objective in cross-cultural testing is to
develop tests that presuppose only experiences that are common to different cultures.
For this reason, such terms as culture-common, culture-fair, culture-reduced and
cross-cultural have replaced the earlier “culture-free” term.
Kline (1979) concluded that for cross-cultural test construction it was best to use our
knowledge and experience of the culture as a guideline to writing items, and to retain
those that show themselves to be criteria-based or valid in factor analysis. Such tests
enable the cross-cultural psychologist to elucidate the environmental factors
influencing the major ability factors which is one of the stated aims of cross-cultural
psychologists.
The following are examples of culture-fair tests that have been used in cross-cultural
testing: the Porteus Maze Test (1913), the Kohs Block Design Test (1923), the
Goodenough-Harris Drawing Test (1926), Raven's Progressive Matrices (1938), Cattell's
Culture Free Test (1940; in the late 1950s, Cattell changed the term "Culture-Free
Test" to "Culture-Fair Test"), Witkin's Embedded Figures Test (1945), and the D48 Test
(dominoes) (1948).
Brislin et al. (1973), Kline (1979), Raven (1989) and Murphy and Davidshover (1991)
believed that Raven's Progressive Matrices was one of the most widely used
intelligence or ability tests in cross-cultural research.
2.10 Achievement Tests
Achievement tests were intended to measure the individual's actual learning of
educational subject matter after a period of instruction. They were not designed for
prediction. Instead, they measured what has been learned or the mastery of school
subjects (Freeman, 1962).
Achievement tests served many functions. Aiken (1988, p.125) outlined the
following: (a) to determine how much people know about certain topics or how well
they can perform certain skills; (b) to inform students, as well as their teachers and
parents, about students' scholastic accomplishments and deficiencies; (c) to motivate
students to learn; (d) to provide teachers and school administrators with information
to plan or modify the curriculum; and (e) to serve as a means of evaluating the
instructional program and staff.
The distinction between achievement and intelligence or aptitude tests is not simple.
Anastasi (1988, p.412) believed that differences between achievement and aptitude
tests were in the degree of uniformity of relevant antecedent experience. Thus
aptitude tests measured the effects of learning under uncontrolled and unknown
conditions, whereas achievement tests measured the effects of learning that occurred
under partially known and controlled conditions. In differentiating between aptitude
and achievement tests she stated:
No distinction between aptitude and achievement tests can be rigidly
applied.... We should especially guard against the naive assumption that
achievement tests measure the effects of learning, while aptitude tests measure
innate capacity independent of learning.
Jensen (1980, p.239) also argued that all performance was a form of achievement, and
of course there is no performance-free psychological test. To distinguish between
intelligence or aptitude and achievement tests, Jensen outlined the following points:
a) Intelligence tests are much broader and more heterogeneous, drawing on a wide
variety of experiences, than achievement tests, which sample specific types of
knowledge associated with formal schooling.
b) Intelligence tests sample cumulated knowledge and skills from the individual's past
experience, whereas achievement tests sample knowledge acquired in the recent
past.
c) Intelligence tests predict future intellectual achievement, even when the content
of that achievement has nothing in common with the content of the aptitude tests.
d) Most intelligence measures are more stable across time and are less susceptible to
the influence of instruction or training than most achievement tests.
Aiken (1988) believed that the distinction between achievement tests and intelligence
tests can be made in terms of focus. Achievement tests focus more on the present,
what the person knows or can do now, whereas intelligence tests focus on the future
or what a person should be able to do with further education or training.
Sattler (1982) pointed out that intelligence tests and achievement tests have
commonalities as well as differences. Both tests sample aptitude and learning.
However, intelligence tests are broader in coverage than achievement tests and sample
from a wider range of experience. Achievement tests, such as reading and
mathematical tests, are heavily dependent on formal learning experiences that are
acquired in school or at home, which makes them more culture bound than
intelligence tests. Sattler added that intelligence tests stress the ability to apply
information in new and different ways, while achievement tests stress mastery of
factual information. Thus, intelligence tests measure less formal achievement than do
achievement tests.
Achievement tests can be divided into standardised and teacher-made tests. The
former mainly differ from teacher-made tests in that they are intended to be used over
a period of many years, and cover a broader range of skills and educational objectives
common to many schools. The term standardised refers to specific instructions for
administration and scoring. Teacher-made tests are tests designed to assess the
academic progress of students in a particular classroom, not to give broad
comparisons across schools. Teacher-made tests are sometimes called classroom tests
or "informal" tests, and are constructed by classroom teachers for use in their
particular classes under conditions of their choosing (Ahmann and Gluck, 1976).
Brown (1983) distinguished between teacher-made and standardised achievement
tests. Brown stated that, for teacher-made tests, the teacher will refer to textbook
assignments, supplemental reading lists, lecture outlines and class discussions as
sources of items. Standardised tests developed by test publishers will consider not
one text but the most commonly used material, covered not by one teacher but by a
variety of teachers and experts.
Aiken (1988) believed that teacher-made and standardised tests complement
rather than replace each other. In distinguishing between the two types of
achievement test, he stated that a teacher-made test is more specific to a
particular teacher, classroom, and unit of study and is easier to keep up to date.
Standardised tests, on the other hand, are built around a core of general educational
objectives common to many different schools. In addition to being more carefully
constructed and having broader content coverage than teacher-made tests,
standardised tests have norms and higher reliability coefficients.
2.11 Intelligence and academic achievement
Intelligence and education are so intimately bound together that it would be
impossible to understand intelligence without knowing about its relation to education.
Intelligence is considered to be the child of education because the field of intelligence
testing was born from the need to develop a test that would predict children’s school
success (Sternberg, 2000).
The study of intelligence and education provides an example of the fruitful
interaction between the practical demands of educators and the basic research focus of
cognitive scientists (Sternberg, 2000). As mandatory public education became
commonplace by the late 1800s, educators were confronted with an overwhelming
observation: students of the same chronological age displayed a range of individual
differences in intellectual ability (Sternberg, 2000).
The study of intelligence has been motivated by the practical problems of education.
By 1905, Binet and his colleagues achieved a solution that was innovative,
straightforward, and most important, successful: the development of the Binet-Simon
intelligence scale. In this scale, if a child failed to answer correctly questions that most
other children of the same age could answer, the child was considered below average
in the ability to learn. Likewise, if a child was able to answer questions that most
other same-aged children could not answer, the child could be considered above
average in the ability to learn. These were based on the assumption that all children at
the same age level had the same opportunities to learn. Binet’s test was successful to
some extent in predicting children’s ability to learn in school. This test has served as
the basis for subsequent intelligence tests (Sternberg, 2000).
Academic achievement at school is the result of learning and problem solving ability
(Bester, 1998). Intelligence is seen to be the ability to think and learn and is therefore
considered to be fundamental to academic achievement.
In the literature, correlations between tests of general intelligence and measures of
academic performance were reported as being usually close to 0.50 (Brody, 1992;
Neisser, Boodoo, Bouchard, Boykin, Brody, Ceci, Halpern, Loehlin, Perloff,
Sternberg & Urbina, 1996) but can be as high as 0.75 (Jensen, 1998).
Studies have shown that IQs predict subsequent educational achievement, with
correlations of around 0.5 to 0.7. IQ
determines the efficiency of learning and comprehension of all cognitive tasks. The
correlations between IQ and subsequent educational attainment were not perfect
because educational attainment is partly determined by motivation, interests,
compliance and the effectiveness of teaching. Nevertheless the correlations are
substantial and show that intelligence tests measured real cognitive abilities that are
also expressed in educational attainment (Lynn and Vanhanen, 2006).
Many empirical investigations have shown that intelligence is the best single predictor
of academic success. Horn et al. (1993), in their study of undergraduate university
students, developed a path model to show the relative influence of different variables
on achievement. They found that, compared with other factors such as previous
knowledge and motivational factors, general intelligence had a highly
significant direct effect on achievement, independent of any other variable in the
model. Intelligence showed a correlation of 0.55 with achievement, explaining about
30% of the variance in the students’ performance in this study.
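The link between a correlation of 0.55 and the figure of roughly 30% can be checked by squaring the coefficient, since for a single predictor the squared correlation gives the proportion of variance shared with the outcome. The following is only an illustrative sketch; the function name is an assumption introduced here and does not come from Horn et al. (1993).

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables with Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation reported in the text for Horn et al. (1993).
horn_r = 0.55
share = variance_explained(horn_r)
print(f"r = {horn_r:.2f} -> r^2 = {share:.4f} (about {share:.0%})")

# The same conversion for the 0.5 to 0.7 range quoted earlier in this section.
for r in (0.5, 0.7):
    print(f"r = {r:.1f} -> about {variance_explained(r):.0%} of the variance")
```

Read this way, correlations of 0.5 to 0.7 correspond to roughly a quarter to a half of the variance in achievement, which leaves a substantial share to the other factors named above.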
Chen, Lee and Stevenson (1996) carried out a study investigating the relative
contribution of intelligence, previous achievement and family factors to later school
achievement in Chinese, Japanese and American cultures. It was concluded that there
were similar correlations between intelligence and academic achievement for each
culture investigated. Participants were administered intelligence tests in Grade 1 and
their achievement was tested 10 years later in Grade 11. The single most predictive
variable for Grade 11 achievement in mathematics, reading and general knowledge
was general intelligence. The study found correlations of between 0.48 and 0.53 for
mathematics achievement, between 0.28 and 0.51 for reading, and between 0.35 and 0.44
for general knowledge. Gagne and St Pere (2002) in a study comparing the predictive
values of intelligence, motivation and persistence, similarly found that cognitive
abilities were by far the best predictor of school achievement. In this study,
intelligence was found to correlate with achievement at between 0.36 and 0.56.
Verbal ability, as measured in intelligence tests, appears to contribute most to
achievement in scholastic success. Thompson and Plomin (1991) conducted an
investigation to ascertain the correlations between different measures of intelligence
and achievement in reading, mathematics and general language tasks from grade 1 to
6. The researchers found that the correlation between verbal ability and achievement
was higher than the correlations for other measures of intelligence.
The abovementioned study showed the importance of verbal intelligence with regard
to academic achievement, but the results revealed that other measures of intelligence
are also important in predicting scholastic success. In the study carried out by
Thompson et al. (1991), spatial intelligence, as measured by a spatial relations test
and a hidden patterns test, was found to be a good predictor of scholastic success in
reading and mathematics. Spatial intelligence was, however, a less powerful predictor
than verbal ability of achievement in the general language area. In the study carried
out by Marais (1992) it was shown that the ability to do mathematics, accountancy
and general science appeared to require the contribution of both verbal and nonverbal
abilities.
2.12 Increase in IQ with time
Discourse on IQ differences should reference substantial increases in intelligence
scores during the last 60 years. Scores on measures of intellectual functioning have
risen, and in some cases rather sharply, during this period (Flynn, 1999; Neisser,
1998). Analysis of intelligence data from several countries (e.g., Belgium, France,
Norway, Denmark, Germany, Austria, Switzerland, Japan, China, Israel, Brazil,
Canada, Britain, and the United States of America) found, without exception, large
gains in IQs over time (Flynn, 1998). The pattern of gains corresponded with the
worldwide move from an agriculture-based economy to industrialization (Flynn,
1987, 1994, 1999; Raven, Raven, & Court, 1993). Average IQs have risen by about
three points a decade during the last 50 years (Flynn, 1999). These IQ gains across
decades, referred to as the "Flynn effect," provided evidence that gains in average IQ
were part of a persistent and perhaps universal phenomenon (Flynn, 1999; Herrnstein
& Murray, 1994). Gains were most dramatic on tests that assessed a general factor, g,
of intelligence. One of the best examples of an intelligence test that primarily
measured “g” was Raven's Progressive Matrices (Jensen, 1980).
Research with Raven's Progressive Matrices is particularly relevant because
it is considered to be the best-known, most extensively researched,
and most widely used culture-free test of intelligence (Jensen, 1980). Many scholars
believe the test measures “g” and might be the most reliable measure to identify
intellectually able children from impoverished backgrounds (Jensen, 1980). However,
Raven's scores are highly influenced by environmental variables. To illustrate, all 18-
year-old males in the Netherlands took an adaptation of the Raven's upon entrance
into the military. Data available from this population revealed the mean scores of
those tested between 1952 and 1982 rose 21 IQ points. Genetic changes within
populations could not occur in such a short time span (Flynn, 1999). Therefore, the
increase in Raven's IQs must have been a function of changes in the environment
(Neisser, 1998). Current geometric rates of change in society (e.g., improvements in
nutrition, the acquisition of information as a result of computers and the internet) led
to concomitant changes in population IQs and, important to this study, changes in
subgroup IQ differences. The unknown factors producing secular IQ gains over
generations may also occur within generations and lead to IQ differences among
subgroups (Flynn, 1987). Thus, the finding of substantial changes in population IQs
over time raises the question as to whether the historically observed pattern of mean
IQ differences among racial/ethnic groups also shows substantial change.
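To put the Dutch military data on the same footing as the general estimate of about three points a decade, the reported 21-point rise between 1952 and 1982 can be converted to a per-decade rate. This is a minimal sketch that assumes the gain was spread evenly over the period; the function name is illustrative only.

```python
def gain_per_decade(total_gain: float, start_year: int, end_year: int) -> float:
    """Average IQ gain per decade, assuming the gain was spread evenly over the period."""
    years = end_year - start_year
    if years <= 0:
        raise ValueError("end_year must be later than start_year")
    return total_gain / (years / 10)


# Figures taken from the text: a 21-point rise between 1952 and 1982.
dutch_rate = gain_per_decade(21, 1952, 1982)

# General estimate quoted in the text (Flynn, 1999): about 3 points a decade.
typical_rate = 3.0

print(f"Dutch Raven's gain: {dutch_rate:.1f} IQ points per decade")
print(f"Ratio to the general estimate: {dutch_rate / typical_rate:.2f}")
```

On these figures the Dutch gain on the Raven's adaptation was more than twice the general rate, which is consistent with the statement that gains were most dramatic on tests of this kind.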
Most of these IQ increases have been reported in economically developed nations, but
IQ increases have also been found in a few economically developing countries
including Brazil (Colom, Flores-Mendoza & Abad, 2007), Dominica (Meisenberg,
Lawless, Lambert & Newton, 2005), Kenya (Daley, Whaley, Sigman, Espinosa &
Neuman, 2003), and Sudan (Khaleefa & Lynn, 2009).
Have increases been greater for fluid IQ (non-verbal & reasoning abilities) than for
crystallized intelligence (verbal and educational abilities) and if so, why? Wheeler
(1942) appeared to be the first to find greater gains in non-verbal than in verbal
abilities in a report regarding the increase in IQs in East Tennessee children aged 6-16
over the years 1930-40. The average gain was considerably greater for non-verbal
ability (6.0 IQ points per decade) than for verbal ability (2.6 IQ points per decade).
Lynn (1982) showed that IQs had increased in Japan over the preceding three
decades. The result of Lynn’s study was confirmed in many other studies in a number
of countries (Flynn, 1987, 2007; Lynn & Hampson, 1986; Lynn, 1990b). Lynn
& Hampson (1986) showed that in Britain fluid intelligence measured by the Standard
Progressive Matrices in children aged 7-15 years increased by 1.86 IQ points a decade
for the years 1938 to 1979. Lynn (2009) has shown that approximately the same gain
took place over the years 1979 to 2008.
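A per-decade rate is easier to interpret once it is turned into a cumulative gain. The sketch below applies the British rate of 1.86 IQ points a decade to the period 1938 to 1979. It assumes a constant rate across the period and the conventional IQ standard deviation of 15 points; both are assumptions made here for illustration, not figures taken from Lynn & Hampson (1986).

```python
# Conventional standard deviation of IQ scales; an assumption for illustration.
IQ_SD = 15.0


def cumulative_gain(rate_per_decade: float, start_year: int, end_year: int) -> float:
    """Total IQ gain over a period, assuming a constant per-decade rate."""
    if end_year <= start_year:
        raise ValueError("end_year must be later than start_year")
    return rate_per_decade * (end_year - start_year) / 10


# Rate and period reported in the text for British children aged 7-15.
british_gain = cumulative_gain(1.86, 1938, 1979)

print(f"Cumulative gain 1938-1979: {british_gain:.1f} IQ points")
print(f"Expressed in standard deviations: {british_gain / IQ_SD:.2f}")
```

Under these assumptions the British gain amounts to between seven and eight IQ points, or about half a standard deviation, over the four decades.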
Has the amount of increase been the same at all ability levels or greater among lower
IQ groups? This question was addressed by Cattell (1951) in his study on the IQ
increase in Britain (1936-49) in which he reported that the gain was only present in
the lower half of the distribution. In an early study, Elley (1969) reported that IQ
gains in New Zealand (1936-68) were smallest in children of professional parents and
greatest in children of unskilled parents. Other studies finding greater gains among
those at lower levels of ability have been reported for Denmark (Teasdale & Owen,
1987, 1989, 2008), Norway (Sundet, Barlaug & Torjussen, 2004) and Spain (Colom,
Lluis-Font & Andres-Pueyo, 2005). However, gains have been equally great among
those at higher levels of ability in France, the Netherlands and the United States
(Flynn, 2007, p.104), while Spitz (1989) has reported that gains in the United States
have been greatest at the average IQ level. A number of studies noted in the introduction
have reported that the IQ increase has been greater among lower IQ groups but there
have also been some studies finding that the increases have been the same at all ability
levels. Lynn’s (2009) data confirmed the previous studies showing greater IQ
increases in the lower range of the ability distribution.
What factor or factors have been responsible for the IQ increase? Nine principal
theories have been advanced. These were:
(1) Increased test sophistication. Flynn has recorded that when he began working on
the effect, he canvassed expert opinion and reported that “scholarly correspondents of
high competence (H.J.Eysenck, J.C.Loehlin, D.Zeaman) have offered two possible
causes of IQ gains over time, increased test sophistication and a rising level of
educational achievement” (Flynn, 1984, p.47). These two factors had been advanced
some decades earlier by Tuddenham (1948) in another early report of the effect, while
increased test sophistication has subsequently been endorsed by Jensen (1998, p.327)
who wrote of “increasing test wiseness from more frequent use of tests”.
(2) Improvement in educational achievement was the other factor cited by scholars of
high competence from whom Flynn sought advice. This had also been advanced some
decades earlier by Tuddenham (1948, p.56) who stated “the superior performance of
the World War II group can be accounted for largely in terms of education”. Flynn
(2007) also endorsed the improvement in education theory.
Many others have favoured the ‘improvement in education theory’ of the Flynn effect,
including Cattell (1971, p.275) who stated: “the inter-generational changes …
probably represent the unquestionably marked improvement in schooling”. The
research of Teasdale and Owen (1994, p.333), Jensen (1998, p.324), Meisenberg,
Lawless, Lambert and Newton (2006, p. 273), Weede and Kampf (2002, p.365),
Stelzl, Merz, Ehlers and Remer (1995, p.294), Flieller (1999, p.1056), Garlick (2002),
Blair, Gamson, Thorne, and Baker (2005), all supported the following statement taken
from Meadows, Herrick, Feiler, et al. (2007, p.58) which stated: “its likeliest cause
may be improvements in education reflecting more effective teaching”.
(3) The greater complexity of more recent environments provides greater cognitive
stimulation arising from, for example, television, media and computer games. The
following quotes are all taken from research that broadly agrees with this point:
• “The complexity of the modern world causes massive intelligence gains”
(Vincent, 1993, p.62)
• “Computer games have always been my favourite candidate” (Wolf, 2005,
p.15)
• “Growing exposure to and awareness of the kinds of problems found in
intelligence tests is enough to account for the small increases observed”
(Rabbitt, 2006, p.674)
• “Television and other mass media may have left their mark” (Elley, 1969)
• The reasons given are: “Wider exposure to mass media” (Jensen, 1998,
p.326)
• The reasons given are: “TV, video games and computers” (Greenfield, 1998,
p.91).
(4) Improvements in child rearing, e.g. “Better educated parents have more
enlightened views on child rearing” (Elley, 1969), and “…better child rearing
practices as a partial explanation for the increase in children’s scores on intelligence
tests” (Flieller, 1996).
(5) More confident test-taking attitudes have been advanced by Brand (1987) and
Brand, Freshwater and Dockrell (1989). They suggested that increasing liberalism,
permissiveness, and risk-taking promoted speed and guessing, which in turn increased
test scores.

(6) Reduction in family size. This has been advanced by Flynn (2007, p.356) who
dismissed nutrition and wrote “better education and smaller families are much more
plausible (reasons)”.

(7) The “individual multiplier” and the "social multiplier" theories have been
proposed by Dickens and Flynn (2001) and elaborated by Flynn (2007). The concept
of the “individual multiplier” was that intelligent people have a thirst for cognitive
stimulation and this increased their intelligence through positive feedback. The "social
multiplier" posited that “other people are the most important feature of our cognitive
development and the mean IQ of our social environs is a potent influence on our own
IQ” (Flynn, 2007). This led Flynn to predict that children brought up in a university
town should have higher intelligence than those without this advantage, because the
high intelligence of the professors would enhance the intelligence of the population.
(8) Heterosis: Jensen (1998, p.327) has suggested that the genetic factor of heterosis
(hybrid vigor) could have contributed to the Flynn effect. Heterosis resulted from the
mating of two persons from different ancestral lines. Jensen argued this has probably
increased in the United States as a result of immigration from many different
countries. Further arguments for the heterosis theory have been advanced by Mingroni
(2004).
(9) Improvements in nutrition. This has been advanced by Lynn (1990, 1993,
1998), who has pointed out that nutrition affected intelligence, and that the quality of
nutrition had improved over the course of the twentieth century. This has been
responsible for increases in height and brain size of about the same magnitude as have
occurred for intelligence. This theory has been endorsed by Jensen (1998, p.325) and
by Colom, Flores-Mendoza and Abad (2007) as one among a number of causal
factors.

The nutrition theory, endorsed as one causal factor by Arija, Esparo,
Fernandez-Ballart et al. (2006), Colom, Lluis-Font & Andres-Pueyo (2005), and Jensen
(1998, p.325), is better able than the other theories to explain the large IQ gains of
4-year-olds and the larger gains in fluid than in crystallized intelligence. The
nutrition theory posits that the crucial effect of improved nutrition falls on fetuses
and infants, when the brain is growing, and has little subsequent effect. Hence the IQ
gains should be fully present in 4-year-olds and should not show increased effects in
older children. The improvement in
nutrition theory can also explain the greater improvement in fluid than in crystallized
intelligence, because numerous studies have shown that fluid ability is more
vulnerable to cerebral insult, including sub-optimal nutrition (Lynn, 1990a, 1993,
1998). Hence, as sub-optimal nutrition has declined during the last century, fluid
ability has increased more than crystallized ability.
In addition, Lynn (2009) showed greater IQ gains among those with lower ability
which also might be explained by the improvement in nutrition theory. Those at the
lower ability levels are more likely to have had sub-optimal nutrition in earlier times
and have benefited more from the improvements in nutrition that have followed rising
living standards during the last century. It is doubtful whether any prediction
regarding the size of gains at different ability levels can be made from the increase
and/or improvement in education theory or from other variants of the greater cognitive
stimulation theory (Lynn, 2009). However, Flynn (2007) has argued against the
nutrition theory on the grounds that increases in height have ceased in the United
States whereas increases in intelligence have continued.
2.13 Chapter Summary
This thesis is primarily concerned with intelligence in Libya. A detailed account of
intelligence has been given. This chapter introduced the concept of intelligence and
summarized the different definitions of intelligence. It showed that, despite the
great efforts of researchers in this matter, intelligence remains a construct that is
difficult to define. In addition, the chapter has presented an overview of the evolution
of intelligence and intelligence testing, the contribution of scholars in this field and
theories of intelligence. Many researchers believe the identification of mental
retardation was the problem that stimulated Seguin, Esquirol and Binet to develop
psychological tests. Galton and Cattell both had the idea that intelligence would be
expressed in the form of sensitivity of perception, so they used tests to measure this.
In 1905 Binet and Simon prepared the first IQ test which has been the most widely
used test of intelligence in many countries. The need for rapid testing of large groups
of subjects came with the First World War, when in 1917 a group of American
psychologists developed the Army Alpha and Beta group tests.
Binet’s test of intelligence and Spearman's two-factor theory were the starting point for
theories of intelligence in the twentieth century.
This chapter has also presented the definitions, classification and use of mental tests.
Tests can be classified according to timing, procedure of administration and test
content. In general tests are used for selection, placement and diagnosis purposes.
The problem of culture bias arose when intelligence tests were used in cultures other
than the one for which they were designed. Researchers explored culture-free tests,
which minimized the use of language, and then developed culture-fair tests, in
which test content is equally familiar to all subjects. Other researchers believed that
there is no such thing as a culture-free or culture-fair test. Issues surrounding the
definitions of intelligence
and the differences between intelligence and achievement tests have been covered.
Finally, the chapter discussed the issue of IQ increase with time and evaluated the
reasons behind it.
The next chapter will introduce Libya, its educational system and intelligence testing
in Libya. In addition, the study aims, objectives and rationale will be presented.
Chapter three: RATIONALE AND STATEMENT OF PROBLEM
3.1 Introduction
Libya is a country in northern Africa. The name "Libya" is derived from the Egyptian
term "Libu", which refers to one of the tribes of Berber peoples living west of the
Nile. In Greek this became "Libya", although in ancient Greece the term had a
broader meaning, encompassing all of North Africa west of Egypt, and sometimes
referring to the entire continent of Africa. Bordering the Mediterranean Sea to the
north, Libya lies between Egypt to the east, Sudan to the southeast, Chad and Niger to
the south, and Algeria and Tunisia to the west. With an area of almost 1.8 million
square kilometres (700,000 sq mi), Libya is the fourth largest country in Africa by
area, and the seventeenth largest in the world.
Most of Libya’s people are descended from a mixture of Berbers, the country’s
original inhabitants, and Arabs, who arrived in the 7th century AD. Small numbers of
Berbers still live in the far south of the country. Libyan people are Muslims, and
Islam is the official state religion. Arabic is the official language. The southern
mountains and deserts occupy two thirds of the country; the remaining third consists of
the fertile agricultural plains of the north.
Urbanisation refers to the rise in the proportion of the total population living in urban
areas. The urban population increases: 1) when the number of births exceeds the number
of deaths, and 2) when there is migration from rural areas (Yenigul, 2005). Urbanisation
as a phenomenon has been clearly described by Ravbar (1997, p. 70) in these words:
Urbanisation includes all events and changes related to the
consequences of the change way of life and work. Therefore,
urbanisation by nature represents a very interwoven and complex
process and is dependent on deagrarianisation, industrialisation,
migration, the upward mobility of the population, and the growth of
city function.
Urbanisation is not a new phenomenon in Libyan society, as many old civilisations
had, at different periods of time, their impacts on Libya and built towns and large
cities (Kezieri, 1995).
According to the General Authority of Information in the 2006 census, Libya has a
population of about 5.3 million with a growth rate of 1.9 %. One third of the
population are under 15 years of age, and 89.03% are urban. The literacy rate for both
sexes (10 years and above) was 88.5% (males 93.7% and females 83.11%). The gap
is narrowing because of increased female school attendance. Nelson (1979)
mentioned that at independence in 1951 the overall literacy rate among the Libyans
over the age of ten years did not exceed 20 percent. By 1977 the overall rate had risen
to 51%, (73% males and 31% females). The Libyan economy depends mainly on oil
exports and petrochemical industrial products.
The following section, section two, provides a short description of the education
system in Libya, while the third section is concerned with intelligence testing in
Libya. The fourth and fifth sections are about the adoption of intelligence tests and
the Standard Progressive Matrices (SPM) test respectively. The sixth section highlights
the statement of the problem and study rationale. The seventh, eighth and ninth
sections deal with study aim, research questions and objectives. The final section
presents a summary of the chapter.
3.2 Education System in Libya
A detailed and comprehensive report about the educational system in Libya was
presented at the International Conference on Education in Geneva in 2004. It sets the
general framework of the educational system. Education in Libya is free for all
individuals at all educational levels and compulsory for elementary, preparatory and
secondary school age children (6-15 years). The Ministry of Education supervises
educational policies and determines the general guidance on school curricula,
textbooks, and methods of teaching. Preparatory and high schools are segregated by
sex except in rural schools due to lack of school buildings or teachers.
The school year begins in September and ends in May, and classes are held six days
a week, from Saturday to Thursday, from 8:00 a.m. to 1:00 p.m. The school system
in Libya is organised on a twelve-year basis, and is divided into three levels:
1. Elementary education level: this level covers the first six years of study (age
6-11 years). In the first three years students study courses in Arabic language,
religious education, mathematics, drawing and physical education. From grade
4 to grade 6 (the end of the elementary education level) students study
courses in Arabic language, religious education, mathematics, history,
geography, basic science, drawing and physical education.
2. Preparatory education level: from grade 7 to 9 (age 12-14 years). At this level
students study courses in Arabic and English language, religious education,
mathematics, science, history, geography, sociology, drawing and physical
education.
From grade 4 up to grade 8, at the end of each school year, students sit an exam to
progress to the next grade. These exams are prepared by teachers at school level.
At the end of grade 9 students sit a local exam prepared by a committee of
teachers at the municipality level, to obtain the certificate of preparatory education,
which in turn is required for admission to secondary level. Students must pass this
examination.
3. Secondary education level: covers the period from grade 10 to 12 (ages 15-17
years). Secondary education is divided into four specialities: biology,
engineering, social and economic. Depending on marks in grade 9 and
the student’s interest, students are allocated to one of the four
specialities. Because of the higher pay and status enjoyed by engineers and
medical doctors, more students prefer to choose the science branch.
At the end of grade 12 students sit the General Secondary Certification Exam, a
centralised national exam. These exams are run by the Ministry of Education and are
prepared by a committee of teachers and inspectors at the national level who
construct the exams for all schools in Libya.
The student's progress depends upon his/her passing the national exams, which include
a two- to three-hour written examination in each subject in the final year. The General
Secondary Certification is a prerequisite for admission into university. The grading
system in the final examination depends on the total scores in all subjects, as follows:
less than 50% fail, 50% to 64% pass, 65% to 74% good, 75% to 84% very good,
and 85% and above excellent.
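The grading bands above can be written as a small lookup. The following is only an illustrative sketch: the function name `grade_band` is hypothetical, and it assumes that the lower bound of each band acts as the cut-off, since the text does not say how fractional totals such as 64.5% are treated.

```python
def grade_band(total_percent: float) -> str:
    """Map a final-examination total (in percent) to its grading band.

    Assumption: each band starts at its lower bound, so a total between
    64 and 65 still falls in "pass". The source text does not specify this.
    """
    if not 0 <= total_percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if total_percent < 50:
        return "fail"
    if total_percent < 65:
        return "pass"
    if total_percent < 75:
        return "good"
    if total_percent < 85:
        return "very good"
    return "excellent"


# Example: a student with a total of 78% falls in the "very good" band.
print(grade_band(78))
```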
Usually students who successfully finish high school enrol directly in
university, because work opportunities are extremely limited for high school
diploma-holders, whereas university diploma-holders have a much better chance of
obtaining a job.
The selection of students for universities is done by the Ministry of Education
depending on the secondary speciality chosen; the student is then enrolled in a
suitable faculty at the university. The educational system in universities depends on
the speciality studied. It can be a semester or year system. In addition to undergraduate
studies, postgraduate studies including master's and PhD degrees and advanced
diplomas in various specialisation areas are offered (Said lagga et al., 2004).
3.3 Intelligence testing in Libya
During recent decades, due largely to concerted efforts in economic and social
planning, Libya has witnessed considerable expansion in the education sector.
Hundreds of schools have been built, many universities have been established, and a great
number of students have studied at home and have ventured further afield into Europe
and other parts of the Western hemisphere to pursue higher education in different
fields. In addition, educational policy and administration have been reshaped.
Whilst all of these events have occurred, some areas have not benefited from the
positive effects of development in the field of education. Significantly, to date no
single test of intellectual ability has been officially adopted or developed to be used
for the measurement of intelligence in Libya. Many sectors in Libya use examination
grades as the primary method in determining who should be accepted for study at
various academic establishments and for various jobs in the vocational sector. These
grades were used in some cases as the primary criterion for identifying both gifted and
mentally retarded children and in addition were used for guidance and counseling
purposes. There is no reason, however, to believe that all examination grades have a
direct relationship or correlation to measurement of intelligence, let alone for
guidance and counseling purposes. Although examination grades might be considered a good
criterion for such purposes, additional criteria are desirable.
Testing experts feel ambivalent about school-based assessment. Such assessments are
not standardised; criteria vary from school to school and from teacher to teacher
(Heyneman, 1987). Durojaiye (1984) gave one reason for using examination grades
for selection in Africa: school-leaving results are more often
used in Africa for selection than ability tests because of the shortage of
psychologists in many African countries. The scarcity of trained psychologists in
developing countries makes the adoption of tests from Western countries necessary
(Vernon, 1969; Miron, 1977; Majdub, 2004).
Owing to the lack of any prevailing local intelligence tests, researchers have
historically sought to do their research or projects using personality tests such as
Sentence Completion, the Thematic Apperception Test (TAT), or other projective tests
because there were some colloquial Egyptian Arabic translations and adaptations of
these types of test. Students viewed personality tests as easier to administer and
interpret than intelligence tests (Attashani and Abdalla, 2005). After graduation most
of these students became teachers at secondary schools, and a few of them act as
psychologists even though they are not qualified. Although they had theoretical knowledge
about psychology and psychological testing, they did not have access to a wide range
of intelligence tests, owing to a shortage of viable local options.
Mahdawi and Al-Roey (1991), in their study of the mental health program in Libya,
mentioned that the mental health services suffered from a shortage of staff and
psychological services and a lack of facilities. They concluded that, as the main
problem seemed to be the manpower shortage, special efforts should be made to train
more health personnel and community members such as teachers to deliver
psychological and psychiatric services. From the above, it would appear
that at present the academic system in Libya fails to provide what is essential and
necessary for Libyan psychologists and researchers, especially in the area of
psychological testing.
Kline (1979) pointed out that intelligence is a variable which is important and has a
definite meaning to Western people. However, the general public in Libya knows
little about the usefulness, purposes, or functions of intelligence and aptitude tests.
For some people IQ testing is something associated with psychological or
mental testing. This may point towards a stigma attached to this type of testing,
which could be indicative of a cultural and social perception.
Psychologists have taken many precautions in developing tests, but there has been
widespread misuse and misunderstanding of them in developing countries. Thus many people
have misgivings about tests and their use in decision-making. Part of this
misunderstanding could be attributed to inadequate knowledge of tests among the
people who use them. Alexopoulos (1979) noted that misuse of tests may cause harm
to the testing movement.
Many researchers have studied the problems of misuse of test scores or use of
incomplete test scores for selection and prediction purposes. For example, Parmar
(1989) found that in India the Information subtest of the Wechsler Intelligence Scale for
Children-Revised (WISC-R) was simply omitted when testing Indian subjects, and this
subtest was not considered when computing IQ scores. He concluded that the use of the
incomplete test was likely to bias predictions based on test results and had serious
negative implications for educational or clinical decisions.
Georgas & Georgas (1972), in their study of the use and misuse of intelligence tests in
Greece, argued that the use of incomplete test scores for estimation of mental ability
might result in invalid assessment, leading to grave consequences for the lives of
individuals. Bertrand and Cebula (1980) believed that tests in themselves are not bad
and do not hurt children; they become bad only in the hands of those who
administer and interpret them poorly.
Sattler (1982, p.4) concluded that intelligence tests are tools which may be useful in
accomplishing goals, and their effectiveness will depend on the skill and knowledge
of the psychologist. He stated:
When they used wisely and cautiously, they will assist us in helping
children, parents, teachers and other professionals obtain valuable
insight. When used inappropriately, they may mislead and cause harm
and grief.
It is interesting to note that the first IQ test (the Binet-Simon Scale) was constructed
in France in 1905 to help identify mentally retarded children who did not
profit from regular classroom instruction. Failure to achieve a good assessment of
the mental ability of a retarded child at an early age makes the problem worse in the
future, especially for purposes like special education or rehabilitation programs. It is
believed that the misuse of intelligence tests leads to inaccurate prediction,
misplacement, and inappropriate treatment of children. Tests for such purposes
should be well standardised for the local population; they also have to be reliable,
valid and used by experts only.
Another area affected by the lack of intelligence tests in Libya is the
selection of students for different educational programs (e.g. gifted and special needs
programs). Intelligence tests play an important role in the educational and economic
system of a society because they prevent waste of human resources due to
misplacement of abilities or interests (Attashani & Abdalla, 2005). It is believed that
failure to allocate students according to their abilities and interests deprived the
country of one of its most valuable resources. In addition, this also had an adverse
effect on business and commerce, where employees scoring well in tests might not
necessarily possess the attributes to perform the job effectively.
In Libya today, a relevant and accurate selection procedure is required more than ever
before, not only in the field of education but also at the intermediate level of training
for skilled manpower. Indeed, a clear failing of the current system could be seen
whereby many university graduates were posted to office work which could be done
by less qualified people (Attashani and Abdalla 2005).
Durojaiye (1984) believed that selection of students for educational purposes is very
necessary in most developing countries in Africa. This is because secondary and
university education is not compulsory, and a large number of students aspire to the
few places in the limited number of schools and universities. He stated that, for this reason,
the best testing apparatus had to be devised for selecting students who will benefit
from their education and later meet the high demand for manpower requirements of
these developing countries.
Jensen (1981, p.19) believed that using standardised tests for selection was necessary
and unavoidable when the number of applicants for university far exceeds the number
that can be enrolled. He stated:
Results of standardised test are unquestionably better for making direct
comparisons between applicants than any other means of selection, and
they can add substantially to the accuracy of prediction of applicant's
future performance.
The problem of adapting intelligence tests to a new setting was by no means
uncommon, as this was a general problem for many developing countries in the past.
In addition, if the aim was to assess the mental ability of people in a culture that had
yet to develop its own testing scheme or system, it was necessary to assess what was
important in and for that culture (Brislin and Thorndike, 1973). Ortar (1972), for
example, mentioned that most countries did not produce their own psychological tests
and had to adapt and modify instruments developed elsewhere to make them suitable
for local subjects.
Schwarz & Krug (1972, p.3) in their book about ability testing in developing
countries pointed out that educators and researchers in developing countries held
widely divergent views about test adaptation. They stated:
At one extreme there are those who look mainly at the vast
environmental differences between the developing countries and the
highly industrialised nation, and conclude that any test designed for
one ipso facto can not serve the other. At the other extreme there are
those who attach greater importance to the fact that the skills needed
in both developed and developing countries are exactly the same, and
who fear that "simplified" tests will hamper them in producing
equally high levels of skill in their own population.
Schwarz & Krug concluded that neither view was correct because one view would
exclude all classic testing procedures from use in developing countries, since they
were designed in and for the Western culture, and the other view would oppose the
use of anything else, since this would be a tacit acceptance of lower performance
standards.
3.4 Adoption of intelligence tests
In this regard, Ezeilo (1978) suggested that African researchers and psychologists
might use one of three approaches:
1. Design their own test to the local environment; this involves a great deal of time
and effort.
2. Modify a widely used international test by introducing some changes in its items,
then standardize and obtain local norms.
3. Use an international culture-free test after standardization and achievement of local
norms.
The third choice was the most frequently applicable in the field of the measurement of
mental abilities and personality traits. It required less time and effort than the first two
alternatives. Therefore, this approach was applied in this study. The Raven’s
Progressive Matrices test was employed because it has been widely used and enjoys
moderately high indices of validity and reliability when used in a wide range of
cultures.
Kline (1979) concluded that for cross-cultural test construction it was best to use
one’s knowledge and experience of the culture as a guideline to writing items, and
retain those that show themselves to be criteria-based or valid in factor analysis. Such
tests enable cross-cultural psychologists to elucidate environmental factors
influencing major ability factors. This was one of the stated aims of cross-cultural
psychologists.
Raven's Progressive Matrices test is an example of a culture-fair test that has been
used in cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and
Murphy and Davidshofer (1991) held that Raven's Progressive Matrices was one of
the most widely used intelligence or ability tests in cross-cultural research.
3.5 Standard Progressive Matrices (SPM) test
The present study investigated intelligence tests with special interest in the British
mental ability test, Raven's Standard Progressive Matrices (SPM), as a measure
of general ability. It consists of 60 problems in 5 sets of 12. The tests are called
progressive because each problem in a set, and each successive set, is progressively more
difficult. Each problem consists of a geometric design with a missing piece; the
respondent selects the missing piece from six or eight choices given (Domino and
Domino, 2006). A more extensive description of the SPM test is given in the
next chapter.
The SPM test was selected because it has been regarded not only by its author, but
also by many researchers (e.g. Burke, 1958; Anastasi, 1988; Raven, 1989; Carpenter
et al., 1990; Arthur & Woehr, 1993; and Arthur & Day, 1994) as a useful non-verbal
measure of ability which is easy to administer and score. It is a group test, which
can be used with subjects of all language backgrounds and does not depend to any
large extent upon education or prior knowledge of the subjects. In addition, it is
suitable for all ages from the age of 6 years.
Raven's Progressive Matrices (RPM; Raven, Raven & Court, 2000; Lynn & Vanhanen,
2006) is the most widely used test of intelligence in numerous countries throughout
the world. One reason for the popularity of the test is that it is non-verbal and can
therefore be applied cross-culturally, while verbal tests are more culture specific and
preclude cross-cultural comparisons. Another reason for the popularity of the test is
that it is considered to be the best test of g, the general factor present in all cognitive
tasks that was first identified by Spearman (1904) and which is largely a measure of
reasoning ability (e.g. Carroll, 1993; Jensen, 1998; McGrew and Flanagan, 1998). The
test was constructed by Raven (1939) and consisted of a series of 5 or 7 designs that
progressed according to some rule. The problem was to identify the rule and
extrapolate it further. Testees were given 6 or 8 alternatives for this further
extrapolation and had to select the correct one. Items were scored either right or
wrong. A participant’s score was the number of right answers. The maximum possible
score was 60. The right answers were provided in the SPM manual.
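The scoring rule just described (each item right or wrong, total score out of 60) can be sketched as follows. This is a minimal illustration under stated assumptions: the set labels A to E, the item identifiers such as "A1", the function name `score_spm`, and the treatment of unanswered items as wrong are all assumptions, and the answer key shown is a placeholder, not the real key from the SPM manual.

```python
SETS = "ABCDE"        # assumed labels for the five sets
ITEMS_PER_SET = 12    # twelve problems per set, 60 in total


def score_spm(responses: dict, answer_key: dict) -> int:
    """Return the number of correct answers (0 to 60).

    Assumption: an item that is missing from `responses` counts as wrong.
    """
    expected = {f"{s}{i}" for s in SETS for i in range(1, ITEMS_PER_SET + 1)}
    if set(answer_key) != expected:
        raise ValueError("answer key must cover all 60 items")
    return sum(
        1 for item, correct in answer_key.items()
        if responses.get(item) == correct
    )


# Placeholder key for illustration only; NOT the real SPM answer key.
demo_key = {f"{s}{i}": 1 for s in SETS for i in range(1, ITEMS_PER_SET + 1)}

# A hypothetical respondent with one wrong answer and one omitted item.
demo_responses = dict(demo_key)
demo_responses["E12"] = 2
del demo_responses["E11"]

print(score_spm(demo_responses, demo_key))
```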
The Raven’s Standard Progressive Matrices (SPM) test was constructed to measure
the eductive component of g as defined in Spearman’s theory of cognitive ability
(Raven & Court, 1998, updated 2003). Kaplan and Saccuzzo (1997) stated that
research supported the Raven Progressive Matrices (RPM) as a measure of general
intelligence, or Spearman’s g factor. In fact, the Raven may be the best available
single measure of g.
In the same vein, Jensen (1998) maintained that in numerous factor analyses, the
Raven tests, when compared with many others, had the highest g loading and the
lowest loadings on any of the group factors. The total variance of Raven scores in fact
comprised virtually nothing besides g and random measurement error. He also added
that Raven’s Progressive Matrices was often used as a “marker” test of Spearman’s g.
That is, if it was entered into a factor analysis with other tests of unknown factor
composition, and if the Matrices had a high loading on the general factor of the matrix
of unknown tests, its g loading served as a standard by which the g loadings of the
other tests in the battery could be evaluated.
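The "marker test" idea can be illustrated numerically. The sketch below is not Jensen's procedure: it uses the first principal component of a correlation matrix as a rough stand-in for the general factor, whereas published factor analyses may use other extraction methods. The test names and correlations are invented for illustration, and the function name `first_component_loadings` is hypothetical.

```python
import math


def first_component_loadings(corr, iterations=500):
    """Loadings of each test on the first principal component of `corr`.

    Uses power iteration. Assumes a symmetric correlation matrix with all
    positive entries, so the iteration converges to the dominant
    eigenvector with positive components.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the dominant eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return [math.sqrt(eigenvalue) * x for x in v]


# Invented correlations among four hypothetical tests; not data from any study.
demo_names = ["matrices", "test_b", "test_c", "test_d"]
demo_corr = [
    [1.0, 0.6, 0.5, 0.6],
    [0.6, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.3],
    [0.6, 0.4, 0.3, 1.0],
]
demo_loadings = first_component_loadings(demo_corr)
for name, loading in zip(demo_names, demo_loadings):
    print(f"{name}: {loading:.2f}")
```

In this made-up battery the "matrices" test correlates most strongly with the others, so it receives the highest loading, and the loadings of the remaining tests can be read against it.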
By the same token, Lynn et al. (2004) stated that the Progressive Matrices was
widely regarded as the best test of abstract or nonverbal reasoning ability, and this
itself was widely regarded as the essence of “fluid intelligence” and of Spearman’s g.
Mackintosh (1996) described it as the paradigm test of non-verbal, abstract
reasoning ability.
This view is not, of course, universally accepted. Indeed, Raven and Court (2000)
referred to several studies which emphasised a spatial ability loading, and a review of
the extensive literature dealing with this topic from the point of view of researchers
keen to distinguish “Working Memory” from “g” was provided by Ackerman, Beier,
and Boyle (2002).
Court & Raven (1995), Kline (2000) and Murphy & Davidshofer (1998) noted that the
Standard Progressive Matrices test enjoys good psychometric characteristics.
Gregory (1992) also noted that a huge body of published research has shown the
validity of this test. Therefore, as Irvine & Berry (1988) noted, it has gained
widespread acceptance and use in many countries around the world. No other test has
been so extensively used in cross-cultural studies of intelligence. Lynn and Vanhanen
(2002) summarized an extensive number of studies based on normative data for the test
which had been collected in 61 countries. For all these reasons, Kaplan and Saccuzzo
(1997) concluded that with its new worldwide norms and updated test manual, the
Raven would be regarded as one of the major authorities in the psychological testing field
in the 21st century.
Some tests seemed to be more appropriate than others for use with literate children
and adults in developing countries. For example, at middle primary level there was
the Raven's Coloured Progressive Matrices (CPM) test. From eight years of age
upwards there was the Raven's Standard Progressive Matrices (SPM) test (Ord, 1972).
The Progressive Matrices tests (Standard, Coloured, and Advanced) were the best
known and most widely used measures of individual differences in cognitive ability
and culture-reduced tests (Powers et al., 1986a; DeShon et al., 1995). According
to Thorndike and Hagen (1977) and Ogunlade (1978), the SPM test's freedom from
language and apparently limited dependence on cultural variables had made it a
popular instrument for use in developing countries.
Jensen (1980, p.648) examined the usefulness of the SPM test and made the following
observations:
Because the Raven Progressive Matrices is an excellent culture-reduced
measure of fluid g, one of its chief values is for screening illiterate,
semiliterate, bilingual, and otherwise educationally disadvantaged or
socially depressed populations for potential academic talent that might
easily remain undetected by parents and teachers or by the more
conventional culture loaded tests of scholastic aptitude. It is probably the
surest instrument we now possess for discovering intellectually gifted
children from disadvantaged background....
Because of all the abovementioned advantages of the SPM (it is widely used as a
measure of general ability, enjoys good psychometric characteristics and is a
cross-cultural test), the researcher chose the SPM test as a measure of mental ability for the
Libyan sample in the present study.
3.6 Statement of problem and study rationale
Measuring mental ability accurately and objectively has been a major concern of
researchers and psychologists in many countries since the beginning of the
psychological testing movement in 1905 by Binet. In Libya, as we have noted
previously, there is no valid or reliable instrument available to meet the researchers'
needs by providing a sound assessment of intelligence and this is a gap worth closing.
Libya, as a developing country which has no single standardized test to measure
ability, has to adopt one intelligence test which is suitable for the measurement of the
mental abilities of a Libyan sample.
Thus, in summary, the problem is related to the adoption of an appropriate
Western instrument suitable for the measurement of general intelligence in a
Libyan setting, where no single test of intelligence has been officially adapted or
developed to give better judgment and evaluation for Libyan samples.
Differences in intelligence scores for different groups were considered important, in
part, since tests were statistically structured to distinguish between individuals, and
groups, because groups were aggregates of individuals. Intelligence tests were
designed carefully and deliberately to produce score variance (Wesson, 2000). The
generation of a broad range of individual scores permitted psychologists to acquire
knowledge and make judgments about, between, and within group differences. This
knowledge allowed for the interpretation of the distribution of scores that led to
various decisions (e.g., eligibility for placement in special education and gifted
programs) (Yoon, 2006).
Not much is known of the intelligence of the populations of North Africa (Lynn and
Vanhanen, 2002, 2006). Libya, as a developing country, faces the same problems
which have been and are being faced by many of its Arab neighbors. It lacks a
well-established infrastructure to support key sectors like
education. Although many Libyan students graduated from educational psychology
programs and received some theoretical knowledge about intelligence and personality
tests during their university study, there is still a lack of intelligence test
adaptation or development in Libya, which is mainly due to a lack of test expertise.
Psychologists and researchers working on educational and psychological issues
in Libya lack knowledge about IQ tests, as does the population in general. There is
no perfect translation of the verbal items of the Stanford-Binet or WISC-R tests
currently in use in Libya. Therefore, no standardization or norms have been obtained
to suit Libyan samples. All of this leads to misuse, misunderstanding and unwise
application of the few intelligence tests which are available in Libya and which have
been used in Libya during the past years (Mahdawi and Al-Roey, 1991; Attashani and
Abdalla, 2005).
Abdalla (2002) noted that in 1988, during his work as an educational psychologist at
the Massa Institution for Mentally Retarded Children in Libya, he translated the short
form (L-M) of the Stanford-Binet Intelligence Scale from English into colloquial
Libyan Arabic with little modification and administered it in order to measure the mental
ability of retarded and normal children aged 6 to 12 years. The project failed because
the sample was too small (N=54), the test required too much time to administer and
score, and there were no test experts to analyse the data, which were mainly verbal.
Furthermore, such standardisation for an individual test like the Stanford-Binet can be
done only through professional organisations which have a great deal of time, effort and
money. These findings prompted the researcher to study and use the Raven's Standard
Progressive Matrices (SPM) test, as a tool to measure mental ability in the present
study to avoid the Stanford-Binet problems and difficulties.
Lack of intelligence test adaptation or development and the misuse of the few tests
available now in Libya created problems in the areas of mental measurement and
school selection. One of the major problems facing Libyan psychology researchers
now is the lack of accurate measurement of mental abilities. This type of
measurement in Libya has been affected by the lack of adapted or developed
intelligence tests. For example, only a few institutions, such as the Benghazi Children's
Hospital or the Tripoli Centre for Mentally Retarded Children, are currently using
some items, but not the whole test, from the Stanford-Binet Intelligence Scale or from
the Wechsler Intelligence Scale for Children-Revised (WISC-R) for the measurement
of intelligence.
Unfortunately these test items were used in these institutions without suitable
modification and adaptation to estimate some aspects of the mental ability of the children
who were referred by parents or schools for diagnosis or treatment. It is clear that
such methods of assessment may have limited the application of test results or led to
wrong classification of a child's mental ability. Again, this appeared to point to a lack
of understanding of these tests, based upon a lack of knowledge of their application
and of how to adapt such tests to suit the intended target groups.
At the Second Family Conference in Beida city in May 1991, the problems of testing
of children with special needs were discussed in a paper presented by Abdalla. One
of the recommendations was to stop testing and labelling deaf and mentally retarded
children according to scores obtained from incomplete and unstandardised
intelligence tests. Shelley and Cohen (1986) stated that attaching numbers to people
is not hard; attaching "meaningful" numbers is very problematic.
Previous studies that used the SPM in Libya include Aboujaafer (1983),
Majdub (1991), Attashani and Abdalla (2005) and Ahlam (2005). These studies
were carried out without the prior standardization of the test. This present study
carried out the necessary standardization. Standardization of a test means obtaining
average scores and distributions from a representative population (Kline 2000).
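What "obtaining average scores and distributions" amounts to can be sketched in a few lines. This is only an illustration with made-up scores, not the procedure or data of the present study. The function names are hypothetical, and the percentile rank uses the common mid-rank convention (ties counted as half), which is an assumption.

```python
from statistics import mean, stdev


def age_norms(scores_by_age):
    """Return {age: (n, mean, sd)} from raw scores grouped by age level."""
    return {
        age: (len(scores), mean(scores), stdev(scores))
        for age, scores in scores_by_age.items()
    }


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score`, ties counted as half."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Made-up raw scores for illustration only; not data from the study.
demo_scores = {8: [20, 25, 30, 35, 40], 9: [28, 30, 33, 36, 38]}
demo_norms = age_norms(demo_scores)

for age, (n, avg, sd) in sorted(demo_norms.items()):
    print(f"age {age}: n={n}, mean={avg:.1f}, sd={sd:.2f}")
print(percentile_rank(30, demo_scores[8]))
```

A real standardization would of course rest on a large representative sample for each age level rather than five scores.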
This study responded to the lack of psychological testing in Libya, particularly in
mental testing. Thus, the main purpose was to develop norms for the Classic form of
the Standard Progressive Matrices (SPM) in Libya in order to find out the distribution of IQ
scores within a Libyan setting. The norms of this group were compared with the norms of other
countries and a meta-analysis was carried out to investigate whether significant
differences exist in Raven’s Standard Progressive Matrices test scores between
developed countries (e.g. the UK) and developing countries (e.g. Libya), according to
age, sex and region. This was done to examine the conclusion advanced by
Lynn (2006) that average scores are somewhat lower in economically developing
nations than in the economically developed nations of Europe and North America.
This study determined the psychometric characteristics (validity, reliability, and item
analysis, that is, difficulty and discrimination levels) of the Raven's Standard Progressive
Matrices (SPM) test in a Libyan setting and computed the percentile ranks for SPM
test scores according to sample age levels (standardization of the Raven's Standard
Progressive Matrices (SPM) test with a Libyan sample).
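Item difficulty and discrimination can be illustrated with common textbook definitions. The sketch below is not necessarily the method used in this study: it takes difficulty as the proportion of respondents answering an item correctly, and discrimination as the upper-lower index (proportion correct in the top-scoring group minus that in the bottom-scoring group). The 27% group size, the function names and the response data are all assumptions for illustration.

```python
def item_difficulty(matrix):
    """Proportion of respondents answering each item correctly.

    `matrix` holds one row per respondent, with 1 for a right answer
    and 0 for a wrong answer on each item.
    """
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def item_discrimination(matrix, fraction=0.27):
    """Upper-lower index per item: p(correct) in top group minus bottom group."""
    ranked = sorted(matrix, key=sum, reverse=True)  # highest total score first
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return [
        sum(r[j] for r in upper) / k - sum(r[j] for r in lower) / k
        for j in range(len(matrix[0]))
    ]


# Made-up responses of six people to three items; not data from the study.
demo_matrix = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(demo_matrix))
print(item_discrimination(demo_matrix))
```

In this toy data the second item is of middling difficulty and separates high scorers from low scorers most sharply.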
The last century saw the success of the means of measurement, in testing in
general and intelligence testing in particular. Group standardised tests have
come to the fore together with individual tests, practical tests, written tests and verbal
and non-verbal tests. Measuring intelligence as a general intellectual ability has been
taken into account by psychologists since the beginning of educational and
psychological measurement (Mohammad, 1984).
Attashani and Abdalla (2005) mentioned that in 1905 Alfred Binet in collaboration
with Simon in France constructed the first intelligence test and improved versions
came out in 1908 and 1911. This was when intelligence measurements found their
way into many countries and were being widely used for many purposes e.g.
intellectual ability measurement, educational guidance, educational selection,
educational diagnosis, vocational guidance, vocational selection, intellectual
weakness diagnosis and helping in the decision-making process. These types of
measurement were critical and provided many benefits, especially in countries where
such measurements are trusted.
There remained no serious doubt about the potential usefulness of testing procedures
for purposes of educational and occupational selection in developing countries.
Whether tests would be adapted and how they were best applied were no longer major
issues; likewise whether such tests needed to be culture free or culture fair. Major
issues centred on such matters as the long-term validity of selection measures; the
prospects for further, as yet relatively untried, measures, as part of the selection
procedures; the education of more precise information on how moderator variables
may be operating in the selection situation; the possibility of adopting more
efficient strategies of selection than traditional ones from the viewpoint of fitting the
job to the man as well as the man to the job; and, perhaps most important of all, the
means of building locally appropriate, efficient selection institutions that would prove
viable (Ord, 1972).
Intelligence tests have been used in many areas in both the USA and the UK. The results
have been used in making decisions on admission to schools, colleges, and universities,
and on the selection of applicants for jobs. In France, intelligence measurements
have been used for vocational guidance and psychological diagnosis. In the USSR,
intelligence tests have been used in the educational sector as well as in vocational
guidance (Mikhaeel, 1995).
The measurement and utilization of intelligence, however, quite appropriately
deserves primacy within any culture, for the wealth of any nation, whether developed,
developing, or “primitive”, is the ability of its people. Once properly identified as
having the requisite abilities for differential placement, each person can then conceivably
contribute more to the health, well-being and productivity of his country (Brislin
and Thorndike, 1973). It is axiomatic that the great nations have become great,
industrial, and prosperous because mental energies were tapped (Brislin and
Thorndike, 1973).
Since most developing countries were keen to make use of these tests, and since they
did not have sufficient scientific and technical abilities to help them design suitable
cultural tests, they opted for a standardising process. They were in need of different
tests of this type to satisfy the needs of human and social development plans, which
were usually adopted in these countries. To reach such goals, they needed to apply
these tests, and to conduct scientific research on these tests that represented part of the
scientific research in the fields of educational, intellectual and psychological
measurement. They needed to do such research in order to adjust these tests to their
societies, and to help them reach an appropriate interpretation for the score that a
person who sits for the tests achieves (Kamil, 2004).
In this respect, the importance of standardising these tests and measurements grew
day by day. This was reflected in the interest of developed
countries in designing and standardising these tests and using them in different life
sectors, such as educational, health and other institutions. Moreover, there were now
specialised institutions which dealt exclusively in designing and standardising these
tests (Attashani and Abdalla, 2005).
Needless to say, intelligence tests are mainly used in the educational sector. They
help to determine whether students in a class or school have learnt at the level
expected of them, and they help teachers to predict what students can
achieve (Alwakfi, 1998). Generally there was a need for intelligence tests to discover
talented individuals. Such students do not differ in appearance from other students.
Unless these tests were conducted, such students had no chance to be recognised
(Rajha, 1970).
There were also many other contributions of testing to society, such as better
distribution of educational and professional opportunities based upon merit and good
judgment, not on luck or personal judgment. Alexopoulos (1979, p.18) in his research
into standardization of the Wechsler Intelligence Scale in Greece, mentioned the help
that IQ tests could provide. He concluded that
We can imagine what could happen if there were no tests available
and the whole system was based on personal judgment, the social
position of the examinee etc., as is often the case with
underdeveloped or even developing countries, where the whole
system is not based on merit but on social position, acquaintance
with other persons of higher social or political standing. Thus tests
can help to create a society based on merit and equal opportunity to
all members of society
Eells et al. (1971), Drenth (1972), Miron (1977), Drenth et al. (1979) and Heyneman
(1987) argued that testing had contributed to more effective use of manpower,
more equal distribution of educational and professional opportunities, and
identification of talents that might otherwise remain hidden.
These comprehensive tests, which distinguished skilled young students from others, were
widely used. They were used to such an extent that scores in some studies were
considered a scale for a student’s or child’s skills. These tests (tests of intellectual
skills in particular) were also used to identify students with special skills in science, arts
or other areas such as human relations. They also helped in identifying
students with special skills and high intelligence (Shafile,
).
In Libya we can use these intelligence tests to recognise the intellectual abilities of
students. Depending on the test results, students with high or low scores can receive
the appropriate attention and assistance.
Zahran (1990) identified the importance of intelligence tests, in particular for children,
who may be classified according to their levels. Majdub (1991, p.215), who studied the
academic achievement of two groups of students from Tripoli University, concluded
that:
Psychological tests are seriously neglected in Libya to a serious
extent. Tests of very important psychological variables have not
been standardized or introduced to Libyan society. In fact the
Standard Progressive Matrices test and the other psychological scales
are introduced to the Libyan culture for the first time
The research found that Libya is now in urgent need, more so than at any other time,
of an intellectual test to be used in selecting students and nominating them to colleges
and universities. In Libya, we do not need a large number of graduates; rather, we
require a greater number of vocational students.
Needless to say, the proper use of mental and other ability tests and measurements
within the local environment would provide the local market with
workers, especially when they are classified according to their skills. Issawi (1973)
found that these tests were widely used in filling empty jobs and in choosing the best
person for the best place (vocational, industrial, or even the military sector).
Attashani and Abdalla (2005) stated that it was harmful for the country’s economy to
select a person for a job that did not agree with his or her intellectual abilities.
Heyneman (1987, p.251) pointed out the importance of educational selection to the
economic performance of developing countries. He stated:
In a competitive international environment, not choosing one's
technical elite from among the brightest citizens can have a grave
effect on economic performance. By one estimate, developing
countries could improve their Gross National Product per Capita by
5% if they were to base leadership upon merit
Abdalla (2002) mentioned that for school selection, in many western countries, it is
customary to give both intelligence and achievement tests. Many studies in
developing and western countries (for example, Sinha, 1968; Rao, 1974; Maqsude,
1980 and 1983; Carver, 1990; Andrich and Styles, 1994) made use of intelligence
tests, especially Raven's Standard Progressive Matrices (SPM) test, for school
selection and prediction.
Relying solely on students' grades in the final year of secondary school for
admission to Libyan universities may lead to mistakes in admission
decisions. However, using psychological tests in conjunction with
secondary school grades could minimize two principal errors: admitting
students who might fail at university and rejecting students who might succeed
(Majdub, 1991).
The study highlights the following aspects also:
• This study is considered to be the first attempt to standardize Raven’s
Standard Progressive Matrices (SPM) test for a sample from Libya. Majdub
(1991) reported that psychological tests are seriously neglected in Libya. They
have not been standardized or introduced to the Libyan society. Lynn and
Vanhanen (2006) stated that not much is known of the intelligence of the
populations of North Africa.
• Providing norms for the (SPM) test for use, in conjunction with examination
grades, to help the authority in implementing appropriate decisions related to
the future of individuals, and to guide them to educational programs that will
suit their abilities. Also, for use in job selection to match applicants to suitable
employment. Many sectors in Libya only use examination grades as the
method in matching students to various academic establishments and for
various jobs in the vocational sector. Attashani and Abdalla (2005) mentioned
that no single test of intellectual ability or aptitude has been officially adapted
or developed to be used for intelligence measurement or aptitude in Libya.
• Providing the means to estimate levels of intelligence, since our society lacks
these tests, so as to recognize high IQ as well as low IQ in the society.
• To study differences in level of intelligence between sexes, age groups and
different locations such as rural and urban areas.
From the above mentioned points and in view of the present situation in Libya it is
clear that there is a great demand and need for adapting at least one test in each of the
following areas: intelligence, aptitude, vocational interests, and personality to provide
researchers, psychologists and policy makers with effective tests for evaluation,
selection, and diagnostic purposes. For a developing country like Libya such tests
which give accurate measures of intelligence, achievement and personality are crucial
in the future development of its students and workforce alike.
3.7 Study aim
To develop norms for the classic form of the Standard Progressive Matrices (SPM)
test in Libya and to identify the distribution of IQ scores within a sample of Libyan
students.
3.8 Research Question
“What are the norms for a Libyan sample when the SPM test is applied as an
appropriate measure of mental ability?”
3.9 Research objectives
1. To determine psychometric characteristics (reliability, validity, difficulty and
2. To study the relationship between SPM mean scores and student’s academic
3. To investigate the presence of significant differences in sample
performances on the SPM test according to gender, region (cities and
villages), academic discipline (science and arts), geographical areas (main
city, secondary city, coastal, mountain and desert), age and study levels.
4. To investigate the presence of significant differences in sample performance
on the SPM test according to region and gender, age and region, region and
study levels, geographic areas and gender, academic discipline and gender,
age and gender and age and academic discipline.
5. To investigate the variability of SPM mean scores between genders by age,
by geographic area and by academic discipline.
6. To examine the contribution of the independent variables gender, age and
regions and academic achievement in predicting SPM scores.
7. To compute the percentile ranks for the SPM scores according to the sample
age levels.
8. To compare performance on the SPM test for a Libyan sample with that of
other countries (developed and developing countries).
Libya has witnessed extensive improvement in the education sector. Nevertheless, no
single test of mental ability has been officially constructed or adopted for the
measurement of the intelligence in a Libyan setting. Lack of use of intelligence tests
in Libya is mainly due to a lack of test experts and information and knowledge
regarding the usefulness and effectiveness of these tests among people who were
directly affected by testing.
The lack and misuse of intelligence tests to estimate mental ability have
detrimental implications and lead to wrong prediction, placement and treatment of
students who underwent the test. Also, the guidance, counselling and direction of
students towards universities and colleges, and of personnel to various types of jobs,
have been affected by the absence and misuse of intelligence and personality tests. It
is believed that intelligence tests are important and vital to the educational and
economic system of the society.
The present study tried to remedy and rectify the above problems. It is an attempt to
provide an intelligence test that best suits a Libyan setting. It will investigate and
examine the performance of a Libyan sample on the Standard Progressive Matrices
test, and explore its applicability as an appropriate measure of mental ability.
The focus of the study was to standardize the British mental ability test, Raven's
Standard Progressive Matrices (SPM) test, on a sample consisting of school and
university students (8 to 21 years) from the eastern province in Libya.
The study aims to develop norms for the classic form of the SPM test to identify the
distribution of IQ scores within Libyan students.
In the next chapter we give a complete description of the SPM test. This will mainly
include past studies along with their findings. A detailed review of the available
literature with critical analysis will also be presented.
Chapter four: REVIEW OF STANDARD PROGRESSIVE MATRICES LITERATURE
4.1 Introduction
The aim of this study was to develop norms for the classical form of the Standard
Progressive Matrices (SPM) test and identify the distribution of IQ scores for a
sample of Libyan students. This chapter presents, in detail, a review of the literature and
sheds light on prominent studies that have extensively employed the SPM test and related
subjects.
To achieve the desired aim, a comprehensive review was undertaken to identify and
appraise the available literature that described psychological and mental testing.
Greater emphasis was on the SPM test in particular. Studies in this review were
identified through an electronic search of databases such as PsycINFO, American
Psychological Association (APA), American Educational Research Association
(AERA), Educational Testing Service (ETS), National Council on Measurement
in Education (NCME), Educational Resources Information Centre (ERIC), Ingenta,
Web of Science, Dissertation Abstracts, the British Index to Theses, and Cambridge
Scientific Abstracts. In addition, the following active researchers in the field were
contacted: John Raven, Richard Lynn, Ahmed Abdal-Khalek and Omar Khelefeeh.
The earliest article published on SPM testing dated back to the year 1948. The first
step in the searching process was the identification of key concepts and location of
appropriate references. Key words used to locate relevant articles included:
standardization, intelligence testing, SPM test, validity, reliability and meta-analysis.
Data were extracted using the following categories: author, country, year of
publication, population sampled, age, SPM means and standard deviations, and
sample size. Many papers published between 1948 and 2009 were identified and
subsequently critically appraised.
In addition, the SPM 1988, 1996, 2000, 2003, 2004 and 2008 manuals were added
to the papers and were utilized in this study (Raven et al., 1988; 1996; 2000; 2003;
2004; 2008).
This chapter has been divided into nine sections. The first section provides general
information regarding the Progressive Matrices tests. The second section describes the
SPM test. The third section covers the reporting of SPM results. Section four deals with
standardization of the SPM test. Sections five, six and seven investigate reliability,
validity and item analysis of the SPM test. Section eight briefly reviews relevant
previous studies which have employed the SPM test. Finally, section nine provides
a summary of the main issues discussed in this chapter.
4.2 Progressive Matrices Tests
The Progressive Matrices Tests resulted from the work of the British psychologist
John C. Raven and geneticist Lionel Penrose. They were first published in 1938. Their
work was based on Spearman’s two-factor theory. In fact, the Progressive Matrices
tests are among very few tests which are based on a theory of intelligence (Raven,
2004).
Sinha (1950), a student of Cyril Burt, claimed that the Progressive Matrices tests were
not an original idea of Raven’s, as was often thought. He argued that they were
developed slowly out of the non-verbal analogy test constructed by Burt. Burke
(1958) also attributes the origins of the Progressive Matrices to the work and thinking
of Burt, Spearman and their students.
Spearman (1946) reported that the measurement of the “g” factor had been achieved
by the use of the Matrices test. He went further by considering the Progressive
Matrices test as the best of all nonverbal tests of “g”. Anastasi and Urbina (1997)
stated that Raven's Progressive Matrices and vocabulary tests were developed to
evaluate the two components of “g”: eductive ability and reproductive ability.
Eductive ability, on the one hand, is mostly a nonverbal ability measured by the matrices.
Reproductive ability, on the other hand, is mostly verbal and measured by vocabulary
tests.
Lewis (1974) wrote that the Progressive Matrices test was a test of reasoning, based
on non-verbal data. Items were devised especially to evaluate the ability to perceive
relations and so provide, in combination, a measure of the “g” factor.
Murphy and Davidshofer (1991) noted that a number of factor analyses of Raven’s
Progressive Matrices suggested that Spearman’s “g” is the only variable that is
reliably measured by the test. Little evidence can be drawn to indicate any significant
effects of spatial visualization or perceptual ability on the test scores. Carpenter et al.
(1990, p.404) described Progressive Matrices as a non-verbal measure of analytic
intelligence. They said:
Analytic intelligence refers to the ability to deal with novelty, to adapt
one's thinking to a new cognitive problem. It is the ability to reason
and solve problems involving new information, without relying
extensively on an explicit base of declarative knowledge derived from
either schooling or previous experience.
Powers et al. (1986a) pointed out that Progressive Matrices were designed to measure
an individual’s nonverbal mental ability through the assessment of abstract reasoning or
the ability to perceive and apply relationships.
According to the 2004 SPM manual, Raven published the first version of the SPM test
in 1938. The current version of the SPM test is essentially the same. In 1947, small
adjustments to item (B.8) were made to improve the absolute order of difficulty.
Progressive Matrices are available in three forms with increasing difficulty:
a) Standard Progressive Matrices (SPM) test for use with individuals over six
years of age, within the normal adult range of ability. The SPM, published in 1938,
is the most widely used form of the Progressive Matrices tests.
b) Coloured Progressive Matrices (CPM) test was developed for use with
children aged five to eleven, the elderly, and the mentally retarded.
c) Advanced Progressive Matrices (APM) test sets I and II for individuals above
eleven years of age with average or higher intellectual ability.
The CPM and APM tests were both first published in 1947. All three
tests were designed to be used in association with a vocabulary scale, so that
verbal ability can be measured when required. There are two
vocabulary scales according to age: the Crichton Vocabulary Scale for children and
the Mill Hill Vocabulary Scale for adults. The latter is available in senior and junior
forms (Court, 1983; Raven, 1989).
The SPM test was adopted as the basic intelligence test by the US Army and Navy
personnel selection departments in 1941. It was the main test for military
classification in Great Britain. It was utilized to ensure that recruits of normal intelligence
were not rejected due to poor education. Before the end of the Second World War, it
had already been applied to several million recruits (Vernon, 1960; Cronbach,
1970).
In addition to the above characteristics, Raven's Progressive Matrices test is probably
one of the most widely used culture-fair tests. Raven et al. (1996) mentioned that
the SPM test came to be used internationally for comparative purposes, and no general
revision of it has appeared necessary.
4.3 Description of the SPM Test
The SPM test is a non-verbal ability test consisting of a series of geometrical designs
(3x3 "matrices") grouped into five sets lettered A, B, C, D and E. Each set consists of
12 matrices. These matrices are presented in black and white. The
first matrix in each set is easy enough to be self-evident, and it is followed by more and
more difficult ones.
Jensen (1980) showed that each set involves different principles of varying matrix
patterns. Also, within each set the items become progressively more difficult. Thus
after every 12 items, the subject is always faced by a quite simple item. This prevents
discouragement and loss of interest of participants.
The earlier matrices serve to teach one how to solve the later ones. Thus the test appears to
be a measure of a person’s ability to learn and apply new material, at least in the
visual mode (Armfield, 1985).
In each matrix, a part located in the lower right-hand corner of the geometrical design is
missing. Six alternatives (sets A and B) or eight alternatives (sets C, D, and E) are
given below each matrix. All of these alternatives fit the shape of the missing part. Only one,
however, logically belongs to the matrix.
The test instructs the participants to look across the rows and then look down the
columns to identify the rules that determine the missing part. The items are scored
either right or wrong. The subject's score on the SPM test is the total number of correct answers.
The maximum and minimum scores are 60 and 0 respectively.
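The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the item labels ("A1" to "E12"), the function name and the answer key are hypothetical placeholders, not the published SPM answer key.

```python
# Layout of the classic SPM form as described in the text:
# five sets (A-E) of 12 items; sets A and B offer 6 options, sets C-E offer 8.
SETS = "ABCDE"
ITEMS_PER_SET = 12
OPTIONS_PER_SET = {"A": 6, "B": 6, "C": 8, "D": 8, "E": 8}


def score_spm(responses, answer_key):
    """Return the raw score: the number of items answered correctly.

    Both arguments map an item label (e.g. "A1") to an option number.
    Items that are missing from `responses` are counted as wrong.
    """
    return sum(1 for item, correct in answer_key.items()
               if responses.get(item) == correct)


# Hypothetical answer key (NOT the real one): option 1 is "correct" everywhere.
hypothetical_key = {f"{s}{i}": 1 for s in SETS for i in range(1, ITEMS_PER_SET + 1)}

# A participant who answers only the first two items, one of them correctly.
raw_score = score_spm({"A1": 1, "A2": 3}, hypothetical_key)
```

With 5 sets of 12 items the key holds 60 entries, so the raw score necessarily lies between 0 and 60.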
Progressive Matrices problems are usually easier to solve than to describe (Hunt,
1975). An example of the Progressive Matrices problem is shown in Figure 4.1. The
pattern on the top is missing a piece, and the subjects must determine which numbered
piece below will complete it.
Figure 4.1 Typical items from the SPM Test. A5 presents an easy item whereas E1
presents a difficult item (reproduced from Anastasi and Urbina, 1997, p.263).
Raven et al. (1988) described the SPM test as a test of a person's capacity, at the time
of the test, to apprehend meaningless figures presented for his observation, to see the
relations between them, to conceive the nature of the figure completing each
system of relations presented and, by so doing, to develop a systematic method of
reasoning.
Researchers investigated various methods in an attempt to understand the most
efficient process that can be used to determine the missing parts. For example, an
answer which fits may, as Raven et al. (1988) put it: (a) complete a pattern, (b)
complete an analogy, (c) systematically alter a pattern, (d) introduce systematic
permutations, or (e) systematically resolve figures into parts.
Hunt (1975) suggested that there were two quite different solution algorithms: a) the
Gestalt algorithm, which deals with a problem by using the operations of visual
perception, such as the continuation of lines through blank areas and the
superimposition of visual images upon each other. The Gestalt algorithm relies upon
the mental manipulation of sensory images. b) The analytic algorithm, which applies
logical operations to features contained within elements of the problem matrix. The
analytic algorithm deals with abstracted features of displays, by operations such as
supplementation, deletion, subtraction and movement.
Anastasi (1988) thought that the easier items require accuracy of discrimination
whereas the more difficult items involve analogies, permutations and alternations of
pattern, and other logical relations. Moreover, Carpenter et al. (1990) concluded that
the following five types of rules were used when attempting an SPM test to
determine the missing part:
1) Constant in a row: the same value occurs throughout a row, but changes down a
column.
2) Quantitative pairwise progression: a quantitative increment or decrement occurs in
size, position or number.
3) Figure addition or subtraction: a figure from one column is added to or subtracted
from another figure to produce the third.
4) Distribution of three values: three values from a categorical attribute are distributed
through a row.
5) Distribution of two values: two values from a categorical attribute are distributed
through a row; the third value is null.
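Three of the rule types listed above can be expressed as simple checks on a single row of three attribute values. The sketch below is a loose illustration only: the encoding of a row as a tuple and the function names are assumptions made here, it is not Carpenter et al.'s own model, and the figure addition/subtraction and distribution-of-two-values rules are left out.

```python
def constant_in_row(row):
    """Rule 1: the same attribute value occurs in every cell of the row."""
    return len(set(row)) == 1


def pairwise_progression(row):
    """Rule 2: a numeric attribute changes by the same non-zero step from cell to cell."""
    step = row[1] - row[0]
    return step != 0 and row[2] - row[1] == step


def distribution_of_three(row):
    """Rule 4: three different categorical values each appear once in the row."""
    return len(row) == 3 and len(set(row)) == 3


# Hypothetical rows: a shape attribute, a count attribute, and a texture attribute.
shape_row = ("circle", "circle", "circle")
count_row = (1, 2, 3)
texture_row = ("striped", "solid", "dotted")
```

Note that a numeric progression such as (1, 2, 3) also has three distinct values, so in this simplified encoding the checks are not mutually exclusive; deciding which rule a solver actually applies needs more context than a single row.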
The Progressive Matrices test is usually administered with no time limit and can be
attempted individually or in groups. Raven's Progressive Matrices are very easy to
follow once the method is understood. Since there is no time limit, the time taken to
finish the test varies from one subject to another.
4.4 Reporting SPM Results
According to the 2003 SPM manual (p.69), the most effective and convenient method
of interpreting the significance of SPM scores is to evaluate them in terms of the
percentage frequency with which a similar score is found to occur among people of the
same age. For practical purposes, it is convenient to consider certain percentages of
the population and group people’s scores accordingly. In this way, it is possible to
classify a given subject, according to the score obtained, as:
GRADE I: “intellectually superior”; if the score lies at or above the 95th percentile
for people of that same age group.
GRADE II: “definitely above the average in intellectual capacity”; if the score lies
at or above the 75th percentile of that same age group.
II+: if the score lies at or above the 90th percentile of that same age group.
GRADE III: “intellectually average”; if the score lies between 25th and 75th
percentile.
III+: if the score is greater than the median or 50th percentile of that same
age group.
III -: if the score is less than the median of that same age group.
GRADE IV: “definitely below average in intellectual capacity”: if the score lies at or
below the 25th percentile of that same age group.
GRADE V: “intellectually impaired”: if the score lies at or below the 5th percentile
for that age group.
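The grade boundaries listed above can be written as a small lookup. The sketch below assumes that a person's standing has already been converted to a percentile rank within the appropriate age group; the function name is hypothetical, and the handling of a score exactly at the median (returned as plain "III") is an assumption, since the text only defines III+ and III- for scores above and below it.

```python
def spm_grade(percentile):
    """Map an age-group percentile rank (0-100) to the grade labels in the text."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile >= 95:
        return "I"      # intellectually superior
    if percentile >= 90:
        return "II+"
    if percentile >= 75:
        return "II"     # definitely above the average
    if percentile <= 5:
        return "V"      # intellectually impaired
    if percentile <= 25:
        return "IV"     # definitely below average
    if percentile > 50:
        return "III+"   # average, above the median
    if percentile < 50:
        return "III-"   # average, below the median
    return "III"        # exactly at the median (assumption)
```

For instance, under these boundaries a percentile rank of 92 falls in grade II+ and a rank of 40 in grade III-.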
4.5 SPM test standardization
The SPM test was first fully standardised by Raven in 1938 on a sample of 1407
children in Ipswich, United Kingdom. In 1943, extensive collection of adult norms
was performed and the test was re-standardised on school children from Colchester.
The Mill Hill Vocabulary Scale was also standardised in that study. During the fifties
and sixties, several checks were run to determine the accuracy of the norms. The following
table (Table 4.1) illustrates some SPM standardisation studies.
Table 4.1 SPM standardization studies
COUNTRY | YEAR | N | AGE | RESULTS | OTHER COMMENTS
China | 1986 | 5108 | 6 to 79 | Percentile norms for each half-year interval (6 to 16), for three-year intervals (17 to 19) and for ten-year intervals (aged 20 to 97) | SPM standardization (Raven et al., 2003)
UK | 1979 | 3500 | 8 to 18 | Percentile norms for each half-year interval (6 to 16) | SPM standardization (Raven et al., 2003)
Belgium | 1984 to 1990 | 952 | 25 to 89 | Percentile norms for each ten-year interval (aged 25 to 89) | SPM standardization (Raven et al., 2003)
Scotland | 1992 | 629 | 20 to 75 | Percentile norms for five-year intervals (aged 20 to 65) | SPM and MHV standardization (Raven et al., 2003)
Turkey | 1993 | 2485 | 6 to 14 | Percentile norms for each half-year interval (aged 6 to 14) | SPM standardization (Raven et al., 2008)
Slovenia | 1998 | 1556 | 6 to 18 | Percentile norms for each year interval (8 to 18). Also, mean scores for each year (aged 8 to 18) | SPM standardization (Boben, 2007)
Pakistan | 2004 to 2006 | 1662 | 11 to 18 | Percentile norms for each year interval (aged 11 to 18) | SPM standardization (Ahmad et al., 2008)
Syria | 2004 | 2489 | 7 to 18 | Mean scores for each year (aged 7 to 18) | SPM standardization in Rahmn's (2004) PhD, reported by Keleefa and Lynn (2008a)
Sudan | 1999 | 6,202 | 9 to 25 | Mean scores for each year (aged 9 to 25) | SPM standardization (Keleefa et al., 2008b)
Qatar | 2001 | 1135 | 6 to 11.6 | Mean scores for each year (aged 6 to 11.6) | SPM standardization (Keleefa and Lynn, 2008a)
Kuwait | 2006 | 6529 | 8 to 15 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2006)
Oman | 2003 | 5212 | 9 to 21 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2009)
4.6 Reliability of the SPM Test
Reliability is the degree to which a test consistently measures whatever it is
measuring. The more reliable a test is, the more confidence we have in the
scores obtained. It assures that the scores obtained from the test are consistent with the scores
that would be obtained if the test were re-administered to the same takers. In other
words, reliability means that a test is stable in measuring a trait, i.e. the results of
measuring the same trait do not differ from one time to another (Domino and Domino,
2006).
There are two ways to build consistency into a test: one concerns the test
environment, the other test construction. The test environment can be
divided into physical and psychological factors. Physical factors, such as room
temperature, lighting and setting, are relatively easy to keep constant. In contrast,
psychological factors such as emotional stress, anxiety and physical illness are
difficult to control (Anastasi, 1988).
Test construction, or test nature, is another factor which affects reliability. A test
must be constructed in such a way that it assures, as much as possible, that
participants will rank about the same, each time they attempt it. Length and quality of
the test-items are two important factors in test construction. The longer the test, the
more reliable it will be. The less ambiguous the questions, the more likely the answers
will be the same on two different occasions (Bertrand and Cebula, 1980).
It is essential that the test should have a high level of reliability. Raven et al. (1996)
mentioned that several studies dealing with the reliability of the SPM test have
reported positive results. These studies covered a wide range of ages, cultural groups
and populations.
There are several methods to determine reliability. The three most commonly used
are: split-half, test-retest and internal consistency (Cronbach’s Alpha) (Anastasi and
Urbina, 1997; Kenneth, 1998; Kline, 2000; Langdridge, 2004; Domino and Domino,
2006). All of these methods have been employed in the current study.
4.6.1 Test-Retest reliability
Kline (2000) stated that test-retest reliability is the correlation between scores on a test
administered on two separate occasions. The test is first administered to a certain group.
It is then repeated on the same group after an interval extending from one week to
several years. Several factors determine whether the time interval should be long or short. For
example, if the test items can be remembered easily then the time interval should be
long. However, if the sample consists of children then the interval needs to be
short.
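As a rough illustration of the procedure described above, a test-retest coefficient can be computed as the Pearson correlation between the two sets of total scores. The function name and the score lists below are hypothetical and are not data from any of the studies cited in this chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical SPM raw scores for five participants on two occasions.
first_session = [34, 41, 28, 50, 45]
retest_session = [36, 40, 30, 52, 44]
test_retest_r = pearson_r(first_session, retest_session)
```

A coefficient close to 1 indicates that participants keep roughly the same rank order on both occasions.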
It is known that the shorter the interval, the higher the test-retest reliability.
According to the 2004 SPM manual, test-retest correlations range from as low as
0.46 over an 11-year interval, in a study carried out in Germany in 1983 (N=1000
school children tested from sixth grade), to as high as 0.93 over a two-week interval,
in a study carried out in India.
From the original studies of the SPM test, Raven provided a test-retest reliability
ranging from 0.83 to 0.93 for several age groups. The results were: 0.88 for 13 years
and over, 0.93 for 30 years and below, 0.88 for 30 to 39 years, 0.87 for 40 to 49 years
and 0.83 for 50 years and over.
In India, Rao (1974) mentioned that the SPM retest reliability over a two-week interval
was found to be 0.93 for a group of college students. Abdel-Khalek (1987), in his
study with Egyptian undergraduates (N=44), found a retest reliability correlation of
0.82. The time interval was one week.
Nkaya et al. (1994) administered the SPM test three times at two-week intervals to
88 students from Congo and 68 students from France. The mean age of the French students
was 12.3 years and that of the Congolese students was 13.3 years. For the French students the reliability
between test 1 and 2 was 0.81, between test 2 and 3 was 0.74 and between test 1 and 3
was 0.75. For the Congolese students the reliability between test 1 and 2 was 0.91,
between test 2 and 3 was 0.92 and between test 1 and 3 was 0.87. They concluded that
the test-retest reliability was higher in Congo than in France.
According to the 1996 SPM manual, the test-retest reliability in the 1986 Chinese
standardisation was 0.82 at a 15-day interval and 0.79 at a 30-day interval. More recently,
Abdel-Khalek (2005), with Kuwaiti school students (N=968), found retest reliability
correlations ranging between 0.69 (age 12) and 0.85 (age 9). The time interval between
the test and retest was one week.
Khelefeeh and Lynn (2009) conducted a study to evaluate the SPM test norms in a
Qatari standardization sample, 1135 students aged 6 to 11.5 years (517 males and 618
females). The test-retest correlation coefficients of 0.89 for males, 0.95 for females
and 0.93 for the total sample were reported. From the above studies it was concluded
that the SPM test exhibited a high test-retest reliability.
4.6.2 Split-half reliability
The split-half reliability method was first devised by Spearman in 1907 as an alternative to the
test-retest method. It solves the memory-effect problem associated with test-retest.
In this method the test items are split into two halves, and the scores on the two halves are then correlated with each other.
It is possible to split the test into its first and second halves or, more
commonly, into its even and odd items (this is particularly important
with ability tests, where items are often arranged in order of difficulty). Clearly,
where this is the case, there might be a poor correlation between the first and second
halves of the test (Langdridge 2004 and Kline 2000).
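The odd-even procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any study cited here: the response data are hypothetical, and the step-up from half-test to full-test length uses the Spearman-Brown formula, which is assumed to be the "correction" meant when corrected split-half coefficients are reported.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    responses: one list per examinee of item scores (1 = right, 0 = wrong),
    with the items in their order of presentation.
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown: step the half-test correlation up to full test length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical responses of six examinees to eight items ordered by difficulty.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```

Because each half contains both easy and hard items, the odd-even split avoids the difficulty gradient that would depress a first-half versus second-half correlation.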
The majority of split-half internal consistency coefficients reported in the literature
exceeded 0.90. The lowest reliability was 0.86 with 174 Iranian children aged 9
years. The highest reliability was 0.96 in a study with 91 psychiatric male patients
(Raven et al., 2003).
Burke and Bingham (1969) found a split-half corrected reliability coefficient of 0.96.
This was in a study with 91 male patients with a mean age of 35.1 years who were
referred for vocational counselling.
Baraheni (1974) found a split-half correlation that ranged from 0.86 to 0.95 with
Iranian subjects aged 9 to 18 attending primary and secondary schools. The lowest
correlation, 0.86, was with 174 girls aged 9 and the highest correlation, 0.95, was with
291 boys and 425 girls aged 15 years. For subjects aged 18, split-half correlation was
0.93 (N=304). Sinha (1977) found a total split-half reliability coefficient (odd-even
split) of 0.90 with an Indian sample consisting of 140 students aged 11 to 15. They
were studying in grades 8, 9, 10 and 11. Sinha stated that the SPM test had a high
reliability for the Indian sample. Another high split-half reliability, 0.94, with a
sample of 194 psychiatric patients in Germany in 1983 was reported in the 2004 SPM
test manual.
Bart et al. (1986) used the SPM test to study the development of proportional
reasoning in Qatar and the United States. The American sample (N=281) ranged from 10
to 13 years of age. The Qatari sample (N=273) ranged from 10 to 16 years of age.
Participants were students in the fifth, sixth and seventh grades. The SPM test
reliability, as indexed by coefficient alpha, was 0.95. They stated that the value of
coefficient alpha indicated an acceptable level of internal consistency, or high
reliability, of the test.
Comparing two cultural groups in Arizona, Powers et al. (1986a) found a reliability
of 0.87 with 127 Hispanic students (69 boys and 58 girls). The same reliability was found
with 103 Anglo-American sixth-grade students (53 boys and 50 girls).
In 1994, Duzen et al., in a study carried out on 2277 Turkish students (6 to 15 years),
reported a split-half reliability of 0.91. Similarly, Ahmad et al. (2008), with a Pakistani
sample of 1662 adolescents aged 12 to 19 years and 2016 adults aged 18 to 45 years,
found a split-half reliability of 0.89. Moreover, Khelefeeh and Lynn (2009), with a
Qatari sample of 1135 students aged 6 to 11.5 years (517 males and 618 females), reported a
split-half reliability of 0.84 for males, 0.88 for females and 0.87 for the total sample.
The studies cited above show a high split-half reliability for the SPM test. The average value
was about 0.91.
4.6.3 Cronbach’s alpha reliability
Cronbach’s alpha and the Kuder-Richardson 20 (KR-20) formula estimate internal
consistency reliability by determining how the items of a test relate to each other and to
the total test. The KR-20 formula is a special case of the general Cronbach’s alpha.
KR-20 provides reliability estimates that are equivalent to the average of the
split-half reliabilities computed for all possible halves. KR-20 is useful for multiple-choice
items that are scored as right or wrong. Where the items can take
more than two scores, Cronbach’s alpha should be used (Anastasi and
Urbina 1997 and Mills and Airasian 2006).
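The relationship between the two coefficients can be illustrated with a short Python sketch. It uses the standard textbook formulas, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), with KR-20 replacing each item variance by p(1−p). The response data are hypothetical, and population variances are used throughout so that the two formulas coincide exactly for right/wrong items.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-examinee item-score lists."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(responses):
    """Kuder-Richardson 20 for items scored 1 (right) or 0 (wrong)."""
    n = len(responses)
    k = len(responses[0])
    pq_sum = 0.0
    for col in zip(*responses):
        p = sum(col) / n                       # proportion answering correctly
        pq_sum += p * (1 - p)                  # variance of a 0/1 item
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical right/wrong responses of six examinees to eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
kr = kr20(responses)
print(f"alpha = {alpha:.3f}, KR-20 = {kr:.3f}")
```

For dichotomous items the two functions return the same value, which is the sense in which KR-20 is a special case of alpha; only `cronbach_alpha` remains applicable when items can take more than two scores.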
The majority of alpha consistency coefficients reported in the literature exceeded
0.95. Dey (1984), with 136 talented Indian students, obtained a Kuder-Richardson
correlation of 0.91. In another study, conducted on 2277 Turkish students, Duzen et al.
(1994) found the alpha reliability to be 0.95.
Rushton and Skuy (2000) administered the SPM test to 309 students (17 to 23 years) in
South Africa (173 Africans, 136 whites; 104 men, 205 women). The study aimed to
compare the performance of African and white students. It showed
internal consistencies based on Cronbach's alpha of 0.83 for white males, 0.73 for
white females, 0.89 for African males, and 0.92 for African females.
In 2002, Rushton et al. administered the SPM test to 342 university students (198
Africans, 86 whites, 58 Indians; 271 men and 71 women). Internal consistencies
computed by Cronbach’s alpha were 0.88 for the sample as a whole, 0.61 for whites,
0.82 for Indians, and 0.87 for Africans. Moreover, Abdel-Khalek (2005), with a sample
of 6529 Kuwaiti school students, found that Cronbach’s alpha coefficients ranged
between 0.88 (age 14) and 0.93 (age 9). Similarly, Taylor (2007) carried out a study
in South Africa on 144 female and 199 male job applicants, of whom 46.9% were black and
41.8% white. A very good internal consistency reliability (0.96) of the SPM was
reported. In the same year, Boben (2007) administered the SPM test to 1,556 children
and adolescents aged 7.5 to 18 years in Slovenia. Male students constituted 53% of the
sample. The calculated Cronbach’s alpha ranged from 0.89 (age group of 12 years) to
0.93 (age groups of 9 and 17 years), with a mean of 0.92.
The following table (Table 4.2) summarizes the above studies of the three SPM test
reliabilities: test-retest, split-half and internal consistency.
Table 4.2 Summary of the studies performed on the SPM test reliability

Researcher          Country        Year   N      Reliability value

SPM TEST-RETEST RELIABILITY
Abdel-Khalek        Egypt          1987   44     0.82
Nkaya et al.        Congo          1994   88     0.91
                    France                86     0.81
Abdel-Khalek        Kuwait         2005   968    0.78
Khelefeeh & Lynn    Qatar          2009   517    0.89
                                          618    0.95
                                          1135   0.93

SPM SPLIT-HALF RELIABILITY
Burke & Bingham     USA            1969   91     0.96
Baraheni            Iran           1974   174    0.86
                                          425    0.95
Sinha               India          1977   140    0.90
Raven et al.        Germany        1983   194    0.94
Bart et al.         Qatar & USA    1986   554    0.95
Powers et al.       USA            1986   127    0.87
                                          103    0.87
Duzen et al.        Turkey         1994   2277   0.91
Ahmad et al.        Pakistan       2008   1662   0.89
Khelefeeh & Lynn    Qatar          2009   517    0.84
                                          618    0.88
                                          1135   0.87

SPM TEST ALPHA RELIABILITY
Dey                 India          1984   136    0.91
Bart et al.         Qatar & USA    1986   554    0.95
Duzen et al.        Turkey         1994   2277   0.95
Rushton and Skuy    South Africa   2000   309    0.84
Rushton             South Africa   2002   342    0.88
Abdel-Khalek        Kuwait         2005   6529   0.91
Taylor              South Africa   2007   243    0.96
Boben               Slovenia       2007   1556   0.92
It can be concluded that the SPM test shows a high degree of reliability on all three measures:
test-retest, split-half and internal consistency. Taken together, they indicate that the test
is highly reliable. The regions where the test has been administered
cover a large proportion of the world, including developing and developed countries.
The fact that the reliability of the test was relatively constant across them implies that the
reliability of the SPM test is culture-fair.
4.7 Validity of the SPM test
Validity denotes the extent to which a test measures what it is supposed to measure
and, consequently, permits an appropriate interpretation of scores (Anastasi and
Urbina 1997 and Langdridge 2004).
Validity provides evidence regarding the appropriateness of a test. Reliability, on the
other hand, as discussed in the previous section indicates the consistency of the scores
produced. The validity of a test depends on its reliability. A valid test is always
reliable. A reliable test could, however, be invalid. In other words, if a test is
measuring what it is meant to measure it will be reliable. Nonetheless, a reliable test
can consistently measure the wrong thing and hence be rendered invalid. Suppose an
instrument that is intended to measure social studies concepts actually measured only
social studies facts. It would not be a valid measure of concepts but can measure the
facts very consistently (Mills, Airasian 2006, Langdridge 2004 and Anastasi, Urbina
1997). Therefore reliability of a test is necessary but not sufficient for establishing its
validity. Reliability and validity are specific to the interpretation being made and the
group being tested. As a result we cannot simply say that a certain test is reliable
and/or valid. We rather must say that the test is reliable and/or valid for this particular
interpretation and this particular group (Mills, Airasian 2006).
Validity is the most important characteristic of a psychological test, to the extent
that without empirical data regarding the validity of a test we have no evidence,
conclusive or persuasive, as to what the test actually measures. Consequently, it is not
possible to give meaning to or interpret the test scores (Brown, 1983, Anastasi and
Urbina 1997 and Langdridge 2004).
There are three types of validity used in educational and psychological measurements:
content validity, criterion-related validity and construct validity (Anastasi and Urbina
1997).
4.7.1 Content Validity
Content validity refers to the extent to which a test measures a sample of the
behaviour which it is intended to measure (Raven, et al., 2003). In assessing the
content of a measuring instrument, one is concerned with the question of how well the
content of the instrument represents the entire universe of the content being measured
(Gronlund, 1981). Therefore, evaluation of this type of validity depends on the
analysis of the measured content into its component elements. If the items of the test
cover those elements in representative proportions and the test appropriately samples the whole
measured content, then the content validity is considered to be high. Content validity is
evaluated objectively and determined by logical analysis of the test content. However,
it cannot be expressed in terms of a numerical index (Anastasi and Urbina 1997 and
Gay, et al., 2006).
It is worth mentioning that content validity is sometimes referred to in the literature as
face validity. Although the meanings of the two often overlap, they are quite
distinct. Face validity is essentially what the test appears to measure, not what it
actually measures. In other words, face validity refers to the degree to which the test appears
to be valid to non-technical observers such as examinees and test administrators. Its
main role in the process of validation is in the initial screening in test selection
procedures (Anastasi and Urbina 1997 and Gay, et al., 2006).
As an example, the SPM test meets an important requirement for use in cross-cultural
contexts. It has face validity in the sense that it appears, to those who take and
administer the test, to assess basic reasoning ability in a form of presentation
that is not culturally biased (MacArthur, 1960).
4.7.2 Construct Validity
The construct validity of a given test is the extent to which the test can be said to measure a
hypothetical construct or trait. The word construct in this context is synonymous with
concept (Anastasi and Urbina 1997 and Gay, et al., 2006).
Kenneth (1998) reported that psychological constructs are unobservable postulated
variables that have evolved either informally or from psychological theory. Intelligence,
anxiety, aptitude, musical ability, critical thinking, ego strength, dominance and
achievement motivation are examples of common constructs. Construct validation is
the systematic analysis of test scores designed to assess whether there is a basis for
validity. The questions to be answered by construct validity are: what traits are
measured by the test? And to what degree? The process of construct validation
involves identifying and clarifying the factors that have an effect on the test scores.
The test performance can then be interpreted most meaningfully. This process
involves the accumulation of evidence from a wide range of different studies
(Gronlund, 1981, and Ary et al., 1985). Anastasi and Urbina (1997) stated that factor
analysis and internal consistency are both subtypes of construct validity.
4.7.2.1 Factor analysis
Factor analysis provides information regarding the extent to which a set of
items measures the same underlying construct or dimension of a construct, and
evaluates the extent to which the individual items on a scale truly cluster together
around one or more dimensions. Items constructed to measure the same dimension
should load on the same factor; those constructed to measure different dimensions
should load on different factors (Anastasi 1988, Anastasi and Urbina 1997, Kunnally
and Bernstein 1993). In addition, Geri and Judith (2006) reported that this analysis
shows whether the items in the instrument reflect a single construct or several.
The SPM test was designed to be a measure of general intellectual ability, “g”, as
postulated by Spearman (Spearman, 1904; Spearman and Wynn-Jones, 1951).
It had been universally accepted for over half a century that the test was an
appropriate measure of “g”. This position was endorsed by Emmett (1949) based on a
factor analysis of the SPM items in a sample of 11-year-old children. More recently,
Jensen (1998, p. 541) contended that “the total variance of Raven scores in fact
comprised virtually nothing besides g and random measurement error”. Raven, Raven
& Court (2000, p.34) stated that “The Progressive Matrices has been described as one
of the purest and best measures of “g”, or general intellectual functioning”.
The SPM test (2004) manual reports several factor-analytic studies involving a large
number of children and adults. For example, investigations of British children showed
a high loading of up to 0.83 on the “g” factor (Raven et al., 2004). Burke and Bingham
(1969) found a high loading of up to 0.76 on “g” with adults. Also, as reported in
the SPM 1996 manual, Zager et al. (1980) obtained a loading of 0.80 on “g”.
Moreover, Abdel-Khalek (1987) administered the SPM test to Egyptian university
students (205 males and 247 females). A principal components factor analysis with
unities inserted in the diagonals was carried out to determine whether the items contained a
general factor and possibly other factors. The analysis extracted one significant factor
(eigenvalue >1.0) in both groups. This factor accounted for
79.6% and 72.6% of the total variance for male and female undergraduates
respectively. Another study carried out by the same author in Kuwait (2005), on a
sample of 6529 students aged 8-15 years (3278 boys and 3251 girls), investigated the
factorial validity of the SPM test. A principal components factor analysis
was carried out to identify the factors present. The results showed only one significant factor,
which had a large eigenvalue of 3.46 and accounted for 69.2% of the variance.
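The link between an eigenvalue and the percentage of variance it explains can be made explicit. In a principal components analysis of a correlation matrix with unities in the diagonal, each variable contributes one unit of variance, so a component's share of the total variance is its eigenvalue divided by the number of variables. The sketch below assumes that the Kuwaiti analysis was run on the five SPM set scores; this is an assumption, since the number of variables is not stated above, but it reproduces the reported figure. All eigenvalues other than 3.46 are hypothetical.

```python
def explained_variance_ratio(eigenvalue, n_variables):
    """Share of total variance explained by one principal component.

    Valid for a correlation matrix with unities in the diagonal, where the
    eigenvalues sum to the number of variables.
    """
    return eigenvalue / n_variables


def retained_by_eigenvalue_rule(eigenvalues, threshold=1.0):
    """Keep only the components whose eigenvalue exceeds the threshold."""
    return [ev for ev in eigenvalues if ev > threshold]


# 3.46 is the reported eigenvalue; five variables is an assumption.
share = explained_variance_ratio(3.46, 5)
print(f"{share:.1%}")

# Hypothetical full set of eigenvalues summing to 5 (only 3.46 is reported).
eigenvalues = [3.46, 0.55, 0.42, 0.33, 0.24]
retained = retained_by_eigenvalue_rule(eigenvalues)
```

Under this assumption only the first component passes the eigenvalue > 1.0 rule, which is what a single-factor result looks like.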
Despite the above findings, a dispute was raised on the issue of whether the
Progressive Matrices are really a pure measure of “g”. A number of scholars have
contended that while the Progressive Matrices were largely a measure of “g” they also
contained a small visualization or spatial factor. Among them were Adcock (1948),
Keir (1949), Banks (1949), Vernon (1950), Gabriel (1954), Gustaffson (1984, 1988).
They concluded that the SPM test measures a reasoning factor and another factor
which was called “cognition of figural relations”. Hertzog and Carter (1988)
contended that the SPM contained two further factors named: verbal intelligence and
spatial visualization.
In agreement with the previous studies, Rimoledi (1948), Banks and Sinha (1951)
and Sinha (1968) reported that “g” accounted for only 36% to 37% of the total
variance of the test scores. They suggested that the SPM test measures other factors
in addition to “g”. Furthermore, several factor-analytic studies have examined the
overlap between the skills measured by the Raven and other tests of mental abilities. These studies,
which have most often been conducted with adult or older adolescent participants,
have provided evidence that the Raven test evaluates perceptual and spatial abilities as
well as Spearman’s “g” factor (Corman and Budoff, 1974).
In a sample of 920 Mexican primary school children, factor analysis of the SPM test
results showed a strong reasoning factor and a weaker visualization ability factor. This
result runs contrary to the view that the SPM measures only “g” (Lynn
et al., 2004). Furthermore, Lynn et al. (2004) administered the SPM test in 2001 in
Estonia to a sample of 2735 adolescents aged 12 to 18 years. They
identified a general factor and three further factors, which they reported as the gestalt
continuation factor found by Van der Ven and Ellis (2000), verbal-analytic reasoning and
visuo-spatial ability. Further analysis in this study showed a higher-order factor
identified as “g”.
The question that arises here is how “g” relates to the other three
factors. The widely accepted contemporary theory that accounts for this relation is
Carroll's three-stratum model (1993). This consists of:
• Stratum 1: “g”
• Stratum 2: eight second-order group factors, e.g. fluid intelligence,
crystallised intelligence, etc.
• Stratum 3: around fifty factors. These are approximately the same as what are called
“lower-order factors” and “specific factors” (Carroll, 1993).
4.7.2.2 Internal consistency
One of the methods used to identify a construct is the internal consistency method.
The chief criterion in this method is the total score on the test. Correlation methods are
often employed in this validation process. These involve item-test score correlations
and subtest-test score correlations (Anastasi 1988 and Anastasi and Urbina 1997). The
latter may be used in intelligence tests that consist of separately administered
subtests. The score on each subtest is correlated with the total score on
the test, and only those subtests which show a correlation of 0.3 or higher are
retained (Tabachnick & Fidell 2007). The test is then said to be validated by internal
consistency.
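The subtest-total procedure just described can be sketched as follows. The scores and subtest names are hypothetical; the 0.3 cutoff is the one quoted above. As in the description, each subtest is correlated with the total score that includes it (an uncorrected subtest-total correlation).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def subtest_total_correlations(subtest_scores):
    """Correlate each subtest with the total score over all subtests.

    subtest_scores: dict mapping a subtest name to a list of scores,
    one per examinee, in the same examinee order for every subtest.
    """
    names = list(subtest_scores)
    n = len(subtest_scores[names[0]])
    totals = [sum(subtest_scores[name][i] for name in names) for i in range(n)]
    return {name: pearson_r(subtest_scores[name], totals) for name in names}


def retain_subtests(correlations, cutoff=0.3):
    """Names of the subtests whose subtest-total correlation meets the cutoff."""
    return [name for name, r in correlations.items() if r >= cutoff]


# Hypothetical scores of six examinees on four subtests.
scores = {
    "Set A": [12, 11, 10, 9, 8, 7],
    "Set B": [11, 10, 9, 9, 7, 6],
    "Set C": [10, 9, 9, 7, 6, 5],
    "Filler": [5, 8, 4, 9, 5, 8],  # unrelated to the other three
}
correlations = subtest_total_correlations(scores)
retained = retain_subtests(correlations)
```

In this made-up data set the three related subtests correlate strongly with the total, while the unrelated one falls well below 0.3 and would be dropped.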
As stated above, the internal consistency plays a role in determining the characteristic
of a trait or domain behaviour represented by the test. This can be easily seen by the
fact that highly correlated items and subtests with the test strongly suggest that the test
is measuring what it is meant to measure. In this sense, the internal consistency shares
some features with construct validity (Anastasi (1988) and Anastasi, Urbina (1997)).
It should be noted that no single validation process can establish the construct validity
of a given test (Gay et al., 2006).
Abdel-Khalek (1987), in his study on Egyptian undergraduates, estimated the internal
consistency of the five sets of the SPM test. The Pearson product-moment correlation was
employed. All of the inter-correlations between the sets were positive and statistically
significant. They ranged from 0.32 to 0.67 for the male group (N = 205) and
from 0.30 to 0.57 for the female group (N = 247). Moreover, in 2005 Abdel-Khalek
conducted a study on Kuwaiti school students (N=6,529, aged 8-15 years). He
investigated the internal consistency of the SPM. The Pearson correlation coefficients
were statistically significant (p < 0.001) and ranged from 0.43 to 0.77.
4.7.3 Criterion-related Validity
Criterion-related validity is determined by relating the performance on a test to the
performance on another test or measure. The second test or measure is the criterion
against which the validity of the initial test is evaluated (Mills and Airasian 2006 and
Kenneth 1998). In other words, criterion-related validity refers to the relationship
between the scores on a measuring instrument and an independent external variable
(criterion) believed to measure directly the behaviour or characteristic in question.
This type of validity can be reported by means of a correlation coefficient. Criterion
validity has two forms:
a) Concurrent validity: correlation between test scores and a criterion available at
the same or close point in time.
b) Predictive validity: correlation between test scores and a criterion that occurs
at a later point in time (Ary et al., 1985 and Domino and Domino 2006).
Anastasi (1988) distinguished between the two as follows:
The logical distinction between predictive and concurrent validity is
based, not on time, but on the objectives of testing. Concurrent validity
is relevant to tests employed for diagnosis of existing status, rather
than prediction of future outcomes. The differences can be illustrated by
asking “Is Smith schizophrenic” (concurrent validity) and “Is Smith
likely to become schizophrenic” (predictive validity).
Domino and Domino (2006) mentioned that the SPM concurrent validity with
standard intelligence tests such as Stanford-Binet or WISC exhibited correlations
ranging from 0.50 to 0.80. Predictive validity, especially of academic achievement,
generally fell in the region of 0.20 to 0.60 (Raven, 2004). Powers and Barkan (1986a)
reported that the SPM scores had a correlation of 0.40 with reading achievement
scores, 0.54 with language achievement, and 0.49 with mathematics.
Anastasi and Urbina (1997) mentioned that specific indices used as criterion measures
include school grades, school achievement, promotion, graduation records and
teachers’ or instructors’ ratings of intelligence. Such ratings given within an academic
setting are likely to be closely related to the individuals’ scholastic performance.
They may therefore properly be classified with the criterion of academic achievement.
The correlations of the SPM test with intelligence tests, standardised achievement tests
and school examinations vary with age, gender and sample homogeneity. Some
studies of SPM test correlations with intelligence tests, standardised achievement
tests and school examinations are presented below.
4.7.3.1 SPM Correlations with Intelligence Tests (concurrent validity)
The SPM test manual (2003) reported correlations in the range of 0.54 to 0.86
between the SPM and other IQ tests, e.g. the Stanford-Binet and Wechsler Scales, for
English-speaking children and adolescents. Correlations obtained in cross-cultural
research with non-English-speaking children and adolescents, as reported in the SPM
test manual (1996), tend to be lower. Generally they range from 0.30 to 0.68. Also as
reported in the manual, de Lemose (1989), in an Australian study, found a tendency
for students from non-English-speaking cultures (e.g. Southern and Eastern European
and Middle Eastern countries) and those with non-professional fathers to score lower.
The following is a brief review of the studies conducted to determine the
relationship of the SPM test scores with more widely used intelligence tests such as the
Lorge-Thorndike Test, Wechsler Scales (WISC-R for children, WAIS for adults),
Army General Classification Test (AGCT), Cohen Test, General Mental Ability
(GMA), Minnesota Paper Form Board (MPFB), Otis Gamma, Revised Beta, Quick
Test, Orange Juice Test (OJT), Stanford-Binet, AH2 tests, Otis-Lennon, Primary
Mental Abilities (PMA), Cattell's Culture Fair Test (CCFT), Arabic Verbal Reasoning
Test (AVRT), San Diego Test of Reasoning Ability (SANTRA), and Draw-a-Man
test.
Tulkin and Newbrough (1968) administered the SPM test and the Lorge-Thorndike test to 356
fifth- and sixth-grade students, black and white, from high and low social classes.
The correlation between SPM test scores and Lorge-Thorndike Verbal IQ was 0.45 for the
white high-class group (N=128), 0.33 for the white low-class group (N=75), 0.40 for the
black high-class group (N=50) and 0.48 for the black low-class group (N=103).
The correlation between the SPM test and Non-verbal IQ was 0.53 for the white high-class
group, 0.52 for the white low-class group, 0.40 for the black high-class group and 0.45 for
the black low-class group. It was
concluded that all correlations between the SPM test and the Lorge-Thorndike IQ test were
significantly different from zero. For the white groups the SPM test score was
somewhat more closely related to Non-verbal IQ than to Verbal IQ. This pattern was not
found in the black groups.
In India, Mehrotra (1968), with a small sample (N=45) of students with a mean age of
14.2 years, found a correlation of 0.68 between the SPM test and the WISC-R Full Scale,
0.60 with the Verbal and 0.61 with the Performance sub-tests. Burke and Bingham (1969)
found a significant correlation between SPM scores and the Army General Classification
Test (AGCT). Similar results were found with the Cohen Test in a sample of 91 male
patients (mean age 35.1 years) who were referred for vocational counselling services.
The correlation between the SPM and Cohen Verbal was 0.59, with Cohen Memory
0.49 and with Cohen Perceptual Organization 0.61. The correlation between the SPM
and AGCT Verbal was 0.60, with AGCT Numerical 0.66 and with AGCT Total
0.67.
Mohan (1972) in India investigated the relationship between verbal and non-verbal
ability tests. He found a correlation of 0.65 between the SPM test and General Mental
Ability (GMA). The sample consisted of 310 college and university students ranging
in age from 18 to 25 years.
Mclaurin and Farrar (1973) administered both the SPM test and the WAIS to 201
volunteer university students taking introductory courses in psychology. The
correlations between the SPM test and the WAIS were 0.57 for Full Scale, 0.45 for
Verbal and 0.54 for Performance. In the same study they investigated the validity of
the SPM test by correlating it with grade point average (GPA) and the Minnesota Paper
Form Board (MPFB). The correlation between the SPM test and the MPFB was 0.45.
The correlation between the SPM test and GPA was 0.21, which was comparable to
the correlation between GPA and the WAIS Full Scale, .28 (N=201). The
validity of the SPM test was concluded to be moderate.
Three studies that evaluated the use of the SPM test with psychotic patients in the USA
reported reasonable correlations between the SPM test scores and WAIS Full Scale,
Verbal, and Performance IQs. Burke and Bingham (1969), with 91 American male
patients at a veterans’ hospital referred for vocational counselling, with a mean age of
35.1 years, found a correlation of 0.75 between the SPM and the WAIS Full Scale,
0.65 with the WAIS Verbal IQ and 0.76 with the WAIS Performance IQ.
In another investigation with psychiatric patients in Texas, Vincent and Cox (1974)
found that the SPM test correlated reasonably well with the WAIS Scale. Correlations
were 0.85 with Full Scale, 0.84 with Verbal and 0.75 with Performance. The sample
(N=131) was taken from psychological files of the Texas Vocational Rehabilitation
Unit. Most patients suffered from a physical, emotional or mental disability. It was concluded
that the SPM test is a viable tool for measuring intelligence in such populations.
Also in the above study, Vincent and Cox (1974) correlated the SPM scores of a
sample of 226 psychiatric patients with three IQ tests. Most patients had a physical,
emotional, or mental disability. The mean age of the sample was 28.7 years, and it consisted of
57 % white, 36 % black and 7 % Latin American patients. The correlation between SPM
scores and Otis Gamma scores was .70 (N=97), with Revised Beta .38 (N=58) and the
correlation with Quick Test was .60 (N=71).
The third study with psychiatric patients (N=256) was done by Burke (1985) who
correlated the SPM scores with WAIS score and found that the correlation between
the SPM and WAIS Full scale was .66, with Verbal scale .61, and with Performance
scale was .63.
Bart et al. (1986) administered the SPM and the Orange Juice Test (OJT), a test of proportional
reasoning, to a sample of 273 American and 281 Qatari fifth-, sixth- and seventh-grade
students. They found a significant correlation of .49 between the SPM and the OJT.
According to the 1996 SPM test manual, Zhang & Wang (1989) in China found that
the SPM correlated .71 with the Full Scale WISC-R, .54 with Verbal and .70 with
Performance (no age level or sample size was reported). Another study, by Narayanan
and Paramesh (1978) in India, administered the SPM test and
Cattell's Culture Fair Test to Tamil subjects, and reported a correlation of .58.
Horton and Karees (1987) administered the SPM test to a small sample (N=20) of
students participating in a gifted students program in the United States. They found a
correlation of .72 between the SPM test and Stanford-Binet. Correlation between
Stanford-Binet and Otis-Lennon IQs Test was only .45 (N=40).
Helms (1987), with 130 Canadian university students (65 females and 65 males, with an
average age of 19.3 years), reported low correlations, ranging from .22 to .36, between the
AH2 Scales (a general ability test) and the SPM test: .22 for Verbal,
.28 for Numerical, .31 for Perceptual and .36 for the AH2 total score. Helms
noted that, according to Jensen (1980), the correlations of the SPM test with other
mental ability tests are usually in the range of .50 to .70. Although correlations
involving the AH2 tend to be somewhat lower than the usual values for correlations among
tests of general ability, the correlations reported here are even lower.
In the US, Jensen et al. (1988) administered the SPM test, with a time limit of
40 minutes, to a total of 261 undergraduate students. The students also took the
Advanced Progressive Matrices (APM) and the Otis-Lennon Mental Ability Test.
The correlation between the SPM and the APM was .58 and the correlation with the
Otis-Lennon was .47.
In a study in Mississippi by Karnes and Whorton (1988), the SPM and the Culture-fair
Intelligence Test were administered to 625 students (441 white and 211 black) in a rural
county elementary school (grades 3-8). The mean age was 8.10 years. 410 students
were on free or reduced lunches and 245 students on paid lunches. The Pearson
correlation between the SPM and the Culture-fair Intelligence Test was a moderate .46
and significant.
In a study carried out in Libya on two groups from Tripoli University, Majdub (1991)
found significant correlations between the SPM and an Arabic Verbal Reasoning Test
(AVRT). For the Arabic major group the correlation between the SPM and the AVRT was .53
(N=78). For the Education major group the correlation between the SPM and the AVRT was .25
(N=111).
In a study by Johnson et al. (1994), a sample of 449 second-, fifth- and seventh-grade
students in San Diego city schools were given the SPM test. In this group, 77 were
African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American. Of
these, 215 were boys and 234 were girls. The mean age of the children was 11 years (age
range from 6 years 8 months to 13 years 10 months). They were administered the SPM and
an alternate form of the SPM called the San Diego Test of Reasoning Ability
(SANTRA). The correlation between the SPM and SANTRA tests was highly significant
(.90).
Khelefeeh and Lynn (2009) in a Qatari sample of 1135 students aged 6-11.5 (male N
= 517 and female N = 618) reported a validity (correlation coefficient) of 0.86
between the SPM and the Draw-a-Man test.
The correlations of the SPM with general intelligence tests (full score) and with each
of three intelligence subtests (Non-verbal, Verbal and Numerical) were averaged. In
doing so, Fisher’s z transformation was employed (Garret and Woodworth 1966).
Garret and Woodworth (1966) note that the transformed statistic is more
stable and has open limits (it is not bounded between -1 and +1, as r is). Each sample r is converted into
a new equivalent statistic z. The averaged z is then converted back to r. The following
table summarises the above studies of the SPM test concurrent validity together with the r to z
Fisher’s transformation. The results in the table will be discussed afterwards.
Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher’s
transformation results
Researcher            Country  Year  N     IQ test                            r     z
Tulkin & Newbrough    USA      1968  128   Lorge-Thorndike (Verbal)           0.45  0.45
                                           Lorge-Thorndike (Non-Verbal)       0.53  0.53
                                     75    Lorge-Thorndike (Verbal)           0.33  0.33
                                           Lorge-Thorndike (Non-Verbal)       0.52  0.52
Mehrotra              India    1968  45    WAIS (Verbal)                      0.61  0.61
                                           WAIS (Performance)                 0.61  0.61
                                           WAIS (Full Scale)                  0.68  0.68
Burke & Bingham       USA      1969  88    Cohen (Verbal)                     0.59  0.59
                                           Cohen (Memory)                     0.49  0.49
                                           Cohen (Perceptual Organisation)    0.61  0.61
                                           AGST (Verbal)                      0.60  0.60
                                           AGST (Numerical)                   0.66  0.66
                                           AGST (Full Scale)                  0.67  0.67
Burke & Bingham       USA      1969  91    WAIS (Verbal)                      0.56  0.56
                                           WAIS (Performance)                 0.76  0.76
Mohan                 India    1970  310   General Mental Ability (GMA)       0.65  0.65
Mclaurin & Farrar     USA      1973  201   WAIS (Verbal)                      0.45  0.45
                                           WAIS (Performance)                 0.54  0.54
                                           Minnesota (MPFB)                   0.45  0.45
Vincent & Cox         USA      1974  131   WAIS (Verbal)                      0.84  0.84
Vincent & Cox         USA      1974  97    Otis Gamma                         0.70  0.70
                                     58    Revised Beta                       0.38  0.38
                                     71    Quick test                         0.60  0.60
Narayanan & Paramesh  India    1978  ----  Cattell’s Culture Fair             0.58  0.66
Burke                 USA      1985  256   WAIS (Verbal)                      0.61  0.71
Bart et al.           Qatar    1986  554   Orange Juice Test (OJT)            0.49  0.54
Horton & Karees       USA      1987  20    Stanford-Binet                     0.72  0.91
Helms                 Canada   1987  130   AH2 Scales (Verbal)                0.22  0.22
                                           AH2 Scales (Numerical)             0.28  0.29
                                           AH2 Scales (Perceptual)            0.31  0.32
                                           AH2 Full Scales                    0.36  0.38
Jensen                USA      1988  261   RAPM                               0.58  0.66
                                           Otis-Lennon                        0.47  0.51
Karnes & Whorton      USA      1988  649   Culture Fair Intelligence Test     0.46  0.66
Zhang & Wang          China    1989  ----  WAIS (Verbal)                      0.54  0.60
Majdub                Libya    1991  78    Arabic Verbal Reasoning (AVR)      0.53  0.59
                                     111   Arabic Verbal Reasoning (AVR)      0.25  0.26
Johnson               USA      1994  446   Reasoning Ability Test (SANT)      0.90  1.50
Khelefeeh & Lynn      Qatar    2009  1135  Draw-a-Man Test                    0.86  1.33
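The r to z averaging procedure described before Table 4.3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the computation used in the study: the function name average_correlations is hypothetical, and the z values are averaged without weights because the text does not say whether they were weighted by sample size. The two input correlations are those reported by Majdub (1991) for the SPM and the AVRT.

```python
import math

def average_correlations(rs):
    """Average correlations through Fisher's z (unweighted mean; an assumption)."""
    # r -> z: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    zs = [math.atanh(r) for r in rs]
    mean_z = sum(zs) / len(zs)
    # z -> r: r = tanh(z)
    return mean_z, math.tanh(mean_z)

# SPM-AVRT correlations reported by Majdub (1991): .53 (N=78) and .25 (N=111)
mean_z, mean_r = average_correlations([0.53, 0.25])
print(round(mean_z, 2), round(mean_r, 2))
```

A sample-size weighted mean of z (for example with weights N - 3) is a common alternative and would give a somewhat different average when the samples differ in size.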
The mean correlations between the SPM test and general intelligence tests and the
three intelligence subtests are given in Table 4.4 below.
Table 4.4 Average correlations between the SPM test and intelligence tests
Sub-Tests             N     Mean z’  Mean r
General intelligence  3623  0.80     0.66
Non-verbal            3726  0.68     0.59
Verbal                1904  0.54     0.49
Numerical             218   0.54     0.49
It can be seen in Table 4.4 that the SPM test correlates more highly with general
intelligence and non-verbal tests than with verbal and numerical tests. Since the SPM test
is a non-verbal test and contains no verbal items, it is expected to correlate highly with
other non-verbal tests.
General intelligence is an ambiguous term. On the one hand, it can mean the sum of all
cognitive abilities. This is the meaning when it is said that the Wechsler tests measure
general intelligence. On the other hand, it can be considered as the common factor in
all cognitive tests, i.e. “g”. There are other cognitive factors in addition to “g”. The
SPM test measures the “g” factor common to all cognitive abilities. This explains
why the SPM test correlates to a high degree with general intelligence tests
(Lynn, 2008).
4.7.3.2 SPM correlations with achievement tests (Predictive Validity)
According to the SPM test manual (2004), the external criterion usually adopted in
predictive validity investigations is examination grades or teachers’ estimates. SPM
correlations with academic achievement tests generally fall in the region 0.20 to 0.60,
with higher correlations being found with mathematics and science. Language and
overall academic achievement show lower correlations. Moreover, correlations with
performance on achievement tests or scholastic achievement were generally lower
than correlations with intelligence tests. In several studies, the California
Achievement Test (CAT) served as the criterion against which SPM test scores were related.
Correlations with CAT Reading, Language, Arithmetic and overall achievement
scores ranged from 0.26 to 0.76 (Raven et al. 2004).
Tulkin and Newbrough (1968), with 356 black and white fifth and sixth grade students
of high and low social class, correlated the SPM test scores with the Iowa Tests of Basic
Skills (ITBS) achievement test. They found that for white high class (N=128) the
correlation was 0.30 with Vocabulary; 0.40 with Reading; 0.31 with Language; 0.39
with Work-study; and 0.39 with Arithmetic. For white low class (N=75) the
correlation was 0.25 with Vocabulary; 0.26 with Reading; 0.27 with Language; 0.41
with Work-study and 0.27 with Arithmetic.
The correlation between the SPM test and ITBS for black high social class (N=50)
was 0.39 with Vocabulary; 0.14 with Reading; 0.32 with Language; 0.36 with Work-
study; and 0.40 with Arithmetic. For black low class (N=103) the correlation was
0.32 with Vocabulary; 0.26 with Reading; 0.38 with Language; 0.33 with Work-study
and 0.39 with Arithmetic. In comparison, the correlations of the SPM test with the
achievement test (ITBS) were lower than its correlations with the IQ test (Lorge-Thorndike).
Sinha (1968) reported a correlation of 0.32 between SPM scores and grade point
average (GPA) with 220 students from art and science branches and a correlation of
0.36 with 204 engineering students from India. Dosajh, in a study in India
reported by Sinha (1968), found that the score on the SPM could safely be taken as a
criterion for the selection of students for technical and science courses. Dosajh’s
observation was based on the correlation of SPM test scores with the examination scores
of 80 grade nine boys and girls.
Mclaurin and Farrar (1973) reported a low correlation between the SPM test and
grade point average (GPA). The correlation was .21 with a sample of 201 university
students in the USA. Though low, this correlation is still within the range (.20-.60)
given by Domino and Domino (2006) and Raven et al. (2004) as mentioned above.
GPA may be based on course work and is partly determined by motivation and essay writing
ability. Since the SPM is a non-verbal test, it is no surprise that it correlates weakly
with verbal abilities such as writing ability (Lynn, 2009).
Baraheni (1974) evaluated the validity of the SPM test in primary and secondary schools
in Iran by calculating the correlation between scores on the SPM test and end of year
average school marks. A correlation of .44 was found with grade 6 (N=472), .29 for
grade 7 (N=360), .61 for grade 8 (N=203) and .51 for grade 9
(N=643). Baraheni reported that the validity indices of the SPM test in predicting average
school marks in Iranian schools appeared to be as high as or even higher than the
coefficients reported from other countries.
Sinha (1977) in India found significant correlations between the SPM test and school
examination grades: .46 with grade eight (N=46), .47 with grade nine (N=5) and .38
with grade ten (N=35). The total correlation was .45 (N=86). The students’ ages ranged
from 11 to 15 years. Sinha found that the SPM test scores correlated significantly
with school examination grades in all groups except grade nine, which consisted
of only 5 students. As for the validity of the SPM test, he concluded that the results did
not strongly support the test.
In another study in Nigeria, Maqsud (1980) investigated the validity of the SPM test with
two different groups of primary school boys. The correlations of the SPM test with
English and Arithmetic ranged from .19 to .65. He found a
correlation of .19 between the SPM test and English, and .38 with Arithmetic (N=60)
among primary school boys in traditional schools, and a correlation of .65 between the
SPM test and English, and .49 with Arithmetic (N=60) for primary school boys in
modern schools. Students from modern schools belonged to upper-middle class
homes, whereas students from traditional schools came from lower-middle and lower
class families. The average age of the students was 12.2 years.
Maqsud concluded that the significant positive link between subjects' scores on the
SPM test and their achievement scores generally supported the theory that mental
ability is perhaps the best predictor of school achievement. He also suggested that the
SPM test could be used for the selection of secondary school intakes in Nigeria. In
addition, Chan (1982) found that the SPM test correlates well with non-verbal subtests
but rather poorly with numerical and verbal subtests of comprehensive scholastic
aptitude tests in Hong Kong.
Powers et al., (1986.b) in their study with 426 students (225 boys and 201 girls), from
sixth and seventh grades, reported the following correlation between the SPM test and
CAT. For sixth grade boys (N=116) the correlation was .34 with Reading, .41 with
Language, and .39 with Math. For sixth grade girls (N=96) the correlation was .36
with Reading, .50 with Language, and .60 with Math. Total sample correlation for
sixth grade (N=212) was .35 for Reading, .45 with Language and .48 for Math.
The correlation for seventh grade boys (N=109) was .45 with Reading, .50 with
Language, and .52 with Math. For seventh grade girls (N=105) the correlation was
.54 with Reading, .55 with Language, and .56 with Math. Total sample correlation for
seventh grade (N=214) was .49 for Reading, .51 with Language and .54 for Math.
Correlations ranged from .34 to .60 for sixth grade and from .45 to .57 for seventh
grade students. For the sixth grade, the lowest correlation (.34) was for boys in Reading
and the highest (.60) was for girls in Maths. For the seventh grade, the
lowest correlation (.45) was for boys in Reading and the highest (.57)
was for girls (N=105), also in Maths.
It was concluded that the validity coefficients were higher for the seventh grade than
for the sixth grade students, and higher for females than for males. Further, it was clear
that the coefficients increased from reading to mathematics. The results of the study
indicated that the SPM test had a moderate predictive validity that varied depending
on sex, grade and academic criterion.
Sidles and Avoy (1987) administered the SPM test and Comprehensive Test of Basic
Skills (CTBS), a standardised achievement test, to 124 Navajo (one of the largest
Indian tribes in America) seventh and eighth grade students ranging in age from 14 to
16 years old. They found a correlation of .38 with Spelling, .39 with Reading, .46 with
Mathematics, and .47 with Language. Correlations were also computed between SPM
test and CTBS for female and male subjects. Correlations for male subjects (N=62)
were .28 with Reading, .34 with Spelling, .34 with Mathematics and .39 with
Language. For female subjects (N=62) correlations were .51 with Reading, .52 with
Spelling, .56 with Mathematics and .58 with Language. They concluded that the
correlation between the SPM test and CTBS was higher for females than males.
Carver (1990) studied the relationship between reading ability and the SPM test. He
found correlations between the National Reading Standards Test (NRST) and the
SPM test that ranged from .36 to .68. The sample consisted of 486 students from
grades 2 to 12, from a small-town, rural school system in the Mid-west USA. The
correlation was .45 with grade 2 (N=42), .36 with grade 3 (N=44), .42 with grade 4
(N=42), .68 with grade 5 (N=52), .51 with grade 6 (N=54), .39 with grade 7 (N=62),
.55 with grade 8 (N=42), .59 with grade 9 (N=53), .36 with grade 10 (N=50), .54 with
grade 11 (N=19) and .51 with grade 12 (N=26). The lowest correlation (.36) was found in
grades 3 and 10, whereas the highest (.68) was found in grade 5. The mean of the
five correlations for grades 2 to 6 was .48, and the mean of the six correlations for
grades 7 to 12 was .49. Carver found no evidence that the relationship
between reading ability and the SPM test increased with age. He also concluded that
general intelligence, as measured by the SPM test, had a strong and consistent
relationship with reading ability.
In two groups consisting of Libyan university students, Majdub (1991) found a
significant correlation between SPM and academic achievement. For the Arabic major
group, correlation between SPM and academic achievement was 0.39 (N=75). For the
Education major group, correlation between SPM and academic achievement was .34
(N=110).
Andrich and Styles (1994) believed that the progressive matrices test contained
material not taught directly in schools and yet showed a substantial relationship with
scholastic achievement. Johnson et al. (1994) correlated the SPM with the
Comprehensive Test of Basic Skills (CTBS) in a small sample (N=32) of second,
fifth and seventh grade students in San Diego city schools. The correlation between
the SPM and Language was .48, with Reading .42 and with Math .56.
Pind et al. (2003) examined the criterion-related validity of the SPM test in relation
to the results of the Icelandic National Examination (INE) for students in 4th, 7th, and 10th
grades. Generally the SPM sample average lay close to the INE average. In addition,
the correlation of the SPM scores with the INE scores was calculated. The correlation was
found to be variable. In fourth grade (N=53) correlation with Icelandic was 0.38
whereas 0.50 with Mathematics. These correlations were appreciably higher in the
seventh grade (N= 59), being, respectively, 0.64 and 0.75. The correlations were
slightly lower in the tenth grade (N=51), 0.53 with Icelandic and 0.64 with
Mathematics. Finally, the two foreign languages, English and Danish, showed
correlations of 0.48 and 0.59, respectively, with the SPM. This supported the view that
the SPM test shows a higher correlation with mathematics than with language
subjects. In general, these correlations are at the higher end of those found in similar
studies.
Laidra et al. (2007) administered the SPM test to 3618 students (1746 boys and 1872
girls) from all over Estonia in grades 2, 3, 4, 6, 8, 10, and 12 to investigate the
relationship of intelligence and personality with academic achievement (Grade
Point Average, GPA) in Estonian schools, from elementary to secondary level.
Pearson correlations were computed between SPM test scores and GPA.
The correlations were 0.54 for grade 2 (p = 0.001; N=364), 0.46 for grade 3 (p = 0.001;
N=388), 0.49 for grade 4 (p = 0.001; N=430), 0.53 for grade 6 (p = 0.001; N=609),
0.48 for grade 8 (p = 0.001; N=697), 0.43 for grade 10 (p = 0.001; N=642)
and 0.32 for grade 12 (p = 0.001; N=488). The analysis showed that the SPM mean
score increased with increasing age. It was concluded that there did not appear to be
large differences in the way intelligence and personality dispositions related to the
grades children acquire in Estonian schools at different educational levels. Although
some traits had more effect in elementary school (e.g., Agreeableness) and others
became relatively more relevant later (e.g., Conscientiousness), students’ achievement
relied most strongly on their cognitive abilities through all grade levels. Intelligence,
as measured by the SPM test, was found to be the best predictor of GPA in all grades.
To average the SPM test correlations with achievement tests (Vocabulary, Reading,
Language, Math, Work-Study and Spelling), Fisher’s z transformation was again employed.
The above studies on the predictive validity of the SPM test are shown in Table 4.5. A
detailed analysis of the outcomes is presented below the tables.
Table 4.5 Summary of the studies on SPM test predictive validity with r to z Fisher’s
transformation results
Researcher            Country  Year  N     Achievement               r     z
Tulkin & Newbrough    USA      1968  128   ITBS test; Vocabulary     0.30  0.31
                                           ITBS test; Reading        0.40  0.42
                                           ITBS test; Language       0.31  0.32
                                           ITBS test; Work-study     0.39  0.41
                                           ITBS test; Arithmetic     0.39  0.41
                                     75    ITBS test; Vocabulary     0.25  0.26
                                           ITBS test; Reading        0.26  0.27
Sinha                 India    1968  220   Academic Achievement      0.32  0.33
                                     240   Academic Achievement      0.36  0.38
Mclaurin & Farrar     USA      1973  220   Academic Achievement      0.21  0.21
Baraheni              Iran     1974  472   Academic Achievement      0.44  0.47
Sinha                 India    1977  46    Academic Achievement      0.46  0.50
Maqsud                Nigeria  1980  60    English language          0.19  0.19
                                           Arithmetic                0.38  0.40
                                     60    English language          0.65  0.78
                                           Arithmetic                0.49  0.54
Powers et al.         USA      1986  116   CAT test; Reading         0.34  0.35
                                           CAT test; Language        0.41  0.44
                                           CAT test; Math            0.39  0.41
                                     96    CAT test; Reading         0.36  0.38
Powers et al.         USA      1986  212   CAT test; Reading         0.35  0.37
Sidles & Avoy         USA      1987  62    CTBS test; Spelling       0.28  0.29
                                           CTBS test; Reading        0.34  0.35
                                           CTBS test; Math           0.34  0.35
                                           CTBS test; Language       0.39  0.41
                                     62    CTBS test; Spelling       0.51  0.56
                                     124   CTBS test; Spelling       0.38  0.40
Carver                USA      1990  42    NRST test; Reading        0.45  0.48
                                     44    NRST test; Reading        0.36  0.38
Majdub                Libya    1991  75    Academic Achievement      0.39  0.41
Johnson et al.        USA      1994  32    CTBS test; Reading        0.42  0.44
Pind et al.           Iceland  2003  53    INE scores; Math          0.50  0.54
                                     59    INE scores; Math          0.75  0.97
                                     51    INE scores; Math          0.64  0.67
                                           INE scores; Language      0.48  0.52
Laidra et al.         Estonia  2007  364   Academic Achievement      0.54  0.60
The mean correlations between the SPM test and academic achievement, together with
six achievement subtests, are given below.
Table 4.6 Average correlations between the SPM test and achievement tests
Sub-Tests             N      Mean z’  Mean r
Academic achievement  6148   0.44     0.41
Vocabulary            356    0.33     0.32
Reading               1364   0.46     0.43
Language              1535   0.41     0.39
Maths                 1298   0.54     0.49
Work-Study            356    0.39     0.37
Spelling              124    0.41     0.39
Total                 11181  0.43     0.41
The highest correlations of the SPM test were with mathematics. This was in
agreement with the findings of most earlier studies. Carpenter, Just & Shell (1990)
showed that the SPM is largely a mathematical problem solving test in design format.
It requires the application of five mathematical rules involving addition, subtraction,
and arithmetical and geometrical progression. Note, on the other hand, that the lowest
correlation was with the vocabulary tests. This is likely because
the SPM test is a non-verbal test.
4.8 Item analysis of the SPM test
Item analysis indicates which items may be too easy or too difficult and which may fail,
for other reasons, to discriminate clearly between the better
and the poorer examinees (Ebel 1972). Brown (1971) mentioned that item analysis
has two purposes. First, by identifying defective items, it enables us to improve our
test and evaluation procedures. Second, by indicating which items or material
students have and have not mastered, it allows us to plan, revise, and improve our
instruction.
It is worthwhile knowing that both the validity and reliability of any test depend
ultimately on the characteristics of its items. High reliability and validity can be built
into a test in advance through item analysis (Anastasi and Urbina 1997).
Item analysis was used to study two characteristics:
a) Item difficulty: the proportion of students who answered an item correctly.
b) Item discrimination power: tells whether a particular item differentiates
between students who have greater and lesser aptitude with the material tested
(Brown, 1981).
4.8.1 Item difficulty
In item difficulty, if most students answered an item correctly then the item was an
easy one. If most students answered an item incorrectly then it was a
difficult one (Brown, 1983). The higher the value of the difficulty index, the easier
the item. This definition is somewhat illogical and has led some researchers to refer
to the index as an index of facility, or easiness, rather than as an index of difficulty
(Ebel, 1972 and Nunnally, 1972). Nunnally (1972) and Burroughs (1975) argued that
item difficulty is required because it is almost always necessary to present items in
their order of difficulty. The easiest items are administered first so as to give a sense
of accomplishment and an optimistic start.
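The difficulty index described above can be illustrated with a short Python sketch. The function name item_difficulty and the small response matrix are hypothetical and serve only to show the calculation: the index is the proportion of examinees answering each item correctly, and items can then be ordered from easiest to hardest.

```python
def item_difficulty(responses):
    """Proportion of examinees answering each item correctly.

    responses: one row per examinee; each row holds 1 (correct) or
    0 (incorrect) for every item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_examinees
            for i in range(n_items)]

# Hypothetical data: 4 examinees, 3 items
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
p_values = item_difficulty(scores)
# Order item positions from easiest (highest p) to hardest (lowest p)
order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
print(p_values, order)
```

In this made-up example the first item is answered correctly by everyone (index 1.0) and would be presented first, while the third item (index 0.25) is the hardest.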
4.8.2 Item discrimination
Item discrimination shows whether the test items differentiate between people of
varying degrees of knowledge and ability. It may be defined as the percentage of the
“high” group passing the item minus the percentage of the “low” group passing the
same item (Brown, 1983).
Test-items can be classified as positively discriminating, negatively discriminating, or
non-discriminating. A positively discriminating item is one in which the percentage of
correct answers is higher in the upper group than in the lower group. A negatively
discriminating item is one in which the reverse occurs. A non-discriminating item is
one in which the percentage of correct answers is about the same for the upper and
lower groups (Blood and Budd, 1972).
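The upper and lower group definition of item discrimination given above can be sketched in Python as follows. The function names, the tolerance used to call an item non-discriminating, and the size of the upper and lower groups are assumptions made for illustration; the text does not state what fraction of examinees forms each group (27% is a commonly used convention). The index is returned as a proportion rather than a percentage.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Proportion passing in the upper group minus that in the lower group.

    item_scores: 1/0 (pass/fail) on one item, one entry per examinee.
    total_scores: total test score of the same examinees.
    fraction: share of examinees in each extreme group (an assumption).
    """
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, int(len(ranked) * fraction))   # examinees per group
    low, high = ranked[:k], ranked[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low

def classify(d, tolerance=0.05):
    """Label an item; the tolerance is an arbitrary illustrative threshold."""
    if d > tolerance:
        return "positively discriminating"
    if d < -tolerance:
        return "negatively discriminating"
    return "non-discriminating"

# Hypothetical data: 6 examinees, split into halves for simplicity
totals = [10, 20, 30, 40, 50, 60]
item = [0, 0, 1, 1, 1, 1]
d = discrimination_index(item, totals, fraction=0.5)
print(round(d, 2), classify(d))
```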
The point-biserial correlation coefficient is a measure of item
discrimination. The point-biserial correlation between “pass/fail” on each item and
the total test score was used to explore the SPM item discrimination (Brown, 1983;
Anastasi, 1988; Anastasi and Urbina, 1997; Roid and Barram, 2004; Kline, 2000; Kline,
2005). The greater the correlation of the item with the total score, the more
discriminating it is; that is, it discriminates between higher and lower groups more
effectively. For an item to be valid, its correlation with the total score should be fairly high.
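A point-biserial correlation is numerically the Pearson correlation between a 0/1 item score and the total score, so it can be sketched in a few lines of Python. The function name point_biserial and the data are hypothetical, and the sketch uses the uncorrected total score (the item itself is included in the total), which is an assumption about how the index was computed.

```python
import math

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a pass/fail (1/0) item and total scores.

    Undefined (division by zero) when every examinee passes or every
    examinee fails the item, or when all total scores are equal.
    """
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: the two highest scorers pass the item, the two lowest fail
r_pb = point_biserial([0, 0, 1, 1], [10, 20, 30, 40])
print(round(r_pb, 2))
```

An item passed mainly by high scorers, as in this made-up example, yields a high positive coefficient; an item passed equally often at all score levels yields a coefficient near zero.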
Ebel and Frisbie (1991, p.232) believed that the more items classified as highly or
moderately discriminating the better the test. Burroughs (1975) showed that an item
which does not discriminate between these groups, upper and lower, contributes
nothing to the establishment of an order of merit. It may be useful for warming-up
purposes though. An item which is easier for weaker students than it is for good
students would not only be a very curious item, but also one that detracts from the
test’s rank ordering properties.
4.9 Review of previous studies that employed SPM test
The present study makes use of the SPM test as a measure of non-verbal reasoning
ability, “g”. It is important, therefore, to examine a number of relevant studies that
used the SPM test in a variety of settings, including educational, vocational, clinical and
anthropological ones. A total of 54 studies were carried out in 26 countries, 11 developed and
15 developing, between 1948 and 2009. The developed country with the highest
number of SPM studies conducted was the United States, with 15 studies. Its
counterpart among the developing countries was India, with a total of 5 studies. The earliest
study was in the USA (1948) and the latest in Qatar (2009). For clarity and easy
reference, these studies are organised in Table 4.7. A thorough description of
each of the studies mentioned in the table is given below it. After the
description of the studies, a critical analysis and examination will be given.
Table 4.7 A sample of worldwide studies that utilised the SPM test
COUNTRY       YEARS                     REFERENCES
Congo         1994                      Nkaya et al.
Denmark       1968                      Vejleskov
Egypt         1987                      Abdel-Khalek
Estonia       2004                      Lynn et al.
France        1994                      Nkaya et al.
Hong Kong     1988                      Lynn et al.
Iceland       2003                      Pind et al.
India         1968; 1968; 1972;         Sinha, Mehrotra, Mohan, Rao and Sinha
              1974 and 1977
Iran          1974                      Baraheni
Israel        1991                      Kaniel & Fisherman
Italy         1962                      Young et al.
Kuwait        2006                      Abdel-Khalek and Lynn
Libya         1983; 1991; 2005 and      Aboujaafer, Majdub, Attashan, and
              2005                      Abdalla and Ahlam
Mexico        2004                      Lynn et al.
Nigeria       1980                      Maqsud
Oman          2009                      Abdel-Khalek and Lynn
Qatar         1986; 2009                Bart et al.; Khaleefa & Lynn
Pakistan      2006                      Ahmad et al.
Slovenia      2007                      Boben
South Africa  2000; 2002; 2007          Rushton and Skuy, Rushton et al., Taylor
Sudan         2008.b                    Khaleefa et al.
Syria         2008.a                    Khaleefa & Lynn
Tanzania      1967                      Klingelhofer
Turkey        1993                      Duzen et al.
UK            1962; 1962; 1963;         Foulds & Dixon, Foulds et al., King, Lynn et
              1988; 1989 and 1994       al., Egan and van den Broek and Bradshaw
USA           1948; 1966; 1968;         Rimoldi, Bingham et al., Tulkin &
              1969; 1972; 1973;         Newbrough, Burke & Bingham, Burke,
              1986.a.b; 1987; 1988;     Mclaurin & Farrar, Powers et al., Sidles &
              1988; 1986; 1986;         Avoy, Jensen et al., Karnes & Whorton, Bart
              1994 and 1994             et al., Whorton & Karnes, Johnson et al., and
                                        Blennerhassett et al.
The review of these studies examines the effects of the following
independent variables on the SPM test results: age, gender, variability, study levels,
region (cities and villages) and academic discipline (sciences and arts). It also
compares the reported results with those obtained in our study.
Since each study may investigate more than one variable, it was quite difficult to
group the studies under a single variable. Instead, the studies outlined in Table 4.7
will be discussed in two categories: those conducted in developing countries and those
conducted in developed countries. Whether a country is ranked among developed or developing
countries is based on the Human Development Index (HDI). This is an index
combining normalized measures of life expectancy, literacy, educational attainment
and GDP per capita. The HDI is claimed to be a standard means of measuring human
development, a concept that, according to the United Nations Development Program
(UNDP), refers to the process of widening the options of people, giving them greater
opportunities for education, health care, income, employment, etc. The basic use of
the HDI is to rank countries by level of "human development". The index was developed
in 1990 by the Pakistani economist Mahbub ul Haq and Sir Richard Jolly with help
from Gustav Ranis of Yale University and Lord Meghnad Desai of the London School
of Economics. It has been used since then by the UNDP in its annual Human
Development Report. Nowadays the HDI is a pathway for researchers into the wide
variety of more detailed measures contained in the Human Development Reports.
The HDI combines three basic dimensions:
• Life expectancy at birth, as an index of population health and
longevity.
• Knowledge and education, as measured by the adult literacy rate (with
two-thirds weighting) and the combined primary, secondary, and
tertiary gross enrollment ratio (with one-third weighting).
• Standard of living, as measured by the natural logarithm of gross
domestic product (GDP) per capita at purchasing power parity (PPP) in
United States dollars (UNDP Human Development Annual Report
2007/2008).
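How the three dimensions listed above combine into a single index can be sketched in Python. This is an illustrative sketch only: the function name hdi is hypothetical, and the minimum and maximum "goalpost" values (25 and 85 years for life expectancy, 100 and 40,000 PPP US dollars for GDP per capita) and the equal weighting of the three dimension indices are assumptions, since they are not stated in the text.

```python
import math

def hdi(life_expectancy, adult_literacy, gross_enrolment, gdp_per_capita):
    """Combine three dimension indices into one index (illustrative sketch).

    adult_literacy and gross_enrolment are percentages (0-100);
    gdp_per_capita is in PPP US dollars. Goalposts are assumed values.
    """
    # Health: life expectancy scaled between assumed goalposts of 25 and 85
    life_index = (life_expectancy - 25.0) / (85.0 - 25.0)
    # Education: two-thirds adult literacy, one-third gross enrolment
    education_index = (2 / 3) * (adult_literacy / 100.0) \
        + (1 / 3) * (gross_enrolment / 100.0)
    # Standard of living: natural log of GDP per capita, assumed goalposts
    gdp_index = (math.log(gdp_per_capita) - math.log(100.0)) \
        / (math.log(40000.0) - math.log(100.0))
    # Assumed equal weighting of the three dimensions
    return (life_index + education_index + gdp_index) / 3.0

# Hypothetical country values, for illustration only
print(round(hdi(70.0, 85.0, 90.0, 8000.0), 3))
```

Because GDP enters through its logarithm, each additional dollar of income raises the index less at high income levels than at low ones.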
The studies conducted in developed countries will be discussed first, followed by a
detailed examination and evaluation. After that, studies conducted in developing
countries are evaluated, and comments and analysis are again given at the end.
4.9.1 Studies on SPM test in developed countries:
Rimoldi (1948) administered the SPM test to USA children aged 9 to 15 years. The
mean time for attempting the test for a population of 1680 subjects was 38 minutes
with a SD of 11.90. The mean scores by age were: 9 (M = 19.32, SD = 9.18); 10 (M = 24.2,
SD = 11.60); 11 (M = 28.82, SD = 10.49); 12 (M = 33.45, SD = 9.98); 13 (M = 35.90, SD =
9.59); 14 (M = 35.61, SD = 9.65); 15 (M = 38.59, SD = 9.57). These results illustrated
that SPM mean scores increased with age. There was a drop in the mean number of
problems solved from Set A through Set C, no significant difference between
the means for Sets C and D, and a final drop in Set E. In addition, the analysis
showed one factor common to all of the sets of the SPM test.
Two earlier studies carried out in the UK by Foulds and Dixon (1962) and Foulds et
al. (1962) with adult psychiatric patients concluded that males were significantly
superior to females in SPM test results. Another early study was that of Young et al.
(1962) in Italy, who applied the SPM test to a random sample of elementary school
children in two regions. The children’s ages ranged from 9 years and 6 months to 14
years and 6 months. Results showed that boys obtained higher scores than girls in the
city (mean percentiles: boys 59.06, girls 49.39), while in rural areas girls scored
higher than boys (mean percentiles: boys 42.03, girls 49.71).
King (1963), in another study in the UK, found significant sex differences in
favour of girls in the SPM test. The boys’ mean age was 10.6 years and their mean SPM
score was 35.5; SD = 11.5. The girls’ mean age was 11.2 years and their mean SPM
score was 38.5; SD = 12.0. In the total sample the SPM mean score was 37.1; SD =
11.9. Bingham et al. (1966) studied a small sample of patients (N=39) referred to
Vocational Counselling and Psychological Service in the USA. The subjects ranged
in age from 20 to 52 (mean age 36.1 years, SD = 7.7). The SPM mean score was
40.6, SD = 11.80.
Tulkin and Newbrough (1968) administered the SPM test to 356 fifth and sixth grade
students from the suburban Maryland school system in the USA, to determine the
effect of past experiences related to race, social class, and gender on performance in
the SPM test. They found the following SPM test means for the eight groups: high class
white females (N=64), 41.1, SD = 8.18; high class white males
(N=64), 42.2, SD = 5.81; low class white females (N=32), 30.6, SD
= 10.48; low class white males (N=43), 30.7, SD = 9.94; high class black
females (N=23), 39.7, SD = 6.71; high class black males (N=27), 39.0, SD
= 8.43; low class black females (N=53), 26.3, SD = 10.98; and low class black
males (N=50), 25.1, SD = 11.79.
They concluded that: (a) gender differences were not significant, (b) higher social
class and white subjects showed significantly higher SPM test scores and (c)
significant differences between races on the SPM test were found only in the lower
class students. The black low class scored significantly below the white low class.
Vejleskov (1968) in Denmark, with 628 fifth grade children from two cities, found that
boys (N = 174) and girls (N = 192) in Gentofte city had the same score (39.9) on the
SPM, while Esbjerg city girls (N = 137) scored slightly better than boys (N = 125).
The boys’ mean score was 37.4 whereas the girls’ mean score was 38.2. Vejleskov also
noticed that boys, in general, worked faster than girls on the SPM test. The SPM mean
for the total sample in Esbjerg city was 37.8 (N = 262).
Burke and Bingham (1969) in the USA reported an SPM mean score of 41.2, SD =
11.5, for a sample of 91 male patients referred for vocational counselling (mean age =
35.1 years).
Another study by Burke (1972) investigated 567 SPM answer sheets of veterans
(black and white) who had taken the SPM test when referred for vocational
counselling. The veterans’ mean age was 35.5, SD = 9.1 months (age range 16 to 64
years). The SPM mean score was 40.0, SD = 12.0.
Mclaurin and Farrar (1973), in their study on 96 male and 105 female university
students in America, concluded that the SPM did not have sufficient ceiling for
university students, as indicated by the closeness of the SPM mean score (50.39, SD =
6.50) to the maximum score possible. Vincent and Cox (1974) studied a sample of 380
psychiatric patients taken from the psychological files of the Texas Vocational
Rehabilitation Unit. Most of the sample had either a physical, emotional, or mental
disability. The sample mean age was 28.7 years and the sample consisted of 57 % white,
36 % black and 7 % Latin Americans. The SPM mean score for the total sample was 39.25,
SD = 12.00. They concluded that the SPM test is a viable tool for measuring
intelligence in such a population.
Bart et al. (1986) compared the performance of 273 Qatari students (151 boys with a
mean age of 12.97 years and 122 girls with a mean age of 12.63 years) on the SPM
test to that of 281 American students (150 boys with a mean age of 12.37 years and
131 girls with a mean age of 12.70 years) in the fifth, sixth and seventh grades.
American students scored higher (M=43.39) than the Qatari students (M=30.24). The
authors added that male students performed better than females and that older students
tended to perform better than younger students. They did not report any data
regarding the performance of students in both countries according to age, gender, or grade
level.
Powers et al., (1986.a) carried out a study in the USA on 127 Hispanic (69 boys and
58 girls) and 103 Anglo- American (53 boys and 50 girls). Mean age of students was
11.6 year. Students were enrolled in grade 6 of four elementary schools of a large
urban school district in the Southwest of the USA. Hispanic and Anglo-American
students were compared for their overall scores on the SPM test. When the total mean
score of Hispanic students (M=38.43, SD = 7.45) was compared to that of the Anglo-
American students (M = 39.19, SD = 7.30), no significant differences were found.
Powers et al., concluded that these results support the continued use of the SPM test
with Hispanic and Anglo-American students.
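
A comparison of two group means such as the one above can be checked from the published summary statistics alone. The following is a minimal sketch, not the procedure used by Powers et al.: it assumes a Welch (unequal variance) t statistic and, because the samples are large, a normal approximation for the p-value. The function names are illustrative.

```python
import math
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation (an assumption; adequate for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Means, SDs and sample sizes as reported for the Anglo-American and Hispanic groups
t, df = welch_t(39.19, 7.30, 103, 38.43, 7.45, 127)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.2f}")
```

Under these assumptions the t statistic falls well below the conventional critical value of about 1.96, which agrees with the reported absence of a significant difference.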
In another study by Powers et al., (1986.b) in the USA to examine gender differences
in performance on the SPM test, they administered the SPM test to 212 sixth grade
students (116 boys and 96 girls) and 214 seventh grade students (109 boys and 105
girls). The ethnic background of the students consisted of Native American, Black,
Hispanic, and non-Hispanic Caucasian. The students were from four schools that
ranged in socio-economic status from lower middle to upper middle SES in an urban
school district in the Southwest of the USA. Sex differences in performance on the SPM
test were examined at each grade level. The sixth grade boys' mean of 38.81, SD = 6.84, did
not differ significantly from the girls' mean of 39.26, SD = 7.35. Seventh grade boys' mean
score of 39.48, SD = 8.06 and girls' mean of 38.88, SD = 8.21 also did not differ
significantly.
Sidles and Avoy (1987) administered the SPM test to 124 Navajo students (62 boys
and 62 girls, aged 14 and 15 years) in the seventh and eighth grades in Arizona and New
Mexico. They reported that the raw score mean was 39.85 for females and 39.88 for
males. The mean score for seventh grade students was 38.83, while the mean for
eighth grade students was 40.11. The mean SPM score for all students was 39.86.
They noted that this mean was lower than that obtained for United Kingdom
students of a similar age group during the 1981 standardisation of the SPM test. They
concluded that the SPM test had potential for being included by school psychologists
in their psycho-educational test battery as a measure of intellectual ability of
adolescent Navajo students evaluated for special education or gifted programs.
Lynn et al., (1988) carried out a study in the UK and Hong Kong with 120 boys and
77 girls from Hong Kong and 75 boys and 95 girls from the UK. The students' mean
age was 10.5 years, and the British students were Caucasian. They found that the
Hong Kong boys and girls both obtained significantly higher means on the SPM than their
British counterparts. The Hong Kong boys' SPM mean percentile was 71.48, SD =
20.00, and the Hong Kong girls' SPM mean percentile was 68.44, SD = 21.34. The higher
mean obtained by Hong Kong boys as compared with Hong Kong girls was not
significant. British boys and girls in this study obtained identical means, equivalent to
a percentile of 51.72, with SD = 28.84 for boys and 28.62 for girls.
In the USA, the SPM test was administered by Jensen et al., (1988) with a time limit of
40 minutes to a total of 261 undergraduate students. The overall SPM mean was
51.32, SD = 4.69.
With 307 students in grades 3 through to 8 in a rural county school system in
Mississippi, US, Whorton and Karnes (1988) found that the SPM mean for the total
sample was 32.2, SD = 11.2. The sample consisted of 70 black and 237 white
students; 142 were girls and 165 boys. The mean age was 10.8 years with a range
from 8.3 to 15.7 years. For black students the SPM mean score was 25.4, SD = 9.9
(N=70). The SPM mean score for white students was 34.3, SD = 10.7 (N=237). The
mean difference between students on the basis of race was significant. In another
study, also in Mississippi, by the same researchers (1988) the SPM was administered to
625 students in a rural county elementary school (grades 3 to 8). 441 white students
and 211 black students with a mean age of 8.10 years took the test. Of these, 410
students were on free or reduced lunches, and 245 students on paid lunches. The SPM
mean for students on free lunch was 29.7, SD = 10.9, whereas that for students on paid
lunch was 35.6, SD = 10.8.
Egan (1989) in the UK administered the SPM with a 30-minute time limit to a sample
of 94 trainees (43 male and 51 female) with a mean age of 16.7 years, SD = 9.7 months,
who had been unemployed for 6 months after leaving school. The
SPM mean for the total sample was 36.5, SD = 9.9; the SPM mean for males was
38.4, SD = 9.8 and for females was 34.6, SD = 9.8. The gender difference was not
significant.
The second investigation about the SPM in Libya was carried out by Majdub (1991)
who administered the SPM to two groups that consisted of 193 students (68 males and
125 females) from Tripoli University. He found that the Education major group had
significantly higher means than the Arabic major group. For the Arabic major group
the SPM mean was 34.40, SD = 9.13 (N=81). For the Education major group the SPM
mean was 39.14, SD = 9.08 (N=112). Majdub concluded that differences between the
two groups with respect to SPM, in favour of the Education group, may be due to the
familiarity of the education group with solving abstract problems.
Nkaya et al., (1994) claimed that comparisons of intelligence test scores of individuals
from developed countries to individuals from developing countries have always
shown high disparities in favour of western subjects regardless of the type of the test.
For example, they administered the SPM test three times to students in France and
Congo, to obtain the classic improvement in scores at retest. Participants were 88
Congolese (45 boys and 43 girls with a mean age of 13.3 years) and 68 French (36
boys and 32 girls with a mean age of 12.3 years) who were in the sixth year of
schooling. Neither the French nor the Congolese students had ever been administered
an intelligence test. The test situation, however, was much more familiar to French
students due to exposure to material and educational games similar to materials used
in intelligence tests, which was not the case in Congo.
The SPM test was administered to the same standards three times (T1, T2 and T3) at
two-week intervals. The test was self-paced but students were encouraged to work
rapidly. Time and items solved correctly after 20 minutes were recorded. Under the
self-paced condition, the SPM mean score for French students was 46.9,
SD = 5.9 in test 1; 49.4, SD = 4.9 in test 2; and 49.1, SD = 4.6 in test 3. For the Congolese
students the SPM mean was 29.6, SD = 11.6 in test 1; 33.0, SD = 11.9 in test 2; and
32.5, SD = 12.0 in test 3. Under the timed condition, the SPM mean
for French students was 40.4, SD = 5.2 in test 1; 48.0, SD = 5.2 in test 2; and
48.5, SD = 5.0 in test 3. For Congolese students the timed SPM mean was 23.5, SD =
9.3 in test 1; 29.5, SD = 11.1 in test 2; and 32.0, SD = 12.1 in test 3.
They concluded that students' scores increased more rapidly from test 1 to test 2 than
from test 2 to test 3, especially when the test was timed (a 7.6-point increase for French
and a 6-point increase for Congolese students). There was no improvement in the French
self-paced mean between test 2 and test 3 (-0.3 points) and a 3.4-point increase for
Congolese students. There was little improvement (0.5 points) in the timed mean
for French students between test 2 and test 3, and for the Congolese there was an
increase of 3.4 points. From test 1 to test 3 under the timed condition there was an 8.1-point
increase for French and an 8.5-point increase for Congolese students. In general the
performance on SPM test was higher for French students than for Congolese students
for both self-paced and timed testing.
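
The retest gains discussed above follow directly from the session means. As a minimal sketch for checking them, the snippet below recomputes the differences between sessions from the means reported for Nkaya et al. (1994); the dictionary layout and the helper name `retest_gains` are illustrative assumptions, not part of the original study.

```python
# Mean SPM scores per session (T1, T2, T3) as reported above
means = {
    ("French", "self-paced"): (46.9, 49.4, 49.1),
    ("French", "timed"): (40.4, 48.0, 48.5),
    ("Congolese", "self-paced"): (29.6, 33.0, 32.5),
    ("Congolese", "timed"): (23.5, 29.5, 32.0),
}


def retest_gains(t1, t2, t3):
    """Differences between session means, rounded to one decimal place."""
    return {
        "T1->T2": round(t2 - t1, 1),
        "T2->T3": round(t3 - t2, 1),
        "T1->T3": round(t3 - t1, 1),
    }


gains = {key: retest_gains(*scores) for key, scores in means.items()}
for (group, condition), g in gains.items():
    print(group, condition, g)
```

Recomputed this way, the timed gains from test 1 to test 2 (7.6 and 6.0 points) and from test 1 to test 3 (8.1 and 8.5 points) match the figures quoted. The 3.4-point figure corresponds to the Congolese self-paced gain from test 1 to test 2, whereas the reported means imply Congolese changes of -0.5 (self-paced) and 2.5 (timed) points between test 2 and test 3, so either the means or the quoted gains may contain a transcription error.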
In a study by Johnson et al., (1994), a sample of 449 second, fifth and seventh grade
students in San Diego city schools were given the SPM test. In this group, 77 students
were African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American.
Of these 215 were boys and 234 were girls. The mean age of the children was 11
years (age range from 6 years, 8 months to 13 years 10 months). The SPM mean
score was 36.10, SD = 11.52.
In the UK, van den Broek and Bradshaw (1994) administered the SPM to normal and
patient samples. The normal sample comprised 77 subjects (58 females and 19 males), all of
whom were native English speakers with no history of psychiatric or
neurological disorder. The patient sample comprised 75 native English speakers (42 males
and 33 females) and was allocated to one of three groups:
left-hemisphere (N=24), right-hemisphere (N=34) or bilateral lesions (N=17). The mean
age was 35.2 years, SD = 12.8 months, for the normal sample; 48.3 years, SD = 16.7 months,
for the left-hemisphere group; 48.8 years, SD = 17.1 months, for the right-hemisphere group;
and 60.4 years, SD = 12.4 months, for the bilateral-lesion group. The SPM mean score was
47.3, SD = 8.2, for the normal sample; 21.2, SD = 11.2, for the bilateral group; 33.8,
SD = 12.6, for the left-hemisphere group; and 30.0, SD = 14.5, for the right-hemisphere group.
Regarding the use of the SPM with deaf subjects, in a survey by Levine (1974) the
Raven’s Matrices test ranked in the top ten for frequency of use with deaf subjects.
Armfield (1985) administered the SPM to 240 deaf/mute students from South China
and concluded that the SPM appeared to be helpful as a tool for teachers making
individual educational plans for students.
A study by Blennerhassett et al., (1994) with 102 deaf residential adolescents showed an
SPM test mean of 33.98, SD = 10.80. The mean age was 14.7 years with a range
from 10 to 19 years. They concluded that the SPM test appeared to be suitable for
assessing non-verbal intelligence of children with hearing impairments, and was
especially useful when a quick screening technique was needed for deaf adolescents.
Pind et al., (2003) administered the SPM test to Icelandic school children aged 6 to 16
years. A total of 665 children were tested and the standardization sample consisted of
550 of the 665 children. The median total score rose from 23 in the first grade to 50 in
the tenth grade. Scores increased regularly with increasing age. Icelandic norms were
2 to 3 points higher than UK norms. The performance of girls and boys on the SPM was
compared. The average score of girls in the standardisation sample was 40.1, with boys
receiving on average a score of 39.4. A two-way analysis of variance (gender × grade)
showed a significant effect of grade, F (9,530) = 66.95, P<0.0001. The effect of
gender was not significant, F (1,530) = 0.61, P=0.434, nor was the interaction of gender
and grade, F (9,530) = 0.65, P= 0.759. The effect of geographical district was also not
significant, F (7,542) = 0.89, P=0.516. It was concluded that grade, or age, was the
only factor in this study which had a significant effect on the children’s SPM score.
Lynn et al., (2004) administered the SPM test to an Estonian sample to investigate
sex differences. A total of 2738 adolescents (1250 male and 1439 female) attending the 6th, 8th, 10th,
11th and 12th grades took the test. Overall, females obtained a higher mean than
males. Females obtained higher means (by 3.8 IQ points) than males at ages 12
to 15 years, whereas males obtained higher means (by 1.6 IQ points) than females at
ages 16 to 18 years. Overall, males had statistically significantly larger
variance than females. Irwing and Lynn (2005) also established sex differences on the
PM among university students: men obtained significantly higher scores than women.
In 2007, Duzen et al., began the process of standardizing the SPM test in Turkey
with the aim of identifying gifted children. A total of 2458 students were tested (1170 girls,
1288 boys; aged between 6½ and 14½ years); 1341 students were from rural origins
while the remaining 1117 were from urban origins. The results showed that students from
urban origins obtained significantly higher scores than students from rural origins;
they also showed that grade predicts SPM scores more accurately than age.
In 2007, Boben administered the SPM test to 1,556 children and adolescents aged 7.5 to
18 years in Slovenia; 53% were male. Nine items were shown to be misplaced in
difficulty (A6, A9, A10, B9, B10, B11, C5, C7, C9). Both Cronbach's alpha and split-half
tests showed a reliability of 0.95. This study showed that subgroups differed in
statistically significant ways in relation to sex (F = 13.13, p = 0.00) and age group
(one-year intervals) from 8 to 18 years (F = 76.48, p = 0.00), but not in the interaction
between them (F = 0.65, p = 0.77). A more detailed analysis showed that sex
differences occurred only in older age groups. T-tests revealed statistically significant
differences for the age groups of 16-year-olds (p = 0.02), 17-year-olds (p = 0.01) and
18-year-olds (p = 0.04). Nevertheless, statistically significant differences regarding sex
were not confirmed.
Some important features of these studies should be noted. First, most of
the studies selected their samples randomly and with adequate sizes. A few studies
did not mention their selection procedures, such as Mclaurin and Farrar 1973 in
the USA; Vincent and Cox 1974 in the USA; and van den Broek and Bradshaw 1994 in the UK. In some
studies, neither the sample size nor the selection criteria were reported. Examples of
such studies are: Young et al., 1962 in Italy; King 1963 in the UK; and Mclaurin and
Farrar 1973 in the USA. Since the larger the sample size the more representative it is of
the behaviour domain, a total of 5 studies have taken advantage of this fact. These
include: Lynn et al., 2004 in Estonia; Duzen et al., 2007 in Turkey; and Boben 2007
in Slovenia. Moreover, they applied advanced statistical procedures such as
factor analysis, two-way analysis of variance and stepwise multiple regression
analysis.
So far the analysis has been concerned with sample selection and size. Attention
now turns to the characteristics of the samples themselves. Along with
healthy people, a number of SPM studies were conducted on patients with physical and
psychological disabilities, including patients with hearing impairments and mental
disorders. These types of studies were not included in the meta-analysis chapter
(chapter 6). Other studies took into account variables such as the economic
status of the subjects. The criteria by which lower and upper classes were
distinguished were not, however, among those adopted in the field of economics. As an
example, the study conducted in the USA by Whorton and Karnes (1988)
classified a sample of students into two categories: those on paid lunch, representing the
upper class; and those on free lunch, representing the lower class.
As a final remark, only 2 out of 10 studies performed their SPM test on rural and
urban residents. These were carried out by Duzen et al., 2007 in Turkey and Young
1962 in Italy. This element, the difference between urban and rural lives, had a noticeable
effect on SPM test scores. Ignoring it will render a sample unrepresentative.
4.9.2 Studies on SPM test in developing countries:
Klingelhofer (1967) administered the SPM test with a time limit of 30 minutes to
African and Asian secondary school students in Tanzania. The African sample
consisted of 2963 students (2125 males and 838 females) and the Asian sample
consisted of 729 students (415 males and 314 females). The mean ages for the four
groups were: African boys 17.1 years, African girls 16.1 years, Asian boys 14.8 years
and Asian girls 14.3 years. The SPM test mean scores were 34.3 for African boys,
34.1 for African girls, 43.9 for Asian boys and 41.7 for Asian girls. There was no
statistically significant difference in mean scores between African boys and girls, and
no statistically significant difference was found between the African tribes in
performance on the SPM test. There was a significant mean difference between Asian
and African students in favour of Asian students; Asian boys also scored better than
Asian girls. Klingelhofer claimed that the significantly better performance of Asians
than Africans on the SPM was probably associated with a number of cultural factors
that differentiate the two groups, e.g. Asian children start school early, have literate
parents and live in towns where they have daily contact with stimuli of modern life,
whereas African children come from rural environments and low-income families.
Sinha (1968) reported the following means for both sexes from rural and urban
populations in India. For rural boys the SPM mean scores were 22.50 at age 12
years; 26.50 at 13 years and 27.10 at 14 years. For urban boys the SPM mean scores
were 24.00 at 12 years; 27.40 at 13 years and 29.10 at 14 years. For rural girls the
SPM mean scores were 26.83 at 13 years and 30.00 at 14 years (no data for age 12).
For urban girls the SPM mean scores were 25.50 at age 12 years; 28.90 at age 13 years
and 30.10 at age 14 years. Sinha concluded that urban children scored higher than
rural children, and girls scored higher than boys in both rural and urban areas. In the
same study Sinha reported that the SPM mean score for Art-Science students was
47.84, SD = 4.46 (N=220) while the SPM mean score for Engineering students was
54.03, SD = 3.61 (N=204). Both samples were from Tirupati, India.
From India, Mohan (1972) administered the SPM test to 310 university and college
students (165 females and 145 males) with an age range of 18 to 25 years. Mohan
reported the following means: for males the mean score was 46.48, SD = 7.32; the mean
score for females was 43.88, SD = 7.70. Mohan found that the mean score of 45 on the
SPM test corresponds to the 50th percentile as given by Raven for the age range 14 to 25.
There was also a significant difference in SPM test scores favouring male students.
In another study from India, Rao (1974) administered a shortened version of the SPM
test (45 items instead of 60, rearranged in graded order of difficulty) to different
college students with a mean age of 18.10 years. Rao found the following means:
Engineering students (N=452), 54.14, SD = 3.9; Agricultural
students (N=207), 46.42, SD = 6.55; Science students (N=769), 45.18,
SD = 7.82; Education students (N=219), 42.84, SD = 8.51; Art students (N=487),
41.28, SD = 8.30; and Commerce students (N=122), 39.76, SD =
8.19. Rao also compared the SPM test means of high and low academic achievers
and found that the mean of high achievers (N=106) was 53.26, SD = 3.04, while the
mean of low achievers (N=106) was 51.37, SD = 3.87. At the same time the mean
score of high achievers on the achievement test was 18.32, SD = 3.2, while the mean
score of low achievers on the achievement test was 2.48, SD = 1.3. Comparing
the SPM and achievement tests, Rao concluded that the SPM test scores failed to
discriminate between the high and low academic achievers. Nevertheless he claimed
that the Standard Progressive Matrices test was as good as any other test of
intelligence in predicting scholastic performance.
Baraheni (1974) carried out a study in Iran. The study was designed to cover a
representative sample of students (N=4561) from age 9 to 18 years, attending primary
and secondary schools in Tehran. Baraheni found that Iranian boys scored higher on
the SPM test than Iranian girls. The differences were statistically significant from age
9 up to 13 years. He mentioned that the slight superiority of boys over girls on the
SPM test might reflect the fact that the Progressive Matrices measure, in addition to a
general factor, a spatial dimension in which boys have been found to outperform girls. He
also added that although a steady increase in SPM test scores was observed at
successive age levels, both in males and females, the magnitude of differences at
some age levels was very small, especially after 15 years of age. Baraheni claimed
that this steady increase in average performance which was significant up to age 15
was in accordance with data reported by Raven. The SPM mean for age 17 years was
37.93, SD = 11.41 (N=256). The SPM mean for age 18 years was 39.36, SD =
10.34 (N=304). Baraheni concluded that on the basis of his data, the SPM test
was an efficient test of general intelligence for use with Iranian children.
Sinha (1977), also from India, administered the SPM test to an Indian sample which
consisted of 100 boys and 100 girls aged 11 to 15 years. Sinha reported the following
total means for the performance of students on the SPM test according to age: for age 11
years mean was 27.25, SD = 9.30; for age 12 years was 27.25, SD = 8.90; for age 13
was 30.30, SD = 10.50; for age 14 years was 33.00, SD = 9.40; and for age 15 years
mean was 32.25, SD = 11.20. Sinha concluded that with increase in age, there were
some increases in SPM test means for Indian students from age 11 to 14 years. Also
the means of the Indian students were very low compared with Raven's British norm
for children at the same age. In the same study, Sinha found that science students
scored higher than art students on the SPM test in the Indian sample. In addition, he
reported that Shanthamani in 1970 found similar results on Alexander’s Battery
intelligence test.
Maqsud (1980) in Nigeria administered the SPM test to 120 primary school students
with an average age of 12.2 years for the students in a modern school and 12.6 years
for the students in a traditional school. Sixty students were randomly drawn from a
modern school (upper-middle class homes), and 60 from a traditional school (lower-
middle and lower class families). The mean score of the SPM test for students from
the traditional school was 23.25, SD = 3.49 while the mean score for students from
the modern school was 20.85, SD = 4.27. The mean SPM score for students
from the traditional school was found to be significantly higher than that for students from the
modern school.
The first investigation of the SPM in Libya was that of Aboujaafer (1983) who
studied pupils’ achievement in preparatory schools in Tripoli. The SPM test was
administered to a sample of 201 boys and girls who were in grade 8. The mean age
was 14 years. The boys' SPM mean was 35.40, SD = 10.40 (N=100). The girls' SPM
mean was 33.50, SD = 10.80 (N=101). The SPM mean for the total sample was
34.50, SD = 10.60 (N=201). The difference between the boys' and girls' means was not
significant.
Abdel-khalek (1987) in Egypt administered the SPM test to 452 university
undergraduates, 205 males with a mean age of 24 years and 247 females with a mean
age of 23 years in the departments of Psychology, Anthropology, Geography, Arabic
Language and English Literature. The mean score for males was 44.2, SD = 7.8, while
that for females was 40.8, SD = 8.4. Abdel-khalek claimed that the gender differences which
emerged in the study may be related to social factors in an eastern society, but did not
specify these factors. He stated that, in brief, the SPM test may provide a
promising tool for measurement of non-verbal intelligence in an Egyptian context.
Kanil and Fisherman (1991) compared the performance of 250 Ethiopian Jews (115
boys and 135 girls, with an average age of 14.7 years) on the SPM test to that of 1740
Israeli Jews aged 9 to 15 years. The mean for Ethiopian Jews aged 15 and 16 years
was 27.0, whereas the mean for Israeli children aged 9 and 10 years was 28.0, and the mean
for Israelis aged 14 and 15 years was 45.0. They concluded that the SPM test mean for
the Ethiopian Jews aged 15 and 16 years was very similar to the mean of Israelis aged
9 and 10 years. They added that when the two culture groups were roughly matched
for total score in the SPM test (mean score obtained by 9 year old Israelis and 14 year
old Ethiopians); they exhibited the same pattern of distribution of errors in the SPM
test. They claimed that these results suggested that the performance of Ethiopian
Jews reflected a developmental delay, and not a different cognitive style. They added
that the SPM test scores merely told us how Ethiopian Jews compared to the Israeli
children at this point in time, but they did not tell us about their response to new
learning situations.
Rushton and Skuy (2000) administered the SPM test to 309 students (17 to 23 years) in
South Africa (173 Africans, 136 Whites; 104 men, 205 women). The study aimed to
compare performance between African and White students. Analysis of variance
(ANOVA) with race and sex as factors showed significant main effects and a
marginally significant interaction, F (1,305) = 131.85, p < 0.001; F (1,305) = 8.89, p <
0.01; and F (1,305) = 3.67, p < 0.10. Men averaged higher scores than women (M =
50.47; SD = 7.9). Against the 1993 US norms for 18- to 22-year-olds, White men,
with 54 out of 60 correct responses, averaged at the 61st percentile; White
women, with 53 correct responses, at the 55th percentile; African
men, with 46 correct responses, at the 19th percentile; and African
women, with 42 correct responses, at the 11th percentile. These SPM scores
and percentile points were converted to IQ equivalents of 105 for Whites and 84 for
Africans. Males also averaged slightly higher than females. In addition, item analysis
(difficulty and discrimination) was carried out. Percentages were used to calculate
item difficulties for Whites and Africans across the 60 items. For all groups, set E
was the most difficult, followed by set C and then D. Sets A and B were the easiest.
Using a proportion of 70 percent of respondents passing as the criterion for judging an
item as “too easy,” 54 of the 60 items (90%) proved too easy for Whites and
41 of the 60 items (68%) too easy for Africans. Overall, Africans found the items
more difficult than did the Whites, as did women compared to men. For the calculation of
item discrimination, the “item-total correlation” (point biserial) was used. According
to Hopkins' (1998) Index of Discrimination and Items Evaluation, the number of items
considered to have excellent discriminating value was 41 for
Africans and 13 for Whites; good discriminating value, 10 items for Africans and 7
for Whites; and fair discriminating value, 6 items for Africans and 18 for Whites.
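
The two item statistics described above, item difficulty (proportion passing) and item discrimination (point-biserial item-total correlation), can be illustrated with a minimal sketch. The response matrix below is hypothetical (6 respondents, 4 items) and is not data from Rushton and Skuy; the function names are illustrative, and the 70 percent "too easy" threshold is the one quoted in the text.

```python
import math


def item_difficulty(responses, item):
    """Proportion of respondents passing the item (a higher value means an easier item)."""
    return sum(row[item] for row in responses) / len(responses)


def point_biserial(responses, item):
    """Pearson correlation between a 0/1 item score and the total test score."""
    x = [row[item] for row in responses]
    y = [sum(row) for row in responses]
    n = len(responses)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical responses: one row per respondent, 1 = correct, 0 = incorrect
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

n_items = len(responses[0])
# Items passed by at least 70 percent of respondents are flagged as "too easy"
too_easy = [i for i in range(n_items) if item_difficulty(responses, i) >= 0.70]
for i in range(n_items):
    print(i, round(item_difficulty(responses, i), 2), round(point_biserial(responses, i), 2))
print("too easy:", too_easy)
```

In this sketch the item is included in the total score when the correlation is computed; a corrected item-total correlation, which excludes the item from the total, is a common alternative.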
In 2002, Rushton et al., administered the SPM test to 342 university students (198
Africans, 86 Whites, 58 Indians; 271 men and 71 women). The White, Indian, and
African mean scores were, in order, 56, 53, and 50 out of 60 (S.D. = 2.6, 4.9, 6.4;
ranges = 46–60, 37–60, 11–60). Men averaged similar scores to women (unweighted
means = 52.9, 52.5; S.D. = 5.0, 3.3; ranges = 11–60, 35–60). Analysis of variance
(ANOVA) with race and sex as factors showed a significant main effect only for race,
with no effect for sex either as a main effect or in interaction, F(2,342) = 24.23, P
< .001; F(1,342) < 1.00; and F(2,342) < 1.00. For the total score, the African–White
difference was 1.00 S.D. (based on total S.D. of 6.05). The 1993 USA norms for 18 to
22 years showed the Whites at the 75th percentile, the Indians at the 55th percentile
and the Africans at the 41st percentile. These translated into IQ equivalents of 110,
102, and 97, respectively. Item analyses were based on the proportion getting the
correct answer. The item analyses were very similar for Africans, Indians, and Whites
(r > .90; r >.79, P < .01), suggesting that the test measured the same construct in all
three groups. Using a proportion of 70% of respondents passing as the criterion for
judging an item as “too easy”, 57 of the 60 items (95%) proved too easy for Whites,
53 or 88% for Indians, and 50 or 83% for Africans. Also the item-total correlation for
each item was calculated using the point-biserial correlation of each item’s pass or fail
status (0 or 1) with the total score on the test.
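
The conversion from percentile ranks to IQ equivalents mentioned above can be sketched as follows. This is a minimal illustration that assumes normally distributed scores and an IQ scale with mean 100 and standard deviation 15; the function name is illustrative and the original authors' exact conversion procedure is not given here.

```python
from statistics import NormalDist


def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Convert a percentile rank (0-100, exclusive) to an IQ equivalent via the normal curve."""
    z = NormalDist().inv_cdf(percentile / 100.0)  # z score at that percentile
    return mean + sd * z


# Percentiles quoted for the three groups against the 1993 USA norms
for group, pct in [("White", 75), ("Indian", 55), ("African", 41)]:
    print(group, pct, round(percentile_to_iq(pct)))
```

Under these assumptions the 75th, 55th and 41st percentiles convert to about 110, 102 and 97, which agrees with the IQ equivalents quoted above.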
Lynn et al., (2004) investigated sex differences on the SPM in Mexico. The SPM was
administered to a sample of 920 children aged 7 to 10 years (472 males and 448
females) from three different ethnic groups. Analysis of variance showed a statistically
significant age effect (SPM scores increased with age) and no statistically significant
gender effect. This study showed a very small overall gender difference in the SPM
results, with an increasing advantage for girls as age increases.
A third investigation conducted in Libya was carried out by Ahlam (2005) to evaluate
the relationship between intelligence and high school students’ academic
achievement. The SPM test was administered to 240 students aged 16 and 17 years (120
males and 120 females). The mean score obtained for males was (M=38.31 and
SD=8.53) whereas that for females was (M=35.68 and SD=7.73). The total mean score
was (M=37.00 and SD=9.23). The results showed a gender difference in favour of males.
The analysis also showed that the correlation between SPM scores and students’
academic achievement was (r=0.45, p = 0.01).
A fourth investigation in Libya was carried out by Attashan and Abdalla (2005) to
examine the relationship between intelligence and university students’ academic
achievement. The SPM was administered to 510 undergraduate university students.
The mean score obtained for males was (M=40.50 and SD=8.80) whereas that for
females was (M=40.21 and SD=9.62). No significant gender difference was found. On
the other hand, the arts students' mean score was (M=35.82 and SD=8.09) while that of
science students was (M=44.54 and SD=7.73). The significant difference in means was
in favour of science students. The overall mean score was (M=40.36 and
SD=9.21). In addition, the analysis showed that the correlation between SPM scores and
students’ academic achievement was (r=0.35, p = 0.01).
Abdel-Khalek and Lynn in 2006 investigated sex differences on the SPM test in
Kuwait, in a sample of 6,529 students aged 8 to 15 years (3278 boys and 3251 girls)
from six different districts in Kuwait. In each district, one socially representative
elementary, intermediate and secondary school for boys and one for girls were
randomly chosen from a list of schools. Children were tested in classes which were
randomly selected. The selection of school districts used a stratified random sampling
procedure. The results showed that girls obtained significantly higher means than
boys among 8, 9, 10 and 14 year olds. No statistically significant differences were
found among 11, 12, 13 and 15 year olds. Overall, girls in the total
sample obtained a statistically significantly higher mean score (M = 35.75, SD = 11.49) than boys
(M = 34.81, SD = 12.11), p < 0.001, although the difference is very small at d = 0.08, equivalent to
1.2 IQ points. This difference was attributed to possible sampling bias.
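
The effect size and its IQ equivalent quoted above can be reproduced from the reported means, standard deviations and sample sizes. The sketch below assumes that d is Cohen's d with a pooled standard deviation and that the IQ scale has a standard deviation of 15; the function name is illustrative.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Girls versus boys in the Kuwaiti sample, using the figures reported above
d = cohens_d(35.75, 11.49, 3251, 34.81, 12.11, 3278)
iq_points = d * 15  # assumes an IQ standard deviation of 15
print(f"d = {d:.2f}, IQ points = {iq_points:.1f}")
```

Under these assumptions the calculation gives d of about 0.08 and a difference of about 1.2 IQ points, matching the reported values.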
Taylor in 2007 carried out a study in South Africa on 144 female and 199 male job
applicants, of whom 46.9% were Black and 41.8% White. The average age was 33.8
years. The mean SPM score was (M=44.65, SD=11.94). Scores on the SPM were
compared across gender and ethnic groups using an independent samples t-test. Males
scored a mean SPM value of (M=44.69, SD=12.64) whereas females scored (M=44.45,
SD=11.28). The results of the t-test across gender groups showed that there were
no significant differences in SPM scores. The Black ethnic group scored a mean
SPM value of (M=41.20 and SD=13.06) whereas the White ethnic group scored a
mean SPM value of (M=48.21 and SD=9.33). The White group on average scored
significantly higher than the Black group. Although this finding may cause some
concern at first, it is important to consider the context in which the test was
administered.
Khaleefa and Lynn (2008a) carried out a standardization of the Standard Progressive
Matrices in Syria on a sample aged 7 to 18 years. A total of 3489 participants took
the test (1739 males and 1750 females). The results showed no overall sex difference, and
no consistent pattern of sex differences across age groups.
It has frequently been asserted that there is no sex difference in general intelligence
but that males have greater variability than females. This assertion was made in the
early years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and
Terman (1916). These early investigators proposed the difference in variability to
explain why men are so greatly over-represented among geniuses. Having found no sex
difference in general intelligence, they saw greater variability among males, which
entails more males among those with very high intelligence (as well as more males with
very low intelligence), as a solution to this problem. Khaleefa and Lynn investigated
sex differences in variability and found no consistent pattern. Boys had greater
variability in 7 age groups, whereas girls had greater variability in 4 age groups. In
the sample considered as a whole, girls had greater variability than boys. This study
also showed that average SPM scores were lower in developing countries when
compared to developed countries.
Khaleefa et al. (2008b) carried out a standardization of the Standard Progressive
Matrices in Sudan with 6202 participants aged 9 to 25 years. They analysed
the data for sex differences in means and variability. The study showed no sex
difference at ages 9 through 13. Females obtained statistically significantly higher
means from age 14 through 18. At 19 years, males did not have significantly higher
means. At 20 to 25 years, males obtained statistically significantly higher means. In
addition, results showed no consistent sex difference in variability. Males had greater
variability in 7 age groups, whereas females had greater variability in 5 age groups.
Ahmad et al. (2008) conducted a study to standardize the SPM test in Pakistan between
2004 and 2006. The sample consisted of adolescents aged 12 to 19 years and adults
aged 18 to 45 years. The adolescents (N = 1,662) were selected from representative
schools in the four provinces into which Pakistan is divided (North West Frontier,
Baluchistan, Sindh and Punjab) and were tested in groups. The adult sample consisted
of 2,016 participants (1,019 females and 997 males). The results overall suggested
negligible gender differences in the mean performance on the SPM in Pakistan. In
addition, in most age groups, females had greater variability than males. The mean
scores of the Pakistani sample were lower than those obtained by standardization
samples in the UK and the USA.
Abdel-Khalek and Lynn (2009) administered the SPM to 5,139 school students aged
9 to 18 years, with approximately equal numbers of males and females, drawn from
representative schools, and to 92 university students (43 male and 49 female)
in the capital city of Oman (Muscat). They reported an average IQ of 85 for school
students and 93.7 for university students. There were no significant gender
differences among the 9 to 17 year olds, but at age 18 years males obtained a higher
mean by approximately 2.5 IQ points. Among university students, males outscored
females by approximately 5 IQ points.
Khaleefa and Lynn (2009) conducted a study to evaluate the SPM test norms in a
Qatari standardization sample of 1135 students aged 6 to 11.5 years (517 males and
618 females). Although an IQ of 78 was reported in an earlier study in Qatar,
this study reported an average IQ of 88. This difference was attributed to possible
sampling or administration errors. This study confirmed previous studies conducted in
the Middle East that failed to show greater male variance in SPM scores. In the
total sample, females obtained a higher mean score (M = 25.7, SD = 11.34) than males
(M = 23.7, SD = 9.98). Furthermore, the analysis showed that mean SPM scores
increased with age.
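The question of greater male variance can be expressed as a variance ratio. The sketch below is an illustration only and is not taken from the cited study: it assumes the ratio is defined as male variance divided by female variance, so that values below 1 indicate greater female variability. The function name `variance_ratio` is hypothetical.

```python
def variance_ratio(sd_male, sd_female):
    """Male variance divided by female variance (assumed definition)."""
    return (sd_male / sd_female) ** 2

# Standard deviations of the Qatari total sample as reported in the text.
vr = variance_ratio(sd_male=9.98, sd_female=11.34)

# A ratio below 1 means females were the more variable group.
print(round(vr, 2))  # 0.77
```

With the reported standard deviations the ratio falls below 1, which agrees with the statement that this sample did not show greater male variance.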
In general, the studies performed in developing countries described their sample
selection procedures in detail, including random selection and large sample sizes.
Abdel-Khalek and Lynn (2006) in Kuwait, for example, administered the SPM test to
6529 students, and Khaleefa et al. (2008b) tested 6202 subjects including
children and adults. By comparison, the largest sample among the studies in developed
countries comprised 2738 children in Estonia (Lynn et al., 2004).
Furthermore, the analytical methods employed in many studies were identical to those
used in the developed-country studies. Lastly, it should be noted that more recent
studies have been conducted in developing countries than in developed ones.
Although the studies in developing countries covered various variables and
measured up to the standards of studies in developed countries, they had a number of
drawbacks. Firstly, some studies lacked a description of the sample in terms of the ages
involved, such as Shin (1977) in India and Majdub (1991) in Libya, or of the selection
procedure, such as Klingelhofer (1967) in Tanzania and Mohan (1972) in India. In terms of
the differences between rural and urban areas, only one study evaluated this
variable (Shinha (1968) in India).
Unlike the studies performed in the developed world, one study conducted in the
developing world employed an incomplete SPM test. Rao
(1994) in India used only 45 of the 60 items that make up the
test. Accordingly, this study cannot be included in the meta-analysis chapter, as it
used a reduced number of test items.
This chapter aimed at providing a detailed, self-consistent and comprehensive
account of the SPM test. It served as a complete introduction to the history,
literature, psychometric characteristics and applications of the SPM test. Extensive
review of earlier studies has revealed that the SPM test is, without any doubt, a
reliable and valid psychological test. It is particularly powerful in the domain of
mental ability and intelligence.
The Progressive Matrices tests resulted from the work of the British psychologist
John C. Raven and the geneticist Lionel Penrose in the 1930s.
Their work was based on Spearman’s two-factor theory. Raven’s Progressive
Matrices are probably among the most widely used culture-fair tests. They exist
in three forms: the SPM, CPM and APM tests.
The SPM test is a non-verbal ability test consisting of increasingly difficult sets. It
was first fully standardised by Raven for children. Later on, the test was
re-standardised for adults. Standardisation took place in different countries in both the
developed and developing world. Since its introduction, several checks have been run to
determine the accuracy of its norms.
Literature on the reliability, validity and item analysis characteristics of the
SPM was presented and discussed. A single technique is not sufficient to determine the
reliability of the SPM test accurately. Therefore, three methods have been
used in the literature: test-retest reliability, split-half reliability and Cronbach’s alpha
reliability. The average coefficients for the three methods were found to be 0.93 for
test-retest after a two-week interval, 0.90 for split-half, and 0.95 for alpha
(Kuder-Richardson 20).
Likewise, to firmly establish the validity of the SPM test, one should look at the
following three types of validation procedures: content validity, criterion-related
validity and construct validity. It was found that the SPM test can be used in
cross-cultural contexts due to its culture-fair nature. The majority of the examined
studies showed that the SPM test measures only general intellectual ability (“g”),
with no other factors.
Furthermore, the literature showed that concurrent validity correlations between the SPM
and standard intelligence tests ranged from 0.50 to 0.80, whereas predictive
validity correlations with academic achievement tests generally fell in the region of
0.20 to 0.60.
Studies that focused on item analysis (item difficulty and item discrimination) of
the SPM test were presented. Those which employed the SPM in different cultures
were also mentioned and evaluated. It can be concluded that the SPM test has been
used extensively in various fields including educational, vocational, clinical and
anthropological all over the globe. This is essentially due to its high degree of
reliability and validity as well as its culture-fair features.
The next chapter focuses on the workflow of this study. It sheds light on the
field tests conducted and their related work. In addition, it presents the
methodology adopted in the research, materials such as statistical software, and the
data analysis pipeline.
Chapter five: MATERIALS AND METHODS
5.1 Introduction
This chapter outlines and critically analyzes the methods and approaches employed in this
study. The chosen methodologies are explored and their contributions are
subjected to critical appraisal. Statistical techniques for data analysis are justified
and evaluated for their suitability. Ethical issues relating to data collection and data
analysis are also considered.
5.2 Research design
The intent of any research is to create new knowledge through systematic enquiry.
Research is governed by scientific principles that vary from one discipline to another
(Gomm & Davies, 2000). Quantitative research approaches are applied to describe
current conditions, investigate relationships, and study cause-effect phenomena. A
quantitative research approach was used in this study due to the numerical nature of
the data and large sample size tested. Qualitative research methods were not
appropriate for this study as the only available method to measure intelligence was by
conducting a test. Quantitative research designs can be divided into experimental and
non-experimental designs. In experimental research, at least one independent variable
is manipulated, while the remaining variables are controlled, and the effect on one or
more dependent variables is observed. As there was no manipulation of variables in
this study, it was classified as a non-experimental study. Furthermore, survey and
correlational designs, the broadest category of non-experimental designs, were
employed in this study (Gay, 2006; Lobiondo-Wood & Haber, 2006).
5.3 Methodology
Two main activities were carried out in this study. First, a survey using the Standard
Progressive Matrices (SPM) test was conducted to obtain primary data from a
Libyan sample. Second, a meta-analysis was performed to compare the SPM test
results with studies from other countries.
In survey designs, subjects are selected and an investigator administers a test or
questionnaire, or conducts interviews, to collect data. Survey designs are used frequently in
educational research to describe trends, determine opinions, identify group
characteristics, understand attitudes and beliefs, identify practices, evaluate programs
and gather other types of information (Creswell, 2000). Usually, research is designed so that
information regarding a large number of people (population) can be inferred from the
responses obtained from a smaller group of subjects (sample) (James, 2006). In
addition, correlational designs are useful when exploring new topics, or topics that
have not been sufficiently investigated (Cohen & Manion, 1994).
In this study, quantitative research designs (descriptive and comparative survey,
correlational and cross-sectional) were used. A descriptive design employing
frequency distributions, means, standard deviations and charts for the obtained sample
was carried out to present an overview regarding performance in the SPM test and to
compute percentile ranks (norms) according to sample age levels (8 to 21 years old).
A comparative design was used to study whether significant differences existed
between sample performances on Raven’s Standard Progressive Matrices test
according to their gender, age groups and regions (developing and developed
countries, and urban (cities) and rural (villages)). A correlational design was used to
study the relationship between IQ scores on Raven’s Standard Progressive Matrices
test and Students' Academic Achievement (SAA) among Libyan students aged 8 to 21
years. Finally, a cross-sectional approach was adopted in this study, as data were
collected from a sample with different age groups in a single time period.
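The computation of percentile ranks by age level can be illustrated as follows. This is a minimal sketch rather than the procedure actually used in the study: the scores are invented for illustration, the mid-rank definition of a percentile rank (ties counted as half) is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

def percentile_rank(scores, value):
    """Percentage of scores below `value`, counting ties as half (mid-rank)."""
    below = sum(1 for s in scores if s < value)
    equal = sum(1 for s in scores if s == value)
    return 100.0 * (below + 0.5 * equal) / len(scores)

def group_by_age(records):
    """Collect raw SPM scores into one list per age level."""
    groups = defaultdict(list)
    for age, score in records:
        groups[age].append(score)
    return dict(groups)

# Invented (age, raw score) pairs, for illustration only.
records = [(8, 20), (8, 25), (8, 30), (8, 35),
           (9, 28), (9, 33), (9, 38), (9, 41)]
groups = group_by_age(records)

# The same raw score of 30 corresponds to a different norm at each age.
rank_age8 = percentile_rank(groups[8], 30)
rank_age9 = percentile_rank(groups[9], 30)
print(rank_age8, rank_age9)  # 62.5 25.0
```

The example shows why norms are reported per age level: an identical raw score ranks lower in the older group.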
5.4 Methods
In this study, the SPM test was used as a method to measure intelligence objectively.
The SPM resulted from the work of the British psychologist John C. Raven and
British geneticist Lionel Penrose. Their work was based on Spearman's two-factor
theory. The SPM is one of very few tests based on Spearman’s general (g)
factor theory of intelligence. Spearman (1946) felt that the goal of measuring “g” had
been achieved by the use of the Matrices test and considered the Progressive Matrices
test the best of all non-verbal tests of “g” or eductive ability.
Raven et al. (1996) noted that the SPM is used internationally and that no general
revision of it has been deemed necessary. Burke (1958), Anastasi (1988), Raven
(1989), Carpenter et al. (1990), Arthur and Woehr (1993), Arthur and Day (1994),
Court and Raven (1995), Murphy and Davidshofer (1998), Raven (2000), Kline (2000)
and Lynn (2006) noted that the SPM was the most widely used test for the
following reasons:
• Its non-verbal nature, which allows it to be applied cross-culturally.
• Being the best test of g, the general factor present in all cognitive tasks.
• Being a group test that is easy to administer and score.
• Possessing good psychometric characteristics (high validity and reliability).
• Being a popular instrument for use in developing countries (Thorndike &
Hagen, 1977; Ogunlade, 1978).
• Being the first version of the RPM tests to be constructed (Raven, 1939) with
the possibility to be used for children from the age of 6 years onwards (Yoon,
2006).
Reliability and validity are both important criteria for judging the suitability
of a test or a measuring instrument and are the most important characteristics of a
psychological test (Brown, 1983; Urbina, 1997; Kenneth, 1998; Kline, 2000;
Langdridge, 2004; Domino & Domino, 2006; Airasian, 2006; Lobiondo-Wood &
Haber, 2006). To achieve the aim of this study, validity, reliability and item analysis
(item difficulty and item discrimination index) were evaluated.
In addition to the SPM test, a meta-analysis was employed to compare the performance
of the Libyan sample on the SPM test with that of samples from other countries
(developed and developing). A review of relevant studies on the SPM test, drawn from
computer databases, dissertations and bibliographies of review articles, generated 44
studies. From each relevant study the following data were recorded and coded: (a) author;
(b) country; (c) year of publication; (d) population sampled; (e) age; (f) SPM means and
standard deviations; and (g) sample size.
These studies were carried out in Congo, Denmark, Egypt, Estonia, France, India, Iran,
Israel, Libya, Nigeria, Mexico, Qatar, Tanzania, Turkey, Syria, Sudan, Pakistan, the UK
and the USA between 1948 and 2009. To be included, a study had to provide
sufficient data, such as SPM scores.
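The coding scheme (a) to (g) can be pictured as one record per study. The sketch below is an illustration under stated assumptions and not the coding sheet actually used: the class `StudyRecord` and the helper `pool_subgroups` are hypothetical names, and the pooling formula is the standard one for combining two subgroup means and standard deviations. The example values are the Qatari male and female figures reported in the review chapter.

```python
import math
from dataclasses import dataclass

@dataclass
class StudyRecord:
    author: str       # (a)
    country: str      # (b)
    year: int         # (c) year of publication
    population: str   # (d) population sampled
    age_range: str    # (e)
    spm_mean: float   # (f) SPM mean
    spm_sd: float     # (f) SPM standard deviation
    n: int            # (g) sample size

def pool_subgroups(a, b, population):
    """Combine two subgroup records into one total-sample record."""
    n = a.n + b.n
    mean = (a.n * a.spm_mean + b.n * b.spm_mean) / n
    # Sum of squares: within-group part plus between-group part.
    ss = ((a.n - 1) * a.spm_sd ** 2 + (b.n - 1) * b.spm_sd ** 2
          + a.n * (a.spm_mean - mean) ** 2 + b.n * (b.spm_mean - mean) ** 2)
    sd = math.sqrt(ss / (n - 1))
    return StudyRecord(a.author, a.country, a.year, population,
                       a.age_range, mean, sd, n)

males = StudyRecord("Khaleefa and Lynn", "Qatar", 2009,
                    "male students", "6-11.5", 23.7, 9.98, 517)
females = StudyRecord("Khaleefa and Lynn", "Qatar", 2009,
                      "female students", "6-11.5", 25.7, 11.34, 618)

total = pool_subgroups(males, females, "students")
print(total.n, round(total.spm_mean, 2))  # 1135 24.79
```

Pooling of this kind is only needed when a study reports subgroup statistics without a total-sample mean and standard deviation.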
5.5 Ethical approval
This study was considered the first attempt to standardise Raven’s Standard
Progressive Matrices (SPM) test and apply it to a sample from Libya. Ethical
consideration in research, according to Saunders et al. (2007), “refers to the
appropriateness of your behaviour in relation to the rights of those who become the
subject of your work, or are affected by it” (p.178). Ethics in research is an important
issue and must be taken into consideration in any research design. Ethical approval
was obtained from the Research Governance and Ethics Committee at the University
of Salford (RGECo7/o74). In addition, ethical approval was obtained from the
Department of Psychology at Omar El-Mukhtar University in Libya and the
Department of External Studies and Technical Cooperation in the Ministry of Higher
Education in Libya.
SPM testing was carried out by the researcher and well-trained teaching assistants
who helped the researcher to distribute and administer the SPM test. The researcher
was trained by Professor Abdulrazik S. Attashani of Omar El-Mukhtar University
in 2001 during his study for a Master's degree. Only the researcher knew the
identity of the participants, as their details were accessible only to the researcher. All
obtained data were secured in a safe place. The study included students aged
8 to 21 years. The main purpose of this study was to develop norms and to establish
the distribution of IQ scores among Libyan students. Providing these norms would serve
as a guide in helping people to take appropriate decisions related to their future, and
choose educational programs that will best suit their abilities and assist in matching
job applicants to suitable employment.
Participation in this study was optional. An information sheet was provided and each
participant (or guardian of participant) was asked to sign a consent form. The
researcher also provided a simplified information sheet for children (please refer to
the children's information sheet). Information sheets and consent forms were available in
the native language of the participants (Arabic) and were comprehensive in content
and concepts. Each participant was free not to take part in the study or to withdraw at
any time without stating a reason. Also, participants were assured that their scores on
the SPM test were to be used for research purposes only. The researcher was available
on a given contact number if a participant wanted to discuss any matter that might
arise during the study. Results of the study were made available to all participants
and may be published in the journal Intelligence. Participants who were willing to
take the test (children needed the consent of a parent or guardian) were registered, and
the researcher then randomly chose the participants from among them.
5.6 Pilot study
A pilot study was first conducted to determine validity and reliability of the SPM test
to ascertain the applicability of the test. In addition, the pilot was done to determine
how clear the instructions of the test were for the participants, and to introduce the
way the test is conducted to the trained psychologists.
The sample consisted of 200 students (100 males and 100 females). Using the Statistical
Package for the Social Sciences (SPSS, version 16), reliability was
investigated using the split-half and alpha (KR-20) methods, and validity was
investigated using correlation coefficients (internal consistency of the SPM test sets) and
an external criterion (students' academic achievement, SAA). The split-half reliability
ranged from 0.87 to 0.88 and the internal consistency reliability ranged from 0.93 to
0.94. The validity analysis using correlation coefficients (internal consistency) showed
statistically significant high correlations, ranging from 0.70* to 0.89**, between the
SPM test sets and the total test score. Moreover, the correlation between the
SPM test and the external criterion (SAA) was statistically significant and moderate
(0.52**). It was concluded that the SPM provided a promising measure
of the non-verbal ability of Libyan students.
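The two reliability methods named above can be sketched in a few lines. The study itself used SPSS; the code below is only an illustration of the underlying formulas, assuming an odd/even item split with the Spearman-Brown correction for the split-half coefficient and population variances for KR-20. The response matrix is invented (1 = correct, 0 = incorrect) and is far smaller than the 60-item SPM.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half(responses):
    """Odd/even split-half reliability with Spearman-Brown correction."""
    first = [sum(row[0::2]) for row in responses]
    second = [sum(row[1::2]) for row in responses]
    r = pearson(first, second)
    return 2 * r / (1 + r)

def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (0/1) items."""
    n = len(responses)
    k = len(responses[0])
    var_total = pvariance([sum(row) for row in responses])
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

# Invented responses: 7 test takers, 6 items of increasing difficulty.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half(responses), 3), round(kr20(responses), 3))
```

The coefficients obtained from this toy matrix have no bearing on the pilot results; they only show how the two statistics are derived from item-level data.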
5.7 Main study

5.7.1 Sample size
The sample size (2600 students) was based on the original SPM test, which was standardized
on a sample of 735 British children aged 6-13 years tested individually, 1,407 British
children aged 8-14 years tested in groups and 629 British adults aged 20-70 years
(Raven, 1960; Raven et al., 1998). Kline (2000) stated that the sample size has to
be large enough to reduce the standard errors of correlations to negligible proportions.
The researcher aimed to achieve the highest possible number of participants in this
study, which was 2600.
5.7.2 Sample selection
5.7.2.1 Multi-stage-cluster sampling design
The researcher lacked a sample framework (a record from which to select candidates)
for Libyan students aged 8 to 21 years, who were spread across different
educational grades: those aged 8 to 17 years were enrolled in schools, and those aged
18 to 21 years were enrolled as undergraduates in different university
grades. In addition, the research covered a large, dispersed
area, the Eastern Libyan Region, which encompasses a large number of cities and villages.
Moreover, the researcher dealt with a wide range of age groups, from 8 to 21
years. Consequently, the only available way to choose the sample was to employ a
multi-stage sampling technique. Its main advantages are that no sample
framework is needed before conducting the survey, that the framework can be prepared
in the field, and that it is easy to carry out in a dispersed region.
In cluster sampling, intact groups, not individuals, are randomly selected. All members
of selected groups have similar characteristics. Cluster sampling is more convenient
when the population is large or spread out over a wide geographic area. Cluster
sampling can be carried out in stages, involving selection of clusters within clusters.
This process is called multistage sampling (Mills & Airasian, 2006). When Raven, in
1981, standardized the Irish and British SPM test, he used this sampling method,
which was defined by Denscombe (1998) as a sampling method that involves
selecting samples from samples, each sample being drawn from within the previously
selected sample. In principle, the multi-stage sampling method, which is an outright
random probability sampling method, can go on through any number of levels, each
level involving a sample drawn from the previous level (Bryman, 2005).
Consequently, by obtaining sufficient numbers of representative clusters or units for the
whole population and focusing on them, the researcher saved the time and money that
would otherwise have been spent travelling to research sites scattered through the length and
breadth of the region. In addition, it enabled the researcher to prepare the sample
framework in the field to select prospective respondents. Thus, the aforementioned
advantages led to selecting the multi-stage disproportional stratified method as the
main method for selecting suitable representative samples for this research.
5.7.2.2 Disproportional stratified sampling
Although the stratified sampling method continues to adhere to the underlying
principles of randomness, it adds some boundaries to the process of selection and
applies the principles of randomness within these boundaries (Denscombe, 1998). The
significant advantage of stratified sampling over random sampling is the ability to
assert some control over the selection of the sample to guarantee the inclusion of
crucial events or crucial people or social groups in the sample. This sample design
varied the sampling fraction between strata, which increased the sample size in
small strata and allowed enough cases for analysis, which is important for comparing
subgroups. Consequently, the researcher used the multi-stage, cluster-disproportional
stratified sampling technique six times, as follows:
1. To select at least one main and one secondary city.
2. To select nine villages from the existing thirty. Villages were divided
by location into coastal, mountain or desert villages. The
researcher selected three villages from each category.
3. To select at least one elementary, one preparatory and one secondary
school in every village of the selected nine villages, regardless of the
existing number of schools, in every educational level and to select at
least one classroom from every grade of the six grades in the elementary
school or from the three grades in both the preparatory and the secondary
school, regardless of the available number of classrooms in every grade,
in these villages.
4. To select at least five male and five female students from every
classroom, from the different classrooms in the different grades in the
selected nine villages.
5. To select at least five male and five female students from every
classroom, from the different educational grades in both selected cities.
6. To randomly select male and female students from either the scientific or
arts curriculum in the two different branches of Omar El-Mukhtar
University.
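The classroom-level step in the list above (a fixed quota of male and female students per classroom) can be sketched as follows. This is an illustration only: the classroom roster is invented, the function name `sample_classroom` is hypothetical, and the sketch draws exactly five students per gender although the text specifies at least five. Taking the same quota from each stratum regardless of its size is what makes the design disproportional.

```python
import random

def sample_classroom(students, per_gender=5, seed=None):
    """Randomly draw a fixed quota of students from each gender stratum."""
    rng = random.Random(seed)
    chosen = []
    for gender in ("M", "F"):
        stratum = [s for s in students if s["gender"] == gender]
        # Never ask for more students than the stratum contains.
        k = min(per_gender, len(stratum))
        chosen.extend(rng.sample(stratum, k))
    return chosen

# Invented classroom roster: 12 boys and 9 girls.
classroom = ([{"id": i, "gender": "M"} for i in range(12)]
             + [{"id": 100 + i, "gender": "F"} for i in range(9)])

chosen = sample_classroom(classroom, per_gender=5, seed=1)
n_male = sum(1 for s in chosen if s["gender"] == "M")
n_female = sum(1 for s in chosen if s["gender"] == "F")
print(len(chosen), n_male, n_female)  # 10 5 5
```

Here boys are sampled at a rate of 5 in 12 and girls at 5 in 9, so the sampling fraction differs between the two strata.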
A main difference between cities and villages was that the cities had separate schools
for male and female students at the preparatory and secondary levels
and shared schools at the elementary level, in contrast to villages,
where all schools are shared by both genders.
Also, the existence of many administrative boundaries necessitated selecting more
than one school to represent the city. This meant that it was impossible to select one
elementary school for example to represent all the elementary schools in the city.
Consequently the researcher decided to divide the main city into six administrative
boundaries and the secondary city into three administrative boundaries. This was
followed by selecting one school for male students and one school for female students
for every educational level located within the selected administrative boundaries. In
addition, only one school was available for each educational level in each village, in
contrast to the availability of many schools for each educational level in the city.
For the purpose of this study, two cities were chosen: a main city (Al-Beida) and a
secondary city (Shahat). Al-Beida is the main city in the eastern region of Libya.
During the monarchy (1951-1969), Al-Beida was the second capital of Libya. Now
the municipality of the eastern region has a university (Omar El-Mukhtar University),
consisting of five campuses situated in the following cities: Al-Beida, Al-Marj,
Al-Gooba, Tobruk and Darnah. Al-Beida is considered an educational, trade and
health centre for neighbouring settlements and small cities (Kezeiri, 1995). According
to the General Authority of Information in 2006, Al-Beida city has been divided into
six administrative boundaries: Alsog algadem, Algareka, Werdamah, AlZaweya
Algademah, Al-Beida Algharbiya and Al-Beida Alshargeya.
Shahat city, previously known as Cyrene, was established by the Greeks in 631 B.C.
It was the first city to be formed in Libya. The location of the city played a significant
role in its growth and prosperity as did the availability of water from the Apollo
springs and abundance of rain. Its proximity to Apollonia port provided easy contact
with all Mediterranean ports. The city is considered as an important political,
religious, agricultural and industrial centre (Kezeiri, 1995). According to the General
Authority of Information in 2006, Shahat city has been divided into three
administrative boundaries; Shahat Aljadedah, Shahat Algademah and Almansora. A
representative school was chosen for each administrative boundary in these two cities.
In addition, the eastern region provided the researcher with a wealth of resources:
the researcher was born in Al-Beida city and had good links to fellow
students and academic researchers. He had also taught in various cities located in the
eastern region of Libya.
A large and more easily accessible sample was chosen from two Libyan cities
(Al-Beida and Shahat) and nine villages because of its manageability in terms
of both time and resources, as well as the researcher’s familiarity with the social context.
Figure 5.1 summarizes the rationale for and the process of the sampling method followed.
Multi-stage stratified probability sample for the selection of students aged 8 to 21
years, either at the basic educational level or at the university level:
1. Grouping and clustering the six cities and 30 villages into two main clusters, urban
and rural.
2. Selecting two cities from the urban cluster and nine villages from the rural cluster.
3. Selecting one elementary school, one preparatory school, and one secondary school
from every village of the nine villages, then selecting at least one classroom from every
grade from grade three to grade twelve, followed by selecting at least five male and five
female students from each classroom.
4. Selecting six elementary schools (shared schools), twelve preparatory schools, and
twelve secondary schools (separate schools) in the main city from the selected six
administrative boundaries, and three elementary schools (shared schools), six preparatory
schools, and six secondary schools (separate schools) in the secondary city from the
selected three administrative boundaries, followed by selecting at least five male and
five female students.
5. Selecting 800 undergraduate students from the two branches of El-Mukhtar University
located in two different cities, from the arts and science curricula, including equal
numbers of male and female students.
Why this sample design:
• Lack of sample framework
• Sampling of a wide area (eastern Libyan region)
• Interviewees are dispersed over wide areas (six cities and 31 villages)
• Time and cost limitations
Figure 5.1 Summary of the sampling method and theory
5.7.2.3 The multi-stage-cluster sampling process and procedures
The multi-stage stratified sampling procedure involved
sampling from one higher-level unit, called the primary sampling unit (Eastern
Libyan Region), and then sampling secondary sampling units from within that
higher-level unit (cities and villages). This was followed by classifying the cities into
two homogeneous urban clusters, main and secondary cities, using the criterion of their
administrative boundaries, as the third sampling level. The researcher
selected one city from each category. In addition, villages were classified into three
different categories (third clustering sampling level): coastal, desert and mountain
villages. Three villages were selected from each category with different weights or
ratios as the fourth sampling level. The existing schools in the two selected cities and
the nine selected villages were then classified and counted as the
fifth sampling level according to their educational levels in Libya: elementary level
(grade three to grade six), preparatory level (grade seven to grade nine), and
secondary level (grade ten to grade twelve).
The aim was to select one elementary, one preparatory and one secondary school from
each village, where most schools are shared by male and female students. The
researcher visited 27 schools in the nine villages to select the prospective respondents
(students) randomly from a list (sample framework) that he prepared in the field
during his visits to these schools. In the two cities, the aim was slightly different because
preparatory and secondary schools apply a single-gender policy and because
the complex composition of each city's administrative
boundaries made it impossible to select one school as representative of the whole
city. Consequently, the researcher needed to select at least two
schools at the preparatory and secondary educational levels, one for male and one for
female students. This resulted in selecting six elementary schools, twelve preparatory
schools, and twelve secondary schools in the main city and three elementary schools,
six preparatory schools, and six secondary schools in the secondary city. Overall, the
researcher visited 72 schools of the existing 124 schools (about 58%) in the
11 settlements (two cities and nine villages): 27 schools located in the
selected nine villages and 45 schools located in the selected two cities.
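The school counts in this paragraph can be verified with simple arithmetic. The following sketch only re-adds the figures stated in the text; the variable names are illustrative.

```python
# Villages: one elementary, one preparatory and one secondary school each.
village_schools = 9 * 3

# Main city: 6 shared elementary + 12 preparatory + 12 secondary schools.
main_city_schools = 6 + 12 + 12

# Secondary city: 3 shared elementary + 6 preparatory + 6 secondary schools.
secondary_city_schools = 3 + 6 + 6

city_schools = main_city_schools + secondary_city_schools
visited = village_schools + city_schools

# Share of the 124 existing schools that were visited.
share = 100 * visited / 124

print(village_schools, city_schools, visited, round(share))  # 27 45 72 58
```

The totals agree with the 27 village schools, 45 city schools and 72 visited schools (about 58% of 124) reported above.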
Selection of one classroom from every grade in every school either in the nine
villages or in the two cities was conducted by the researcher. Children in Libya start
elementary school at the age of six years old. The researcher randomly selected
classrooms in the elementary schools from grade three and onwards. The student list
was prepared depending on the student’s age.
Regarding the respondents aged 18 to 21 years enrolled in universities,
the researcher selected Omar El-Mukhtar University, which consisted of five
campuses in different settlements, and drew the sample from its campuses in Al-Beida
city and Al-Marj. This choice reflected the fact that the researcher taught at Omar
El-Mukhtar University in Al-Beida as a lecturer in psychology and at its branch in
Al-Marj as a visiting lecturer. Consequently, the researcher had much more access to
the university and schools located in these settlements, which made it easier to
collect a reasonable amount of data, to access the
available data resources and to establish good links with past and current academic
staff.
The multi-stage stratified sampling method was applied to select the respondents aged
18 to 21 years from this university, which formed the primary sampling level. Its
specialisations were classified into two main curriculum groups, science students and
arts students, as the secondary sampling level. The two curricula were then divided by
the four academic years (grades) as the third sampling level. Finally, the researcher
selected students from every grade within the two curricula. The aim was to select at
least 200 students from each grade (100 from the science curriculum and 100 from the
arts curriculum) while ensuring gender equality (100 male and 100 female students).
The selection was therefore disproportional to the real numbers of students in the two
curricula and to the real numbers of male and female students.
Overall, 2600 respondents aged 8 to 21 years were selected, with sampling fractions
that differed from group to group relative to the real numbers of prospective
respondents. The respondents were distributed as follows:
• 900 respondents or students from nine villages, aged from 8 to 17 years old,
enrolled in three basic educational levels; elementary, preparatory and
secondary school educational levels.
• 900 students from two cities, aged from 8 to 17 years old, enrolled in three
basic educational levels; elementary, preparatory and secondary school
educational levels.
• 800 undergraduate students enrolled in Omar El-Mukhtar University.
Table 5.1 shows the principles followed in selecting the respondents from the
different educational levels in the rural and urban areas. Tables 5.2 and 5.3 show how
the fractions of the selected sample sizes relative to the real numbers of students
vary under the applied stratified sampling method, in the two selected cities (table
5.2) and in the nine villages (table 5.3). Finally, table 5.4 shows the fractions of
the undergraduate students relative to the real numbers of Omar El-Mukhtar
University's students. Additionally, figure 5.2 summarises the procedures of the
selected multi-stage stratified sampling method.
Table 5.1 Principles of selecting the sample in schools
EDUCATIONAL LEVEL | VILLAGES | CITIES | TOTAL
Elementary school | 9 villages * 1 school * 4 grades * 1 classroom * (5 male students + 5 female students) = 360 students | 9 boundaries (in 2 cities) * 1 school (shared school) * 4 grades * 1 classroom * (5 male and 5 female students) = 360 students | 720
Preparatory school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male students + 5 female students) = 270 students | 9 boundaries (in 2 cities) * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Secondary school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male students + 5 female students) = 270 students | 9 boundaries (in 2 cities) * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Total | 900 | 900 | 1800
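The arithmetic behind Table 5.1 can be checked with a short Python sketch. The tuple
layout and the names used below are illustrative assumptions rather than part of the
study, and the nine administrative boundaries are taken as the total across both
cities, which is how the school counts given in the text add up.

```python
# Each entry: (sampling units, schools per unit, grades, classrooms per grade,
#              students per classroom). Layout is an assumption for illustration.
VILLAGES = {
    "elementary": (9, 1, 4, 1, 10),   # 9 villages, 5 male + 5 female per classroom
    "preparatory": (9, 1, 3, 1, 10),
    "secondary": (9, 1, 3, 1, 10),
}
CITIES = {
    "elementary": (9, 1, 4, 1, 10),   # 9 boundaries, shared schools
    "preparatory": (9, 2, 3, 1, 5),   # single-gender schools, 5 students per classroom
    "secondary": (9, 2, 3, 1, 5),
}


def stratum_size(spec):
    """Multiply the factors of one table cell to get its number of students."""
    size = 1
    for factor in spec:
        size *= factor
    return size


def area_totals(area):
    """Return the number of students per educational level for one area."""
    return {level: stratum_size(spec) for level, spec in area.items()}


village_totals = area_totals(VILLAGES)
city_totals = area_totals(CITIES)
grand_total = sum(village_totals.values()) + sum(city_totals.values())
print(village_totals, city_totals, grand_total)
```

Under these assumptions each area yields 360, 270 and 270 students for the three
levels, which matches the row and column totals of the table.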
Table 5.2 Target sample size of the pre-university students in the two cities in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/290=15.5% 45/304=14.8% 90/594=15.1%
9 Year four at elementary 45/287=16.6% 45/298=15.1% 90/585=15.3%
10 Year five at elementary 45/284=15.8% 45/296=15.2% 90/580=15.5%
11 Year six at elementary 45/278=16.1% 45/286=15.7% 90/564=15.9%
12 Year one at preparatory 45/256=17.5% 45/274=16.4% 90/530=16.9%
13 Year two at preparatory 45/252=17.8% 45/270=16.6% 90/522= 17.2%
14 Year three at preparatory 45/265=16.9% 45/268=16.7% 90/533= 16.8%
15 Year one at secondary 45/239=18.8% 45/254=17.7% 90/493=18.2%
16 Year two at secondary 45/235=19.1% 45/248=18.1% 90/483=18.6%
17 Year three at secondary 45/243=18.5% 45/252=17.8% 90/495=18.1%
Total 450/2629= 17.1% 450/2750= 16.3% 900/5379= 16.7%
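As an illustration of how the fractions in Table 5.2 are formed, the sketch below
recomputes the sampling fraction (target sample size divided by the real number of
students) from the male and female enrolment figures listed in the table. The function
and variable names are hypothetical and are not taken from the study.

```python
def sampling_fraction(sample_size, population_size):
    """Return the sample as a percentage of the real number of students."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return 100.0 * sample_size / population_size


# Real numbers of students in the two cities for ages 8 to 17, from Table 5.2.
MALE_ENROLMENT = [290, 287, 284, 278, 256, 252, 265, 239, 235, 243]
FEMALE_ENROLMENT = [304, 298, 296, 286, 274, 270, 268, 254, 248, 252]
TARGET_PER_GENDER_AND_AGE = 45  # target sample per gender within each age group

male_total = sum(MALE_ENROLMENT)
female_total = sum(FEMALE_ENROLMENT)
male_sample = TARGET_PER_GENDER_AND_AGE * len(MALE_ENROLMENT)
female_sample = TARGET_PER_GENDER_AND_AGE * len(FEMALE_ENROLMENT)

male_fraction = sampling_fraction(male_sample, male_total)
female_fraction = sampling_fraction(female_sample, female_total)
overall_fraction = sampling_fraction(male_sample + female_sample,
                                     male_total + female_total)

# Per-age fractions, e.g. 45 boys out of 290 at age 8.
per_age_male = [sampling_fraction(TARGET_PER_GENDER_AND_AGE, n) for n in MALE_ENROLMENT]
print(f"{male_fraction:.1f}% {female_fraction:.1f}% {overall_fraction:.1f}%")
```

Because the target is fixed at 45 per gender and age group while enrolment varies, the
fraction changes from row to row, which is what makes the design disproportional.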
Table 5.3 Target sample size of pre-university students in the nine villages in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/230=19.5% 45/262=17.1% 90/492=18.3%
9 Year four at elementary 45/247=18.2% 45/250=18.0% 90/497=18.1%
10 Year five at elementary 45/236=19.0% 45/242=18.7% 90/478=18.8%
11 Year six at elementary 45/239=18.8% 45/258=17.4% 90/497=18.0%
12 Year one at preparatory 45/231=19.4% 45/251=17.9% 90/482=18.6%
13 Year two at preparatory 45/213=21.1% 45/224=20.0% 90/437=20.5%
14 Year three at preparatory 45/220=20.4% 45/236=19.0% 90/456=19.7%
15 Year one at secondary 45/216=20.8% 45/225=20.0% 90/441= 20.4%
16 Year two at secondary 45/211=21.3% 45/220=20.4% 90/431= 20.8%
17 Year three at secondary 45/217=20.7% 45/229=19.6% 90/446= 21.8%
Total 450/2260= 19.9% 450/2397= 18.7% 900/4657= 19.3%
Table 5.4 Target sample of undergraduate university students in Omar El-Mukhtar
University in proportion to their real numbers
Age Study level Gender Academic discipline Total
Sciences Arts
18 Year one Male 50/482=10.3% 50/509=9.8% 100/991=10.1 %
Female 50/496=10.1% 50/518= 9.6% 100/1014=9.9 %
Total 100/978=10.2% 100/1027=9.7% 200/2005= 9.9%
19 Year two Male 50/443=11.2% 50/502=9.9% 100/945= 10.5%
Female 50/475=10.5% 50/513= 9.7% 100/988=10.1 %
Total 100/918=10.9% 100/1015=9.8% 200/1933=10.3%
20 Year three Male 50/442=11.3% 50/497=10.1% 100/939=10.6%
Female 50/468=10.6% 50/501=9.9 % 100/969=10.3%
Total 100/910=10.9% 100/998=10.0% 200/1908=10.5%
21 Year four Male 50/439=11.3% 50/458=10.9% 100/897=11.1%
Female 50/457=10.9% 50/465=10.7% 100/922=10.8%
Total 100/896=11.1% 100/923=10.8% 200/1819=10.9%
Total 400/3702=10.8% 400/3963= 10.1% 800/7665=10.4 %
Figure 5.2 Sampling process
Considerations behind the sample design:
• Wide eastern region area.
• Lack of a sampling frame.
• A random probability sample gives sufficiently accurate results.
• Many settlement types, so it is hard to select one case study.
• Research limitations, especially limited field work time and cost.
Sample design: multi-stage stratified design with eleven case studies (two cities and
nine villages).
Existing cities were divided into two clusters according to administrative boundaries,
and villages into three clusters according to geographic region:
• Two categories for cities: main and secondary cities.
• Three geographic regions for villages: coastal, desert and mountain villages.
Selected settlements:
• Main city: Al-Beida.
• Secondary city: Shahat.
• Coastal villages: Alhanih, Alhammh and Suasa.
• Mountain villages: qsarlibya, Garnada and Satih.
• Desert villages: Maraoh, Aslanth and Gantolah.
Cluster A: schools in cities.
• Nine elementary schools from the two cities, for female and male students: one
school (shared school) from each administrative boundary.
• Eighteen preparatory schools (separate schools) and eighteen secondary schools
(separate schools): two schools from each administrative boundary.
Cluster B: schools in villages.
• Nine elementary schools, one school from each village.
• Nine preparatory schools, one school from each village.
• Nine secondary schools, one school from each village.
In both clusters, one classroom was selected in every grade, from grade three in the
elementary school to grade twelve in the secondary school, with five male students and
five female students selected.
University students aged between 18 and 21 years old: 400 students selected from the
science and 400 students from the art specialization; 200 students (100 male and 100
female) from each year in both specializations, from both campuses.
5.8 Field work arrangement
Assistance in the field work was provided by five well-trained psychologists,
colleagues of the researcher (teaching assistants) at Omar El-Mukhtar University,
after the SPM test form, its purposes and the order of its questions had been
introduced and explained to them.
A request was made to the directors of the education sector to issue a letter enabling
the researcher to carry out the study in the chosen schools and universities.
The researcher contacted each school principal and faculty dean with a letter from
the education sector explaining the purpose of the study and the procedure to be
followed in selecting and testing the students. At each school and university, on the
day of the SPM testing, the researcher arrived one hour early. For students aged 8 to
17 years, he randomly selected students (males and females) from grades 3 to 12 from
the sampling frame (a record of the students' names in the selected classroom), which
he prepared in the field with the help of the student affairs and student admission
manager. For students aged 18 to 21 years, he selected 200 students in each year of
university across both disciplines. All participants were given an information sheet
and were required to sign a consent form before participating in the study.
A place for testing the students was made available at each school. In most cases this
was either the school theatre or the library, where each student had his or her own
table and chair. Because of the large numbers of students in schools and the
differences in their age ranges, fewer than forty students were tested at a time with
the SPM test. In the university tests, the same methodology was adopted with groups of
fifty at a time.
Participants were coded. For school students, the code was based on location, whether
city or village. For students from villages, the code was based on the type of village
(one of three), name of village, name of school, grade, gender and finally participant
number. For students from cities, the code was based on name of city, name of school,
grade, gender and finally participant number. Moreover, no two cities or villages had
names starting with the same letter.
For example: VCSM5F2;
V= Village “first letter”.
C= Coastal village type “first letter”.
S= Village name “first letter”.
M= School name “first letter”.
5= Year level.
F= Sex female “first letter”.
2= Participant number.
For university students, the code was based on name of city, name of university,
specialization, year level, sex and participant number.
For example: UBOA3M32;
U= University Participants “first letter”.
B= Beida “name of city”.
O= Omar El-Mukhtar “name of university”.
A= Arts Specialization.
3= Year level.
M= Sex male “first letter”
32= Participant number.
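The two example codes above can be decoded mechanically. The following Python sketch
is only an illustration: the regular expressions, the field names and the function
`parse_participant_code` are assumptions introduced here, not the procedure used in
the study, and the sketch covers only the village and university formats for which an
example is given. Grades are allowed one or two digits so that grades 10 to 12 can be
represented.

```python
import re

# Village school code, e.g. VCSM5F2 (hypothetical pattern built from the example).
VILLAGE_PATTERN = re.compile(
    r"^V(?P<village_type>[A-Z])(?P<village>[A-Z])(?P<school>[A-Z])"
    r"(?P<grade>\d{1,2})(?P<sex>[MF])(?P<participant>\d+)$"
)
# University code, e.g. UBOA3M32 (hypothetical pattern built from the example).
UNIVERSITY_PATTERN = re.compile(
    r"^U(?P<city>[A-Z])(?P<university>[A-Z])(?P<specialization>[A-Z])"
    r"(?P<year>\d)(?P<sex>[MF])(?P<participant>\d+)$"
)


def parse_participant_code(code):
    """Split a participant code into named fields, or raise ValueError."""
    patterns = (("village", VILLAGE_PATTERN), ("university", UNIVERSITY_PATTERN))
    for kind, pattern in patterns:
        match = pattern.match(code)
        if match:
            fields = match.groupdict()
            # Convert the numeric parts so they can be sorted and compared.
            for key in ("grade", "year", "participant"):
                if key in fields:
                    fields[key] = int(fields[key])
            fields["kind"] = kind
            return fields
    raise ValueError(f"unrecognised participant code: {code!r}")


village_student = parse_participant_code("VCSM5F2")
university_student = parse_participant_code("UBOA3M32")
print(village_student)
print(university_student)
```

Keeping the sex letter between the grade and the participant number is what makes the
code unambiguous even when both numbers have more than one digit.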
Personal details of participants were kept separately in a secure location, accessed
only by the researcher. Each participant's name was assigned the code shown on the
first page of the answer sheet. Only the researcher knew both the name and the
assigned code for each participant. The researcher had supervised access to the
children: at all times, the school headmaster and teachers accompanied and supervised
him while he addressed the students and conducted the test.
5.9 Preparation of the SPM test
The Standard Progressive Matrices test consisted of 60 items on 60 pages, divided into
five sets lettered A, B, C, D and E. Each set consisted of 12 items. Each page of the
booklet contained a matrix with one missing part. Students were asked to choose the
missing part from the six or eight options given below each matrix and to indicate its
number on a separate answer sheet. The following modifications were introduced into
the SPM test to make it more suitable for the Libyan sample:
1. Instructions were given in colloquial Libyan Arabic.
2. The English letters (A, B, C, D and E) of the five sets were changed into Arabic
letters.
3. The page order (direction) of the test booklet was changed to run from right to
left, to suit the Arabic way of writing and reading.
4. A new answer sheet was designed with Arabic letters and a right-to-left direction
for answering and writing.
5.10 Administration of the SPM test
From September to November 2007, the SPM test was administered to 1800 school
students, and from September to November 2008 it was administered to 800 university
students. The researcher was introduced to the students by the head teacher in schools
or by the main supervisor or professor in universities. The researcher followed a
fixed sequence of standardised steps when conducting the SPM test with the
respondents, as follows:
1. Some time was spent at the beginning of each SPM test to establish a good rapport
with students, by discussing the purpose of the study, and why certain students from
the whole school were randomly selected to participate in the study. Also, the students
were assured that their scores in the SPM test would remain anonymous, and would
be used for research purpose only. After the test they were thanked for participating.
2. After the introduction, the SPM test booklets were distributed to the students and
they were asked not to open the booklets, until told to do so.
3. To ensure that the students understood the test and the unfamiliar procedure for
recording their responses on a separate sheet, the standard instructions for group
administration given in the SPM test manual were followed:
(a) This is a test of observation and clear thinking. Please open your test booklet at
the first page. You will find problem Number A1. Now look at your answer
sheet, you will see that under the heading set A there is a column of numbers
from 1 to 12.
(b) Now look at item A1, it is a pattern with a part cut out of it. Look at the pattern,
think what is the piece needed to complete the pattern correctly. Then find the
right piece out of the six shown below.
(c) All the pieces are the right size to fill the right space, but only one of them is the
right pattern. Number 1 is the right shape, but is not the right pattern. Number
2 is not a pattern at all. Numbers 3 and 5 are quite wrong. Number 6 is nearly
right, but is wrong here. Number 4 is the right answer because it is correct
both ways, isn't it?
(d) Now you write "4" next to number 1 under set A on your answer sheet. Please
don't mark the test booklet.
(e) On every page of the booklet there is a pattern with a piece missing, you have to
choose which one of the pieces below is the right one to complete the pattern,
and write its number next to the problem number on your answer sheet. Go on
like this by yourself until you reach the end of the booklet.
(f) The problems are simple at the beginning and get harder as you go on. Do not
miss any out; if you are not sure, make a guess. If you get stuck, move on to the
next problem, and then come back to the one you have difficulty with.
(g) Any questions? I will come around to see that you are getting on all right.
(h) You can have as much time as you like. Now turn over to problem 2 and start.
4. The SPM test was administered without a time limit, as recommended by the SPM
test manual. However, the researcher recorded the actual time each student needed to
complete it. When a student had completed the SPM test and handed in the test booklet
and answer sheet, the researcher checked that the answer sheet had been filled in
correctly and that every item had been answered, and then recorded the time the
student had needed to complete the test. The longest test time recorded was 81
minutes.
5. The SPM test scores for the students were obtained by using the scoring key
provided in the SPM manual.
6. The SPM items were scored by hand and double checked. The items were scored
either right or wrong. The maximum possible score was 60. The score was the number
of correct answers.
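The scoring rule described above (each item right or wrong, total score equal to the
number of correct answers) can be sketched in a few lines of Python. The key below is
a toy example: only the answer to item A1 (option 4) comes from the instructions quoted
earlier, the other entries are deliberately hypothetical, and the function name is an
assumption made for illustration. The real key is the one provided in the SPM manual.

```python
def score_answer_sheet(responses, scoring_key):
    """Return the number of items answered correctly.

    Items missing from the responses simply score zero.
    """
    return sum(
        1 for item, correct_option in scoring_key.items()
        if responses.get(item) == correct_option
    )


# Toy key: A1 follows the worked example in the instructions; A2 and A3 are
# hypothetical placeholders, not the published key.
toy_key = {"A1": 4, "A2": 3, "A3": 6}
toy_responses = {"A1": 4, "A2": 2, "A3": 6}

toy_score = score_answer_sheet(toy_responses, toy_key)
print(toy_score)
```

With the full 60-item key the same function would return a score between 0 and 60.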
5.11 The proposed and achieved sample size
The researcher succeeded in achieving 100% of the target sample size for both the
pre-university school students and the university students. In the cities and in the
villages alike, 90 students (45 males and 45 females) were chosen at each of the 10
educational levels. This led to a total of 1800 students (900 male and 900 female) who
took the test (900 from the nine villages and 900 from the two cities). Among
university students, 100 students (50 male and 50 female) from each discipline were
chosen at each of the 4 study levels, giving 200 students in each year of university:
100 Science students (50 male and 50 female) and 100 Arts students (50 male and 50
female). This led to a total of 800 students (400 male and 400 female).
5.12 Data Statistical Analysis
This section discusses data preparation and cleaning and the rationale for the
statistical tests used in this study. The data collected were imported into SPSS
(version 16). The data were then screened for errors and missing values, and the
analysis was carried out in SPSS (version 16).
First, descriptive statistics (frequency distributions, means, standard deviations and
charts) were produced for all study variables to present an overview of the
performance of Libyan participants on the SPM test. The normality of the data was also
tested using the Kolmogorov-Smirnov test and normal probability plots. The data showed
a normal distribution.
Second, to compute differences between SPM test means, the independent-samples t-test
was used when one continuous dependent variable (SPM test scores) was examined and
subjects were divided into two groups, e.g. male and female, science and arts
disciplines, or cities and villages (Pallant, 2007). The analysis based on region and
geographic area was not carried out on university students, because all university
students were in the cities and there were no universities in villages.
Third, to compute differences between SPM test means, one-way analysis of variance
was used when one continuous dependent variable (SPM test score) was examined and the
sample was divided into more than two groups, e.g. by age (Pallant, 2007).
Fourth, to compute differences between SPM test means, two-way analysis of variance
was used when one continuous dependent variable (SPM test score) was examined and the
sample was divided by two independent variables, e.g. gender and age, or region and
age. This analysis allowed the investigation of the individual and joint effects of
two independent variables on one dependent variable (Pallant, 2007).
Fifth, the effect size of differences between SPM means was investigated by
calculating Cohen's d, which is equal to the difference between the two means divided
by the mean of the two standard deviations. In addition, Cohen's d was used to
calculate the difference in IQ points, which was equal to d multiplied by the IQ
standard deviation (15).
Sixth, variability was evaluated with variance ratios (VR), the variance being the
average of the squared differences from the mean (Lynn and Irwing, 2004).
Seventh, to convert SPM mean scores to IQ scores, British and American percentile
norms and a conversion table from percentiles to IQ scores were used; that is, the
British and USA norms for the Standard Progressive Matrices were used to calculate the
IQ of the Libyan sample. This method has been used in many recent studies, such as
Lynn and Vanhanen (2006), Abdel-Khalek and Lynn (2006), Keleefa and Lynn (2008a),
Keleefa et al. (2008b), Abdel-Khalek and Lynn (2009) and Lynn (2009). In addition,
Kaplan and Saccuzzo (1997) concluded that Raven was regarded as one of the major
authorities in the psychological testing field in the 21st century.
Eighth, the Pearson product-moment correlation coefficient was used to examine
correlational relationships between continuous variables. The direction and strength
of such relationships (between SPM test scores and Student's Academic Achievement
(SAA)) were interpreted following these guidelines: r = 0.10 small effect, r = 0.30
medium effect and r = 0.50 large effect (Field and Hole, 2005). The Pearson
product-moment correlation coefficient was also used to assess internal-consistency
validity (correlation coefficients between the SPM test total score and the SPM test
sets) and criterion-related validity (correlation coefficients between SPM test scores
and students' academic achievement) (Anastasi and Urbina 1997).
Ninth, stepwise multiple regression was used to investigate which independent variable
(gender, age, SAA, or region: urban (city) and rural (village)) was the best predictor
of SPM scores (Pallant, 2007).
Tenth, the reliability of SPM test scores was investigated using the split-half, Alpha
(KR-20) and test-retest methods. In the split-half method, items were divided into odd
and even items, because the items were arranged in order of difficulty (Kline 2000).
Alpha (KR-20) estimated how test items related to each other and to the total test. It
is useful for multiple-choice items that are scored as right or wrong (Anastasi and
Urbina 1997; Mills and Airasian 2006). Test-retest reliability correlated scores on
the test when it was administered on two occasions (Kline 2000).
Eleventh, two different methods were used for validity estimation: the first was
construct validity, assessed using factor analysis and internal consistency, and the
second was criterion-related validity, using SAA as an external criterion. Owing to
the lack of standardized mental tests in Libya, it was not possible in this study to
use any other intelligence test as an external criterion to investigate the validity
of the SPM test.
Twelfth, item analysis (item difficulty and item discrimination) was carried out.
(a) Item difficulty: the proportion of respondents who answered an item correctly. If
most respondents answered an item correctly, it was an easy item; if most respondents
answered it incorrectly, it was a difficult item (Brown, 1983).
(b) Item discrimination: the item discrimination index showed whether items
differentiated between people with varying degrees of knowledge and ability (Brown,
1983). The point-biserial correlation between “pass/fail” on each item and the total
test score was used to investigate the discrimination ability of the SPM items
(Anastasi 1988; Anastasi and Urbina 1997).
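The two item statistics just described can be illustrated with a minimal Python
sketch. It is not the SPSS procedure used in the study: the function names and the
four-person toy data set are assumptions, the total score includes the item itself,
and the population standard deviation is used in the point-biserial formula
r = (M_pass - M_fail) / SD * sqrt(p * q).

```python
from statistics import mean, pstdev


def item_difficulty(responses):
    """Proportion of respondents answering each item correctly.

    `responses` is a list of per-person lists of 0/1 item scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[i] for person in responses) / n_people
            for i in range(n_items)]


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score."""
    passed = [t for x, t in zip(item_scores, total_scores) if x == 1]
    failed = [t for x, t in zip(item_scores, total_scores) if x == 0]
    spread = pstdev(total_scores)
    if not passed or not failed or spread == 0:
        raise ValueError("correlation is undefined for this item")
    p = len(passed) / len(item_scores)  # proportion passing the item
    return (mean(passed) - mean(failed)) / spread * (p * (1 - p)) ** 0.5


# Toy data: four respondents, three items of increasing difficulty.
toy_responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_totals = [sum(person) for person in toy_responses]

difficulties = item_difficulty(toy_responses)
second_item = [person[1] for person in toy_responses]
discrimination = point_biserial(second_item, toy_totals)
print(difficulties, round(discrimination, 3))
```

A higher difficulty value means an easier item, and a discrimination value close to
zero would flag an item that does not separate high and low scorers.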
This chapter discussed in detail the methodology and theoretical perspectives
underpinning this study. Non-experimental quantitative research designs (descriptive
and comparative surveys, correlational and cross-sectional) were used. Ethical
considerations were addressed. A pilot study was conducted; its results showed that
the SPM test was valid and reliable, and the test was subsequently recommended for use
with Libyan students. The sample size of 2600 students (aged 8 to 21 years) was based
on two previous British standardizations of the SPM test. The sampling process used a
multi-stage, cluster, disproportional stratified sampling technique. The study
involved 72 schools located in 11 different settlements (nine villages and two cities)
and two universities located in two cities, Al-Beida and Al-Marj. The researcher
succeeded in achieving 100% of the target sample size. A meta-analysis was carried out
to compare performance on the SPM test for the Libyan sample with that of other
countries. Finally, the statistical tests employed were described and their rationales
justified. The next chapter presents the results for the Libyan SPM sample. The
meta-analysis is discussed in chapter seven.
Chapter 6 Results
6.1 Introduction
This study represented a preliminary standardization of the SPM test on a Libyan
sample, to develop norms for the classical form of the Standard Progressive Matrices
(SPM) test in Libya and to identify the distribution of IQ scores in a sample of
Libyan students. Seven research objectives are addressed and their results analyzed in
this chapter. Discussion of the meaning and significance of the results is postponed
to the next chapter. The analysis was carried out in SPSS (version 16) against the
following objectives:
1. To determine the psychometric characteristics (reliability, validity, difficulty
and discrimination) of the SPM test.
2. To study the relationship between SPM mean scores and students' academic
achievement (SAA).
3. To investigate the presence of significant differences in sample performances on
the SPM test according to gender, region (cities and villages), academic discipline
(science and arts), geographical areas (main city, secondary city, coastal, mountain
and desert), age and study levels.
4. To investigate the presence of significant differences in sample performance on the
SPM test according to region and gender, age and region, region and study levels,
geographic areas and gender, academic discipline and gender, age and gender and age
and academic discipline.
5. To investigate the variability of SPM mean scores between genders according to age,
geographic areas and academic discipline.
6. To examine the contribution of the independent variables gender, age and regions
and academic achievement in predicting SPM scores.
7. To compute the percentile ranks for the SPM scores according to the sample age
levels.
In addition, an eighth research objective, which dealt with comparing performance on
the SPM test for the Libyan sample with that of other countries (meta-analysis), was
carried out and is reported in chapter seven. The data obtained were tested for
normality. For this, the Kolmogorov-Smirnov and Shapiro-Wilk tests (table 6.1) and
normal probability plots (figures 6.1, 6.2, 6.3 and 6.4) were used.
Table 6.1 Descriptive statistics of overall collected data and tests of normality.
Descriptive statistics               Statistic    Std. Error
Mean                                 32.31        .234
95% Confidence Interval for Mean
  Lower bound                        31.85
  Upper bound                        32.76
5% Trimmed Mean                      32.40
Median                               33.00
Variance                             142.670
Std. Deviation                       11.94
Minimum                              6
Maximum                              58
Range                                52
Interquartile Range                  19
Skewness                             -.217        .073
Kurtosis                             -.596        .146
Tests of normality
Test                                 Statistic    df      Sig.
Kolmogorov-Smirnov                   .070         2600    .005
Shapiro-Wilk                         .971         2600    .005
Figure 6.1 Histogram showing normal distribution for means scores.
Figure 6.2 Normal Q-Q plot.
Figure 6.3 Detrended normal Q-Q plot.
Figure 6.4 Box plot of scores distribution (variable totaliq, N = 2600).
Figure 6.1 is a histogram of the SPM scores, which appeared to be normally
distributed. Figure 6.2 shows a normal probability plot (normal Q-Q plot), in which
the observed value of each score is plotted against its expected value under a normal
distribution. A reasonably straight line suggested a normal distribution. Figure 6.3
shows the detrended normal Q-Q plot, in which the actual deviations of the scores from
the straight line are plotted. Most scores were collected around the zero line with no
real clustering, which indicated a normal distribution. Figure 6.4 shows a box plot:
the box represents 50% of the scores, the line inside the box represents the median
value, and the whiskers represent the highest and lowest values.
The results of both tests of normality were statistically significant (p = 0.000).
However, the sample in this study was large, and with large samples significant
results on these tests are quite common and do not by themselves indicate a departure
from normality (Pallant, 2007). In addition, Pearson's skewness coefficient was used
to verify the normal distribution. Pearson's skewness coefficient is a measure of
skewness (Duffy and Jacobsen, 2005), defined as:
Skewness coefficient = (mean - median)/SD
Hildebrand (1986) stated that skewness values above 0.2 or below -0.2 indicate severe
skewness. The skewness coefficient in this sample was -0.05, indicating minor
skewness. Taken together, the above checks indicated that the sample was normally
distributed and that parametric tests could be applied with confidence to analyze the
data.
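As a worked check of the skewness coefficient defined above, the following Python
sketch applies the formula to the mean, median and standard deviation reported in
table 6.1. The function name and the way the 0.2 threshold is applied are illustrative
assumptions.

```python
SEVERE_SKEW_THRESHOLD = 0.2  # cut-off attributed to Hildebrand (1986) in the text


def pearson_skewness(mean_value, median_value, standard_deviation):
    """Skewness coefficient defined as (mean - median) / SD."""
    if standard_deviation <= 0:
        raise ValueError("standard_deviation must be positive")
    return (mean_value - median_value) / standard_deviation


# Values taken from table 6.1.
coefficient = pearson_skewness(32.31, 33.00, 11.94)
is_severe = abs(coefficient) > SEVERE_SKEW_THRESHOLD
print(round(coefficient, 3), is_severe)
```

The result is about -0.058, a small negative value well inside the range of -0.2 to
0.2 and in line with the minor skewness reported above.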
6.2 Description of students and SPM score means
A total of 2600 Libyan students participated in this study. Students were divided into
subgroups according to gender, age, region, geographic area and academic discipline.
The test was taken by 1800 school students (900 males and 900 females) and 800
university students (400 males and 400 females). By region, 900 school students were
from cities and the remaining 900 were from villages. They were chosen from 72 schools
located in 11 different settlements: nine villages (27 schools) and two cities (45
schools). The 800 university students were from two universities located in two
cities, Al-Beida and Al-Marj, during the academic year 2007-2008. Of them, 400
students were from the science discipline and 400 from the arts discipline.
The following tables show descriptive statistics of the SPM score means according to
gender, region, geographic area, study level, academic discipline and age. Table 6.2
shows SPM score means and standard deviations according to the independent variables.
Table 6.2 SPM score means and standard deviations
Groups          (N)     Mean    SD      Mi    Ma
Gender
Males           1300    32.49   12.06   6     57
Females         1300    32.12   11.81   6     58
Total           2600    32.31   11.94   6     58
Regions
Cities          900     28.49   11.75   6     57
Villages        900     28.18   10.51   6     51
Total           1800    28.33   11.15   6     57
Geographic areas
Main-C          600     28.66   11.95   6     57
Secondary-C     300     28.54   11.38   7     55
Coastal-V       300     28.50   10.53   7     50
Mountain-V      300     27.50   10.13   7     48
Desert-V        300     28.12   10.76   6     51
Total           1800    28.33   11.15   6     57
Academic discipline
Science         400     42.34   8.56    12    58
Arts            400     40.16   7.88    12    57
Total           800     41.25   8.29    12    58
Study levels
Elementary      720     19.96   8.38    6     49
Preparatory     540     31.39   8.77    8     52
Secondary       540     36.43   8.68    10    57
University      800     41.25   8.29    12    58
Total           2600    32.31   11.94   6     58
Age
8               180     15.82   6.33    6     43
9               180     17.92   6.67    6     40
10              180     20.89   7.99    6     42
11              180     25.21   9.16    8     49
12              180     28.65   8.89    9     48
13              180     32.10   8.50    9     49
14              180     33.42   8.21    8     52
15              180     34.63   8.13    12    55
16              180     36.04   8.94    10    57
17              180     38.62   8.54    12    55
18              200     39.30   9.22    12    58
19              200     41.22   8.30    16    57
20              200     41.91   7.90    12    56
21              200     42.56   7.34    22    57
Total           2600    32.31   11.94   6     58
C = city and V = villages. Mi is the minimum score, Ma is the maximum score.
Based on gender, the mean score of males was only slightly higher than that of
females. Based on region, the mean for cities was only slightly higher than that for
villages. Similarly, based on geographic area, the main city showed a slightly higher
mean score than the other geographic areas. With regard to age, mean scores increased
with age; the highest mean was achieved by 21-year-old students. According to study
level, mean scores increased with study level; the highest mean was achieved at
university level. Based on academic discipline, science students obtained a
significantly higher mean than arts students.
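To give a sense of the size of these differences, the sketch below applies the effect
size described in the methodology (Cohen's d taken as the difference between two means
divided by the mean of the two standard deviations, and the IQ point difference taken
as d multiplied by 15) to the gender and academic discipline rows of table 6.2. The
function names are hypothetical, and this is an illustration rather than the analysis
reported in the study.

```python
IQ_SD = 15  # standard deviation of the IQ scale, as stated in the methodology


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Difference between two means divided by the mean of the two SDs."""
    mean_sd = (sd_a + sd_b) / 2
    if mean_sd <= 0:
        raise ValueError("standard deviations must be positive")
    return (mean_a - mean_b) / mean_sd


def iq_point_difference(d):
    """Express an effect size d on the IQ scale."""
    return d * IQ_SD


# Means and standard deviations taken from table 6.2.
d_gender = cohens_d(32.49, 12.06, 32.12, 11.81)      # males vs females
d_discipline = cohens_d(42.34, 8.56, 40.16, 7.88)    # science vs arts

print(round(d_gender, 3), round(iq_point_difference(d_gender), 2))
print(round(d_discipline, 3), round(iq_point_difference(d_discipline), 2))
```

On these figures the gender difference corresponds to well under one IQ point, while
the difference between science and arts students is close to four IQ points.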
To establish the first research objective, which is to determine the psychometric
characteristics (reliability, validity, difficulty and discrimination) of the SPM test, the
following procedures were conducted:
1. Reliability of the SPM test was evaluated using three methods:
• Test-retest reliability.
• Split-half reliability.
• Cronbach's Alpha reliability (Kuder-Richardson Formula 20).
2. Validity of the SPM test was investigated using two methods:
• Construct Validity using Factor analysis and internal consistency.
• Criterion-related validity, the student’s overall scores in final examinations
(SAA) were taken as an external criterion.
3. Item analysis to ascertain item difficulty and discrimination.
6.3 Reliability of the SPM Test
Reliability refers to the consistency of scores obtained by the same person when
retested with the same test or an equivalent form. To establish the reliability of the
SPM test when used with Libyan students, three different methods were employed. The
first method was split-half reliability with the total sample (N = 2600), the second
was coefficient Alpha (KR-20), which was also used with the total sample (N = 2600),
and the third was test-retest reliability with a sample of 280 students.
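For readers unfamiliar with KR-20, a minimal Python sketch of the coefficient for
items scored right or wrong is given here. It uses the standard formula
KR-20 = k/(k - 1) * (1 - sum(p*q) / variance of total scores) with the population
variance; the function name and the four-person toy data are assumptions for
illustration and do not reproduce the SPSS computation on the full sample.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a list of per-person 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Toy data: four respondents, three items.
toy_responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_kr20 = kr20(toy_responses)
print(round(toy_kr20, 2))
```

With real data each inner list would hold one student's 60 item scores.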
6.3.1 Test-retest reliability of the SPM test
The test-retest method was used to evaluate reliability as a measure of the stability
of students' scores on the SPM test over a period of time. The SPM test was
administered twice to a group of 280 Libyan students (140 males and 140 females). The
interval between test and retest was two weeks. Table 6.3 shows the SPM test-retest
reliabilities according to age group, gender and study level.
Table 6.3 SPM test-retest reliabilities according to age, gender and study levels
AGE GROUPS     STUDY LEVELS   MALES        FEMALES      TOTAL
                              N     r      N     r      N     r
8-11           Elementary     40    .86    40    .87    80    .87
12-14          Preparatory    30    .88    30    .87    60    .88
15-17          Secondary      30    .88    30    .91    60    .91
18-21          University     40    .92    40    .91    80    .92
Total Sample                  140   .89    140   .89    280   .90
The SPM test-retest reliability ranged from 0.86 for male students aged 8-11
years (N=40) to 0.92 for male university students (N=40) and for the university group
as a whole (N=80). The overall test-retest reliability was 0.90.
6.3.2 Split-half reliability
The split-half method was used to investigate the reliability of the SPM test. The SPM
items were divided into odd and even items, as the items are arranged in order of
difficulty (Kline, 2000). The split-half reliability was then corrected using the
Spearman-Brown prophecy formula. Although this is a general formula that can be used to
address a variety of questions about test length and reliability, it is presented here
because it is extensively used in calculating the “corrected” split-half reliability
(Kline, 2000; Kline, 2005). The reliability coefficients were computed separately
for male and female students, for each age and for the total sample. Table 6.4 shows the
SPM split-half reliabilities according to gender, age and total sample.
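The correction described above can be sketched in a few lines. This is a minimal illustration of the general Spearman-Brown formula, not the SPSS routine used in the study; the function name spearman_brown and its length_factor parameter are hypothetical names chosen for the example. With a length factor of 2 it reduces to the familiar split-half correction 2r / (1 + r).

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability of a test `length_factor` times as long.

    With length_factor = 2 this is the usual split-half correction,
    2r / (1 + r), applied to the correlation between the two halves.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Half-test correlation for the total sample as reported in Table 6.4.
corrected_total = spearman_brown(0.92)
print(round(corrected_total, 2))
```

The same function can be applied to any of the split-half correlations in Table 6.4; small discrepancies with the tabled values are to be expected because the tabled correlations are rounded to two decimals.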
Table 6.4 SPM split-half reliabilities according to gender, age and total sample
AGE     MALES                  FEMALES                TOTAL
        N      SH (r)   SB     N      SH (r)   SB     N      SH (r)   SB
8       90     .77      .88    90     .85      .92    180    .81      .90
9       90     .84      .91    90     .77      .88    180    .80      .89
10      90     .79      .88    90     .84      .91    180    .83      .91
11      90     .90      .95    90     .88      .94    180    .89      .94
12      90     .80      .89    90     .87      .93    180    .84      .91
13      90     .82      .90    90     .85      .92    180    .84      .91
14      90     .83      .91    90     .84      .91    180    .84      .91
15      90     .81      .90    90     .87      .93    180    .84      .91
16      90     .88      .94    90     .89      .94    180    .89      .94
17      90     .86      .92    90     .89      .94    180    .88      .93
18      100    .87      .93    100    .88      .94    200    .88      .94
19      100    .90      .95    100    .86      .93    200    .88      .94
20      100    .90      .95    100    .88      .94    200    .89      .94
21      100    .91      .96    100    .86      .93    200    .89      .94
Total   1300   .92      .96    1300   .91      .96    2600   .92      .96
SH (r) = split-half correlation. SB = Spearman-Brown corrected coefficient (provided by SPSS).
Table 6.4 shows that the split-half reliability for the SPM test ranged from 0.77 to
0.92 and that its Spearman-Brown (SB) correction ranged from 0.88 to 0.96. In the total
sample (N=2600) the split-half reliability was 0.92 and the Spearman-Brown corrected
reliability was 0.96.
6.3.3 Alpha Reliability
Coefficient alpha, equivalent to the Kuder-Richardson 20 (KR-20) coefficient for
dichotomously scored items, indicates how items in a test relate to other test items and
to the total test. The KR-20 formula provides reliability estimates that are equivalent
to the average of the split-half reliabilities computed for all possible halves. In
addition, alpha (KR-20) is useful for multiple-choice items that are scored as right or
wrong (Anastasi and Urbina, 1997; Mills and Airasian, 2006). The reliability coefficients
were computed separately for gender, age and total sample. The results obtained are
given in table 6.5.
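As an illustration of the KR-20 coefficient described above, the following sketch computes it for a small persons-by-items matrix of right/wrong scores. The data are invented for the example and are not SPM responses; the function name kr20 is hypothetical, and the population variance (dividing by n) is assumed for the total scores.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 for a persons-by-items matrix of 0/1 item scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_persons
    # Population variance of the total scores (divide by n, not n - 1).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_persons
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical responses: 5 persons, 4 items ordered from easy to hard.
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
demo_kr20 = kr20(demo)
print(round(demo_kr20, 2))
```

For dichotomous items this value coincides with Cronbach's alpha computed on the same data with the same variance convention.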
Table 6.5 SPM alpha reliabilities according to gender, age and total sample
AGE     MALES           FEMALES         TOTAL
        N      Alpha    N      Alpha    N      Alpha
8       90     .85      90     .86      180    .86
9       90     .87      90     .86      180    .87
10      90     .87      90     .90      180    .90
11      90     .92      90     .91      180    .92
12      90     .88      90     .93      180    .91
13      90     .90      90     .90      180    .90
14      90     .89      90     .90      180    .89
15      90     .88      90     .90      180    .90
16      90     .93      90     .91      180    .92
17      90     .91      90     .91      180    .90
18      100    .91      100    .94      200    .93
19      100    .93      100    .90      200    .91
20      100    .89      100    .93      200    .92
21      100    .93      100    .92      200    .93
Total   1300   .96      1300   .94      2600   .94
Table 6.5 shows that the alpha reliabilities (KR-20) for the SPM ranged from 0.85 (males
aged 8) to 0.96 (total males). In the total sample the SPM alpha reliability (KR-20) was
0.94 (N=2600).
6.4 Validity of the SPM test
Validity is the degree to which a test measures what it is supposed to measure and,
consequently, permits appropriate interpretation of scores. To determine the validity
of the SPM test, two different methods were employed. The first method was
construct validity with the total sample (N = 2600); the second method was
criterion-related validity, which was also used with the total sample (N = 2600).
6.4.1 Construct Validity
Construct validity refers to whether a scale measures or correlates with a theorized
psychological construct (Cronbach and Meehl, 1955). It is concerned
with the extent to which a test measures a specific trait or construct. The term
construct is used to refer to something that is not itself directly measurable but which
explains observable effects. In other words, construct validation is the systematic
analysis of test scores designed to assess whether there is a basis for validity. Factor
analysis and internal consistency are two methods of construct validation (Anastasi and
Urbina, 1997).
6.4.1.1 Factor analysis of SPM test
This procedure shows the extent to which a set of items measures the same underlying
construct or dimension of a construct (Anastasi, 1988). To test the factorial
validity of the SPM test, the intercorrelations between the five sets of the SPM
test were initially subjected to principal components factor analysis, for the total
sample and then for males and females separately, to ascertain whether the items
contained a general factor and possibly other factors. In this procedure the number of
significant factors is normally taken to be those with eigenvalues greater than unity. An
eigenvalue is the amount of the total variance explained by the corresponding factor
(Tabachnick & Fidell, 2007). Table 6.6 and figure 6.5 show the results of the factor
analysis of the SPM score means for the entire sample.
Table 6.6 Set correlations for male and female students (N=2600, 8 to 21 years) and extracted factor
SET             CORRELATIONS                                   FACTOR 1
                A         B         C         D         E
A                                                              0.67
B               0.63**                                         0.84
C               0.57**    0.71**                               0.87
D               0.56**    0.70**    0.76**                     0.85
E               0.46**    0.55**    0.61**    0.60**           0.68
Eigenvalue                                                     3.47
% of variance                                                  69.41
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy                0.871
Bartlett's Test of Sphericity     Approx. Chi-Square           7323.359
                                  df                           10
                                  Sig.                         0.000
Figure 6.5 Scree Plot for the five Factors
Table 6.6 shows that all the correlation coefficients were statistically significant
(0.46 to 0.76). To indicate a moderate or higher relationship, correlation matrix coefficients
should be 0.3 or higher (r ≥ 0.3) in the principal component analysis. One highly loaded
factor (loadings from 0.67 to 0.87) was extracted, which accounted for 69.41% of the common
variance and was interpreted as Spearman’s “g”. These results indicate the internal consistency
and factorial validity of the test as a result of the homogeneity of its items. In addition,
the Kaiser-Meyer-Olkin value was 0.871, exceeding the recommended value of
0.6 (the minimum value for a good factor analysis) (Kaiser, 1970, 1974; Tabachnick &
Fidell, 2007), and Bartlett’s test of sphericity (Bartlett, 1954) reached statistical
significance (p < 0.001), supporting the factorability of the correlation matrix. As a further
investigation, factor analysis of the SPM test was computed by gender. The
following tables (Tables 6.7 and 6.8) and figures (Figures 6.6 and 6.7) show the factor
analysis of SPM score means for males and females respectively.
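The eigenvalue criterion described above can be illustrated directly from the inter-set correlations printed in Table 6.6. The sketch below is not the SPSS procedure used in the study: it assumes a plain power iteration (a swapped-in technique) to find the largest eigenvalue of the correlation matrix, and the names first_eigenvalue and second_upper_bound are hypothetical. Factor loadings obtained from a rounded correlation matrix need not match the tabled loadings exactly, so only the eigenvalue and the percentage of variance are computed.

```python
import math

# Inter-set correlations from Table 6.6 (sets A to E) as a full symmetric matrix.
R = [
    [1.00, 0.63, 0.57, 0.56, 0.46],
    [0.63, 1.00, 0.71, 0.70, 0.55],
    [0.57, 0.71, 1.00, 0.76, 0.61],
    [0.56, 0.70, 0.76, 1.00, 0.60],
    [0.46, 0.55, 0.61, 0.60, 1.00],
]


def first_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))  # norm of R v
        v = [x / eigenvalue for x in w]
    return eigenvalue


eigenvalue_1 = first_eigenvalue(R)
percent_variance = 100.0 * eigenvalue_1 / len(R)

# For a symmetric matrix the squared eigenvalues sum to the sum of squared
# entries, so this bounds every remaining eigenvalue from above.
sum_sq_entries = sum(x * x for row in R for x in row)
second_upper_bound = math.sqrt(sum_sq_entries - eigenvalue_1 ** 2)

print(round(eigenvalue_1, 2), round(percent_variance, 1))
print(second_upper_bound < 1.0)  # True means only one eigenvalue exceeds unity
```

Because the bound on the remaining eigenvalues falls below unity, the eigenvalue-greater-than-one rule retains a single factor for this matrix.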
Table 6.7 Set correlations for male students (N=1300, 8 to 21 years) and extracted factor
SET             CORRELATIONS                                   FACTOR 1
                A         B         C         D         E
A                                                              0.70
B               0.64**                                         0.84
C               0.58**    0.70**                               0.85
D               0.59**    0.70**    0.75**                     0.86
E               0.46**    0.56**    0.60**    0.61**           0.69
Eigenvalue                                                     3.49
% of variance                                                  69.76
Bartlett's Test of Sphericity     df                           10
                                  Sig.                         0.000
Figure 6.6 Scree Plot for the five Factors.
Table 6.7 shows that all the correlation coefficients were statistically significant
(0.46 to 0.75). One highly loaded factor (loadings from 0.69 to 0.86) was extracted, which
accounted for 69.76% of the common variance and was interpreted as Spearman’s “g”. These
results indicated the internal consistency and factorial validity of the test as a result
of the homogeneity of its items. The Kaiser-Meyer-Olkin value was 0.874, and
Bartlett’s test of sphericity reached statistical significance (p < 0.001), supporting the
factorability of the correlation matrix.
Table 6.8 Set correlations for female students (N=1300, 8 to 21 years) and extracted factor
SET             CORRELATIONS                                   FACTOR 1
                A         B         C         D         E
A                                                              0.67
B               0.62**                                         0.84
C               0.56**    0.72**                               0.88
D               0.54**    0.69**    0.78**                     0.85
E               0.46**    0.55**    0.62**    0.59**           0.68
Eigenvalue                                                     3.46
% of variance                                                  69.22
Bartlett's Test of Sphericity     df                           10
                                  Sig.                         0.000
Figure 6.7 Scree Plot for the five Factors
Table 6.8 shows that all the correlation coefficients were statistically significant
(0.46 to 0.78). One highly loaded factor (loadings from 0.67 to 0.88) was extracted, which
accounted for 69.22% of the common variance and was interpreted as Spearman’s “g”. These
results indicated the internal consistency and factorial validity of the test as a result
of the homogeneity of its items. The Kaiser-Meyer-Olkin value was
0.865, and Bartlett’s test of sphericity reached statistical significance (p < 0.001),
supporting the factorability of the correlation matrix.
6.4.1.2 Internal consistency validity
Internal consistency is a measure based on the correlations between different
subscales of the same test and the total score. It measures whether several subscales
that purport to measure the same general construct produce similar scores (Anastasi,
1988; Anastasi & Urbina, 1997). Pearson product-moment correlation
coefficients between the five sets and the total scores of the SPM test were computed
for validity estimation. Table 6.9 shows the correlation coefficients between the five sets
and the total scores of the SPM test for the entire sample.
Table 6.9 Correlation coefficients between the five sets and the total scores of the
SPM test (n=2600, age 8 to 21 years)
SETS       CORRELATIONS
           Total A   Total B   Total C   Total D   Total E
Total A    1.000
Total B    0.64**    1.000
Total C    0.59**    0.71**    1.000
Total D    0.56**    0.69**    0.74**    1.000
Total E    0.50**    0.57**    0.62**    0.64**    1.000
Total      0.72**    0.84**    0.85**    0.85**    0.74**
** Correlation is significant at the 0.01 level
The relationship between the sub-scales and the total score of the SPM test was
evaluated using Pearson product-moment correlation coefficients. There were strong
and statistically significant positive correlation coefficients between the five sets (A,
B, C, D and E) and total scores, ranging from 0.50 to 0.85, n = 2600 (p < 0.01). In
addition, the internal consistency of the SPM test was computed by gender.
Table 6.10 shows the correlation coefficients between the five sets and the total scores of
the SPM test for males and females respectively.
Table 6.10 Correlation coefficients between the five sets and the total scores of the
SPM test (males n=1300 and females n=1300, age 8 to 21 years)
MALE n=1300      SETS CORRELATIONS
           Total A   Total B   Total C   Total D   Total E
Total A    1.000
Total B    0.65**    1.000
Total C    0.58**    0.69**    1.000
Total D    0.59**    0.69**    0.73**    1.000
Total E    0.51**    0.58**    0.63**    0.65**    1.000
Total      0.71**    0.83**    0.84**    0.85**    0.74**
FEMALE n=1300    SETS CORRELATIONS
           Total A   Total B   Total C   Total D   Total E
Total A    1.000
Total B    0.64**    1.000
Total C    0.59**    0.71**    1.000
Total D    0.54**    0.68**    0.754**   1.000
Total E    0.50**    0.55**    0.62**    0.63**    1.000
Total      0.72**    0.85**    0.87**    0.85**    0.74**
** Correlation is significant at the 0.01 level
The relationship between the five sets and the total scores of the SPM test was
investigated using Pearson product-moment correlation coefficients. There were
strong, positive and statistically significant correlation coefficients between the five
sets (A, B, C, D and E) and total scores, ranging from 0.51 to 0.85 (p < 0.01) for males
and from 0.50 to 0.87 (p < 0.01) for females.
6.4.2 Criterion-related validity
To evaluate the validity of the SPM against Students’ Academic Achievement (SAA), the
total of final examination scores was used as the criterion (predictive validity),
that is, the correlation between test scores and a criterion that
occurs at a later point in time. The second research objective also focused on
establishing the relationship between SPM scores and students’ scores in final school
and university exams in all studied courses (SAA), and Pearson product-moment
correlations were used. Table 6.11 shows the correlation between the SPM scores and
the students’ academic achievement scores in final school and university exams in all
studied courses (SAA) according to age, levels of study, gender and total sample.
Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample
Age and level of study variables N= 2600
Elementary        Preparatory       Secondary         University
N= 720            N= 540            N= 540            N= 800
Age     r         Age     r         Age     r         Age     r
8       .56**     12      .41**     15      .37**     18      .37**
9       .41**     13      .39**     16      .43**     19      .50**
10      .37**     14      .33**     17      .50**     20      .47**
11      .41**     Total   .38**     Total   .43**     21      .41**
Total   .44**                                         Total   .44**
Gender variable N= 2600             Academic discipline variable N= 800
Gender    r                         Discipline    r
Male      .42**                     Arts          .41**
Female    .43**                     Science       .51**
Total     .42**                     Total         .46**
(1) r = Pearson correlation. (2) ** Correlation is significant at the 0.01 level.
Results in table 6.11 show that the validity coefficients between the SPM scores
and students’ SAA ranged from 0.33 to 0.56. For arts students the correlation
between the SPM scores and their SAA was 0.41, and for science students it was 0.51.
The correlation for both samples combined (science and arts; n = 800) between the
students’ SAA and their SPM scores was 0.46, which was statistically significant
(p < 0.01). In general, all correlation coefficients
between SPM scores and students’ SAA were statistically significant for all groups.
6.5 Item Analysis of the SPM test
Item analysis was used in this study to investigate the difficulty and discrimination
power of the items. An item analysis was performed on the SPM test based upon the
total sample (N=2600). Table 6.12 shows the difficulty levels of the SPM
items, Table 6.14 shows item discrimination and Table 6.15 presents a summary of
the item analysis.
6.5.1 Item Difficulty
The SPM test consisted of 5 sets of items, lettered (A, B, C, D, and E). Each set
consists of 12 items which become progressively more difficult. Furthermore the level
of difficulty increases from set A to set E.
Item difficulty is defined as the percentage of students obtaining the correct answer to
an item. The higher the value of the difficulty index, the easier the item. Table 6.12
showed the item difficulty indices of the five SPM sets for total sample.
Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)
Set   Diff    1     2     3     4     5     6     7     8     9     10    11    12
A     Diff    100   99    97    95    94    92    74    75    82    70    45    34
B     Diff    97    90    82    75    66    64    50    43    49    57    41    33
C     Diff    79    76    69    65    63    49    54    40    51    30    23    9
D     Diff    84    73    65    61    70    58    54    52    49    39    22    7
E     Diff    60    42    40    26    24    23    21    12    11    7     5     4
SPM means of the percent of correct answers.
Set     A      B      C      D      E
Means   0.79   0.62   0.57   0.58   0.35
It was clear from table 6.12 that 11 SPM items, which were answered correctly by 80-100%
of the students, appeared to be easy, and 7 of these items were from set A. 42 SPM items,
which were answered correctly by 21-79% of the students, appeared to be moderate in
difficulty, and 7 SPM items, which were answered correctly by less than 20% of the
students, appeared to be too difficult.
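The banding just described can be reproduced from the percentages printed in Table 6.12. The sketch below is only an illustration: the function name band and the dictionary layout are hypothetical, and the cut-offs (80% and above easy, 20% and below difficult, otherwise moderate) are taken from the bands used in the text.

```python
# Percentages of correct answers per item, copied from Table 6.12.
difficulty = {
    "A": [100, 99, 97, 95, 94, 92, 74, 75, 82, 70, 45, 34],
    "B": [97, 90, 82, 75, 66, 64, 50, 43, 49, 57, 41, 33],
    "C": [79, 76, 69, 65, 63, 49, 54, 40, 51, 30, 23, 9],
    "D": [84, 73, 65, 61, 70, 58, 54, 52, 49, 39, 22, 7],
    "E": [60, 42, 40, 26, 24, 23, 21, 12, 11, 7, 5, 4],
}


def band(percent_correct: int) -> str:
    """Classify an item by its difficulty index (percent answering correctly)."""
    if percent_correct >= 80:
        return "easy"
    if percent_correct <= 20:
        return "difficult"
    return "moderate"


# Count items per band, overall and within each set.
counts = {"easy": 0, "moderate": 0, "difficult": 0}
per_set = {}
for set_name, items in difficulty.items():
    per_set[set_name] = {"easy": 0, "moderate": 0, "difficult": 0}
    for percent in items:
        label = band(percent)
        counts[label] += 1
        per_set[set_name][label] += 1

print(counts)
print(per_set["A"])
```

The overall counts agree with the 11 easy, 42 moderate and 7 difficult items reported for Table 6.12.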
In addition, it was evident from table 6.12 that three items in set A (A7, A8 and A9);
four items in set B (B7, B8, B9 and B10); three items in set C (C7, C8, and C9); and
three items in set D (D3, D4 and D5) did not follow an order of increasing
difficulty, whereas set E followed an order of increasing difficulty.
According to the 2004 SPM manual, items should steadily increase in difficulty
within the series, as Raven claimed. In order to test this, the degree of difficulty of
the 60 items and five sets of the SPM test was measured by means of the percent of
correct answers. Table 6.12 shows the SPM means of the percent of correct answers
for each SPM set. The set D mean was higher than that of set C, which suggested that
set D was comparatively easier than set C. Inspection of the mean for each item and set
showed that only thirteen items and one set appeared to be of misplaced difficulty.
6.5.2 Item Discrimination
The discrimination index shows whether items differentiate between people with
varying degrees of knowledge and ability. It is the percentage of the “high” group
passing the item minus the percentage of the “low” group passing the item. The
point biserial correlation coefficient is also a measure of item
discrimination. The point biserial correlation between “pass/fail” on each item and
total test score was used to investigate the SPM item discrimination (Brown, 1983;
Anastasi, 1988; Anastasi and Urbina, 1997; Kline, 2000; Roid and Barram, 2004; Kline,
2005). The greater the correlation of the item with the total score, the more
discriminating the item is, i.e. it discriminates between the higher and lower groups
more effectively. For an item to be valid, the correlation between the item and total
scores should be fairly high.
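A minimal sketch of the point biserial correlation described above is given here. The item and total scores are invented for the illustration, the function name point_biserial is hypothetical, and the population standard deviation of the total scores is assumed. The function returns None when everyone passes (or everyone fails) an item, since such an item has no variance and no correlation can be computed.

```python
import math
from typing import Optional, Sequence


def point_biserial(item: Sequence[int], totals: Sequence[float]) -> Optional[float]:
    """Point biserial correlation between a pass/fail item and total scores."""
    n = len(item)
    passed = [t for x, t in zip(item, totals) if x == 1]
    failed = [t for x, t in zip(item, totals) if x == 0]
    if not passed or not failed:
        return None  # the item has no variance
    mean_all = sum(totals) / n
    # Population standard deviation of the total scores.
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)
    if sd_all == 0:
        return None  # the totals have no variance
    p = len(passed) / n
    mean_diff = sum(passed) / len(passed) - sum(failed) / len(failed)
    return mean_diff / sd_all * math.sqrt(p * (1 - p))


# Hypothetical data: one item (1 = pass, 0 = fail) and five total scores.
demo_item = [1, 1, 1, 0, 0]
demo_totals = [50, 45, 40, 30, 20]
r_pb = point_biserial(demo_item, demo_totals)
no_variance = point_biserial([1, 1, 1, 1, 1], demo_totals)
print(round(r_pb, 2), no_variance)
```

The value is identical to the Pearson correlation between the 0/1 item scores and the totals, which is why it can be read as a correlation between item and test.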
Hopkins (1998) suggested that the indices of item discrimination can be evaluated in
the following terms (table 6.13):
Table 6.13 Index of Discrimination and Items Evaluation

Index of Discrimination Item Evaluation
(a) 0.40 and up Excellent discrimination
(b) 0.30 to 0.39 Good discrimination
(c) 0.10 to 0.29 Fair discrimination
(d) 0.01 to 0.10 Poor discrimination
Negative Item may be miskeyed or intrinsically ambiguous
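The bands in Table 6.13 can be written as a small lookup. This is a sketch only: the function name hopkins_label is hypothetical, and because the tabled "fair" and "poor" bands both include 0.10, the sketch assumes that a value of exactly 0.10 counts as fair and that values between 0.00 and 0.10 count as poor.

```python
def hopkins_label(index: float) -> str:
    """Evaluate a discrimination index using the bands in Table 6.13."""
    if index < 0:
        return "item may be miskeyed or intrinsically ambiguous"
    if index >= 0.40:
        return "excellent"
    if index >= 0.30:
        return "good"
    if index >= 0.10:  # assumption: exactly 0.10 is treated as fair
        return "fair"
    return "poor"


for value in (0.77, 0.35, 0.12, 0.05, -0.10):
    print(value, hopkins_label(value))
```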
Hopkins’ suggestion was used to analyze the point biserial correlation data. The
point biserial correlations between “pass/fail” for each SPM item and total test score
are shown in table 6.14.
Table 6.14 Point biserial and significant level for each SPM item
Set 1 2 3 4 5 6 7 8 9 10 11 12
A -- .12** .35** .42** .50** .46** .63** .56** .61** .67** .62** .52**
B .24** .41** .54** .54** .65** .61** .57** .71** .72** .74** .69** .61**
C .58** .57** .70** .65** .71** .60** .72** .60** .65** .50** .49** .12**
D .60** .76** .76** .76** .77** .73** .71** .63** .66** .63** .38** .14**
E .60** .61** .63** .63** .67** .60** .49** .50** .48** .33** .20** .11**
**Significant at 0.001
Overall, the correlations lay between r = 0.11 and r = 0.77 (p < 0.001), with a general
mean of r = 0.44 (p < 0.001). Fifty-nine of the 60 correlations calculated were
significant; the remaining item (A1) was so easy for this sample that it did not
generate any variance and hence no covariance was evident. Table 6.14 also shows that
the correlations ranged from r = 0.12 to 0.67 (p < 0.001) with a mean of r = 0.54 for
set A; from r = 0.24 to 0.74 (p < 0.001) with a mean of r = 0.59 for set B; from
r = 0.12 to 0.72 (p < 0.001) with a mean of r = 0.57 for set C; from r = 0.14 to 0.77
(p < 0.001) with a mean of r = 0.63 for set D; and from r = 0.11 to 0.67 (p < 0.001)
with a mean of r = 0.49 for set E.
According to Hopkins’ (1998) criteria, the SPM test had 51 items with excellent
discriminating value, 3 items with good discriminating value and 5 items with
fair discriminating value. With the remaining items, correlations ranged from
(r = 0.49 to 0.61; p < 0.001). This indicated that the SPM test contained many
discriminating items.
Table 6.15 presents a summary of tables 6.12 and 6.14. It shows the numbers of difficult
items, the numbers of discriminating items, the items not in order of difficulty, the
order of difficulty of the SPM sets and the order of the SPM sets by excellent discrimination.
Table 6.15 Summary of item analysis of the five SPM sets

Set     Item Difficulty          Item Discrimination   INO     ODS   EDS
        >80     21-79   <20      >.44      <.44        (N)     Set   Set
A       7       5       -        8         4           (3)     E     C
B       3       9       -        10        2           (3)     C     B
C       -       11      1        11        1           (4)     D     D
D       1       10      1        10        2           (3)     B     E
E       -       7       5        9         3           (-)     A     A
Total   11      42      7        48        12          (13)
(1) INO = Items not in order of increasing difficulty.
(2) ODS = Order of difficulty of the SPM sets from high to low.
(3) EDS = SPM sets in order of excellent discrimination from high to low.
From table 6.15 the following conclusions were drawn:
1. As designed, set A is the easiest set whereas set E is the most difficult set. Set A
had 5 items with a moderate difficulty level (21-79% correct); set B had 9 items; set C
had 11 items; set D had 10 items and set E had 7 items. The order of difficulty of
the five SPM sets according to the number of difficult items in each set, in order
from high to low, was E, C, D, B and A.
2. 48 out of 60 items had excellent discriminating value (above .44). Set A had 8 items,
sets B and D had 10 items each, set C had 11 items and set E had 9 items of excellent
discriminating value. The SPM sets in order of excellent discrimination from high to
low were C, B, D, E and A.
3. 13 items were not arranged in order of increasing difficulty. Set D had 4 items,
set A, B and C had 3 items each. No items were found in set E.
6.6 Differences in SPM scores
As mentioned at the beginning of this chapter, one of the objectives of this study was
to investigate the presence of significant differences in sample performance on the
SPM test according to gender, region (cities and villages), academic
discipline (science and arts), geographic nature (main city, secondary city, coastal,
mountain and desert), age and study levels. In addition, significant differences in
sample performance on the SPM test according to region and gender, age and region,
region and study levels, geographic nature and gender, academic discipline and
gender, age and gender, and age and academic discipline were examined. The
investigation of the differences was as follows:
6.6.1 Difference according to gender
An independent-samples t-test was carried out to compare the SPM score means with
regard to gender (table 6.16).
Table 6.16 Comparison of gender

Gender   (N)    Mean    SD      Std. Error Mean
Male     1300   32.49   12.06   .335
Female   1300   32.12   11.83   .328

                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F       Sig.            t       df     Sig.         Mean         Std. Error   95% Confidence Interval
                                                                     (2-tailed)   Difference   Difference   of the Difference
                                                                                                            Lower     Upper
Equal variances assumed       .479    .489            .789    2598   .430         .370         .469         -.594     1.288
Equal variances not assumed                           .789    2597   .430         .370         .469         -.594     1.288
This table shows that there was no significant difference in mean scores between
males and females (male mean = 32.49, SD = 12.06; female mean = 32.12, SD =
11.83; t (2598) = 0.789, p = 0.430). The magnitude of the difference in the means
(mean difference = 0.370, 95% CI: -.594 to 1.288) was very small (eta squared
= 0.0002). SPSS does not provide eta squared values for the t-test; the value was
calculated from the information provided in the output as t² / (t² + N1 + N2 - 2)
(Pallant, 2007).
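Since SPSS does not report eta squared for the t-test, the calculation from the output can be sketched as follows. The sketch assumes the formula given by Pallant (2007) for independent samples, eta squared = t² / (t² + N1 + N2 - 2); the function name eta_squared_from_t is hypothetical, and the inputs are the t value and group sizes from Table 6.16.

```python
def eta_squared_from_t(t: float, n1: int, n2: int) -> float:
    """Eta squared for an independent-samples t-test: t^2 / (t^2 + N1 + N2 - 2)."""
    return t ** 2 / (t ** 2 + (n1 + n2 - 2))


# t value and group sizes for the gender comparison in Table 6.16.
gender_effect = eta_squared_from_t(0.789, 1300, 1300)
print(f"{gender_effect:.5f}")
```

Under the commonly cited guidelines (0.01 small, 0.06 moderate, 0.14 large), eta squared is a proportion of variance and therefore always lies between 0 and 1.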
6.6.2 Difference according to regions (cities and villages).
An independent-samples t-test was carried out to compare the SPM score means with
regard to region (table 6.17).
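Table 6.17 Comparison of regions

Region     (N)   Mean    SD      Std. Error Mean
Cities     900   28.49   11.75   .392
Villages   900   28.18   10.51   .350

                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F       Sig.            t       df     Sig.         Mean         Std. Error   95% Confidence Interval
                                                                     (2-tailed)   Difference   Difference   of the Difference
                                                                                                            Lower     Upper
Equal variances assumed       13.43   .000            -.588   1798   .556         -.309        .525         -1.340    .721
Equal variances not assumed                           -.588   1777   .556         -.309        .525         -1.340    .721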
As Levene's test was significant, the t value for equal variances not assumed was
used (Pallant, 2007). There was no significant difference in scores between cities (mean
= 28.49, SD = 11.75) and villages (mean = 28.18, SD = 10.51; t (1777) = -0.588, p =
0.556). The magnitude of the difference in the means (mean difference = -0.309,
95% CI: -1.340 to .721) was very small (eta squared = 0.0002). SPSS does not
provide eta squared values for the t-test; the value was calculated from the information
provided in the output as t² / (t² + N1 + N2 - 2) (Pallant, 2007).
6.6.3 Difference according to academic discipline
An independent-samples t-test was carried out to compare the SPM score means with
regard to academic discipline (table 6.18).
Table 6.18 Comparison of academic discipline

Academic discipline   (N)   Mean    SD     Std. Error Mean
Science               400   42.34   8.56   .428
Arts                  400   40.16   7.88   .394

                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F       Sig.            t       df     Sig.         Mean         Std. Error   95% Confidence Interval
                                                                     (2-tailed)   Difference   Difference   of the Difference
                                                                                                            Lower     Upper
Equal variances assumed       2.537   .112            -3.76   798    .000         -2.178       .581         -3.32     -1.04
Equal variances not assumed                           -3.76   793    .000         -2.178       .581         -3.32     -1.04
Results showed that there was a statistically significant difference in scores between
the arts discipline (mean = 40.16, SD = 7.88) and the science discipline (mean = 42.34,
SD = 8.56; t (798) = -3.76, p = 0.000) in favour of science students. The magnitude of the
difference in the means (mean difference = -2.178, 95% CI: -3.32 to -1.04) was small
(eta squared = 0.017). SPSS does not provide eta squared values for the t-test; the
value was calculated from the information provided in the output as
t² / (t² + N1 + N2 - 2) (Pallant, 2007).
6.6.4 Difference according to geographic areas
One way ANOVA was conducted to compare the SPM means for the geographic
areas (table 6.19) and post hoc Tukey test for multiple comparisons (table 6.20).
Table 6.19 Comparison of geographic areas

Geographic areas   (N)    Mean    SD
Main city          600    28.66   11.946
Secondary city     300    28.54   11.379
Coastal            300    28.50   10.529
Mountain           300    27.50   10.131
Desert             300    28.12   10.756
Total              1800   28.33   11.145
Source           Sum of Squares   df     Mean Squares   F. Ratio   F. Prob.
Between Groups   309.571          4      77.393         .623       .646
Within Groups    223149.748       1795   124.317
Total            223459.320       1799
Table 6.20 Post Hoc Tukey (HSD) Test

(I) Geographic   (J) Geographic   Mean Difference   Std.    Sig.    95% Confidence Interval
areas            areas            (I-J)             Error           Lower Bound   Upper Bound
Main city        Coastal          .160              .780    1.000   -1.97         2.29
                 Mountain         1.160             .780    .571    -.97          3.29
                 Desert           .534              .780    .960    -1.60         2.66
                 Secondary city   .116              .820    1.000   -2.12         2.35
Secondary city   Coastal          .045              .945    1.000   -2.53         2.62
                 Mountain         1.045             .945    .804    -1.53         3.62
                 Desert           .418              .945    .992    -2.16         3.00
                 Main city        -.116             .820    1.000   -2.35         2.12
Coastal          Mountain         1.000             .910    .807    -1.49         3.49
                 Desert           .373              .910    .994    -2.11         2.86
                 Main city        -.160             .780    1.000   -2.29         1.97
                 Secondary city   -.045             .945    1.000   -2.62         2.53
Mountain         Coastal          -1.000            .910    .807    -3.49         1.49
                 Desert           -.627             .910    .959    -3.11         1.86
                 Main city        -1.160            .780    .571    -3.29         .97
                 Secondary city   -1.045            .945    .804    -3.62         1.53
Desert           Coastal          -.373             .910    .994    -2.86         2.11
                 Mountain         .627              .910    .959    -1.86         3.11
                 Main city        -.534             .780    .960    -2.66         1.60
                 Secondary city   -.418             .945    .992    -3.00         2.16
Participants were from five different geographic areas. The results showed that there
were no statistically significant differences in SPM scores between the five geographic
areas, F (4, 1795) = 0.623, p = 0.646. The effect size was calculated using eta squared,
dividing the between-groups sum of squares (309.571) by the total sum of squares
(223459.320) (Pallant, 2007). The resulting eta squared value was 0.001, which
indicated a very small effect size. Post-hoc comparisons using the Tukey HSD test
indicated that there were no statistically significant differences between any of the
five geographic areas.
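The F ratio and the eta squared value follow directly from the sums of squares in Table 6.19, as the sketch below illustrates. The function name one_way_summary is hypothetical; the inputs are the between-groups and within-groups sums of squares and their degrees of freedom as printed in the table.

```python
def one_way_summary(ss_between: float, ss_within: float,
                    df_between: int, df_within: int) -> dict:
    """F ratio and eta squared from a one-way ANOVA summary table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "F": ms_between / ms_within,
        # Eta squared: between-groups SS divided by total SS.
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Sums of squares and degrees of freedom from Table 6.19 (geographic areas).
geo = one_way_summary(309.571, 223149.748, 4, 1795)
print(round(geo["F"], 3), round(geo["eta_squared"], 3))
```

The same function can be applied to the ANOVA tables for age and study levels by substituting their sums of squares and degrees of freedom.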
6.6.5 Difference according to age.
One-way ANOVA was used to compare the SPM score means difference in regards to
age (table 6.21), and post hoc Tukey (HSD) test (table 6.22).
Table 6.21 Comparison according to age

Age (N) Mean SD Age (N) Mean SD
8 180 15.82 6.33 15 180 34.63 8.13
9 180 17.92 6.67 16 180 36.04 8.94
10 180 20.89 7.99 17 180 38.62 8.54
11 180 25.21 9.16 18 200 39.30 9.22
12 180 28.65 8.89 19 200 41.22 8.30
13 180 32.10 8.50 20 200 41.91 7.90
14 180 33.42 8.21 21 200 42.56 7.34
Total 2600 32.31 11.94
Source Sum of Squares Df Mean Squares F. Ratio F. Prob.
Between Groups 197151.289 13 15165.484 225.846 .000
Within Groups 173648.746 2586 67.150
Total 370800.035 2599
Table 6.22 Post Hoc Tukey (HSD) Tests
Age 8 9 10 11 12 13 14 15 16 17 18 19 20
8
9 .453
10 .000 .036
11 .000 .000 .000
12 .000 .000 .000 .005
13 .000 .000 .000 .000 .005
14 .000 .000 .000 .000 .000 .962
15 .000 .000 .000 .000 .000 .158 .980
16 .000 .000 .000 .000 .000 .000 .120 .936
17 .000 .000 .000 .000 .000 .000 .000 .000 .140
18 .000 .000 .000 .000 .000 .000 .000 .000 .008 1.000
19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .105 .519
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .007 .082 1.000
21 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .005 .935 1.000
*. The mean difference is significant at the 0.05 level.
Participants were from fourteen different age groups. There were statistically significant
differences (at the 0.05 level) in SPM scores between ages, F (13, 2586) = 225.846,
p = 0.000. The effect size was calculated using eta squared, dividing the between-groups
sum of squares (197151.289) by the total sum of squares (370800.035) (Pallant, 2007). The
resulting eta squared value was 0.53, which indicated a large effect size. Post-hoc
comparisons using the Tukey HSD test indicated that there were statistically significant
differences between the different ages, except between (8 and 9 years), (13 through 15
years), (14 through 16 years), (16 and 17 years), (17 through 19 years), (18 through 20
years) and (19 through 21 years), with older students obtaining higher mean scores.
6.6.6 Difference according to study levels
One-way ANOVA was conducted to compare the SPM means with regard to study
levels (table 6.23), followed by a post hoc Tukey (HSD) test for multiple comparisons
(table 6.24).
Table 6.23 Comparison according to study levels
Study levels   (N)    Mean    SD
Elementary     720    19.96   8.38
Preparatory    540    31.39   8.77
Secondary      540    36.43   8.68
University     800    41.25   8.29
Total          2600   32.31   11.94
Source           Sum of Squares   df     Mean Squares   F. Ratio   F. Prob.
Between Groups   183360.732       3      61120.244      846.504    .000
Within Groups    187439.303       2596   72.203
Total            370800.035       2599

Table 6.24 Post Hoc Tukey (HSD) Test

(I) study     (J) study     Mean Difference   Std.    Sig.   95% Confidence Interval
levels        levels        (I-J)             Error          Lower Bound   Upper Bound
Elementary    Preparatory   -11.428*          .484    .000   -12.67        -10.18
              Secondary     -16.471*          .484    .000   -17.71        -15.23
              University    -21.288*          .437    .000   -22.41        -20.17
Preparatory   Elementary    11.428*           .484    .000   10.18         12.67
              Secondary     -5.043*           .517    .000   -6.37         -3.71
              University    -9.860*           .473    .000   -11.08        -8.64
Secondary     Elementary    16.471*           .484    .000   15.23         17.71
              Preparatory   5.043*            .517    .000   3.71          6.37
              University    -4.817*           .473    .000   -6.03         -3.60
University    Elementary    21.288*           .437    .000   20.17         22.41
              Preparatory   9.860*            .473    .000   8.64          11.08
              Secondary     4.817*            .473    .000   3.60          6.03
*The mean difference is significant at the 0.05 level.
Participants were from four study levels. There were statistically significant
differences in SPM scores between the four study levels, F (3, 2596) = 846.504, p =
0.000. The effect size was calculated using eta squared, dividing the between-groups
sum of squares (183360.732) by the total sum of squares (370800.035) (Pallant,
2007). The resulting eta squared value was 0.49, which indicated a large effect.
Post-hoc comparisons using the Tukey HSD test indicated that there were statistically
significant differences between all the study levels, in favour of the higher
levels.
6.6.7 Difference according to regions and study levels
Two-way ANOVA was conducted on SPM scores in regards to study levels and
regions (table 6.25).
Table 6.25 Comparison of the regions according to study levels

Study levels   Region    (N)    Mean    SD
Elementary     Cities    360    19.97   8.58
               Village   360    19.95   8.20
               Total     720    19.96   8.38
Preparatory    Cities    270    31.35   9.39
               Village   270    31.42   8.10
               Total     540    31.39   8.76
Secondary      Cities    270    36.97   9.88
               Village   270    35.90   7.28
               Total     540    36.43   8.68
Total          Cities    900    28.49   11.75
               Village   900    28.18   10.51
               Total     1800   28.33   11.15
Table 6.26 Levene's Test of Equality of Error Variances of SPM scores

F df1 df2 Sig.
7.570 5 1794 .000
Table 6.27 Tests of Between-Subjects Effects of SPM scores

Source                Type III Sum   df     Mean Square   F           Sig.   Partial Eta
                      of Squares                                             Squared
Corrected Model       91088.29       5      18217.66      246.901     .000   .408
Intercept             1513091.6      1      1513091.63    20506.649   .000   .920
Study levels          90933.52       2      45466.76      616.203     .000   .407
Region                51.35          1      51.35         .696        .404   .001
Study levels*Region   111.74         2      55.87         .757        .469   .001
Error                 132371.03      1794   73.79
Total                 1668165.23     1800
Corrected Total       223459.32      1799
a. R Squared = .408 (Adjusted R Squared = .406)
Table 6.28 Post Hoc Tukey (HSD) Test

(I) Study     (J) Study     MD        Std.    Sig.   95% Confidence Interval
levels        levels                  Error          Lower Bound   Upper Bound
Elementary    Preparatory   -11.43*   .489    .000   -12.58        -10.28
              Secondary     -16.47*   .489    .000   -17.62        -15.32
Preparatory   Elementary    11.43*    .489    .000   10.28         12.58
              Secondary     -5.04*    .523    .000   -6.27         -3.82
Secondary     Elementary    16.47*    .489    .000   15.32         17.62
              Preparatory   5.04*     .523    .000   3.82          6.27
*The mean difference is significant at the .05 level. MD = Mean Difference (I-J)
The interaction effect between regions and study levels was not statistically
significant, F (2, 1794) = .757, P = .469. There was no statistically significant main
effect for region, F (1, 1794) = .696, P = .404; the magnitude of the effect size was
very small (partial eta squared = .001). The main effect for study levels, F (2, 1794)
= 616.203, P < .001, was statistically significant, with a large effect size (partial
eta squared = .407). Post-hoc comparisons using the Tukey HSD test showed
statistically significant differences between all pairs of study levels.
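The effect sizes and F ratios reported above can be recomputed from the sums of squares in Table 6.27. The following is a minimal sketch, assuming the usual definitions (partial eta squared as the effect sum of squares divided by the effect plus error sums of squares, and F as the ratio of mean squares); the function names are illustrative and are not part of the original analysis.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variability attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic as mean square of the effect over mean square error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Sums of squares and degrees of freedom copied from Table 6.27.
SS_ERROR, DF_ERROR = 132371.03, 1794
SS_STUDY_LEVELS, DF_STUDY_LEVELS = 90933.52, 2
SS_INTERACTION, DF_INTERACTION = 111.74, 2

eta_levels = partial_eta_squared(SS_STUDY_LEVELS, SS_ERROR)
f_levels = f_ratio(SS_STUDY_LEVELS, DF_STUDY_LEVELS, SS_ERROR, DF_ERROR)
f_interaction = f_ratio(SS_INTERACTION, DF_INTERACTION, SS_ERROR, DF_ERROR)

print(f"study levels: F = {f_levels:.3f}, partial eta squared = {eta_levels:.3f}")
print(f"interaction:  F = {f_interaction:.3f}")
```

Recomputing the values this way is a quick consistency check between a table and the prose that reports it.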
It is worth noting that Levene's test was significant, indicating that the group
variances were not equal. However, with large samples Levene's test can be significant
even when the differences in variance are small, so homogeneity of variance was also
checked by dividing the largest group variance by the smallest group variance. A ratio
of 2 or above means the variances are unequal. All ratios were below 2, which indicated
that the variances could be treated as equal (Field, 2006).
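The variance-ratio check described above can be sketched as follows, using the six cell standard deviations from Table 6.25. The threshold of 2 is the rule of thumb quoted in the text; the function name is illustrative only.

```python
def max_variance_ratio(standard_deviations):
    """Largest group variance divided by the smallest group variance."""
    variances = [sd ** 2 for sd in standard_deviations]
    return max(variances) / min(variances)


# Cell SDs from Table 6.25 (cities and villages within each study level).
cell_sds = [8.58, 8.20, 9.39, 8.10, 9.88, 7.28]

ratio = max_variance_ratio(cell_sds)
variances_roughly_equal = ratio < 2  # rule of thumb quoted in the text

print(f"variance ratio = {ratio:.2f}, treated as equal: {variances_roughly_equal}")
```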
6.6.8 Difference according to regions and gender.
Two-way ANOVA was conducted for regions in regards to gender.
Table 6.29 Comparison of the regions according to gender

Regions Gender (N)sample Mean SD
cities Male 450 28.83 12.06
Female 450 28.14 11.44
Total 900 28.49 11.75
villages Male 450 28.36 10.91
Female 450 27.99 10.10
Total 900 28.18 10.51
Total Male 900 28.59 11.49
Female 900 28.07 10.79
Total 1800 28.33 11.15

F df1 df2 Sig.
7.401 3 1796 .000

Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 179.032 3 59.677 .480 .696 .001
Intercept 1444705.915 1 1444705.915 11620.783 .000 .866
REGIONS 43.031 1 43.031 .346 .556 .000
SEX 124.636 1 124.636 1.003 .317 .001
REGIONS * SEX 11.365 1 11.365 .091 .762 .000
Error 223280.287 1796 124.321
Total 1668165.234 1800
a. R Squared = .001 (Adjusted R Squared = -.001)
Participants were divided into two groups according to region (cities and
villages). The interaction effect between region and gender was not statistically
significant, F (1, 1796) = .091, P = .762. There was no statistically significant main
effect for region, F (1, 1796) = .346, P = .556; the magnitude of the effect size was
very small (partial eta squared < .001). The main effect for gender, F (1, 1796) =
1.003, P = .317, was also not statistically significant. The significant result of
Levene's test was examined further using the variance ratio described earlier, and the
variances could be treated as equal.
6.6.9 Difference according to age and region.
Two-way ANOVA was conducted for age in regards to region.
Table 6.32 Comparison of age according to region

Age Region N Mean SD Age Region N Mean SD
8 cities 90 15.99 6.13 13 cities 90 31.67 8.87
Villages 90 15.66 6.54 Villages 90 32.53 8.14
Total 180 15.82 6.33 Total 180 32.10 8.50
9 cities 90 17.92 6.16 14 cities 90 33.31 9.10
Total 180 17.92 6.67 Total 180 33.42 8.21
10 cities 90 20.56 8.23 15 cities 90 35.28 9.21
Total 180 20.89 7.99 Total 180 34.63 8.13
11 cities 90 25.42 9.46 16 cities 90 36.37 10.17
Total 180 25.21 9.16 Total 180 36.04 8.94
12 cities 90 29.08 9.78 17 cities 90 39.25 9.91
Total 180 28.65 8.89 Total 180 38.62 8.54
Total Region N Mean SD

cities 900 28.49 11.75
Villages 900 28.18 10.51
Total 1800 28.33 11.15

F df1 df2 Sig.
5.701 19 1780 .000
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 103802.844 19 5463.308 81.272 .000 .465
Intercept 1444705.915 1 1444705.915 21491.328 .000 .924
age 103536.949 9 11504.105 171.134 .000 .464
Region 43.031 1 43.031 .640 .424 .000
age * Region 222.864 9 24.763 .368 .950 .002
Error 119656.476 1780 67.223
Total 1668165.234 1800
Table 6.35 Post Hoc Tukey (HSD) test

Cities Age 8 9 10 11 12 13 14 15 16
8
9 .907
10 .021 .607
11 .000 .000 .010
12 .000 .000 .000 .151
13 .000 .000 .000 .000 .631
14 .000 .000 .000 .000 .047 .966
15 .000 .000 .000 .000 .000 .166 .898
16 .000 .000 .000 .000 .000 .015 .384 .998
17 .000 .000 .000 .000 .000 .000 .000 .083 .478
Villages Age 8 9 10 11 12 13 14 15 16
8
9 .577
10 .000 .086
11 .000 .000 .024
12 .000 .000 .000 .108
13 .000 .000 .000 .000 .004
14 .000 .000 .000 .000 .000 .997
15 .000 .000 .000 .000 .000 .950 1.000
16 .000 .000 .000 .000 .000 .026 .616 .870
17 .000 .000 .000 .000 .000 .000 .002 .012 .562
Figure 5.8 means score differences of age and region
Participants were divided into two groups according to region (cities and villages).
The interaction effect between region and age was not statistically significant, F (9,
1780) = .368, P = .950. There was no statistically significant main effect for region, F
(1, 1780) = .640, P = .424. The main effect for age, F (9, 1780) = 171.134, P < .001, was
statistically significant, and the magnitude of the effect size was large (partial eta
squared = .46). Post-hoc comparisons using the Tukey HSD test showed that in
cities, statistically significant differences were found between all age groups except
between ages (8 and 9), (9 and 10), (11 and 12), (12 and 13), (13, 14 and 15), (14, 15
and 16) and (15, 16 and 17). In villages, statistically significant differences were
found between all age groups except between ages (8 and 9), (9 and 10), (11 and 12),
(13, 14 and 15), (14, 15 and 16), (15 and 16) and (16 and 17). The significant result of
Levene's test was examined further using the variance ratio described earlier, and the
variances could be treated as equal.
6.6.10 Difference according to geographic areas and gender
Two-way ANOVA was conducted for geographic areas in regards to gender.
Table 6.36 Comparison of the geographic areas according to gender

Geographic areas Gender (N) Mean SD
Main city Male 300 29.16 12.38
Female 300 28.15 11.47
Total 600 28.66 11.95
Secondary city Male 150 28.60 11.41
Female 150 28.49 11.39
Total 300 28.54 11.38
Coastal Male 150 28.64 10.95
Female 150 28.35 10.13
Total 300 28.50 10.53
Mountain Male 150 28.10 10.71
Female 150 26.89 9.52
Total 300 27.50 10.13
Desert Male 150 27.83 10.95
Female 150 28.42 10.59
Total 300 28.12 10.76
Total Male 900 28.59 11.49
Female 900 28.07 10.77
Total 1800 28.33 11.15

F df1 df2 Sig.
3.052 9 1790 .001

Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 614.837 9 68.315 .549 .839 .003
Intercept 1295947.59 1 1295947.588 10409.709 .000 .853
GEOGRAPHIC AREA 309.571 4 77.393 .622 .647 .001
GENDER 66.989 1 66.989 .538 .463 .000
GEOGRAPHIC AREA * GENDER 180.630 4 45.158 .363 .835 .001
Error 222844.48 1790 124.494
Total 1668165.23 1800
a. R Squared = .003 (Adjusted R Squared = -.002)
(I) (J) MD Std. Sig. 95% Confidence Interval
Geographic Geographic Error Lower Upper
areas areas Bound Bound
Main city Coastal .16 .781 1.000 -1.97 2.29
Mountain 1.16 .781 .572 -.97 3.29
Desert .53 .781 .960 -1.60 2.67
Secondary city .12 .821 1.000 -2.12 2.36
Secondary city Coastal .04 .945 1.000 -2.54 2.63
Mountain 1.04 .945 .804 -1.54 3.63
Desert .42 .945 .992 -2.16 3.00
Main city -.12 .821 1.000 -2.36 2.12
Coastal Mountain 1.00 .911 .808 -1.49 3.49
Desert .37 .911 .994 -2.11 2.86
Main city -.16 .781 1.000 -2.29 1.97
Secondary city -.04 .945 1.000 -2.63 2.54
Mountain Coastal -1.00 .911 .808 -3.49 1.49
Desert -.63 .911 .959 -3.11 1.86
Main city -1.16 .781 .572 -3.29 .97
Secondary city -1.04 .945 .804 -3.63 1.54
Desert Coastal -.37 .911 .994 -2.86 2.11
Mountain .63 .911 .959 -1.86 3.11
Main city -.53 .781 .960 -2.67 1.60
Secondary city -.42 .945 .992 -3.00 2.16
MD= Mean Difference (I-J)
The interaction effect between geographic areas and gender was not statistically
significant, F (4, 1790) = .363, P = .835. There was no statistically significant main
effect for geographic areas, F (4, 1790) = .622, P = .647; the magnitude of the effect
size was very small (partial eta squared = .001). Post-hoc comparisons using the Tukey
HSD test showed that there were no statistically significant differences between the
different geographic areas. The main effect for gender, F (1, 1790) = .538, P = .463,
was not statistically significant. The significant result of Levene's test was
examined further using the variance ratio described earlier, and the variances could
be treated as equal.
6.6.11 Difference according to academic discipline and gender
Two-way ANOVA was conducted for academic discipline in regards to gender.
Table 6.40 Comparison of academic discipline according to gender

academic discipline Gender (N)sample Mean SD
Science Male 200 42.90 7.99
Female 200 41.78 9.07
Total 400 42.34 8.56
Arts Male 200 39.62 7.79
Female 200 40.70 7.94
Total 400 40.16 7.88
Total Male 400 41.26 8.05
Female 400 41.24 8.53
Total 800 41.25 8.29

F df1 df2 Sig.
2.193 3 796 .088

Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 1189.264 3 396.421 5.874 .001 .022
Intercept 1361167.501 1 1361167.50 20167.61 .000 .962
DISCIPLINE 948.301 1 948.301 14.050 .000 .017
GENDER .061 1 .061 .001 .976 .000
GENDER * DISCIPLINE 240.901 1 240.901 3.569 .060 .004
Error 53724.235 796 67.493
Total 1416081.000 800
Participants were divided into two groups according to academic discipline (arts and
science). The interaction effect between academic discipline and gender was not
statistically significant, F (1, 796) = 3.569, P = .060. There was a statistically
significant main effect for academic discipline, F (1, 796) = 14.050, P < .001; the
magnitude of the effect size was small (partial eta squared = .017). The main effect
for gender, F (1, 796) = .001, P = .976, was not statistically significant.
Levene's test was not significant, indicating that the group variances were equal.
6.6.12 Difference according to age and gender
A two-way ANOVA was conducted for age with respect to gender.
Table 6.43 Comparison of age according to gender

Age Gender N Mean SD Age Gender N Mean SD
8 Male 90 15.51 6.23 15 Male 90 35.92 7.55
Female 90 16.14 6.44 Female 90 33.34 8.51
Total 180 15.82 6.33 Total 180 34.63 8.13
9 Male 90 17.04 6.60 16 Male 90 37.44 9.10
Female 90 18.79 6.66 Female 90 34.65 8.59
Total 180 17.92 6.67 Total 180 36.04 8.94
10 Male 90 18.81 6.97 17 Male 90 39.95 8.17
Female 90 22.97 8.43 Female 90 37.29 8.74
Total 180 20.89 7.99 Total 180 38.62 8.54
11 Male 90 26.90 9.49 18 Male 100 39.86 8.65
Female 90 23.53 8.55 Female 100 38.75 9.77
Total 180 25.21 9.16 Total 200 39.30 9.22
12 Male 90 28.44 7.92 19 Male 100 41.25 8.40
Female 90 28.86 9.81 Female 100 41.19 8.24
Total 180 28.65 8.89 Total 200 41.22 8.30
13 Male 90 32.40 8.31 20 Male 100 41.74 7.27
Female 90 31.80 8.73 Female 100 42.08 8.50
Total 180 32.10 8.50 Total 200 41.91 7.90
14 Male 90 33.52 8.00 21 Male 100 42.18 7.75
Female 90 33.31 8.46 Female 100 42.94 6.92
Total 180 33.42 8.21 Total 200 42.56 7.34
Total Gender N Mean SD

Male 1300 32.49 12.06
Female 1300 32.12 11.83
Total 2600 32.31 11.94
Table 6.44 Levene's Test of Equality of Error Variances
F df1 df2 Sig.
3.131 27 2572 .000

Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 199685.204 27 7395.748 111.164 .000 .539
Intercept 2659929.375 1 2659929.375 39980.978 .000 .940
AGE 197151.289 13 15165.484 227.950 .000 .535
GENDER 94.098 1 94.098 1.414 .234 .001
AGE * GENDER 2445.059 13 188.081 2.827 .000 .014
Error 171114.832 2572 66.530
Total 3084246.234 2600

Age 8 9 10 11 12 13 14 15 16 17 18 19 20
8
9 .453
10 .000 .036
11 .000 .000 .000
12 .000 .000 .000 .005
13 .000 .000 .000 .000 .005
14 .000 .000 .000 .000 .000 .962
15 .000 .000 .000 .000 .000 .158 .980
16 .000 .000 .000 .000 .000 .000 .120 .936
17 .000 .000 .000 .000 .000 .000 .000 .000 .140
18 .000 .000 .000 .000 .000 .000 .000 .000 .008 1.000
19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .105 .519
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .007 .082 1.000
21 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .005 .935 1.000
Figure 5.9 Means score difference of age and gender
The interaction effect between age groups and gender was statistically significant, F
(13, 2572) = 2.827, P < .001. There was a statistically significant main effect for age,
F (13, 2572) = 227.950, P < .001; the magnitude of the effect size was large (partial
eta squared = .54). Post-hoc comparisons using the Tukey HSD test indicated that
there were statistically significant differences between the different ages, with
higher mean scores for older students, except between (8 and 9 years), (13 through 15
years), (14 through 16 years), (16 and 17 years), (17 through 19 years), (18 through
20 years) and (19 through 21 years). As a significant interaction was obtained, an
analysis of simple effects was carried out, in which the sample is split into groups
according to one of the independent variables and statistical tests are run to explore
the effect of the other variable. To determine whether there were statistically
significant differences between male and female mean scores at the different ages, the
sample was split according to age and independent-samples t-tests were used to compare
the means. Results showed no statistically significant gender differences at the ages
of 8, 9, 12, 13, 14 and 18 through 21. Females obtained a statistically significantly
higher mean than males at the age of 10 years. At the ages of 11 and 15 through 17,
males obtained statistically significantly higher means than females. However, the main
effect for gender, F (1, 2572) = 1.414, P = .234, was not statistically significant.
The significant result of Levene's test was examined further using the variance ratio
described earlier, and the variances could be treated as equal.
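The simple-effects comparison at a single age can be sketched from summary statistics alone. The example below is a minimal illustration, assuming a pooled-variance independent-samples t-test; it uses the age 10 means, standard deviations and group sizes from Table 6.43. The function name is hypothetical, and the original analysis was run on the raw data rather than on summary values.

```python
from math import sqrt


def independent_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Age 10 values from Table 6.43: males first, females second.
t_age10, df_age10 = independent_t(18.81, 6.97, 90, 22.97, 8.43, 90)

print(f"age 10: t({df_age10}) = {t_age10:.3f}")
```

A negative t value here means the female mean is higher, which matches the direction reported for age 10.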
6.6.13 Difference according to academic discipline and age
Two-way ANOVA was conducted for academic discipline in regards to age
Table 6.47 Comparison of academic discipline according to age

Age academic discipline (N) Mean SD
18 science 100 40.48 9.34
Arts 100 38.13 8.81
Total 200 39.30 9.22
19 science 100 41.19 7.59
Arts 100 41.25 8.99
Total 200 41.22 8.30
20 science 100 43.01 7.83
Arts 100 40.81 7.84
Total 200 41.91 7.90
21 science 100 44.67 7.66
Arts 100 40.45 6.36
Total 200 42.56 7.34
Total science 400 42.34 8.56
Arts 400 40.16 7.88
Total 800 41.25 8.29

F df1 df2 Sig.
4.776 7 792 .000
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 2595.849 7 370.836 5.614 .000 .047
Intercept 1361167.501 1 1361167.501 20605.755 .000 .963
AGE 1187.124 3 395.708 5.990 .000 .022
DISCIPLINE 948.301 1 948.301 14.356 .000 .018
AGE * DISCIPLINE 460.424 3 153.475 2.323 .074 .009
Error 52317.650 792 66.058
Total 1416081.000 800
a R Squared = .047 (Adjusted R Squared = .039)

(I) Age (J) Mean Difference Std. Sig. 95% Confidence Interval
Age (I-J) Error Lower Bound Upper Bound
18 19 -1.91 .813 .087 -4.01 .18
20 -2.61* .813 .008 -4.70 -.51
21 -3.26* .813 .000 -5.35 -1.16
19 18 1.91 .813 .087 -.18 4.01
20 -.69 .813 .831 -2.78 1.40
21 -1.34 .813 .352 -3.43 .75
20 18 2.61* .813 .008 .51 4.70
19 .69 .813 .831 -1.40 2.78
21 -.65 .813 .855 -2.74 1.44
21 18 3.26* .813 .000 1.16 5.35
19 1.34 .813 .352 -.75 3.43
20 .65 .813 .855 -1.44 2.74
* The mean difference is significant at the .05 level.
The interaction effect between academic discipline and age was not statistically
significant, F (3, 792) = 2.323, P = .074. There was a statistically significant main
effect for academic discipline, F (1, 792) = 14.356, P < .001; the magnitude of the
effect size was small (partial eta squared = .018). There was a statistically
significant main effect for age, F (3, 792) = 5.990, P < .001. Post-hoc comparisons
using the Tukey HSD test indicated that only the mean score for the 18 year olds (M =
39.30, SD = 9.22) differed significantly from those of the 20 year olds (M = 41.91, SD
= 7.90) and the 21 year olds (M = 42.56, SD = 7.34). The significant result of Levene's
test was examined further using the variance ratio described earlier, and the variances
could be treated as equal. Furthermore, the magnitude of the difference
between groups in terms of standard deviation units (Cohen’s d) was calculated
(Pallant, 2007).
Table 6.51 Magnitude of gender differences in means score and variability on SPM as
functions of age, geographic areas and discipline.
Age (N 2600; male = 1300 and female = 1300).
age t sig Vr d IQ Point Pc IQs
8 -.663 .508 0.93 -0.01 -0.15 16 85
9 -1.767 .079 0.98 -0.26 -3.90 13 83
10 -3.608 .000 0.86 -0.52 -7.80 8 79
11 2.502 .013 1.23 0.37 5.55 4 74
12 .476 .757 0.65 -0.02 -0.30 7 78
13 .169 .634 0.90 0.07 1.05 9 80
14 .169 .866 0.89 0.03 0.45 8 79
15 2.152 .033 0.79 0.32 4.80 10 81
16 2.115 .036 1.24 0.31 4.65 10 81
17 2.106 .037 0.87 0.31 4.65 12 83
18 .851 .396 0.78 0.12 1.80 9 80
19 .051 .959 1.04 0.01 0.15 11 82
20 -.304 .762 0.72 0.04 0.60 11 82
21 -.732 .465 1.26 -0.10 -1.50 4 83
Score t sig Vr d IQ Point Pc IQs

Total .789 .430 1.04 0.03 0.45 10 81
Geographic areas (N 1800; male = 900 and female = 900).
Geographic areas t sig Vr d IQ Point
Main city 1.135 .287 1.16 0.03 0.45
Secondary city .006 .938 1.01 0.10 1.50
Coastal .057 .811 1.16 -0.05 -0.75
Mountain 1.073 .301 1.60 0.08 1.20
Desert .224 .637 1.07 0.01 0.15
Score t sig Vr d IQ Point

Total 1.002 .317 1.14 0.04 0.60
Academic discipline (N 800; male = 400 and female = 400).
Discipline t sig Vr d IQ Point
Science 1.304 .193 0.78 0.13 1.95
Arts -1.373 .171 0.96 -0.14 -2.10
total .030 .976 0.89 0.02 0.30
Table 6.51 reports the t values for the difference between males and females in each
age group, in each geographic area, in each academic discipline and in the total
sample, together with the level of significance. It also gives Cohen’s d (the
difference between the male and female means divided by the within-group standard
deviation; Cohen, 1977) and the variance ratio (Vr, i.e. the variance of the males
divided by the variance of the females; Lynn and Irwing, 2004). Vr’s greater than 1.0
indicate that males had greater variance than females, while Vr’s less than 1.0
indicate that females had greater variance than males (Khaleefa and Lynn, 2008). The
IQ point differences between males and females are given for each group as well as
for the total sample. Finally, the table gives the British percentile equivalents (Pc)
of the combined male and female means on the British norms for the Standard
Progressive Matrices collected in 1979 and given in Raven (1981), and these
percentiles converted to IQs.
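The three gender statistics defined above can be sketched for one age group. This is a minimal illustration using the age 11 means and standard deviations from Table 6.43, and it rests on two assumptions that the text does not spell out: the within-group standard deviation is taken as the pooled SD of two equal-sized groups, and the IQ point difference is taken as d multiplied by 15, which is what the tabulated values appear to follow. The function names are hypothetical.

```python
from math import sqrt


def cohens_d(mean_m, sd_m, mean_f, sd_f):
    """Male minus female mean, divided by the pooled SD (equal group sizes assumed)."""
    pooled_sd = sqrt((sd_m ** 2 + sd_f ** 2) / 2)
    return (mean_m - mean_f) / pooled_sd


def variance_ratio(sd_m, sd_f):
    """Male variance divided by female variance; above 1.0 means males vary more."""
    return sd_m ** 2 / sd_f ** 2


def iq_points(d, iq_sd=15):
    """Express a standardised difference on an IQ scale with SD 15 (assumed)."""
    return d * iq_sd


# Age 11 values from Table 6.43: male M = 26.90, SD = 9.49; female M = 23.53, SD = 8.55.
d_age11 = cohens_d(26.90, 9.49, 23.53, 8.55)
vr_age11 = variance_ratio(9.49, 8.55)

print(f"d = {d_age11:.2f}, Vr = {vr_age11:.2f}, IQ points = {iq_points(round(d_age11, 2)):.2f}")
```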
The results showed three interesting features. First, the British percentile equivalents
are the 16th PC for the 8 year olds (IQ = 85), the 13th PC for the 9 year olds (IQ = 83),
the 8th PC for the 10 year olds (IQ = 79), and on average the 6.7th PC (IQ = 79.4) for
the 11-17 year olds. The American percentile equivalents are the 9th PC for the
18 year olds (IQ = 80), the 11th PC for the 19 and 20 year olds (IQ = 82), the 4th PC for
the 21 year olds (IQ = 83), and on average the 8.75th PC (IQ = 81.75). Overall, the IQs
obtained by the Libyan students ranged between 74 and 85. The average IQ for the
fourteen tested Libyan age groups, 8 through 21, was 81.
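The conversion from a percentile to an IQ can be sketched as follows. The text does not state how the conversion was done, so this is an assumption: the percentile is mapped onto a normal distribution with a mean of 100 and a standard deviation of 15. Under that assumption the sketch reproduces several of the Pc and IQ pairs in Table 6.51 (for example the 16th, 13th, 8th, 10th and 4th percentiles).

```python
from statistics import NormalDist

# Assumed IQ metric: normal distribution with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile):
    """IQ at the given percentile (0 < percentile < 100), rounded to a whole number."""
    return round(IQ_SCALE.inv_cdf(percentile / 100))


for pc in (16, 13, 8, 10, 4):
    print(f"{pc}th percentile -> IQ {percentile_to_iq(pc)}")
```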
Second, there was a lack of significant gender differences in the total means and at
ages 8, 9, 12, 13, 14 and 18 through 21. At the age of 10 years, females obtained a
significantly higher mean than males. Males obtained significantly higher means than
females at the ages of 11 and 15 through 17. In total, males obtained a higher mean
than females by 0.03d (0.45 IQ points). Regarding geographic areas, results showed a
lack of significant gender differences in the total means and in the means of all
geographic areas. In total, males obtained a higher mean than females by 0.04d (0.60 IQ
points). Concerning academic discipline, the analysis also showed a lack of significant
gender differences in the total means and in the means of each discipline (science and
arts). In total, males obtained a higher mean than females by 0.02d (0.30 IQ points).
Third, the gender difference in variability (Vr) in the total sample and within each age
group, geographic area and academic discipline can be seen from the standard
deviations and variance ratios. At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20
years, females had greater variability than males. In the total sample and at the ages
of 11, 16, 19 and 21 years, males had greater variability than females. Concerning
geographic areas, males had greater variability than females in the total
sample and in each geographic area. Regarding academic discipline,
females had greater variability than males in the total sample and in each academic
discipline. Overall, these results showed no consistent tendency for a gender
difference in variability.
6.7 Multiple Regression according to independent variables
To investigate the contribution of the independent variables (age, gender, region and
achievement) to the prediction of the SPM scores, a stepwise multiple regression
was used.
Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores
Model Unstandardised Coeff. Standardised Coeff. t Sig.
B Std. Error Beta
1 (Constant) 8.838 .545 16.204 .000
Age 2.599 .068 .670 38.268 .000
2 (Constant) 7.929 .554 14.324 .000
Age 4.230 .085 .575 26.194 .000
Achievement 6.218 .001 .404 13.027 .000
Model Summary
Model R R Adjusted Stand. Error of
Square R Square Estimate
1- Age .670 .449 .449 8.276
2- Age, Achievement. .681 .464 .463 8.167
As age was equivalent in effect to study level, age was used in this analysis. Using the
stepwise method, a significant model emerged (Adjusted R square = 0.463; F (1, 1798) =
1464.428, p < .001). Significant variables are shown below:
Predictor Variable Beta p
Age (grade) 0.670 p < .001
Achievement 0.404 p < .001
Gender was not a significant predictor (p = 0.989). Region was also not a significant
predictor (p = 0.986). This showed that both age and achievement were predictors of
SPM results, with age being the better predictor.
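The link between R square, adjusted R square and the model F statistic can be sketched with the standard formulas. This is a minimal illustration under stated assumptions: n = 1800 is inferred from the reported degrees of freedom (1, 1798), and the R square values are the rounded figures from the Model Summary, so the recomputed F agrees with the reported 1464.428 only approximately. The function names are illustrative.

```python
def f_from_r_squared(r_squared, n, k):
    """Overall model F for k predictors and n cases."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


def adjusted_r_squared(r_squared, n, k):
    """R square penalised for the number of predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


N_CASES = 1800  # assumption: inferred from F(1, 1798)

f_model1 = f_from_r_squared(0.449, N_CASES, k=1)    # age only
adj_model1 = adjusted_r_squared(0.449, N_CASES, k=1)
adj_model2 = adjusted_r_squared(0.464, N_CASES, k=2)  # age and achievement

print(f"model 1: F = {f_model1:.1f}, adjusted R square = {adj_model1:.3f}")
print(f"model 2: adjusted R square = {adj_model2:.3f}")
```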
6.8 The Percentile Ranks of the SPM Score
The sixth research objective was “to compute the percentile ranks for the SPM scores
according to the significant variables”. Since Raven used percentiles to express
test performance and to determine the position of an individual among all the
individuals of the same age in the sample, the same scale (percentiles) was used
here. Age, gender and academic discipline were taken into account. As
region was not a significant variable, its percentile ranks were not calculated. Table
6.53 shows the detailed percentile 2007-2008 norms for Libyan students according to age.
Table 6.53 Detailed percentile 2007-2008 norms for Libyan students according to age
Percentile Age in years

18 1 2 2
0 6 1 47 7 8 49 52 50 53 54
22 8 1 42 3 48 50 48 51 52
18 1 6 2 5 40 0 3 4 46 46 47 48
6 8 20 6 5 39 41 42 43 43
12 2 7 8 8 29 2 33 35 37 37
0 2 4 5 29 29 32 33
5 9 9 0 2 6 9 19 20 20 25 29 30
N 180 180 180 180 180 180 180 180 180 180 200 200 200 200
To illustrate these results, a ten year old child who scores 33 on the SPM test performs
better than 95% of the sample of the same age, because this score falls at the 95th
percentile for that age. On the other hand, a 13 year old who scores 33 on the SPM test
performs better than only 50% of the sample of the same age. The same score, 33, places
an 18 year old at the 25th percentile and a 21 year old at the 10th percentile. Table
6.54 shows the detailed percentile 2007-2008 norms for Libyan students according to age
and gender. The full range of the Libyan norms according to age and each SPM score (1 to
60) can be found in Appendix 1.
Table 6.54 detailed percentile 2007-2008 Norms for the Libyan students according to
age and gender.
Age in years
8 9 10 11 12 13 14
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 29 32 35 33 33 37 44 41 42 45 44 48 48 47
90 22 25 27 29 30 37 39 37 39 43 42 42 45 43
75 18 17 18 22 21 28 34 29 35 36 39 37 39 39
50 15 15 16 17 18 23 26 23 29 27 34 32 34 33
25 12 12 13 14 14 18 20 17 25 23 27 27 29 29
10 9 10 11 12 11 12 14 13 17 15 20 19 23 21
5 7 9 10 11 10 10 11 11 14 12 15 16 18 17
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90
Age in years
15 16 17 18 19 20 21
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 48 47 50 47 51 51 52 53 53 52 53 53 55 54
90 46 44 49 46 49 49 50 51 52 49 52 52 53 52
75 41 40 44 42 46 44 47 47 48 47 47 48 48 50
50 37 34 37 34 41 38 40 40 42 43 42 44 42 42
25 32 28 33 30 38 34 34 34 36 36 37 38 36 37
10 24 22 26 22 29 23 30 22 30 29 33 31 33 34
5 21 17 19 19 24 20 23 20 27 24 30 23 29 31
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90
It is apparent from this table that gender differences were present at some ages.
For example, at age 10, the differences were in favour of females and ranged from 0 to
7 points across the percentiles: 4 points at the 95th percentile, 7 points at the 90th
percentile, 7 points at the 75th percentile, 5 points at the 50th percentile, 4 points
at the 25th percentile, 1 point at the 10th percentile and 0 points at the 5th
percentile. As another example, at age 17, the differences were in favour of males and
ranged from 0 to 6 points: 0 points at the 95th percentile, 0 points at the 90th
percentile, 2 points at the 75th percentile, 3 points at the 50th percentile, 4 points
at the 25th percentile, 6 points at the 10th percentile and 4 points at the 5th
percentile. Table 6.55 shows the detailed percentile 2007-2008 norms for Libyan
students according to age and academic discipline.
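Reading a raw score against the norms, and comparing the male and female cut scores, can be sketched as a simple lookup. The cut scores below are copied from the age 10 columns of Table 6.54. The function names and the convention used for the lookup (the highest tabulated percentile whose cut score the raw score reaches) are assumptions made for illustration.

```python
PERCENTILES = (95, 90, 75, 50, 25, 10, 5)

# Raw SPM cut scores for age 10, copied from Table 6.54.
AGE10_MALE = dict(zip(PERCENTILES, (33, 30, 21, 18, 14, 11, 10)))
AGE10_FEMALE = dict(zip(PERCENTILES, (37, 37, 28, 23, 18, 12, 10)))


def percentile_band(score, cut_scores):
    """Highest tabulated percentile whose cut score is reached, or None if below all."""
    reached = [pc for pc, cut in cut_scores.items() if score >= cut]
    return max(reached) if reached else None


def cut_score_gaps(first, second):
    """Difference in cut scores (first minus second) at each tabulated percentile."""
    return {pc: first[pc] - second[pc] for pc in first}


gaps_age10 = cut_score_gaps(AGE10_FEMALE, AGE10_MALE)

print(percentile_band(33, AGE10_MALE), percentile_band(33, AGE10_FEMALE))
print(gaps_age10)
```

The same raw score of 33 falls in a higher band for a ten year old boy than for a ten year old girl, because the female cut scores are higher at this age.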
Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline
Percentile Age in years
18 19 20 21
Disciplines SC AR SC AR SC AR SC AR
95 55 51 53 53 54 53 55 51
90 53 48 51 51 52 51 53 48
75 48 44 49 47 48 47 52 46
50 42 38 44 42 45 41 45 40
25 37 34 38 35 38 37 39 36
10 25 29 30 29 35 30 34 33
5 22 20 27 24 27 26 32 30
n 100 100 100 100 100 100 100 100
It can be seen that the percentile scores of Libyan science students and arts students
differ. For example, at age 18, science students score from 2 to 5 points higher than
arts students at most percentiles: 4 points at the 95th percentile, 5 points at the
90th percentile, 4 points at the 75th percentile, 4 points at the 50th percentile, 3
points at the 25th percentile and 2 points at the 5th percentile. The exception is the
10th percentile, where arts students score 4 points higher.
Percentile ranks indicated that the performance of Libyan students on the SPM test was
lower than that of subjects from other countries. Assessed against the data in the SPM
manuals (1988, 1996, 2003, 2004 and 2008), Libyan students were below the norms given
for some western countries. A comparison of the present data, for the same age groups,
with the SPM norms given in these manuals for Taiwan (1989), India (1992), the
Netherlands (1992), France (1998), Turkey (1993), Kosice & Slovakia (1987), Britain
(1979 & 1992), Australia (1986), China (1986), the United States of America (1979 &
1992) and Slovenia (1998) indicated that Libyan students were below the norms of all
of these countries (Appendix 2).
6.9 Chapter Summary
This chapter presented the results of the statistical analysis performed on the data
collected for this study. The SPM test was administered to 2600 students; 1800 school
students (900 males and 900 females) and 800 university students (400 males and 400
females). According to region, 900 school students were from cities, whereas the
remaining 900 were from villages. The university students (400 science students and
400 art students) were from two universities located in two cities; Al-Beida and
Al-Marj in the 2007-2008 academic year.
The overall SPM mean score was 32.31, with a standard deviation of 11.94 (minimum
score 6 and maximum 58). Using the British and American percentiles, the SPM
scores were converted to IQ scores. Overall, the IQs obtained by the Libyan students
ranged between 74 and 85. The average IQ for the fourteen tested Libyan age groups
8 to 21 years was 81.
Test-retest, split-half reliability and alpha Reliability (KR 20) procedures were used to
investigate the SPM reliability. Test-retest reliability was .90 (N = 280), split-half
reliability for the total sample was .96 (N = 2600) and Alpha reliability was .94 (N =
2600). The results, in general, were in agreement with previous research and
supported the validity and reliability of the SPM test with the Libyan sample.
Construct validity (factor analysis and internal consistency) and criterion-related
validity methods were used to establish the validity of the SPM test. Factor analysis
showed only one significant factor, Spearman’s “g” (eigenvalue = 3.47; 69.41% of the
variance). In addition, internal consistency results showed strong
positive correlation coefficients (0.50** to 0.85**) between the five subsets and the
SPM total score. For criterion-related validity, the analysis showed correlations
(0.33** to 0.56**) between SPM scores and Student's Academic Achievement (SAA) as an
external criterion.
Item analysis was carried out for the 60 SPM items (N = 2600). The SPM item
difficulty indicated that 42 items appeared to be moderate in
difficulty, 11 items appeared to be easy and 7 items appeared to be too difficult. Based
on the SPM order of difficulty, results indicated that 13 items did not follow the
order of increasing difficulty (three items in set A, four items in set B, three items
in set C and three items in set D), whereas set E followed the order of increasing
difficulty. With regard to item discrimination, the SPM test
showed 51 items as having excellent discriminating value, 3 items as having good
discriminating value and 5 items as having fair discriminating value.
The results of SPM reliability, validity and item analysis indicated that the SPM test
may be considered as an appropriate measure of mental ability for Libyan students. In
summary, it may provide a promising tool for the measurement of mental ability in
the Libyan setting.
Normality testing was carried out and showed that the collected data were normally
distributed, which warranted the use of parametric tests. To test the differences
between SPM mean scores, independent-samples t-tests and one-way and two-way ANOVA
were used. In addition, the relationship between SPM test scores and
Student's Academic Achievement (SAA) was evaluated using the Pearson product-moment
correlation coefficient. A stepwise regression analysis was employed to investigate
which independent variable was the best predictor of SPM scores. The findings
of these analyses were as follows:
1. There were no gender differences in SPM mean scores in the total sample or
at ages 8, 9, 12, 13, 14 and 18 through 21. However, females obtained significantly
higher SPM means than males at the age of 10 years, whereas males scored
significantly higher means than females at the ages of 11 and 15 through 17. In
addition, there were no significant gender differences in the total means or in the
means of each region. There was also a lack of significant gender differences in the
total means and in the means of each discipline (science and arts). Thus, the gender
variable was not an important factor affecting the Libyan students’ scores on the SPM
test.
2. With regard to gender differences in variability on the SPM test, at the ages of 8,
9, 10, 12, 13, 14, 15, 17, 18 and 20 years females had greater variability than males.
At the ages of 11, 16, 19 and 21 years, as well as in the total sample, males had
greater variability than females. Males also had greater variability than females in
the total sample and in each geographic area, whereas females had greater variability
than males in the total sample and in each academic discipline. Consequently, results
indicated no consistent tendency for a gender difference in variability.
3. There was no significant difference in sample performance on the SPM test
according to region. Thus, the region variable was not an important factor
affecting the Libyan students’ scores on the SPM test. By contrast, there were
significant differences with respect to age and study level, so age and study level
were important factors.
4. Students from the science discipline had significantly higher SPM mean scores than
students from the arts discipline. Thus, academic discipline was an important factor
affecting the Libyan students’ scores on the SPM test.
5. Significant correlation coefficients between the SPM scores and students’ SAA ranged
from 0.33 to 0.56. In general, all correlation coefficients between SPM scores and
students’ SAA were statistically significant for all groups.
6. A multiple regression for Libyan students indicated that both age and achievement
were predictors of SPM results, with age being the better predictor, whereas
gender and region were not significant predictors.
7. The performance of Libyan students on the SPM can be considered lower than that of
students from other countries. Assessed against the data in the SPM manuals (1988,
1996, 2003, 2004 and 2008), Libyan students were below the norms given for all
developed countries.
The next chapter presents the meta-analysis method. In addition, the outcomes of this
chapter, which are entirely about the SPM test for a Libyan sample, will be compared
to other studies carried out in various developed and developing countries.
Chapter seven: META-ANALYSIS
7.1 Introduction
It has become widely accepted that the best way to resolve issues on which there are a
large number of studies is to carry out a meta-analysis. The 1980s and 1990s witnessed a
rapid upsurge of this statistical approach (Anastasi and Urbina, 1997). Meta-analysis
summarizes the results of many quantitative studies that have investigated the same
problem. It provides a numerical way of expressing the average result of a group of
studies. It delineates specific procedures for finding, describing, classifying, and coding
research studies to be included in a meta-analysis review, and for measuring and analyzing
findings. A central characteristic that distinguishes meta-analysis from more traditional
approaches is the emphasis placed on making the review as inclusive as possible. This
technique was first proposed by Glass (1976) and by the end of the 1980s it had become
accepted as a useful method for synthesizing the results of many different studies.
Glass distinguished between the primary, secondary, and meta-analysis of research.
Primary analysis is the original analysis of data in a research study. Secondary analysis is
the re-analysis of data for the purpose of answering the original research question with
better statistical techniques, or answering new questions with old data. Meta-analysis
refers to the analysis of analyses: the statistical analysis of a large collection of analysis
results from individual studies for the purpose of integrating the findings. It connotes a
rigorous alternative to the casual, narrative discussion of research studies which typifies
our attempts to make sense of the rapidly expanding research literature.
Meta-analysis contributes to the creation of new knowledge synthesized from existing studies. The
literature explosion has resulted in a massive amount of information that must be
analyzed and summarized in order to be useful. Quantitative methods of integration of
research results have been used for many years and have received a great amount of
attention (Abraham et al., 1991).
Meta-analysis usually involves three major phases; the three “Ps”: preparation,
performance, and presentation. This sequence is the same as for any other type of
research. The project must be planned in advance, then systematically carried out, then
followed by reporting of results (Abraham et al., 1991).
Any statistical procedure or analytic approach can be misused or abused. As Green and
Hall (1984) aptly stated “Data analysis is an aid to thought, not a substitute”. Most of the
criticisms of quantitative approaches to reviewing the literature are objections to the
misuse or abuse, real or potential, of meta-analysis.
7.2 Advantages of Meta-analysis
Carrying out a meta-analysis offers the following advantages:
• It increases power and leads to stronger conclusions because more studies can be
analyzed with statistical methods than in impressionistic literary review. Often
this can bring effects into sharper focus, particularly when the results of all studies
are not consistent (Higgins and Green, 2006).
• Meta-analysis does not prejudge or exclude some studies as unworthy because of
their particular research designs, however weak. By empirically examining the
effects of research quality on study findings, meta-analysis is likely to be more
objective than traditional literary reviews (Wolf, 1986).
• It can answer questions not posed by the individual studies (Higgins and Green,
2006).
• It can settle controversies arising from apparently conflicting studies (Higgins and
Green, 2006).
7.3 Disadvantages of Meta-analysis
Disadvantages of Meta-analysis include the following:
• It oversimplifies the results of a research domain by focusing on the overall
effects and downplaying mediating or interaction effects. The better examples of
meta-analyses build potential mediating factors into their designs rather than
ignoring them. They do this by coding the characteristics of studies to empirically
examine whether such interactions exist. In practice, many meta-analyses do not
provide sufficient attention to possible interaction effects (Wolf, 1986).
• Meta-analysis of poor quality studies may be seriously misleading (Higgins and
Green, 2006).
• Decisions regarding inclusion and exclusion criteria of studies are inevitably
subjective. In some cases consensus may be hard to reach (Higgins and Green,
2006).
• Meta-analysis in the presence of serious publication and/or reporting bias may
produce an inappropriate summary (Higgins and Green, 2006).
7.4 Literature review
A thorough search of the literature revealed three meta-analyses of the SPM test:
two published and one unpublished. The two published studies examined the SPM
test in relation to gender differences, while the unpublished meta-analysis examined
the SPM test in relation to gender and age groups.
Lynn and Irwing (2004) conducted a meta-analysis to investigate sex differences
on the progressive matrices. Fifty-seven studies were included, covering sex
differences on the Standard and Advanced Progressive Matrices and on the Colored
Progressive Matrices. The results showed that there was no difference among children aged 6
to 14 years, and that males obtained higher means than females from the age of 15
through to old age.
The same researchers in 2005 carried out a meta-analysis of sex differences in
means and variability on the progressive matrices in university students. Twenty-two studies were
identified and analyzed. This meta-analysis disconfirmed the frequent assertion that there
is no sex difference in the mean and that males have greater variability. It showed that
males obtained a higher mean than females. The SPM studies showed greater variability
among females, while the APM studies showed no significant difference in variability.
Abdalla et al. (2002) carried out a meta-analysis of sex and age differences in SPM
results. As all the collected studies used the SPM test as a measuring tool, they used the
means as a measure of effect size. Their unpublished data showed non-significant
differences between males and females, but statistically significant differences
between all age groups: the below 13 years group, the 13 to 19 years group and the 19 to 22 years
group. Higher age groups had higher mean scores than lower age groups.
7.5 Method
The aims and objectives of this meta-analysis were to:
• Investigate the presence of significant differences among sample performances on
Raven’s Standard Progressive Matrices test according to the development status
of countries (Libya, developing and developed countries).
• Investigate the presence of significant differences among sample performances on
Raven’s Standard Progressive Matrices test according to age groups and gender.
• Investigate the presence of significant differences in sample performance on the
SPM test according to development status of countries and age groups,
development status of countries and gender, or age groups and gender.
• Investigate the variability of SPM mean scores by development status, by gender
within development status, and by gender within age groups.
• Investigate the contribution of the independent variables (age groups, gender
and development status) to the prediction of SPM scores.
7.5.1 Criteria for study selection
Using available databases, an extensive and thorough search for studies to be included in
the meta-analysis was carried out. The criteria for selection of studies were as
follows:
• First, the study must investigate the area of interest of the meta-analysis.
• Second, the study must provide information regarding the research design,
the subjects and the measurement tool used in the study.
• Third, the study must provide sufficient statistical information, such as SPM mean
scores.
A careful review of relevant studies published on the SPM test, drawn from computer databases,
dissertations and bibliographies of review articles, produced 44 studies. These studies were
carried out in various countries between 1948 and 2009. From each relevant study the
following data were recorded and coded: (a) Author; (b) Country; (c) Year of publication;
(d) Population sampled; (e) Age; (f) SPM means and standard deviations; and (g) Sample
size.
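The coding scheme above can be sketched as a small record type. This is only an illustration: the class name `StudyRecord`, the field names and the helper `is_usable` are hypothetical, the example values are placeholders rather than data from any study in Table 7.1, and the 0-60 bound assumes that SPM raw scores are counted out of the test's 60 items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyRecord:
    """One coded study row, following fields (a)-(g) listed in the text."""
    author: str
    country: str
    year: int
    population: str
    age: str                   # age or age range as reported by the study
    spm_mean: float            # mean SPM raw score
    spm_sd: Optional[float]    # None when the study reports no SD
    sample_size: int


def is_usable(record: StudyRecord) -> bool:
    """Rough check of the third criterion: a plausible SPM mean and a
    positive sample size (assumes raw scores range from 0 to 60)."""
    return 0.0 <= record.spm_mean <= 60.0 and record.sample_size > 0


# Hypothetical placeholder values, NOT taken from any study in Table 7.1.
example = StudyRecord("Author A", "Country X", 2000, "school pupils",
                      "12-14", 35.0, 7.0, 250)
print(is_usable(example))
```

Keeping the standard deviation optional reflects the fact that only the mean is strictly required by the third selection criterion.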
Table 7.1 Studies included in the meta-analysis

COUNTRY       YEARS                    REFERENCES
Congo         1994                     Nkaya et al.
Denmark       1968                     Vejleskov
Estonia       2000                     Lynn et al.
France        1994                     Nkaya et al.
Iceland       2003                     Pind et al.
India         1968; 1968; 1972;        Sinha, Mehot, Mohan, Rao and Sinha
              1974 and 1977
Iran          1974                     Baraheni
Israel        1991                     Kaniel & Fisherman
Kuwait        2006                     Abdel-Khalek and Lynn
Libya         1983; 2005 and 2005      Aboujaafer, Attashan and Abdalla, Ahlam
Mexico        2004                     Lynn et al.
Nigeria       1980                     Maqsud
Oman          2009                     Abdel-Khalek and Lynn
Qatar         1986; 2009               Bart et al.
Pakistan      2008                     Ahmed et al.
Slovenia      2007                     Boben
South Africa  2002                     Rushton et al.
Sudan         2008.a                   Khaleefa et al.
Syria         2008.b                   Khaleefa & Lynn
Tanzania      1967                     Klingelhofer
Turkey        1993                     Duzen et al.
UK            1989 and 1994            Egan and van den
USA           1948; 1968; 1969;        Rimoldi, Tulkin & Newbrough, Burke &
              1985; 1986.a.b; 1987;    Bingham, Burke, Powers et al., Sidles & Avoy,
              1988; 1988; 1986;        Jensen et al., Karnes & Whorton, Bart et al.,
              1987 & 1988; 1994        Whorton & Karnes, Johnson et al., and
              and 1994                 Blennerhassett et al.
7.5.2 Strategy of analysis
Data were organized into three categories: first by development status (developed or
developing countries), second by age group and third by gender.
The key feature of meta-analysis is that each study’s results are translated into an effect
size. Effect size is a numerical way of expressing the strength or magnitude of a reported
relationship. It represents a significant improvement over traditional methods of
summarizing literature (Mills & Airasian 2006). Many effect size statistics are available,
and the choice among them depends on the nature of the data collected.
The data reported for the SPM test are continuous numerical data, and the means
were calculated on the same scale, namely the SPM test itself. The term
‘continuous’ in statistics conventionally refers to data that can take any value in a
specified range. When dealing with numerical data, this means that any number may be
measured and reported.
When continuous numerical data are obtained on the same scale, the means of
the studies can be used as a measure of effect size (Higgins & Green 2006). SPSS 16.0
statistics software was used to carry out the statistical analysis of the meta-analysis.
The analysis in SPSS proceeded as follows:
• First, descriptive statistics were used to examine frequency distributions, means and
standard deviations.
• Second, the Kolmogorov-Smirnov and Shapiro-Wilk tests and normal probability plots
were used to assess the normality of the data.
• Third, an independent-samples t-test was used to compare SPM test means among
different studies according to gender.
• Fourth, one-way analysis of variance was used to compare SPM test means among
different studies according to the development status of countries and age groups.
• Fifth, two-way analysis of variance was used to compare SPM test means among
different studies according to pairs of variables: development status of countries and
age groups, development status of countries and gender, or age groups and gender.
This method was also used to investigate the individual (main) and joint (interaction)
effects of the independent variables on SPM scores.
• Sixth, the effect size of differences in SPM means was expressed as Cohen’s d, which
is equal to the difference between the means divided by the mean of the standard
deviations. In addition, Cohen’s d was used to calculate the IQ point difference, which
is equal to d multiplied by the IQ standard deviation (15).
• Seventh, variability was evaluated using variance ratios (Vr), where a variance is the
average of the squared differences from the mean (Lynn and Irwing, 2004).
• Eighth, SPM mean scores were converted to IQ scores using British and American
percentile norms and a conversion table from percentiles to IQ scores.
• Ninth, stepwise multiple regression was used to investigate which independent
variable (development status, age or gender) was the best predictor of SPM scores.
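The sixth step can be illustrated with a minimal sketch, using the 8-11 years and total-sample rows reported later in Table 7.13. The function names are hypothetical. The divisor is passed in explicitly because it is an assumption of this sketch: the d values printed in Table 7.13 are reproduced when the standard deviation of the combined (Total) row is used as the divisor.

```python
def cohens_d(mean_a: float, mean_b: float, sd: float) -> float:
    """Standardized mean difference: (mean_a - mean_b) / sd.

    `sd` is whichever standard deviation is chosen as the standardizer;
    the choice is left to the caller and is an assumption of this sketch.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_a - mean_b) / sd


def iq_points(d: float, iq_sd: float = 15.0) -> float:
    """Express d on the IQ scale by multiplying by the IQ SD (15)."""
    return d * iq_sd


# 8-11 years row of Table 7.13: developed 31.98, developing 22.33,
# combined-group SD 7.63 (assumed divisor).
d_8_11 = round(cohens_d(31.98, 22.33, 7.63), 2)
print(d_8_11, round(iq_points(d_8_11), 2))

# Total sample row: developed 38.88, developing 32.93, combined SD 8.41.
d_total = round(cohens_d(38.88, 32.93, 8.41), 2)
print(d_total, round(iq_points(d_total), 2))
```

Rounding d to two decimals before multiplying by 15 mirrors how the IQ point column of the table appears to have been derived from the printed d values.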
7.6 Results
An extensive review of the studies was carried out and the data were organized, based on
the categories mentioned earlier, into:
(a) Development status group; developed countries, developing countries and Libya.
(b) Four age groups; 8-11 years, 12-14 years, 15-17 years and 18-21 years.
(c) Gender groups; males, females.
Using SPSS, the data collected for the meta-analysis were examined for normality. Both
the Kolmogorov-Smirnov and Shapiro-Wilk tests were carried out. The resulting p values were
0.200 and 0.308 respectively. Both values were well above 0.05, which indicated no
significant departure from a normal distribution. This allowed the use of parametric tests to
evaluate the presence of statistically significant differences among the data.
The descriptive statistics for the overall data collected for the meta-analysis and the tests
of normality are given below.
Table 7.2 Descriptive statistics for mean scores of the overall collected data and tests of
normality.
                                               Statistic   Std. Error
Mean                                           34.9755     .74322
95% Confidence Interval for Mean  Lower Bound  33.5049
                                  Upper Bound  36.4462
5% Trimmed Mean                                35.0786
Median                                         35.9750
Variance                                       70.704
Std. Deviation                                 8.40856
Minimum                                        12.65
Maximum                                        52.76
Range                                          40.11
Interquartile Range                            10.4175
Skewness                                       -.271       .214
Kurtosis                                       -.080       .425
Tests of normality
Kolmogorov-Smirnov               Shapiro-Wilk
Statistic   df    Sig.           Statistic   df    Sig.
.062        128   .200           .988        128   .308
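The standard error and confidence bounds in Table 7.2 follow from the mean, the standard deviation and the number of group means (N = 128). The sketch below recomputes them. The function names are hypothetical, and the critical value of t for 127 degrees of freedom is hard-coded as an approximate figure from a t table, which is an assumption of this example.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def confidence_interval(mean: float, se: float, t_crit: float) -> tuple:
    """Two-sided confidence interval: mean +/- t_crit * SE."""
    margin = t_crit * se
    return (mean - margin, mean + margin)


# Assumed: approximate two-sided 95% critical value of t with 127 df.
T_CRIT_127 = 1.9788

se = standard_error(8.40856, 128)          # SD and N from Table 7.2
low, high = confidence_interval(34.9755, se, T_CRIT_127)
print(round(se, 5), round(low, 3), round(high, 3))
```

The recomputed bounds agree with the tabulated 33.5049 and 36.4462 to within rounding of the printed mean.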
Figure 7.1 The distribution of mean scores (histogram; x-axis: scores, y-axis: frequency; Mean = 35.0, Std. Dev = 8.41, N = 128).
Figure 7.2 Box plot of the score distribution (N = 128).
Figure 7.3 Normal Q-Q plot.
Figure 7.4 Detrended normal Q-Q plot.
Figure 7.1 is a histogram of the SPM scores, which appear to be normally
distributed. Figure 7.2 shows a box plot. The box represents the middle 50% of scores,
the line inside the box represents the median value, and the
whiskers represent the highest and lowest values. Figure 7.3 shows a normal probability
plot (normal Q-Q plot), in which the observed value of each mean is plotted against its
expected value. A reasonably straight line suggests a normal distribution. Figure 7.4
shows the detrended normal Q-Q plot, in which the actual deviations of the scores from the
straight line are plotted. Most scores were collected around the zero line with no real
clustering of scores. This indicates a normal distribution.
7.6.1 SPM means and standard deviations according to the independent variables
The following table shows descriptive statistics of SPM score means according to
development status, age groups and gender.
Table 7.3 SPM score means and standard deviations according to the independent
variables.
SPM Scores Development status
Groups (N) sample Mean SD (N) Group
Developed Countries 9514 38.88 8.61 44
Developing Countries 19579 33.10 7.31 70
Libya 2600 32.31 9.02 14
Total 31693 34.98 8.41 128
AGE
8- 11 years (Primary) 8309 27.33 7.63 35
12-14 years. (prep) 9924 34.94 6.71 44
15-17 years. (Secondary) 8991 40.09 5.31 28
18-21 years (University) 4469 40.97 6.21 21
Total 31693 34.98 8.41 128
gender
Males 11961 33.95 8.95 93
Females 11423 33.82 9.00 91
Total 23384 33.88 8.95 184
Based on development status, the developed countries showed the highest mean score
while Libya showed the lowest. Based on age groups, score means increased as age
increased; the highest score means were achieved by the 18-21 years age group.
According to gender, males scored only slightly higher than females when SPM score
means were compared.
Using SPSS, seven meta-analysis procedures were carried out to investigate statistically
significant differences between SPM score means based upon the independent variables,
as follows:
7.6.2 Differences in SPM scores
7.6.2.1 Difference according to development status
One-way ANOVA was used to compare the SPM score means for the development status
group.
Table 7.4 Comparison of the SPM mean scores according to development status

Develop. status (N)sample Mean SD (N) Group
Developed 9514 38.88 8.61 44
Developing 19579 33.10 7.31 70
Libya 2600 32.31 9.02 14
Total 31693 34.98 8.41 128
Source Sum of Squares Df Mean Squares F. Ratio F. Prob.
Between Groups 1036.658 2 518.329 8.157 .000
Within Groups 7942.728 125 63.542
Total 8979.386 127
Table 7.5 Post hoc tests multiple comparisons of SPM scores (Tukey HSD)
(I) (J) Mean Std. Error Sig. 95% Confidence Interval
Develop. Develop. Difference Lower Bound Upper Bound
status status (I-J)
Developed developing 5.7818 1.53358 .001 1.9825 9.5810
Libya 6.8222 2.44598 .023 .7626 12.8818
developing developed -5.7818 1.53358 .001 -9.5810 -1.9825
Libya 1.0404 2.33376 .905 -4.7412 6.8220
Libya developing -1.0404 2.33376 .905 -6.8220 4.7412
developed -6.8222 2.44598 .023 -12.8818 -.7626
*The mean difference is significant at the .05 level.
Tables 7.4 and 7.5 show the effect of development status on SPM mean scores.
Subjects were divided into three groups: developed, developing and Libya. There were
statistically significant differences at the p < .05 level in SPM scores for the three
development status groups: F (2, 125) = 8.157, p < .001. The effect size was calculated
using eta squared, i.e. by dividing the between-groups sum of squares (1036.658) by the
total sum of squares (8979.386) (Pallant, 2007). The resulting eta squared value was 0.12,
which indicated a moderate-to-large effect. Post-hoc comparisons using the Tukey HSD test
indicated that the mean
score for the developed group (M = 38.88, SD = 8.61) was significantly different from the
developing group (M = 33.10, SD = 7.31) and from the Libya group (M = 32.31, SD =
9.02). The developing group did not differ significantly from the Libya group. Based
upon these results it was decided to combine Libya with the developing countries group,
so the development status variable was categorized into developed and developing
countries only.
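The F ratio and eta squared quoted above can be recomputed directly from the sums of squares and degrees of freedom in Table 7.4. The following is a minimal sketch of that arithmetic; the function name and the dictionary keys are hypothetical.

```python
def one_way_anova_summary(ss_between: float, df_between: int,
                          ss_within: float, df_within: int) -> dict:
    """Recompute mean squares, F and eta squared from an ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
        # eta squared = between-groups SS / total SS
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Values from Table 7.4 (development status).
summary = one_way_anova_summary(1036.658, 2, 7942.728, 125)
print(round(summary["f_ratio"], 3), round(summary["eta_squared"], 2))
```

The same function applies unchanged to the age-group analysis in Table 7.6.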
7.6.2.2 Difference according to age groups
One way ANOVA was conducted to compare the SPM means for the age group.
Table 7.6 Comparison of the SPM Mean scores according to age groups
Age Groups (N)sample Mean SD (N) Group
8-11 8309 27.33 7.63 35
12-14 9924 34.94 6.71 44
15-17 8991 40.09 5.31 28
18-21 4469 40.97 6.21 21
Total 31693 34.98 8.41 128
Source Sum of Squares Df Mean Squares F. Ratio F. Prob.
Between Groups 3535.138 3 1178.379 26.839 .000
Within Groups 5444.248 124 43.905
Total 8979.386 127
Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age (J) Mean Difference Std. Error Sig. 95% Confidence Interval
groups Age (I-J) Lower Bound Upper Bound
groups
8-11 12-14 -7.6188 1.50076 .000 -11.8723 -3.3652
15-17 -12.7633 1.68002 .000 -17.5249 -8.0016
18-21 -13.6450 1.82898 .000 -18.8288 -8.4611
12-14 8-11 7.6188 1.50076 .000 3.3652 11.8723
15-17 -5.1445 1.60184 .019 -9.6846 -.6045
18-21 -6.0262 1.75743 .010 -11.0072 -1.0451
15-17 8-11 12.7633 1.68002 .000 8.0016 17.5249
12-14 5.1445 1.60184 .019 .6045 9.6846
18-21 -.8817 1.91279 .975 -6.3030 4.5397
18-21 8-11 13.6450 1.82898 .000 8.4611 18.8288
12-14 6.0262 1.75743 .010 1.0451 11.0072
15-17 .8817 1.91279 .975 -4.5397 6.3030
* The mean difference is significant at the .05 level.
Tables 7.6 and 7.7 show the effect of age on SPM mean scores. Subjects were divided
into four age groups. There were statistically significant differences at the p < .05 level in
SPM scores for the four age groups: F (3, 124) = 26.839, p < .001. The effect size was
calculated using eta squared, i.e. by dividing the between-groups sum of squares (3535.138) by
the total sum of squares (8979.386) (Pallant, 2007). The resulting eta squared value was
0.39, which indicated a large effect. Post-hoc comparisons using the Tukey HSD test
indicated that there were statistically significant differences between all age
groups except between the 15-17 years age group (M = 40.09, SD = 5.31) and the 18-21
years age group (M = 40.97, SD = 6.21).
7.6.2.3 Difference according to gender
An independent t-test was carried out to compare the SPM score means for the gender
group.
Table 7.8 Comparison of the gender mean scores of SPM test

Gender (N)sample Mean SD Std. Error Mean (N)
Group
Male 11961 33.95 8.95 .92801 93
Female 11423 33.82 9.00 .94384 91
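                             Levene’s test   t-test for equality of means
                             F      Sig.     t      df        Sig.         Mean        Std. Error   95% CI of the difference
                                                              (2-tailed)   Difference  Difference   Lower     Upper
Equal variances assumed      .062   .804     .102   182       .919         .13492      1.32356      -2.477    2.746
Equal variances not assumed                  .102   181.858   .919         .13492      1.32364      -2.477    2.747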
An independent-samples t-test was conducted to compare the SPM mean scores for males
and females. There was no significant difference in scores between males (M = 33.95, SD =
8.95) and females (M = 33.82, SD = 9.00); t (182) = 0.102, p = 0.919. The magnitude
of the difference in the means (mean difference = 0.1349, 95% CI: -2.477 to 2.746) was
very small (eta squared < 0.001). SPSS does not provide eta squared values for the
t-test; the value was calculated from the information provided in the output.
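For a t-test, eta squared can be obtained from the t value and the group sizes as t² / (t² + (N1 + N2 - 2)), the formula given by Pallant (2007). The sketch below applies it to the values in Table 7.8; the function name is hypothetical.

```python
def eta_squared_from_t(t: float, n1: int, n2: int) -> float:
    """Eta squared for an independent-samples t-test:
    t^2 / (t^2 + (n1 + n2 - 2))."""
    df = n1 + n2 - 2
    return t ** 2 / (t ** 2 + df)


# Table 7.8: t = 0.102 with 93 male and 91 female group means (df = 182).
eta_sq = eta_squared_from_t(0.102, 93, 91)
print(f"{eta_sq:.6f}")
```

With a t value this close to zero the result is far below the conventional threshold of .01 for a small effect, in line with the conclusion that gender had a negligible effect on SPM means.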
7.6.2.4 Difference according to development status and age
Two-way ANOVA test was carried out on the SPM scores for the development status
according to age groups.
Table 7.9 Comparison of the development status mean scores of SPM test according to
age.
Development status Age groups (N)sample Mean SD (N) Group
developed 8-11 4223 31.98 6.28 18
developing 4086 22.33 5.61 17
Total 8309 27.33 7.63 35
developed 12-14 2659 40.50 5.93 14
developing 7265 32.35 5.39 30
Total 9924 34.94 6.71 44
developed 15-17 1814 45.92 5.92 8
developing 7177 37.76 4.37 20
Total 8991 40.09 5.31 28
developed 18-21 818 50.22 4.21 4
developing 3651 38.80 3.04 17
Total 4469 40.97 6.21 21
developed Total 9514 38.88 8.63 44
developing 22179 32.93 7.57 84
Total 31693 34.99 8.41 128

Table 7.10 Levene’s test of equality of error variances.
F df1 df2 Sig.
2.052 7 120 .063
Table 7.11 Tests of Between-Subjects Effects of SPM scores.

Source Type III Sum df Mean F Sig. Partial Eta
of Squares Square Squared
Corrected Model 5774.888 7 824.984 30.893 .000 .643
Intercept 127961.898 1 127961.898 4791.836 .000 .976
AGE 4416.961 3 1472.320 55.135 .000 .580
REGION 1980.915 1 1980.915 74.180 .000 .382
AGE * REGION 32.878 3 10.959 .410 .746 .010
Error 3204.498 120 26.704
Total 165560.362 128
a R Squared = .643 (Adjusted R Squared = .622).
Table 7.12 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD).
(I) (I) Age (J) Mean Std. Sig. 95% Confidence
Develop. Age Difference Error Interval
status (I-J) Lower Upper
Bound Bound
Developed 8-11 12-14 -8.52000* 2.10509 .001 -14.1625 -2.8775
15-17 -13.94125* 2.51016 .000 -20.6695 -7.2130
18-21 -18.23750* 3.26544 .000 -26.9902 -9.4848
12-14 8-11 8.52000* 2.10509 .001 2.8775 14.1625
15-17 -5.42125 2.61818 .180 -12.4391 1.5966
18-21 -9.71750* 3.34918 .029 -18.6947 -.7403
15-17 8-11 13.94125* 2.51016 .000 7.2130 20.6695
12-14 5.42125 2.61818 .180 -1.5966 12.4391
18-21 -4.29625 3.61753 .638 -13.9928 5.4003
18-21 8-11 18.23750* 3.26544 .000 9.4848 26.9902
12-14 9.71750* 3.34918 .029 .7403 18.6947
15-17 4.29625 3.61753 .638 -5.4003 13.9928
developing 8-11 12-14 -9.95410* 1.44341 .000 -13.7414 -6.1668
15-17 -15.35826* 1.56851 .000 -19.4738 -11.2427
18-21 -16.39706* 1.63086 .000 -20.6762 -12.1179
12-14 8-11 9.95410* 1.44341 .000 6.1668 13.7414
15-17 -5.40417* 1.37257 .001 -9.0056 -1.8027
18-21 -6.44296* 1.44341 .000 -10.2303 -2.6556
15-17 8-11 15.35826* 1.56851 .000 11.2427 19.4738
12-14 5.40417* 1.37257 .001 1.8027 9.0056
18-21 -1.03879 1.56851 .911 -5.1544 3.0768
18-21 8-11 16.39706* 1.63086 .000 12.1179 20.6762
12-14 6.44296* 1.44341 .000 2.6556 10.2303
15-17 1.03879 1.56851 .911 -3.0768 5.1544
* The mean difference is significant at the .05 level.
Tables 7.9, 7.10, 7.11 and 7.12 show the impact of development status according to age
on SPM mean scores. Subjects were divided into two groups according to
development status (developed and developing). The interaction effect between
development status and age was not statistically significant, F (3, 120) = .410, p = .746.
There was a statistically significant main effect for development status, F (1, 120) =
74.180, p < .001; the magnitude of the effect size was large (partial eta squared = .38).
The main effect for age, F (3, 120) = 55.135, p < .001, was also statistically significant.
Post-hoc comparisons using the Tukey HSD test showed that in developing countries
statistically significant differences were found between all age groups except between the 15-17 age
group and the 18-21 age group. In developed countries, statistically significant differences
were found between all age groups except between the 12-14 age group and the 15-17
age group and between the 15-17 age group and the 18-21 age group. Levene’s
test of equality of variances was not significant, indicating that the group variances were
equal. Moreover, the magnitude of the difference between groups in terms of standard
deviation units (Cohen’s d) was calculated (Pallant, 2007).
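The partial eta squared values in Table 7.11 can be recovered from the Type III sums of squares as SS_effect / (SS_effect + SS_error). A minimal sketch of this check follows; the function name is hypothetical.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


SS_ERROR = 3204.498  # Error row of Table 7.11

# Type III sums of squares for each effect in Table 7.11.
effects = {
    "AGE": 4416.961,
    "REGION": 1980.915,
    "AGE * REGION": 32.878,
}

for name, ss in effects.items():
    print(name, round(partial_eta_squared(ss, SS_ERROR), 3))
```

Unlike eta squared, the partial version removes the variance explained by the other effects from the denominator, which is why the values for age and development status can sum to more than 1.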
Table 7.13 Magnitude of the development status of countries (developed and developing
countries) in mean scores and variability on SPM as functions of age and total sample
Age Development (N) (N) Mean SD t sig d Vr IQ IQs
status Group sample Point
8-11 developed 18 4223 31.98 6.28 -4.75 .000 1.26 1.25 18.90 96
developing 17 4086 22.33 5.61 85
Total 35 8309 27.33 7.63 91
12-14 developed 14 2659 40.50 5.93 -4.52 .000 1.21 1.21 18.15 93
developing 30 7265 32.35 5.39 81
Total 44 9924 34.94 6.71 87
15-17 developed 8 1814 45.92 5.92 -5.10 .000 1.53 1.84 22.95 95
developing 20 7177 37.76 4.37 83
Total 28 8991 40.09 5.31 89
18-21 developed 4 818 50.22 4.21 -4.80 .000 1.84 1.91 27.60 96
developing 17 3651 38.80 3.04 79
Total 21 4469 40.97 6.21 88
Score Development (N) (N) Mean SD t sig d Vr IQ IQs
status Group sample Point
Total developed 44 9514 38.88 8.63 -4.03 .000 0.71 1.30 10.65 95
developing 84 22179 32.93 7.57 82
Total 128 31693 34.99 8.41 89
Table 7.13 shows, for developed and developing countries in each age group and in the
total sample, the mean scores, the standard deviations, the t values for the difference
between developed and developing countries, the level of significance, Cohen’s d (the
difference between the developed and developing countries’ means divided by the within
group standard deviation; Cohen, 1977), the variance ratio Vr (i.e. the variance of the
developed countries divided by the variance of the developing countries; Lynn and
Irwing, 2004) and the IQ point differences between developed and developing countries.
A Vr greater than 1.0 indicates that developed countries had greater variance
than developing countries, while a Vr less than 1.0 indicates that developing countries had
greater variance than developed countries (Khaleefa and Lynn 2008).
The results showed three interesting features. First, in developed countries the average
score corresponded to the 39th British percentile for the 8-11
age group (IQ = 96), the 31st percentile for the 12-14 age group (IQ = 93) and the 37th
percentile for the 15-17 age group (IQ = 95), and to the 39th American percentile (IQ = 96)
for the 18-21 age group. In developing countries the average score corresponded to the
16th British percentile for the 8-11 age group (IQ = 85), the 10th percentile for the 12-14
age group (IQ = 81) and the 12th percentile for the 15-17 age group (IQ = 83), and to the
8th American percentile (IQ = 79) for the 18-21 age group. Overall, the highest IQ
obtained was 96, for the 8-11 and 18-21 years age groups in developed countries, whereas the
lowest IQ was 79, for the 18-21 years age group in developing countries. The average IQ for the
developed countries was 95 whereas for the developing countries it was 82.
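The step from a percentile to an IQ score can be approximated with the normal curve: the percentile is converted to a z score and rescaled to a mean of 100 and a standard deviation of 15. This is a sketch under that assumption only. The source used a published conversion table, which may differ from the normal-curve value by a point at some percentiles, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_to_iq(percentile: float, mean: float = 100.0,
                     sd: float = 15.0) -> float:
    """Map a percentile (0-100, exclusive) to the IQ scale, assuming
    IQ is normally distributed with the given mean and SD."""
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + sd * z


# Percentiles quoted in the text for selected groups.
for pc in (39, 31, 16, 10, 8):
    print(pc, round(percentile_to_iq(pc)))
```

For the percentiles listed in the loop the rounded results match the IQ values quoted in the text.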
Second, the statistically significant differences by development status, in the total sample
and in every age group, were in favour of developed countries. In the total sample, developed
countries obtained a significantly higher mean than developing countries (0.71d = 10.65 IQ
points).
Third, the difference in variability between developed and developing countries, within the
total sample as well as within each age group (as can be seen from the standard deviations
and variance ratios), was large, with developed countries showing greater variability than
developing countries.
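The variance ratios in Table 7.13 can be recomputed from the reported standard deviations, since Vr is the variance of the developed countries divided by the variance of the developing countries. A minimal sketch, with a hypothetical function name:

```python
def variance_ratio(sd_a: float, sd_b: float) -> float:
    """Variance ratio Vr = sd_a**2 / sd_b**2.

    Values above 1.0 mean group A is more variable than group B.
    """
    if sd_b <= 0:
        raise ValueError("sd_b must be positive")
    return (sd_a ** 2) / (sd_b ** 2)


# (developed SD, developing SD) pairs from Table 7.13.
rows = {
    "8-11": (6.28, 5.61),
    "12-14": (5.93, 5.39),
    "15-17": (5.92, 4.37),
    "Total": (8.63, 7.57),
}

for label, (sd_dev, sd_developing) in rows.items():
    print(label, round(variance_ratio(sd_dev, sd_developing), 2))
```

Because the ratio is built from squared standard deviations, modest differences in SD translate into noticeably larger differences in Vr.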
7.6.2.5 Difference according to development status and gender
Two-way ANOVA was conducted on SPM scores for the development status according
to gender.
Table 7.14 Comparison of the development status mean scores of SPM test according to
gender.
Development status Gender (N)sample Mean SD (N) Group
developed Male 2626 39.47 8.72 23
Female 2704 39.57 9.23 22
Total 5330 39.50 8.86 45
developing Male 9335 32.14 8.31 70
Female 8719 31.99 8.19 69
Total 18054 32.07 8.22 139
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184

Table 7.15 Levene’s test of equality of error variances.
F df1 df2 Sig.
.107 3 180 .956
Table 7.16 Tests of Between-Subjects Effects of SPM scores.
Source Type III Sum df Mean F Sig. Partial Eta
of Squares Square Squared
Corrected Model 1880.43 3 626.81 8.825 .000 .128
Intercept 174051.85 1 174051.85 2450.522 .000 .932
REGION 1879.56 1 1879.56 26.463 .000 .128
GENDER 5.109E-02 1 5.109E-02 .001 .979 .000
REGION * GENDER .391 1 .391 .006 .941 .000
Error 12784.76 180 71.03
Total 225925.97 184
a R Squared = .128 (Adjusted R Squared = .114)
Tables 7.14, 7.15 and 7.16 show the impact of development status according to gender
on SPM mean scores. Subjects were divided into two groups according to
development status (developed and developing). The interaction effect between
development status and gender was not statistically significant, F (1, 180) = .006, p =
.941. There was a statistically significant main effect for development status, F (1, 180) =
26.463, p < .001; the magnitude of the effect size was large (partial eta squared = .13).
The main effect for gender, F (1, 180) = .001, p = .979, was not statistically
significant. Levene’s test of equality of variances was not significant, indicating that the
group variances were equal.
7.6.2.6 Difference according to age groups and gender
Two-way ANOVA was conducted on SPM scores for age groups according to gender.
Table 7.17 Comparison of the age groups mean scores of SPM test according to gender
Age Gender (N)sample Mean SD (N) Group
8-11 Male 3133 26.09 7.87 27
Female 2918 25.67 8.27 27
Total 6051 25.88 7.99 54
12-14 Male 3373 33.12 7.17 31
Female 3267 34.19 6.81 30
Total 6640 33.65 6.95 61
15-17 Male 3871 39.79 5.45 23
Female 3656 38.95 6.14 23
Total 7527 39.37 5.76 46
18-21 Male 1584 42.60 4.20 12
Female 1582 42.07 4.41 11
Total 3166 42.35 4.21 23
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184

Table 7.18 Levene’s test of equality of error variances.
F df1 df2 Sig.
2.281 7 176 .030

Table 7.19 Tests of Between-Subjects Effects of SPM scores.
Source Type III Sum df Mean F Sig. Partial Eta
of Squares Square Squared
Corrected Model 6525.562 7 932.223 20.157 .000 .445
Intercept 199051.721 1 199051.721 4304.018 .000 .961
AGE 6488.033 3 2162.678 46.763 .000 .444
GENDER 1.298 1 1.298 .028 .867 .000
AGE * GENDER 29.614 3 9.871 .213 .887 .004
Error 8139.628 176 46.248
Total 225925.966 184
a R Squared = .445 (Adjusted R Squared = .423).
Table 7.20 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age   (J) Age   Mean Difference   Std. Error   Sig.   95% Confidence Interval
                    (I-J)                                 Lower Bound   Upper Bound
8-11      12-14     -7.7645           1.27067      .000   -11.0603      -4.4687
          15-17     -13.4903          1.36449      .000   -17.0294      -9.9511
          18-21     -16.4696          1.69329      .000   -20.8616      -12.0777
12-14     8-11      7.7645            1.27067      .000   4.4687        11.0603
          15-17     -5.7257           1.32799      .000   -9.1702       -2.2813
          18-21     -8.7051           1.66401      .000   -13.0211      -4.3891
15-17     8-11      13.4903           1.36449      .000   9.9511        17.0294
          12-14     5.7257            1.32799      .000   2.2813        9.1702
          18-21     -2.9793           1.73671      .319   -7.4839       1.5252
18-21     8-11      16.4696           1.69329      .000   12.0777       20.8616
          12-14     8.7051            1.66401      .000   4.3891        13.0211
          15-17     2.9793            1.73671      .319   -1.5252       7.4839
• The mean difference is significant at the .05 level.
Figure 7.5 Mean score differences by age group and gender
Tables 7.17, 7.18, 7.19 and 7.20 show the effect of age group and gender on SPM test
scores. The interaction effect between age groups and gender was not statistically
significant, F(3, 176) = .213, p = .887. There was a statistically significant main
effect for age groups, F(3, 176) = 46.763, p < .001; the effect size was large (partial
eta squared = .44). Post-hoc comparisons using the Tukey HSD test showed statistically
significant differences between all age groups except between the 15-17 years age group
(M = 39.37, SD = 5.76) and the 18-21 years age group (M = 42.35, SD = 4.21). The main
effect for gender was not statistically significant, F(1, 176) = .028, p = .867. Levene’s
test of equality of variances was not significant, indicating that the group variances
were equal. Furthermore, the magnitude of the difference between groups in standard
deviation units (Cohen’s d) was calculated (Pallant, 2007).
Table 7.21 Magnitude of gender differences in mean scores and variability on SPM as a
function of age and development status
Function of age
Age group    Gender   N (group)   N (sample)   Mean    SD     t       Sig.   d       Vr     IQ points
8-11         Male     27          3133         26.09   7.87   .194    .847   0.05    0.95   0.75
             Female   27          2918         25.67   8.27
             Total    54          6051         25.88   7.99
12-14        Male     31          3373         33.12   7.17   -.599   .552   -0.15   1.05   -2.25
             Female   30          3267         34.19   6.81
             Total    61          6640         33.65   6.95
15-17        Male     23          3871         39.79   6.14   .491    .626   0.14    1.27   2.1
             Female   23          3656         38.95   5.45
             Total    46          7527         39.37   5.76
18-21        Male     12          1584         42.60   4.20   .294    .772   0.12    0.95   1.8
             Female   11          1582         42.07   4.41
             Total    23          3166         42.35   4.21
Function of development status
Status       Gender   N (group)   N (sample)   Mean    SD     t       Sig.   d       Vr     IQ points
Developed    Male     23          2626         39.47   8.72   .104    .917   -0.01   0.89   -0.17
             Female   22          2704         39.57   9.23
             Total    45          5330         39.50   8.86
Developing   Male     70          9335         32.14   8.31   -.026   .980   0.15    1.03   2.25
             Female   69          8719         31.99   8.19
             Total    139         18054        32.07   8.22
Function of total sample
Score        Gender   N (group)   N (sample)   Mean    SD     t       Sig.   d       Vr     IQ points
             Male     93          11961        33.95   8.95   .102    .919   0.01    0.99   0.15
             Female   91          11423        33.82   9.00
             Total    184         23384        33.88   8.95
Table 7.21 shows the mean scores and standard deviations obtained by males and females
in each age group, in each development status and in the total sample, together with the
t values and significance levels for the male-female differences. It also reports
Cohen’s d (the difference between the male and female means divided by the within-group
standard deviation; Cohen, 1977), the variance ratio Vr (the variance of the males
divided by the variance of the females; Lynn and Irwing, 2004) and the male-female
difference in IQ points. A Vr greater than 1.0 indicates that males had greater variance
than females, while a Vr less than 1.0 indicates that females had greater variance than
males (Khaleefa and Lynn, 2008). The results indicated two interesting features. First,
there were no significant gender differences in the total sample, in any age group or in
either development status. In total, males obtained a higher mean than females by 0.01d
(0.15 IQ point). In the 8-11 age group, males obtained a higher mean than females by
0.05d (0.75 IQ point), while in the 12-14 age group females obtained a higher mean than
males by 0.15d (2.25 IQ points). In the 15-17 age group, males scored a higher mean than
females by 0.14d (2.1 IQ points). In the 18-21 age group, males scored a higher mean
than females by 0.12d (1.8 IQ points). In developed countries, females obtained a higher
mean than males by 0.01d (0.17 IQ points). Finally, in developing countries males scored
a higher mean than females by 0.15d (2.25 IQ points). Second, the gender difference in
variability (as can be seen from the standard deviations and variance ratios) was small
within the total sample, within each age group and within each development status. The
exceptions were the 15-17 age group, where males had greater variability than females
(Vr = 1.27), and the developed countries, where females had greater variability than
males (Vr = 0.89).
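The three indices defined above can be sketched in a few lines. This is a minimal illustration, assuming that the within-group standard deviation is the pooled SD of the two gender groups and that IQ points are obtained by multiplying d by an IQ standard deviation of 15; neither assumption is stated explicitly in the text, and the function names are illustrative.

```python
import math


def cohens_d(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Male minus female mean, divided by the pooled within-group SD."""
    pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
    return (mean_m - mean_f) / math.sqrt(pooled_var)


def variance_ratio(sd_m, sd_f):
    """Male variance divided by female variance (Vr)."""
    return sd_m**2 / sd_f**2


def iq_points(d, iq_sd=15):
    """Express d on an IQ scale; an SD of 15 is assumed here."""
    return d * iq_sd


# 15-17 age group as listed in Table 7.21 (group-level N = 23 for each gender).
d = cohens_d(39.79, 6.14, 23, 38.95, 5.45, 23)
vr = variance_ratio(6.14, 5.45)
print(f"d = {d:.2f}, Vr = {vr:.2f}, IQ points = {iq_points(d):.1f}")
```

Because the tabled IQ point values appear to be based on d rounded to two decimals, a value computed from the unrounded d may differ slightly in the last digit.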
7.6.3 Multiple Regressions according to the independent variables
In order to investigate the contribution of the independent variables (development
status, gender and age groups) to the prediction of the SPM scores, a stepwise multiple
regression method was used.
Table 7.22 Stepwise Regression for Independent Variable and the SPM Score Means
Model                   Unstandardised Coeff.   Standardised Coeff.   t        Sig.
                        B        Std. Error     Beta
1- (Constant)           22.889   .887                                 25.793   .000
   Age group            4.889    .352           .603                  13.863   .000
2- (Constant)           12.032   1.257                                9.576    .000
   Age group            5.175    .305           .638                  16.985   .000
   Development status   7.951    .730           .409                  10.886   .000
Model Summary
Model                           R      R Square   Adjusted R Square   Std. Error of Estimate
1- Development status, Gender   .603   .363       .361                7.00954
2- Gender                       .727   .529       .526                6.03585
Using the stepwise method, a significant model emerged (adjusted R square = 0.526;
F(2, 336) = 188.851, p < 0.001). Significant variables are shown below:
Predictor variable    Beta    p
Age groups            0.638   p < 0.001
Development status    0.409   p < 0.001
Gender was not a significant predictor (p = 0.962).
This showed that both age and development status were predictors of SPM results, with
age being the better predictor.
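The relation between the R square and adjusted R square values in the model summary can be illustrated with the usual adjustment formula. The sketch below is only a consistency check: the number of cases (339) is an assumption inferred from the residual degrees of freedom of 336 in the two-predictor model, and the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# ASSUMPTION: n = 339 is inferred from the residual df of 336 in the final
# two-predictor model (336 + 2 predictors + 1); it is not stated in the text.
N_CASES = 339

model_1 = adjusted_r_squared(0.363, N_CASES, 1)  # age group only
model_2 = adjusted_r_squared(0.529, N_CASES, 2)  # age group + development status
print(f"Model 1: {model_1:.3f}  Model 2: {model_2:.3f}")
```

Under that assumed sample size, both values round to the adjusted R square figures reported in the model summary (.361 and .526).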
7.7 Chapter Summary
The overall mean SPM score was 34.98 with a standard deviation of 8.41 (minimum 12.65
and maximum 52.76). The developed countries showed the highest mean score (M = 38.88;
SD = 8.61), whereas Libya showed the lowest mean score (M = 32.31; SD = 9.02), slightly
lower than the mean score of the developing countries (M = 33.10; SD = 7.31). The 18-21
years age group showed the highest mean score (M = 40.97; SD = 6.21), whereas the 8-11
years age group showed the lowest (M = 27.33; SD = 7.63). Males showed a slightly higher
mean score (M = 33.95; SD = 8.95) than females (M = 33.82; SD = 9.00). The average IQ
score for developed countries was 95, whereas the average IQ score for developing
countries was 82.
Normality testing showed that the collected data were normally distributed, which
warranted the use of parametric tests. To test the differences between SPM score means,
independent-samples t-tests and one-way and two-way ANOVAs were used. In addition, a
stepwise regression analysis was employed to investigate which independent variable was
the best predictor of SPM scores. The following was concluded:
1. Significant differences were found between the SPM scores based on development
status. Developed countries achieved higher SPM scores than developing countries
and Libya. No statistically significant differences were found in SPM scores
between Libya and developing countries. Thus development status was concluded to
be an important factor affecting SPM scores.
2. Significant differences were found between the SPM scores based on age groups.
Differences were in favour of older age groups. In addition, SPM scores of the age
groups differed statistically by development status but not by gender. Thus age
was concluded to be an important factor affecting SPM scores.
3. Using the British and American percentiles, SPM scores were converted to IQ scores.
The IQ score of the 8-11 age group was 96 in developed countries and 85 in
developing countries. The IQ score of the 12-14 age group was 93 in developed
countries and 81 in developing countries. The IQ score of the 15-17 age group was
95 in developed countries and 83 in developing countries. The IQ score of the 18-21
age group was 96 in developed countries and 79 in developing countries.
4. No significant differences were found between SPM scores based on gender. In
addition, no gender differences were found within the different age groups or
development statuses. Thus gender was concluded not to be an important factor
affecting SPM scores.
5. The variability difference in SPM mean scores was high in each age group based on
development status, in favour of developed countries. The variability difference in
SPM mean scores was low in each age group based on gender, except in the 15-17 age
group, where variability was high in favour of males. In addition, females showed
higher variability in developed countries, whereas in developing countries the
variability difference was low, in favour of males. Extremely low variability
difference was found in the total sample. Consequently, the results indicated no
consistent tendency for a gender difference in variability.
6. Stepwise multiple regression showed age and development status to be predictors of
SPM results, with age the better predictor.
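The conversion in point 3 relied on published British and American percentile norms, which are not reproduced here. As a general illustration of the final step only, the sketch below maps a percentile rank to a deviation IQ through the normal curve, assuming a mean of 100 and a standard deviation of 15. The function name and the example percentiles are hypothetical.

```python
from statistics import NormalDist


def iq_from_percentile(percentile, mean=100.0, sd=15.0):
    """Map a percentile rank (strictly between 0 and 100) to a deviation IQ."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)  # standard normal quantile
    return mean + sd * z


# HYPOTHETICAL percentile ranks, for illustration only.
for pct in (5, 16, 50, 84, 95):
    print(pct, round(iq_from_percentile(pct)))
```

In practice a raw SPM score is first looked up in the age-appropriate norm table to obtain its percentile; only then does a normal-curve step of this kind apply.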
The next chapter brings together the key research findings and discusses them in the
context of the wider existing literature.
Chapter eight: DISCUSSION AND CONCLUSION
8.1 Introduction
Individuals differ from one another in their ability to understand complex ideas, to adapt
effectively to the environment, to learn from experience, to engage in various forms of
reasoning and to overcome obstacles by taking thought. Concepts of "intelligence" are
attempts to clarify and organize this complex set of phenomena. Although considerable
clarity has been achieved in some areas, no such conceptualization has yet answered all
the important questions and none commands universal assent (Neisser, 1995).
For historical reasons, the term "IQ" is often used to describe scores on tests of
intelligence. It originally referred to an "Intelligence Quotient" that was formed by
dividing a so-called mental age by a chronological age, but this procedure is no longer
used. IQ is clearly a flexible construct, as amply demonstrated by decisions in the
1930s and 1940s in the United States and Britain to ‘adjust’ test questions to equalize
the scores of boys and girls, because in previous versions of the tests girls had scored
higher. Many tests have been “tailored” to ensure that the scores of boys and girls are
equalized, on the assumption that there are no gender differences in general
intelligence defined as the sum of all cognitive abilities. This has not been done for
the SPM.
The aim of this chapter is to discuss and evaluate the results of the study that have
thus far been presented. The next section, section two, discusses intelligence testing
in Libya. The third and fourth sections describe the SPM test and the meta-analysis
respectively. Section five presents an analytical discussion of the entire study. The
remaining sections, six to nine, cover the following points: the conclusions drawn from
the major findings; the contributions of the current study to the domain of intelligence
testing; the study limitations; and recommendations and suggestions for further research
in the area.
8.2 Intelligence testing in Libya
Although Libya has witnessed huge development in education within the last five
decades, some areas have not benefited from it. To date, no single test of intellectual
ability has been officially adopted or developed for the measurement of intelligence in
Libya. Schools and universities alike use examination grades as the primary or only
method of determining who should be accepted for study at various academic
establishments and for various jobs in the vocational sector. Although this might be
considered a good criterion for such purposes, additional criteria are desirable.
Mental health services in Libya suffer from a shortage of staff and psychological
services and a lack of facilities. The general public in Libya know very little about
the usefulness, purposes or functions of intelligence tests.
Mental tests currently used in Libya are misused or only partially used. The use of
incomplete tests is likely to bias predictions based on test results and has serious
negative implications for educational or clinical decisions. In addition, the use of
incomplete test scores to estimate mental ability might result in invalid assessment,
with grave consequences for the lives of individuals.
Another aspect affected by the lack of intelligence tests in Libya is the selection of
students for different educational programs. In Libya today, a relevant and accurate
selection procedure is essential, not only in the field of education but also at an
intermediate level of training for skilled manpower. Indeed, a clear failing of the
current system can be seen in the many university graduates posted to office work that
could be performed to a similar level of competence by less qualified people (Attashani
and Abdalla, 2005).
8.3 The SPM test
The problem of adapting intelligence tests to a new setting is by no means uncommon; it
has been a general problem for many developing countries in the past. In addition, if
the aim is to assess the “mental ability” of people in a culture that has yet to develop
its own testing scheme or system, it is necessary to assess what is important for that
culture (Brislin and Thorndike, 1973; Ortar, 1972).
In this study, an international culture-fair test was adopted, and standardization was
carried out to obtain local norms. This was done because it required less time and
effort than designing a test specifically for Libya (Ezeilo, 1978). Raven’s Standard
Progressive Matrices (SPM) test was employed because it has been widely used and has
shown moderately high indices of validity and reliability across a wide range of
cultures.
Raven's Progressive Matrices test is an example of a culture-fair test that has been
used in cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and
Murphy and Davidshofer (1991) held that Raven's Progressive Matrices was one of the most
widely used intelligence or ability tests in cross-cultural research.
It is a group test which can be used with subjects of all language backgrounds and does
not depend to any large extent upon the education or prior knowledge of the subjects. In
addition, it is suitable for all ages from 6 years onwards.
The Progressive Matrices (RPM; Raven, Raven & Court, 2000; Lynn & Vanhanen, 2006) is
the most widely used test of intelligence in numerous countries throughout the world.
One reason for the popularity of the test is that it is non-verbal and can therefore be
applied cross-culturally. It is also considered to be the best test of g, the general
factor present in all cognitive tasks. The test was constructed by Raven (1939). Lynn,
Allik, Pullman, and Laidra (2004) have stated that the Progressive Matrices is widely
regarded as the best test of abstract or nonverbal reasoning ability.
The Progressive Matrices test has good psychometric characteristics. A huge body of
published research has shown the validity of this test. It has gained widespread
acceptance and use in many countries around the world. No other test has been so
extensively used in cross-cultural studies of intelligence. The RPM test is free from
language and apparently has limited dependence on cultural variables, which makes it a
popular instrument for use in developing countries.
8.4 Meta analysis
Meta-analysis is a statistical approach to the aggregation and summarization of results
from independent studies. It is systematic, thorough, objective, and quantitative. The
essentials of this technique are to collect all the studies on the issue, convert the
results to a common metric and average them to give an overall result. Procedures
employed in meta-analysis permit quantitative reviews and syntheses of the research
literature that address these issues (Wolf, 1986). An epidemiologist has described
meta-analysis as “a boon for policy makers who find themselves faced with a mountain of
conflicting studies” (Mann, 1990).
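The averaging step described above can be sketched as follows. The text says only that results are converted to a common metric and averaged; weighting each study by its sample size is an assumption made here for illustration, as are the function name and the example values.

```python
def weighted_mean_effect(studies):
    """Sample-size-weighted mean of effect sizes.

    `studies` is a list of (effect_size, sample_size) pairs, with all effect
    sizes already expressed in the same metric (for example, d).
    """
    total_n = sum(n for _, n in studies)
    if total_n == 0:
        raise ValueError("at least one study with a non-zero sample is required")
    return sum(effect * n for effect, n in studies) / total_n


# HYPOTHETICAL effect sizes (d) and sample sizes, for illustration only.
example_studies = [(0.20, 100), (0.40, 300), (-0.10, 200)]
print(round(weighted_mean_effect(example_studies), 3))
```

Weighting by sample size lets larger studies contribute more to the overall estimate; an unweighted mean would treat every study equally regardless of its size.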
Any meta-analyst has to address three problems that have been identified by Sharpe
(1997) as the “Apples and Oranges”, “File Drawer” and “Garbage in - Garbage out”
problems.
The “Apples and Oranges” problem refers to the idea that different phenomena are
sometimes aggregated and averaged, where disaggregation may show different effects for
different phenomena. The best way of dealing with this problem is to carry out meta-
analyses, in the first instance, on narrowly defined phenomena and populations and then
attempt to integrate these into broader categories. In the present meta-analysis, this
problem has been dealt with by confining the analysis to studies using the Progressive
Matrices on school and university students.
The “File Drawer” problem means that studies producing significant effects tend to be
published, while those producing non-significant effects tend not to be published and
remain unknown in the file drawer. This should not be a problem for the present inquiry,
because SPM studies are not judged by whether they show a significant effect. Any
result, whatever its nature, can be of interest and deemed publishable, so there is no
reason for such studies to remain “in the file drawer”.
The “Garbage in – Garbage out” problem concerns poor quality studies. Meta-analyses
that include many poor quality studies have been criticized by Feinstein (1995) as
“statistical alchemy” that attempts to turn a lot of poor quality studies into good
quality gold. Poor quality studies are liable to obscure relationships that exist and
can be detected by good quality studies. Meta-analysts differ in the extent to which
they judge studies to be of such poor quality that they should be excluded from the
analysis. Some meta-analysts are “inclusionist” while others are “exclusionist”, in the
terminology suggested by Kraemer, Gardner, Brooks and Yesavage (1998). This
meta-analysis is “inclusionist” in the sense that it included all the located studies
on the Progressive Matrices among school and university students that met the strict
inclusion criteria.
The next problem in the meta-analysis was to obtain all the studies on the issue of
concern. This is a difficult problem and one that is rarely, and probably never,
possible to solve completely. An attempt to find all relevant studies was made by
examining previous reviews and searching the computerized databases and resources of
PsycINFO, the American Psychological Association (APA), the American Educational
Research Association (AERA), the Educational Testing Service (ETS), the National Council
on Measurement in Education (NCME), the Educational Resources Information Centre (ERIC),
Ingenta, Web of Science, Dissertation Abstracts, the British Index to Theses, and
Cambridge Scientific Abstracts for the years up to and including 2009. In addition,
active researchers in the field were contacted. In total, the review of literature
covered the years 1948 to 2009. Although finding all relevant studies is a problem for
this and for many other meta-analyses, it was not considered a serious problem for the
present study, because the results were sufficiently clear that they are unlikely to be
seriously overturned by further studies that have not been identified. If this should
prove incorrect, other researchers will produce these unidentified studies and integrate
them into the meta-analysis.
A careful and thorough search for published and unpublished studies on the SPM test
using the above procedures produced 44 studies. They were carried out in 23 countries,
9 developed and 14 developing. The developed country with the highest number of SPM
studies was the United States (14 studies), while the developing country with the
highest number of SPM studies was India (four studies). The earliest study was in the
USA (1948), while the latest were in Qatar and Oman (2009). The overall sample consisted
of 31,693 students aged from 8 years (grade 3) to 21 years (final-year university
students). Although many studies using the SPM were found, some of them did not fulfil
the inclusion criteria. Some studies lacked sufficient information or results. Some did
not carry out the test on all desired age groups. Some did not report the mean values of
the SPM test but reported the norm values only. These studies were excluded. When
studies did not report results by age group, different studies carried out on individual
ages were combined to obtain results for age groups.
After a thorough investigation into the criteria that define social classes, it was not
possible to locate a single criterion that could be used in this context. Income,
parents’ occupation, education and culture were all used, and the differences between
the various studies were vast. Researchers have used different criteria when
determining social class. Tulkin and Newbrough (1968) used occupation and education as
factors to determine social class, while Whorton and Karnes (1979) used income as the
sole factor. Nkaya et al. (1994) used occupation, culture and income as determinants of
social class. They reported that criteria applied in one country may not be applicable
in other countries because of the huge social differences between countries. In
addition, the number of SPM studies that reported such criteria was limited. For these
reasons, it was decided not to include social status in the meta-analysis.
8.5 Study discussion
The discussion below has been organized according to the objectives of the study
outlined in chapter three. The primary focus is analysing the applicability of the SPM
test as an appropriate measure of mental ability (non-verbal reasoning ability, or fluid
intelligence, and g) for a sample of Libyan students. In addition, the distribution of
IQ scores within the sample is identified and compared with that found in other
countries (developed and developing). After that, the effects of the independent
variables on the SPM test results are presented. Finally, the SPM norms of the Libyan
sample are discussed and compared with the norms found in studies conducted in different
cultures.
8.5.1 Psychometric characteristics of the SPM test in Libya
Until now, no single test of mental ability has been officially constructed or adopted
for the measurement of intelligence in a Libyan setting. The lack of use of intelligence
tests in Libya is mainly due to a lack of test experts and of information and knowledge
regarding the usefulness and effectiveness of these tests among the people directly
affected by testing.
The present study tried to rectify this problem by examining the performance of a Libyan
sample on the Standard Progressive Matrices test, and by exploring its applicability as
an appropriate measure of mental ability. It has been reported in the literature (Brown
1983; Anastasi and Urbina 1997; Kenneth 1998; Kline 2000; Langdridge 2004; Domino and
Domino 2006; Mills and Airasian 2006; Lobiondo-Wood and Haber 2006) that reliability
and validity are both important for judging the suitability of a test or measuring
instrument and are the paramount characteristics of a psychological test. To test the
suitability of the SPM test, its psychometric characteristics were extensively
evaluated.
8.5.1.1 Reliability of the SPM test
This was tested using three methods:
A) Test-retest
Raven reported test-retest reliabilities ranging from .83 to .93 for several age
groups: .88 (13 years and over), .93 (under 30 years), .88 (30-39 years), .87 (40-49
years), and .83 (50 years and over). The results of the present study (0.86 to 0.92)
were in accordance with results reported in the literature, such as Rao (1974),
Abdel-Khalek (1988), Nkaya et al. (1994), Abdel-Khalek (2005) and Khelefeeh and Lynn
(2009).
B) Split half
The majority of split-half internal consistency coefficients reported in the literature
exceeded 0.90. The lowest reliability was 0.86, obtained with 174 Iranian children (aged
9 years), and the highest was 0.96, obtained with 91 male psychiatric patients (Raven,
2004). This was in agreement with the results of this study (0.88 to 0.96) and with many
other studies, such as Raven et al. (2003), Burke and Bingham (1969), Baraheni (1974),
Bart et al. (1986), Powers et al. (1986a), Duzen (1994), Court and Raven (1995), Ahmad
et al. (2008) and Khelefeeh and Lynn (2009).
C) Internal consistency alpha
The majority of alpha consistency coefficients reported in the literature exceeded 0.95.
Our results (0.85 to 0.96) matched those of Dey (1984), Duzen et al. (1994), Rushton and
Skuy (2000), Rushton et al. (2002), Abdel-Khalek (2005) and Taylor (2007).
When the results of this study were compared with those of earlier studies, they
appeared quite similar and provided evidence that the SPM is a reliable measure when
used with Libyan students. These figures indicated satisfactory reliability for the SPM
test with the present Libyan sample and gave strong evidence for the consistency of the
SPM test. Anastasi (1988) and Pallant (2007) held that desirable reliability
coefficients should fall in the .80s or .90s. The present results can therefore
generally be considered high reliability coefficients for the Libyan sample, and they
support the reliability of the SPM test.
In addition, one would conclude that the constancy of the measure is high. It was
particularly noteworthy that the coefficient alpha reliabilities (KR-20) were higher
than the test-retest correlations, which was predictable given the high homogeneity of
the test items (Abdel-Khalek, 2005).
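Two of the reliability indices discussed in this section can be sketched as follows for items scored right (1) or wrong (0). This is a minimal illustration and not the procedure used in the study: it assumes an odd-even split stepped up with the Spearman-Brown formula for the split-half coefficient, and population variances for KR-20. The response matrix is hypothetical.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def split_half_reliability(responses):
    """Odd-even split, corrected to full length with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


def kr20(responses):
    """Kuder-Richardson 20, using population variances throughout."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for item in zip(*responses):
        p = mean(item)  # proportion answering the item correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


# HYPOTHETICAL data: 5 test-takers (rows) x 4 items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2), round(kr20(responses), 2))
```

With real data each row would hold one student's 60 item scores; the toy matrix is kept small so that the arithmetic can be followed by hand.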
8.5.1.2 Validity of the SPM test
This was tested using two methods:
A) Construct Validity
This was divided into two analyses. The first was factor analysis. The SPM is considered
by Jensen (1980) to be a measure of the purest form of Spearman’s “g” or, in Jensen’s
terminology, an excellent culture-fair measure of fluid intelligence “g”. Fluid
intelligence is a concept proposed by Cattell (1971) to designate reasoning ability as
distinct from other kinds of intelligence such as verbal knowledge, memory and spatial
ability. Cross-cultural studies also confirm the high ‘g’ saturation of the SPM. Some
factor analytic studies, however, suggest that the SPM measures other factors, such as
visuo-spatial or ‘K’ factors, spatial ability or memory, as well as a large ‘g’ factor
(Raven et al., 1977). A number of scholars have contended that while the Progressive
Matrices is largely a measure of g, it also contains a small visualization or spatial
factor. These include Adcock (1948), Keir (1949), Banks (1949), Vernon (1950), Gabriel
(1954) and Gustaffson (1984, 1988), the last of whom concluded that the SPM measures a
reasoning factor and a further factor that he called “cognition of figural relations”.
Hertzog and Carter (1988) contended that the SPM contains two factors: verbal
intelligence and spatial visualization. Lynn, Allik & Irwing (2004) identified a general
factor and three further factors, which they reported as the gestalt continuation found
by van der Ven and Ellis (2000), verbal-analytic reasoning and visuospatial ability.
Further analysis of the three factors showed a higher order factor identifiable as “g”.
Whatever the number, the evidence relating to factors other than “g” is, according to
Jensen (1980), inconclusive and dubious. He reported that the PM measures “g” and little
else, and that the loadings occasionally found on other “perceptual” and “performance”
type factors, independently of “g”, are usually trivial and inconsistent from one
analysis to another. In fact, the PM has very meagre loadings on these factors when “g”
is excluded. Anastasi (1982), on the other hand, stated that the PM is heavily loaded
with a factor common to most intelligence tests (identified as Spearman’s “g” by British
psychologists) but that spatial aptitude, inductive reasoning, perceptual accuracy, and
other group factors also influence performance.
The outcome of the factor analysis in this study showed the presence of only one factor,
which was Spearman’s “g”. This result was in agreement with the 1996 and 2004 SPM test
manuals, Burke and Bingham (1969), Zager et al. (1980) and Abdel-Khalek (1987, 2005).
The second was internal consistency. In the present study, there were strong, positive
and statistically significant correlations between the five sets (A, B, C, D and E) and
the total scores, ranging from 0.51 to 0.85. This was in agreement with Abdel-Khalek
(1987, 2005). Overall, construct validity showed good characteristics when the SPM was
applied to a Libyan sample.
B) Criterion-related Validity
This study found a moderate, significant correlation between SPM scores and students’
academic achievement (SAA), which was used as the external validity criterion. According
to the SPM test manual (2004), the external criteria commonly adopted in predictive
validity investigations are examination grades or teachers’ estimates. SPM correlations
with overall academic achievement tests generally fall in the region of 0.26 to 0.76.
Our results were in agreement with Raven et al. (2004), Tulkin and Newbrough (1968),
Mclaurin and Farrar (1973), Sinha (1968), Baraheni (1974), Sinha (1977), Maqsud (1980),
Powers et al. (1986b), Avoy (1987), Carver (1990), Majdub (1991) and Laidra et al.
(2007). The results of the study showed that the SPM was valid when applied to a Libyan
sample.
8.5.1.3 Item analysis of SPM test
Nunnally (1972) and Burroughs (1975) argued that item difficulty must be determined
because it is almost always necessary to present items in order of difficulty. The
easiest items come first to give a sense of accomplishment and an optimistic start;
otherwise a blockage may occur, with many students unable to progress beyond the first
items. The more difficult items are placed near the end to prevent students from
spending an undue amount of time on them early in the testing period.
Many researchers believe that a test should include some easy and some difficult items,
but that most items should be located in the 20 to 80 percent zone of easiness (Karmel,
1978). Our analysis showed that set A was the easiest set and set E the most difficult,
although set D was easier than set C (a difference of 0.01 in mean percentage).
According to the criteria of Hopkins (1998), 51 out of 60 items had excellent
discriminating value. Thirteen items and one set were not arranged in order of
increasing difficulty. Rushton et al. (2002) and Boben et al. (2007) also showed set D
to be easier than set C. Overall, the results indicated that the difficulty level of the
SPM test employed in the present study was suitable for Libyan students.
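The two checks described above (whether items fall in the 20 to 80 percent zone of easiness, and whether they are arranged in order of increasing difficulty) can be sketched as follows. The function names and the response matrix are hypothetical and serve only to illustrate the idea; easiness is taken here as the proportion of test-takers answering an item correctly.

```python
def item_easiness(responses):
    """Proportion of test-takers answering each item correctly (1 = correct)."""
    n = len(responses)
    return [sum(item) / n for item in zip(*responses)]


def outside_zone(easiness, low=0.20, high=0.80):
    """Indices of items whose easiness falls outside the low-high zone."""
    return [i for i, p in enumerate(easiness) if not low <= p <= high]


def out_of_order(easiness):
    """Indices of items that are easier than the item immediately before them."""
    return [i for i in range(1, len(easiness)) if easiness[i] > easiness[i - 1]]


# HYPOTHETICAL data: 5 test-takers (rows) x 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]
easiness = item_easiness(responses)
print(easiness, outside_zone(easiness), out_of_order(easiness))
```

In this toy example the first item is answered correctly by everyone and so lies outside the zone, and the third item (index 2) is easier than the second, which is the kind of ordering departure reported for some SPM items.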
8.5.2 IQ in Libya
Overall, the mean IQ result obtained from the Libyan students was 81 (85 maximum
mean and 74 minimum mean). The average IQ score of developing countries was 82,
whereas the average IQ score for developed countries was 95. As there was no
statistically significant difference in IQ scores between Libya and developing countries,
Libya was considered as a developing country for the comparison purposes of this study.
The following table (8.1) showed mean IQs for some countries in North Africans and
South Asians and the average IQ for developing countries.
Table 8.1: Mean IQs and averages for some developed and developing countries
Location         Age     N       Test  IQ   Reference
IQs of North Africans = 80.71
North Africa     Adults  90      SPM   84   Raveau et al., 1976
Egypt            6-12    129     SPM   83   Abdel-Khalek, 1988
Sudan            8-12    148     SPM   75   Ahmed, 1989
Sudan            6-9     1,683   CPM   81   Khatib et al., 2006
Sudan            9-25    6,202   SPM   79   Khaleefa et al., 2008b
Sudan            9       3,185   SPM   79   Irwing et al., 2008
Tunisia          20      509     SPM   84   Abdel-Khalek & Raven, 2006
IQs of South Asians = 83.93
Bahrain          19-29   100     SPM   81   Khaleefa & AlGharaibeh, 2002
Iran             15      627     SPM   84   Valentine, 1959
Iraq             14-17   204     SPM   87   Abul-Hubb, 1972
Iraq             18-35   1,185   SPM   87   Abul-Hubb, 1972
Jordan           11-40   2,542   APM   86   Lynn & Abdel-Khalek, 2009
Kuwait           6-15    6,529   SPM   86   Abdel-Khalek & Lynn, 2006
Oman             5-11    1,042   CPM   87   Khaleefa & Lynn, 2009
Oman             9-18    5,139   SPM   82   Abdel-Khalek & Lynn, 2008
Qatar            10-13   273     SPM   78   Bart et al., 1987
Qatar            6-11    1,135   SPM   88   Khaleefa & Lynn, 2008d
Saudi Arabia     8-14    3,967   SPM   80   Abu-Hatab et al., 1977
Syria            7       241     CPM   83   Guthke & Al-Zoubi, 1987
Syria            7-18    3,489   CPM   83   Khaleefa & Lynn, 2008a
Yemen            6-11    1,000   CPM   85   Al-Heeti et al., 1997
Yemen            6-11    896     CPM   83   Khaleefa & Lynn, 2008c
UAE              6-11    4,496   CPM   83   Khaleefa & Lynn, 2008b
Average IQs for developing countries = 82.95
Average IQs of Europeans = 97.77
Czech Rep.       5-11    832     CPM   96   Raven et al., 1995
Denmark          5-11    628     SPM   97   Vejleskov, 1968
Estonia          12-18   2,689   SPM   100  Lynn et al., 2002
Estonia          7-11    1,835   SPM   98   Lynn et al., 2003
Finland          7       755     CPM   98   Kyostio, 1972
France           6-9     618     CPM   97   Bourdier, 1964
Germany          5-7     563     CPM   99   Winkelman, 1972
Germany          11-15   2,068   SPM   105  Raven, 1981
Germany          11-15   1,000   SPM   99   Raven, 1981
Germany          6-10    3,607   CPM   101  Raven et al., 1995
Germany          5-10    980     CPM   97   Raven et al., 1995
Iceland          6-16    665     SPM   101  Pind et al., 2003
Ireland          6-12    1,361   SPM   93   Carr, 1993
Ireland          9-12    2,029   SPM   87   Carr, 1993
Ireland          9-12    2,029   SPM   91   Carr, 1993
Netherlands      5-10    1,920   CPM   99   Raven et al., 1995
Netherlands      6-12    4,032   SPM   101  Raven et al., 1996
Russia           14-15   432     SPM   97   Lynn, 2001
Slovakia         5-11    823     CPM   96   Raven et al., 1995
Slovenia         8-18    1,556   SPM   96   Raven et al., 2000
Spain            6-9     854     CPM   97   Raven et al., 1995
Spain            11-18   3,271   APM   102  Albade Paz & Monoz, 1993
Switzerland      6-10    200     CPM   101  Raven et al., 1995
Switzerland      9-15    246     SPM   104  Spicher, 1993
Turkey           6-15    2,272   SPM   90   Sahin & Duzen, 1994
United Kingdom   6-15    3,250   SPM   100  Raven et al., 1998
Average IQs of East Asians = 104.42
China            6-15    5,108   SPM   101  Lynn, 1991
China            6-12    269     SPM   104  Geary et al., 1997
China            17      218     SPM   103  Geary et al., 1999
Hong Kong        6-13    13,822  SPM   103  Lynn, Pagliari & Chan, 1988
Japan            9       444     SPM   110  Shigehisa & Lynn, 1991
Taiwan           6-8     764     CPM   105  Rabinowitz et al., 1991
Taiwan           9-12    2,476   CPM   105  Lynn, 1997
Average IQs of North Americans = 97.50
Canada           7-12    313     SPM   97   Raven et al., 1996
United States    18-70   625     SPM   98   Raven et al., 1996
Average IQs of Israel, Singapore & Australia = 95.78
Israel           10-12   268     SPM   95   Globerson, 1983
Israel           11      2,781   SPM   89   Lancer & Rim, 1984
Israel           9-15    1,740   SPM   90   Lynn, 1994
Singapore        13      337     SPM   103  Lynn, 1977b
Australia        18      6,700   SPM   100  Craig, 1974
Australia        5-10    700     CPM   98   Raven et al., 1995
Average IQs for developed countries = 98.60
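The group averages in Table 8.1 appear to be simple unweighted means of the study IQs, with each study counting once regardless of sample size. The following is a minimal sketch under that assumption, using the IQ column for the North African and South Asian studies; the variable names are illustrative only.

```python
from statistics import mean

# IQ column of Table 8.1 for the two developing-country groups.
north_africa = [84, 83, 75, 81, 79, 79, 84]
south_asia = [81, 84, 87, 87, 86, 86, 87, 82, 78, 88, 80, 83, 83, 85, 83, 83]

# Unweighted means: every study contributes equally (an assumption about
# how the table's averages were obtained).
mean_north_africa = mean(north_africa)
mean_south_asia = mean(south_asia)
mean_developing = mean(north_africa + south_asia)

print(f"North Africans:       {mean_north_africa:.2f}")
print(f"South Asians:         {mean_south_asia:.2f}")
print(f"Developing countries: {mean_developing:.2f}")
```

Under this assumption the computed values agree with the tabled averages (80.71, 83.93 and 82.95) to within 0.01. A mean weighted by sample size N would give larger studies more influence and would generally produce different figures.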
Table 8.1 illustrates that the mean IQ obtained from the Libyan students (81) was similar
to the IQs of other developing countries in North Africa and South Asia reported by Lynn
and Vanhanen (2002, 2006). This suggests that the SPM test is valid and reliable and may
be considered an appropriate measure of mental ability for Libyan students. Lynn and
Vanhanen (2006) showed the average IQ for developing countries to be 82.95, which was
similar to the value for developing countries (82) obtained from the present
meta-analysis. Similarly, Lynn and Vanhanen (2006) showed the average IQ for developed
countries to be 98.6, which was similar to the value for developed countries (95) obtained
from the present meta-analysis. This agreement supports the validity and reliability of
the meta-analysis.
It is noteworthy that some studies carried out in developed countries reported only the
norms used to calculate IQ scores and not the means. As SPM means were used in this
meta-analysis, it was not possible to include such data.
It is known that intelligence has increased remarkably in economically developed nations
during the last 70 years or so (Flynn, 1984, 2007; Lynn & Hampson, 1986). The reasons
for this are not fully understood. Reasons probably lie in improvements in nutrition and
education that have accompanied rising living standards (Lynn, 1990, Ceci, 1991,
Benton, 2001), and it can be anticipated that as living standards rise in North Africa and
the Middle East, abstract reasoning ability will also rise. Many people from Galton
(1869) onwards have considered that it would be desirable if intelligence could increase.
Although education appears to improve intelligence, the process by which it does this
remains unknown. Presumably, education teaches problem-solving skills which are used
in intelligence tests. Education in Sudan and other Arab countries tends to concentrate on
rote learning and memorization. In Sudan, Irwing et al. (2008) evaluated the effects of
abacus training in mental computation on intelligence assessed with the SPM test.
Abacus training consists of training in mental arithmetic that exercises working memory:
information is held in working memory while other mental operations are performed, and
is then retrieved. The training procedure has been described by Hatano (1977) and
Hatano & Osawa (1983). Mental arithmetic is required in a number of tests of fluid
intelligence such as the Progressive Matrices. Carpenter, Just & Shell (1990) showed that
the Progressive Matrices is largely a mathematical problem-solving test in a design
format, requiring the application of five mathematical rules involving addition,
subtraction, and arithmetical and geometrical progression. The results suggested that
the intelligence of Sudanese children would increase significantly if greater emphasis
were placed on the acquisition of problem-solving skills in Sudanese schools.
Further, schools in Libya do not promote problem-solving abilities in students as well as
those in the United Kingdom do, teachers are not as well trained, and children in Libya
do not have much experience of taking intelligence tests (Attashan and Abdalla
2005). It is possible that the observed group differences are attributable, at least in part, to
the relative novelty of the testing process, as suggested by Stanczak et al. (2001).
Lynn & Vanhanen (2002, 2006) proposed three theories in an attempt to explain the
relationship between development status and IQ. The theories were:
• IQ determines development status.
• Development status determines IQ.
• Both processes operate together through positive feedback, also known as reciprocal
interaction.
The current data are consistent with all three of these. Lynn and Vanhanen presented
arguments that the third hypothesis is the most reasonable. In addition, nine principal
factors have been reported as being responsible for some groups achieving higher IQ
scores than others. The factors are as follows:
(1) Improvement in education: this has been the most favoured factor, proposed by
Tuddenham (1948), Flynn (1984, 2007), Teasdale and Owen (1994), Flieller (1996,
1999), Greenfield (1998), Jensen (1998), Weede & Kampf (2002), Garlick (2002), Blair,
Gamson, Thorne & Baker (2005), and Meisenberg, Lawless, Lambert & Newton (2006).
Education encompasses many aspects and can be obtained in various ways, but it is
mostly achieved by attending school. Students from developed countries are expected to
receive better schooling than their counterparts in developing countries. Schools affect
intelligence in several ways, most obviously by transmitting information. Schools
promote and permit the development of significant intellectual skills, which develop to
different extents in different children. Schooling also changes mental abilities, including
those abilities measured by psychometric tests. It has been shown that students who have
been in school longer have higher mean scores, which would explain why higher SPM
scores are achieved as students' age increases. In addition, students who attend school
intermittently score below those who attend regularly (Neisser, 1995). Parents'
education also plays a significant role: students from families with educated parents
obtained higher SPM scores than those from families with uneducated parents (Abdulla 2002).
(2) Increased test sophistication: Tuddenham (1948), Brand (1987), and Jensen (1998).
Students in developed countries take such psychometric tests from childhood and gain
some familiarity with them, whereas students from developing countries do not
usually take such tests and may be fearful when attempting them (Abdulla
2002).
(3) The greater cognitive stimulation arising from the greater complexity of more recent
environments, provided by e.g. television, media and computer games: Elley (1969),
Essawe (1973), Jensen (1998), Schooler (1998), Williams (1998), and Sundet, Barlaug &
Torjussen (2004). All these would enhance the perception and awareness of
children and improve mental abilities. In addition, cognitive ability increases with age,
probably as a result of learning and brain growth (Lynn, 2008, personal communication).
Abdalla et al. (2002) and Lynn and Irwing (2004, 2005) supported the finding
that IQ scores increase with age.
(4) Improvements in child rearing: Elley (1969) and Flieller (1996). Normal child
development requires a certain minimum level of responsible care. Severely deprived,
neglectful, or abusive environments would have negative effects on many aspects of
development, including intellectual aspects. It is expected that as child rearing improves,
children's SPM scores will increase.
(5) More confident test-taking attitudes: Brand (1987) and Brand, Freshwater & Dockrell
(1989). Students in developing countries usually have less experience of taking
intelligence tests than students in developed countries (Stanczak & Awadalla, 2001;
Lynn et al., 2008). In addition, students in developing countries are usually
apprehensive and afraid of tests. Older students would also have more confidence
in attempting tests than younger students. This is a very important point: students
with more experience and confidence would logically score higher in the test, even
though their mental ability might not be higher. This factor might be one of the causes of
the difference between developed and developing countries.
(6) The “individual multiplier” and the “social multiplier” (Dickens & Flynn, 2001;
Flynn, 2007). The concept of the “individual multiplier” is that intelligent individuals
have a thirst for cognitive stimulation and this increases their intelligence through
positive feedback. The “social multiplier” posits “that other people are the most important
feature of our cognitive development and that the mean IQ of our social environs is a
potent influence on our own IQ” (Flynn, 2007). This would explain why children brought
up in a university town should have higher intelligence than those without this advantage:
the high intelligence of the professors would enhance the intelligence of the
population.
(7) Improvements in nutrition: Lynn (1990a, 1993, 1998), Jensen (1998), Colom, Lluis-
Font & Andres-Pueyo (2005), and Arija, Esparo, Fernandez-Ballart et al. (2006).
Prolonged malnutrition during childhood has long-term intellectual effects. The effects
may well be indirect: malnourished children are typically less responsive to adults, less
motivated to learn, and less active in exploration than their more adequately nourished
counterparts (Nielssen 1993). Students in developing countries are expected to be more
prone to malnutrition than their counterparts in developed countries.
(8) Smaller family size (Sundet, Borren & Tambs, 2008). Smaller families mean a lower
economic burden, so parents are able to provide better education and nutrition and to
meet the child's other needs. Child rearing would be easier and more focused. In the
United States and Europe it has invariably been found that the relation between
intelligence and family size is negative, i.e. children with large numbers of siblings have
lower IQs than children in small families (Abdel-Khalek & Lynn, 2008). Moreover, Lynn
(1996) summarized the results of 17 studies that reported this negative relationship. The
correlations varied between -0.19 and -0.34, with an average of -0.26. A theory positing
that family size has causal effects on intelligence was advanced by Lynn (1959) to explain
these results. This theory proposed that parents give more attention to children in small
families and that this enhances children's intelligence.
Two further theories have been advanced to explain these results. These are:
• The confluence theory of Zajonc (1976, 1983, 2001a) states that the child's IQ
is partly determined by the attention the parents and siblings give to the child. This
explains the negative relation between family size and intelligence, because the
smaller the number of children in the family, the greater the amount of attention
they are likely to receive from their parents. The result of this will be that children
in small families will have higher average IQs than those from large families.
• The resource dilution theory of Blake (1981) and Downey (2001) proposes that
“parental resources are finite and that as the number of children in the family
increases, the resources accrued by any one child necessarily decline” (Downey,
2001). The theory is similar to the confluence theory but broader in so far as it
posits that parental resources consist of a variety of phenomena including the
material, financial and cultural quality of the home, parental treatment of children,
and opportunities afforded to children. It is also broader in its explanatory power
in so far as it purports to explain the negative relation between sibship size and
educational attainment in addition to the relation with intelligence.
(9) Heterosis: Jensen (1998, p. 327) suggested heterosis (hybrid vigor) as a possible
contributor to the Flynn effect. Heterosis arises from the mating of two individuals from
different ancestral lines, e.g. the marriage of a white American to an African American,
Hispanic or Asian American. Jensen argued that this is widespread in the United States
as a result of immigration from many different countries. Mingroni (2004) has argued
further for this theory.
The author agrees with the above-mentioned factors and stresses the importance of
education as a major factor. In addition, the economy plays a pivotal role: IQ scores are
higher in economically developed nations. According to Lynn (2008), IQ in developing
countries will increase by about 3 points a decade with further economic development
(personal communication with Prof. Lynn).
The above-mentioned factors explain why the IQ of students from developed
countries is higher than that of their counterparts in developing countries. Students from
developed countries have environmental advantages from better nutrition, health,
education, and sometimes smaller family size.
On the other hand, human intelligence, like height, is influenced by numerous genetic
interactions and is sensitive to numerous environmental factors. The literature has shown
evidence of genetic factors associated with IQ, but the extent of their influence is still
controversial. In addition, some researchers have hypothesized that intelligence is a
phenotype. Even if intelligence is largely genetic, it cannot be understood without
reference to the genes' environment. This has been shown in twin studies and adoption
studies (Richardson and Sarah 2006, Lynn and Vanhanen 2006).
8.5.3 SPM and gender
The Progressive Matrices is a useful test to examine sex differences in intelligence. The
issue of whether there are any sex differences on the Progressive Matrices has frequently
been discussed and it has been virtually universally concluded that there is no difference
in the mean scores obtained by males and females. This has been one of the major
foundations for the conclusion that there is no sex difference in reasoning ability or in g,
of which the Progressive Matrices is widely regarded as an excellent measure.
The first statement that there is no sex difference on the test came from Raven himself
who constructed the test and wrote that in the standardisation sample “there was no sex
difference, either in the mean scores or the variance of scores, between boys and girls up
to the age of 14 years. There were insufficient data to investigate sex differences in
ability above the age of 14” (Raven, 1939, p.30). The conclusion that there is no sex
difference on the Progressive Matrices has been endorsed by numerous scholars.
The results of the present study and meta-analysis supported this hypothesis and were in
agreement with the previous studies of Eysenck (1981), Court (1983), Mackintosh (1996),
Jensen (1998), Rushton et al. (2002), Pind et al. (2003), Lynn et al. (2004), Abdel-Khalek
and Lynn (2006), Taylor (2007), Khaleefa and Lynn (2008), Khaleefa et al. (2008),
Ahmad et al. (2008) and Abdel-Khalek and Lynn (2009). These studies examined the
hypothesis that there is no gender difference on the Progressive Matrices and that, as
Mackintosh (1998a) put it, the gender difference on the Progressive Matrices is “0.15 to
2.1 IQ points either way”, i.e. in favour of either men or women.
The assertion that there is no gender difference in average general intelligence has been
made repeatedly since the early decades of the twentieth century. Terman (1916) and
Spearman (1923) asserted that there is no gender difference in g. Jensen (1998) calculated
gender differences in g on five samples and concluded that, “no evidence was found for
gender differences in the mean level of g”. Similarly “there is no gender difference in
general intelligence worth speaking of” (Mackintosh, 1996).
Some studies found no sex differences in SPM scores among younger subjects, e.g.
Tulkin & Newbrough (1968) with fifth and sixth grade students; Powers et al. (1986b)
with sixth and seventh grade students; Sidles and Avoy (1987) with seventh grade
students; and Persaud (1987) and Zeidner (1988) with seventh grade students. Sex
differences in Libya are similar to those found in many economically developed
countries, i.e. there are no significant differences at the ages of 8 and 9 years. Girls
obtained a significantly higher mean than boys at the age of 10 years, supporting the
developmental theory, advanced in Lynn (1994, 1999, 2004, 2005), that girls mature more
rapidly than boys at this age. At 11 years, males' scores were significantly higher than
females' scores. At 12, 13 and 14 years, there were no differences in SPM scores between
males and females. At the ages of 15 through 17, boys obtained consistently higher means
than girls, and these differences were statistically significant. This again supports the
developmental theory that boys obtain higher average means at these ages. These age
trends are consistent with numerous studies from western developed countries, such as
Irwing and Lynn (2005). At ages 18 through 21, no statistically significant differences
were found.
These are interesting results because they show that sex differences in Libya are similar
to those in economically developed nations, contrary to the suggestions that have
sometimes been made that girls in traditional societies are socially handicapped and this
impairs their intellectual development, and that as females have become more
emancipated and gained greater equality in economically developed western nations,
their cognitive abilities improve. This theory receives no support from the present results.
This significant gender by age interaction is explained by Lynn (1994) and Lynn &
Irwing (2005): boys and girls mature at different rates. Boys and girls have
the same development and IQ up to about 11 years. Then girls accelerate in the “growth
spurt”. Then, at about age 16, girls cease to grow but boys continue to grow physically
and in IQ. The data for the Libyan sample confirm this.
In the present study the gender difference in variability in the total sample and within
each age group, geographic area and academic discipline can be assessed from the standard
deviations and variance ratios (Vr). At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20
years, females have greater variability than males. In the total sample and at the ages of
11, 16, 19 and 21 years, males have greater variability than females (note that a Vr
greater than 1.0 indicates that males have greater variance than females, while a Vr less
than 1.0 indicates that females have greater variance than males). Concerning geographic
areas, the results showed that males have greater variability than females in the total
sample and in each geographic area. Regarding academic discipline, the results showed
that females have greater variability than males in the total sample and in each academic
discipline.
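The interpretation of Vr given above is consistent with a ratio of male variance to female variance, each variance being the square of the group's standard deviation. The following is a minimal sketch assuming that definition; the standard deviations are hypothetical and the function names are illustrative, not taken from the study.

```python
def variance_ratio(sd_male: float, sd_female: float) -> float:
    """Male variance divided by female variance (variance = SD squared)."""
    if sd_male <= 0 or sd_female <= 0:
        raise ValueError("standard deviations must be positive")
    return (sd_male ** 2) / (sd_female ** 2)


def more_variable_sex(vr: float) -> str:
    """Interpret a variance ratio: above 1.0 males vary more, below 1.0 females do."""
    if vr > 1.0:
        return "males"
    if vr < 1.0:
        return "females"
    return "neither"


# Hypothetical standard deviations of SPM scores for one age group.
vr = variance_ratio(sd_male=10.0, sd_female=8.0)
print(vr, more_variable_sex(vr))
```

Because the ratio is formed from squared standard deviations, a modest difference in SD produces a proportionally larger difference in Vr.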
With regard to variance in the meta-analysis, there were small differences between males
and females in the total sample, in favour of males. In the different age groups the
difference in variability was also small, except in the 15-17 age group, where males were
more variable. In addition, females had greater variability than males in developed
countries. The 12-14 and 18-21 age groups showed small differences in variability in
favour of females. The developing countries analysis showed a small difference in
variability in favour of males. It has been repeatedly asserted that males have greater
variability of IQ than females, but there are a number of contrary studies. The results of
the present study and meta-analysis add to these in showing no consistent sex
differences in variability (Lynn et al., 2008; Khaleefa, 2008). Regarding variance by
development status, this study showed a large variance in favour of developed countries
in all age groups and in the total sample. Overall, these results showed no consistent
tendency for a gender difference in variability.
Gender differences in variance were examined because it has frequently been contended
that males have greater variability than females. This assertion was made in the early
years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and Terman
(1916). This difference in variability was proposed by these early writers to explain why
men are so greatly over-represented among geniuses. As there was no sex difference in
general intelligence, a greater variability among males entailing more males among those
with very high intelligence (as well as more males with very low intelligence) was
suggested to provide a solution to this problem.
Thorndike (1910) put the theory as follows: “The trivial difference between the central
tendency of men and that of women which is a common finding of psychological tests
and school experience may seem at variance with the patent fact that in the great
achievements of the world in science, art, invention, and management, women have been
by far excelled by men. One who accepts the equality of typical representatives of the
two sexes must assume the burden of explaining this great difference in the high ranges
of achievement. The probably true explanation is to be sought in the greater variability
within the male”. Thorndike examined test data on variability and concluded that men are
about one twentieth more variable than women.
Terman (1916) also discussed the question and wrote that “it is often said that women are
grouped closely around the average, while men show a wider range of distribution”.
However, in his data for 1000 children aged 6 to 14 years he found no difference between
boys and girls in variability. The greater male variability was reaffirmed by Eysenck
(1981, p. 42) and recently by Deary, Irwing, Der and Bates (2007). However, not all
studies have found greater male variability, including a meta-analysis of the performance
of college students on the Progressive Matrices by Irwing and Lynn (2005). The present
study showed that there was no consistent difference in variability between males and
females in SPM scores. Similar results were found in the meta-analysis.
8.5.4 SPM and region
With regard to differences in SPM mean scores between regions, no differences were
found between cities and villages, between coastal, mountain and desert villages, or
between main and secondary cities. This can be attributed to the urbanisation of
Libya. According to the first General National Census of 1954, only 25% of the
total population were classified as urban settlers. However, within just four decades the
proportion of urban population had increased substantially to 90% of the total population
(Figure 1).
Figure 1: Urbanisation development in Libya 1954-1995
Source: General National Census of 1954, 1964, 1973, 1984, 1995.
This dramatic and rapid increase of the urban population at the expense of the rural
population has led some analysts to classify Libya as one of the most urbanised
countries in the world (Kezeiri, 1995). This situation has also affected the specific
characteristics of rural areas, as many of these characteristics have been influenced or
already replaced by an urban lifestyle. Many rural populations are now engaged in an
urban lifestyle in their jobs and occupational activities and in their use of modern
household appliances and equipment. As a result of these recent socio-economic changes,
a number of analysts have pointed out that the nature of rural areas and communities is
now being replaced by urban features (Attir and Al-Azzabi, 2002; Kezeiri, 1995). The
present study failed to detect significant differences between rural and urban students.
Both urban and rural students have similar schools, levels of teacher training and
facilities. Moreover, all mainstream schools in Libya follow the same national
curriculum. This can be directly associated with a similar level of cognitive
development, because both environments provide similar stimuli (Abu-hsd, 2002). The
literature on the Flynn effect suggests that IQ is closely related to education. As both
rural and urban students were receiving the same level of education, no differences in IQ
were detected.
8.5.5 SPM and age (study level)
For the purposes of this study, age was equivalent to study level. Statistically significant
differences in SPM mean scores were found between ages. In the main study, the British
percentile (PC) equivalents of the age-group means, based on the British norms for
the SPM collected in 1979 and given in Raven (1981), are the 16th PC for the 8 year olds
(IQ = 85), the 13th PC for the 9 year olds (IQ = 83), the 8th PC for the 10 year olds
(IQ = 79), and on average the 6.7th PC (IQ = 79.4) for the 11-17 year olds. The American
percentile equivalents are the 9th PC for the 18 year olds (IQ = 80), the 11th PC for the
19 and 20 year olds (IQ = 82), the 4th PC for the 21 year olds (IQ = 83), and on average
the 8.75th PC (IQ = 81.75). Overall, the IQs obtained by the Libyan students ranged
between 74 and 85. The average IQ for the fourteen tested Libyan age groups, 8 through
21, was 81.
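The conversion from a percentile on the reference norms to an IQ can be sketched with the inverse of the normal distribution. This is a minimal illustration assuming the conventional IQ scale with a mean of 100 and a standard deviation of 15; the function name is hypothetical, and the study may have used published conversion tables rather than this formula.

```python
from statistics import NormalDist

# Conventional IQ scale (assumption): mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile: float) -> float:
    """IQ at a given percentile (strictly between 0 and 100) of the reference norms."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return IQ_SCALE.inv_cdf(percentile / 100)


# Percentiles quoted in the text for the 8, 9, 10, 18 and 19-20 year olds.
for pc in (16, 13, 8, 9, 11):
    print(f"{pc}th percentile -> IQ {round(percentile_to_iq(pc))}")
```

Under these assumptions the 16th, 13th, 8th, 9th and 11th percentiles convert to IQs of 85, 83, 79, 80 and 82, matching the values quoted for those age groups.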
Similarly, in the meta-analysis, older students achieved higher SPM scores than younger
students (8-11 age group IQ 91, 12-14 age group IQ 87, 15-17 age group IQ 89, 18-21
age group IQ 88).
As the age of the students increased, the study level naturally increased. All tested
students in a given grade were of the same age, e.g. all tested 3rd grade students were 8
years of age. This was done to ensure that all students had the same academic experience,
since re-sit students usually have more academic experience than first-time students.
These results were in agreement with other studies. Abdalla et al. (2002) and Lynn and
Irwing (2004, 2005) supported the finding that IQ scores increase with
age. It is suggested that cognitive ability increases with age, probably as a result of
learning and growth of the brain (Lynn, 2008, personal communication).
In addition, greater cognitive stimulation arises from the greater complexity of more
recent environments provided by e.g. television, media and computer games: Elley
(1969), Jensen (1998), Schooler (1998), Williams (1998), and Sundet, Barlaug &
Torjussen (2004), Essawe (1973). All these would enhance the perception and awareness
of children and improve mental abilities as age increases.
In a representative sample of the entire population from childhood to adulthood, one
would expect to find a progressive increase in SPM scores across age groups. Previous
studies reported an increase of SPM scores with age in younger subjects, e.g. Baraheni
(1974), Sinha (1977), Pind et al. (2003), Lynn et al. (2004) and Khaleefa and Lynn (2009).
Nevertheless, in a Tanzanian secondary school sample, Klingelhofer (1967) found
a tendency for SPM scores to vary inversely with age, especially at 15, 16
and 17 years. Burke and Bingham (1969) found that performance on the SPM was
negatively related to age in a sample of 91 patients aged from 19 to 59 years.
Also, Byrt and Gill (1973), who standardized the SPM test in Ireland, concluded that
intelligence does not remain constant from age 15 throughout adulthood but rises and
falls in different groups depending upon the education, training or intellectual activities
which these groups engage in or neglect.
In Iran, Baraheni (1974) reported that the intellectual functions tapped by the Progressive
Matrices reached a maximum level in an Iranian group by age 15 and that at higher age
levels the test failed to differentiate age groups. Burke (1985) found that SPM scores
decreased with increasing age; his result was based on the screening of 500
vocational counselling clients and 2992 psychiatric patients. Finally, in a study carried
out in Jamaica, Persaud (1987) suggested that the decline in the intellectual capacity of
women on the SPM from the age of 26 years onwards can be attributed to age.
An interesting finding in this study was that SPM scores increased until 19 years of age
and then remained at an almost steady plateau until 21 years of age; there were no
differences in SPM scores after 19 years of age. This was consistent
with numerous SPM data sets reviewed in Raven (1939), Raven (1941), Raven (1986),
Raven (1989), Raven et al. (1995), Raven et al. (1996), Raven et al. (1996a), Raven
(1998) and Raven et al. (2000). Thus, fluid intelligence reached its plateau around the age
of 20.
8.5.6 SPM and academic discipline
With regard to academic discipline, there were statistically significant differences in SPM
mean scores in favour of the scientific discipline at all four university study
levels. This may be attributed to the familiarity of science students with courses in
the science discipline which deal with abstract reasoning. One of the major problems in the
education system in Libya, particularly in the art discipline, is that the method of learning
relies heavily on rote memorisation, and little attention is paid
to reasoning or abstract thinking. It seems that rote learning is a factor that the SPM
cannot measure (Attashan and Abdalla 2005).
The findings of this study are similar to those of Shanthamani (1970), who found that
science students scored higher than art students on Alexander's Battery for intelligence.
They also agree with Sinha (1977), who found that science students scored higher on the
SPM in an Indian sample, and with the unpublished data of Attashan and Abdalla (2005).
8.5.7 Relationship and prediction of SPM
According to the SPM test manual (2004), the external criteria commonly adopted in
predictive validity investigations are examination grades or teachers’ estimates. SPM
correlations with academic achievement tests generally fall in the region of 0.20 to 0.60
(Raven et al., 2004). This study showed correlations of 0.33 to 0.56. This was in
agreement with Tulkin and Newbrough (1968), McLaurin and Farrar (1973), Sinha (1968),
Baraheni (1974), Sinha (1977), Maqsud (1980), Powers et al. (1986b), Avoy (1987),
Carver (1990), Majdub (1991) and Laidra et al. (2007). The average correlation in these
studies and others was found to range between 0.37 and 0.49 (see table 4.6). A possible
explanation is that of Andrich and Styles (1994), who observed that the Progressive
Matrices test contains material not taught directly in schools and yet shows a substantial
relationship with scholastic achievement.
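To make the size of these correlations concrete, a correlation r can be read as a proportion of shared variance, r squared. The short Python sketch below applies that standard conversion to the range reported in this study (0.33 to 0.56). The function name is illustrative only and is not part of the original analysis.

```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance shared by two variables correlated at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation range reported in this study between the SPM and achievement.
low, high = 0.33, 0.56
print(f"r = {low}: {shared_variance(low):.1%} shared variance")
print(f"r = {high}: {shared_variance(high):.1%} shared variance")
```

On this reading, the SPM and academic achievement share roughly 11% to 31% of their variance in the present sample, which leaves considerable room for other influences on achievement.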
The results of this study showed that age and achievement were predictors of SPM results,
with age being the best predictor. As age and achievement increased, SPM results
increased. Similarly, in the meta-analysis, results showed that SPM score means were
predicted by age and development status, with age again the best predictor. SPM scores
increased as age increased and as development status improved. Our results were in
agreement with previous studies carried out by Pind et al. (2003) and Taylor (2007). This
confirms earlier results that gender and region in the main study and gender in the meta-
analysis have no effect on SPM scores.
8.5.8 SPM percentiles
A number of studies have indicated that students from developing countries performed
less well than students from developed countries on the SPM test. According to the SPM
(1996) manual, an Australian study by de Lemos (1989) noted a tendency for students
from non-English speaking cultures, such as Southern / Eastern European and Middle
Eastern countries, to score lower on the SPM test.
Raven et al. (2004) reported that some groups lagged behind the British norms, such as
groups from Brazil and Ireland, and black and Native Americans within the USA. In all
countries, the norms of children from less privileged socio-economic backgrounds and rural
areas are lower than those of their counterparts. They added that the explanation most commonly
offered for these differences was that the test did not engage the concerns of people from
disadvantaged backgrounds and that it demanded thought processes which were
unfamiliar to them.
The differences between the percentile scores of the Libyan students and
the British sample at age 13 years ranged from 7 to 14 points. They differed by 7 points at the
95th percentile, 10 points at the 90th percentile, 9 points at the 75th percentile, 10 points at the 50th
percentile, 12 points at the 25th percentile, 14 points at the 10th percentile and 13 points at the 5th
percentile. For example, if a Libyan student aged 13 years scored 33 on the SPM test, he would
score in the 50th percentile according to the Libyan norms. However, according to the
SPM manual (1988, 1996 and 2008), he would score in the 10th percentile of the British
norms. Similarly, if a Libyan student aged 14 years scored 47, he would be in the 95th
percentile of the Libyan norms but only in the 50th percentile according to the Slovenian, Australian
and British norms. These two examples illustrate the misuse and misinterpretation of
intelligence tests now used in Libya that results from applying standardised western norms instead
of local norms (please refer back to chapter three for more discussion).
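The first example can also be expressed on the IQ scale. The sketch below assumes the common deviation-IQ convention (a reference mean of 100, a standard deviation of 15 and normally distributed scores); this convention is an assumption of the illustration rather than a procedure taken from the SPM manual, and `percentile_to_iq` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_to_iq(percentile: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a percentile rank (strictly between 0 and 100) into a deviation IQ."""
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)  # standard normal quantile
    return mean + sd * z


# A raw score of 33 at age 13 falls at the 10th British percentile (see text),
# while the same score sits at the 50th percentile of the Libyan norms.
print(round(percentile_to_iq(10)))  # IQ equivalent on the British norms
print(round(percentile_to_iq(50)))  # IQ equivalent if local norms are used
```

Under these assumptions, the same raw score corresponds to an IQ of about 81 when read against the British norms and to 100 when read against local norms, which is the kind of discrepancy the two examples in the text describe.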
The lower scores of the Libyan sample on the SPM test relative to the norms of developed
countries were expected. All studies conducted in developing countries determined
that individuals from developed countries scored higher than individuals from developing
countries in the SPM test. The meta-analysis which was conducted in chapter seven in
this study revealed that there was a significant difference between students from
developed countries and students from developing countries in the SPM mean scores (df
= 2, 125; F = 8.157; p < .001).
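As a check on the reported significance level, the upper-tail probability of an F statistic with two numerator degrees of freedom has a simple closed form, P(F > f) = (1 + 2f/d2)^(-d2/2). The sketch below evaluates it for the reported F = 8.157, reading the degrees of freedom as 2 and 125. That reading and the function name are assumptions of this illustration.

```python
def f_upper_tail_df1_2(f_value: float, df2: float) -> float:
    """Upper-tail probability of an F(2, df2) distribution (closed form for df1 = 2)."""
    if f_value < 0 or df2 <= 0:
        raise ValueError("f_value must be non-negative and df2 must be positive")
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


p_value = f_upper_tail_df1_2(8.157, 125)
print(f"p = {p_value:.5f}")
print("significant at the .001 level:", p_value < 0.001)
```

Under this reading the p-value is small but not zero, so the result is conventionally reported as p < .001.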
This might be explained in terms of variation in education, environment, nutrition, child
rearing, social income, confidence in test taking, family size, the “individual multiplier”
and “social multiplier”, and heterosis. In addition, the amount of previous familiarity with the test
material and testing situation may have played a role. For almost all of the Libyan students
this was the first time they had seen or taken an IQ test.
Regarding education in Libya, the 2002 human development report for Libya noted an
obvious deficiency in teaching skills among teachers. The average is 30 or more students
per teacher. Also, school buildings and facilities were deemed out-dated and inappropriate
for carrying out the teaching process; this applied to as many as about 70% of schools
in some places. Up-to-date computer programs are not available in 89% of schools
(p. 327). Nutrition in Libya shows a lack of strategic planning at the national level, with
a heavy dependence on imported food (p. 378). According to the General Authority of
Information, in 2006 the average family size in Libya was 6 individuals; 18% of families
contained more than 10 individuals, whereas 50% of families had more than 5
individuals. The average income in Libya was 2618 Libyan Dinar (equivalent to £1300)
per year. Also, traditions in Libya dictate that marriages take place within the
country; marrying a non-Libyan is strongly disfavoured.
The percentile ranks of the SPM scores for the Libyan sample in this study emphasized
the need for separate norms for age groups, male and female students and art and science
discipline students.
8.6 Study conclusions
In this chapter we have examined and evaluated the findings of this study. The aim was to
adopt a mental ability test suitable for a Libyan population. The lack of complete
and useful means of testing in the third world generally, and in Libya in particular, is
a sufficient indicator of the importance of this research study. As stated in section 7.2, the
mental tests currently employed in Libya share the feature of incompleteness: they do
not cover the whole range of test items that they are meant to cover. As a solution to this
problem, the current study presents the SPM test as an alternative. Its psychometric
characteristics place it at the top of the list of appropriate intelligence tests in Libya.
Since the study is made up of two parts, the main study and the meta-analysis, the
conclusions of each are presented below:
A) Main study conclusions:
1. It showed that intelligence measured by the SPM has validity in a new country
(Libya) in which the SPM has not been used until now.
2. The overall SPM mean score for the Libyan sample was 32.31 with a standard
deviation of 11.94 (minimum score 6 and maximum 58). This was lower
than that of students from developed countries but similar to that of students from developing
countries.
3. The IQ score was 81 across the fourteen Libyan age groups (from 8 to 21 years).
4. No significant gender differences were found in SPM mean scores in the total sample or
at ages 8, 9, 12, 13, 14 and 18 through 21. However, females obtained
significantly higher SPM means than males at the age of 10 years, whereas males scored
significantly higher means than females at the ages of 11 and 15 through 17. In
addition, there were no significant gender differences in total means or in the means of each
region. There was also a lack of significant gender differences in total means
and in the means of each discipline (science and art). Thus, the gender variable was not an
important factor affecting the Libyan students’ scores on the SPM test.
5. Results indicated no consistent tendency for gender differences in variability on the
SPM test.
6. There was no significant difference in sample performance on the SPM test according to region.
Thus, the region variable was not an important factor affecting the Libyan students’
scores on the SPM test.
7. Significant differences were found between the SPM scores based on age as well as
study level. Thus, the age and study level variables were important factors affecting
Libyan students’ scores on the SPM test.
8. Students from the science academic discipline had significantly higher SPM mean
scores than students from the art discipline. Thus, the academic discipline was an
important factor affecting the Libyan students’ scores on the SPM test.
9. All correlation coefficients between the SPM and students’ achievement (SAA) were statistically
significant for all groups.
10. Age and achievement were predictors of SPM results, with age being the better
predictor, whereas gender and region were not significant predictors.
B) Meta-analysis conclusions:
1. The SPM test was valid in a different culture (Libya) from economically developed
western nations.
2. Developed countries achieved higher SPM scores than both developing countries and
Libya. No statistically significant differences were found in SPM scores between Libya
and developing countries. Thus, development status was concluded to be an important
factor affecting SPM scores.
3. The IQ score was 95 for developed countries and 82 for developing countries.
4. SPM scores increased as age increased. In addition, SPM scores of the age groups
were statistically different based on development status but not different based on gender.
5. No significant differences were found between SPM scores based on gender. In
addition, no gender differences were found among the age groups or development status.
6. There was no consistent tendency for gender differences in variability.
7. Age and development status were predictors for SPM results. Age was a better
predictor.
8.7 Study contributions
The contributions of this study to intelligence testing in Libya are as follows:
• This study is considered to be the first attempt to standardize Raven’s Standard
Progressive Matrices (SPM) test for a sample from Libya.
• It provides norms for the SPM test which, used in conjunction with examination
grades, can help in making appropriate decisions related to:
1. The future of individuals, guiding them to educational programs that
will better suit their abilities.
2. Job selection, matching applicants to suitable employment. Many sectors in
Libya use only examination grades to match students to
various academic establishments and to various jobs in the vocational
sector. IQ scores may assist in this selection process.
3. The identification of gifted individuals (geniuses) and the diagnosis
of individuals with mental retardation.
• It provides a means of estimating levels of intelligence, which our society lacks,
making it possible to recognize high as well as low IQ in the society.
• It provides data on differences in level of intelligence between
genders, age groups and different locations such as rural and urban areas.
8.8 Limitations of the Study
This study was carried out to standardize a British mental ability test, Raven's Standard
Progressive Matrices (SPM), by administering it to a sample of school and
university students (8 to 21 years) from the eastern province of Libya during the years
2007 to 2009, in order to provide an intelligence test that best suits a Libyan setting.
It should be taken into account that the goal was not to change or underestimate the
existing method (examination grades) used now in Libya as a measure of school
achievement, but to offer researchers and psychologists a mental ability test to be used in
conjunction with examination grades in order to improve prediction and placement
procedures.
As mentioned earlier, intelligence is a very difficult construct to define. There are
different types of intelligence besides the aspect of intelligence (eductive ability or
general cognitive ability) that the SPM measures, such as social intelligence, emotional
intelligence, the intelligent hands of a craftsman or the intelligent intuition of a
scientist. All these elude the ‘g’ straitjacket. Also, IQ tests do not measure intelligence
directly but those qualities that are thought to reflect it. As a consequence, within each
test there is an element of subjectivity.
IQ tests are criticised on a number of other levels. For example, they are validated
primarily in terms of their correlation with educational achievement. But this ignores the
fact that educational achievement is influenced by factors such as social class,
opportunity, and motivation. Another interesting phenomenon is the fact that a person can
increase his or her score through practice.
In addition, although subtests measuring different abilities tend to be positively correlated
(people who score high on one such subtest are likely to be above average on others as
well), individuals rarely perform equally well on all the different kinds of items included
in a test of intelligence. One person may perform relatively better on verbal than on
spatial items, for example, while another may show the opposite pattern.
These complex patterns of correlation can be clarified by factor analysis, but the results
of such analyses are often controversial themselves. Spearman has emphasized the
importance of a general factor, “g”, which represents what all the tests have in common,
while Thurstone focused on more specific group factors such as memory, verbal
comprehension, or number facility. It should be noted that to base a concept of
intelligence on test scores alone is to ignore many important aspects of mental ability.
Other mental abilities defined broadly but not measured by intelligence tests include
creativity, emotional intelligence, social intelligence, and persistence.
Proponents of general intelligence posit that intelligence is innate, heritable, single and
measurable, and that it does not change and is not affected by culture or environment. The
evidence-based testing of “g” using standardized tests validates its use as a reliable
predictor of student success. There is a large amount of evidence that “g” is a reliable
predictor of student educational attainment, earnings and socio-economic status (Brody,
1992; Lynn & Vanhanen, 2002, 2006; Mackintosh, 1998b). Brody, N. (1992).
Intelligence. San Diego, CA: Academic Press.
This suggests that all mental tests of cognition (verbal, mathematical, spatial-visual, and
memory) measure “g”, a similar single factor. It is the “g” factor that makes mental tests
a valid predictor of intelligence. Even though “g” is an established predictor of
intelligence, proponents of plural intelligences (Gardner, 1983, 1993, and 1995) suggest
“g” measures only verbal-linguistic and mathematical-logical intelligences, omitting
other intelligences that are just as important.
The limitations of the historical definitions of intelligence led Guilford, Thurstone,
Gardner, and Sternberg to develop theories of multiple intelligences. Guilford and
Thurstone argued that intelligence is comprised of several independent factors; Sternberg
argued that intelligence is comprised of three abilities; and Gardner’s original theory
suggested intelligence is comprised of seven abilities, later adding an eighth. Gardner’s
multiple intelligence (MI) theory posits intelligence is plural, culturally bound, varies in
strength, develops at various rates, and is immeasurable using psychometric tests. His
work with retarded and savant children and adults with brain damage led to the
development of this theory. Gardner originally proposed seven intelligences: verbal,
musical, mathematical, kinaesthetic, spatial, interpersonal, and intrapersonal. He later
added an eighth, naturalistic.
Group comparisons of IQ are problematic. Attempts have been made to make ‘culture-
fair’ or ‘culture-free’ tests, as if such a thing were possible, to allow comparisons of ‘g’
between people from very different societies. But the assumption of “culture fairness” is not valid in all settings
in which the SPM has been administered. When Lev Vygotsky tested Russian peasants back in
the 1930s, he found that answers that seemed logical to an urbanite were responded to
quite differently, but with parallel logic, by the peasants.
It has become well established that intelligence has increased in a number of countries
during the last 80 years or so. An early study by Tuddenham (1948) reported that the IQ
of American conscripts increased by 4.4 IQ points a decade over the years 1917-1943.
Subsequent studies confirmed that IQ increases have occurred in the United States,
Scotland, England, Japan and several countries in continental Europe (Scottish Council
for Research in Education, 1949; Cattell, 1951; Lynn, 1982; Flynn, 1984, 1987, 2007;
Lynn & Hampson, 1986; Lynn, Hampson & Mullineaux, 1987). Most of these IQ
increases have been reported in economically developed nations, and in very few
economically developing countries, including Brazil (Colom, Flores-Mendoza & Abad,
2007), Dominica (Meisenberg, Lawless, Lambert & Newton, 2005), Kenya (Daley,
Whaley, Sigman, Espinosa & Neuman, 2003), and Sudan (Khaleefa & Lynn, 2009).
In recent years, it has been noticed that the SPM test fails to discriminate above
the 75th percentile among adolescents and young adults living in societies with a tradition
of literacy. This is due to the dramatic and unexpected international increase in
SPM scores over the years, and is evident in societies where individuals have been
tested with the SPM several times and are acquainted with such tests. As the
sample tested in Libya had not taken the SPM test before and had no past experience
with mental testing, the SPM test was deemed appropriate for use in this situation. Also,
the ceiling effect exhibited in developed countries was not evident in developing
countries. The highest score obtained in our sample was 58 correct items out of 60. A
ceiling effect means that a number of test takers get all the answers right and have
therefore reached a ceiling; it can be inferred that they would have been able to answer
more difficult items correctly. Ceiling effects have been observed in the Progressive
Matrices as average scores have increased during the last 70 years and increasing
proportions have reached the ceiling. To deal with this problem, Raven added some
more difficult items to the Standard Progressive Matrices in a new version called the
Standard Progressive Matrices Plus (Raven et al., 2000).
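One simple way to screen a data set for the ceiling effect described above is to compute the share of test takers who obtain the maximum possible score. The sketch below does this for the 60-item SPM. The score list is made up purely for illustration (only its maximum of 58 mirrors the value reported for this sample), and the function name is hypothetical.

```python
def ceiling_proportion(scores: list[int], max_score: int = 60) -> float:
    """Return the fraction of scores equal to the maximum possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_ceiling = sum(1 for s in scores if s == max_score)
    return at_ceiling / len(scores)


# Hypothetical raw scores; the highest (58) mirrors the maximum reported in this study.
illustrative_scores = [6, 21, 32, 33, 40, 47, 58]
print(ceiling_proportion(illustrative_scores))
```

A proportion of zero, as in this illustration, indicates that no test taker exhausted the item pool; a clearly non-zero proportion would suggest that a harder version such as the SPM Plus is needed.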
8.9 Recommendations of the Study
This standardization is considered the first attempt to standardize the Progressive
Matrices test in Libya. Many difficulties were faced during this process, and
individual efforts were reinforced by the efforts of various establishments to overcome these
difficulties. The wide geographical area covered in this study and the large financial obligations were
not easy to meet. In this respect, the researcher would like to suggest the following:
1. It is hoped that the results of this study will help Libyan researchers and psychologists to
develop a better understanding of mental tests and their use, misuse and limitations,
and to stop testing and labelling children according to scores and norms obtained
from incomplete or non-standardized intelligence tests. It is also hoped that this
effort will stimulate similar studies in the area of psychological testing in Libya,
where further research is needed.
2. The results of this study are encouraging enough to start a testing movement in
Libya by conducting more studies and adapting more psychological tests. Culture
fair tests of intelligence such as the SPM test, which were constructed in developed
countries, can be successfully adapted and standardized for a Libyan sample.
Therefore, because of the need for psychological tests, at least one test in each of the
following areas should be adapted from different cultures and standardized in Libya:
intelligence, aptitude, vocational interest and personality. The establishment of a
specialized psychological department in the Ministry of Education and the Ministry of
Higher Education in Libya to supervise the standardization of tests is highly desirable.
3. Due to the significant differences found in this study between students according to
academic discipline and age, it is recommended that separate norms be used for each group.
4. No single intelligence test in existence today is a full, accurate and comprehensive
measure of mental ability. Since the SPM test is considered a measure of nonverbal
ability, it should always be used in conjunction with a test of verbal
ability in order to measure both abilities.
5. This study indicated that the SPM has high reliability and validity. Therefore, it
seems that the SPM is capable of identifying higher achieving students and thus can
potentially be used safely for school selection alongside examination scores.
6. It is recommended that the SPM test be used in Libya to identify gifted students and
students with low mental ability or low academic achievement. The SPM has
been shown to be one of the best predictors of both high and low educational
attainment (Brody, 1992; Lynn & Vanhanen, 2002, 2006; Mackintosh, 1998b). It
may be desirable to place gifted children in special classes, a practice known as
streaming in Britain and tracking in the United States, although in Britain comprehensive
education has now largely superseded this approach. It is argued that the advantage of
streaming is that gifted students can be given accelerated education. Conversely, students
with low mental ability or low academic achievement can be identified and placed
in classes for slow learners and taught at a slower pace suitable for their ability.
7. As Libyan children fail to develop reasoning skills while they are in school, as
compared with British children, it may be that the solution to this problem would be
for teachers in Libya to devote more attention to teaching reasoning skills.
The SPM is also used for job selection, i.e. to identify those with the ability to perform
well in cognitively demanding occupations, and could usefully be introduced in Libya for
this purpose.
8.10 Further research
This study has provided a useful basis for further studies. Based on the limitations and the
findings of this study the following related topics are recommended for further research:
1) Administer the SPM test to groups that were not tested in this study:
younger students, employees and adults in Libya. It would be useful to have
norms for these groups, especially for a
representative sample of adults of different ages and both genders. This would be useful
for job selection, and to see whether among adults men have a higher
average IQ than women, as reported in the meta-analysis of Lynn & Irwing
(2004).
2) Administer and standardize other mental tests such as the SPM Plus, the Coloured PM
and the Advanced PM in Libya. A standardization of these tests would
provide additional useful norms for Libya. The Coloured PM (CPM) is suitable for
young children aged 5-10 years, and the SPM Plus and Advanced PM (APM) are
more difficult versions of the test suitable for people ranking at the top of the
ability range. Moreover, it would be very useful to carry out studies using multiple
intelligence tests. These tests are based on plural intelligence theories such
as Gardner’s theory.
3) Study the effect of other factors such as parents’ occupations, family size, parents’
education, birth order and experience with the test on SPM results. The collection
of such data would provide useful information about the correlates of
intelligence in Libya.
4) Design and develop a mental test in Libya that best suits the local
environment. It would be useful to obtain data for Libya on other kinds of
intelligence such as verbal knowledge, memory and spatial ability.
References:
Abaujaafer, A. (1983). Pupils’ Achievement in Preparatory Schools in the City of

Tripoli, Libya and its Relationship to Parents’ Attitudes, Home
circumstance and Schooling, University of Sheffield.
Abdalla Saleh (2002). A meta-analysis on the Progressive Matrices. University of

Omar El-Mukhtar [in Arabic].
Abdel-Khalek, A. (1988). "Egyptian Results on (SPM)." Personality and Individual

Differences 9: 193-195.
Abdel-Khalek, A. and R., J. (2006). Normative data from the standardization of

Raven’s Progressive Matrices in Kuwait in an international context. Social
Behavior and Personality, 34, 169-180.
Abdel-Khalek, A.M. and L., R. (2006). Sex differences on a standardisation of the

Standard Progressive Matrices in Kuwait. Personality and Individual
Differences 40: 175-182.
Abraham, G. et al (1991) An Introduction: Pharmacoepidemiology. United States.

Harvey Whitney Books company.
Abu-Hatab, F. et al (1977). The standardization of the Standard Progressive Matrices

in a Saudi sample. In F. Abu-Hatab (ed.): Studies on the Standardization of
Psychological Tests, Vol. 1, pp. 191-246. Cairo, Egypt: Anglo-Egyptian
Library [in Arabic].
Abul-Hubb, D. (1972). Application of Progressive Matrices in Iraq. In: L.J.

Cronbach and P.J. Drenth (eds.): Mental Tests and Cultural Adaptation.
The Hague: Mouton.
Abu-shad, H. (2002). Genetic and Environmental Factors Associated with Cognitive

Ability and Scholastic Achievement among Arabs of the Negev Region in
Southern Israel, University of Minnesota. PhD.
Ahmad, R. K., S.J., Z. And L, R (2008). Gender differences in means and variance
on the Standard Progressive Matrices in Pakistan . Mankind Quarterly, 49,
50-57.
Ahmann, J. A. G., M. (1976). Evaluation Pupil Growth: Principles of Tests and

Measurement. Boston, Allyn and Bacon, Inc.
Ahlam (2003) evaluate the relationship between intelligence and high school
students’ academic achievement. University of Omar El-Mukhtar [in
Arabic].
Aiken, L. (1988). Psychological Testing and Assessment. Boston, Allyn and Bacon,
Inc.
Alexopoulos, D. (1979). Revision and Standardization of the Wechsler Intelligence

Scale for Children for the age of 13-15 Years in Greece, University of Wales
Cardiff.
Attashani S. and Abdalla Saleh (2005). Analysis mores of the study and effect extent
of this mores by collection from factors of personality, family and academic
achievement with students of university sample. University of Omar El-
Mukhtar [in Arabic].
American Educational Research Association, A. P. A., & National Council on

Measurement in Education (1999). Standards for educational and
psychological testing. Washington DC.
Anastasi, A. (1988). Psychological Testing. London, Macmillan Company.
Anastasi, A. A. U. S. (1997). Psychological Testing. New Jersey, Prentice-hall.
Andrich, D. A. S. I. (1994). "Psychometric Evidence of Intellectual Growth Spurts in

Early Adolescence." Journal of Early Adolescence 14: 328-344.
Arija, V. Esparo, G., Fernandez-Ballart, J., Murphy, M.M., Biarnes, E. & Canals, J
(2006). "Nutritional status and performance in test of verbal and non-verbal
intelligence in 6 year old children." Intelligence 34: 141-149.
Armfield, A. (1985). "A Comparison of High-ability and Low-ability Pupil scores on

Raven’s Standard Progressive Matrices at Primary School Attached to
South Normal University and Guangzhou School for the Deaf/Mute,
Guangzhou, People’s Republic of China." School Psychology
International 6: 24-29.
Arthur, W. A. D., D. (1994). "Development of a Short Form for the Raven APM
Test." Educational and Psychological Measurement 54: 394-403.
Arthur, W. A. W., D. (1993). "A Confirmatory Factor Analytic Study Examining the
Dimensionality of Raven’s Progressive Matrices." Educational and
Psychological Measurement 53: 471-478.
Ary, D. J., L. and Razavih, A. (1985). Introduction to Research in Education. New

York, Holt, Rinehart and Winston.
Banks, C. A. S. U. (1951). "An Item-Analysis of the Progressive Matrices Test." The

British Journal of Psychology (Statistical Section) 2: 92-94.
Baraheni, M. (1974). "Raven’s Progressive Matrices as Applied to Iranian Children."
Educational and Psychological Measurement 34: 983-988.
Barnett, S. M. W. W (2004). "National intelligence and the emperor's new clothes:

IQ and the Wealth of Nations." Contemporary Psychology 49: 389-396.
Bart, W. K. A. and Lane, J. (1986). "The Development of Proportional Reasoning in

Qatar." The Journal of Genetic Psychology 148: 95-103.
Benton, D. (2001). "Micro-nutrient supplementation and the intelligence of

children." Neuroscience and Behavioral Reviews: 297-309.
Berk, L. (2000). Child Development. Massachusetts, Allyn and Bacon.
Bertrand, A. A. C. J. (1980). Test, Measurement, and Evaluation: A Development

Approach. Reading, Mass, Addison-Wesley Publishing Company.
Biesheuvel, S. (1969). Methods for Measurement of Psychological Performance: A

Handbook of Recommended Methods based on an IUPS/IBP Working.
Oxford: , Party Blackwell.
Bingham, W. B. H. and M., S. (1966). "Raven’s Progressive Matrices: Construct

Validity." The Journal of Psychology 62: 205-209.
Blair, C. Gamson, D., T., S. and B., D (2005). "Rising mean IQ: Cognitive demand
of mathematics education for young children, population exposure to formal
schooling, and the neurobiology of the prefrontal cortex." Intelligence 33:
93 -106.
Blennerhassett, L. S., S. and Hibbett, C. (1994). "Criterion Related Validity of

Raven’s Progressive Matrices with Deaf Residential School Students."
American Annals of Deaf 139: 104-110.
Blood, D. A. B., W. (1972). Educational and Evaluation. New York, Harper and Row
Publishers.
Bocéréan, C. Fischer, J-P., & Flieller, A. (2003). "Long term comparison (1921-2001)
of numerical knowledge in 3 to five and a half year old children." European
Journal of Psychology of Education 18: 405-424.
Borg, B. A. G. M (1979). Educational Research. New York, longman.
Born, M. B., N. and Flier, H. (1987). "Cross-Cultural Comparison of Sex-Related

differences on Intelligence Tests: A meta-analysis." Journal of Cross
Cultural Psychology 18: 283-314.
Brand, C. R. (1987). "Bryter still and bryter?" Nature 328: 110.
Brand, C. R., Freshwater, S. & Dockrell, W.B. (1989). "Has there been a massive
rise in IQ levels in the West? Evidence from Scottish children." Irish
Journal of Psychology 10: 388-393.
Brislin, R. A. T., R (1973). Cross-Cultural Research Methods. New York, John

Wiley and Sons.
Brown, F. (1971). Measurement and Evaluation. Iowa, F.E.Peacock Publisher, INC.
Brown, F. (1981). Measuring Classroom Achievement. New York, Holt, Rinehart

and Winston.
Brown, F. (1983). Principles of Educational and Psychological Testing. New York,

Holt, Rinehart and Winston.
Burke, H. (1958). "Raven’s Progressive Matrices: A Review and Critical

Evaluation." The Journal of Genetic Psychology 93: 199-228.
Burke, H. (1972). "RPM: Validity, Reliability, and Norms." Journal of Psychology

82: 253-257.
Burke, H. (1985). "Raven’s Progressive Matrices (1938): More in Norms, Reliability

and Validity." Journal of Clinical Psychology 41: 231-235.
Burke, H. A. B., W. (1969). "RPM: More on Construct Validity." Journal of

Psychology 72: 247-251.
Burroughs, G. (1975). Design and Analysis in Educational Research. Oxford, Alden

& Mowbray Ltd.
Byrt, E. A. G. (1973). The Standardization of the Raven Progressive Matrices and

Mill-hill Vocabulary Scale for Irish School Children aged Six to Twelve
Years, University College Cork.
Carpenter, P. J., M. and Shell, P (1990). "What One Intelligence Test Measures: A
Theoretical Account of Processing in (SPM) Test." Psychological Review
97: 404 - 431.
Carroll, J. B. (1993). Human Cognitive Abilities. Cambridge, Cambridge University

Press.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies.
New York, Cambridge University Press.
Carver, R. (1990). "Intelligence and Reading Ability in Grades 2-12." Intelligence 14:
449-455.
Cattell, R. B. (1951). "The fate of national intelligence: test of a thirteen year
prediction." Eugenics Review 17: 136-148.
Cattell, R. B. (1971). Abilities: Their Structure, Growth and Action. Boston,
Houghton Mifflin.
Ceci, S. J. (1991). "How much does schooling influence general intelligence and its
cognitive components? A reassessment of the evidence." Developmental
Psychology 27: 703-722.
Chan, J. (1982). The Use of Raven’s Progressive Matrices Test in Hong Kong. 20th
International Congress of Applied Psychology. Edinburgh, Scotland.
Cohen, R. J. S., M.E (2002). Psychological testing and assessment: an introduction to
test and measurement. Boston, McGraw-Hill.
Colom, R., Andres-Pueyo, A. & Juan-Espinosa, M. (1998). "Generational gains:
Spanish data." Personality and Individual Differences 25: 927-935.
Colom, R., Lluis-Font, J.M. & Andres-Pueyo, A. (2005). "The generational
intelligence gains are caused by decreasing variance in the lower half of the
distribution: supporting evidence for the nutrition hypothesis." Intelligence
33: 83-92.
Colom, R., Flores-Mendoza, C.E. & Abad, F.J. (2007). "Generational changes on the
Draw-a-Man test: a comparison of Brazilian urban and rural children tested
in 1930, 2002 and 2004." Journal of Biosocial Science 39: 79-89.
Corman, L. A. B., M. (1974). "Factor Structures of Retarded and Non-Retarded
Children on Raven’s Progressive Matrices." Educational and Psychological
Measurement 34: 407-412.
Corsini, R. (1984). Encyclopedia of Psychology. New York, John Wiley and Sons.
Cotton, S. M., Kiely, P.M., Crewther, D.P., Thomson, B., Laycock, R. & Crewther,
S.G. (2005). "A normative and reliability study for the Raven’s Colored
Progressive Matrices for primary school aged children in Australia."
Personality and Individual Differences 39: 647-660.
Court, J. (1983). "Sex Differences in Performance on Raven’s Progressive Matrices:
A Review." The Alberta Journal of Educational Research 29: 54-74.
Cronbach, L. (1970). Essentials of Psychological Testing. New York, Harper and Row
Publishers, Inc.
Cronbach, L. (1990). Essentials of Psychological Testing. New York, Harper and Row
Publishers, Inc.
Daley, T. C., Whaley, S. E., Sigman, M. D., Espinosa, M. P., and Neuman, C. (2003).
"IQ on the rise: the Flynn effect in rural Kenyan children." Psychological
Science 14: 215-219.
Denscombe, M. (1998). The Good Research Guide: For Small-scale Social Research.
Buckingham: Open University Press.
Deshon, R. C. D. & Weissbein, D. (1995). "Verbal Overshadowing Effect on
Raven’s APM: Evidence for Multidimensional Performance Determinants."
Intelligence 21: 135-155.
Dickens, W. T. F. J. R. (2001). "Heritability estimates versus large environmental
effects: the IQ paradox resolved." Psychological Review 108: 346-369.
Domino, L. A. G. D. (2006). Psychological Testing: An Introduction. 2nd ed.
Cambridge, Cambridge University Press.
Drenth, P. E. D. (1972). Implication of Testing for Individual and Society. In
Mental Test and Cultural Adaptation. Netherlands, Mouton Publisher.
Drenth, P. V. D. F., H. and Omari, I. (1979). The Use of Classroom Test,
Examinations, and Aptitude Tests in Developing Countries. Netherlands,
Swets & Zeitlinger.
Duffy, M., J, B. (2005). 'Univariate descriptive statistics', in Statistical Methods for
Health Care Research, ed. Munro, B. Philadelphia, Lippincott Williams and
Wilkins.
Durojaiye, M. (1984). "The Impact of Psychological Testing on Educational and
Personnel Selection in Africa." International Journal of Psychology 19:
135-144.
Ebel, R. (1972). Essentials of Educational Measurement. New Jersey, Prentice-Hall, Inc.
Ebel, R. A. F., D (1991). Essentials of Educational Measurement. New Jersey,
Prentice-Hall, Inc.
Education, S. C. f. R. I. (1949). The Trend of Scottish Intelligence. London,
University of London Press.
Edwards, O. W. (2003). Cattell-Horn-Carroll (CHC) theory and mean difference in
intelligence scores. University of Florida. PhD.
Eells, K. D., A.; Havighurst, R. and Tyler, R. (1971). Intelligence and Cultural
Differences. Chicago, University of Chicago Press.
Egan, V. (1989). "Notes and Shorter Communications: Link Between Personality,
Ability and Attitudes in a Low IQ Sample." Personality and Individual
Differences 10: 997-1001.
Elley, W. B. (1969). "Changes in mental ability in New Zealand." New Zealand
Journal of Educational Studies 4: 140-155.
Ellis, H. (1904). Man and Woman: A Study of Human Secondary Sexual
Characteristics. London: Walter Scott.
Eysenck, H. A., W. and Meili, B (1972). Encyclopedia of Psychology. London,
Search Press.
Eysenck, H. J. (1998). A new look at intelligence. New Brunswick, NJ: Transaction
Books.
Ezeilo, B. (1978). "Validating Panga Munthu Test and Porteus Maze Test in
Zambia." International Journal of Psychology 13: 333-42.
Fancher, R. (1985). The Intelligence Men: Makers of the IQ Controversy. New York,
Norton and Company.
Felsen, I. (1991). The Influence of Age, Intelligence, Gender, and Socio-economic
Status on Perceived Competencies of Gifted Talented Children.
Hamburg, University of Hamburg.
Flieller, A. (1996). "Trends in child rearing practices as a partial explanation for the
increase in children’s scores on intelligence and cognitive development
tests." Polish Quarterly of Developmental Psychology 2: 51-61.
Flieller, A. (1999). "Comparison of the development of formal thought in adolescent
cohorts aged 10-15 years (1967-1996 and 1972-1993)." Developmental
Psychology 35: 1048-1058.
Flynn, J. R. (1984). "The mean IQ of Americans: massive gains 1932 to 1978."
Psychological Bulletin 95: 29-51.
Flynn, J. R. (1987). "Massive IQ gains in 14 nations: what IQ tests really measure."
Psychological Bulletin 101: 171-191.
Flynn, J. R. (1994). IQ gains over time. In R.J. Sternberg (Ed.), Encyclopedia of
human intelligence (pp. 617-623). New York, Macmillan.
Flynn, J. R. (1998). IQ gains over time: Toward finding the causes. In U. Neisser
(Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 25-66).
Washington, DC, American Psychological Association.
Flynn, J. R. (1999). "Searching for justice: The discovery of IQ gains over time."
American Psychologist 54: 5-20.
Flynn, J. R. (2007). What is Intelligence? Beyond the Flynn effect. Cambridge,
Cambridge University Press.
Fontes, P. K., T. Madaus, G.; and Airasian, W (1983). "Opinions of the Irish Public
on intelligence." Journal of Education 17: 55-67.
Foulds, G. A. D., P. (1962). "The Nature of Intellectual Deficit in Schizophrenia: Pt.
1. A Comparison of Schizophrenics and Neurotics." British Journal of Social
and Clinical Psychology 1: 7-19.
Foulds, G. D., P. McClelland, M. and McClelland, W (1962). "The Nature of
Intellectual Deficit in Schizophrenia: Pt. 2. A Cross-sectional Study of
Paranoid, Catatonic, Hebephrenic and Simple Schizophrenics." British
Journal of Social and Clinical Psychology 1: 141-149.
Foulds, M. A. R., J. (1948). "Normal Changes in the Mental Abilities of Adults as
Age Advances." Journal of Mental Science 94: 133-134.
Freeman, F. (1962). Theory and Practice of Psychological Testing. New York, Henry
Holt and Company.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York,
Basic Books.
Gardner, H. (1993). Frames of mind: The theory of multiple intelligences. London,
Fontana.
Garlick, D. (2002). "Understanding the nature of the general factor of intelligence:
the role of individual differences in neural plasticity as an explanatory
mechanism." Psychological Review 109: 116-136.
Garrett, H. A. W., R (1966). Statistics in Psychology and Education. London,
Longmans.
Gay, L. R., E. M, et al. (2006). Educational Research: Competencies for Analysis
and Applications. 8th ed. New Jersey, Pearson.
Gittins, J. (1952). Approved School Boys. London, HMSO.
Goetzinger, C. P., R. C. W, et al. (1967). "Non-language IQ tests used with deaf
children." Volta Review 69: 500-506.
Gomm, R. D., C. (2000). Using Evidence in Health and Social Care. London,
Open University/Sage Publications Ltd.
Georgas, J. A. G., C. (1972). A Children’s Intelligence Test for Greece. Netherlands,
Mouton Publisher.
Georgas, J. G. (1790). "Standardisation of a Vocabulary Intelligence Test (Final
Progress Report, Research MH 12544-01)." Athens: The Athenian Institute
of Anthropos.
Glass, G. (1976). “Primary, secondary, and Meta-Analysis of research.” Educational
Researcher 5: 3-8.
Gould, S. J. (1981). The mismeasure of man. New York, Norton.
Gould, S. J. (1996). The mismeasure of man (Rev. ed.). New York, Norton.
Green, B. and J. Hall (1984). “Quantitative methods for literature review.” Annual
Review of Psychology 35: 37-53.
Greenfield, P. M. (1998). The cultural evolution of IQ. In U. Neisser (Ed) The
Rising Curve. Washington, DC: American Psychological Association.
Gronlund, N. (1981). Measurement and Evaluation in Testing. New York, Macmillan
Publishing Co.
Guilford, J. (1967). The Nature of Human Intelligence. New York, McGraw-Hill
Book Company.
Guilford, J. P. (1985). The structure-of-intellect model. In Wolman, B.B. (1985).
Handbook of intelligence: measurements, and applications. New York,
John Wiley & Sons.
Irwing, P. and L., R. (2005). Sex differences in means and variability on the
Progressive Matrices in university students: A meta-analysis. British Journal
of Psychology, 96, 505–524.
Irwing, P., H., A. K., O. and L., R. (2008). "Effects of Abacus training on the
intelligence of Sudanese children." Personality and Individual Differences
45: 694-696.
Helmes, S. (1987). "Concurrent Validation of AH2 as a Brief Measure of intelligence
in Canadian University Students." Educational and Psychological
Measurement 47: 725-729.
Herrnstein, R. J., Y. C. (1994). The bell curve: Intelligence and class structure in
American life. New York, Free Press.
Hildebrand, D. K. (1986). Statistical Thinking for Behavioral Scientists. Boston,
Duxbury Press.
Herrnstein, R. (1973). IQ in Meritocracy. Great Britain. Allen Lane.
Herrnstein, R. A. M., C. (1994). The Bell Curve: Intelligence and Class Structure in
American Life. New York, The Free Press.
Heyneman, S. (1987). "Use of Examination in Developing Countries: Selection,
Research and Education Sector Management." International Journal of
Educational Development 7: 251-263.
Higgins, T. and Green, S. (2006). Cochrane Handbook for Systematic Reviews of
Interventions. Browse the Handbook online at www.cochrane-handbook.org
Hunt, E. (1975). Quote the Raven? Nevermore. Maryland, Lawrence Erlbaum
Associates Publishers.
Husen, T. (1951). "The influence of schooling on IQ." Theoria 17: 61-88.
James, H. M. S. S, (2006). Research in Education: Evidence-Based Inquiry, 6th ed.
Boston, Pearson.
Jencks, C. (1972). Inequality: A reassessment of the effect of family and schooling
in America. New York, Basic Books.
Jensen, A. (1980). Bias in Mental Testing. London, Methuen and Co., Ltd.
Jensen, A. (1981). Straight Talk about Mental Tests. London, Methuen and Co., Ltd.
Jensen, A. R. (1980). Bias in mental testing. New York, Free Press.
Jensen, A. R. (1998). The g Factor. Westport, CT: Praeger.
Jensen, A. S., D. and Larson, G. (1988). "Equating the Standard and the Advanced
Form of the Raven Progressive Matrices." Educational and Psychological
Measurement 48: 1091-1095.
Johnson, E. S., D. and Guertin, D (1994). "The Development and Validation of a
Reliable Alternate Form for Raven’s Standard Progressive Matrices."
Assessment 3: 315-319.
Kaia Laidra, H. P., Juri Allik (2007). "Personality and intelligence as predictors of
academic achievement: A cross-sectional study from elementary to
secondary school." Personality and Individual Differences 42: 441-451.
Kamin, L. a. E., H. (1981). Intelligence: The Battle for Mind. London, Pan Books.
Kamphaus, R. W., Petosky, M.D., Morgan, A.W. (1997). A history of intelligence
test interpretation. In D.P. Flanagan, J.L. Genshaft, & P.L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 32-47).
New York, Guilford.
Kaniel, S. a. F., S. (1991). "Level of Performance and Distribution of Errors in the
Progressive Matrices Test: A comparison of Ethiopian Immigrant and
Native Israeli Adolescents." International Journal of Psychology 26: 25-33.
Karmel, L. K., M (1978). Measurement and Evaluation in the Schools. New York,
Macmillan Publishing Co., Inc.
Karnes, F. a. W. (1988). "Comparison of Group Measures in Identification of Rural,
Culturally Diverse Gifted Students." Perceptual and Motor Skills 67: 751-754.
Keehn, J. a. P., E. (1955). "Non-Verbal Tests Predictors of Academic Success in
Lebanon." Educational and Psychological Measurement 15: 495-498.
Khaleefa, O. L., R. (2009). "The increase of intelligence in Sudan 1964-2006."
Personality and Individual Differences 45: 412-413.
Khaleefa, O. & Lynn, R. (2008a) Sex differences on the Progressive Matrices: Some
data from Syria. Mankind Quarterly, 48, 345-352.
Khaleefa, O., Khatib, M.A., Mutwakkil, M.M. & Lynn, R. (2008b). Norms and
gender differences on the Progressive Matrices in Sudan . Mankind
Quarterly, 49, 177-183.
Khaleefa, O. & Lynn, R. (2008d). Norms for intelligence assessed by the Standard
Progressive Matrices in Qatar . Mankind Quarterly, 49, 65-71.
King, W. (1963). "Development of Scientific Concepts in Children." British Journal

of Educational Psychology 33: 240-252.
Kline (2000). Handbook of Psychological Testing. 2nd ed. London, Routledge.
Kline, J. B. (2005). Psychological Testing: A Practical Approach to Design and
Evaluation. New Delhi, Sage Publications, Inc.
Kline, P. (1979). Psychometrics and Psychology. London, Morrison and Gibb Ltd.
Klingelhofer, E. (1967). "Performance of Tanzanian Secondary School Pupils on the
(SPM) Test." Journal of Social Psychology 72: 205-215.
Kaufman, A. & Kaufman, N. (2004). Essentials of psychology testing. New
Jersey, John Wiley & Sons, Inc.
Langdridge, D. (2004). Research Methods and Data Analysis in Psychology.
Glasgow, Bell & Bain Limited.
Layman, H. (1968). Intelligence, Aptitude and Achievement testing. Boston,
Houghton Mifflin Company.
Levine, E. (1974). "Psychological Tests and Practices With the Deaf: A Survey of
the State of the Art." The Volta Review 76: 298-319.
Lewis, D. (1967). Statistical Methods in Education. London, University of London
Press LTD.
Lewis, D. (1974). Assessment in Education. London, University of London
Press.
LoBiondo-Wood, G. and Haber, J. (2006). Nursing Research, 6th ed. United
States of America, Mosby Inc.
Li, R. (1996). A theory of conceptual intelligence. London, Praeger.
Lorge, I. (1945). "Schooling makes a difference." Teachers College Record 46:
483-492.
Lynn, R. (1982). "IQ in Japan and the United States shows a growing disparity."
Nature 297: 222-223.
Lynn, R. H., S.L. (1986). "The rise of national intelligence: evidence from
Britain, Japan and the USA." Personality and Individual Differences 7: 323-332.
Lynn, R. P., C. C., J. (1988). "Intelligence in Hong Kong Measured for Spearman’s g
and the Visuospatial and Verbal Primaries." Intelligence 12: 423-433.
Lynn, R., Hampson, S.L. & Mullineaux, J.C. (1987). "A long term increase in the
fluid intelligence of English children." Nature 328: 797.
Lynn, R. (1990b). "Differential rates of secular increase of five major primary
abilities." Social Biology 38: 137-141.
Lynn, R. (1990a). "The role of nutrition in secular increases of intelligence."

Lynn, R. (1993). Nutrition and intelligence. In P.A. Vernon (Ed) Biological
Approaches to the Study of Intelligence. Norwood, NJ: Ablex.
Lynn, R. (1994). Sex differences in brain size and intelligence: a paradox resolved.
Personality and Individual Differences, 17, 257-271.
Lynn, R. (1998). In support of nutrition theory. In U. Neisser (Ed) The Rising
Curve. Washington, DC: American Psychological Association.
Lynn, R., Allik, J. & Irwing, P. (2004). Sex differences on three factors identified in
Raven’s Standard Progressive Matrices. Intelligence, 32, 411-424.
Lynn, R. and Irwing, P. (2004). Sex differences on the Progressive Matrices: a meta-
analysis. Intelligence, 32, 481-498.
Lynn, R. (2006). Race Differences in Intelligence: An Evolutionary Analysis. United
States of America, Athens, GA: Washington Summit Books.
Lynn, R. T. V, (2006). IQ & Global Inequality. United States of America, Athens,
GA: Washington Summit Books.
Lynn, R. (2008). The Global Bell Curve. Augusta, GA: Washington Summit
Publishers.
Lynn, R. (2009). "What has caused the Flynn effect? Secular increases in the
Development Quotients of infants." Intelligence.
MacArthur, R. E., W (1962). "The Standard Progressive Matrices as a
Culture-Reduced Matrices of General Ability." Alberta Journal of Research 8:
54-65.
MacAvoy, J. O., S. and Sidle, C (1993). "The Raven Matrices and Navajo Children:
Normative characteristics and culture fair Application to Issues of
Intelligence, giftedness and Academic Proficiency." Journal of American
Indian Education 33: 32-43.
Mackintosh, N. J. (1998b). IQ and Human Intelligence. Oxford, UK: Oxford
University Press.
Mackintosh, N. J. (1998a). "Reply to Lynn." Journal of Biosocial Science 30:
533-539.
Mackintosh, N. J. (1996). "Sex differences and IQ." Journal of Biosocial Science 28:
559-571.
Macmillan, P. (2005). Social Research, 3rd ed. New York, S.Srarantakos.
Madern, A. M. a. V., S. (1967). "Ricerca sulle capacità di previsione scolastica del
PM 38 di Raven (Research on the Predictive Capacity of Raven's PM 38 Test)."
Bollettino di Psicologia Applicata 79-82: 67-82.
Mahdawi, F. A. A. R., A. (1991). Libya: A Challenge Ahead. Great Britain, Royal
College of Psychiatrists.
Majdub, G. (1991). The Psychological Determining of Academic Achievement,
University of Bristol. Ph.D.
Maqsud, M. (1980). "Personality and Academic Attainment of Primary School
Children." Psychological Reports 46: 1271-1275.
Maqsud, M. (1983). "Relationship of Locus of Control to Self-Esteem, Academic
Achievement and Prediction of Performance Among Nigerian Secondary
School Pupils." British Journal of Educational Psychology 53: 215-221.
Marais, C. A. (2007). Using the Differential Aptitude Test to estimate intelligence and
scholastic achievement at grade nine level, University of South Africa. MSc.
Marks, R. (1981). The Idea of IQ. New York, University Press of America.
Matarazzo, J. (1972). Wechsler’s measurement and appraisal of adult intelligence.
Baltimore, Williams & Wilkins.
Mclaurin, W. A. F., W (1973). "Validates of Progressive Matrices Test Against IQ
and GPA." Psychological Reports 32: 803-806.
Mehotra, K. (1968). "The Relationship of WISC to Progressive Matrices." Journal of
Psychological Research 12: 114-118.
Mehryar, A. (1972). "Father’s Education, Family Size and Children’s Intelligence
and Academic Performance in Iran." International Journal of Psychology, 7:
47-50.
Meisenberg, G., Lawless, E., Lambert, E. & Newton, A. (2005). "The Flynn effect in
the Caribbean: generational change in test performance in Dominica."
Mankind Quarterly 46: 29-70.
Melikian, L. (1984). "The Transfer of Psychological Knowledge to Third World
Countries and its Impact on Development: The Case of Five Arab Gulf Oil
Producing States." International Journal of Psychology 19: 65-77.
Messick, S. (1995). "Validity of psychological assessment: Validation of inferences
from persons' responses and performances as scientific inquiry into score
meaning." American Psychologist 50: 741-749.
Mingroni, M. A. (2004). "The secular rise in IQ: Giving heterosis a closer look."
Mingroni, M. A. (2007). "Resolving the IQ paradox: heterosis as a cause of the
Flynn effect and other trends." Psychological Review 114: 1104.
Miron, M. (1977). "A Validation Study of Transferred Group Intelligence Test."
International Journal of Psychology 12: 193-205.
Mohan, V. (1972). "Raven’s Progressive Matrices and Verbal Test of General
Mental Test." Journal of Psychological Research 16: 67-69.
Murphy, K. A. D. (1991). Psychological Testing: Principles and Application. New
Jersey, Prentice-Hall International, Inc.
Neisser, U. (1998). The rising curve: Long-term gains in IQ and related measures.
Washington, DC, American Psychological Association.
Nelson, H. (1979). Area Handbook Series: Libya a Country Study. Washington, D.C,
The American University.
Nkaya, H. H., M. and Bonnet, J (1994). "Result Effect on Cognitive Performance on
the RM-38 in France and Congo." Perceptual and Motor Skills 78: 503-510.
Noll, V. A. S., D (1979). Introduction to Educational Measurement. Boston,
Houghton Mifflin Company.
Nunnally, J. (1972). Educational Measurement and Evaluation. New York,
McGraw-Hill Book Company.
Oakland, T. (1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.
Oakland, T., & Laosa, L.M (1976). Professional, legislative, and judicial influences
on psycho educational assessment practices in schools. In T. Oakland (Ed.)
(1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.
Ogunlade, J. (1978). "The Predictive Validity of the (RPM) with some Nigerian."
Educational and Psychological Measurement 33: 465-467.
Ord, I. (1972). "Testing for Educational and Occupational Selection in Developing
Countries: a review." Occupational Psychology 46: 123-166.
Ortar, G. (1972). Some Principles for Adaptation Psychological Test. Netherlands,
Mouton Publisher.
Owen, K. (1992). "The suitability of Raven's Standard Progressive Matrices for
various groups in South Africa." Personality and Individual Differences
13(2): 149-159.
Parmar, R. (1989). "Cross-Cultural Transfer of Non-Verbal Intelligence Tests: An
(In)Validation Study." British Journal of Educational Psychology 59: 379-388.
Pallant, J. (2007). SPSS Survival Manual. Maidenhead, Open University Press.
Persaude, G. (1987). "Sex and Age difference on the Raven’s Matrices." Perceptual
and Motor Skills 65: 47-52.
Popoff-Walker, L. (1982). "IQ, SES, Adaptive Behavior and Performance on a
Learning Potential Measure." Journal of School Psychology 20: 222-231.
Powers, S. B., J. and Jones, P (1986.a). "Reliability of the (SPM) Test for Hispanic
and Anglo-American Children." Perceptual and Motor Skills 62: 348-350.
Powers, S. J., P. and Barkan, J (1986.b). "Validity of SPM as Predictor of
Achievement of Sixth and Seventh Grade students." Educational and
Psychological Measurement 46: 719-722.
Rao, S. (1974). "Study of Raven’s Progressive Matrices Test (1956)." Indian
Educational Review 9: 174-189.
Raven, J. (1986). "A nation really at risk: A review of Goodlad's 'A Place Called
School'." Higher Education Review 18: 65-79.
Raven, J., J. C. Raven (2003). Manual for Raven’s Progressive Matrices and
Vocabulary Scales. Section 3: The Standard Progressive Matrices. San
Antonio, Harcourt Assessment, Inc.
Rushton, J. P. and S., M. (2000). "Performance on Raven's Matrices by African and
White University Students in South Africa." Intelligence 28(4): 251-265.
Rust, J. and Golombok, S. (2004). Modern psychometrics, 2nd ed. New York,
Routledge.
Raven, J. (1981). Irish and British Standardisations. Oxford, UK: Oxford
Psychologists Press.
Raven, J. (1986). Manual for Raven's Progressive Matrices and Vocabulary Scales.
London, Lewis.
Raven, J. (1989). "The Raven Progressive Matrices: A Review of National Norming
Studies and Ethnic and Socioeconomic Variation within the United States."
Journal of Educational Measurement 26: 1 - 16.
Raven, J., Raven, J.C., & Court, J.H (1993). Manual for Raven's Progressive
Matrices and Vocabulary Scales (Section 1). Oxford, England:, Oxford
Raven, J., Court, J.H. and Raven, J.C (1996). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.
Raven, J., Raven, J.C. and Court, J.H (1998). Coloured Progressive Matrices. Oxford:
Oxford Psychologists Press.
Raven, J., Raven, J.C. & Court, J.H. (1998). Standard Progressive Matrices. Oxford,
Raven, J. (2000). Manual for Raven's Progressive Matrices. Oxford, Oxford

Raven, J., Raven, J.C. and Court, J.H (2000). Standard Progressive Matrices. Oxford,
Oxford Psychologists Press.
Raven, J. a. C., J.H (1989). Manual for Raven's Progressive Matrices and Vocabulary
Scales. London, Lewis.
Raven, J. C., Court, J.H. and Raven, J (1996a). Raven Matrices Progressivas.
Madrid, TEA Ediciones, S.A.
Raven, J. C. (1939). "The RECI series of perceptual tests: An experimental survey."
British Journal of Medical Psychology 18: 16-34.
Raven, J. C. (1941). "Standardisation of Progressive Matrices." British Journal of
Medical Psychology 19: 137-150.
Raven, J. C., Court, J.H. & Raven, J. (1977). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: The Crichton Vocabulary Scale, 1983
Revision. London, H.K.Lewis.
Raven, J. C., Court, J.H. & Raven, J. (1982). The Mill Hill Vocabulary Scale.
London, H.K.Lewis.
Raven, J. C., Court, J.H. & Raven, J. (1983). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: Section 2. London, H.K.Lewis.
Raven, J. C., Court, J.H. and Raven, J (1995). Coloured Progressive Matrices.
Oxford, UK: Oxford Psychologists Press.
Raven, J. C., Court, J.H. & Raven, J. (1996). Standard Progressive Matrices. Oxford,
Raven, J. R., J. and Court, J (1988). Raven Manual: General Overview. Oxford,
Oxford Psychological Press.
Raven, J., Raven, J. C., & Court, J. H. (2000, updated 2004). Manual for Raven’s
Progressive Matrices and Vocabulary Scales. Section 3: The Standard
Progressive Matrices. San Antonio, TX: Harcourt Assessment.
Riaz, A, Sarwat, J. Khanam, & Zaeema, R. Raven’s Standard Progressive Matrices
(Classic Form) in Pakistan. In J. Raven, & J. Raven (2008), Uses and
Abuses of Intelligence: Studies Advancing Spearman and Raven’s Quest for
Non-Arbitrary Metrics. Unionville, New York: Royal Fireworks Press;
Edinburgh, Scotland: Competency Motivation Project; Budapest, Hungary:
EDGE 2000.
Richardson, K. (1991). Understanding Intelligence. Philadelphia, Milton Keynes.
Richardson, K. and Norgate S. (2006). "A Critical Analysis of IQ studies of Adopted
Children." Human Development 49: 319-335.
Rimoldi, H. (1948). "A Note on the Raven’s Progressive Matrices Test." Educational
and Psychological Measurement 8: 347-352.
Roe, K. a. R., A (1983). "Schooling and Cognitive Development: A Longitudinal
Study in Greece." Perceptual and Motor Skills 57: 147-153.
Roid, G.H., & Barram, R.A. (2004). Essentials of Stanford-Binet Assessment. New
York: Wiley
Rushton, J. P. (1997). "Race, intelligence, and the brain: The errors and omission of
the "revised" edition of S.J. Gould's the mismeasure of man (1996)."
Rust, J. (2008a). Coloured Progressive Matrices and Crichton Vocabulary Scale
Manual. London, Pearson.
Rust, J. (2008b). Standard Progressive Matrices Plus Version and Mill Hill Manual.
London, Pearson.
Rust, J. A. G., S (1989). The Science of Psychological Assessment. New York,
Routledge.
Sahin, N. and Duzen, E. (1994). "Turkish Standardization of the Raven's SPM (Age
6 to 15)." Paper presented to the 23rd International Conference of Applied
Psychology, Madrid.
Samuda, R. (1975). Psychological Testing of American Minorities: Issues and
Consequences. New York, Harper and Row Publisher.
Sattler, J. (1982). Children’s Intelligence and Special Abilities. Boston, Allyn and
Bacon Inc.
Sattler, J. M. (1988). Assessment of children. San Diego, Author.
Sattler, J. M. (1998). Assessment of children's intelligence. In C.E. Walker & M.C.
Roberts (Eds.), Handbook of clinical child psychology (2nd ed., pp. 85-100).
New York, NY, John Wiley.
Scarr, S. (1981). Race, Social Class, and Individual Differences in IQ. New Jersey,
Lawrence Erlbaum Associates Publishers.
Schooler, C. (1998). Environmental complexity and the Flynn effect. Washington
DC, American Psychological Association.
Schwarz, P. a. K., R (1972). Ability Testing in Developing Countries: A Handbook
of Principles and Techniques. New York, Praeger Publishers.
Shanthamani, V. (1970). "Relationship Between Intelligence and Other Certain
Variables." Journal of Psychological Research 14: 28-34.
Shayer, M., Demetriou, A. & Pervez, M (1988). "The structure and scaling of
concrete operational thought: three studies in four countries." Genetic,
Social & Psychological Monographs: 309-375.
Shayer, M. (2007). "30 Years on-a large anti-'Flynn effect'? The Piagetian test
Volume & Heaviness norms 1975-2003." British Journal of Educational
Shelley, D. A. C., D (1986). Testing Psychological Tests. London, Croom Helm Ltd.
Sidles, C. A., J (1987). "Navajo Adolescents Scores on (PLQ), (SPM), and (CTBS)."
Educational and Psychological Measurement 47: 703-709.
Sinha, M. (1977). "Validity of the Progressive Matrices Test." Journal of
Psychological Research 21: 221-226.
Sinha, U. (1950). Reliability and Validity of the Progressive Matrices Test. London,
University of London. M.A.
Sinha, U. (1968). "The Use of Raven’s Progressive Matrices Test in India." Indian
Educational Review 3: 75-88.
Singh, U. (1951). "A study of Reliability and Validity of the Progressive Matrices
Test." British Journal of Educational Psychology 21: 221-226.
Smith, M. A. G., G. (1977). "Relationship of Class-size to Classroom Processes,
Teacher Satisfaction and Pupil Affect." Australian Journal of
Education 24(3): 329-331.
Snyderman, M., & Rothman, S (1988). The IQ controversy. The media and public
policy. New Brunswick, NJ, Transaction Publishers.
Sokal, M. (1987). Psychological Testing and American Society 1890 - 1930, New
Brunswick: Rutgers University Press.
Sorokin, B. (1954). "Standardisation and analysis of Progressive Matrices Test by
Penrose and Raven." Unpublished Report from Zagreb, Yugoslavia.
Spearman, C. (1904). "Intelligence, Objectively Determined and Measured."
American Journal of Psychology 15: 201-293.
Spearman, C. (1927). The Abilities of Man. London, Macmillan.
Spearman, C. (1946). "Theory of General Factor." British Journal of Psychology 36:
117-131.
Spearman, C. E. (1923). The nature of intelligence and the principles of cognition.
London, Macmillan.
Spearman, C. J., L.L (1950). Human ability: a continuation of “The abilities of Man”.
London: Macmillan.
Spitz, H. H. (1989). "Variations in Wechsler interscale IQ disparities at different
levels of IQ." Intelligence 13: 157-167.
Sternberg, R. (1990). Metaphors of Mind: Conception of the Nature of Intelligence.
Cambridge, Cambridge University Press.
Sternberg, R. A. D., D (1986). What is Intelligence. New Jersey, Ablex Publishing
Corporation.
Sternberg, R. C., B. Ketron, J. and Bernstein, M (1981). "People’s Conceptions of
Intelligence." Journal of Personality and Social Psychology 41: 37-55.
Sternberg, R. J. W. S. I. B., L. (2000). Child development. Massachusetts, Allyn and
Bacon.
Sundet, J. M., Barlaug, D.G. & Torjussen, T.M (2004). "The end of the Flynn effect?
A study of secular trends in mean intelligence test scores of Norwegian
conscripts during half a century." Intelligence 32: 349-362.
Sundet, J. M., Borren, I. & Tambs, K (2008). "The Flynn effect is partly caused by
changing fertility patterns." Intelligence 36: 183-191.
Tashakkori, A. H., S and Yousefi (1988). "Effects of Pre-school Education on
Intelligence and Achievement of a Group of Iranian Elementary School
Children." International Review of Education 34: 499-508.
Teasdale, T. W. O., L. (1987). "National secular trends in intelligence and education:
a twenty-year cross-sectional study." Nature 325: 119-121.
Teasdale, T. W. O., D.R. (1989). "Continuing secular increases in intelligence and a
stable prevalence of high intelligence levels." Intelligence 13: 255-262.
Teasdale, T. W. O., D.R. (1994). "Thirty year secular trend in the cognitive abilities of
Danish male school leavers at a high educational level." Scandinavian
Journal of Psychology 35: 328-335.
Teasdale, T. W. O., L. (2000). "Forty-year secular trends in cognitive abilities."
Teasdale, T. W. O., L. (2008). "Secular declines in cognitive test scores: a reversal of
the Flynn effect." Intelligence 36: 121-126.
Terman, L. M. (1916). The Measurement of Intelligence. New York: Houghton
Mifflin.
Thorndike, E. L. (1910). Educational Psychology. New York: Houghton Mifflin.
Thorndike, R. a. H., E (1977). Measurement and Evaluation in Psychology and
Education. New York, John Wiley and Sons, Inc.
Thorndike, R. L. (1977). "Causes of IQ decrements." Journal of Educational
Measurement, 14: 197-202.
Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago
Press.
Tuddenham, R. D. (1948). "Soldier intelligence in World Wars I and II." American
Psychologist 3: 54-56.
Turner, S. M., DeMers, S. T., Fox, H. R., & Reed, G., M. (2001). "APA's Guidelines
for Test User Qualifications: An Executive Summary." American
Psychologist 56(12): 1099-1113.
Tulkin, S. a. N., J (1968). "Social Class, Race and Sex Differences on the Raven
(1956) Standard Progressive Matrices." Journal of Consulting and Clinical
Tully, G. E. (1967). "Test-retest Reliability of the Raven Progressive Matrices Test
(form 1938) and the California Test of Mental Maturity, Level 4 (S-F
1963)." Florida Journal of Educational Research 9: 67-74.
Tyler, L. a. W., W (1979). Test and Measurement. London, Prentice-Hall International,
Inc.
U.S. Department of Education, O. f. C. R. (2000). The Use of Tests as Part of High-
Stakes Decision-Making for Students: A Resource Guide for Educators and
Policy-Makers.
Urbach, P. (1974). "Progress and degeneration in the "IQ debate"." British Journal of
the Philosophy of Science 25: 99-135, 235-259.
Van den Broek, M. a. B., C (1994). "Detection of Acquired Deficits in General
Intelligence Using the National Adult Reading Test and Raven’s Standard
Progressive Matrices." British Journal of Clinical Psychology 33: 509-515.
Vejleskov, H. (1968). "An Analysis of Raven Matrix Responses in Fifth Grade
Children." Scandinavian Journal of Psychology 9: 177-186.
Vincent, K. a. C., J (1974). "A Re-Evaluation of Raven’s Standard Progressive
Matrices." Journal of Psychology 88: 299-303.
Vernon, P. (1960). Intelligence and Attainment Test. London, University of London
Press.
Vernon, P. (1969). Intelligence and Cultural Environment. London, Methuen.
Vernon, P. E. (1942). The reliability and Validity of the Progressive Matrices Test.
London, Admiralty Report.
Virgolim, A. M. R. (2005). Creativity and intelligence: a study of Brazilian gifted and
talented students, University of Connecticut. PhD.
Vroon, P. (1987). "Models of Educational Career with and Without IQ
Measurements." The Journal of Psychology 121: 273-279.
Vroon, P. d., J. and Meester, A. (1986). "Distribution of Intelligence and Educational
Level in Fathers and Sons." British Journal of Psychology 77: 137-142.
Wechsler, D. (1975). "Intelligence Defined and Undefined: A Relativistic
Appraisal." American Psychologist 30: 135-139.
Weede, E. K., S (2002). "The impact of intelligence and institutional improvements
on economic growth." Kyklos 55: 361-380.
Wesson, K. A. (2000). "The Volvo effect-Questioning standardized tests." Education
Week 20: 34-36.
Wheeler, L. R. (1942). "A comparative study of the intelligence of East Tennessee
mountain children." Journal of Educational Psychology 33: 321-334.
Whorton, J. a. K., F (1987). "Correlation of Stanford Binet Intelligence Scale Scores
with Various other Measures Used to Screen and Identify Intellectually
Gifted Students." Perceptual and Motor Skills 64: 461-462.
Whorton, J. a. K., F (1988). "Comparison of the 1979 and the 1986 Norms on the
Standard Progressive Matrices for Economically Disadvantaged Students:
Implication for Identification of Gifted Children." Perceptual and Motor
Skills 67: 749-750.
Williams, W. M. (1998). Are we raising smarter children today? School and home
related influences on IQ. In U. Neisser (Ed.), The Rising Curve. Washington,
DC, American Psychological Association.
Wolf, M. (1986). Meta-Analysis: Quantitative Methods for Research Synthesis. New
Delhi, Sage Publications, Inc.
Yoon, S. N. (2005). Comparing the Intelligence and Creativity Scores of Asian
American Gifted students and Caucasian Gifted students. Graduate School,
University of Purdue. PhD thesis, pp. 2-3.
Yonghua, S. (1991). "Report of using Raven's Standard Progressive Matrices in deaf
children." Acta Psychologica Sinica 23(1): 107-112.
Young, H. T., R.; Tesi, G. and Montemagni, G (1962). "Influence of Town and
Country Upon Children’s Intelligence." British Journal of Educational
Yousefi, F. S., A.; Razavich, A.; Mehryar, A.; Hosseini, A. and Alborzi, S (1992).
"Some Normative Data on the Bender Gestalt Test Performance of Iranian
Children." British Journal of Educational Psychology 62: 410-416.
Zeidner, M. (1988). "Sociocultural Differences in Examinees’ Attitudes Toward
Scholastic Ability Exams." Journal of Educational Measurement 25: 67-76.
Appendix 1
Standard Progressive Matrices: Percentiles for Libyan sample.

Age in years
Score
18 1 2 2
6 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 2 1 1 0 0 0 0 0 0 0 0 0 0 0
8 4 2 2 1 0 0 0 0 0 0 0 0 0 0
9 8 5 4 2 1 1 0 0 0 0 0 0 0 0
10 9 8 8 4 3 2 1 0 0 0 0 0 0 0
11 14 13 10 6 4 2 1 0 0 0 0 0 0 0
12 24 18 17 9 4 3 2 0 0 0 0 0 0 0
13 31 26 20 11 6 4 2 1 1 0 0 0 0 0
14 40 33 25 16 9 4 3 1 1 1 0 0 0 0
15 44 38 29 18 12 5 4 2 2 2 0 0 0 0
16 47 40 32 21 14 7 4 3 3 2 1 0 0 0
17 56 44 36 24 17 8 6 4 4 3 2 0 0 0
18 73 50 40 28 19 9 7 4 4 4 3 0 0 0
19 78 59 42 30 20 10 8 5 5 4 4 1 0 0
20 82 68 46 32 21 13 9 6 5 6 5 2 0 0
21 83 73 54 35 22 13 10 9 8 7 6 2 2 0
22 85 77 60 37 24 17 12 10 10 8 7 3 2 0
23 93 79 64 39 28 18 14 11 11 8 8 3 3 0
24 93 70 68 41 29 19 15 14 13 9 8 4 3 0
25 94 80 73 44 33 19 18 15 14 10 8 5 4 0
26 94 85 74 49 37 20 18 17 15 12 8 6 4 0
27 94 86 77 57 39 23 20 21 18 13 9 7 5 1
28 94 89 81 62 40 30 24 24 19 14 10 8 5 2
29 94 90 83 66 47 32 31 30 22 16 10 9 5 3
30 95 92 85 67 54 38 34 32 27 19 14 13 7 4
31 96 93 86 69 60 41 37 35 30 20 17 14 8 6
32 96 94 89 73 63 44 41 38 32 21 19 17 10 7
33 96 94 90 76 65 47 43 41 35 27 24 20 12 8
34 97 95 91 81 68 52 46 43 39 34 28 21 15 13
35 97 96 91 83 73 55 51 47 42 36 31 24 20 17
36 97 9 91 84 78 60 59 51 44 38 37 27 22 20
37 97 9 96 86 79 63 64 55 50 41 39 29 25 29
38 98 9 97 90 83 68 68 60 56 43 41 33 31 30
39 98 9 97 91 86 70 75 64 59 45 43 36 34 32
40 9 97 93 88 75 80 72 62 51 47 39 37 34
41 98 93 90 81 82 78 66 60 49 42 40 37
42 96 94 86 86 81 69 67 56 47 43 39
43 96 95 91 89 85 73 72 64 51 49 45
44 97 96 92 91 88 77 75 66 57 54 52
45 98 98 93 92 90 82 81 70 65 63 59
46 98 94 94 92 84 81 73 68 67 63
47 99 95 94 95 90 85 79 74 72 70
48 99 9 98 97 93 90 85 79 77 74
49 100 9 98 98 96 94 88 87 85 77
50 9 99 99 98 96 90 89 87 80
51 99 99 98 98 93 92 90 85
52 100 100 98 98 95 95 94 90
53 99 99 96 96 95 94
54 100 99 100 100 96 95
55 100 100 96
56 100
57
58
59
60

Appendix 2
Smoothed 2007-2008 Norms for Libya in the Context of the 1989 Taiwan Data
Age in years
9 10 11 12
Percentile Li TA Li TA Li TA Li TA
95 6 1
90 8 1
75 1 6 32 5
50 8 20 6

25 12 2
10 0 2 4
5 9 9 0 2
n 180 180 180 180
Smoothed 2007-2008 Norms for Libya in the Context of the 1992 India Data
Age in years
11 12 12 14 15
Percentile Li IN Li IN Li IN Li IN Li IN
95 0
6 1 50
90 22
8 1 49
75 18 1 6 32 5 45
50 6 8 20 6 40
25 12 2 31
10 0 2 4 16
5
9 9 0 2 12
n 180 180 180 180
180 131
Smoothed 2007-2008 Norms for Libya in the Context of the 1992 Netherlands Data
Age in years
8 9 10 11 12
Percentile Li HU Li HU Li HU Li HU Li HU
95 0 43 6 1
90 22 41
8 1
75 18 37 1 6 32 5
50 6 29 8 20 6
25 22 12 2
10 17 0 2 4
5
13 9 9 0 2
n 180 156 180 180 649 180 463 180
Smoothed 2007-2008 Norms for Libya in the Context of the 1998 France Data
Age in years
8 9 10 11 12
Percentile Li FR Li FR Li FR Li FR Li FR
95 0 45 47 6 51 1 52 52
90 22 42 44 48 8 49 1 50
75 18 39 1 42 6 45 32 45 5 45
50 6 33 8 36 20 39 6 41 41
25 22 12 27 33 37 2 37
10 15 0 20 28 2 31 4 33
5
12 9 13 9 21 0 27 2 30
n 180 62 180 71 180 64 180 63 180 70
Smoothed 2007-2008 Norms for Libya in the Context of the 1993 Turkey Data
Age in years
8 9 10 11 12 13 14
Percentile Li TR Li TR Li TR Li TR Li TR Li TR Li TR
95 0 37 45 6 47 1 48 49 47 52 7 52
90 22 34 42 45 8 46 1 47 42 51 3 51
75 18 29 1 37 6 40 32 42 5 42 40 44 48
50 6 21 8 27 20 31 6 33 34 36 41
25 17 12 22 25 27 2 28 7 28 8 29
10 12 0 13 14 2 14 4 14 15 18
5
11 9 11 9 12 0 12 2 12 12 6 13
n 180 104 180 186 180 381 180 274 180 168 180 119 180 72
Smoothed 2007-2008 Norms for Libya in the Context of the 1987 Kosice, Slovakia Data
Age in years
15 16 17 18
Percentile LI SK LI SK LI SK LI SK LI SK LI SK LI SK LI SK
1 51 53 47 54 7 55 56 8 57 49 58 52 58
8 49 1 51 42 52 3 53 54 55 48 56 50 56
2 46 5 48 40 49 51 0 52 3 53 4 53 46 53
6 42 44 45 47 5 49 50 39 50 41 50
36 2 38 7 41 8 42 8 44 29 45 2 46 33 47
2 29 4 31 34 36 37 39 5 40 29 41
5 0 24 2 27 29 6 31 9 32 19 33 20 34 20 35
N 180 - 180 - 180 - 180 - 180 - 180 - 180 - 200 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1979 & 1992 British Data
Age in years
8 9 10 11 12 13 14
Percentile Li UK Li UK Li UK Li UK Li UK Li UK Li UK
95 0 40 6 48 1 50 52 47 54 7 55
90 22 38 46 8 48 1 50 42 52 3 54
75 18 33 1 6 42 32 44 5 46 40 49 50
50 6 25 8 20 38 6 40 41 43 45
25 17 12 32 34 2 37 7 39 8 42
10 14 0 23 2 29 4 31 33 36
5
12 9 9 17 0 24 2 26 28 6 30
n 180 174 180 166 180 172 180 187 180 164 180 185 180 196
Age in years
15 16 17 18-21
Percentile Li UK Li UK Li UK Li UK
95 57 8 - 49 - 53 59
90 55 - 48 - 51 58
75 0 51 3 - 4 - 47 57
50 5 47 - 39 - 43 54
25 8 42 29 - 2 - 36 49
10 36 - 5 - 31 44
5 9 33 19 - 20 - 26 39
n 180 191 180 - 180 - 800 58
Smoothed 2007-2008 Norms for Libya in the Context of the 1986 Australia Data
Age in years
8 9 10 11 12 13 14
Percentile Li Au Li Au Li Au Li Au Li Au Li Au Li Au
95 0 44
6 1 47 7
90 22 42 8 1 42 3
75 18 39 1 6 32 5 40
50 6 32 8 20 6
25 22 12 2 7 8
10 13 0 2 4
5
11 9 9
0 2 6
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17
Percentile Li Au Li Au Li Au
95 8
49
90 48
75 0 3 4
50 5 39
25 8 29 2
10 5
5 9 19 20
n 180 - 180 - 180 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1986 China Data
Age in years
8 9 10 11 12 13 14
Percentile Li Ch Li Ch Li Ch Li Ch Li Ch Li Ch Li Ch
95 0 44 47 6 50 1 52 53 47 53 7 55
90 22 39 43 48 8 48 1 50 42 52 3 52
75 18 31 1 37 6 42 32 43 5 46 40 50 50
50 6 23 8 33 20 35 6 39 42 45 48
25 15 12 25 27 33 2 37 7 40 8 43
10 13 0 14 17 2 25 4 27 35 36
5
10 9 12 9 13 0 19 2 21 30 6 34
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17-19 18-21
Percentile Li Ch Li Ch Li Ch Li Ch
95 57 8 57 49 58 53 57
90 54 56 48 57 51 56
75 0 51 3 53 4 55 47 54
50 5 48 49 39 52 43 50
25 8 43 29 44 2 47 36 44
10 36 41 5 40 31 38
5 9 34 19 36 20 37 26 33
n 180 - 180 - 180 - 800 -
Smoothed 2007-2008 Norms for Libya in the Context of the 1979 & 1992 United States of America Data
Age in years
8 9 10 11 12 13 14
Percentile Li Us Li Us Li Us Li Us Li Us Li Us Li Us
95 0 38 42 6
1 50 47 7
90 22 36 40 44 8
1 42 3
75 18 31 1
6 40 32 5 40
50 6 23 8 20 6
25 16 12
2 7 8
10 13 0
2 4
5
10 9 9 0 2 6
n 180 - 180 - 180 - 180 - 180 - 180 - 180 -
Age in years
15 16 17 18-21
Percentile Li Us Li Us Li Us Li Us
95
8 49 - 53
90 48 - 51
75 0 3 4 - 47
50 5 39 - 43
25 8 29 2 - 36
10
5 - 31
5 9 19 20 - 26
n 180 - 180 - 180 - 800
Smoothed 2007-2008 Norms for Libya in the Context of the 1998 Slovenia Data
Age in years
8 9 10 11 12 13 14
Percentile Li SL Li SL Li SL Li SL Li SL Li SL Li SL
95 0 39 44 6 49 1 51 52 47 53 7 54
90 22 37 42 47 8 49 1 50 42 51 3 52
75 18 33 1 39 6 43 32 45 5 47 40 48 49
50 6 24 8 31 20 36 6 40 44 45 46
25 16 12 21 29 33 2 36 7 37 8 38
10 11 0 14 19 2 25 4 30 32 33
5
9 9 12 9 15 0 19 2 22 24 6 24
n 180 48 180 71 180 59 180 59 180 58 180 68 180 72
Age in years
15 16 17 18 19 20 21
Percentile Li SL Li SL Li SL Li SL Li SL Li SL Li SL
95 56 8 57 49 57 52 57 50 53 54
90 53 54 48 54 50 55 48 51 52
75 0 50 3 51 4 52 46 53 46 47 48
50 5 47 47 39 48 41 49 42 43 43
25 8 40 29 41 2 43 33 44 35 37 37
10 34 35 5 35 29 36 29 32 33
5 9 25 19 26 20 28 20 30 25 29 30
n 180 67 180 147 180 127 200 43 200 200 200