A Study of Intelligence in North Africa and the Middle East.

Alsedig Abdalgadr Ali Alshahomee


Omar Al-Mukhtar University
El-Beida, Libya
Dedication
To my mother (Zahra) and my young daughter (Hajar), who both passed away during my
study. I will always remember you and keep praying for you.

Acknowledgments

I begin by praising ALLAH Almighty. I praise him and seek his help and pleasure. I
wish to express my grateful appreciation to Prof. Richard Lynn and my supervisor
Prof. Peter Eachus and my co-supervisor Dr Simon Cassidy. My thanks should also
go to all the participants who took part in this study, and all those who helped me
during this study, especially my colleagues Prof. A. Attashani, Prof. S. Elghmary, Dr.
M. Hammad and Mr. K. Khelifa. Finally, special thanks to my parents, wife and children:
Abubaker, Ashraf, Alamin and Zahra, who make my life worthwhile. My thanks also go to my sister
and brothers for their understanding, support and faithfulness during the years of my
study in England.

Contents

Page
Tables.......................................................................................................................... vii
Figures......................................................................................................................... xi

Chapter one: INTRODUCTION


1.1 Introduction……………………………………………………………….. 1

Chapter two: INTELLIGENCE LITERATURE REVIEW


2.1 Introduction……………………………………………………………….. 6
2.2 Definitions of Intelligence………………………………………………… 7
2.2.1 The 1921 Symposium……………………………………………………... 7
2.2.2 The 1986 Symposium……………………………………………………... 9
2.3 Evolution of the Concept of Intelligence and Intelligence Testing……….. 12
2.3.1 Contribution of Edward Seguin (1812-1880). ……………………………. 16
2.3.2 Contribution of Jean Etienne Esquirol (1772-1840)..……………………... 17
2.3.3 Contribution of Sir Francis Galton (1822-1911)…………………………... 17
2.3.4 Contribution of James McKeen Cattell (1860-1944)……………………… 18
2.3.5 Contribution of Alfred Binet (1857-1911)………………………………… 19
2.3.6 The First World War and the Development of Group Tests………………. 21
2.3.7 Contribution of Charles Spearman (1863-1945)…………………………... 23
2.3.8 Contribution of Piaget (1896-1980)……………………………………...... 25
2.4 Theories of Intelligence……………………………………………………. 27
2.4.1 Spearman’s “g” Theory……………………………………………………. 27
2.4.2 Thurstone's Primary Mental Abilities (1938)…………………………….... 28
2.4.3 Guilford’s structure of the intellect theory………………………………… 30
2.4.4 Gardner’s theory of multiple intelligences………………………………… 31
2.4.5 Cattell and Horn’s theory of fluid and crystallized intelligence…………... 32
2.4.6 Carroll’s three-stratum theory of cognitive abilities………………………. 33
2.4.7 The Cattell-Horn Carroll Model…………………………………………… 33
2.5 Definitions of Mental Test………………………………………………… 34
2.6 Classification of Mental Tests…………………………………………….. 35
2.6.1.1 Classification of tests according to timing………………………………... 35

2.6.1.2 Classification of tests according to procedure of administration…………. 36
2.6.1.3 Classification of tests according to content……………………………….. 37
2.7 Use of Mental Tests……………………………………………………….. 37
2.8 Use of Intelligence Tests………………………………………………….. 38
2.9 Culture-Free and Culture-Fair Tests………………………………………. 41
2.10 Achievement Tests………………………………………………………… 44
2.11 Intelligence and academic achievement…………………………………… 47
2.12 Increase in IQ with time…………………………………………………… 50
2.13 Chapter Summary………………………………………………………….. 57

Chapter three: RATIONALE AND STATEMENT OF PROBLEM


3.1 Introduction………………………………………………………………... 59
3.2 Education System in Libya………………………………………………... 60
3.3 Intelligence testing in Libya………………………………………………. 63
3.4 Adoption of intelligence tests……………………………………………… 68
3.5 Standard Progressive Matrices (SPM) test………………………………… 70
3.6 Statement of problem and study rationale………………………………… 73
3.7 Study aim………………………………………………………………….. 84
3.8 Research Question………………………………………………………… 84
3.9 Research objectives……………………………………………………….. 84
3.10 Chapter Summary………………………………………………………….. 85

Chapter four: REVIEW OF STANDARD PROGRESSIVE MATRICES


LITERATURE
4.1 Introduction………………………………………………………………... 87
4.2 Progressive Matrices Tests………………………………………………… 878
4.3 Description of the SPM test……………………………………………….. 91
4.4 Reporting SPM Results……………………………………………………. 94
4.5 Standardisation of the SPM test…………………………………………… 95
4.6 Reliability of the SPM……………………………………………………... 97
4.6.1 Test-retest reliability……………………………………………………….. 98
4.6.2 Split-half reliability………………………………………………………... 100
4.6.3 Cronbach’s alpha reliability ………………………………………………. 101
4.7 Validity of the SPM test…………………………………………………… 104

4.7.1 Content Validity…………………………………………………………… 105
4.7.2 Construct Validity…………………………………………………………. 106
4.7.2.1 Factor analysis…………………………………………………………….. 107
4.7.2.2 Internal consistency……………………………………………………….. 110
4.7.3 Criterion-related Validity…………………………………………………. 111
4.7.3.1 Correlation of SPM test with Intelligence Tests…………………………... 112
4.7.3.2 Correlation of SPM test with Achievement Tests…………………………. 120
4.8 Item analysis of the SPM test……………………………………………… 130
4.8.1 Item difficulty…………………………………………………………….... 130
4.8.2 Item discrimination………………………………………………………... 131
4.9 Review of previous studies that employed SPM test……………………… 132
4.9.1 Studies on SPM test in developed countries………………………………. 134
4.9.2 Studies on SPM test in developing countries……………………………… 146
4.10 Chapter Summary………………………………………………………….. 157

Chapter five: MATERIALS AND METHODS


5.1 Introduction………………………………………………………………... 160
5.2 Research design……………………………………………………………. 160
5.3 Methodology………………………………………………………………. 161
5.4 Methods……………………………………………………………………. 162
5.5 Ethical approval…………………………………………………………… 164
5.6 Pilot study………………………………………………………………….. 165
5.7 Main Study………………………………………………………………… 166
5.7.1 Sample size……………………………………………………………….... 166
5.7.2 Sample selection…………………………………………………………… 166
5.7.2.1 Multi-stage-cluster sampling design………………………………………. 166
5.7.2.2 Disproportional stratified sampling……………………………………….. 168
5.7.2.3 The multi-stage-cluster sampling process and procedures………………… 171
5.8 Field work arrangement…………………………………………………… 178
5.9 Preparation of the SPM test……………………………………………….. 180
5.10 Administration of the SPM test…………………………………………… 180
5.11 The proposed and achieved sample size…………………………………... 182
5.12 Data Statistical Analysis…………………………………………………... 183

5.13 Chapter Summary………………………………………………………….. 186

Chapter six: RESULTS


6.1 Introduction………………………………………………………………... 187
6.2 Description of students and SPM score means……………………………. 190
6.3 Reliability of the SPM Test………………………………………………... 192
6.3.1 Test-retest reliability of the SPM test……………………………………… 193
6.3.2 Split-half reliability………………………………………………………... 193
6.3.3 Alpha Reliability…………………………………………………………... 194
6.4 Validity of the SPM test…………………………………………………… 195
6.4.1 Construct Validity…………………………………………………………. 195
6.4.1.1 Factor analysis of SPM test………………………………………………... 196
6.4.1.2 Internal consistency validity………………………………………………. 200
6.4.2 Criterion-related validity…………………………………………………... 202
6.5 Item Analysis of the SPM test……………………………………………... 203
6.5.1 Item Difficulty……………………………………………………………... 203
6.5.2 Item Discrimination……………………………………………………….. 204
6.6 Differences in SPM scores………………………………………………… 208
6.6.1 Differences according to gender…………………………………………... 208
6.6.2 Difference according to regions (cities and villages)……………………… 209
6.6.3 Difference according to academic discipline……………………………… 210
6.6.4 Difference according to geographic areas…………………………………. 211
6.6.5 Difference according to age……………………………………………….. 212
6.6.6 Difference according to study levels……………………………………… 213
6.6.7 Difference according to regions and study levels………………………… 215
6.6.8 Difference according to regions and gender………………………………. 217
6.6.9 Difference according to age and region…………………………………… 218
6.6.10 Difference according to geographic areas and gender…………………….. 221
6.6.11 Difference according to academic discipline and gender…………………. 223
6.6.12 Difference according to age and gender…………………………………… 224
6.6.13 Difference according to academic discipline and age……………………... 227
6.7 Multiple Regression according to independent variables…………………. 232
6.8 The Percentile Ranks of the SPM Score…………………………………... 233
6.9 Chapter Summary…………………………………………………………. 236

Chapter seven: META-ANALYSIS
7.1 Introduction………………………………………………………………... 240
7.2 Advantages of Meta-analysis……………………………………………… 241
7.3 Disadvantages of Meta-analysis…………………………………………… 242
7.4 Literature review…………………………………………………………... 243
7.5 Method…………………………………………………………………….. 244
7.5.1 Criteria for studies selection………………………………………………. 244
7.5.2 Strategy of analysis………………………………………………………... 246
7.6 Results……………………………………………………………………... 248
7.6.1 SPM means and standard deviations according to the independent
variables…………………………………………………………………… 251
7.6.2 Differences in SPM scores………………………………………………… 252
7.6.2.1 Difference according to development status………………………………. 252
7.6.2.2 Difference according to age groups……………………………………….. 253
7.6.2.3 Difference according to gender……………………………………………. 255
7.6.2.4 Difference according to development status and age……………………… 256
7.6.2.5 Difference according to development status and gender………………….. 260
7.6.2.6 Difference according to age groups and gender…………………………… 262
7.6.3 Multiple Regressions according to the independent variables…………….. 266
7.7 Chapter Summary………………………………………………………….. 267

Chapter eight: DISCUSSION AND CONCLUSION


8.1 Introduction………………………………………………………………... 270
8.2 Intelligence testing in Libya……………………………………………….. 271
8.3 The SPM test………………………………………………………………. 272
8.4 Meta-analysis……………………………………………………………. 273
8.5 Study discussion…………………………………………………………… 277
8.5.1 Psychometric characteristics of the SPM test in Libya……………………. 277
8.5.1.1 Reliability of SPM test…………………………………………………….. 278
8.5.1.2 Validity of SPM test…………………………………………………….…. 280
8.5.1.3 Item analysis of SPM test………………………………………………….. 282
8.5.2 IQ and Libya………………………………………………………………. 283

8.5.3 SPM and gender…………………………………………………………… 292
8.5.4 SPM and region……………………………………………………………. 297
8.5.5 SPM and age (study level)………………………………………………… 298
8.5.6 SPM and academic discipline……………………………………………... 301
8.5.7 Relationship and prediction of SPM………………………………………. 301
8.5.8 SPM percentiles…………………………………………………………… 302
8.6 Study conclusions…………………………………………………………. 305
8.7 Study contributions………………………………………………………... 308
8.8 Limitations of the Study…………………………………………………… 308
8.9 Recommendations of the Study…………………………………………… 313
8.10 Further research……………………………………………………………. 315

Tables

Page
Table 4.1 SPM standardization studies……………………………………………… 96
Table 4.2 Summary of the studies performed on the SPM test reliability…………... 103
Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher’s
transformation results…………………………………………………….. 118
Table 4.4 The average of the correlation between SPM test with intelligence tests... 119
Table 4.5 Summary of the studies on SPM test predictive validity and with r to z
Fisher’s transformation results…………………………………………… 127
Table 4.6 The average of correlation between the SPM test and achievement tests... 129
Table 4.7 A sample of worldwide studies that utilised the SPM test as a …. 132
Table 5.1 Principles of selecting the sample in schools………………………………… 175
Table 5.2 The target sample size for selecting the pre-university students in the two
cities in proportion to their real numbers…………………………………. 175
Table 5.3 The target sample size for selecting the pre-university students in the
nine villages in proportion to their real numbers…………………………. 176
Table 5.4 The target sample size for selecting the undergraduate university students
in Omar El-Mukhtar University in proportion to their real numbers…….. 176
Table 6.1 Descriptive statistics of overall collected data and tests of normality……. 188
Table 6.2 SPM score means and standard deviations……………………………….. 191
Table 6.3 SPM test-retest reliabilities according to age, gender and study levels…... 193
Table 6.4 SPM split-half reliabilities according to gender, age and total Sample…... 194
Table 6.5 SPM Alpha reliabilities according to gender, age and total sample……… 195
Table 6.6 Correlation matrix between the five sets of the SPM test among Libyan
male and female students (N=2600, 8 to 21 years) and extracted factor….. 196
Table 6.7 Correlation matrix between the five sets of the SPM test among Libyan
male students (N=1300, 8 to 21 years) and extracted factor……………... 198
Table 6.8 Correlation matrix between the five sets of the SPM test among Libyan
female students (N=1300, 8 to 21 years) and extracted factor……………. 199
Table 6.9 Correlation coefficients between the five sets and the total scores of the
SPM test (n=2600, age 8 to 21 years)…………………………………….. 200

Table 6.10 Correlation coefficients between the five sets and the total scores of the
SPM test (males n=1300 and females n=1300, age 8 to 21 years)……….. 201
Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample…………….. 202
Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)……………………………………………... 203
Table 6.13 Index of Discrimination and Items Evaluation…………………………… 205
Table 6.14 Point biserial and significant level for each SPM item…………………… 205
Table 6.15 Summary of item analysis of the five SPM sets………………………….. 206
Table 6.16 Comparison of gender…………………………………………………….. 208
Table 6.17 Comparison of regions……………………………………………………. 209
Table 6.18 Comparison of academic discipline………………………………………. 210
Table 6.19 Comparison of geographic areas…………………………………………. 211
Table 6.20 Post Hoc Tukey (HSD) Test……………………………………………… 211
Table 6.21 Comparison according to age…………………………………………….. 212
Table 6.22 Post Hoc Tukey (HSD) Tests…………………………………………….. 213
Table 6.23 Comparison according to study levels…………………………………… 214
Table 6.24 Post Hoc Tukey (HSD) Test……………………………………………… 214
Table 6.25 Comparison of the region according to study levels……………………... 215
Table 6.26 Levene's Test of Equality of Error Variances of SPM scores……………. 215
Table 6.27 Tests of Between-Subjects Effects of SPM scores……………………….. 215
Table 6.28 Post Hoc Tukey (HSD) Test……………………………………………… 216
Table 6.29 Comparison of the regions according to gender………………………….. 217
Table 6.30 Levene's Test of Equality of Error Variances of SPM scores……………. 217
Table 6.31 Tests of Between-Subjects Effects of SPM scores……………………….. 217
Table 6.32 Comparison of age according to region…………………………………... 218
Table 6.33 Levene's Test of Equality of Error Variances of SPM scores……………. 218
Table 6.34 Tests of Between-Subjects Effects of SPM scores………………………. 219
Table 6.35 Post Hoc Tukey (HSD) test………………………………………………. 219
Table 6.36 Comparison of the geographic areas according to gender………………... 221
Table 6.37 Levene's Test of Equality of Error Variances of SPM scores……………. 221
Table 6.38 Tests of Between-Subjects Effects of SPM scores……………………….. 221

Table 6.39 Post Hoc Tukey (HSD) Test……………………………………………… 222
Table 6.40 Comparison of academic discipline according to gender………………… 223
Table 6.41 Levene's Test of Equality of Error Variances of SPM scores……………. 223
Table 6.42 Tests of Between-Subjects Effects of SPM scores……………………….. 223
Table 6.43 Comparison of age according to gender………………………………….. 224
Table 6.44 Levene's Test of Equality of Error Variances…………………………….. 225
Table 6.45 Tests of Between-Subjects Effects of SPM scores……………………….. 225
Table 6.46 Post Hoc Tukey (HSD) test………………………………………………. 225
Table 6.47 Comparison of academic discipline according to age……………………. 227
Table 6.48 Levene's Test of Equality of Error Variances of SPM scores……………. 227
Table 6.49 Tests of Between-Subjects Effects of SPM scores……………………….. 228
Table 6.50 Post Hoc Tukey (HSD) test………………………………………………. 228
Table 6.51 Magnitude of gender differences in means score and variability on SPM
as functions of age, geographic areas and discipline……………………... 229
Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores…… 232
Table 6.53 Detailed percentile 2007-2008 Norms for Libyan students according to age 233
Table 6.54 Detailed percentile 2007-2008 Norms for Libyan students according
to age and gender…………………………………………………………. 234
Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline……………………………………………… 235
Table 7.1 Studies included in the meta-analysis…………………………………….. 245
Table 7.2 Descriptive statistics for means scores of overall collected data and tests
of normality………………………………………………………………. 249
Table 7.3 Showing SPM score means and standard deviations according to
independent variables…………………………………………………….. 251
Table 7.4 Comparison of the SPM Mean according to development status………… 252
Table 7.5 Post hoc tests multiple comparisons of SPM scores (Tukey HSD)………. 252
Table 7.6 Comparison of the SPM Mean scores according to age groups………….. 253
Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 253
Table 7.8 Comparison of the gender mean scores of SPM test……………………... 255
Table 7.9 Comparison of the development status mean scores of SPM test
according to age…………………………………………………………... 256

Table 7.10 Levene's Test of Equality of Error Variances of SPM scores……………. 256
Table 7.11 Tests of Between-Subjects Effects of SPM scores……………………….. 256
Table 7.12 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 257
Table 7.13 Magnitude of the development status of countries (developed and
developing countries) in mean scores and variability on SPM as
functions of age and total sample………………………………………… 258
Table 7.14 Comparison of the development status mean scores of SPM test
according to gender……………………………………………………….. 260
Table 7.15 Levene's Test of Equality of Error Variances of SPM scores……………. 260
Table 7.16 Tests of Between-Subjects Effects of SPM scores……………………….. 261
Table 7.17 Comparison of the age groups mean scores of SPM test according to
gender…………………………………………………………………….. 262
Table 7.18 Levene's Test of Equality of Error Variances of SPM scores……………. 262
Table 7.19 Tests of Between-Subjects Effects of SPM scores……………………….. 262
Table 7.20 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)……. 263
Table 7.21 Magnitude of gender differences in mean scores and variability on SPM
as a function of age and development status……………………………... 264
Table 7.22 Stepwise Regression for Independent Variable and the SPM Score
Means……………………………………………………………………... 266
Table 8.1 Mean IQs and average for some developed and developing countries…... 283

Figures

Page
Figure 4.1 Typical items from the SPM Test. A5 presents an easy item whereas E1
presents a difficult item …………………………………………….. 92
Figure 5.1 Summary of the sampling method and theory………………………….. 171
Figure 5.2 Sampling process……………………………………………………….. 177
Figure 6.1 Histogram showing normal distribution for means scores……………... 188
Figure 6.2 Normal Q-Q plot……………………………………………………….. 189
Figure 6.3 Detrended normal Q-Q plot…………………………………………….. 189
Figure 6.4 Box plot of scores distribution…………………………………………. 189
Figure 6.5 Scree Plot for the five Factors………………………………………… 197
Figure 6.6 Scree Plot for the five Factors………………………………………… 198
Figure 6.7 Scree Plot for the five Factors………………………………………… 199
Figure 6.8 Mean score differences of age and region…………………………….. 220
Figure 6.9 Mean score difference of age and gender……………………………... 226
Figure 7.1 The distribution for means scores……………………………………… 249
Figure 7.2 Box plot of scores distribution…………………………………………. 249
Figure 7.3 Normal Q-Q plot……………………………………………………….. 250
Figure 7.4 Detrended normal Q-Q plot…………………………………………….. 250
Figure 7.5 Means score differences of age group and gender……………………... 263
Figure 8.1 Urbanisation development in Libya 1954-1995………………………... 297

Chapter One: INTRODUCTION

Humans differ from one another in their ability to understand complex ideas, adapt

effectively to the surrounding environment, learn from experience, engage in various forms

of reasoning and overcome obstacles through thinking. Although individuals’ differences can

be substantial, they are never entirely consistent over time. A given person's intellectual

performance will vary on different occasions, in different domains and as judged by different

criteria. The concept of "intelligence" is an attempt to represent and organize this complex set

of phenomena. Such conceptualization has achieved great success in clarifying some areas.

Nonetheless it has not yet answered all the important questions nor has it established

universal assent. Indeed, when two of the prominent theorists, in the field, were asked to

define intelligence, they gave two somewhat distinct definitions (Sternberg & Detterman,

1986). Such a disagreement is not a cause of dismay. Scientific research rarely begins with

fully agreed definitions, though it may eventually lead to them.

Intelligence tests play a vital role at all stages and in every aspect of a person's life. From

pre-school days through to postgraduate years, tests are administered for grouping, course
selection, and placement in special classes or special institutions, as well as for career
orientation, college entrance and admission to professions. A person's
Intelligence Quotient (IQ) score largely determines the type of education he/she receives and,

ultimately, the type of position he/she might occupy within society. Therefore the concept of

intelligence is central to an individual's life (Samuda, 1975).

Though Libya has witnessed a huge development in education within the last decades, some

areas still lack the benefits of such advancements. To date, no single test of intellectual ability

has been officially adopted to be used for the measurement of intelligence. Schools and

universities alike use examination grades as the primary and only method in determining who

should be accepted for study at various academic establishments. Similar procedures take

place in the vocational sector. These grades might be considered a good criterion for
such purposes. Additional criteria, however, are essential for reliable and valid judgements.
One of these is the application of mental tests, particularly intelligence tests, in decision-making

processes. The lack of intelligence tests in Libya in the selection of students for different

educational programs has caused many problems. Failure to allocate students according to
their abilities and interests has deprived Libya of one of its most valuable resources. This has also
had an adverse effect on business and commerce: employees scoring well in tests might not
necessarily possess the attributes to perform the job effectively.

The health service system is another affected sector. Mental tests currently employed in

Libya are either misused or used in an incomplete form. The use of incomplete tests has

serious negative implications for educational and clinical decisions. The chief drawback is

the bias of the test predictions. In the clinical case, the use of incomplete test scores for the

estimation of mental ability might result in invalid assessment. This will have grave
consequences for individuals' lives. Intelligence tests are useful tools for accomplishing the
desired goals and avoiding unwanted side-effects. Their effectiveness will depend on the skills

and knowledge of the psychologist.

Nowadays a relevant and accurate selection procedure is required in Libya more than ever
before, not only in the fields of education, health and vocational training but across the whole
agenda of the government. Indeed, a clear failing of the current system can be seen, for example,
in the job market, where many university graduates have been posted to office work that could be
done by less qualified people (Attashani and Abdalla, 2005).

In response to the current gaps, this book aims to introduce to Libya one of the best-known
intelligence tests in the world. This is the classic form of the Standard Progressive

Matrices (SPM) test. Moreover, the current study attempts to develop norms for the SPM test

and identify the distribution of IQ scores of a Libyan sample. The study objectives include:

1. Determine psychometric characteristics (reliability, validity, difficulty and

discrimination) of the SPM test when applied to a Libyan sample.

2. Study the relationship between SPM mean scores and student’s academic

achievement (SAA) for a Libyan sample aged 8 – 21 years.

3. Investigate the presence of significant differences in sample performances on the

SPM test according to gender, region (cities and villages), academic discipline

(science and arts), geographical areas (main city, secondary city, coastal, mountain

and desert), age and study levels.

4. Investigate the presence of significant differences in sample performance on the SPM

test according to region and gender, age and region, region and study levels,

geographic areas and gender, academic discipline and gender, age and gender and age

and academic discipline.

5. Investigate the variability of SPM mean scores by gender based on age, on geographic
area, and on academic discipline.

6. Examine the contribution of the independent variables gender, age, region and
academic achievement in predicting SPM scores.

7. Compute the percentile ranks for the SPM scores according to the sample age levels.

8. Compare performance on the SPM test for a Libyan sample with that of other

countries (developed and developing countries).

The book begins, in chapter two, with a historical review of literature. First, the definition of

the concept of intelligence, its evolution and means of testing are presented. A brief look at

some of the important theories of intelligence developed over the past century is then

highlighted. After that, the definitions, classification, and uses of some mental tests including

culture fair tests, achievement tests, intelligence and academic achievement are discussed in

depth. The evolution of the Intelligence Quotient (IQ) with time in different countries will

also be studied.

Chapter three introduces the statement of problem and the study rationale. It provides a short

description of the education system and intelligence testing in Libya. It also includes the

research questions, study aims and objectives.

After setting the atmosphere of the research, the focus is then shifted, in chapter four, to the

general information regarding the Progressive Matrices tests. A description of the SPM test

and its standardization are presented. After that the reliability, validity and item analysis of

the SPM test are rigorously investigated. Towards the end, a brief review of previous studies

which have employed the SPM test will be given.

Chapter five is concerned with methodology issues such as research design, ethical approval,
the pilot study, and sample and data collection. It also covers the statistical methods to be used
and the modification and administration of the SPM test. The tests are performed in Libya on a
sample of students.

Once the test is performed and data are available, the results are examined and analysed
in chapter six. The initial step in the data-analysis pipeline is the standardisation of the

SPM test. The primary reason is to determine whether the SPM test can be effectively used in

Libya. The next step is the analysis of the rest of the study objectives such as the relationship

of the SPM test scores and Students Academic Achievements (SAA). The outcomes of this

chapter are compared to those found in other studies in chapter seven (meta-analysis). These

studies are sampled from both developed and developing countries. Also covered in this
chapter are a literature review of meta-analysis applications of the SPM test, the methodology,
the data analysis tools and finally the meta-analysis results.

The final part of the book, chapter eight, brings together the key research findings and
discusses them in the context of the wider existing literature. Intelligence testing and IQ
distribution in Libya are discussed and evaluated in the context of the available facilities. The
methods of data collection, the SPM test and the meta-analysis, are highlighted. The major
conclusions of the whole book and its contribution to the field of intelligence testing in Libya
are outlined. Moreover, strengths and weaknesses of the study are presented. Finally,
recommendations for practice and future research that emerge naturally from the study findings
are suggested.

Chapter two: INTELLIGENCE LITERATURE REVIEW

2.1 Introduction

Intelligence is a difficult construct to define. In a survey carried out by Snyderman
and Rothman (cited in Li, 1996), social scientists and educators were questioned on the
nature of intelligence. 99.3% indicated that abstract thinking or reasoning was an
important element of intelligence, 97.7% indicated that problem-solving ability was
important, and 96% indicated that capacity to acquire knowledge was important. This

survey emphasized the importance of thinking, learning and problem solving as

elements of intelligence (Marais, 2007). In another study nearly 500 laypeople and 24

experts were asked to define intelligence; Sternberg (2000) found that their responses

were surprisingly similar. Both groups viewed intelligence as a complex construct

made up of verbal ability, practical problem solving and social competence.

Intelligence is an important component of learning and academic achievement

because it can be seen as the ability to gain knowledge, to think about abstract

concepts, to reason as well as the ability to solve problems (Li, 1996).

An important consideration which has been in existence since Alfred Binet

constructed the first intelligence test, in 1905, is that although intelligence is relatively

stable, it should not be seen as a fixed characteristic.

The purpose of this chapter is a historical review of the literature. First, the definition

of the concept of intelligence, its evolution and means of testing are presented. A brief

look at some of the important theories of intelligence that have been developed over

the past century is then highlighted. After that, the definitions, classification, and uses

of some mental tests including culture fair tests, achievement tests, intelligence and

academic achievement are discussed in depth. Finally the evolution of the Intelligence

Quotient (IQ) with time in different countries will be studied.

2.2 Definitions of Intelligence

Intelligence, a word in common use today, was almost unknown in popular speech a

century ago. After intelligence tests had been invented to measure intelligence,

scientists felt the urge to define it. They reintroduced the ancient Latin term

"intelligence" to refer to individual differences in mental ability (Aiken, 1988).

Sternberg (1990) mentions that today, as in the past, there seem to be as many

definitions of intelligence as there are investigators of it. Wechsler (1975) also stated

that intelligence has been viewed by educators as the ability to learn, by biologists as

the ability to adapt to environment, and by psychologists as the ability to understand

relationships.

The history of the differences between psychologists regarding definitions of

intelligence is reflected in two symposia to define intelligence; the first was in 1921,

while the second was in 1986.

2.2.1 The 1921 Symposium

In 1921, the Editors of the Journal of Educational Psychology invited psychologists to

take part in a symposium (Intelligence and its Measurement). The contributors were

asked to define “intelligence”. Following are some of their definitions:

Intelligence is equivalent to the capacity to learn or it is the ability to learn to adjust

oneself to environment (Colvin) P.136.

Intelligence is the capacity to learn or profit by experience (Dearborn) P.210.

Intelligence is sensory capacity, capacity for perceptual recognition, quickness, range

or flexibility of association; facility in imagination, span of attention, quickness or

alertness in response (Freeman) P.133.

Intelligence is a group of complex mental processes such as sensation, perception,

association, memory, imagination, discrimination, judgement, and reasoning

(Haggerty) P.212.

Intelligence involves two factors; the capacity for knowledge and knowledge

possessed (Henmon) P.195.

Intelligence seems to be a biological mechanism by which the effects of a complexity

of stimuli are brought together and given a somewhat unified effect in behaviour

(Peterson) P.198.

Intelligence is the ability of the individual to adapt himself adequately to relatively

new situations in life (Pintner) P.139.

Intelligence is the ability to carry on abstract thinking (Terman) P.128.

Intelligence is the power of good responses from the point of view of truth or fact

(Thorndike) P.124

Intelligence is the capacity to acquire capacity (Woodrow) P.207.

Intelligence contains at least three psychologically differentiable components: a) the

capacity to inhibit an instinctive adjustment, b) the capacity to redefine the inhibited

instinctive adjustment in the light of imaginably experienced trial and error, and c) the

capacity to realise the modified instinctive adjustment in overt behaviour to the

advantage of the individual as a social animal (Thurstone) P.201-202.

The most famous definition of intelligence, which explains the absence of agreement
among psychologists, was made by Boring in 1923, who claimed that intelligence is
what intelligence tests test. Spearman (1927) pointed out that intelligence had become
a word with so many meanings that finally it had none.

These psychologists gave different views about the nature of intelligence, although

there was much in common in their definitions (Sattler, 1982). In 1975 Samuda,
commenting on the ambiguity and lack of agreement found among psychologists in the 1921
Symposium, stated:

If the experiment was to be replicated today, the same ambiguity that


existed some 50 years ago would still be apparent, for one need only
look at the more common definition of intelligence in order to realise
that psychologists still have not characterised explicitly and
universally what it means. P.26

2.2.2 The 1986 Symposium

Sixty-five years after the 1921 Symposium to define intelligence, Sternberg and

Detterman (1986) noticed that the effort to define intelligence had not been repeated.

In 1986 they asked experts in the field of intelligence to respond to the very same

question that was posed in the 1921 Symposium, to see what theorists of intelligence

today believed intelligence to be. The following are some of the 1986 Symposium

definitions:

Intelligence is quality of behaviour that is adaptive, representing effective ways of

meeting the demands of environment as they change (Anastasi) P.19.

Intelligence is a construct such as innate intellectual capacity, intellectual reserve

capacity, learning capacity, intellectual abilities, intelligence systems, problem-

solving ability, and knowledge system (Baltes) P.24.

Intelligence is a set of whatever abilities make people successful at achieving their

rationally chosen goals (Baron) P.29.

Intelligence is adaptive for a given cultural group in permitting members of the group,
as well as the group as a whole, to operate effectively in a given ecological context (Berry) P.35.

Intelligence is the sum total of all cognitive processes, including planning, coding of

information and arousal of attention (Das) P.55.

Intelligence is a finite set of independent abilities operating as a complex system

(Detterman) P.57.

Intelligence consists of three capacities: (a) the capacity to manipulate symbols, (b)

the capacity to evaluate the consequences of alternative choices, and (c) the capacity

to search through sequences of symbols (Estes) P.65

Intelligence is proficiency (or competence) in intellectual cognitive performance

(Glaser) P.79.

Intelligence is the repertoire of intellectual knowledge and skills available to the

person at a particular point in time (Humphreys) P.98.

Intelligence is a general factor obtained from factoring an intercorrelation matrix of a

large number of diverse mental tests (Jensen) P.110.

Intelligence is implicitly determined by the interaction of organisms’ (individuals')

cognitive machinery and their social-cultural environment (Pellegrino) P.113.

Intelligence provides a means to govern ourselves so that our thought and action are

organised, coherent, and responsive both to our internally driven needs and to the

needs of the environment (Sternberg) P.141.

Intelligence is a hypothetical construct referring to an individual's cognitive processes

(Zigler) P.149.

After the two symposia to define intelligence, no single definition of intelligence was

agreed upon by psychologists. Viewed broadly, however, two themes seemed to run

through at least several of the definitions in the complete set: the capacity to learn

from experience and the capacity to adapt to one’s environment.

Again Sternberg (1990) found that some general agreement exists across the two

symposia regarding the nature of intelligence. He stated that attributes such as

adaptation to the environment, basic mental processes, and higher order thinking like

reasoning, problem solving and decision making were prominent in both symposia.

Charles Spearman defined intelligence as the ability to recognise relations and related

items (Abdel-Khalek, 2000) which is what John Raven’s test measures.

Lynn and Vanhanen in (2006) reported that a useful definition of intelligence was

proposed by Neisser in 1996; intelligence is the ability "to understand complex ideas,

adapt effectively to the environment, learn from experience, engage in various forms

of reasoning, and to overcome obstacles by taking thought" (Neisser, 1996, p. 1).

Also a similar definition by Gottfredson was published in the Wall Street Journal in

1994 as "Intelligence is a very general mental capacity which, among other things,

involves the ability to reason, plan, solve problems, think abstractly, comprehend

complex ideas, learn quickly and learn from experience. It is not merely book

learning, a narrow academic skill, or test taking smarts. Rather, it reflects a broader

and deeper capability for comprehending our surroundings - 'catching on', 'making

sense' of things, or 'figuring out' what to do" (Gottfredson, 1997a, p. 13).

More recently Schmidt and Hunter (2004, p. 162) have taken stock of the results of a

century’s research on intelligence: “the accumulated evidence has become very strong

that general intelligence is correlated with a wide variety of life outcomes, ranging

from risky health-related behavior to criminal offenses, to the ability to use a bus or a

subway system”. Among the numerous tasks that intelligent people do more

effectively than less intelligent people are to acquire complex skills and work more

proficiently ( Lynn and Vanhanen, 2006).

In general two themes seem to run through at least several of the definitions in the

complete set: the capacity to learn from experience and the capacity to adapt to one’s

environment.

2.3 Evolution of the Concept of Intelligence and Intelligence Testing

Differences in intelligence have been evident since the beginnings of

civilization. In 400 BC, the Greeks used the term “nous” to express intelligence.
Plato in his “Republic” claimed that “nous” is mostly inherited and that offspring
can be bred for it by breeding selectively from parents who had the most “nous”

(Lynn and Vanhanen, 2006).

In addition, in 500 BC in China the Sui dynasty used tests of ability for the

administrative class of mandarins. These tests, in Chinese history, literature,

mathematics and astronomy, were still employed until the 20th century (Lynn

and Vanhanen, 2006).

In his book “Examen de Ingenios”, Huarte (1575) investigated the nature of

intelligence. In 1594 the book was translated into English in which the term

“wits” was used to express intelligence. The book evaluates the various types

of intelligence needed to succeed in medicine, law, the army, administration

and church ( Lynn and Vanhanen, 2006).

In 1651 Thomas Hobbes wrote in Leviathan:

“Virtue generally, in all sorts of subjects, is somewhat that is valued


for eminence, and consisteth in comparison. For if all things were
equal in all men, nothing would be prized. And by “virtues
intellectual” are always understood such abilities of the mind as men
praise, value and desire should be in themselves; and go commonly
under the name of a “good wit” (pp.38-39).

Hobbes proposed the concept of “natural wit” which:

“Is gotten by use only and experience; without method, culture or


instruction” (p.39)

And he was distinguishing here between intelligence and educational

attainment. He proposed further that:

“This natural wit consisteth in two things, celerity of imagining, that


is swift succession of one thought to another, and steady direction to
some approved end. A slow imagination maketh that defect or fault
of mind that is commonly called dullness, stupidity, and sometimes
by other names that signify slowness of motion” (p.39).

During the nineteenth century the study of mental retardation witnessed a strong
awakening of interest in the humane treatment, training and education of the mentally

retarded. Anastasi (1988) stated that one of the first problems that stimulated the

development of psychological tests was the identification of the mentally retarded.

Marks (1981) and Rust and Golombak (1989) observed that the rapid scientific and

social progress in Europe during the nineteenth century led to the development of

several assessment techniques, most notably in medical diagnosis of the mentally ill.

Empirical support for the theoretical basis of intelligence as a unitary construct

essentially began with the development of factor analysis (Ittenbach, Esters, &

Wainer, 1997). The historical antecedents for factor analysis originated with the work

of Galton who developed many of the quantitative devices utilized in psychometry

(e.g., the bivariate scatter diagram, regression, correlation, and standardized

measurements) (Jensen, 1980). Galton (1869) further developed the concept of
intelligence in his publications. He claimed that intelligence is a mainly inherited
single entity and that intelligence determines the level of civilization. He studied the
number of geniuses compared to the size of their populations and reached the
conclusion that there are differences in average intelligence among races, the Greeks
being the most intelligent while the Australian Aborigines were the lowest (Galton, 1869).

Galton was the first researcher to utilize empirically objective devices to measure

individual differences in mental abilities (Jensen, 1980). He administered different

measures of mental functioning to thousands of individuals as he refined his methods

of assessing mental ability. Galton analyzed the scores and applied statistical

reasoning to the study of those with high ability. He was the first to identify "general

mental ability" in humans (Jensen, 1980).

One of Galton's followers, Spearman, was the first to assert that all individual

variance in higher order mental abilities is positively correlated. The

aforementioned contention supported Galton's belief in a general factor of mental

ability (Jensen, 1980). Spearman introduced factor analysis, in part, to ascertain the

degree to which a test measures a general factor (Jensen, 1980). Spearman used

factor analysis to determine whether the shared variance in a matrix of correlation

coefficients resulted in a single general factor or in several independent more

specific factors (Gould, 1996). Spearman believed each test of mental abilities had a

single general factor, g, as well as specific factors (s) unique to the test. These

beliefs led to the development of the two-factor theory of intelligence. Spearman

and many scholars (Carroll, 1993; Herrnstein & Murray, 1994; Jensen, 1980;

Rushton, 1997) continued to believe scores on intelligence tests are reflected best

by g. These theorists consider g to be the most parsimonious method to describe

one's intelligence and thus to use when examining mean IQ differences between

races (Neisser, 1998).

Factor analysis soon became one of the most important techniques in modern

multivariate statistics (Gould, 1996; Kamphaus, Petosky, & Morgan, 1997). It is a

statistical technique that allows one to analyze the sources of variance of a particular
measure by examining the pattern of its correlations with other measures. The technique is
useful for reducing a complex set of correlations into fewer dimensions by factoring a matrix
of correlation coefficients (Gould, 1981). The variables that are most highly correlated are
combined to form the first principal component by placing an axis through all the points;
other axes, drawn to account for the remaining variance, are labeled as the second, third, and
subsequent factors (Edwards, 2003). Relative to intelligence testing, factor analysis has been
applied to show positive correlations among different mental tests (Gould, 1996). Because most
correlation coefficients among mental tests are positive, factor analysis yields a reasonably
strong first principal component (Gould, 1996).
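
To make the idea concrete, the short sketch below (illustrative only, not taken from the book) factors an invented correlation matrix for four hypothetical mental tests using plain NumPy in Python. With all correlations positive, a single strong first principal component emerges, in the way described above.

```python
# A minimal, illustrative sketch (not from the book): extracting a "g-like"
# first principal component from a correlation matrix of four hypothetical
# mental tests. The correlation values are invented for illustration.
import numpy as np

# Hypothetical correlation matrix: all correlations positive, as is typical
# for batteries of cognitive tests (the "positive manifold").
R = np.array([
    [1.00, 0.60, 0.55, 0.50],
    [0.60, 1.00, 0.45, 0.40],
    [0.55, 0.45, 1.00, 0.35],
    [0.50, 0.40, 0.35, 1.00],
])

# Eigendecomposition of the symmetric correlation matrix; eigh returns the
# eigenvalues in ascending order, so the last one is the largest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
first_eigenvalue = eigenvalues[-1]
first_component = eigenvectors[:, -1]

# The sign of an eigenvector is arbitrary; flip it so the loadings are positive.
if first_component.sum() < 0:
    first_component = -first_component

# Loadings of each test on the first principal component.
loadings = first_component * np.sqrt(first_eigenvalue)

# Proportion of the total variance accounted for by the first component.
print("Variance explained:", round(first_eigenvalue / R.shape[0], 2))
print("Loadings:", np.round(loadings, 2))
```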

General factor theorists such as Spearman used factor analytic techniques to

demonstrate the viability of g as the first factor to emerge when analyzing factor

scores for intelligence tests. Other theorists used factor analysis to suggest that IQs

depend on a number of independent factors, not a large general factor (Gardner,

1983; Spearman, 1923).

Although researchers may disagree about the structure of intelligence, they agree that

IQs arise as a function, at least to some degree, from a general factor as well as

reflecting multidimensional aspects of intellectual functioning (Carroll, 1993; Sattler,

1998; Urbach, 1974). To reiterate, g is important because it is considered the best way

to express one's general mental ability.

The history of mental measurement development during the nineteenth and early
twentieth century can be traced through the contributions of scientists such as
Seguin, Esquirol, Galton, Cattell, Binet, and Spearman. Detailed descriptions of these
contributions are voluminous and, moreover, beyond the scope of this study, so we will
confine ourselves to providing a brief summary of each one.

2.3.1 Contribution of Edward Seguin (1812-1880)

The French physician Seguin started his career as an assistant to Jean Itard, who was

working with a wild boy found by hunters in the forest of Aveyron. In 1837 Seguin

established the first school for training and education of mentally retarded children. In

1844 he emigrated to America where his ideas gained wide recognition. Guilford

(1967) mentioned that Seguin was pioneering in the training of mentally retarded

individuals by exercising their sensory and motor function. In 1866 he developed the

first non-verbal test, the Seguin Form Board, in which the individual is required to put

variously shaped blocks back in their closely fitting spaces as quickly as possible.

Corsini (1984) mentioned that Seguin's test was the first to be used as a measure
of intellectual functioning. Domino & Domino (2006) reported that Edward Seguin
developed many procedures to enhance muscular control and sensory abilities for the

mentally deficient. Some of these procedures were later incorporated into tests of

intelligence.

2.3.2 Contribution of Jean Etienne Esquirol (1772-1840)

In 1838 Esquirol, another French physician, was the first person to make a clear
distinction between mental retardation and mental illness. He pointed out that the
mentally retarded may never have developed their intellectual capacity, whereas mentally
ill people had lost abilities they once possessed. In developing a method for
differentiating mental retardation from mental illness, he also pointed out that the
individual's use of language, and therefore language tests, provided the most
dependable criterion of his or her intellectual level (Anastasi & Urbina, 1997;
Domino & Domino, 2006).

2.3.3 Contribution of Sir Francis Galton (1822-1911)

Some commentators have suggested that testing movement began with the English

biologist Galton who was interested in human heredity. Anastasi (1988) believed that

Galton was primarily responsible for launching mental measurement. Richardson

(1991) also believed that the first person to seriously attempt to measure intelligence

was Galton. Galton realised the need for measuring characteristics of related and

unrelated individuals to discover the degree of resemblance between parents and

offspring. Galton was the first scientist who undertook statistical measurement of

individual differences.

For seven years, from 1884 to 1890, Galton ran an anthropometric laboratory at the
South Kensington Museum in London, where, for a small fee, visitors could have

themselves measured on a variety of physical traits like vision, hearing, muscular

strength, reaction time, and other simple sensorimotor functions (Anastasi & Urbina,

1997; Snyderman & Rothman, 1988; Virgolim, 2005; Domino & Domino, 2006).

Herrnstein & Murray (1994) stated that Galton had the idea that intelligence would

surface in the form of sensitivity of perception, so he constructed tests that relied on

measures of sight, hearing, sensitivity to light, skin pressure, and speed of reaction to

simple stimuli. He therefore concluded that the more perceptive the senses, the larger

the range of information would be on which intelligence could act. Jensen, as reported
by Corsini (1984), points out that Galton's contributions to statistics and psychometrics

included percentile ranks, the use of central tendency and rating scales.

2.3.4 Contribution of James McKeen Cattell (1860-1944)

The American-born psychologist James Cattell went to Germany and studied with

Wilhelm Wundt at Leipzig where the first psychological laboratory was founded in

1879. The first psychologists at Leipzig studied the same processes that physiologists

did, namely seeing, hearing and speed of response (Attashani and Abdalla, 2005).

Anastasi (1988) claimed that the principal focus of early experimental psychology in

Leipzig was on formulating generalised descriptions of human behaviour. Thus

individual differences were either ignored or accepted as a form of error or as a

necessary evil that limited the applicability of generalisation.

For his doctorate Cattell completed a dissertation on individual differences in reaction

time. He lectured at Cambridge University, where he met Galton, who shared Cattell's
interests. He was also active in the spread of the testing movement in the USA
(Anastasi & Urbina, 1997; Sternberg, 2000).

Cattell proposed a series of 50 psychophysical tests, most of them of a sensory
and motor nature and differing little from those designed by Galton. In an article

published in 1890 in Mind, entitled "Mental Tests and Measurements", Cattell was the

first to use the term "mental test" in psychological literature (Freeman, 1962; Eysenck

et al., 1972; Sattler, 1982; Fancher, 1985; Anastasi, 1988; Sternberg, 1990).

Freeman (1962) and Jensen (1981) both concluded that the Galton-Cattell approaches

to measurement of mental ability, whilst not of major significance in the field of

testing, did nonetheless strongly affect the course taken by test experimenters until

about 1900 when the influence of Alfred Binet was first felt.

2.3.5 Contribution of Alfred Binet (1857-1911)

The history of mental testing is widely considered to have begun with the work of

Binet. Binet, Simon, and Henri, spent many years in research on ways of measuring

intelligence. Anastasi (1988) stated that in 1895 Binet and Henri published an article

in which they criticised most available tests (Galton type tests) as being too sensory

and concentrating on simple specialised abilities. Their research suggested that the

key to the measurement of intelligence lay in focusing on higher mental processes

instead of measuring simple sensory functions as in Galton and Cattell tests.

Binet assumed that intelligence was not much involved in sensory-motor tasks but in

tasks calling for more complex mental processes, especially judgement (Jensen 1980).

Binet and Simon believed that essential activities of intelligence were to judge well, to

comprehend well and to reason well. Binet found that children who were best in

judgement tended to be superior in attention, and vocabulary (Sternberg 1990).

In 1904 the Ministry of Public Instruction in France appointed a committee to study

the procedures for the education of mentally retarded children. A member of this

commission was Binet. In 1905 Binet, in collaboration with Simon, prepared the first

Binet-Simon Scale. The scale consisted of 30 items, designed for children aged 3 to
12 years and arranged in order of difficulty. Improved versions came out in 1908 and 1911,
in which unsatisfactory items were eliminated, items were added and grouped into age
levels, and the test was extended to adult level (Roid and Barram, 2004).

Binet’s test emphasised judgement, comprehension, and reasoning which Binet

regarded as essential components of intelligence. A child's score on the test was

reported in terms of mental age (MA). A mental age below the child's chronological

age (CA) indicated some degree of mental retardation; a higher MA than CA

indicated some degree of accelerated intellectual development. In 1912 a German

psychologist, Wilhelm Stern, proposed the use of the ratio of mental age to
chronological age to yield the "intelligence quotient" (IQ). Mental age was the level
of ability of the average child of a certain age; for example, a mental age of 12 is defined
by the mental tests that the average 12-year-old would pass. IQ was Mental Age divided by
Chronological Age multiplied by 100, so a child aged 10 years who functions as a child of
5 years would have an IQ of 5/10 × 100 = 50. Nowadays, IQ is calculated by transforming the
test scores to a metric with a mean set at 100 and a standard deviation of 15. On this scale
about 96% of the population have IQs between 70 and 130; about 2% of the population score
below 70 and are considered mentally retarded, while about 2% score above 130 and are considered
gifted.
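
To make the arithmetic concrete, the minimal sketch below (in Python; the raw-score figures are invented for illustration and are not taken from the book's data) computes both the classical ratio IQ and a modern deviation IQ.

```python
# A minimal, illustrative sketch (not from the book) of the two IQ metrics
# described above: Stern's ratio IQ and the modern deviation IQ.

def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Modern deviation IQ: a raw test score rescaled to mean 100 and SD 15."""
    return 100 + 15 * (raw_score - norm_mean) / norm_sd


# The example from the text: a 10-year-old who functions as a 5-year-old
# has a ratio IQ of 5/10 x 100 = 50.
print(ratio_iq(mental_age=5, chronological_age=10))                 # 50.0

# Hypothetical deviation IQ: a raw score of 42 in a norm group with mean 36
# and standard deviation 8 corresponds to an IQ of about 111.
print(round(deviation_iq(raw_score=42, norm_mean=36, norm_sd=8)))   # 111
```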

Many researchers believe that the testing movement began to flourish after the

introduction of the Binet-Simon Scale in 1905. For example Herrnstein and Murray

(1994) mentioned that Binet developed questions that attempted to measure

intelligence by measuring a person's ability to reason. They concluded that Binet’s

test met a key criterion that Galton's test could not. Sattler (1982) mentioned that the

Binet-Simon scale served the purpose of objectively diagnosing a degree of mental

retardation, and became the prototype of subsequent scales for mental ability

assessment.

Within a few years translations and adaptations of the Binet-Simon Scale appeared in

many countries. The most rapid development took place in the USA, where in 1916 Lewis M. Terman developed the Stanford revision of the Binet-Simon Scale (SB1), now familiar as the Stanford-Binet Intelligence Scale. Terman added more items and made other improvements to the test. The test was revised in 1937 (SB2, the L and M forms), in 1960-1973 (SB3) and again in 1986, when Thorndike, Hagen and Sattler developed the SB4 based on a four-factor hierarchical model with general ability "g" as the overarching summary score. More recently, Roid (2003) constructed the SB5 on a five-factor hierarchical cognitive model (Roid & Barram, 2004).

The Stanford-Binet Intelligence Scale very quickly became the "standard" IQ test on both sides of the Atlantic. For more than half a century the Stanford-Binet test has been one of the most widely used individual tests of intelligence and has often served as a

standard for the construction of other tests (Jensen, 1980; Richardson, 1991).

2.3.6 The First World War and the Development of Group Tests

In spite of the success of the Stanford-Binet test, there was one problem in that it was

an individual test administered to one subject by one examiner. As the USA entered

the First World War, the need arose for rapid testing of large numbers of subjects in

a short time (Anastasi & Urbina 1997; Kaufman & Kaufman 2004).

In 1917 Robert Yerkes, the president of the American Psychological Association

claimed that psychology had achieved a position which would enable it to

substantially help to win the war and shorten the necessary period of conflict. He

formed a committee of American intelligence testers to develop a test to classify all

recruits in order that they would be properly placed in the military service and to

screen all army recruits for mental defectiveness (Anastasi & Urbina 1997; Kaufman

& Kaufman 2004).

A major contribution to group tests during the First World War was made by Arthur S. Otis, whose group intelligence test "The Scale for the Group Measurement of Intelligence" was used by the committee, becoming the basis of the Army Alpha Test

(Anastasi & Urbina 1997; Kaufman & Kaufman 2004).

The committee quickly developed two tests: the Army Alpha for literate recruits, and the Army Beta for illiterate recruits and non-English speakers who were unable to take the test in English. The

Alpha tests included arithmetic problems, general information, and number

sequences. The Beta test included mazes, finding the missing element in pictures and

coding. By the end of the war in 1918 about 1,750,000 men had been given the Army

Alpha or Beta tests (Freeman, 1962; Guilford, 1967; Noll and Scannell, 1979; Ebel,

1972; Marks, 1981; Fancher, 1985; Sokal, 1987; Anastasi, 1988).

Shortly after the First World War the tests were released for civilian use and served as

models for most group intelligence tests. Concurrently, their development gave rise to

a number of controversial questions. Amongst these were the relative influence of

heredity and environment, and the explanation of racial differences in measured performance (Anastasi & Urbina, 1997; Kaufman & Kaufman, 2004).

In a summary of the misuse of scores in the United States after the development of

intelligence tests, Kamin (1981) mentioned as examples: sterilisation laws,

immigration quotas, and early racism. Tyler and Walsh (1979) stated that after the

development of intelligence group tests, attempts to measure personality

characteristics as well as ability became more and more common.

2.3.7 Contribution of Charles Spearman (1863-1945)

Spearman's work focused on determining whether intelligence was a single ability

factor or a combination of various factors. The measurement of Spearman’s "general"

factor in his two-factor theory was the object of the Standard Progressive Matrices

(SPM) test. Kline (1979) believed that the first contribution from psychometrics to

psychological insight into the nature and structure of human abilities emerged from

the work of Spearman. Eysenck et al. (1972) were also of the view that Spearman’s

two-factor theory of intelligence, together with the Binet-Simon Scale, represented the

starting point for the development of the theory and measurement of intelligence in

the twentieth century.

Spearman's two-factor theory was based on analysis of empirical data from test

scores. Spearman's first investigation was with children in a village school (N=24), whose "intelligence" he estimated in three ways: the teacher's ranking of the children's "cleverness in school"; the ranking, by the two oldest children, of the members of their class for "sharpness and common sense out of school"; and Spearman's own ranking of the children's performance on three sensory tasks involving pitch, light and weight discrimination.

Spearman found a correlation of 0.55 among the three intellectual variables, a correlation of 0.25 among the three sensory measures, and a correlation of 0.38 between the intellectual and sensory measures (Fancher, 1985).

His second investigation was with boys from an upper class preparatory school

(N=22). This time he took examination grades in Classics, French, English and Maths

as measures of "intelligence" and correlated them with a pitch discrimination task and

with the music teacher's ranking of the boys on musical proficiency. Spearman found music and pitch correlated with the four intelligence scores at an average of 0.56,

while music and pitch correlated with each other at 0.40 and the correlation between

the four examination grades was on average 0.71 (Fancher, 1985; Richardson, 1991).

In 1904 Spearman published his conclusions in his famous article "General

Intelligence, Objectively Determined and Measured", in which he stated:

On the whole, then, we reach the profoundly important conclusion


that there really exists something that we may provisionally term
"General Sensory Discrimination" and similarly a "General
Intelligence" and further that the functional correspondence between
these two is not appreciably less than absolute (p. 272).

Spearman also discovered that the correlations between the six variables (Classics, French, English, Maths, Pitch and Music) were not only all positive, but also arranged themselves in a nearly perfect hierarchy. This was one of the observations that led to the formulation of the "g" theory, which will be presented in the next section.
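Spearman's observation of a positive manifold can be illustrated with a small simulation. The sketch below is illustrative only (the subject scores are synthetic and a single-factor model is assumed; it is not a re-analysis of Spearman's original data): it generates six test scores that all share a common ability factor, then shows that every pairwise correlation is positive and that the first principal component of the correlation matrix loads positively on all six measures, in the spirit of a general factor.

import numpy as np

rng = np.random.default_rng(0)
n_subjects = 500
tests = ["Classics", "French", "English", "Maths", "Pitch", "Music"]

# Synthetic data: each test score = shared general ability + test-specific noise.
g = rng.normal(size=n_subjects)                        # common ("general") factor
loadings = np.array([0.9, 0.8, 0.8, 0.7, 0.5, 0.4])    # assumed g-loadings per test
scores = np.column_stack(
    [lam * g + np.sqrt(1 - lam ** 2) * rng.normal(size=n_subjects) for lam in loadings]
)

r = np.corrcoef(scores, rowvar=False)                  # 6 x 6 correlation matrix
print("all correlations positive:", bool((r[np.triu_indices(6, k=1)] > 0).all()))

# First principal component of the correlation matrix: loadings resemble "g".
eigvals, eigvecs = np.linalg.eigh(r)
first = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())
for name, loading in zip(tests, first):
    print(f"{name:8s} {loading: .2f}")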

Spearman further identified two components of the "g" factor: (a) eductive ability, that

is, the mental activity making meaning out of confusion, developing new insight,

going beyond the given to perceive that which is not immediately obvious, and

generating high level schemata, which make it easy to handle complex events.

Eductive ability is largely non-verbal. (b) Reproductive ability, that is, the ability to master, recall, and reproduce acquired information. Reproductive ability is largely verbal (Raven, 1989). According to

Herrnstein (1973), to be clever, for Spearman, meant having lots of "g".

Brody (1992) identified at least five important contributions of Spearman's theory to

our understanding of individual differences in intelligence. First, he provided an

explicit theoretical rationale for the construction of a test of intelligence, and

emphasized that intelligence tests should contain subscales or measures that have high

g-to-s ratios. Second, his methods for analyzing correlation matrices were the

foundation of factor analysis. It can be said that his method was the precursor of the

use of construct validation procedures to assess the validity of a measure. Third,

Spearman conceived intelligence as a construct and a hypothetical entity, which could

not be identified with any particular measure or subset of measures. Fourth, his theory

contained a strong empirical claim that all measures of intelligence were measures of

a single common theoretical entity, a supposition that is still in debate in

contemporary research. Finally, Spearman may have been correct when he assumed

the existence of a relationship between simple sensory discrimination tasks and

intelligence, as hypothesized in previous studies. However, he criticized the results of

Wissler's research, first because of the intellectual homogeneity of his sample

(Columbia University students), and second because of the lack of ideal measurement conditions in the experiment: three subjects were tested at once, responding to 22 tests in 45 minutes (Virgolim, 2005).

2.3.8 Contribution of Piaget (1896-1980)

One of the most important contributions to the study of intelligence emerged from

the work of Jean Piaget, who sought to explain intellectual development as a result

of changes in the cognitive function (Piaget, 1961). Piaget began his inquiry in a

non-scientific way, selecting only three subjects to study (his own children) without

a control group. However, he described the results of his observations in such a clear

and detailed manner, that his evidence permitted him to explain important principles

of growth and development (Virgolim, 2005). Many subsequent studies have

reported his principles as viable and useful (Clark, 1992; Wadsworth, 1993).

According to Piaget (1961), the cognitive processes emerged as a result of the

reorganization of psychological structures that resulted from the dynamic

interaction of a child with his/her environment. The interaction among the critical

variables of cognitive development (such as maturation, experience, social

interaction and equilibration) regulated the direction of the child's development

(Wadsworth, 1993). The Piagetian tests, unlike the traditional psychometric tests

used so far, aimed to assess not what we know (the product), but rather how we

know or think (the process), and how people obtain and use information to solve

problems and acquire knowledge (Weinberg, 1989).

Piaget was also one of the first theorists to establish an interactive theory of

intelligence. According to him, cognitive development depended equally on genetic contributions and on the quality of the environment in which the child lived. This position

has numerous followers and, as pointed out by Plomin (1989), the most recent

researchers support the notion that genetic influences on behavior are multifactorial,

equally comprising hereditary transmission and the environment. Although genetic

factors, in general, account for no more than half of the variance of behavioral traits,

they affect probabilistic propensities rather than predetermined programming (Plomin,

1997). However, as pointed out by Neisser and his collaborators (1996), the pathways

by which genes make their contributions to individual differences in intelligence were

largely unknown. Similarly, the exact way the environment contributes to those

differences still remains a mystery.

2.4 Theories of Intelligence

2.4.1 Spearman’s “g” Theory

An important advance in the theory of intelligence was made by Charles Spearman

(1904) in the early twentieth century. Spearman showed that all cognitive abilities are

positively inter-correlated, e.g. people who do well on some tasks tend to do well on

others. He invented the statistical method of factor analysis to show that the efficiency

of performance on all cognitive tasks was partly determined by a common factor. He

designated this common factor “g” for "general intelligence" and defined it as "the

eduction of relations and correlates" (Spearman, 1927). To explain the existence of

the common factor, Spearman proposed the presence of some general mental power

determining performance on all cognitive tasks and responsible for their positive

inter-correlation. Nevertheless, he also found that correlations between tests of

different abilities are not perfect (Lynn and Vanhanen 2006). To explain this he

proposed that in addition to “g”, there were a number of specific abilities that

determined performance on particular types of tasks; over and above the effect of “g”.

Spearman identified three major laws of cognitive activities associated with “g”.

The first was the Law of Apprehension, that is, the fact that a person
approaches the stimulation he receives from all external and internal sources
via the ascending nerves.... Next we have the eduction of Relations. Given two
stimuli, ideas, or impressions, we can immediately discover any relationship
existing between them-one is larger, simpler, stronger or whatever than the
other. And finally, we have the eduction of Correlates-given two stimuli,
joined by a given relation, and a third stimulus, we can produce a fourth
stimulus that bears the same relation to the third as the second bears to the
first.... If Spearman is right, then tests constructed on these principles, that is,
using apprehension, eduction of relations and eduction of correlates, should be
the best measures of g; that is, correlate best with all other tests. This has been
found to be so; the Matrices test... has been found to be just about the purest
measure of IQ. (Eysenck, 1998, p. 57).

By the end of the twentieth century Spearman’s basic theory had become virtually

universally accepted in the academic discipline of differential psychology. The

principal elaboration of the theory has been the development of what is called the

hierarchical model of intelligence. This consists of a hierarchical structure in which

there are numerous narrow specific abilities at the base, eight “second order or group

factors” consisting of verbal comprehension, reasoning, memory, spatial, perceptual,

mathematical, cultural knowledge and cognitive speed in the middle of the structure

and a single general factor, Spearman's "g", at the apex. This model was widely

accepted among contemporary experts such as the American Task Force chaired by

Ulrich Neisser (1996), Jensen (1998), Mackintosh (1998), Carroll (1994), Deary

(2000) and many others.

Matrices tests such as the Raven's Progressive Matrices employed Spearman's theory

and have been widely used as measures of intelligence (Eysenck, 1998). Matrices

tests contained substantial loadings of “g” and demanded conscious and complex

mental effort, often evident in analytical, abstract, and hypothesis-testing tasks

(Sattler, 1988). Conversely, tests that require less conscious and complex mental

effort are low in g (Sattler, 1988). Intelligence tests with lower g emphasize specific

factors such as recognition, recall, speed, visual-motor abilities, and motor abilities

(Sattler, 1988).

2.4.2 Thurstone's Primary Mental Abilities (1938)

Louis Thurstone (1938) disagreed with the idea that intelligence comprised a general

factor. Thurstone viewed intelligence as a multidimensional rather than a unitary trait.

Thurstone was intent on showing how intelligence could be separated into multiple factors, each of which had equivalent significance (Sattler, 1998). In his

1935 book, The Vectors of Mind, he hypothesized that intelligence consists of a small

number of independent factors, corresponding to different cognitive domains, each of

them contributing in different degrees, depending on the individual's situation. These

factors were: verbal ability, general reasoning (inductive and deductive), numerical

ability, memory, perceptual speed, word fluency, and spatial ability. These factors are

still present in traditional measures of intelligence (Snyderman & Rothman, 1988).

Thurstone initially discounted a general factor as a component of mental functioning.

He analysed the results of 50 intelligence tests which he administered to college

students and came to the conclusion that there were seven primary mental abilities

that made up a person's intelligence. The abilities or factors were: Spatial (S): the ability to form spatial and visual images. Perceptual (P): the ability to find or recognise particular items in a perceptual field. Numerical (N): the ability to perform simple numerical calculations. Verbal relations (V): the ability to conceptualize ideas and meanings in language. Word fluency (W): the ability to deal with single and isolated words in a fluent manner. Memory (M): the ability to recognize and recall words, numbers and figures after having memorized them. Inductive Reasoning (I): the ability to find a rule or principle and apply it. Restrictive reasoning (R): the ability to successfully complete tasks that involve restriction in the solution; arithmetical reasoning utilizes restrictive reasoning, as the answer to an arithmetical calculation is limited to one correct solution. Deductive Reasoning (D): the ability to draw a logical

conclusion from a set of assumptions (Thurstone, 1938). However, Sternberg (1985a)

pointed out that the differences between Spearman's and Thurstone's theories seemed

to be of emphasis rather than of substance. Later in their lives, Spearman was

compelled to recognize the existence of group factors, while Thurstone was forced to

acknowledge the existence of a higher-order general factor, connected, in some way,

to the primary mental abilities (Snyderman & Rothman, 1988). In 1941, Cattell

proposed a reconciliation between the two theories by postulating the existence of a

hierarchical structure of ability (Snyderman & Rothman, 1988; Brody, 1992). The

“g” factor would be a general, common factor, presented in all measures of the

ability, derivable from the relationships that exist among the more specialized factors

postulated by Thurstone.

2.4.3 Guilford’s structure of the intellect theory

Guilford (1967, 1985) identified many different factors which together make up the

structure of “intellect” or “intelligence”. Intelligent functions were defined according

to three different dimensions: operation, content and product. Mental processes

identified by Guilford (1967) were Cognition: comprehension or understanding of

information. Memory: ability to recall and recognise information that has been

memorised. Divergent Production: creative thinking which involves fluency,

flexibility and elaboration abilities. Convergent Production: this refers to thinking in

which a single correct answer to a question is produced. Evaluation: comparing a

product of information with known information according to logical criteria and

making a decision concerning criterion satisfaction. Visual: the visual category refers

to information that is visually perceived, e.g. correct perception of words that have

missing letters. Auditory: refers to information that is heard and therefore auditory

discrimination is important, e.g. listening to and interpreting a radio code. Symbolic: information in the form of tokens or signs that stands for something else, e.g. printed language. Semantic: meanings of words comprise semantic content. Behavioural: nonverbal information involved in human interactions. Abilities were not only

classified according to the processes and content but also according to the form in

which the information was processed. The form of information is classified into

product categories. The products identified were Units: the most basic form of

information is units or parts of wholes. Units can be seen as chunks of information,

e.g. single words. Classes: a class is a set of objects with one or more common

properties, e.g. in number classification, the number 22 first in the class formed by the

numbers 44, 55 and 33. Relations: a relation is a connection between two things. An

item testing the cognition of relations may, for example, require the identification of a relation

as the movement of a line by 45 degrees in a clockwise direction. This relation is then

applied to another set of figures. Systems: complexes, patterns or organizations of

interdependent or interacting parts form systems. In testing the cognition of systems,

spatial orientation tasks may be used, where visual rotation and consideration of many

different parts and their changing relationships to each other are involved.

Transformations: changes, revisions, redefinitions or modifications, by which any product of information in one state is changed over into another state. In testing the cognition of semantic transformations, the respondent may

have to explain the many different ways in which two common objects, such as an

apple and an orange, are alike. This involves the redefinition of the objects by

emphasising one attribute or another. Implication: an implication is something

expected, anticipated or predicted from given information. In an item testing the

cognition of symbolic implications, different words are placed in relation to each

other in the manner of a crossword so that words may be read down or across.

Considering the position of the letters gives rise to the expectation that one of the other words

would fit in a certain place (Guilford, 1967).

2.4.4 Gardner’s theory of multiple intelligences

Gardner (1993) defined intelligence as comprising different kinds of processing

operations that allow a person to achieve in one or more of eight culturally

meaningful areas. Gardner did not agree with the concept of a general intelligence

factor (g) and held that eight different intelligences were found to a greater or lesser

extent in different individuals. The eight intelligences identified by Gardner were

Linguistic: sensitivity to sounds, rhythms, meanings of words and different language

functions. Logico-mathematical: sensitivity and capacity to detect logical or

numerical patterns; ability to handle long chains of logical reasoning. Musical: ability

to produce and appreciate pitch, rhythm (or melody) and aesthetic-sounding tones;

understanding forms of musical expressiveness. Spatial: the ability to perceive the visual-spatial world accurately, to perform transformations on those perceptions, and to recreate

aspects of visual experience in the absence of relevant stimuli. Bodily-kinaesthetic:

ability to use the body skillfully for expressive as well as goal-directed purposes;

ability to handle objects skillfully. Naturalist: to recognize and classify all varieties of

animals, minerals and plants. Interpersonal: detection and appropriate responding to

the moods, temperaments, motivations and intentions of others. Intrapersonal: ability

to discriminate complex inner feelings and to use them to guide one’s own behaviour;

knowledge of one's own strengths, weaknesses, desires and intelligences. Only a few factor-analytic studies support the existence of multiple intelligences as Gardner saw

them (Marais, 2007).

2.4.5 Cattell and Horn’s theory of fluid and crystallized intelligence

Cattell proposed a theory that intelligence consisted of two major types of cognitive

abilities: crystallised and fluid intelligence. Crystallised intelligence (Gc) referred to

acquired skills and knowledge that were dependent on exposure to a particular culture,

as well as formal and informal education, for example, vocabulary. The abilities that

made up fluid intelligence (Gf) were nonverbal, relatively culture-free, and

independent of any specific instruction, for example, memory for digits (Cohen &

Swerdlik 2002).

Tests that measured the ability to manipulate information and solve problems were

considered measures of fluid ability whereas tests that require simple recall or

recognition of information were considered measures of crystallized abilities (Sattler,

1998).

2.4.6 Carroll’s three-stratum theory of cognitive abilities

Carroll (1994) used exploratory factor analysis to test his belief that human cognitive

abilities could be conceptualized hierarchically (McGrew & Woodcock, 2001). He

developed a hierarchically arranged model of cognitive abilities. This model

elaborated on the models proposed by Spearman, Thurstone and Cattell. Carroll

represented the structure of intelligence as a pyramid, with ‘g’, or general intelligence

as conceptualized by Spearman, at the top (Berk, 2000). Eight broad abilities occupied

the second stratum, arranged from left to right in terms of their decreasing correlation

with ‘g’. The eight abilities were fluid intelligence, Crystallised Intelligence, General

Memory and Associative learning, Broad Visual perception, Broad Cognitive

Speediness and Processing Speed ( Berk, 2000).

2.4.7 Cattell-Horn Carroll Model

The Cattell-Horn-Carroll theory of intelligence was most closely derived from

Spearman's theory of g, the fluid and crystallized intelligence theories of Cattell and

Horn, and the factor-analytic work of Carroll. The Cattell-Horn theory of intelligence

was combined with the Carroll model, to provide a comprehensive conceptualization

of human cognitive abilities that many scientists would agree on (Cohen & Swerdlik

2002).

In the Cattell-Horn-Carroll (CHC) model, there were ten broad stratum abilities and over seventy narrow stratum abilities. Each broad stratum ability included two or more narrow stratum abilities. The ten broad stratum abilities were: Fluid Intelligence (Gf), Crystallised Intelligence (Gc), Quantitative Knowledge (Gq), Reading/Writing Ability (Grw), Short-term Memory (Gsm), Visual Processing (Gv), Auditory Processing (Ga), Long-term Storage and Retrieval (Glr), Processing Speed (Gs) and Decision/Reaction Time or Speed (Gt).

Recent studies showed that the CHC model offered a better representation of the

structure of intelligence compared to other selected models or theories (Marais, 2007).

2.5 Definitions of Mental Test

Standardisation of a test designates setting up norms. This means obtaining average

scores and distributions from a representative population. The importance of

standardisation is that it gives the test scores psychological meaning and thus makes

interpretation possible. Practically, standardisation of a test is essential in vocational

guidance or personnel selection where decisions about individuals are made (Kline

2000).
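As a rough illustration of how norms give a raw score psychological meaning, the following sketch converts a raw score into a standard score on the familiar IQ metric and into a percentile rank. The norm mean and standard deviation used here are hypothetical; in practice they would come from a representative standardisation sample, and the percentile conversion assumes an approximately normal distribution of scores.

from statistics import NormalDist

def standardise(raw, norm_mean, norm_sd, new_mean=100.0, new_sd=15.0):
    """Convert a raw score to a standard score and percentile using norm-group data."""
    z = (raw - norm_mean) / norm_sd
    standard_score = new_mean + new_sd * z
    percentile = NormalDist().cdf(z) * 100    # assumes roughly normal raw scores
    return standard_score, percentile

# Hypothetical norms: raw scores in the standardisation sample have mean 35, SD 7.
score, pct = standardise(raw=42, norm_mean=35, norm_sd=7)
print(f"standard score = {score:.1f}, percentile = {pct:.0f}")   # 115.0, about the 84th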

Kline (2000) argued that as norms are essential for understanding the

measurements (test scores) they must be accurate. To ensure this he mentioned that

some requirements for a good standardisation should be met. These include sampling

and expressing of the results which will be discussed in detail later (chapters 5 and 6).

Cronbach (1990) stated that a test is a systematic procedure for observing behaviour

and describing it with the aid of numerical scales or fixed categories. Anastasi (1988)

and Brown (1983) defined psychological testing essentially as an objective and

standardised measure of a sample of behaviour. Also Anastasi (1988) added that a

diagnostic or predictive value of a psychological test depends on the degree to which

it serves as an indicator of a relatively broad and significant area of behaviour.

Jensen (1981) defined a mental test as a small sample of behaviour used to predict

more extensive or important behaviour or capability. He added that mental tests were

essentially similar to other tests. Tyler and Walsh (1979) defined tests as standardised

situations designed to elicit a sample of an individual's behaviour.

From the above definitions it is clear that a test is a tool used to measure a sample of

behaviour, not a complete inventory. Psychological tests are standardised, that is, each

test is administered under a prescribed set of procedures, and objective, in that the scoring and evaluation of test performance are not left to the subjective judgement of the examiner. Scarr (1981) mentioned that the sampling

rationale was that an individual who can repeat six digits backwards can also

manipulate other information in his/her head.

2.6 Classification of Mental Tests

Mental tests are a subset of psychological tests. Psychological tests can be divided

into:

a) Mental tests which are used to measure general intellectual ability of individuals

(intelligence tests) or to measure an individual's ability of a specific kind, like

mechanical, clerical or musical (aptitude tests).

b) Personality tests which are used to evaluate non-intellectual traits of personality by

questionnaires, self-rating inventories or projective techniques.

Mental tests can be classified according to timing, procedure of test administration,

and content (Attashani and Abdalla, 2005).

2.6.1.1 Classification of tests according to timing

a) Speed tests: measure speed and efficiency with which a subject can perform test

items. In a speed test the items are so easy and simple that almost anyone could get

them all right if given sufficient time. Such a test identifies who works faster (Jensen,

1980; 1981; Brown, 1983).

b) Power tests: determine highest level of knowledge, skill or reasoning the subject

can demonstrate without time pressure. They consist of items graded in difficulty or

complexity. In a power test there is no time limit or a very liberal time limit which

allows individuals to complete all items they can answer correctly. Scores in a power

test reflect the level of difficulty of the items the test taker can answer correctly (Jensen, 1980, 1981; Brown, 1983).

2.6.1.2 Classification of tests according to procedure of administration

a) Individual tests: test is administered to one subject by one examiner at a time. It

allows the examiner to observe the subject's performance on the test items, which

helps in evaluating test scores. Common examples of individual intelligence tests are

Wechsler and Stanford-Binet tests (Kline 2000).

b) Group tests: administered to a number of subjects at the same time. Often referred

to as paper and pencil tests because they require subjects to write answers or make

marks on specially prepared answer sheets. Because of their simplicity and low cost,

this type of test is more popular than individual tests (Ahmann & Glock, 1976).

Group intelligence tests are more often used for initial screening in schools and

businesses because they can be administered quickly and economically by people with

minimum training. Individual intelligence tests are preferred by psychologists in

clinical and other settings where clinical diagnoses are made and where they serve as

measures of general ability and as a means of obtaining insight into personality

functioning and disabilities (Anastasi & Urbina 1997).

2.6.1.3 Classification of tests according to content

a) Verbal tests: involve the use of language, spoken or written, but they may or may

not require reading or writing. Typical verbal tests are general information, verbal

analogies, and vocabulary tests (Kline, 2000).

b) Non-verbal tests: paper and pencil tests that involve no explicit use of language, in

some cases not even for giving instructions for taking the test. These tests consist of

such things as figural analogies, matrices, and embedded figures. The SPM test is an

example of such tests (Kline, 2000; Domino & Domino 2006).

c) Performance tests: non-verbal tests that require the subject to perform certain

actions such as drawing, manipulation or construction. These tests may consist of

figure copying, block design, and picture completion or picture arrangement. The

performance part of the Wechsler test is an example of performance tests (Kline, 2000;

Sternberg 2000).

2.7 Uses of Mental Tests

Classification, training, and education of mentally retarded individuals were the initial

sparks for the development of mental tests. In general, mental tests have been used for

determination and analysis of individual differences in general intelligence and

aptitude. For example, mental tests are used for diagnostic purposes to estimate the

present ability of individuals, and for prognostic purposes to predict ability or

performance of individuals in the future on the basis of their present ability (Anastasi

& Urbina 1997).

Brown (1983) noted that there were three situations where tests were used as aids in

decision making about an individual, a group or some hypothesis. The first use was

selection, where the test is used to select the most promising applicants,

those with the greatest probability of success. The second use of tests was for

placement to assign one or more individuals to several alternatives according to their

ability. A third use of tests was in diagnosis to identify the individual's strengths and

weaknesses and to determine a suitable program or treatment for him or her.

The purpose of using mental tests in schools was to estimate the mental ability of

students and provide them with educational or vocational guidance. Anastasi (1988,

p.4) stated:

At present, schools are among the largest test users. The classification
of children with reference to their ability to profit from different types of
school instruction, the identification of intellectually retarded in one hand and
gifted in the other, the diagnosis of academic failures, the educational and
vocational counselling of high school and college students, and the selection
of applicants for professional and other special school programs are among the
many educational uses of tests.

2.8 Use of Intelligence Tests

Psychological assessment often depends heavily on the use of standardized

intelligence tests. Therefore, the use of each intelligence test must be guided by

substantial research, including research on subgroup differences. The results that

address hypotheses that guide this study have the potential of adding to the research

database in this area (Edwards, 2003).

The use of intellectual and other forms of psychological and mental tests with

students who differ culturally, linguistically, or racially has been subject to

substantial controversy. Professionals responsible for assessment of culturally

different children frequently are uncertain which test instruments provide the most

valid, relevant and equitable results. Interest in providing fair and equitable mental

test results extended back several decades, but what is considered fair and objective

changed as values in our culture change (Oakland, 1976; Oakland & Laosa, 1976).

Differences in intelligence scores between different groups are considered

important, in part, since tests are statistically structured to distinguish between

individuals, and groups, because groups are aggregates of individuals. Intelligence

tests are designed carefully and deliberately to produce score variance (Wesson,

2000). The generation of a broad range of individual scores permits psychologists

to acquire knowledge and make judgments about, between, and within group

differences. This knowledge allowed interpretation of the distribution of scores that

led to various decisions (e.g., eligibility for placement in special education and

gifted programs) (Wesson, 2000; Yoon, 2006).

Summarising uses of intelligence tests after the Second World War in the United

States, Samuda (1975, p.25) reached the following conclusion:

Intelligence tests play a vital role at all stages and in every aspect of a
person's life. From pre-school days through postgraduate years, tests are
administered for grouping and course selection purpose, for placement in
special education classes or special institutions, for career orientation, college
entrance, and admission to professions. A person's IQ score largely determined
the type of education he/she received and, ultimately, the type of position
he/she might occupy within society. Therefore, the concept of intelligence was
central to an individual's life.

It should be stressed that intelligence tests should be used alongside other methods, such as interviews, history records or other test scores, before reaching a decision regarding

any test taker. Layman (1968, p.8) pointed out the problem of using intelligence tests

alone for prediction and judgement. He stated:

Intelligence tests are far from perfect indicators of what sort of


schoolwork may be expected of a child, and they should be used thoughtfully
and with caution. Intelligence test scores should not be used as the sole basis
for judgements about a student.

According to Urbina (2004), the current uses of tests, which take place in a wide

variety of contexts, can be classified into three categories: decision making,

psychological research and self-understanding and personal development.

• Decision making:

The primary use of psychological tests has been as decision making tools. This

particular application of testing invariably involved value judgment on the part of one

or more decision makers who needed to determine the bases upon which to select

place, classify, diagnosis, otherwise deal with individuals, groups, organizations, or

programs.

When tests are used for making significant decisions about individuals or programs,

testing should be merely a part of a thorough and well-planned decision-making

strategy that takes into account a particular context in which the decisions are made,

the limitations of the tests, and other sources of data in addition to tests.

Unfortunately, very often, for reasons of expediency, carelessness, or lack of information, tests have been made to bear the responsibility for flawed decision-making processes that placed too much weight on test results and neglected other

pertinent information.

• Psychological research:

Tests have often been used in research in the fields of differential, developmental,

abnormal, educational, social, and vocational psychology, among others. They

provide a well-recognized method for studying the nature, development, and internal

relationships of cognitive, affective, and behavioral traits. It should be noted that the

advantages that psychological tests offer pertain to their characteristic efficiency and

objectivity.

• Self-understanding and personal development:

Most humanistic psychologists and counselors have traditionally perceived the field

of testing, often justifiably, as overemphasizing the labeling and categorization of

individuals in terms of rigid numerical criteria. Constance Fisher (1984) began using

tests in an individualized manner. This practice has evolved into the therapeutic model

of assessment espoused by Finn and Tonsager (1997). One of the most pertinent

applications of this model was in counseling and psychotherapeutic settings.

2.9 Culture-Free and Culture-Fair Tests

The use of tests in cultures other than the one for which they were originally designed,

and the issue of cultural bias have led psychologists to develop what they thought at

first to be culture-free tests.

The term culture fair test refers to tests that are not biased toward a particular cultural

group. Culture bias exists in a test when a member from one culture is discriminated

against in his or her ability to answer questions solely on the basis of the culture in

which he or she grew up (Corsini, 1984; Anastasi, 1988). Anastasi & Urbina (1997)

mentioned that the concern with cross-cultural testing was recognised at least as early

as 1910, during the testing of waves of immigrants to the United States.

To overcome the cultural bias in ability tests, psychologists have tried to develop

culture-free tests that have no such bias. Their first attempt to develop a test of

intelligence which would be free of cultural influences was to minimise the use of

language if cultural groups spoke different languages. However they noticed that the

direct translation of test items from one language to another did not eliminate the

cultural differences, nor produce comparable tests (Anastasi & Urbina 1997).

Psychologists tried another approach, to develop non-verbal ability tests. Most of

these non-verbal tests contained information or emphasised pictorial and figural

content that a person raised in a different culture may lack the experience to understand and which furthermore may seem pointless. Anastasi (1988) pointed out that non-verbal

tests are often used in hope of obtaining culture fair tests, but many researchers

believed that non-language tests may be more culturally loaded.

Kline (1979) argued that non-verbal tests in non-western cultures avoided the

language problem but encountered another, perhaps more serious, problem: these tasks may seem pointless to subjects. Kline gave an example of this problem based upon

performance in the Porteus Mazes test. He stated that:

"when an old African who was tested was asked to trace the maze,
imagining he was asked to lead his cattle into the kraal, the old African replied
that he preferred not to, since any one who built a kraal like that was mad"
p.309.

A culture-free test is meant to consist of items that are unfamiliar to all subjects.

Technically, it has been proved that it is impossible to develop a test that is

completely free from cultural bias (Biesheuvel, 1969; Brislin et al., 1973; Noll and

Scannell, 1979; Brown, 1983; Anastasi, 1988).

Anastasi (1988, p.357) reviewed the problem of culture free tests and concluded that:

Since all behaviors are affected by the cultural milieu in which the
individual is reared and since psychological tests are but samples of behavior,
cultural influences will and should be reflected in test performance. It is
therefore futile to try to devise a test that is "free" from cultural influences.

Noll and Scannell (1979) had the same opinion. They stated that no test could be

culture free, since the only way to respond to it is in terms of what has been learned,

that is, in terms of one's culture.

Again to overcome this problem, psychologists shifted to the development of culture-

fair tests. They believed that to have a culture-fair test, all test items should be equally

familiar to all subjects. Biesheuvel (1969) defined culture-fair tests as tests which

avoid culture-bound features such as emphasis on speed of performance, pictures

presenting objects or situations that lack universality.

Brown (1983) believed that culture-fair tests, though not eliminating culture effects,

attempted to make the tests equally fair to all persons by controlling certain critical

variables, such as, language, speed in responding within limited time, and differences

in competitive motivation between cultures.

Summarising the problem of speed in responding within limited time, Samuda (1975)

reported that many researchers found that the attitude toward speed varies greatly in

different cultures and not all people will work on the test with equal interest in getting

it done in the shortest time possible. For example, they found that the injunction to

"do this as quickly as you can" seemed to make no impression whatsoever on the

American Indian children.

Anastasi (1988) also mentioned that the present objective in cross-cultural testing is to

develop tests that presuppose only experiences that are common to different cultures.

For this reason, such terms as culture-common, culture-fair, culture-reduced and

cross-cultural have replaced the earlier “culture-free” term.

Kline (1979) concluded that for cross-cultural test construction it was best to use our

knowledge and experience of the culture as a guideline to writing items, and to retain

those that show themselves to be criteria-based or valid in factor analysis. Such tests

enable the cross-cultural psychologist to elucidate the environmental factors

influencing the major ability factors, which is one of the stated aims of cross-cultural

psychologists.

The following are examples of culture-fair tests that have been used in cross-cultural

testing; Porteus Maze Test 1913, Kohs Block Design Test 1923, Goodenough-Harris

Drawing Test 1926, Raven's Progressive Matrices 1938, Cattell's Culture Free Test

1940 (in the late 1950s, Cattell changed the term "Culture-Free Test" to Culture-Fair

Test), D48 Test (dominoes) 1948, and Witkin's Embedded Figures Test 1945.

Brislin et al. (1973), Kline (1979), Raven (1989) and Murphy and Davidshover (1991)

believed that Raven's Progressive Matrices was one of the most widely used

intelligence or ability tests in cross-cultural research.

2.10 Achievement Tests

Achievement tests were intended to measure the individual's actual learning of

educational subject matter after a period of instruction. They were not designed for

prediction. Instead, they measured what has been learned or the mastery of school

subjects (Freeman, 1962).

Achievement tests served many functions. Aiken (1988, p.125) outlined the

following: (a) to determine how much people knew about certain topics or how well

they can perform certain skills; (b) to inform students, as well as their teachers and

parents, about students' scholastic accomplishments and deficiencies; (c) to motivate

students to learn; (d) to provide teachers and school administrators with information

to plan or modify the curriculum; and (e) to serve as a means of evaluating the

instructional program and staff.

The distinction between achievement and intelligence or aptitude tests is not simple.

Anastasi (1988, p.412) believed that differences between achievement and aptitude

tests were in the degree of uniformity of relevant antecedent experience. Thus

aptitude tests measured the effects of learning under uncontrolled and unknown

conditions, whereas achievement tests measured the effects of learning that occurred

under partially known and controlled conditions. In differentiating between aptitude

and achievement tests she stated:

No distinction between aptitude and achievement tests can be rigidly


applied.... We should especially guard against the naive assumption that
achievement tests measure the effects of learning, while aptitude tests measure
innate capacity independent of learning.

Jensen (1980, p.239) also argued that all performance was a form of achievement, and

of course there is no performance-free psychological test. To distinguish between

intelligence or aptitude and achievement tests, Jensen outlined the following points;

a) Intelligence tests are much broader and more heterogeneous based on a wide

variety of experiences than are achievement tests which have specific types of

knowledge associated with formal schooling.

b) Intelligence tests sample cumulated knowledge and skills from the individual's past

experience, whereas achievement tests sampled knowledge acquired in the recent

past.

c) Intelligence tests predict future intellectual achievement, even though the contents

of the achievement have nothing in common with the aptitude tests.

d) Most intelligence measures are more stable across time and are less susceptible to

the influence of instruction or training than most achievement tests.

Aiken (1988) believed that the distinction between achievement tests and intelligence

tests can be made in terms of focus. Achievement tests focus more on the present,

what the person knows or can do now, whereas intelligence tests focus on the future

or what a person should be able to do with further education or training.

Sattler (1982) pointed out that intelligence tests and achievement tests have

commonalties as well as differences. Both tests sample aptitude and learning.

However, intelligence tests are broader in coverage than achievement tests and sample

from a wider range of experience. Achievement tests, such as reading and

mathematical tests, are heavily dependent on formal learning experiences that are

acquired in school or at home which make them more culture bound than are

intelligence tests. Sattler added that intelligence tests stress the ability to apply

information in new and different ways, while achievement tests stress mastery of

factual information. Thus, intelligence tests measure less formal achievement than do

achievement tests.

Achievement tests can be divided into standardised and teacher-made tests. The

former mainly differ from teacher-made tests in that they are intended to be used over

a period of many years, and cover a broader range of skills and educational objectives

common to many schools. The term standardised refers to specific instructions for

administration and scoring. Teacher-made tests are tests designed to assess the

academic progress of students in a particular classroom, not to give broad

comparisons across schools. Teacher-made tests are sometimes called classroom tests

or "informal" tests, and are constructed by classroom teachers for use in their

particular classes under conditions of their choosing (Ahmann and Glock, 1976).

Brown (1983) distinguished between teacher-made and standardised achievement

tests. Brown stated that for teacher-made tests, the teacher will refer to textbook assignments, supplemental reading lists, lecture outlines and class discussions as

sources of items. Standardised tests developed by test publishers will consider not

one text, but the most commonly used material covered, not by one teacher, but by a

variety of teachers and experts.

Aiken (1988) believed that teacher-made and standardised tests complement rather than replace each other. He distinguished between teacher-made and

standardised achievement tests. He stated that a teacher-made test is more specific to a

particular teacher, classroom, and a unit of study and is easier to keep up to date.

Standardised tests, on the other hand, are built around a core of general educational

objectives common to many different schools. In addition to being more carefully

constructed and having broader content coverage than teacher-made tests,

standardised tests have norms and higher reliability coefficients.

2.11 Intelligence and academic achievement

Intelligence and education are so intimately bound together that it would be

impossible to understand intelligence without knowing about its relation to education.

Intelligence is considered to be the child of education because the field of intelligence

testing was born from the need to develop a test that would predict children’s school

success (Sternberg, 2000).

The study of intelligence and education provides an example of the fruitful

interaction between the practical demands of educators and the basic research focus of

cognitive scientists (Sternberg, 2000). As mandatory public education became

commonplace by the late 1800s, educators were confronted with an overwhelming observation: students of the same chronological age displayed a range of individual differences in intellectual ability (Sternberg, 2000).

The study of intelligence has been motivated by the practical problems of education.

By 1905, Binet and his colleagues achieved a solution that was innovative,

straightforward, and most important, successful: the development of the Binet-Simon

intelligence scale. In this scale if a child failed to answer correctly questions that most

other children of the same age could answer, the child was considered below average

in the ability to learn. Likewise, if a child was able to answer questions that most

other same-aged children could not answer, the child could be considered above

average in the ability to learn. These were based on the assumption that all children at

the same age level had the same opportunities to learn. Binet’s test was successful to

some extent in predicting children’s ability to learn in school. This test has served as

the basis for subsequent intelligence tests (Sternberg, 2000).

Academic achievement at school is the result of learning and problem solving ability

(Bester, 1998). Intelligence is seen to be the ability to think and learn and is therefore

considered to be fundamental to academic achievement.

In the literature, correlations between tests of general intelligence and measures of

academic performance were reported as being usually close to 0.50 (Brody 1992;

Neisser, Boodoo, Bouchard, Boykin, Brody, Ceci, Halpern, Loehlin, Perloff, Sternberg & Urbina, 1996) but can be as high as 0.75 (Jensen, 1998).

Studies have shown that IQs predict educational achievement. IQs predict subsequent

educational achievement at a magnitude of a correlation of around .5 to .7. IQ

determines the efficiency of learning and comprehension of all cognitive tasks. The

correlations between IQ and subsequent educational attainment were not perfect

because educational attainment is partly determined by motivation, interests,

compliance and the effectiveness of teaching. Nevertheless the correlations are

substantial and show that intelligence tests measured real cognitive abilities that are

also expressed in educational attainment (Lynn and Vanhanen, 2006).

Many empirical investigations have shown that intelligence is the best single predictor

of academic success. Horn et al. (1993), in their study on undergraduate university

students, developed a path model to show the relative influence of different variables

on achievement. They found that when compared to other factors, such as previous

knowledge and motivational factors, general intelligence was found to have a highly

significant direct effect on achievement, independent of any other variable in the

model. Intelligence showed a correlation of 0.55 with achievement, explaining 30% of

the students’ performance in this study.
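The 30% figure above follows directly from the correlation coefficient: the proportion of variance in achievement statistically accounted for by intelligence is the square of the correlation. A minimal sketch, using the correlation values cited in this section:

# Proportion of achievement variance accounted for by intelligence is r squared.
for r in (0.50, 0.55, 0.75):
    print(f"r = {r:.2f}  ->  variance explained = {r ** 2:.0%}")
# r = 0.50 -> 25%, r = 0.55 -> 30%, r = 0.75 -> 56%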

Chen, Lee and Stevenson (1996) carried out a study investigating the relative

contribution of intelligence, previous achievement and family factors to later school

achievement in Chinese, Japanese and American cultures. It was concluded that there

were similar correlations between intelligence and academic achievement for each

culture investigated. Participants were administered intelligence tests in Grade 1 and

their achievement was tested 10 years later in grade 11. The single most predictive

variable for Grade 11 achievement in mathematics, reading and general knowledge

was general intelligence. The study found a correlation of between 0.48 and 0.53 for

mathematics achievement, between 0.28 and 0.51 for reading and 0.35 and 0.44 for

general knowledge. Gagne and St Pere (2002) in a study comparing the predictive

values of intelligence, motivation and persistence, similarly found that cognitive

abilities were by far the best predictor of school achievement. In this study, it was found that intelligence correlated with achievement at between 0.36 and 0.56.

Verbal ability, as measured in intelligence tests, appears to contribute most to

achievement in scholastic success. Thompson and Plomin (1991) conducted an

investigation to ascertain the correlations between different measures of intelligence

and achievement in reading, mathematics and general language tasks from grade 1 to

6. The researchers found that the correlation between verbal ability and achievement

was higher than the correlations between other measures of intelligence and achievement.

The abovementioned study showed the importance of verbal intelligence with regard

to academic achievement, but the results revealed that other measures of intelligence

are also important in predicting scholastic success. In the study carried out by

Thompson et al. (1991), spatial intelligence, as measured by a spatial relations test

and a hidden patterns test, was found to be a good predictor of scholastic success in

reading and mathematics. Spatial intelligence was, however, a less powerful predictor

than verbal ability of achievement in the general language area. In the study carried

out by Marais (1992) it was shown that the ability to do mathematics, accountancy

and general science appeared to require the contribution of both verbal and nonverbal

abilities.

2.12 Increase in IQ with time

Discourse on IQ differences should reference substantial increases in intelligence

scores during the last 60 years. Scores on measures of intellectual functioning have

risen, and in some cases rather sharply, during this period (Flynn, 1999; Neisser,

1998). Analysis of intelligence data from several countries (e.g., Belgium, France,

Norway, Denmark, Germany, Austria, Switzerland, Japan, China, Israel, Brazil,

Canada, Britain, and the United States of America) found, without exception, large

gains in IQs over time (Flynn, 1998). The pattern of gains corresponded with the

worldwide move from an agriculture-based economy to industrialization (Flynn,

1987, 1994, 1999; Raven, Raven, & Court, 1993). Average IQs have risen by about

three points a decade during the last 50 years (Flynn, 1999). These IQ gains across

decades, referred to as the "Flynn effect," provided evidence that gains in average IQ

were part of a persistent and perhaps universal phenomenon (Flynn, 1999; Herrnstein

& Murray, 1994). Gains were most dramatic on tests that assessed a general factor, g,

of intelligence. One of the best examples of an intelligence test that primarily

measured “g” was the Raven's Progressive Matrices (Jensen, 1980).

Research with the Raven's Progressive Matrices is particularly relevant because it is considered to be the best-known, most extensively researched,

and most widely used culture-free test of intelligence (Jensen, 1980). Many scholars

believe the test measures ‘‘g’’ and might be the most reliable measure to identify

intellectually able children from impoverished backgrounds (Jensen, 1980). However,

Raven's scores are highly influenced by environmental variables. To illustrate, all 18-

year-old males in the Netherlands took an adaptation of the Raven's upon entrance

into the military. Data available from this population revealed the mean scores of

those tested between 1952 and 1982 rose 21 IQ points. Genetic changes within

populations could not occur in such a short time span (Flynn, 1999). Therefore, the

increase in Raven's IQs must have been a function of changes in the environment

(Neisser, 1998). Current geometric rates of change in society (e.g., improvements in

nutrition, the acquisition of information through computers and the internet) have led

to concomitant changes in population IQs and, important to this study, changes in

subgroup IQ differences. The unknown factors producing secular IQ gains over

generations may also occur within generations and lead to IQ differences among

subgroups (Flynn, 1987). Thus, the finding of substantial changes in population IQs

over time raises the question as to whether the historically observed pattern of mean

IQ differences among racial/ethnic groups also shows substantial change.
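As a point of comparison, the Dutch military data cited above imply a rate of gain well above the general estimate of about three IQ points per decade:

21 IQ points / 3 decades (1952-1982) = 7 IQ points per decade.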

Most of these IQ increases have been reported in economically developed nations but

IQ increases have also been found in a few economically developing countries

including Brazil (Colom, Flores-Mendoza & Abad, 2007), Dominica (Meisenberg,

Lawless, Lambert & Newton, 2005), Kenya (Daley, Whaley, Sigman, Espinosa &

Neuman, 2003), and Sudan (Khaleefa & Lynn, 2009).

Have increases been greater for fluid IQ (non-verbal & reasoning abilities) than for

crystallized intelligence (verbal and educational abilities) and if so, why? Wheeler

(1942) appeared to be the first to find greater gains in non-verbal than in verbal

abilities in a report regarding the increase in IQs in East Tennessee children aged 6-16

over the years 1930-40. The average gain was considerably greater for non-verbal

ability (6.0 IQ points per decade) than for verbal ability (2.6 IQ points per decade).

Lynn (1982) showed that IQs had increased in Japan over the preceding three decades. This result was confirmed in studies in a number of other countries (Flynn, 1987, 2007; Lynn & Hampson, 1986; Lynn, 1990b). Lynn

& Hampson (1986) showed that in Britain fluid intelligence measured by the Standard

Progressive Matrices in children aged 7-15 years increased by 1.86 IQ points a decade

for the years 1938 to 1979. Lynn (2009) has shown that approximately the same gain

took place over the years 1979 to 2008.

Has the amount of increase been the same at all ability levels or greater among lower

IQ groups? This question was addressed by Cattell (1951) in his study on the IQ

increase in Britain (1936-49) in which he reported that the gain was only present in

the lower half of the distribution. In an early study, Elley (1969) reported that IQ

gains in New Zealand (1936-68) were smallest in children of professional parents and

greatest in children of unskilled parents. Other studies finding greater gains among

those at lower levels of ability have been reported for Denmark (Teasdale & Owen,

1987, 1989, 2008), Norway (Sundet, Barlaug & Torjussen, 2004) and Spain (Colom,

Lluis-Font & Andres-Pueyo, 2005). However, gains have been equally great among

those at higher levels of ability in France, Netherlands and United States (Flynn,

2007, p.104), while Spitz (1989) has reported that gains in the United States have

been greatest at the average IQ level. A number of studies noted in the introduction

have reported that the IQ increase has been greater among lower IQ groups but there

have also been some studies finding that the increases have been the same at all ability

levels. Lynn’s (2009) data confirmed the previous studies showing greater IQ

increases in the lower range of the ability distribution.

What factor or factors have been responsible for the IQ increase? Nine principal

theories have been advanced. These were:

(1) Increased test sophistication. Flynn has recorded that when he began working on

the effect, he canvassed expert opinion and reported that “scholarly correspondents of

high competence (H.J.Eysenck, J.C.Loehlin, D.Zeaman) have offered two possible

causes of IQ gains over time, increased test sophistication and a rising level of

educational achievement” (Flynn, 1984, p.47). These two factors had been advanced

some decades earlier by Tuddenham (1948) in another early report of the effect, while

increased test sophistication has subsequently been endorsed by Jensen (1998, p.327)

who wrote of “increasing test wiseness from more frequent use of tests”.

(2) Improvement in educational achievement was the other factor cited by scholars of

high competence from whom Flynn sought advice. This had also been advanced some

decades earlier by Tuddenham (1948, p.56) who stated “the superior performance of

the World War II group can be accounted for largely in terms of education”. Flynn

(2007) also endorsed the improvement-in-education theory.

Many others have favoured the ‘improvement in education theory’ of the Flynn effect,

including Cattell (1971, p.275) who stated: “the inter-generational changes …

probably represent the unquestionably marked improvement in schooling”. The

research of Teasdale and Owen (1994, p.333), Jensen (1998, p.324), Meisenberg,

Lawless, Lambert and Newton (2006, p. 273), Weede and Kampf (2002, p.365),

Stelzl, Merz, Ehlers and Remer (1995, p.294), Flieller (1999, p.1056), Garlick (2002),

Blair, Gamson, Thorne, and Baker (2005), all supported the following statement taken

from Meadows, Herrick, Feiler, et al. (2007, p.58) which stated: “its likeliest cause

may be improvements in education reflecting more effective teaching”.

(3) The greater complexity of more recent environments provides greater cognitive

stimulation arising from, for example, television, media and computer games. The

following quotes are all taken from research that broadly agree with this point:

• “The complexity of the modern world causes massive intelligence gains”

(Vincent, 1993, p.62)

• “Computer games have always been my favourite candidate” (Wolf, 2005,

p.15)

• “Growing exposure to and awareness of the kinds of problems found in

intelligence tests is enough to account for the small increases observed”

(Rabbitt, 2006, p 674)

• “Television and other mass media may have left their mark” (Elley, 1969)

• The reasons given are: “Wider exposure to mass media” (Jensen, 1998,

p.326)

• The reasons given are: “TV, video games and computers” (Greenfield, 1998,

p.91).

(4) Improvements in child rearing, e.g. “Better educated parents have more

enlightened views on child rearing” (Elley, 1969), and “…better child rearing

practices as a partial explanation for the increase in children’s scores on intelligence

tests” (Flieller, 1996).

(5) More confident test-taking attitudes have been advanced by Brand (1987) and

Brand, Freshwater and Dockrell (1989). They suggested that increasing liberalism,

permissiveness, and risk-taking promoted speed and guessing, which in turn increased

test scores.


(6) Reduction in family size. This has been advanced by Flynn (2007, p.356) who

dismissed nutrition and wrote “better education and smaller families are much more

plausible (reasons)”.


(7) The “individual multiplier” and the "social multiplier" theories have been

proposed by Dickens and Flynn (2001) and elaborated by Flynn (2007). The concept

of the “individual multiplier” was that intelligent people have a thirst for cognitive

stimulation and this increased their intelligence through positive feedback. The "social

multiplier" posited that “other people are the most important feature of our cognitive

development and the mean IQ of our social environs is a potent influence on our own

IQ” (Flynn, 2007). This led Flynn to predict that children brought up in a university

town should have higher intelligence than those without this advantage, because the high intelligence of the professors would enhance the intelligence of the population.

(8) Heterosis: Jensen (1998, p.327) has suggested that the genetic factor of heterosis

(hybrid vigor) could have contributed to the Flynn effect. Heterosis resulted from the

mating of two persons from different ancestral lines. Jensen argued this has probably

increased in the United States as a result of immigration from many different

countries. Further arguments for the heterosis theory have been advanced by Mingroni

(2004).

(9) Improvements in nutrition have been advanced as a reason by Lynn (1990, 1993, 1998), who has pointed out that nutrition affects intelligence, and that the quality of

nutrition had improved over the course of the twentieth century. This has been

responsible for increases in height and brain size of about the same magnitude as have

occurred for intelligence. This theory has been endorsed by Jensen (1998, p.325) and

by Colom, Flores-Mendoza and Abad (2007) as one among a number of causal

factors.


The nutrition theory, endorsed as one causal factor by Arija, Esparo, Fernandez-Ballart et al. (2006), Colom, Lluis-Font & Andres-Pueyo (2005), and Jensen (1998, p.325), is better able to explain the large IQ gains of 4 year olds and the larger gains of fluid intelligence than of crystallized intelligence. It posited that the crucial effect of improvement in nutrition impacts on fetuses and infants when the brain is growing, and has little subsequent effect. Hence the IQ gains should be fully present in 4 year

olds and should not show increased effects in older children. The improvement in

nutrition theory can also explain the greater improvement in fluid than in crystallized

intelligence, because numerous studies have shown that fluid ability is more

vulnerable to cerebral insult, including sub-optimal nutrition (Lynn, 1990a, 1993,

1998). Hence, as sub-optimal nutrition has declined during the last century, fluid

ability has increased more than crystallized ability.

In addition, Lynn (2009) showed greater IQ gains among those with lower ability

which also might be explained by the improvement in nutrition theory. Those at the

lower ability levels are more likely to have had sub-optimal nutrition in earlier times

and have benefited more from the improvements in nutrition that have followed rising

living standards during the last century. It is doubtful whether any prediction

regarding the size of gains at different ability levels can be made from the improvement in education theory or other variants of the greater cognitive stimulation theory (Lynn, 2009). However, Flynn (2007) argued against the nutrition theory on the grounds that increases in height have ceased in the United States

whereas increases in intelligence have continued.

2.13 Chapter Summary

This thesis is primarily concerned with intelligence in Libya. A detailed account of

intelligence was discussed. This chapter introduced the concept of intelligence and

summarized the different definitions of intelligence. It showed that, despite the great efforts of researchers in this matter, intelligence remains a construct that is

difficult to define. In addition, the chapter has presented an overview of the evolution

of intelligence and intelligence testing, the contribution of scholars in this field and

theories of intelligence. Many researchers believe the identification of mental

retardation was the problem that stimulated Seguin, Esquirol and Binet to develop

psychological tests. Galton and Cattell both had the idea that intelligence would be

expressed in the form of sensitivity of perception, so they used tests to measure this.

In 1905 Binet and Simon prepared the first IQ test which has been the most widely

used test of intelligence in many countries. The need for rapid testing of a large group

of subjects came with the First World War when in 1917 a group of American

psychologists developed the Army Alpha and Beta group test.

Binet's test of intelligence and Spearman's two-factor theory were the starting points for

the theory of intelligence in the twentieth century.

This chapter has also presented the definitions, classification and use of mental tests.

Tests can be classified according to timing, procedure of administration and test

content. In general tests are used for selection, placement and diagnosis purposes.

The problem of culture bias arose when intelligence tests were used in cultures other

than the one for which they were designed. Researchers explored culture free tests

which minimized the use of language, and then they developed the culture fair test in

which test content is familiar to all subjects. Other researchers believed that there is

no such thing as a free or fair test. Issues surrounding the definitions of intelligence

and the differences between intelligence and achievement tests have been covered.

Finally, the chapter discussed the issue of IQ increase with time and evaluated the

reasons behind it.

The next chapter will introduce Libya, the educational system and intelligence testing

in Libya. In addition, the study aims, objectives and rationale will be presented.

Chapter three: RATIONALE AND STATEMENT OF PROBLEM

3.1 Introduction

Libya is a country in northern Africa. The name "Libya" is derived from the Egyptian

term "Libu", which refers to one of the tribes of Berber peoples living west of the

Nile. In Greek this became "Libya", although in ancient Greece the term had a

broader meaning, encompassing all of North Africa west of Egypt, and sometimes

referring to the entire continent of Africa. Bordering the Mediterranean Sea to the

north, Libya lies between Egypt to the east, Sudan to the southeast, Chad and Niger to

the south, and Algeria and Tunisia to the west. With an area of almost 1.8 million

square kilometres (700,000 sq mi), Libya is the fourth largest country in Africa by

area, and the seventeenth largest in the world.

Most of Libya’s people are descended from a mixture of Berbers, the country’s

original inhabitants, and Arabs, who arrived in the 7th century AD. Small numbers of

Berbers still live in the far south of the country. Libyan people are Muslims, and

Islam is the official state religion. Arabic is the official language. The southern

mountains and deserts occupy two thirds of the country, while the remaining third consists of the fertile agricultural plains of the north.

Urbanisation refers to the rise in the proportion of the total population living in urban

areas. Urban population increases: 1) when the number of births exceeds the number of deaths, and 2) when there is migration from rural areas (Yenigul, 2005). Urbanisation as a

phenomenon has been clearly described by Ravbar (1997, p. 70) in these words:

Urbanisation includes all events and changes related to the


consequences of the changed way of life and work. Therefore,
urbanisation by nature represents a very interwoven and complex
process and is dependent on deagrarianisation, industrialisation,

migration, the upward mobility of the population, and the growth of
city function.

Urbanisation is not a new phenomenon in Libyan society, as many old civilisations have, at different periods of time, left their impact on Libya and built towns and large

cities (Kezieri, 1995).

According to the General Authority of Information in the 2006 census, Libya has a

population of about 5.3 million with a growth rate of 1.9 %. One third of the

population is under 15 years of age, and 89.03% is urban. The literacy rate for both sexes (10 years and above) was 88.5% (males 93.7% and females 83.11%). The gap

is narrowing because of increased female school attendance. Nelson (1979)

mentioned that at independence in 1951 the overall literacy rate among the Libyans

over the age of ten years did not exceed 20 percent. By 1977 the overall rate had risen

to 51% (73% males and 31% females). The Libyan economy depends mainly on oil

exports and petrochemical industrial products.

The following section, section two, provides a short description of the education

system in Libya, whilst the third section is concerned with intelligence testing in Libya. The fourth and fifth sections are about the adoption of intelligence tests and the Standard Progressive Matrices (SPM) test respectively. The sixth section highlights

the statement of the problem and study rationale. The seventh, eighth and ninth

sections deal with study aim, research questions and objectives. The final section

presents a summary of the chapter.

3.2 Education System in Libya

A detailed and comprehensive report about the educational system in Libya has been

published at the International Conference on Education in Geneva in 2004. It sets out the general framework of the educational system. Education in Libya is free for all

individuals at all educational levels and compulsory for elementary, preparatory and

secondary school age children (6-15 years). The Ministry of Education supervises the

educational policies, and determines the general guidelines for school curricula, textbooks, and methods of teaching. Preparatory and high schools are segregated by

sex except in rural schools due to lack of school buildings or teachers.

The school year begins in September and ends in May, and classes are held six days

a week, from Saturday to Thursday, from 8:00 a.m. to 1:00 p.m. The school system

in Libya is organised on a twelve-year basis, and is divided into three levels:

1. Elementary education level: this level covers the first six years of study (ages 6-11 years). In the first three years students study courses in Arabic language, religious education, mathematics, drawing and physical education. From grade 4 up to grade 6 (the end of the elementary education level) students study

courses in Arabic language, religious education, mathematics, history,

geography, basic science, drawing and physical education.

2. Preparatory education level: from grade 7 to 9 (ages 12-14 years). At this level

students study courses in Arabic and English language, religious education,

mathematics, science, history, geography, sociology, drawing and physical

education.

From grade 4 up to grade 8, at the end of each school year, students sit an exam to

transfer to the next grade. These exams are prepared by teachers at school level.

At the end of grade 9 students sit for a local exam prepared by a committee of

teachers at the municipality level, to obtain the certificate of preparatory education,

which in turn is required for admission to secondary level. Students must pass this

examination.

3. Secondary education level: covers the period from grade 10 to 12 (ages 15-17 years). Secondary education is divided into four specialities: biology, engineering, social and economic. Depending on their marks in grade 9 and their interests, students are allocated to one of the four specialities. Because of the higher pay and status enjoyed by engineers and medical doctors, more students prefer to choose the science branches.

At the end of grade 12 students sit for the General Secondary Certification Exam, a

centralised national exam. These exams are run by the Ministry of Education and are

prepared by a committee of teachers and inspectors at the national level who

construct the exams for all schools in Libya.

The student's progress depends upon his/her passing the national exams which include

a two- to three-hour written examination in each subject in the final year. The General

Secondary Certification is a prerequisite for admission into university. The grading

system in the final examination depends on the total score across all subjects, as follows: less than 50% fail, 50% to 64% pass, 65% to 74% good, 75% to 84% very good, and 85% and above excellent.
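To make the banding explicit, the mapping from a total percentage to a grade can be sketched as follows (a purely illustrative Python snippet; the function name and the treatment of exact cutoff values are assumptions, not part of the official regulations):

def secondary_grade(total_percent):
    # Grade bands for the General Secondary Certification, as listed above.
    if total_percent < 50:
        return "fail"
    elif total_percent < 65:
        return "pass"
    elif total_percent < 75:
        return "good"
    elif total_percent < 85:
        return "very good"
    return "excellent"

print(secondary_grade(78))  # a student averaging 78% falls in the "very good" band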

Usually, students who successfully finish high school enrol directly in universities, because work opportunities are extremely limited for high school diploma-holders, whereas university graduates have a much better chance of obtaining a job.

The selection of students for universities is done by the Ministry of Education

depending on the secondary speciality chosen; the student is then enrolled in a

suitable faculty at the university. The educational system in universities depends on

the speciality studied. It can be a semester or year system. In addition to undergraduate

studies, postgraduate studies including master and PhD degrees and advanced

diplomas in various specialisation areas are offered (Said lagga et al., 2004).

3.3 Intelligence testing in Libya

During recent decades, due largely to concerted efforts in economic and social

planning, Libya has witnessed considerable expansion in the education sector.

Hundreds of schools have been built, many universities have been established, a great

number of students have studied at home and have ventured further afield into Europe

and other parts of the Western hemisphere to pursue higher education in different fields. In addition, educational policy and administration have been reshaped.

Whilst all of these events have occurred, some areas have not benefited from the

positive effects of development in the field of education. Significantly, to date no

single test of intellectual ability has been officially adopted or developed to be used

for the measurement of intelligence in Libya. Many sectors in Libya use examination

grades as the primary method in determining who should be accepted for study at

various academic establishments and for various jobs in the vocational sector. These

grades were used in some cases as the primary criterion for identifying both gifted and

mentally retarded children and in addition were used for guidance and counseling

purposes. There is no reason, however, to believe that all examination grades have a

direct relationship or correlation with measured intelligence, let alone suffice for guidance and counseling purposes. Although examination grades might be considered a reasonable criterion for such purposes, additional criteria are desirable.

Testing experts feel ambivalent about school based assessment. Such assessments are

not standardised, and criteria vary from school to school and from teacher to teacher

(Heyneman, 1987). Durojaiye (1984) gave one reason for using examination grades

for selection in Africa. Durojaiye believed that school leaving results are more often

used in Africa for selection instead of ability tests because of the shortage of

psychologists in many African countries. The scarcity of trained psychologists in

developing countries makes the adoption of tests from western countries necessary

(Vernon, 1969; Miron, 1977; Majdub, 2004).

Owing to the lack of any prevailing local intelligence tests, researchers have

historically sought to do their research or projects using personality tests such as

Sentence Completion, Thematic Apperception Test (TAT), or other projective tests

because there were some colloquial Egyptian Arabic translations and adaptations for

these types of test. Students viewed personality tests as easier to administer and

interpret than intelligence tests (Attashani and Abdalla 2005). After graduation most

of these students became teachers at secondary schools, and a few of them act as psychologists even though they are not qualified. Although they had theoretical knowledge

about psychology and psychological testing, they did not have access to a wide range

of intelligence tests, due to a shortage of viable local options.

Mahdawi and Al-Roey (1991), in their study of the mental health program in Libya, mentioned that the mental health services suffered from a shortage of staff,

psychological services and a lack of facilities. They concluded that as the main

problem seemed to be manpower shortage, special efforts should be taken to train

more health personnel and community members such as teachers to deliver

psychological and psychiatric services. From the above, it would appear

that at present the academic system in Libya fails to provide what is essential and

necessary for Libyan psychologists and researchers, especially in the area of

psychological testing.

Kline (1979) pointed out that intelligence was a variable which is important and has a

definite meaning to Western people. However, the general public in Libya knows

little about the usefulness, purposes, or functions of intelligence and aptitude tests.

For some people IQ testing is something associated with psychological or mental testing. This may point towards a stigma attached to this type of testing, which could be indicative of a cultural and social perception.

Psychologists have taken many precautions in developing tests, but there has been

widespread misuse and misunderstanding in developing countries. Thus many people

have misgivings about tests and their use in decision-making. Part of this

misunderstanding could be attributed to the inadequate knowledge of tests by the

people who use them. Alexopoulos (1979) noted that misuse of tests may cause harm

to the testing movement.

Many researchers have studied the problems of misuse of test scores or use of

incomplete test scores for selection and prediction purposes. For example, Parmar

(1989) in India found that the information subtest of Wechsler Intelligence Scale for

Children-Revised (WISC-R) is simply deleted when testing Indian subjects and this

scale was not considered when computing IQ scores. He concluded that the use of the

incomplete test was likely to bias predictions based on test results and had serious

negative implications for educational or clinical decisions.

Georgas & Georgas (1972) in their study of the use and misuse of intelligence tests in

Greece argued that the use of incomplete test scores for estimation of mental ability

might result in invalid assessment, leading to grave consequences on the lives of

individuals. Bertrand and Cebula (1980) believed that tests in themselves are not bad

and do not hurt children. However, they become bad only in the hands of those who

administer and interpret them poorly.

Sattler (1982, p.4) concluded that intelligence tests are tools which may be useful in

accomplishing goals, and their effectiveness will depend on the skill and knowledge

of the psychologist. He stated:

When they are used wisely and cautiously, they will assist us in helping
children, parents, teachers and other professionals obtain valuable
insight. When used inappropriately, they may mislead and cause harm
and grief.

It is interesting to note that the first IQ test (the Binet-Simon Scale) was constructed

in France in 1905 to help identify mentally retarded children who did not

profit from regular classroom instruction. Failure to achieve a good assessment for

the mental ability of a retarded child at an early age makes the problem worse in the future, especially for purposes like special education or rehabilitation programs. It is

believed that the misuse of intelligence tests led to inaccurate prediction,

misplacement, and inappropriate treatment of children. Tests for such purposes

should be well standardised for the local population; they also have to be reliable,

valid and used by experts only.

Other areas that have been affected by the lack of intelligence tests in Libya include the

selection of students for different educational programs (e.g. gifted and special needs

programs). Intelligence tests play an important role in the educational and economic

system of a society because they prevent waste of human resources due to

misplacement of abilities or interests (Attashani & Abdalla 2005). It is believed that

failure to allocate students according to their abilities and interests deprives the country of one of its most valuable resources. In addition, this also has an adverse

effect on business and commerce where employees scoring well in tests might not

necessarily possess the attributes to perform the job effectively.

In Libya today, a relevant and accurate selection procedure is required more than ever

before, not only in the field of education but also at the intermediate level of training

for skilled manpower. Indeed, a clear failing of the current system could be seen

whereby many university graduates were posted to office work which could be done

by less qualified people (Attashani and Abdalla 2005).

Durojaiye (1984) believed that selection of students for educational purposes is very

necessary in most developing countries in Africa. This is because secondary and

university education is not compulsory, and a large number of students aspire to the

few places in the limited number of schools and universities. He stated that, for this reason,

the best testing apparatus had to be devised for selecting students who will benefit

from their education and later meet the high demand for manpower requirements of

these developing countries.

Jensen (1981, p.19) believed that using standardised tests for selection was necessary

and unavoidable when the number of applicants for university far exceeds the number

that can be enrolled. He stated:

Results of standardised test are unquestionably better for making direct


comparisons between applicants than any other means of selection, and
they can add substantially to the accuracy of prediction of applicant's
future performance.

The problem of adapting intelligence tests to a new setting was by no means

uncommon as this was a general problem for many developing countries in the past.

In addition, if the aim was to assess the mental ability of people in a culture that has

yet to develop its own testing scheme or system, it was necessary to assess what was

important in and for that culture (Brislin and Thorndike, 1973). Ortar (1972), for

example, mentioned that most countries did not produce their own psychological tests

and had to adapt and modify instruments developed elsewhere to make them suitable

for local subjects.

Schwarz & Krug (1972, p.3) in their book about ability testing in developing

countries pointed out that educators and researchers in developing countries held

widely divergent views about test adaptation. They stated:

At one extreme there are those who look mainly at the vast
environmental differences between the developing countries and the
highly industrialised nation, and conclude that any test designed for
one ipso facto can not serve the other. At the other extreme there are
those who attach greater importance to the fact that the skills needed
in both developed and developing countries are exactly the same, and
who fear that "simplified" tests will hamper them in producing
equally high levels of skill in their own population.

Schwarz & Krug concluded that neither view was correct because one view would

exclude all classic testing procedures from use in developing countries, since they

were designed in and for the Western culture, and the other view would oppose the

use of anything else, since this would be a tacit acceptance of lower performance

standards.

3.4 Adoption of intelligence tests

In this regard, Ezeilo (1978) suggested that African researchers and psychologists

might use one of three approaches:

1. Design their own test for the local environment; this involves a great deal of time and effort.

2. Modify a widely used international test by introducing some changes in its items, then standardize it and obtain local norms.

3. Use an international culture-free test after standardization and the establishment of local norms.

The third choice was the most frequently applicable in the field of the measurement of

mental abilities and personality traits. It required less time and effort than the first two

alternatives. Therefore, this approach was applied in this study. The Raven's

Progressive Matrices test was employed because it has been widely used and enjoys

moderately high indices of validity and reliability when used in a wide range of

cultures.

Kline (1979) concluded that for cross-cultural test construction it was best to use

one’s knowledge and experience of the culture as a guideline to writing items, and

retain those that show themselves to be criteria-based or valid in factor analysis. Such

tests enable cross-cultural psychologists to elucidate environmental factors

influencing major ability factors. This was one of the stated aims of cross-cultural

psychologists.

Raven's Progressive Matrices test is an example of a culture-fair test that has been

used in cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and Murphy and Davidshofer (1991) held that Raven's Progressive Matrices was one of

the most widely used intelligence or ability tests in cross-cultural research.

3.5 Standard Progressive Matrices (SPM) test

The present study investigated intelligence tests with special interest in the British

mental ability test- the “Raven's Standard Progressive Matrices (SPM)”- as a measure

of general ability. It consists of 60 problems in 5 sets of 12. The tests are called

progressive because each problem within a set, and each successive set, is progressively more difficult. Each problem consists of a geometric design with a missing piece; the

respondent selects the missing piece from six or eight choices given (Domino and

Domino, 2006). A more extensive description of the SPM test shall be given in the

next chapter.

The SPM test was selected because it has been regarded not only by its author, but

also by many researchers (e.g. Burke, 1958; Anastasi, 1988; Raven, 1989; Carpenter

et al., 1990; Arthur & Woehr, 1993; and Arthur & Day, 1994) as a useful non-verbal

measure of ability which was easy to administer and score. It is a group test, which

can be used with subjects of all language backgrounds and does not depend to any

large extent upon education or prior knowledge of the subjects. In addition, it is

suitable for all ages from the age of 6 years upwards.

The Progressive Matrices (RPM; Raven, Raven & Court, 2000; Lynn & Vanhanen, 2006) is the most widely used test of intelligence in numerous countries throughout

the world. One reason for the popularity of the test was that it is non-verbal and can

therefore be applied cross-culturally, while verbal tests are more culture specific and

preclude cross-cultural comparisons. Another reason for the popularity of the test is

that it was considered to be the best test of g, the general factor present in all cognitive

tasks that was first identified by Spearman (1904) and which was largely a measure of

reasoning ability (e.g. Carroll, 1993; Jensen, 1998; McGrew and Flanagan, 1998). The

test was constructed by Raven (1939) and consisted of a series of 5 or 7 designs that

progressed according to some rule. The problem was to identify the rule and

extrapolate it further. Testees were given 6 or 8 alternatives for this further

extrapolation and had to select the correct one. Items were scored either right or

wrong. A participant’s score was the number of right answers. Maximum possible

score was 60. The right answers were provided in the SPM manual.
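In computational terms, scoring is simply a count of answers that match the key. A minimal sketch in Python (the item labels and key values below are invented for illustration; the real key appears only in the SPM manual):

def score_spm(responses, key):
    # One point per item answered correctly; the maximum raw score is 60.
    return sum(1 for item, answer in responses.items() if key.get(item) == answer)

key = {"A1": 4, "A2": 5, "A3": 1}          # hypothetical correct answers
responses = {"A1": 4, "A2": 3, "A3": 1}    # hypothetical answer sheet
print(score_spm(responses, key))           # prints 2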

The Raven’s Standard Progressive Matrices (SPM) test was constructed to measure

the educative component of g as defined in Spearman‘s theory of cognitive ability

(Raven & Court, 1998, updated 2003). Kaplan and Saccuzzo (1997) stated that

research supported the Raven Progressive Matrices (RPM) as a measure of general

intelligence, or Spearman’s g factor. In fact, the Raven may be the best available

single measure of g.

In the same vein, Jensen (1998) maintained that in numerous factor analyses, the

Raven tests, when compared with many others, had the highest g loading and the

lowest loadings on any of the group factors. The total variance of Raven scores in fact

comprised virtually nothing besides g and random measurement error. He also added

that Raven’s Progressive Matrices was often used as a “marker” test of Spearman’s g.

That is, if it was entered into a factor analysis with other tests of unknown factor

composition, and if the Matrices had a high loading on the general factor of the matrix

of unknown tests, its g loading served as a standard by which the g loadings of the

other tests in the battery could be evaluated.

By the same token, Lynn, et al. (2004) stated that the Progressive Matrices was

widely regarded as the best test of abstract or nonverbal reasoning ability, and this

itself was widely regarded as the essence of “fluid intelligence” and of Spearman’s g.

Mackintosh (1996) had described it as the paradigm test of non-verbal, abstract

reasoning ability.

This view is not, of course, universally accepted. Indeed, Raven and Court (2000)

referred to several studies which emphasised a spatial ability loading, and a review of

the extensive literature dealing with this topic from the point of view of researchers

keen to distinguish “Working Memory” from “g” was provided by Ackerman, Beier,

and Boyle (2002).

Court & Raven (1995), Kline (2000), and Murphy & Davidshofer (1998) noted that the

Standard Progressive Matrices test enjoyed good psychometric characteristics.

Gregory (1992) also noted that a huge body of published research has shown the

validity of this test. Therefore, as Irvine & Berry (1988) noted, it has gained

widespread acceptance and use in many countries around the world. No other test has been so extensively used in cross-cultural studies of intelligence. Lynn and Vanhanen

(2002) summarized an extensive number of studies based on normative data for the test

which had been collected in 61 countries. For all these reasons, Kaplan and Saccuzzo

(1997) concluded that with its new worldwide norms and updated test manual, the

Raven was regarded as one of the major authorities in the psychological testing field

in the 21st century.

Some tests seemed to be more appropriate than others for use with literate children

and adults in developing countries. For example, at middle primary level there was

the Raven's Coloured Progressive Matrices (CPM) test. From the age of eight years

upwards there was Raven's Standard Progressive Matrices (SPM) test (Ord, 1972).

The Progressive Matrices tests (Standard, Coloured, and Advanced) were the best

known and most widely used as measures of individual differences in cognitive ability

and as culture-reduced tests (Powers et al., 1986a; DeShon et al., 1995). According

to Thorndike and Hagen (1977) and Ogunlade (1978) the SPM test's freedom from

language and apparently limited dependence on cultural variables had made it a

popular instrument for use in developing countries.

Jensen (1980, p.648) examined the usefulness of the SPM test and made the following

observations:

Because the Raven Progressive Matrices is an excellent culture-reduced


measure of fluid g, one of its chief values is for screening illiterate,
semiliterate, bilingual, and otherwise educationally disadvantaged or
socially depressed populations for potential academic talent that might
easily remain undetected by parents and teachers or by the more
conventional culture loaded tests of scholastic aptitude. It is probably the
surest instrument we now possess for discovering intellectually gifted
children from disadvantaged background....

Due to all of the abovementioned advantages of the SPM (its wide use as a measure of general ability, its good psychometric characteristics, and its suitability for cross-cultural testing), the researcher chose the SPM test as the measure of mental ability for the Libyan sample in the present study.

3.6 Statement of problem and study rationale

Measuring mental ability accurately and objectively has been a major concern of

researchers and psychologists in many countries since the beginning of the

psychological testing movement begun by Binet in 1905. In Libya, as we have noted

previously, there is no valid or reliable instrument available to meet the researchers'

needs by providing a sound assessment of intelligence and this is a gap worth closing.

Libya, as a developing country which has no single standardized test to measure

ability, has to adopt one intelligence test which is suitable for the measurement of the

mental abilities of a Libyan sample.

Thus, in summary, the problem is related to the adoption of an appropriate Western instrument suitable for the measurement of general intelligence in a Libyan setting, where no single test of intelligence has been officially adapted or developed to allow better judgment and evaluation of Libyan samples.

Differences in intelligence scores for different groups were considered important, in

part, since tests were statistically structured to distinguish between individuals, and

groups, because groups were aggregates of individuals. Intelligence tests were

designed carefully and deliberately to produce score variance (Wesson, 2000). The

generation of a broad range of individual scores permitted psychologists to acquire

knowledge and make judgments about, between, and within group differences. This

knowledge allowed for the interpretation of the distribution of scores that led to

various decisions (e.g., eligibility for placement in special education and gifted

programs) (Yoon, 2006).

Not much is known of the intelligence of populations of North Africa (Lynn and

Vanhanen, 2002, 2006). Libya as a developing country faces the same problems

that have been and are being faced by many of its Arab neighbors. It lacks a well-established infrastructure to support key sectors like

education. Although many Libyan students have graduated from educational psychology programs and received some theoretical knowledge about intelligence and personality tests during their university study, there is still a lack of intelligence test adaptation or development in Libya, which is mainly due to a lack of test expertise.

Psychologists and researchers concerned with educational and psychological issues in Libya lack knowledge about IQ tests among the population in general. There is

no perfect translation of the verbal items of the Stanford-Binet or WISC-R tests

currently in use in Libya. Therefore, no standardization or norms have been obtained

to suit Libyan samples. All of this creates misuse, misunderstanding and unwise application of the few intelligence tests which are available in Libya and which have been used there during the past years (Mahdawi and Al-Roey, 1991; Attashani and Abdalla, 2005).

Abdalla (2002) noted that in 1988, during his work as an educational psychologist at the Massa Institution for Mentally Retarded Children in Libya, he translated, with little modification, and administered the short Form (L-M) of the Stanford-Binet Intelligence Scale from English into colloquial Libyan Arabic, in order to measure the mental ability of retarded and normal children aged 6 to 12 years. The project failed because

the sample was too small (N=54), the test required too much time to administer and

score, and there were no test experts to analyse the data, which were mainly verbal.

Furthermore, such standardisation for an individual test like the Stanford-Binet can be

done only through professional organisations which have a great deal of time, effort and

money. These findings prompted the researcher to study and use the Raven's Standard

Progressive Matrices (SPM) test, as a tool to measure mental ability in the present

study to avoid the Stanford-Binet problems and difficulties.

Lack of intelligence test adaptation or development and the misuse of the few tests

available now in Libya created problems in the areas of mental measurement and

school selection. One of the major problems facing Libyan psychology researchers

now is the lack of accurate measurement of mental abilities. This type of

measurement in Libya had been affected by the lack of adapted or developed

intelligence tests. For example, only a few institutions such as the Benghazi Children's

Hospital or the Tripoli Centre for Mentally Retarded Children were currently using

some items, but not the whole test, from the Stanford Binet Intelligence Scale or from

the Wechsler Intelligence Scale for Children-Revised (WISC-R) for the measurement

of intelligence.

Unfortunately these test items were used in these institutions without suitable

modification and adaptation to estimate some aspects of mental ability of the children

who were referred by parents or schools for diagnosis or treatments. It is clear that

such methods of assessment may have limited the application of test results or led to

wrong classification of a child's mental ability. Again, this appeared to point to a lack

of understanding about these tests based upon a lack of knowledge in their application

and how to adapt such tests to suit the intended target groups.

At the Second Family Conference in Beida city in May 1991, the problems of testing

of children with special needs were discussed in a paper presented by Abdalla. One

of the recommendations was to stop testing and labelling deaf and mentally retarded

children according to scores obtained from incomplete and unstandardised

intelligence tests. Shelley and Cohen (1986) stated that attaching numbers to people

is not hard; attaching "meaningful" numbers is very problematic.

Previous studies that used the SPM in Libya include those of Aboujaafer in 1983, Majdub in 1991, Attashani and Abdalla in 2005, and Ahlam in 2005. These studies were carried out without prior standardization of the test. The present study

carried out the necessary standardization. Standardization of a test means obtaining
average scores and distributions from a representative population (Kline, 2000).
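As an illustration of what obtaining such norms involves, the sketch below (Python; the scores, age bands and chosen percentile points are invented and are not the procedure or data used in this study) computes a simple norm table for each age band:

import numpy as np

# Hypothetical raw SPM scores grouped by age band (invented numbers).
raw_scores_by_age = {
    "8-9":   [18, 22, 25, 27, 30, 33, 35],
    "10-11": [24, 28, 31, 33, 36, 40, 43],
}

# A norm table records, for each age band, the mean, standard deviation and
# selected percentile points of the score distribution.
for age_band, scores in raw_scores_by_age.items():
    scores = np.array(scores)
    p25, p50, p75 = np.percentile(scores, [25, 50, 75])
    print(f"{age_band}: mean={scores.mean():.1f}, sd={scores.std(ddof=1):.1f}, "
          f"P25={p25:.0f}, P50={p50:.0f}, P75={p75:.0f}")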

This study responded to the lack of psychological testing in Libya, particularly in

mental testing. Thus, the main purpose was to develop norms for the classic form of the Standard Progressive Matrices (SPM) in Libya and to find out the distribution of IQ scores within a Libyan setting. Norms for this group were compared to norms from other countries, and a meta-analysis was carried out to investigate whether significant differences exist in Raven's Standard Progressive Matrices test scores between developed countries (e.g. the UK) and developing countries (e.g. Libya), according to

their age, sex and regions. This was done to examine the conclusion advanced by

Lynn (2006) that average scores are somewhat lower in economically developing

nations than in the economically developed nations of Europe and North America.

This study determined the psychometric characteristics (validity, reliability, and item analysis in terms of difficulty and discrimination levels) of the Raven's Standard Progressive Matrices (SPM) test in a Libyan setting and computed the percentile ranks for SPM test scores according to sample age levels (i.e. standardization of the Raven's Standard Progressive Matrices (SPM) test with a Libyan sample).
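For readers unfamiliar with classical item analysis, the two item-level indices mentioned here can be sketched as follows (Python; the 0/1 response matrix is invented, and item-total correlation is used as one common, though not the only, discrimination index):

import numpy as np

# Hypothetical 0/1 response matrix: rows are examinees, columns are items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

# Item difficulty: proportion answering each item correctly (higher = easier).
difficulty = responses.mean(axis=0)

# Item discrimination (one common form): correlation between each item and
# the total score on the remaining items.
totals = responses.sum(axis=1)
discrimination = [
    np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
]

print("difficulty:", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))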

The last century saw the rise of successful means of measurement, in testing in general and intelligence testing in particular. Group standardised tests have

come to the fore together with individual tests, practical tests, written tests and verbal

and non-verbal tests. Measuring intelligence as a general intellectual ability has been

taken into account by psychologists since the beginning of educational and

psychological measurement (Mohammad, 1984).

Attashani and Abdalla (2005) mentioned that in 1905 Alfred Binet in collaboration

with Simon in France constructed the first intelligence test and improved versions

came out in 1908 and 1911. This was when intelligence measurements found their

way into many countries and were being widely used for many purposes e.g.

intellectual ability measurement, educational guidance, educational selection,

educational diagnosis, vocational guidance, vocational selection, intellectual

weakness diagnosis and helping in the decision making process. These types of

measurement were critical and provided many benefits, especially in countries that believe in such measurements.

There remained no serious doubt about the potential usefulness of testing procedures

for purposes of educational and occupational selection in developing countries.

Whether tests would be adapted and how they were best applied were no longer major

issues; likewise whether such tests needed to be culture free or culture fair. Major

issues centred on such matters as the long-term validity of selection measures; the

prospects for further, as yet relatively untried, measures, as part of the selection

procedures; the education of more precise information on how moderator variables

may be operating in the selection situation; the possibility that of adopting more

efficient strategies of selection than traditional ones from the viewpoint of fitting the

job to the man as well as the man to job; and, perhaps most important of all, the

means of building locally appropriate, efficient selection institutions that would prove

viable (Ord, 1972).

Intelligence tests have been used in many areas in both the USA and the UK. The results have been used in making decisions about entry to schools, colleges, and universities, and about acceptance for work or job opportunities. In France, intelligence measurements

have been used for vocational guidance and psychological diagnosis. In the USSR,

intelligence tests have been used in the educational sector as well as in vocational

guidance (Mikhaeel, 1995).

The measurement and utilization of intelligence, however, quite appropriately

deserved primacy within any culture, for the wealth of any nation (developed, developing, or “primitive”) is the ability of its people. Once properly identified as having the requisite abilities for differential placement, each person can then conceivably contribute more to the health, well-being and productivity of his country (Brislin and Thorndike, 1973). It is axiomatic that the great nations have become great, industrial, and prosperous because mental energies were tapped (Brislin and Thorndike, 1973).

Since most developing countries were keen to make use of these tests, and since they

did not have sufficient scientific and technical abilities to help them design suitable

cultural tests, they opted for a standardising process. They were in need of different

tests of this type to satisfy the needs of human and social development plans, which

were usually adopted in these countries. To reach such goals, they needed to apply

these tests, and to conduct scientific research on these tests that represented part of the

scientific research in the fields of educational, intellectual and psychological

measurement. They needed to do such research in order to adjust these tests to their

societies, and to help them reach an appropriate interpretation for the score that a

person who sits for the tests achieves (Kamil, 2004).

In this respect, the importance of standardising these tests and measurements has come increasingly to the fore. This was reflected in the interest of developed

countries in designing and standardising these tests and using them in different life

sectors, such as educational, health and other institutions. Moreover, there were now

specialised institutions which dealt exclusively in designing and standardising these

tests (Attashani and Abdalla, 2005).

Needless to say, intelligence tests are mainly used in the educational sector. They are helpful in indicating whether students in a class or school have learnt at the level expected of them, and also help teachers to predict what students can

achieve (Alwakfi, 1998). Generally there was a need for intelligence tests to discover

talented individuals. Such students do not differ in appearance from other students.

Unless such tests are conducted, these students have no chance of being recognised

(Rajha, 1970).

There were also many other contributions of testing to society, such as better

distribution of educational and professional opportunities based upon merit and good

judgment, not on luck or personal judgment. Alexopoulos (1979, p.18) in his research

into standardization of the Wechsler Intelligence Scale in Greece, mentioned the help

that IQ tests could provide. He concluded that

We can imagine what could happen if there were no tests available


and the whole system was based on personal judgment, the social
position of the examinee etc., as is often the case with
underdeveloped or even developing countries, where the whole
system is not based on merit but on social position, acquaintance
with other persons of higher social or political standing. Thus tests
can help to create a society based on merit and equal opportunity to
all members of society

Eells et al. (1971), Drenth (1972), Miron (1977), Drenth et al. (1979) and Heyneman (1987) argued that testing has contributed to more effective use of manpower, a more equal distribution of educational and professional opportunities, and the identification of talents that might otherwise remain hidden.

These comprehensive tests, which distinguish skilled young students from others, were widely used, to such an extent that in some studies their scores were treated as a scale of a student's or child's skills. Tests of intellectual skills in particular were used to distinguish students with special skills in science, the arts, or other areas such as human relations. They also helped to identify students with special skills and high intelligence (Shafile).

In Libya we can use these intelligence tests to recognise the intellectual abilities of

students. Depending on the test results, students with high or low scores can receive

the appropriate attention and assistance.

Zahran (1990) identified the importance of intelligence tests in particular for children

who may be classified according to their levels. Majdub (1991, p.215), who studied the academic achievement of two groups of students from Tripoli University, concluded

that

Psychological tests are seriously neglected in Libya to a serious


extent. Tests of very important psychological variables have not
been standardized or introduced to Libyan society. In fact the
Standard Progressive Matrices test and the other psychological scales
are introduced to the Libyan culture for the first time

The research found that Libya is now in urgent need, more so than at any other time,

of an intellectual test to be used in selecting students and allocating them to colleges and universities. In Libya, we do not need a large number of graduates: rather, we

require a greater number of vocational students.

It goes without saying that the proper use of mental and other ability tests and measurements

within the local environment would provide the indigenous local market with

workers, especially when they are classified according to their skills. Issawi (1973)

found that these tests were widely used in filling vacant posts and in choosing the best

person for the best place (vocational, industrial, or even the military sector).

Attashani and Abdalla (2005) stated that it was harmful for the country's economy to select a person for a job that did not match his or her intellectual abilities.

Heyneman (1987, p.251) pointed out the importance of educational selection to the

economic performance of developing countries. He stated

In a competitive international environment, not choosing one's


technical elite from among the brightest citizens can have a grave
effect on economic performance. By one estimate, developing
countries could improve their Gross National Product per Capita by
5% if they were to base leadership upon merit

Abdalla (2002) mentioned that for school selection, in many western countries, it is

customary to give both intelligence and achievement tests. Many studies in

developing and western countries (for example, Sinha, 1968; Rao, 1974; Maqsude, 1980 and 1983; Carver, 1990; Andrich and Styles, 1994) made use of intelligence tests, especially Raven's Standard Progressive Matrices (SPM) test, for school selection and prediction.

Depending solely on students' grades in the final year of secondary school to decide admission to Libyan universities may lead to mistakes. Using psychological tests in conjunction with secondary school grades could minimize two principal errors: admitting students who might fail at university and rejecting students who might succeed (Majdub, 1991).

The study highlights the following aspects also:

• This study is considered to be the first attempt to standardize Raven’s


Standard Progressive Matrices (SPM) test for a sample from Libya. Majdub

(1991) reported that psychological tests are seriously neglected in Libya. They

have not been standardized or introduced to the Libyan society. Lynn and

Vanhanen (2006) stated that not much is known of the intelligence of the

populations of North Africa.

• Providing norms for the SPM test for use, in conjunction with examination grades, to help the authorities make appropriate decisions about the future of individuals and to guide them to educational programs that suit their abilities, and for use in job selection to match applicants to suitable employment. Many sectors in Libya use examination grades alone as the method for matching students to academic establishments and applicants to jobs in the vocational sector. Attashani and Abdalla (2005) mentioned that no single test of intellectual ability or aptitude has been officially adapted or developed for intelligence or aptitude measurement in Libya.

• Providing the means to estimate levels of intelligence, since our society lacks such tests, so as to be able to recognize both high and low IQ in the society.

• To study differences in level of intelligence between the sexes, age groups and different locations such as rural and urban areas.

From the above mentioned points and in view of the present situation in Libya it is

clear that there is a great demand and need for adapting at least one test in each of the

following areas: intelligence, aptitude, vocational interests, and personality to provide

researchers, psychologists and policy makers with effective tests for evaluation,

selection, and diagnostic purposes. For a developing country like Libya such tests

which give accurate measures of intelligence, achievement and personality are crucial

in the future development of its students and workforce alike.

3.7 Study aim

To develop norms for the classic form of the Standard Progressive Matrices (SPM)

test in Libya and to identify the distribution of IQ scores within a sample of Libyan

students.

3.8 Research Question

“What are the norms for a Libyan sample when the SPM test is applied as an

appropriate measure of mental ability?”

3.9 Research objectives

1. To determine psychometric characteristics (reliability, validity, difficulty and

discrimination) of the SPM test when applied to a Libyan sample.

2. To study the relationship between SPM mean scores and student’s academic

achievement (SAA) for a Libyan sample aged 8 – 21 years.

3. To investigate the presence of significant differences in sample

performances on the SPM test according to gender, region (cities and

villages), academic discipline (science and arts), geographical areas (main

city, secondary city, coastal, mountain and desert), age and study levels.

4. To investigate the presence of significant differences in sample performance

on the SPM test according to region and gender, age and region, region and

study levels, geographic areas and gender, academic discipline and gender,

age and gender and age and academic discipline.

5. To investigate variability of SPM mean scores by gender based on age, by gender based on geographic area, and by gender based on academic discipline.

6. To examine the contribution of the independent variables gender, age, region and academic achievement in predicting SPM scores.

7. To compute the percentile ranks for the SPM scores according to the sample age levels (a brief illustrative sketch follows this list).

8. To compare performance on the SPM test for a Libyan sample with that of other countries (developed and developing countries).
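As a purely illustrative, hypothetical sketch of objective 7, the following Python fragment shows one common way percentile ranks could be computed for raw SPM scores within each age group; the data, variable names and the particular percentile-rank definition are assumptions for illustration, not the procedure or results of this study.

# Hypothetical sketch: percentile ranks of raw SPM scores within age groups.
# The records and the percentile-rank definition are invented for illustration only.
from collections import defaultdict

def percentile_rank(scores, x):
    # Percentage of scores in the group that fall at or below x.
    ordered = sorted(scores)
    at_or_below = sum(1 for s in ordered if s <= x)
    return 100.0 * at_or_below / len(ordered)

# (age, raw SPM score out of 60) -- invented records
sample = [(9, 32), (9, 41), (9, 27), (10, 44), (10, 38), (10, 50), (10, 35)]

by_age = defaultdict(list)
for age, score in sample:
    by_age[age].append(score)

for age, scores in sorted(by_age.items()):
    for s in sorted(set(scores)):
        print(f"age {age}: raw score {s} -> percentile rank {percentile_rank(scores, s):.0f}")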

3.10 Chapter Summary

Libya has witnessed extensive improvement in the education sector. Nevertheless, no single test of mental ability has been officially constructed or adopted for the measurement of intelligence in a Libyan setting. The lack of use of intelligence tests in Libya is mainly due to a shortage of test experts and to limited information and knowledge regarding the usefulness and effectiveness of these tests among the people directly affected by testing.

The lack, and at times misuse, of intelligence tests to estimate mental ability has detrimental implications and leads to the wrong prediction, placement and treatment of the students who undergo testing. Guidance, counselling and direction of students towards universities and colleges, and of personnel to various types of jobs, have also been affected by the absence and misuse of intelligence and personality tests. Intelligence tests are considered important and vital to the educational and economic systems of a society.

The present study tries to remedy the above problems. It is an attempt to provide an intelligence test that best suits a Libyan setting. It investigates the performance of a Libyan sample on the Standard Progressive Matrices test and explores its applicability as an appropriate measure of mental ability.

The focus of the study was to standardize the British mental ability test, Raven's Standard Progressive Matrices (SPM), on a sample of school and university students (8 to 21 years) from the eastern province of Libya. The study aims to develop norms for the classic form of the SPM test and to identify the distribution of IQ scores among Libyan students.

In the next chapter we give a complete description of the SPM test. This will mainly include past studies along with their findings. A detailed review of the available literature, with critical analysis, will also be presented.

Chapter four: REVIEW OF STANDARD PROGRESSIVE MATRICES LITERATURE

4.1 Introduction

The aim of this study was to develop norms for the classical form of the Standard

Progressive Matrices (SPM) test and identify the distribution of IQ scores for a

sample of Libyan students. This chapter presents the review in detail and sheds light on prominent studies that have extensively employed the SPM test and on related subjects.

To achieve the desired aim, a comprehensive review was undertaken to identify and

appraise the available literature that described psychological and mental testing.

Greater emphasis was on the SPM test in particular. Studies in this review were

identified through an electronic search of databases such as PsycINFO, American

Psychological Association (APA), American Educational Research Association

(AERA), Educational Testing Association (ETS), National Council on Measurement

of Education (NCME), Educational Resources Information Centre (ERIC), Ingenta,

Web of Science, Dissertation Abstracts, the British Index to Theses, and Cambridge

Scientific Abstracts. In addition, the following active researchers in the field were

contacted: John Raven, Richard Lynn, Ahmed Abdal-Khalek and Omar Khelefeeh.

The earliest article published on SPM testing dated back to the year 1948. The first

step in the searching process was the identification of key concepts and location of

appropriate references. Key words used to locate relevant articles included:

standardization, intelligence testing, SPM test, validity, reliability and meta-analysis.

Data were extracted using the following categories: author, country, year of

publication, population sampled, age, SPM means and standard deviations, and sample size. Many papers published between 1948 and 2009 were identified and subsequently critically appraised.

In addition, the SPM manuals of 1988, 1996, 2000, 2003, 2004 and 2008 were added to these papers and were utilized in this study (Raven et al., 1988; 1996; 2000; 2003; 2004; 2008).

This chapter has been divided into nine sections. The first section provides general

information regarding the Progressive Matrices tests. The second section describes the

SPM test. The third section talks about reporting SPM results. Section four deals with

standardization of the SPM test. Sections five, six and seven investigate reliability,

validity and item analysis of the SPM test. Section eight briefly reviews relevant previous studies which have employed the SPM test. Finally, section nine gives a summary of the main issues discussed in this chapter.

4.2 Progressive Matrices Tests

The Progressive Matrices Tests resulted from the work of the British psychologist

John C. Raven and geneticist Lionel Penrose. It was first published in 1938. Their

work was based on Spearman’s two-factor theory. In fact, the Progressive Matrices

tests are among very few tests which are based on a theory of intelligence (Raven,

2004).

Sinha (1950), a student of Cyril Burt, claimed that the Progressive Matrices tests were

not an original idea of Raven’s, as was often thought. He argued that they were

developed slowly out of the non-verbal analogy test constructed by Burt. Burke

(1958) also attributes the origins of the Progressive Matrices to the work and thinking

of Burt, Spearman and their students.

Spearman (1946) reported that the measurement of the “g” factor had been achieved

by the use of the Matrices test. He went further by considering the Progressive

Matrices test as the best of all nonverbal tests of “g”. Anastasi and Urbina (1997)

stated that Raven Progressive Matrices and vocabulary test were developed to

evaluate the two components of “g”; eductive ability and reproductive ability.

Eductive ability, on one hand, is mostly a nonverbal ability measured by the matrices.

On the other hand, reproductive ability is mostly verbal and measured by vocabulary

tests.

Lewis (1974) wrote that the Progressive Matrices test was a test of reasoning, based

on non-verbal data. Items were devised especially to evaluate the ability to perceive

relation and so provide, in combination, a measure of “g” factor.

Murphy and Davidshofer (1991) noted that a number of factor analyses of Raven’s

Progressive Matrices suggested that Spearman’s “g” is the only variable that is

reliably measured by the test. Little evidence can be drawn to indicate any significant

effects of spatial visualization or perceptual ability on the test scores. Carpenter et al.,

(1990, p.404) described the Progressive Matrices as a non-verbal measure of analytic intelligence. They said:

Analytic intelligence refers to the ability to deal with novelty, to adapt


one's thinking to a new cognitive problem. It is the ability to reason
and solve problems involving new information, without relying
extensively on an explicit base of declarative knowledge derived from
either schooling or previous experience.

Powers et al. (1986a) pointed out that the Progressive Matrices were designed to measure an individual's nonverbal mental ability through the assessment of abstract reasoning, or the ability to perceive and apply relationships.

According to the 2004 SPM manual, Raven published the first version of the SPM test

in 1938. The current version of the SPM test is essentially the same. In 1947, small

adjustments to item (B.8) were made to improve the absolute order of difficulty.

Progressive Matrices are available in three forms with increasing difficulty:

a) Standard Progressive Matrices (SPM) test for use with individuals over six

years of age, within the normal adult range of ability. The SPM, first published in 1938, is the most widely used form of the Progressive Matrices tests.

b) Coloured Progressive Matrices (CPM) test was developed for use with

children aged five to eleven, the elderly, and the mentally retarded.

c) Advanced Progressive Matrices (APM) test sets I and II for individuals above

eleven years of age with average or higher intellectual ability.

The CPM and APM tests were both published in 1947 for the first time. All three

tests were designed to be used in association with a vocabulary scale. This is such that

verbal ability can be measured when required. There are two versions of the

vocabulary scales according to age; the Crichton Vocabulary Scale for children and

the Mill Hill Vocabulary scale for adults. The latter is available in senior and junior

forms (Court, 1983 and Raven, 1989).

The SPM test was adopted as the basic intelligence test by the USA Army and Navy

personnel selection departments in 1941. It was the main test for military

classification in Great Britain. It was utilized to ensure that recruits of normal intelligence were not rejected due to poor education. Before the end of the Second World War, it

had been already applied to several millions of recruits (Vernon, 1960; Cronbach,

1970).

In addition to the above characteristics, the Raven Progressive Matrices test is probably one of the most widely used culture-fair tests. Raven et al. (1996) mentioned that the SPM test came to be used internationally for comparative purposes, and that no general revision of it has appeared necessary.

4.3 Description of the SPM Test

The SPM test is a non-verbal ability test consisting of a series of geometrical designs (3x3 "matrices") grouped into five sets lettered A, B, C, D and E. Each set consists of 12 matrices, presented in a black and white pictorial format. The first matrix in each set is easy enough to be self-evident; it is then followed by progressively more difficult ones.

Jensen (1980) showed that each set involves different principles of varying matrix

patterns. Also, within each set the items become progressively more difficult. Thus

after every 12 items, the subject is again faced with a quite simple item. This prevents discouragement and loss of interest among participants.

The early matrices serve to teach one how to solve the later ones. Thus the test appears to be a measure of a person's ability to learn and apply new material, at least in the visual mode (Armfield, 1985).

In each matrix, a part located in the lower right-hand corner of the geometrical design is missing. Six alternatives (sets A and B) or eight alternatives (sets C, D and E) are given below each matrix. All of these alternatives fit the missing part; only one, however, logically belongs to the matrix.

The test instructs the participants to look across the rows and then look down the

columns to identify the rules of determining the missing part. The items are scored

either right or wrong. The subject's score on the SPM test is the total number of correct answers. The maximum and minimum scores are 60 and 0 respectively.

Progressive Matrices problems are usually easier to solve than to describe (Hunt,

1975). An example of the Progressive Matrices problem is shown in Figure 4.1. The

pattern on the top is missing a piece, and the subjects must determine which numbered

piece below will complete it.

Figure 4.1 Typical items from the SPM test. A5 presents an easy item whereas E1 presents a difficult item (reproduced from Anastasi and Urbina, 1997, p.263).

Raven et al. (1988) described the SPM test as a test of a person's capacity, at the time of the test, to apprehend meaningless figures presented for observation: to see the relations between them, to conceive the nature of the figure that completes each system of relations presented, and, in doing so, to develop a systematic method of reasoning.

Researchers investigated various methods in an attempt to understand the most

efficient process that can be used to determine the missing parts, for example, an

answer which fits may, as Raven et al. (1988) put it: (a) complete a pattern, (b) complete an analogy, (c) systematically alter a pattern, (d) introduce systematic permutations, or (e) systematically resolve figures into parts.

Hunt (1975) suggested that there were two quite different solution algorithms; a)

Gestalt algorithm, which deals with a problem by using the operations of visual

perception, such as the continuation of lines through blank areas and the

superimposition of visual images upon each other. The gestalt algorithm relies upon

the mental manipulation of sensory images. b) Analytic algorithm, which applies

logical operations to features contained within elements of the problem matrix. The analytic algorithm deals with abstracted features of the displays, using operations such as supplementing, deleting, subtracting and moving elements.

Anastasi (1988) thought that the easier items require accuracy of discrimination

whereas the more difficult items involve analogies, permutations and alternations of

pattern, and other logical relations. Moreover, Carpenter et al. (1990) concluded that the following five types of rules are used when attempting an SPM item to determine the missing part:
1) Constant in a row: the same value occurs throughout a row, but changes down a column.
2) Quantitative pairwise progression: a quantitative increment or decrement occurs in size, position or number.
3) Figure addition or subtraction: a figure from one column is added to or subtracted from another figure to produce the third.
4) Distribution of three values: three values of a categorical attribute are distributed through a row.
5) Distribution of two values: two values of a categorical attribute are distributed through a row; the third value is null.
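As a purely hypothetical illustration of how two of these rule types could be checked against a symbolic encoding of a matrix row, consider the following Python sketch; the attribute names and values are invented, and this is not Carpenter et al.'s actual simulation.

# Minimal sketch: checking two of the rule types above on one symbolically encoded row.
def constant_in_a_row(row, attribute):
    # Rule 1: the attribute takes the same value in every cell of the row.
    values = [cell[attribute] for cell in row]
    return len(set(values)) == 1

def distribution_of_three(row, attribute):
    # Rule 4: three different values of the attribute are distributed across the row.
    values = [cell[attribute] for cell in row]
    return len(set(values)) == 3

row = [{"shape": "square", "count": 1},
       {"shape": "square", "count": 2},
       {"shape": "square", "count": 3}]

print(constant_in_a_row(row, "shape"))      # True: 'square' occurs throughout the row
print(distribution_of_three(row, "count"))  # True: the values 1, 2, 3 are distributed across the row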

The Progressive Matrices test is usually administered with no time limit and can be

attempted individually or in groups. Raven's Progressive Matrices are very easy to

follow once the method is understood. But since there is no time limit, time taken to

finish it varies from one subject to another.

4.4 Reporting SPM Results

According to the 2003 SPM manual (p.69), the most effective and convenient method of interpreting the significance of SPM scores is to evaluate them in terms of the percentage frequency with which a similar score is found to occur among people of the same age. For practical purposes, it is convenient to consider certain percentages of the population and to group people's scores accordingly. In this way, it is possible to classify a given subject, according to the score obtained, as:

GRADE I: “intellectually superior”; if the score lies at or above the 95th percentile
for people of that same age group.
GRADE II: “definitely above the average in intellectual capacity”; if the score lies
at or above the 75th percentile of that same age group.
II+: if the score lies at or above the 90th percentile of that same age group.
GRADE III: “intellectually average”; if the score lies between 25th and 75th
percentile.
III+: if the score is greater than the median or 50th percentile of that same
age group.
III -: if the score is less than the median of that same age group.
GRADE IV: “definitely below average in intellectual capacity”: if the score lies at or
below the 25th percentile of that same age group.

GRADE V: “intellectually impaired”: if the score lies at or below the 5th percentile
for that age group.
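To make the mapping above concrete, the following is a minimal illustrative sketch (not part of the SPM manual) of how a percentile, already computed against the appropriate age norms, could be translated into these grades; the boundary handling follows the "at or above / at or below" wording given above.

# Hypothetical sketch: translating a within-age-group percentile into the grades above.
def spm_grade(percentile):
    # 'percentile' is the percentage of same-age peers scoring at or below the obtained score.
    if percentile >= 95:
        return "Grade I: intellectually superior"
    if percentile >= 90:
        return "Grade II+"
    if percentile >= 75:
        return "Grade II: definitely above average"
    if percentile > 50:
        return "Grade III+: average, above the median"
    if percentile == 50:
        return "Grade III: intellectually average"
    if percentile > 25:
        return "Grade III-: average, below the median"
    if percentile > 5:
        return "Grade IV: definitely below average"
    return "Grade V: intellectually impaired"

print(spm_grade(96))  # Grade I
print(spm_grade(80))  # Grade II
print(spm_grade(40))  # Grade III-
print(spm_grade(3))   # Grade V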

4.5 SPM test standardization

The SPM test was first fully standardised by Raven in 1938 on a sample of 1407

children in Ipswich, United Kingdom. In 1943, extensive collection of adult norms

was performed and the test was re-standardised on school children from Colchester.

The Mill Hill Vocabulary Scale was also standardised in that study. During the fifties

and sixties, several checks were run to determine the accuracy of the norms. The following table (Table 4.1) summarises some SPM standardisation studies.

Table 4.1 SPM standardization studies

Country | Year | N | Age | Results | Other comments
China | 1986 | 5108 | 6 to 79 | Percentile norms for each half-year interval (6 to 16), for three-year intervals (17 to 19) and for ten-year intervals (aged 20 to 97) | SPM standardization (Raven et al., 2003)
UK | 1979 | 3500 | 8 to 18 | Percentile norms for each half-year interval (6 to 16) | SPM standardization (Raven et al., 2003)
Belgium | 1984 to 1990 | 952 | 25 to 89 | Percentile norms for each ten-year interval (aged 25 to 89) | SPM standardization (Raven et al., 2003)
Scotland | 1992 | 629 | 20 to 75 | Percentile norms for five-year intervals (aged 20 to 65) | SPM and MHV standardization (Raven et al., 2003)
Turkey | 1993 | 2485 | 6 to 14 | Percentile norms for each half-year interval (aged 6 to 14) | SPM standardization (Raven et al., 2008)
Slovenia | 1998 | 1556 | 6 to 18 | Percentile norms for each year interval (8 to 18); also mean scores for each year (aged 8 to 18) | SPM standardization (Boben, 2007)
Pakistan | 2004 to 2006 | 1662 | 11 to 18 | Percentile norms for each year interval (aged 11 to 18) | SPM standardization (Ahmad et al., 2008)
Syria | 2004 | 2489 | 7 to 18 | Mean scores for each year (aged 7 to 18) | SPM standardization by Rahmn (2004) in his PhD thesis, reported by Keleefa and Lynn (2008a)
Sudan | 1999 | 6202 | 9 to 25 | Mean scores for each year (aged 9 to 25) | SPM standardization (Keleefa et al., 2008b)
Qatar | 2001 | 1135 | 6 to 11.6 | Mean scores for each year (aged 6 to 11.6) | SPM standardization (Keleefa and Lynn, 2008a)
Kuwait | 2006 | 6529 | 8 to 15 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2006)
Oman | 2003 | 5212 | 9 to 21 | Mean scores for each year (aged 8 to 15) | SPM standardization (Abdel-Khalek and Lynn, 2009)
4.6 Reliability of the SPM Test

Reliability is the degree to which a test consistently measures whatever it is

measuring. The more reliable a test is, the more confidence we have in the scores obtained. It assures that the scores obtained from the test are identical to the scores

that would be obtained if the test was re-administered to the same takers. In other

words, reliability means that a test is stable in measuring a trait i.e. the results of

measuring the same trait do not differ from one time to another (Domino, Domino

2006).

There are two ways to build consistency into a test: one has to do with the test environment, the other with test construction. The test environment can be divided into physical and psychological factors. Physical factors, such as room temperature, lighting and setting, are relatively easy to keep constant. In contrast, psychological factors such as emotional stress, anxiety and physical illness are difficult to control (Anastasi, 1988).

Test construction, or test nature, is another factor which affects reliability. A test

must be constructed in such a way that it assures, as much as possible, that

participants will rank about the same, each time they attempt it. Length and quality of

the test-items are two important factors in test construction. The longer the test, the

more reliable it will be. The less ambiguous the questions, the more likely the answers

will be the same on two different occasions (Bertrand, & Cebula, 1980).

It is essential that the test should have a high level of reliability. Raven, et al., (1996)

mentioned that several studies dealing with the reliability of the SPM test have

reported positive results. These studies covered a wide range of ages, cultural groups

and populations.

There are several methods to determine reliability. The three most commonly used

are: split-half, test-retest and internal consistency (Cronbach’s Alpha) (Anastasi and

Urbina 1997; Kenneth 1998; Kline 2000; Langdridge 2004; Domino and Domino

2006). All of these methods have been employed in the current study.

4.6.1 Test-Retest reliability

Kline (2000) stated that test-retest reliability is the correlation between scores on the same test administered on two separate occasions. The test is first administered to a certain group and then repeated on the same group after an interval ranging from one week to several years. Several factors determine whether the time interval should be long or short. For example, if the test items can be remembered easily then the interval may need to be long; however, if the sample consists of children then the interval needs to be short.

It is known that the shorter the interval, the higher the test-retest reliability. According to the 2004 SPM test manual, test-retest correlations range from as low as 0.46 over an 11-year interval, in a study carried out in Germany in 1983 (N=1000 school children tested from the sixth grade), to as high as 0.93 over a two-week interval, in a study carried out in India.

From the original studies of the SPM test, Raven provided a test-retest reliability

ranging from 0.83 to 0.93 for several age groups. The results were: 0.88 for 13 years

and over, 0.93 for 30 years and below, 0.88 for 30 to 39 years, 0.87 for 40 to 49 years and 0.83 for 50 years and over.

In India, Rao (1974) mentioned that the SPM retest reliability in two weeks interval

was found to be 0.93 for a group of college students. Abdel-Khalek (1987), in his

study with Egyptian undergraduates (N=44), found a retest reliability correlation of

0.82. The time interval was one week.

Nkaya et al., (1994) administered the SPM test three times at two weeks intervals to

88 students from Congo and 68 students from France. The French mean age was 12.3

years and the Congolese was 13.3 years. For the French students the reliability

between test 1 and 2 was 0.81, between test 2 and 3 was 0.74 and between test 1 and 3

was 0.75. For the Congolese students the reliability between test 1 and 2 was 0.91,

between test 2 and 3 was 0.92 and between test 1 and 3 was 0.87. They concluded that

the test-retest reliability was higher in Congo than in France.

According to the SPM test 1996 manual, the 1986 Chinese standardisation test-retest

reliability was 0.82 at 15 days interval and 0.79 at 30 days interval. More recently,

Abdel-Khalek (2005) with Kuwaiti school students (N=968) found a retest reliability

correlation range between 0.69 (age 12) and 0.85 (age 9). The time interval between

the test and retest was one week.

Khelefeeh and Lynn (2009) conducted a study to evaluate the SPM test norms in a

Qatari standardization sample, 1135 students aged 6 to 11.5 years (517 males and 618

females). The test-retest correlation coefficients of 0.89 for males, 0.95 for females

and 0.93 for the total sample were reported. From the above studies it was concluded

that the SPM test exhibited a high test-retest reliability.

4.6.2 Split-half reliability

Split-half reliability test was first devised by Spearman in 1907 as an alternative to the

test-retest method. It solved the memory effect problem associated with the test-retest.

In this method the test items are split into two halves, then correlated with each other.

It is possible to split the test using the first and second halves of the test, or more

commonly, using the scores on the even and odd items (this is particularly important with ability tests, where items are often arranged in order of difficulty). Clearly, where items are ordered by difficulty, there might be poor correlation between the first and second halves of the test (Langdridge 2004 and Kline 2000).
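The following minimal sketch, assuming an invented 0/1 item-response matrix (rows are examinees, columns are items), shows the odd-even split together with the Spearman-Brown correction that is conventionally applied to estimate the reliability of the full-length test; it is an illustration, not the computation used in any particular study cited here.

# Minimal sketch: odd-even split-half reliability with the Spearman-Brown correction.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_reliability(responses):
    odd = [sum(row[0::2]) for row in responses]    # score on odd-numbered items
    even = [sum(row[1::2]) for row in responses]   # score on even-numbered items
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)               # Spearman-Brown correction

responses = [                                      # invented 0/1 responses of five examinees
    [1, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))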

The majority of split-half internal consistency coefficients reported in the literature

exceeded 0.90. The lowest reliability was 0.86 with 174 Iranian children aged 9

years. The highest reliability was 0.96 in a study with 91 psychiatric male patients

(Raven et al., 2003).

Burke and Bingham (1969) found a split-half corrected reliability coefficient of 0.96.

This was in a study with 91 male patients with a mean age of 35.1 years who were

referred for vocational counselling.

Baraheni (1974) found a split-half correlation that ranged from 0.86 to 0.95 with

Iranian subjects aged 9 to 18 attending primary and secondary schools. The lowest

correlation, 0.86, was with 174 girls aged 9 and the highest correlation, 0.95, was with

291 boys and 425 girls aged 15 years. For subjects aged 18, split-half correlation was

0.93 (N=304). Sinha (1977) found a total split-half reliability coefficient (odd-even split) of 0.90 with an Indian sample consisting of 140 students aged 11 to 15 who were studying in grades 8, 9, 10 and 11. Sinha stated that the SPM test had a high reliability for the Indian sample. Another high split-half reliability of 0.94 with a

sample of 194 psychiatric patients in Germany in 1983 was reported in the 2004 SPM

test manual.

Bart et al., (1986) used the SPM test to study the development of proportional

reasoning in Qatar and United States. The American sample (N=281) ranged from 10

to 13 years of age. The Qatari sample (N=273) age was between 10 to 16 years.

Participants were students in the fifth, sixth and seventh grades. The SPM test

reliability, as indexed by the coefficient alpha, was 0.95. They stated that the value of

the coefficient alpha indicated an acceptable level of internal consistency, or high

reliability, of the test.

Comparing two cultural groups in Arizona, Powers et al., (1986a) found a reliability

of 0.87 with 127 (69 boys and 58 girls) Hispanics. The same reliability was found

with 103 (53 boys and 50 girls) Anglo-American sixth grade students.

In 1994, Duzen et al., in a study carried out on 2277 Turkish students (6 to 15 years)

reported a split-half reliability of 0.91. Similarly Ahmad et al., (2008), on a Pakistani

sample of 1662 adolescents aged (12 to 19) years and 2016 adults aged (18 to 45),

showed a split-half reliability of 0.89. Moreover, Khelefeeh and Lynn (2009) on a

Qatari sample of 1135 students aged 6-11.5 (517 males and 618 females) confirmed a

split-half reliability of 0.84 for males, 0.88 for females and 0.87 for the total sample.

The above stated studies showed a high reliability of the SPM test. The average value

was about 0.91.

4.6.3 Cronbach’s alpha reliability

The Cronbach’s Alpha and Kuder-Richardson 20 (KR-20) estimate the internal

consistency reliability by determining how items of a test relate to each other and to

the total test. The KR-20 formula is a special case of the general Cronbach’s Alpha.

KR-20 formula provides reliability estimates that are equivalent to the average of the

split-half reliabilities computed for all possible halves. KR-20 is useful for multiple

choice items that are scored as right or wrong. In the case where the items can have

more than two scores then Cronbach’s Alpha formula should be used (Anastasi,

Urbina 1997 and Mills, Airasian 2006).
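A minimal sketch of the Cronbach's alpha computation (which reduces to KR-20 when all items are scored 0 or 1) is given below; the response matrix is invented for illustration and is not data from any study discussed here.

# Minimal sketch: Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # column-wise (per-item) view of the matrix
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

responses = [                                  # invented 0/1 responses of five examinees
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))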

The majority of Alpha consistency coefficients reported in the literature exceeded

0.95. Dey (1984) with 136 talented Indian students, obtained a Kuder-Richardson

correlation of 0.91. In another study conducted on 2277 Turkish students, Duzen et al

in 1994, found the alpha reliability to be 0.95.

Rushton and Skuy (2000) administered an SPM test to 309 (17 to 23 years) students in

South Africa (173 Africans, 136 whites; 104 men, 205 women). The test aimed at

comparing the performance between African and white students. The study showed

internal consistencies based on Cronbach's alpha of 0.83 for white males, 0.73 for

white females, 0.89 for African males, and 0.92 for African females.

In 2002, Rushton et al, carried out an SPM test on 342 university students (198

African, 86 whites, 58 Indians; 271 men and 71 women). Internal consistencies

computed by Cronbach’s Alpha were 0.88 for the sample as a whole, 0.61 for whites,

0.82 for Indians, and 0.87 for Africans. Moreover, Abdel-Khalek (2005) on a sample

of 6529 Kuwaiti school students found that Cronbach's alpha coefficients ranged

between 0.88 (age 14) and 0.93 (age 9). Similarly, Taylor in 2007 carried out a study

in South Africa on 144 female and 199 male job applicants. 46.9% were black and

41.8% white. A very good internal consistency reliability (0.96) of the SPM was

reported. In the same year, Boben (2007) conducted an SPM test on 1,556 children

and adolescents aged 7.5 to 18 years in Slovenia. Male students constituted 53% of the

sample. Calculated Cronbach’s alpha ranged from 0.89 (age group of 12 years) to

0.93 (age groups of 9 and 17 years), with a mean of 0.92.

The following table (Table 4.2) summarizes the above studies of the three SPM test reliability estimates: test-retest, split-half and internal consistency.

Table 4.2 Summary of the studies performed on the SPM test reliability

SPM TEST-RETEST RELIABILITY
Researcher | Country | Year | N | Reliability value
Abdel-Khalek | Egypt | 1987 | 44 | 0.82
Nkaya et al. | Congo | 1994 | 88 | 0.91
Nkaya et al. | France | 1994 | 68 | 0.81
Abdel-Khalek | Kuwait | 2005 | 968 | 0.78
Khelefeeh & Lynn | Qatar | 2009 | 517 (males) | 0.89
Khelefeeh & Lynn | Qatar | 2009 | 618 (females) | 0.95
Khelefeeh & Lynn | Qatar | 2009 | 1135 (total) | 0.93

SPM SPLIT-HALF RELIABILITY
Researcher | Country | Year | N | Reliability value
Burke & Bingham | USA | 1969 | 91 | 0.96
Baraheni | Iran | 1974 | 174 | 0.86
Baraheni | Iran | 1974 | 425 | 0.95
Sinha | India | 1977 | 140 | 0.90
Raven et al. | Germany | 1983 | 194 | 0.94
Bart et al. | Qatar & USA | 1986 | 554 | 0.95
Powers et al. | USA | 1986 | 127 | 0.87
Powers et al. | USA | 1986 | 103 | 0.87
Duzen et al. | Turkey | 1994 | 2277 | 0.91
Ahmad et al. | Pakistan | 2008 | 1662 | 0.89
Khelefeeh & Lynn | Qatar | 2009 | 517 (males) | 0.84
Khelefeeh & Lynn | Qatar | 2009 | 618 (females) | 0.88
Khelefeeh & Lynn | Qatar | 2009 | 1135 (total) | 0.87

SPM TEST ALPHA RELIABILITY
Researcher | Country | Year | N | Reliability value
Dey | India | 1984 | 136 | 0.91
Bart et al. | Qatar & USA | 1986 | 554 | 0.95
Duzen et al. | Turkey | 1994 | 2277 | 0.95
Rushton and Skuy | South Africa | 2000 | 309 | 0.84
Rushton et al. | South Africa | 2002 | 342 | 0.88
Abdel-Khalek | Kuwait | 2005 | 6529 | 0.91
Taylor | South Africa | 2007 | 243 | 0.96
Boben | Slovenia | 2007 | 1556 | 0.92

It can be concluded that the SPM test has a high degree of reliability by all three methods: test-retest, split-half and internal consistency; their combination assures that it has a high overall reliability. The regions where the test has been administered cover a large proportion of the world, including developing and developed countries, and the fact that the reliability of the test was relatively constant across them implies that the reliability of the SPM test is culture-fair.

4.7 Validity of the SPM test

Validity denotes the extent to which a test measures what it is supposed to measure

and, consequently, permits for an appropriate interpretation of scores (Anastasi and

Urbina 1997 and Langdridge 2004).

Validity provides evidence regarding the appropriateness of a test. Reliability, on the

other hand, as discussed in the previous section indicates the consistency of the scores

produced. The validity of a test depends on its reliability. A valid test is always

reliable. A reliable test could, however, be invalid. In other words, if a test is

measuring what it is meant to measure it will be reliable. Nonetheless, a reliable test

can consistently measure the wrong thing and hence be rendered invalid. Suppose an

instrument that is intended to measure social studies concepts actually measured only

social studies facts. It would not be a valid measure of concepts but can measure the

facts very consistently (Mills, Airasian 2006, Langdridge 2004 and Anastasi, Urbina

1997). Therefore reliability of a test is necessary but not sufficient for establishing its

validity. Reliability and validity are specific to the interpretation being made and the

group being tested. As a result we cannot simply say that a certain test is reliable

and/or valid. We rather must say that the test is reliable and/or valid for this particular

interpretation and this particular group (Mills, Airasian 2006).

Validity is the most important characteristic of a psychological test, to the extent that without empirical data regarding the validity of a test we have no evidence, conclusive or persuasive, as to what the test actually measures. Consequently it is not possible to give meaning to or interpret the test scores (Brown, 1983; Anastasi and Urbina, 1997; Langdridge, 2004).

There are three types of validity used in educational and psychological measurements:

content validity, criterion-related validity and construct validity (Anastasi and Urbina

1997).

4.7.1 Content Validity

Content validity refers to the extent to which a test measures a sample of the

behaviour which it is intended to measure (Raven, et al., 2003). In assessing the

content of a measuring instrument, one is concerned with the question of how well the

content of the instrument represents the entire universe of the content being measured

(Gronlund, 1981). Therefore, evaluation of this type of validity depends on the

analysis of the measured objects in terms of partial elements. If the items of the test

cover those elements in typical portions and the test appropriately samples the whole

measured content then the content validity is considered to be high. Content validity is

evaluated objectively and determined by logical analysis of the test content. However

it cannot be expressed in terms of a numerical index (Anastasi and Urbina 1997 and

Gay, et al., 2006).

It is worth mentioning that the content validity is sometimes referred to in literature as

the face validity. Although the meanings of the two often overlap they are quite

distinct. Face validity is essentially the apparent measurement of the test and not the

actual one. In other words, face validity refers to the degree to which the test appears

to be valid to non-technical observers such as examinees and test administrators. Its

main role in the process of validation is the initial scanning in test selection

procedures (Anastasi and Urbina 1997 and Gay, et al., 2006).

As an example, the SPM test meets an important requirement for use in cross-cultural

contexts. It has face validity in the sense that it appears, to those who take and administer the test, to be assessing a basic ability to reason, in a form of presentation that is not culturally biased (MacArthur, 1960).

4.7.2 Construct Validity

Construct validity of a given test is the extent to which the test is said to measure a

hypothetical construct or trait. The word construct in this context is synonymous to

concept (Anastasi and Urbina 1997 and Gay, et al., 2006).

Kenneth (1998) reported that psychological constructs are unobservable postulated

variables that have evolved either informally or from psychology theory. Intelligence,

anxiety, aptitude, musical ability, critical thinking, ego strength, dominance and

achievement motivations are examples of common constructs. Construct validation is

the systematic analysis of test scores designed to assess whether there is a basis for

validity. The questions to be answered by construct validity are: what traits are

measured by the test? And to what degree? The process of construct validation

involves identifying and clarifying the factors that have an effect on the test scores.

The test performance can then be interpreted most meaningfully. This process

involves the accumulation of evidence from a wide range of different studies

(Gronlund, 1981, and Ary et al., 1985). Anastasi and Urbina (1997) stated that factor

analysis and internal consistency are both subtypes of construct validity.

4.7.2.1 Factor analysis

Factor analysis provides research information regarding the extent to which a set of

items measures the same underlying construct or dimension of a construct, and

evaluates the extent to which the individual items on a scale truly cluster together

around one or more dimension. Items constructed to measure the same dimension

should load on the same factor; those constructed to measure different dimensions

should load on different factors (Anastasi, 1988; Anastasi and Urbina, 1997; Nunnally and Bernstein, 1993). In addition, Geri and Judith (2006) reported that this analysis showed whether the items in the instrument reflected a single construct or several constructs.

The SPM test was designed to be a measure of general intellectual ability, “g”, as postulated by Spearman (Spearman, 1904; Spearman and Wynn-Jones, 1951).

It had been universally accepted for over half a century that the test was an

appropriate measure of “g”. This position was endorsed by Emmett (1949) based on

factor analysis of the SPM items in a sample of 11 years old children. More recently,

Jensen (1998, p. 541) contended that “the total variance of Raven scores in fact

comprised virtually nothing besides g and random measurement error”. Raven, Raven

& Court (2000, p.34) stated that “The Progressive Matrices has been described as one

of the purest and best measures of “g”, or general intellectual functioning”.

The SPM test (2004) manual reports several factor-analytic studies involving a large

number of children and adults. For example, investigations of British children showed

a high loading of up to 0.83 on the “g” factor (Raven et al., 2004). Burke and Bingham (1969) found a very high loading of up to 0.76 on “g” with adults. Also, as reported in the SPM 1996 manual, Zager et al. (1980) obtained a loading of 0.80 with “g”.

Moreover, Abdel-Khalek (1987) carried out an SPM test on Egyptian university

students (205 males and 247 females). A principal component factor-analysis with

unities inserted in the diagonals was carried out to determine if the items contained a

general factor and possibly other factors. Analysis showed a significant factor

(eigenvalue >1.0) that was extracted from both groups. This factor accounted for

79.6% and 72.6% of the total variance for male and female undergraduates

respectively. Another study carried out by the same author in Kuwait (2005), on a

sample of 6529 students aged 8-15 years (3278 boys and 3251 girls), investigated

factorial-analysis validity of the SPM test. A principal components factor-analysis

was carried out to find present factors. Results showed only one significant factor

which had a large eigenvalue of 3.46 that accounted for 69.2% of the variance.
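As a hypothetical illustration of the principal-components logic described in these studies, the sketch below extracts the eigenvalues of an invented inter-set correlation matrix, flags components with eigenvalues greater than 1.0, and expresses each eigenvalue as a proportion of the variance accounted for; the numbers are not taken from any of the studies cited.

# Minimal sketch: eigenvalues of a correlation matrix and proportion of variance explained.
import numpy as np

R = np.array([                       # invented 5x5 correlation matrix with unities on the diagonal
    [1.00, 0.55, 0.50, 0.45, 0.40],
    [0.55, 1.00, 0.52, 0.48, 0.42],
    [0.50, 0.52, 1.00, 0.50, 0.44],
    [0.45, 0.48, 0.50, 1.00, 0.46],
    [0.40, 0.42, 0.44, 0.46, 1.00],
])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
proportion = eigenvalues / eigenvalues.sum()          # eigenvalues of R sum to its trace

for i, (ev, p) in enumerate(zip(eigenvalues, proportion), start=1):
    status = "retained (eigenvalue > 1.0)" if ev > 1.0 else "dropped"
    print(f"component {i}: eigenvalue {ev:.2f}, {p:.1%} of variance, {status}")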

Despite the above findings, a dispute was raised on the issue of whether the

Progressive Matrices are really a pure measure of “g”. A number of scholars have

contended that while the Progressive Matrices were largely a measure of “g” they also

contained a small visualization or spatial factor. Among them were Adcock (1948),

Keir (1949), Banks (1949), Vernon (1950), Gabriel (1954), Gustaffson (1984, 1988).

They concluded that the SPM test measures a reasoning factor and another factor

which was called “cognition of figural relations”. Hertzog and Carter (1988)

contended that the SPM contained two further factors named: verbal intelligence and

spatial visualization.

In agreement with the previous studies, Rimoledi (1948), Banks and Sinha, (1951)

and Sinha (1968) reported that “g” accounted for only 36% to 37% of the total

variance of the test scores. They suggested that the SPM test measures other factors

in addition to g. Furthermore, several factor-analytic studies have examined the overlap between skills on the Raven and other tests of mental abilities. These studies, which have most often been conducted with adult or older adolescent participants, have provided evidence that the Raven test evaluates perceptual and spatial abilities as well as Spearman's “g” factor (Corman and Budoff, 1974).

On a sample of 920 Mexican primary school children, factor-analysis of the SPM test

results showed a strong reasoning factor and a weaker visualization ability factor. This was among the results contrary to the view that the SPM measures only “g” (Lynn et al., 2004). Furthermore, Lynn et al. (2004) conducted an SPM test in 2001 in

Estonia on a sample of 2735 adolescents whose age ranged from 12 to 18 years. They

identified a general factor and three further factors that they reported as: the gestalt

continuation, found by Van der Ven and Ellis (2000), verbal-analytic reasoning and

visuo-spatial ability. Further analysis of this study showed a higher order factor

identified as “g”.

The question that arises here is how “g” relates to the other three factors. The widely accepted contemporary theory that accounts for this relation is Carroll's three-stratum model (Carroll, 1993). From the top down it consists of:
• Stratum III: “g”
• Stratum II: eight broad second-order group factors, e.g. fluid intelligence, crystallised intelligence, etc.
• Stratum I: around fifty narrow factors. These are approximately the same as what are called “lower order factors” and “specific factors” (Carroll, 1993).

4.7.2.2 Internal consistency

One of the methods used to identify a construct is the internal consistency method.

The chief criterion of this method is the total score of the test. Correlation methods are

often employed in this validation process. These involve item-test scores correlation

and subtest-test scores correlation (Anastasi (1988) and Anastasi, Urbina (1997)). The

latter correlation may be used in some intelligence tests where separately conducted

subtests are performed. The score on each subtest is correlated with the total score of

the test. In doing so, only those subtests which show correlation of 0.3 or higher are

retained (Tabachnick & Fidell 2007). The test is then said to be validated by internal

consistency.
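A minimal sketch of this screening step, assuming invented subtest scores, is given below; it simply correlates each part score with the total score and flags parts falling below the 0.3 criterion mentioned above.

# Minimal sketch: subtest-total correlations screened against the 0.3 criterion.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

subtests = {                                   # invented scores of six examinees on three subtests
    "A": [10, 12, 7, 11, 6, 9],
    "B": [8, 11, 5, 10, 4, 7],
    "C": [3, 2, 4, 3, 5, 2],
}
totals = [sum(scores) for scores in zip(*subtests.values())]

for name, scores in subtests.items():
    r = pearson_r(scores, totals)
    decision = "retain" if r >= 0.3 else "review"
    print(f"subtest {name}: correlation with total = {r:.2f} ({decision})")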

As stated above, the internal consistency plays a role in determining the characteristic

of a trait or domain behaviour represented by the test. This can be easily seen by the

fact that highly correlated items and subtests with the test strongly suggest that the test

is measuring what it is meant to measure. In this sense, the internal consistency shares

some features with construct validity (Anastasi (1988) and Anastasi, Urbina (1997)).

It should be noted that no single validation process can establish the construct validity

of a given test (Gay et al., 2006).

Abdel-Khalek (1987) in his study on Egyptian undergraduates estimated the internal

consistency of the five sets of the SPM test. The Pearson’s product-moment was

employed. All of the inter-correlations between the sets were positive and statistically

significant. They ranged for the male group from 0.32 to 0.67 (N = 205) and for the

female group from 0.30 to 0.57 (N = 247). Moreover, in 2005 Abdel-Khalek conducted a study of Kuwaiti school students (N = 6,529, aged 8-15 years) in which he investigated the internal consistency of the SPM. The Pearson correlation coefficients were statistically significant, ranging from 0.43 to 0.77 at p < 0.001.

4.7.3. Criterion-related Validity

Criterion-related validity is determined by relating the performance on a test to the

performance on another test or measure. The second test measure is the criterion

against which the validity of the initial test is evaluated (Mills, Airasian (2006) and

Kenneth (1998)). In other words, criterion-related validity refers to the relationship

between the scores on a measuring instrument and an independent external variable

(criterion) believed to measure directly the behaviour or characteristic in question.

This type of validity can be reported by means of a correlation coefficient. Criterion

validity has two forms:

a) Concurrent validity: correlation between test scores and a criterion available at

the same or close point in time.

b) Predictive validity: correlation between test scores and a criterion that occurs

at a later point in time (Ary et. al, 1985 and Domino, Domino 2006).

Anastasi (1988) stated their definitions and distinguished between them in the

following:

The logical distinction between predictive and concurrent validity is


based, not on time, but on the objectives of testing. Concurrent validity
is relevant to tests employed for diagnosis of existing status, rather
than prediction of future outcomes. The differences can be illustrated by
asking “Is Smith schizophrenic” (concurrent validity) and “Is Smith
likely to become schizophrenic” (predictive validity).

Domino and Domino (2006) mentioned that the SPM concurrent validity with

standard intelligence tests such as Stanford-Binet or WISC exhibited correlations

ranging from 0.50 to 0.80. Predictive validity, especially of academic achievement,

generally fell in the region of 0.20 to 0.60 (Raven, 2004). Powers and Barkan (1986a)

reported that the SPM scores had a correlation of 0.40 with reading achievement

scores, 0.54 with language achievement, and 0.49 with mathematics.

Anastasi and Urbina (1997) mentioned that specific indices used as criteria measures

included school grades, school achievement, promotion, graduation records and

teachers’ or instructors’ rating for intelligence. Such ratings given within an academic

setting are likely to be closely related to the individuals’ scholastic performance.

Likewise they may be properly classified with the criterion of academic achievement.

The correlations of the SPM test with intelligence test, standardised achievement tests

and school examinations varied with age, gender and sample homogeneity. Some

studies regarding SPM test correlation with intelligence, standardised achievement

tests and school examinations are presented below.

4.7.3.1 SPM Correlations with Intelligence Tests (concurrent validity)

The SPM test manual (2003) reported correlations in the range of 0.54 to 0.86

between the SPM and other IQ tests e.g. Stanford-Binet and Wechsler Scales for

English speaking children and adolescents. Correlations gained in cross-cultural

research with non-English speaking children and adolescents, as reported in the SPM

test manual (1996), tend to be lower. Generally they range from 0.30 to 0.68. Also as

reported in the manual, de Lemose (1989) in an Australian study, found a tendency

for students from non-English speaking cultures (e.g. Southern and Eastern European

and Middle Eastern countries) and those with non-professional fathers to score lower.

The following is a brief review of the studies conducted to determining the

relationship of the SPM test scores with more widely used intelligence tests such as

Lorg-Thorndike Test, Wechsler Scales (WISC-R for children, WAIS for adults),

Army General Classification Test (AGCT) Cohen Test, General Mental Ability

(GMA), Minnesota Paper Form Board (MPFB), Otis Gamma, Revised Beta, Quick

Test, Orange Juice Test (OJT), Stanford-Binet, AH2 tests, Otis-Lennon, Primary

Mental Abilities (PMA), Cattell's Culture Fair Test (CCFT), Arabic Verbal Reasoning

Test (AVRT), San Diego Test of Reasoning Ability (SANTRA), and Draw-a-Man

test.

Tulkin and Newbrough (1968) conducted an SPM test and Lorg-Thorndike test to 356

fifth grade and sixth grade high and low social class and black and white students.

Correlation between SPM test scores and Lorg-Thorndike Verbal IQ was 0.45 for

white high class (N=128); 0.33 for white low class (N=75); 0.40 for black high class

(N=50); and 0.48 for black low class (N=103).

Correlation between SPM test and Non-verbal IQ was 0.53 for white high class; 0.52

for white low class; 0.40 with black high class and 0.45 with black low class. It was

concluded that all correlations between SPM test and Lorg-Thorndike IQ test were

significantly different from zero. For the white groups the SPM test score was

somewhat more related to Non-verbal IQ than to Verbal IQ. This pattern was not

found in black groups.

In India Mehrotra (1968), with a small sample (N=45) of students with a mean age of

14.2 years, found a correlation of 0.68 between SPM test and WISC-R Full Scale,

0.60 with Verbal and 0.61 with Performance sub-tests. Burke and Bingham (1969)

found a significant correlation between SPM scores and Army General Classification

Test (AGCT). Similar results were found with the Cohen Test in a sample of 91 male patients (mean age 35.1 years) who were referred for vocational counselling services.

The correlation between the SPM and Cohen Verbal was 0.59; with Cohen Memory

0.49; with Cohen Perceptual Organization was 0.61. The correlation between the SPM

and AGCT Verbal was 0.60; with AGCT Numerical 0.66 and with AGCT Total was

0.67.

Mohan (1972) in India investigated the relationship between verbal and non-verbal

ability tests. He found a correlation of 0.65 between the SPM test and General Mental

Ability (GMA). The sample consisted of 310 college and university students ranging

in age from 18 to 25 years.

Mclaurin and Farrar (1973) administered both SPM test and WAIS test to 201

volunteer university students studying introductory courses in psychology. The

correlation between the SPM test and the WAIS were 0.57 for Full Scale, 0.45 for

Verbal and 0.54 for Performance. In the same study they investigated the validity of

the SPM test by correlating it with grade point average (GPA) and Minnesota Paper

Form Board (MPFB). Correlation between SPM test and MPFB test was 0.45.

Correlation between the SPM test and GPA was 0.21. This correlation was as good as

the correlation between GPA and WAIS-Full Scale which was .28 (N=201). The

validity of the SPM test was concluded to be moderate.

Three studies that evaluated the use of the SPM test with psychotic patients in the USA reported reasonable correlations between the SPM test scores and WAIS Full Scale,

Verbal, and Performance IQs. Burke and Bingham (1969), with 91 American male

patients at a veteran’s hospital referred for vocational counselling with a mean age of

35.1 years, found a correlation of 0.75 between the SPM and the WAIS Full scale,

0.65 with the WAIS Verbal IQ and 0.76 with WAIS Performance IQs.

In another investigation with psychiatric patients in Texas, Vincent and Cox (1974)

found that the SPM test correlated reasonably well with the WAIS Scale. Correlations

were 0.85 with Full Scale, 0.84 with Verbal and 0.75 with Performance. The sample

(N=131) was taken from psychological files of the Texas Vocational Rehabilitation

Unit. Most patients suffered physical, emotional or mental disability. It was concluded

that the SPM test is a viable tool for measuring intelligence in such population.

Also in the above study Vincent and Cox (1974) correlated the SPM scores for a

sample of 226 psychiatric patients with three IQ tests. Most patients had a physical,

emotional, or mental disability. The sample mean age was 28.7 year and consisted of

57 % white, 36 % black and 7 % Latin Americans. The correlation between SPM

scores and Otis Gamma scores was .70 (N=97), with Revised Beta .38 (N=58) and the

correlation with Quick Test was .60 (N=71).

The third study with psychiatric patients (N=256) was done by Burke (1985) who

correlated the SPM scores with WAIS score and found that the correlation between

the SPM and WAIS Full scale was .66, with Verbal scale .61, and with Performance

scale was .63.

Bart et al., (1986) administered the SPM and the test of proportional reasoning Orange

Juice Test (OJT) to a sample of 273 American and 281 Qatari fifth, sixth and seventh grade students. They found a significant correlation of .49 between the SPM and the OJT.

According to the 1996 SPM test manual, Zhang & Wang (1989) in China found that

the SPM correlated .71 with Full scale WISC-R, .54 with Verbal and .70 with

Performance (no age level or sample size were reported). Another study by Narayanan

and Paramesh (1978) using the SPM test in India, administered the SPM test and

Cattell's Culture Fair Test to Tamil subjects, and reported a correlation of .58.

Horton and Karees (1987) administered the SPM test to a small sample (N=20) of

students participating in a gifted students program in the United States. They found a

correlation of .72 between the SPM test and Stanford-Binet. Correlation between

Stanford-Binet and Otis-Lennon IQs Test was only .45 (N=40).

Helms (1987) with 130 Canadian university students (65 females, 65 males and

average age of 19.3 years), reported a low correlation ranging from .22 to .36 between

AH2 Scales (a general ability test) and the SPM test. A correlation of .22 for Verbal,

.28 for Numerical, .31 for Perceptual and .36 with the AH2 total score. Helms noted that the SPM test's correlations with other mental ability tests typically fall in the range of .50 to .70, according to Jensen (1980). Correlations involving the AH2 tend to be somewhat lower than the usual values for correlations among tests of general ability, but the correlations reported here are lower still.

In the US, the SPM test was administered by Jensen, et al., (1988) with a time limit of

40 minutes to a total of 261 undergraduate students. The students also took the

Advanced Progressive Matrices (APM) and Otis-Lennon Mental Ability Test form.

Correlation between SPM and APM was .58 and correlation with Otis-Lennon was

.47.

In a study in Mississippi by Karnes and Whorton (1988), the SPM and the Culture Fair Intelligence Test were administered to 625 students (441 white and 211 black) in a rural county elementary school (grades 3-8). The mean age was 8.10 years. 410 students

were on free or reduced lunches and 245 students on paid lunches. The Pearson

correlation between the SPM and Culture-fair Intelligence Test was a moderate .46

and significant.

In a study carried out in Libya on two groups from Tripoli University, Majdub (1991)

found significant correlation between SPM and an Arabic Verbal Reasoning Test

(AVRT). For the Arabic major group correlation between SPM and AVRT was .53

(N=78). For the Education major group correlation between SPM and AVRT was .25

(N=111).

In a study by Johnson et al., (1994) a sample of 449 second, fifth and seventh grade students in San Diego city schools were given the SPM test. In this group, 77 were

African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American. Of

these 215 were boys and 234 were girls. The mean age of children was 11 years (age

range from 6 years 8 months to 13 years 10 months). They administered the SPM and

an alternate form of the SPM called the San Diego Test of Reasoning Ability

(SANTRA). Correlation between SPM and SANTRA tests was highly significant

(.90).

Khelefeeh and Lynn (2009) in a Qatari sample of 1135 students aged 6-11.5 (male N

= 517 and female N = 618) reported a validity (correlation coefficient) of 0.86

between the SPM and the Draw-a-Man test.

The correlations of the SPM with general intelligence tests (full-scale scores) and with the three intelligence subtests (Non-verbal, Verbal and Numerical) were averaged. For this purpose Fisher's z transformation was employed (Garret and Woodworth, 1966). As noted by Garret and Woodworth (1966), this transformation is more stable and has open limits (it is not bounded between -1 and +1 as r is). Each sample r is converted into an equivalent statistic z, the z values are averaged, and the average is then converted back to r. Table 4.3 summarises the above studies of the SPM test's concurrent validity together with the r to z Fisher transformation, and the results of the table are discussed afterwards.

Table 4.3 Summary of studies on SPM test concurrent validity with r to z Fisher's transformation results

Researcher             Country  Year  N     IQ test                            r     z
Tulkin & Newbrough     USA      1968  128   Lorge-Thorndike (Verbal)           0.45  0.45
                                      128   Lorge-Thorndike (Non-Verbal)       0.53  0.53
                                      75    Lorge-Thorndike (Verbal)           0.33  0.33
                                      75    Lorge-Thorndike (Non-Verbal)       0.52  0.52
                                      50    Lorge-Thorndike (Verbal)           0.40  0.40
                                      50    Lorge-Thorndike (Non-Verbal)       0.40  0.40
                                      103   Lorge-Thorndike (Verbal)           0.48  0.48
                                      103   Lorge-Thorndike (Non-Verbal)       0.45  0.45
Mehrotra               India    1968  45    WAIS (Verbal)                      0.61  0.61
                                            WAIS (Performance)                 0.61  0.61
                                            WAIS (Full Scale)                  0.68  0.68
Burke & Bingham        USA      1969  88    Cohen (Verbal)                     0.59  0.59
                                            Cohen (Memory)                     0.49  0.49
                                            Cohen (Perceptual Organisation)    0.61  0.61
                                            AGST (Verbal)                      0.60  0.60
                                            AGST (Numerical)                   0.66  0.66
                                            AGST (Full Scale)                  0.67  0.67
Burke & Bingham        USA      1969  91    WAIS (Verbal)                      0.56  0.56
                                            WAIS (Performance)                 0.76  0.76
                                            WAIS (Full Scale)                  0.75  0.75
Mohan                  India    1970  310   General Mental Ability (GMA)       0.65  0.65
Mclaurin & Farrar      USA      1973  201   WAIS (Verbal)                      0.45  0.45
                                            WAIS (Performance)                 0.54  0.54
                                            WAIS (Full Scale)                  0.57  0.57
                                            Minnesota (MPFB)                   0.45  0.45
Vincent & Cox          USA      1974  131   WAIS (Verbal)                      0.84  0.84
                                            WAIS (Performance)                 0.75  0.75
                                            WAIS (Full Scale)                  0.85  0.85
Vincent & Cox          USA      1974  97    Otis Gamma                         0.70  0.70
                                      58    Revised Beta                       0.38  0.38
                                      71    Quick Test                         0.60  0.60
Narayanan & Paramesh   India    1978  ----  Cattell's Culture Fair             0.58  0.66
Burke                  USA      1985  256   WAIS (Verbal)                      0.61  0.71
                                            WAIS (Performance)                 0.63  0.74
                                            WAIS (Full Scale)                  0.66  0.79
Bart et al.            Qatar    1986  554   Orange Juice Test (OJT)            0.49  0.54
Horton & Karees        USA      1987  20    Stanford-Binet                     0.72  0.91
Helms                  Canada   1987  130   AH2 Scales (Verbal)                0.22  0.22
                                            AH2 Scales (Numerical)             0.28  0.29
                                            AH2 Scales (Perceptual)            0.31  0.32
                                            AH2 Full Scale                     0.36  0.38
Jensen                 USA      1988  261   RAPM                               0.58  0.66
                                            Otis-Lennon                        0.47  0.51
Karnes & Whorton       USA      1988  649   Culture Fair Intelligence Test     0.46  0.66
Zhang & Wang           China    1989  ----  WAIS (Verbal)                      0.54  0.60
                                            WAIS (Performance)                 0.70  0.87
                                            WAIS (Full Scale)                  0.71  0.89
Majdub                 Libya    1991  78    Arabic Verbal Reasoning (AVR)      0.53  0.59
                                      111   Arabic Verbal Reasoning (AVR)      0.25  0.26
Johnson                USA      1994  446   San Diego Test of Reasoning Ability (SANTRA)  0.90  1.50
Khelefeeh & Lynn       Qatar    2009  1135  Draw-a-Man Test                    0.86  1.33

The mean correlations between the SPM test and general intelligence and the three intelligence subtests are given in Table 4.4 below.

Table 4.4 The average correlation between the SPM test and intelligence tests

Sub-tests              N     Mean z  Mean r
General intelligence   3623  0.80    0.66
Non-verbal             3726  0.68    0.59
Verbal                 1904  0.54    0.49
Numerical              218   0.54    0.49

It can be seen in Table 4.4 that the SPM test correlates more highly with general intelligence and non-verbal tests than with verbal and numerical tests. Since the SPM is a non-verbal test containing no verbal items, it is expected to have a high correlation with other non-verbal tests.

General intelligence is an ambiguous term. On the one hand, it can mean the sum of all cognitive abilities; this is the meaning intended when it is said that the Wechsler tests measure general intelligence. On the other hand, it can be taken to mean the factor common to all cognitive tests, i.e. "g". There are other cognitive factors in addition to "g". The SPM test measures the "g" factor common to all cognitive abilities, which explains why the SPM test correlates to a high degree with general intelligence tests (Lynn, 2008).

4.7.3.2 SPM correlations with achievement tests (Predictive Validity)

According to the SPM test manual (2004), the external criterion usually adopted in predictive validity investigations is examination grades or teachers' estimates. SPM correlations with academic achievement tests generally fall in the region of 0.20 to 0.60, with higher correlations being found with mathematics and science, and lower correlations with language and overall academic achievement. Moreover, correlations with performance on achievement tests or scholastic achievement were generally lower than correlations with intelligence tests. In several studies, the California Achievement Test (CAT) served as the criterion against which the SPM test scores were related. Correlations with CAT Reading, Language, Arithmetic and overall achievement scores ranged from 0.26 to 0.76 (Raven et al. 2004).

Tulkin and Newbrough (1968) with 356 black and white, high and low social class,

fifth and sixth grade students correlated the SPM test scores with Iowa Test for Basic

Skills (ITBS) achievement test. They found that for white high class (N=128) the

correlation was 0.30 with Vocabulary; 0.40 with Reading; 0.31 with Language; 0.39

with Work-study; and 0.39 with Arithmetic. For white low class (N=75) the

correlation was 0.25 with Vocabulary; 0.26 with Reading; 0.27 with Language; 0.41

with Work-study and 0.27 with Arithmetic.

The correlation between the SPM test and ITBS for black high social class (N=50)

was 0.39 with Vocabulary; 0.14 with Reading; 0.32 with Language; 0.36 with Work-

study; and 0.40 with Arithmetic. For black low class (N=103) the correlation was

0.32 with Vocabulary; 0.26 with Reading; 0.38 with Language; 0.33 with Work-study

and 0.39 with Arithmetic. In comparison, the correlations of the SPM test with the achievement test (ITBS) were lower than its correlations with the IQ test (Lorge-Thorndike).

Sinha (1968) reported a correlation of 0.32 between SPM scores and grade point

average (GPA) with 220 students from art and science branches and a correlation of

0.36 with 204 engineering students from India. Dosajh, in his study in India as reported by Sinha (1968), found that the score on the SPM could safely be taken as a

criterion for selection of students for technical and science courses. Dosajh’s

observation was based on the correlation of SPM test scores with examination scores

of 80 grade nine boys and girls.

Mclaurin and Farrar (1973) reported a low correlation between the SPM test and grade point average (GPA). The correlation was .21 with a sample of 201 university students in the USA. Though low, this correlation is still within the range (.20-.60) given by Domino and Domino (2006) and Raven (2004), as mentioned above. GPA may be based on course work and is partly determined by motivation and essay-writing ability. Since the SPM is a non-verbal test, it is not surprising that it correlates weakly with verbal abilities such as writing ability (Lynn, 2009).

Baraheni (1974) evaluated the validity of the SPM test in primary and secondary schools in Iran by calculating the correlation between scores on the SPM test and end-of-year

average school marks. A correlation of .44 was found with grade 6 (N=472), .29 for

grade 7 (N=360), .61 for grade 8 (N=203) and a correlation of .51 for grade 9

(N=643). Baraheni reported that the indices of the SPM test in predicting average

school marks in Iranian schools appeared to be as high as or even higher than the

coefficients reported from other countries.

Sinha (1977) in India found significant correlations between the SPM test and school

examination grades, .46 with grade eight (N=46), .47 with grade nine (N=5) and .38

with grade ten (N=35). The total correlation was .45 (N=86). Students' ages ranged from 11 to 15 years. Sinha found that the SPM test scores correlated significantly with school examination grades in all groups except grade nine, which consisted of only 5 students. As for the validity of the SPM test, he concluded that the results did not offer strong support for the test.

In another study in Nigeria, Maqsud (1980) investigated the validity of SPM test with

two different groups of primary school boys. A correlation which ranged from .19 to

.65 between the SPM test, English and Arithmetic was reported. He found a

correlation of .19 between the SPM test and English, and .38 with Arithmetic (N=60)

among primary school boys in traditional schools, and a correlation of .65 between the

SPM test and English, and .49 with Arithmetic (N=60) for primary school boys in

modern schools. Students from modern schools belonged to upper-middle class homes, whereas students from traditional schools came from lower-middle and lower class families. The average age of the students was 12.2 years.

Maqsud concluded that a significant positive link between subjects' scores on the

SPM test and their achievement scores generally supported the theory that mental

ability is perhaps the best predictor of school achievement. He also suggested that the SPM test could be used for the selection of secondary school intakes in Nigeria. In addition, Chan (1982) found that the SPM test correlates well with non-verbal subtests but rather poorly with the numerical and verbal subtests of comprehensive scholastic aptitude tests in Hong Kong.

Powers et al., (1986.b) in their study with 426 students (225 boys and 201 girls), from

sixth and seventh grades, reported the following correlation between the SPM test and

CAT. For sixth grade boys (N=116) the correlation was .34 with Reading, .41 with

Language, and .39 with Math. For sixth grade girls (N=96) the correlation was .36

with Reading, .50 with Language, and .60 with Math. Total sample correlation for

sixth grade (N=212) was .35 for Reading, .45 with Language and .48 for Math.

The correlation for seventh grade boys (N=109) was .45 with Reading, .50 with

Language, and .52 with Math. For seventh grade girls (N=105) the correlation was

.54 with Reading, .55 with Language, and .56 with Math. Total sample correlation for

seventh grade (N=214) was .49 for Reading, .51 with Language and .54 for Math.

Correlations ranged from .34 to .60 for sixth grade and from .45 to .56 for seventh grade students. For the sixth grade the lowest correlation, .34, was with boys in Reading, and the highest, .60, was with girls in Maths. For the seventh grade the lowest correlation, .45, was with boys in Reading, and the highest, .56, was again with girls (N=105) in Maths.

It was concluded that the validity coefficients were higher for the seventh grade than for the sixth grade students, and higher for females than for males. Further, it was clear that the coefficients increased from reading to mathematics. The results of the study

indicated that the SPM test had a moderate predictive validity that varied depending

on sex, grade and academic criterion.

Sidles and Avoy (1987) administered the SPM test and Comprehensive Test of Basic

Skills (CTBS), a standardised achievement test, to 124 Navajo (one of the largest

Indian tribes in America) seventh and eighth grade students ranging in age from 14 to

16 years old. They found a correlation of .38 with Spelling, .39 with Reading, .46 with

Mathematics, and .47 with Language. Correlations were also computed between SPM

test and CTBS for female and male subjects. Correlations for male subjects (N=62)

were .28 with Reading, .34 with Spelling, .34 with Mathematics and .39 with

Language. For female subjects (N=62) correlations were .51 with Reading, 52 with

Spelling, .56 with Mathematics and .58 with Language. They concluded that the

correlation between the SPM test and CTBS was higher for females than males.

Carver (1990) studied the relationship between reading ability and the SPM test. He found that the correlation between the National Reading Standards Test (NRST) and the SPM test ranged from .36 to .68. The sample consisted of 486 students from

grade 2 to 12, from a small town, rural school system in Mid-west USA. The

correlation was .45 with grade 2 (N=42), .36 with grade 3 (N=44), .42 with grade 4

(N=42), .68 with grade 5 (N=52), .51 with grade 6 (N=54), .39 with grade 7 (N=62),

.55 with grade 8 (N=42), .59 with grade 9 (N=53), .36 with grade 10 (N=50), .54 with

grade 11 (N=19) and .51 with grade 12 (N=26). The lowest correlation, .36, was found with grades 3 and 10, whereas the highest, .68, was with grade 5. The mean of the five correlations for grades 2 to 6 was .48, and the mean of the six correlations for grades 7 to 12 was .49. Carver found no evidence that the relationship between reading ability and the SPM test increased with age. He also concluded that general intelligence, as measured by the SPM test, had a strong and consistent relationship with reading ability.

In two groups consisting of Libyan university students, Majdub (1991) found a

significant correlation between SPM and academic achievement. For the Arabic major

group, correlation between SPM and academic achievement was 0.39 (N=75). For the

Education major group, correlation between SPM and academic achievement was .34

(N=110).

Andrich, & Styles, (1994) believed that the progressive matrices test contained

material not taught directly in schools and yet showed substantial relationship with

scholastic achievement. Johnson et al., (1994) correlated SPM with the

Comprehensive Test of Basic Skills (CTBS) in a small sample (N=32) from second,

fifth and seventh grade students in San Diego city school. The correlation between

SPM and Language was .48; with Reading .42 and with Math .56.

Pind et al., (2003) examined the criterion-related validity of the SPM test, in relation

to the results of the Icelandic National Examination for students in 4th, 7th, and 10th

grades. Generally, the SPM sample averages lay close to the INE averages. In addition, the correlations of the SPM scores with the INE scores were calculated and were found to be variable. In the fourth grade (N=53) the correlation with Icelandic was 0.38

whereas 0.50 with Mathematics. These correlations were appreciably higher in the

seventh grade (N= 59), being, respectively, 0.64 and 0.75. The correlations were

slightly lower in the tenth grade (N=51), 0.53 with Icelandic and 0.64 with

Mathematics. Finally, the two foreign languages, English and Danish, showed

correlations of 0.48 and 0.59, respectively, with the SPM. This supported the view that the SPM test shows higher correlations with mathematics than with language subjects. In general, these correlations are at the higher end of those found in similar

studies.

In 2007, Laidra et al., carried out the SPM test on 3618 students (1746 boys and 1872

girls) from all over Estonia in grades 2, 3, 4, 6, 8, 10, and 12 to investigate the

relationship of intelligence and personality with academic achievement (Grade Point Average, GPA) in Estonian schools, from elementary to secondary level. Pearson correlations were computed between SPM test scores and GPA. The correlations were, for grade 2 (0.54, p = 0.001; N=364), for grade 3 (0.46, p = 0.001; N=388), for grade 4 (0.49, p = 0.001; N=430), for grade 6 (0.53, p = 0.001;

N=609), for grade 8 (0.48, p= 0.001; N=697), for grade 10 (0.43, p= 0.001; N=642)

and for grade 12 (0.32, p= 0.001; N=488). The analysis showed that the SPM means

score increased with increasing age. It was concluded that there did not appear to be

large differences in the way intelligence and personality dispositions related to the

grades children acquire in Estonian schools at different educational levels. Although

some traits had more effect in elementary school (e.g., Agreeableness) and others

became relatively more relevant later (e.g., Conscientiousness), students’ achievement

relied most strongly on their cognitive abilities through all grade levels. Intelligence,

as measured by SPM test was found to be the best predictor of GPA in all grades.

For the SPM test correlations with achievement tests (Vocabulary, Reading, Language, Math, Work-Study and Spelling), the Fisher z transformation was again employed. The above studies on the SPM test's predictive validity are shown in Table 4.5, and a detailed analysis of the outcomes is presented below the tables.

Table 4.5 Summary of the studies on SPM test predictive validity with r to z Fisher's transformation results

Researcher           Country   Year  N    Achievement test              r     z
Tulkin & Newbrough   USA       1968  128  ITBS; Vocabulary              0.30  0.31
                                          ITBS; Reading                 0.40  0.42
                                          ITBS; Language                0.31  0.32
                                          ITBS; Work-study              0.39  0.41
                                          ITBS; Arithmetic              0.39  0.41
                                     75   ITBS; Vocabulary              0.25  0.26
                                          ITBS; Reading                 0.26  0.27
                                          ITBS; Language                0.27  0.28
                                          ITBS; Work-study              0.41  0.44
                                          ITBS; Arithmetic              0.27  0.28
                                     50   ITBS; Vocabulary              0.39  0.41
                                          ITBS; Reading                 0.41  0.44
                                          ITBS; Language                0.32  0.33
                                          ITBS; Work-study              0.36  0.38
                                          ITBS; Arithmetic              0.40  0.42
                                     103  ITBS; Vocabulary              0.32  0.33
                                          ITBS; Reading                 0.26  0.27
                                          ITBS; Language                0.38  0.40
                                          ITBS; Work-study              0.33  0.34
                                          ITBS; Arithmetic              0.39  0.41
Sinha                India     1968  220  Academic Achievement          0.32  0.33
                                     240  Academic Achievement          0.36  0.38
Mclaurin & Farrar    USA       1973  220  Academic Achievement          0.21  0.21
Baraheni             Iran      1974  472  Academic Achievement          0.44  0.47
                                     360  Academic Achievement          0.29  0.30
                                     203  Academic Achievement          0.61  0.71
                                     643  Academic Achievement          0.51  0.56
Sinha                India     1977  46   Academic Achievement          0.46  0.50
                                     5    Academic Achievement          0.47  0.51
                                     35   Academic Achievement          0.38  0.40
                                     86   Academic Achievement          0.45  0.48
Maqsud               Nigeria   1980  60   English language              0.19  0.19
                                          Arithmetic                    0.38  0.40
                                     60   English language              0.65  0.78
                                          Arithmetic                    0.49  0.54
Powers et al.        USA       1986  116  CAT; Reading                  0.34  0.35
                                          CAT; Language                 0.41  0.44
                                          CAT; Math                     0.39  0.41
                                     96   CAT; Reading                  0.36  0.38
                                          CAT; Language                 0.50  0.55
                                          CAT; Math                     0.60  0.69
                                     212  CAT; Reading                  0.35  0.37
                                          CAT; Language                 0.45  0.48
                                          CAT; Math                     0.48  0.52
                                     109  CAT; Reading                  0.45  0.48
                                          CAT; Language                 0.50  0.55
                                          CAT; Math                     0.52  0.58
                                     105  CAT; Reading                  0.54  0.60
                                          CAT; Language                 0.55  0.62
                                          CAT; Math                     0.56  0.63
                                     214  CAT; Reading                  0.49  0.54
                                          CAT; Language                 0.51  0.56
                                          CAT; Math                     0.54  0.60
Sidles & Avoy        USA       1987  62   CTBS; Spelling                0.28  0.29
                                          CTBS; Reading                 0.34  0.35
                                          CTBS; Math                    0.34  0.35
                                          CTBS; Language                0.39  0.41
                                     62   CTBS; Spelling                0.51  0.56
                                          CTBS; Reading                 0.52  0.58
                                          CTBS; Math                    0.56  0.63
                                          CTBS; Language                0.58  0.66
                                     124  CTBS; Spelling                0.38  0.40
                                          CTBS; Reading                 0.39  0.42
                                          CTBS; Math                    0.46  0.50
                                          CTBS; Language                0.47  0.51
Carver               USA       1990  42   NRST; Reading                 0.45  0.48
                                     44   NRST; Reading                 0.36  0.38
                                     42   NRST; Reading                 0.42  0.44
                                     52   NRST; Reading                 0.68  0.83
                                     54   NRST; Reading                 0.51  0.56
                                     62   NRST; Reading                 0.39  0.41
                                     42   NRST; Reading                 0.55  0.62
                                     53   NRST; Reading                 0.59  0.68
                                     50   NRST; Reading                 0.36  0.38
                                     19   NRST; Reading                 0.54  0.60
                                     26   NRST; Reading                 0.51  0.56
Majdub               Libya     1991  75   Academic Achievement          0.39  0.41
                                     110  Academic Achievement          0.34  0.35
Johnson et al.       USA       1994  32   CTBS; Reading                 0.42  0.44
                                          CTBS; Math                    0.56  0.63
                                          CTBS; Language                0.48  0.52
Pind et al.          Iceland   2003  53   INE scores; Math              0.50  0.54
                                     59   INE scores; Math              0.75  0.97
                                     51   INE scores; Math              0.64  0.67
                                          INE scores; Language          0.48  0.52
Laidra et al.        Estonia   2007  364  Academic Achievement          0.54  0.60
                                     388  Academic Achievement          0.46  0.50
                                     430  Academic Achievement          0.46  0.50
                                     609  Academic Achievement          0.53  0.59
                                     697  Academic Achievement          0.48  0.52
                                     642  Academic Achievement          0.43  0.46
                                     488  Academic Achievement          0.32  0.33

The mean correlations between the SPM test and academic achievement and the six achievement subtests are given in Table 4.6 below.

Table 4.6 The average correlation between the SPM test and achievement tests

Sub-tests              N      Mean z  Mean r
Academic achievement   6148   0.44    0.41
Vocabulary             356    0.33    0.41
Reading                1364   0.46    0.43
Language               1535   0.41    0.39
Maths                  1298   0.54    0.49
Work-Study             356    0.39    0.37
Spelling               124    0.41    0.39
Total                  11181  0.43    0.41

The highest correlations of the SPM test were with mathematics. This was in agreement with the findings of most earlier studies. Carpenter, Just & Shell (1990) showed that the SPM is largely a mathematical problem-solving test in design format. It requires the application of five mathematical rules involving addition, subtraction, and arithmetical and geometrical progression. On the other hand, the lowest correlations were with the vocabulary tests, reflecting the fact that the SPM test is a non-verbal test.

4.8 Item analysis of the SPM test

Item analysis indicates which items may be too easy or too difficult and which may fail for other reasons, and it thus makes it possible to discriminate clearly between the better and the poorer examinees (Ebel, 1972). Brown (1971) mentioned that item analysis has two purposes: first, by identifying defective items, it enables us to improve our tests and evaluation procedures; second, by indicating which items or material students have and have not mastered, it helps us to plan, revise, and improve our instruction.

It is worthwhile knowing that both the validity and reliability of any test depend

ultimately on the characteristics of its items. High reliability and validity can be built

into a test in advance through item analysis (Anastasi and Urbina 1997).

Item analysis was used to study two characteristics:

a) Item difficulty: the proportion of students who answered an item correctly.
b) Item discrimination power: whether a particular item differentiates between students with greater and lesser mastery of the material tested (Brown, 1981).

4.8.1 Item difficulty

In terms of item difficulty, if most students answered an item correctly then the item was an easy one, and if most students answered it incorrectly then it was a difficult one (Brown, 1983). The higher the value of the difficulty index, the easier the item. This definition is somewhat illogical and has led some researchers to refer to the index as an index of facility, or easiness, rather than as an index of difficulty (Ebel, 1972; Nunnally, 1972). Nunnally (1972) and Burroughs (1975) argued that the item difficulty index is required because it is almost always necessary to present items in their order of difficulty: the easiest items are administered first so as to give a sense of accomplishment and an optimistic start.
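To make the difficulty index concrete, the minimal sketch below (Python; not taken from this study) computes the facility value of each item as the proportion of examinees answering it correctly. The small response matrix is invented purely for illustration.

# Each row is one examinee, each column one item; 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

num_examinees = len(responses)
num_items = len(responses[0])

# Difficulty (facility) index p: proportion of examinees answering the item correctly.
# Higher p means an easier item.
difficulty = [sum(row[item] for row in responses) / num_examinees
              for item in range(num_items)]
print(difficulty)  # e.g. [0.75, 0.75, 0.25, 0.75] for the data above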

4.8.2 Item discrimination

Item discrimination shows whether the test items differentiate between people of

varying degrees of knowledge and ability. It may be defined as the percentage of the

“high” group passing the item minus the percentage of the “low” group passing the

same item (Brown, 1983).

Test-items can be classified as positively discriminating, negatively discriminating, or

non-discriminating. A positively discriminating item is one in which the percentage of

correct answers is higher in the upper group than in the lower group. A negatively

discriminating item is one in which the reverse occurs. A non-discriminating item is

one in which the percentage of correct answers is about the same for the upper and

lower groups (Blood and Budd, 1972).

The point-biserial correlation coefficient is a measure of item discrimination. The point-biserial correlation between "pass/fail" on each item and the total test score was used to explore the SPM item discrimination (Brown, 1983; Anastasi, 1988; Anastasi and Urbina, 1997; Roid and Barram, 2004; Kline, 2000; Kline, 2005). The greater the correlation of the item, the more discriminating it is. That is, it

discriminates between higher and lower groups more effectively. For an item to be

valid, its correlation with the total score should be fairly high.
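As a rough illustration of the two discrimination statistics described above, the sketch below (Python; not from this study) computes the upper-minus-lower discrimination index and the point-biserial correlation between one item and the total score. The median split into 'high' and 'low' groups and the data are assumptions made for the example only.

# Hypothetical item responses (1 = pass, 0 = fail) and total test scores.
item = [1, 1, 0, 1, 0, 1, 0, 0]
total = [52, 47, 30, 45, 28, 50, 35, 25]
n = len(item)

# Discrimination index: proportion passing in the upper half minus the
# proportion passing in the lower half (examinees split on total score).
order = sorted(range(n), key=lambda i: total[i])
lower, upper = order[: n // 2], order[n // 2:]
d_index = (sum(item[i] for i in upper) / len(upper)
           - sum(item[i] for i in lower) / len(lower))

# Point-biserial correlation: the Pearson correlation between the 0/1 item
# scores and the total scores.
mean_item, mean_total = sum(item) / n, sum(total) / n
cov = sum((item[i] - mean_item) * (total[i] - mean_total) for i in range(n))
var_item = sum((x - mean_item) ** 2 for x in item)
var_total = sum((x - mean_total) ** 2 for x in total)
r_pb = cov / (var_item * var_total) ** 0.5

print(d_index, r_pb)  # positive values indicate a positively discriminating item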

Ebel and Frisbie (1991, p.232) believed that the more items classified as highly or

moderately discriminating the better the test. Burroughs (1975) showed that an item

which does not discriminate between these groups, upper and lower, contributes

nothing to the establishment of an order of merit. It may be useful for warming-up

purposes though. An item which is easier for weaker students than it is for good

students would not only be a very curious item, but also one that detracts from the

test’s rank ordering properties.

4.9 Review of previous studies that employed SPM test

The present study makes use of the SPM test as a measure of non-verbal reasoning ability, "g". It is important, therefore, to examine a number of relevant studies that used the SPM test in a variety of settings, including educational, vocational, clinical and anthropological contexts. A total of 54 studies were carried out in 26 countries, 11 developed and

15 developing, between 1948 and 2009. The developed country with the highest

number of SPM studies conducted was the United States, with 15 studies. Its

counterpart in the developing countries was India with a total of 5 studies. The earliest

study was in the USA (1948) while the latest in Qatar (2009). For clarity and easy

reference, the above studies are organised in Table 4.7. A thorough description of

each of the studies mentioned in the table is given below it. After presenting the

description of the studies, critical analysis and examination will be given.

Table 4.7 A sample of worldwide studies that utilised the SPM test

COUNTRY        YEARS                               REFERENCES
Congo          1994                                Nkaya et al.
Denmark        1968                                Vejleskov
Egypt          1987                                Abdel-khalek
Estonia        2004                                Lynn et al.
France         1994                                Nkaya et al.
Hong Kong      1988                                Lynn et al.
Iceland        2003                                Pind et al.
India          1968; 1968; 1972; 1974 and 1977     Sinha; Mehrotra; Mohan; Rao; and Sinha
Iran           1974                                Baraheni
Israel         1991                                Kaniel & Fisherman
Italy          1962                                Young et al.
Kuwait         2006                                Abdel-Khalek and Lynn
Libya          1983; 1991; 2005 and 2005           Aboujaafer; Majdub; Attashan; and Abdalla and Ahlam
Mexico         2004                                Lynn et al.
Nigeria        1980                                Maqsud
Oman           2009                                Abdel-khalek and Lynn
Qatar          1986; 2009                          Bart et al.; Khaleefa & Lynn
Pakistan       2006                                Ahmad et al.
Slovenia       2007                                Boben
South Africa   2000; 2002; 2007                    Rushton and Skuy; Rushton et al.; Taylor
Sudan          2008.b                              Khaleefa et al.
Syria          2008.a                              Khaleefa & Lynn
Tanzania       1967                                Klingelhofer
Turkey         1993                                Duzen et al.
UK             1962; 1962; 1963; 1988; 1989        Foulds & Dixon; Foulds et al.; King; Lynn et al.;
               and 1994                            Egan; and van den Broek and Bradshaw
USA            1948; 1966; 1968; 1969; 1972;       Rimoldi; Bingham et al.; Tulkin & Newbrough;
               1973; 1986.a.b; 1987; 1988; 1988;   Burke & Bingham; Burke; Mclaurin & Farrar;
               1986; 1986; 1994 and 1994           Powers et al.; Sidles & Avoy; Jensen et al.;
                                                   Karnes & Whorton; Bart et al.; Whorton & Karnes;
                                                   Johnson et al.; and Blennerhassett et al.

The objectives of the examination of these studies include the effects of the following independent variables on SPM test results: age, gender, variability, study level, region (cities and villages) and academic discipline (sciences and arts), together with a comparison of the reported results with those obtained in the present study. Since each study may investigate more than one variable, it was difficult to group the studies under a single variable. Instead, the studies outlined in Table 4.7 will be discussed in two categories: those conducted in developing countries and those conducted in developed countries. Whether a country is ranked among developed or developing

countries is based on the Human Development Index (HDI). This is an index

combining normalized measures of life expectancy, literacy, educational attainment

and GDP per capita. The HDI is claimed to be a standard means of measuring human development - a concept that, according to the United Nations Development Program (UNDP), refers to the process of widening people's options, giving them greater

opportunities for education, health care, income, employment, etc. The basic use of

HDI is to rank countries by level of "human development". The index was developed in 1990 by the Pakistani economist Mahbub ul Haq and Sir Richard Jolly, with help from Gustav Ranis of Yale University and Lord Meghnad Desai of the London School of Economics. It has been used since then by the UNDP in its annual Human

Development Report. Nowadays the HDI is a pathway for researchers into the wide

variety of more detailed measures contained in the Human Development Reports.

The HDI combines three basic dimensions (a simple illustrative calculation is sketched after this list):

• Life expectancy at birth, as an index of population health and

longevity.

• Knowledge and education, as measured by the adult literacy rate (with

two-thirds weighting) and the combined primary, secondary, and

tertiary gross enrollment ratio (with one-third weighting).

• Standard of living, as measured by the natural logarithm of gross

domestic product (GDP) per capita at purchasing power parity (PPP) in

United States dollars (UNDP Human Development Annual Report

2007/2008).
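The sketch below (Python; not part of the thesis) illustrates how an HDI value of the kind used in reports of that period is combined from the three dimensions. The goalposts (life expectancy 25-85 years, GDP per capita US$100-US$40,000 at PPP), the two-thirds/one-third weighting within the education index, and the equal weighting of the three dimension indices reflect the pre-2010 HDI formula as commonly described and should be read as assumptions rather than as figures taken from this thesis; the example inputs are invented.

import math

def dimension_index(value, minimum, maximum):
    # Normalise a raw value onto a 0-1 scale between fixed goalposts.
    return (value - minimum) / (maximum - minimum)

def hdi(life_expectancy, adult_literacy, gross_enrollment, gdp_per_capita):
    # Life expectancy index, goalposts 25 and 85 years (assumed).
    life_index = dimension_index(life_expectancy, 25, 85)
    # Education index: two-thirds adult literacy, one-third gross enrollment (both %).
    education_index = (2 / 3) * (adult_literacy / 100) + (1 / 3) * (gross_enrollment / 100)
    # GDP index uses the natural logarithm, goalposts US$100 and US$40,000 (assumed).
    gdp_index = dimension_index(math.log(gdp_per_capita), math.log(100), math.log(40000))
    # The (old) HDI is the unweighted mean of the three dimension indices.
    return (life_index + education_index + gdp_index) / 3

# Hypothetical country, illustrative inputs only.
print(round(hdi(72.0, 86.0, 76.0, 11000), 3))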

The studies conducted in developed countries will be discussed first, followed by a detailed examination and evaluation. After that, the studies conducted in developing countries are evaluated, and similarly comments and analysis are given at the end.

4.9.1 Studies on SPM test in developed countries:

Rimoldi (1948) carried out the SPM test on USA children aged 9 to 15 years. The mean time for attempting the test for a population of 1680 subjects was 38 minutes with a SD of 11.90. For the age of 9 (M = 19.32, SD = 9.18); 10 (M = 24.2, SD = 11.60); 11 (M = 28.82, SD = 10.49); 12 (M = 33.45, SD = 9.98); 13 (M = 35.90, SD = 9.59); 14 (M = 35.61, SD = 9.65); 15 (M = 38.59, SD = 9.57). These results illustrated that SPM mean scores increased with age. There was a drop in the mean number of problems solved from Set A through Set C, no significant difference between the means for Sets C and D, and a final drop in Set E. In addition, the analysis showed one factor common to all the sets of the SPM test.

Two earlier studies carried out in the UK by Foulds and Dixon (1962) and Foulds et

al., (1962) with adult psychiatric patients concluded that males were significantly

superior to females in SPM test results. Another early study was that of Young et al.,

(1962) in Italy, who applied the SPM test to a random sample of elementary school children in two regions. The children's ages ranged from 9 years and 6 months to 14 years and 6 months. Results showed that boys obtained higher scores than girls in the city (mean percentiles: boys 59.06, girls 49.39), while in rural areas girls scored

higher than boys (mean percentiles: boys 42.03, girls 49.71).

King (1963) in another study also in the UK found significant sex differences in

favour of girls in the SPM test. The boys' mean age was 10.6 years and their SPM mean score was 35.5, SD = 11.5. The girls' mean age was 11.2 years and their SPM mean score was 38.5, SD = 12.0. For the total sample the SPM mean score was 37.1, SD = 11.9. Bingham et al., (1966) studied a small sample of patients (N=39) referred to the Vocational Counselling and Psychological Service in the USA. The subjects ranged in age from 20 to 52 years (mean age 36.1 years, SD = 7.7). The SPM mean score was 40.6, SD = 11.80.

Tulkin and Newbrough (1968) administered the SPM test to 356 fifth and sixth grade

students, from the suburban Maryland school system in the USA, to determine the

effect of past experiences related to race, social class, and gender on performance in

the SPM test. They found the following SPM test means with the eight groups; for

white high class females (N=64) was 41.1, SD =8.18; for high class white males

(N=64) was 42.2, SD =5.81, for low class white females (N=32) was 30.6, SD

=10.48; for low class white males (N=43) was 30.7, SD = 9.94, for high class black

females (N=23) was 39.7, SD = 6.71; for high class black males (N=27) was 39.0, SD

= 8.43; for low class black females (N=53) was 26.3, SD = 10.98; for low class black

males (N=50) was 25.1, SD =11.79.

They concluded that: (a) gender differences were not significant, (b) higher social

class and white subjects showed significantly higher SPM test scores and (c)

significant differences between races on the SPM test were found only in the lower

class students. The black low class scored significantly below the white low class.

Vejleskov (1968) in Denmark with 628 fifth grade children from two cities found that

boys (N = 174) and girls (N = 192) in Gentofte city had the same score (39.9) on

SPM, while Esbjerg city girls (N = 137) scored slightly better than boys (N = 125).

The boys' mean score was 37.4 whereas the girls' mean score was 38.2. Vejleskov also noticed that boys, in general, worked faster than girls on the SPM test. The SPM mean

for the total sample in Esbjerg city was 37.8 (N = 262).

Burke and Bingham (1969) in the USA reported an SPM mean score of 41.2, SD = 11.5, for a sample of 91 male patients referred for vocational counselling (mean age = 35.1 years).

Another study by Burke (1972) investigated 567 SPM answer sheets of veterans

(black and white) who had taken the SPM test when referred for vocational

counselling. The veterans' mean age was 35.5, SD = 9.1 months (age range 16 to 64

years). SPM mean score was 40.0, SD = 12.0.

Mclaurin and Farrar (1973) in their study on 96 male and 105 female university

students in America concluded that the SPM did not have sufficient ceiling for

university students as indicated by the closeness of the SPM mean score 50.39, SD =

6.50 to the maximum score possible. Vincent and Cox (1974) studied a sample of 380

psychiatric patients taken from the psychological files of the Texas Vocational Rehabilitation Unit. Most of the sample had a physical, emotional, or mental disability. The sample had a mean age of 28.7 years and consisted of 57 % white, 36 % black and 7 % Latin Americans. The SPM mean score for the total sample was 39.25, SD = 12.00. They concluded that the SPM test is a viable tool for measuring intelligence in such populations.

Bart et al., (1986) compared the performance of 273 Qatari students (151 boys with a

mean age of 12.97 years and 122 girls with a mean age of 12.63 years) on the SPM

test to that of 281 American students (150 boys with a mean age of 12.37 years and

131 girls with a mean age of 12.70 years) in the fifth, sixth and seventh grades.

American students scored higher (M=43.39) than the Qatari students (M=30.24), and

they also added that male students performed better than females and that older students

tended to perform better than younger students. They did not report any data

regarding performance of students in both countries according to age, gender, or grade

level.

Powers et al., (1986.a) carried out a study in the USA on 127 Hispanic (69 boys and

58 girls) and 103 Anglo-American (53 boys and 50 girls) students. The mean age of the students was 11.6 years. Students were enrolled in grade 6 of four elementary schools of a large

urban school district in the South west of USA. Hispanic and Anglo-American

students were compared for their overall scores on the SPM test. When the total mean

score of Hispanic students (M=38.43, SD = 7.45) was compared to that of the Anglo-

American students (M = 39.19, SD = 7.30), no significant differences were found.

Powers et al., concluded that these results support the continued use of the SPM test

with Hispanic and Anglo-American students.

In another study by Powers et al., (1986.b) in the USA to examine gender differences

in performance on the SPM test, they administered the SPM test to 212 sixth grade

students (116 boys and 96 girls) and 214 seventh grade students (109 boys and 105

girls). The ethnic background of the students consisted of Native American, Black, Hispanic, and non-Hispanic Caucasian. The students were from four schools that

ranged in socio-economic status from lower middle to upper middle SES in urban

school district in the South west of the USA. Sex differences in performance on SPM

test were examined at each grade level. Sixth grade boys' mean 38.81, SD = 6.84 did

not differ significantly from girls' mean 39.26, SD = 7.35. Seventh grade boys' mean

score of 39.48, SD = 8.06 and girls' mean of 38.88, SD = 8.21 also did not differ

significantly.

Sidles and Avoy (1987) administered the SPM test to 124 Navajo students (62 boys

and 62 girls, age 14 and 15 years), in seventh and eighth grade, in Arizona and New

Mexico. They reported that the raw scores mean for females was 39.85 and for males

was 39.88. Mean score for seventh grade students was 38.83, while the mean for

eighth grade students was 40.11. Mean score of SPM test for total students was 39.86.

They noticed that this mean was lower than that obtained for the United Kingdom

students of similar age group during the 1981 standardisation of the SPM test. They

concluded that the SPM test had potential for being included by school psychologists

in their psycho-educational test battery as a measure of intellectual ability of

adolescent Navajo students evaluated for special education or gifted programs.

Lynn et al., (1988) carried out a study in the UK and Hong Kong with 120 boys and

77 girls from Hong Kong and 75 boys and 95 girls from the UK. The students' mean age was 10.5 years, and the British students were Caucasian. They found that the Hong Kong boys and girls both obtained significantly higher means on the SPM than their British counterparts. The Hong Kong boys' SPM mean percentile was 71.48, SD = 20.00, and the Hong Kong girls' SPM mean percentile was 68.44, SD = 21.34. The higher mean obtained by Hong Kong boys as compared with Hong Kong girls was not significant. British boys and girls in this study obtained identical means, equivalent to a percentile of 51.72 (SD = 28.84 for boys and 28.62 for girls).

In the USA, the SPM test was administered by Jensen et al., (1988) with a time limit of 40 minutes to a total of 261 undergraduate students. The overall SPM mean was

51.32, SD = 4.69.

With 307 students in grades 3 through to 8 in a rural county school system in

Mississippi US, Whorton and Karnes (1988) found that the SPM mean for the total

sample was 32.2, SD = 11.2. The sample consisted of 70 black and 237 white

students; 142 were girls and 165 boys. The mean age was 10.8 years with a range

from 8.3 to 15.7 years. For black students the SPM mean score was 25.4, SD = 9.9

(N=70). The SPM mean score for white students was 34.3, SD = 10.7 (N=237). The

means difference between students on the basis of race was significant. In another

study, also in Mississippi, by the same researchers (1988), the SPM was administered to 625 students in a rural county elementary school (grades 3 to 8). 441 white students

and 211 black students with a mean age of 8.10 years carried out the test. Of them 410

students were on free or reduced lunches, and 245 students on paid lunches. The SPM

mean for students on free lunch was 29.7, SD = 10.9 whereas for students on paid

lunch was 35.6, SD = 10.8.

Egan (1989) in the UK administered the SPM with a 30-minute time limit to a sample of 94 trainees (43 male and 51 female), with a mean age of 16.7 years, SD = 9.7 months, who had been unemployed for 6 months after leaving school. The

SPM mean for the total sample was 36.5, SD = 9.9; the SPM mean for males was

38.4, SD = 9.8 and for females was 34.6, SD = 9.8. Gender difference was not

significant.

The second investigation about the SPM in Libya was carried out by Majdub (1991)

who administered the SPM to two groups that consisted of 193 students (68 males and

125 females) from Tripoli University. He found that the Education major group had

significantly higher means than the Arabic major group. For the Arabic major group

the SPM mean was 34.40, SD = 9.13 (N=81). For the Education major group the SPM

mean was 39.14, SD = 9.08 (N=112). Majdub concluded that the difference between the two groups on the SPM, in favour of the Education group, may be due to the greater familiarity of the Education group with solving abstract problems.

Nkaya et al., (1994) claimed that comparisons of intelligence test scores of individuals

from developed countries to individuals from developing countries have always

shown high disparities in favour of western subjects regardless of the type of the test.

For example, they administered the SPM test three times to students in France and

Congo, to obtain the classic improvement in scores at retest. Participants were 88

Congolese (45 boys and 43 girls with a mean age of 13.3 years) and 68 French (36

boys and 32 girls with a mean age of 12.3 years) who were in the sixth year of

schooling. Neither the French nor the Congolese students had ever been administered

an intelligence test. The test situation, however, was much more familiar to French

students due to exposure to material and educational games similar to materials used

in intelligence tests, which was not the case in Congo.

The SPM test was administered under the same conditions three times (T1, T2 and T3) at two-week intervals. The test was self-paced but students were encouraged to work

rapidly. Time and items solved correctly after 20 minutes were recorded. For self-

paced conditions, the SPM test means scores for French students in test 1 was 46.9,

SD = 5.9; test 2 was 49.4, SD = 4.9; and test 3 was 49.1, SD = 4.6. For the Congolese

students SPM test mean for test 1 was 29.6, SD = 11.6; test 2 was 33.0, SD = 11.9 and

test 3 the mean was 32.5, SD = 12.0. The means of the SPM test for timed condition

for French students in test 1 was 40.4, SD = 5.2; test 2 was 48.0, SD = 5.2 and test 3

was 48.5, SD = 5.0. For Congolese the SPM test timed mean in test 1 was 23.5, SD =

9.3; test 2 was 29.5, SD = 11.1 and in test 3 was 32.0, SD = 12.1.

They concluded that student’s scores increased more rapidly from test 1 to test 2 than

from test 2 to test 3 especially when the test was timed (7.6 points increase for French

and 6 points increase for Congolese). There were no improvements for the French

self-paced mean between test 2 and test 3 (- 0.3 points) and 3.4 points increase for

Congolese. There was little improvement (0.5 points) in the mean for timed condition

for French students between test 2 and test 3, and for the Congolese there was an

increase of 3.4 points. From test 1 to test 3 under the timed condition there was an increase of 8.1 points for French and 8.5 points for Congolese students. In general the

performance on SPM test was higher for French students than for Congolese students

for both self-paced and timed testing.

In a study by Johnson et al., (1994), a sample of 449 second, fifth and seventh grade

students in San Diego city school were given the SPM test. In this group, 77 students

were African American, 122 Asian, 54 Filipino, 156 Latino and 40 White American.

Of these 215 were boys and 234 were girls. The mean age of the children was 11

years (age range from 6 years, 8 months to 13 years 10 months). The SPM mean

score was 36.10, SD = 11.52.

In the UK, van den Broek and Bradshaw (1994) administered the SPM to normal and

patient samples. The normal sample was 77 subjects (58 females and 19 males), all of

them were native English speakers and none had a history of psychiatric or

neurological disorder. The patient sample was 75 native English speaking (42 male

and 33 females). The patient sample was allocated to one of three groups: left-

hemisphere (N=24), right-hemisphere (N=34) or bilateral lesions (N=17). The mean

age for the normal sample was 35.2 years, SD = 12.8 months; for the left-hemisphere group 48.3 years, SD = 16.7 months; for the right-hemisphere group 48.8 years, SD = 17.1 months; and for the bilateral lesions group 60.4 years, SD = 12.4 months. The SPM mean scores for the

normal sample was 47.3, SD = 8.2; for bilateral sample was 21.2, SD = 11.2; for left

sample was 33.8, SD = 12.6; and was 30.0, SD = 14.5 for the right sample.

Regarding the use of the SPM with deaf subjects, in a survey by Levine (1974) the Raven's Matrices test ranked in the top ten in frequency of use with deaf subjects.

Armfield (1985) administered the SPM to 240 deaf/mute students from South China

and concluded that the SPM appeared to be helpful as a tool for teachers making

individual educational plans for students.

A study by Blennerhassett et al., (1994) with 102 deaf residential adolescents showed an

SPM test mean of 33.98, SD = 10.80. The mean age was 14.7 years with a range

from 10 to 19 years. They concluded that the SPM test appeared to be suitable for

assessing non-verbal intelligence of children with hearing impairments, and was

especially useful when a quick screening technique was needed for deaf adolescents.

Pind et al., (2003) carried out the SPM test on Icelandic school children aged 6 to 16

years. A total of 665 children were tested and the standardization sample consisted of

550 of the 665 children. The median total score rose from 23 in the 1st grade to 50 in

the tenth grade. Scores increased regularly with increasing age. Icelandic norms were 2 to 3 points higher than UK norms. The performance of girls and boys on the SPM was compared. The average score of girls in the standardisation sample was 40.1, with boys receiving on average a score of 39.4. A two-way analysis of variance (gender by grade) showed a significant effect of grade, F (9,530) = 66.95, P<0.0001. The effect of gender was not significant, F (1,530) = 0.61, P=0.434, nor was the interaction of gender and grade, F (9,530) = 0.65, P= 0.759. The effect of geographical district was also not

significant, F (7,542) = 0.89, P=0.516. It was concluded that grade, or age, was the

only factor in this study which had a significant effect on the children’s SPM score.

Lynn et al., (2004) conducted an SPM test on an Estonian sample to investigate any

sex difference. 2738 adolescents (1250 male and 1439 female) attending 6th, 8th, 10th,

11th and 12th grades carried out the test. Overall, females obtained a higher mean than

males. Females obtained higher means than males (by 3.8 IQ points) at ages 12 to 15 years, whereas males obtained higher means than females (by 1.6 IQ points) at ages 16 to 18 years. Overall, males had statistically significantly larger variance than females. Irwing and Lynn (2005) also established sex differences on the PM among university students: men obtained significantly higher scores than women.

In 2007, Duzen et al., began the process of standardization of the SPM test in Turkey with the aim of identifying gifted children. Overall, 2458 students were tested (1170 girls and 1288 boys, aged between 6½ and 14½ years); 1341 students were from rural origins while the remaining 1117 were from urban origins. The results showed that students from

urban origins obtained significantly higher scores than students from rural origins:

they also showed that grade predicts SPM scores more accurately than age.

In 2007, Boben conducted an SPM test on 1,556 children and adolescents aged 7.5 to 18 years in Slovenia; 53% were male students. Nine items were shown to be misplaced in difficulty (A6, A9, A10, B9, B10, B11, C5, C7, C9). Both Cronbach's alpha and split-half estimates showed a reliability of 0.95. This study showed that subgroups differed in

statistically significant ways in relation to sex (F =13.13, p = 0.00) and age group (one

year intervals) from 8 to 18 years (F = 76.48, p = 0.00), but not in the interaction

between them (F = 0.65, p = 0.77). A more detailed analysis showed that sex

differences occurred only in older age groups. T-test revealed statistically significant

differences for age groups of 16-year olds (p = 0.02), 17-year olds (p = 0.01) and 18-

year olds (p= 0.04). Nevertheless, statistically significant differences regarding sex

were not confirmed.

Some important features are to be noted about these studies. First, most of the studies selected their samples randomly and with adequate sizes. A few studies did not mention their selection procedures, such as Mclaurin and Farrar 1973 in the USA; Vincent and Cox 1974 in the USA; and van den Broek and Bradshaw 1994 in the UK. In some studies, neither the sample size nor the selection criteria were reported. Examples of such studies are: Young et al., 1962 in Italy; King 1963 in the UK; and Mclaurin and Farrar 1973 in the USA. Since the larger the sample size, the more representative it is of the behaviour domain, a total of 5 studies have taken advantage of this fact. These include: Lynn et al., 2004 in Estonia; Duzen et al., 2007 in Turkey; and Boben 2007 in Slovenia. Moreover, they applied advanced statistical procedures such as

factor analysis, Two-Way Analysis of Variance and Multiple regression stepwise

analysis.

So far the analysis has been concerned with sample selection and size. Attention will now be paid to the characteristics of the samples themselves. Along with healthy people, a number of SPM tests were conducted on patients with physical and psychological disabilities, including hearing impairments and mental disorders. These types of studies were not included in the meta-analysis chapter (chapter 6). Other studies took into account various variables such as the economic status of the subjects, although the criteria upon which lower and upper classes were distinguished were not those adopted in the field of economics. As an example, the study conducted in the USA by Karnes and Whorton (1988) on a sample of students classified them into two categories: those on paid lunches representing the upper class, and those on free lunches representing the lower class.

As a final remark, only 2 out of 10 studies examined SPM performance among both rural and urban residents. These were carried out by Duzen et al., 2007 in Turkey and Young 1962 in Italy. This element, the difference between urban and rural life, had a noticeable effect on the SPM test, and ignoring it would render a sample unrepresentative.

4.9.2 Studies on SPM test in developing countries:

Klingelhofer (1967) administered the SPM test with a time limit of 30 minutes to

African and Asian secondary school students in Tanzania. The African sample

consisted of 2963 students (2125 males and 838 females) and the Asian sample

consisted of 729 students (415 males and 314 females). The mean ages for the four groups were: African boys 17.1 years, African girls 16.1 years, Asian boys 14.8 years

and Asian girls 14.3 years. The SPM test mean scores were 34.3 for African boys,

34.1 for African girls, 43.9 for Asian boys and 41.7 for Asian girls. There was no statistically significant difference in mean scores between African boys and girls, and no statistically significant difference was found between the African tribes in performance on the SPM test. There was a significant mean difference between Asian and African students in favour of the Asian students, and Asian boys scored better than Asian girls. Klingelhofer claimed that the significantly better performance of Asians

than Africans on the SPM was probably associated with a number of cultural factors

that differentiate the two groups, e.g. Asian children start school early, have literate parents and live in towns where they have daily contact with the stimuli of modern life, whereas Africans come from rural environments and low-income families.

Sinha (1968) reported the following means for both sexes from rural and urban

population from India. For rural boys the SPM mean scores were 22.50 at age 12

years; 26.50 at 13 years and 27.10 at 14 years. For urban boys the SPM mean scores

were 24.00 at 12 years; 27.40 at 13 years and 29.10 at 14 years. For rural girls the

SPM mean scores were 26.83 at 13 years and 30.00 at 14 years (no data for age 12).

For urban girls the SPM mean scores were 25.50 at age 12 years; 28.90 at age 13 years

and 30.10 at age 14 years. Sinha concluded that urban children scored higher than

rural children, and girls scored higher than boys in both rural and urban areas. In the

same study Sinha reported that the SPM mean score for Art-Science students was

47.84, SD = 4.46 (N=220) while the SPM mean score for Engineering students was

54.03, SD = 3.61 (N=204). Both samples were from Tirupati, India.

From India, Mohan (1972) administered the SPM test to 310 university and college students (165 females and 145 males) with an age range of 18 to 25 years. Mohan reported the following means: for males the mean score was 46.48, SD = 7.32; for females the mean score was 43.88, SD = 7.70. Mohan found that a mean score of 45 on the SPM test corresponds to the 50th percentile as given by Raven for the age range 14 to 25. There was also a significant difference in SPM test scores favouring male students.

Another study from India by Rao (1974) administered a shortened version of the SPM

test (45 items instead of 60 rearranged in graded order of difficulty) to different

college students with a mean age of 18.10 years. Rao found the following means; the

mean for Engineering students mean (N=452) was 54.14, SD = 3.9; Agricultural

students (N=207) was 46.42, SD = 6.55; Science students mean (N=769) was 45.18,

SD = 7.82; Education students (N=219) was 42.84, SD = 8.51; Art students (N=487)

mean was 41.28, SD = 8.30; and Commerce students mean (N=122) was 39.76, SD =

8.19. Also, Rao compared the SPM test means of high and low academic achievers

and found that the mean of high achievers (N=106) was 53.26, SD = 3.04; while the

mean of low achievers (N=106) was 51.37, SD = 3.87. At the same time, the mean score of high achievers on the achievement test was 18.32, SD = 3.2, while the mean score of low achievers on the achievement test was 2.48, SD = 1.3. Comparing the SPM and achievement tests, Rao concluded that the SPM test scores failed to

discriminate between the high and low academic achievers. Nevertheless he claimed

that the Standard Progressive Matrices test was as good as any other test of

intelligence in predicting scholastic performance.

Baraheni (1974) carried out a study in Iran. The study was designed to cover a

representative sample of students (N=4561) from age 9 to 18 years, attending primary

and secondary schools in Tehran. Baraheni found that Iranian boys scored higher on

the SPM test than Iranian girls. The differences were statistically significant from age

9 up to 13 years. He mentioned that the slight superiority of boys over girls on the

SPM test might reflect the fact that progressive matrices measures, in addition to a

general factor, a spatial dimension in which boys have been found to excel girls. He

also added that although a steady increase in SPM test scores was observed at

successive age levels, both in males and females, the magnitude of differences at

some age levels was very small, especially after 15 years of age. Baraheni claimed

that this steady increase in average performance which was significant up to age 15

was in accordance with data reported by Raven. The SPM mean for age 17 years was

37.93; SD= 11.41; and N=256. The SPM mean for age 18 years was 39.36; SD=

10.34 and N = 304. Baraheni concluded that on the basis of his data, the SPM test

was an efficient test of general intelligence for use with Iranian children.

Sinha (1977), also from India, administered the SPM test to an Indian sample consisting of 100 boys and 100 girls aged 11 to 15 years. Sinha reported the following total means for students' performance on the SPM test by age: at age 11 years the mean was 27.25, SD = 9.30; at age 12 years, 27.25, SD = 8.90; at age 13, 30.30, SD = 10.50; at age 14, 33.00, SD = 9.40; and at age 15, 32.25, SD = 11.20. Sinha concluded that with increasing age there were some increases in SPM means for Indian students from age 11 to 14 years. The means of the Indian students were also very low compared with Raven's British norms for children of the same age. In the same study, Sinha found that science students scored higher than arts students on the SPM test in the Indian sample. In addition, he reported that Shanthamani (1970) had found similar results using Alexander's Battery intelligence test.

Maqsud (1980) in Nigeria administered the SPM test to 120 primary school students

with an average age of 12.2 years for the students in a modern school and 12.6 years

for the students in a traditional school. Sixty students were randomly drawn from a

modern school (upper-middle class homes), and 60 from a traditional school (lower-

middle and lower class families). The mean score of the SPM test for students from

the traditional school was 23.25, SD = 3.49 while the mean score for students from

the modern school was 20.85, SD = 4.27. The mean SPM score of students from the traditional school was found to be significantly higher than that of students from the modern school.

The first investigation of the SPM in Libya was that of Aboujaafer (1983) who

studied pupils’ achievement in preparatory schools in Tripoli. The SPM test was

administered to a sample of 201 boys and girls who were in grade 8. The age mean

was 14 years. The boys' SPM mean was 35.40, SD = 10.40 (N=100); the girls' mean was 33.50, SD = 10.80 (N=101); and the mean for the total sample was 34.50, SD = 10.60 (N=201). The difference between the boys' and girls' means was not significant.

Abdel-khalek (1987) in Egypt administered the SPM test to 452 university

undergraduates, 205 males with a mean age of 24 years and 247 females with a mean

age of 23 years in the departments of Psychology, Anthropology, Geography, Arabic

Language and English Literature. The mean score for males was 44.2, SD = 7.8, while that for females was 40.8, SD = 8.4. Abdel-khalek claimed that the gender differences which emerged in the study may be related to social factors in an Eastern society, but did not specify these factors. He stated that, in brief, the SPM test may provide a

promising tool for measurement of non-verbal intelligence in an Egyptian context.

Kaniel and Fisherman (1991) compared the performance of 250 Ethiopian Jews (115 boys and 135 girls, with an average age of 14.7 years) on the SPM test with that of 1,740 Israeli Jews aged 9 to 15 years. The mean for Ethiopian Jews aged 15 and 16 years was 27.0, whereas the mean for Israeli children aged 9 and 10 years was 28.0, and the mean for Israelis aged 14 and 15 years was 45.0. They concluded that the SPM mean for the Ethiopian Jews aged 15 and 16 years was very similar to that of Israelis aged 9 and 10 years. They added that when the two cultural groups were roughly matched

for total score in the SPM test (mean score obtained by 9 year old Israelis and 14 year

old Ethiopians); they exhibited the same pattern of distribution of errors in the SPM

test. They claimed that these results suggested that the performance of Ethiopian

Jews reflected a developmental delay, and not a different cognitive style. They added

that the SPM test scores merely told us how Ethiopian Jews compared to the Israeli

children at this point in time, but they did not tell us about their response to new

learning situations.

Rushton and Skuy (2000) administered the SPM test to 309 students (aged 17 to 23 years) in South Africa (173 Africans, 136 Whites; 104 men, 205 women). The study aimed to compare the performance of African and White students. Analysis of variance (ANOVA) with race and sex as factors showed significant main effects and a marginally significant interaction, F(1,305) = 131.85, p < 0.001; F(1,305) = 8.89, p < 0.01; and F(1,305) = 3.67, p < 0.10. Men averaged higher scores than women (M = 50.47, SD = 7.9). On the 1993 US norms for 18- to 22-year-olds, White men, with 54 out of 60 correct responses, averaged at the 61st percentile; White women, with 53 correct responses, at the 55th percentile; African men, with 46 correct responses, at the 19th percentile; and African women, with 42 correct responses, at the 11th percentile. These SPM scores and percentile points were converted to IQ equivalents of 105 for Whites and 84 for Africans. Males also averaged slightly higher than females. In addition, item analysis (difficulty and discrimination) was carried out. Percentages were used to calculate item difficulties for Whites and Africans across the 60 items. For all groups, set E was the most difficult, followed by set C and then D; sets A and B were the easiest. Using a proportion of 70 percent of respondents passing as the criterion for judging an item as "too easy", 54 of the 60 items (90%) proved too easy for Whites and 41 of the 60 items (68%) too easy for Africans. Overall, Africans found the items more difficult than did the Whites, as did women compared with men. For the calculation of item discrimination, the item-total correlation (point biserial) was used. According to Hopkins' (1998) Index of Discrimination and Item Evaluation, 41 items for Africans and 13 for Whites were considered to have excellent discriminating value, 10 items for Africans and 7 for Whites good discriminating value, and 6 items for Africans and 18 for Whites fair discriminating value.
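
The item statistics reported in this study follow standard definitions: item difficulty is the proportion of respondents answering an item correctly, and item discrimination is the point-biserial correlation between each item (scored 0/1) and the total test score. The sketch below, on invented 0/1 response data (not the data from this study), illustrates both calculations together with the 70% "too easy" criterion described above.

```python
import numpy as np

def item_analysis(responses, easy_threshold=0.70):
    """Item difficulty and point-biserial discrimination for 0/1 response data.

    responses: array of shape (n_respondents, n_items); 1 = correct, 0 = incorrect.
    Returns per-item difficulty, discrimination and a 'too easy' flag.
    """
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)                  # total score per respondent
    difficulty = responses.mean(axis=0)             # proportion passing each item
    # Point-biserial discrimination: correlation of each 0/1 item with the total score
    discrimination = np.array([np.corrcoef(responses[:, j], totals)[0, 1]
                               for j in range(responses.shape[1])])
    too_easy = difficulty >= easy_threshold         # 70% passing criterion
    return difficulty, discrimination, too_easy

# Invented example: 200 respondents by 60 SPM-like items of decreasing easiness
rng = np.random.default_rng(0)
fake = (rng.random((200, 60)) < np.linspace(0.95, 0.30, 60)).astype(int)
difficulty, discrimination, too_easy = item_analysis(fake)
print(f"{too_easy.sum()} of 60 items exceed the 70% 'too easy' criterion")
```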

In 2002, Rushton et al. administered the SPM test to 342 university students (198 Africans, 86 Whites, 58 Indians; 271 men and 71 women). The White, Indian, and African mean scores were, in order, 56, 53, and 50 out of 60 (SD = 2.6, 4.9, 6.4; ranges = 46–60, 37–60, 11–60). Men averaged similar scores to women (unweighted means = 52.9, 52.5; SD = 5.0, 3.3; ranges = 11–60, 35–60). Analysis of variance (ANOVA) with race and sex as factors showed a significant main effect only for race, with no effect for sex either as a main effect or in interaction, F(2,342) = 24.23, p < .001; F(1,342) < 1.00; and F(2,342) < 1.00. For the total score, the African–White difference was 1.00 SD (based on a total SD of 6.05). The 1993 USA norms for 18 to 22 year olds placed the Whites at the 75th percentile, the Indians at the 55th percentile and the Africans at the 41st percentile; these translated into IQ equivalents of 110, 102, and 97, respectively. Item difficulty was measured by the proportion giving the correct answer and was very similar for Africans, Indians, and Whites (r > .90; r > .79, p < .01), suggesting that the test measured the same construct in all three groups. Using a proportion of 70% of respondents passing as the criterion for judging an item as "too easy", 57 of the 60 items (95%) proved too easy for Whites, 53 (88%) for Indians, and 50 (83%) for Africans. The item-total correlation for each item was also calculated using the point-biserial correlation of each item's pass or fail status (0 or 1) with the total score on the test.
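
The IQ equivalents quoted in these two studies rest on mapping SPM percentile ranks onto a normal IQ scale with a mean of 100 and an SD of 15. The minimal sketch below is an assumption about the general method (the published 1993 norms are look-up tables rather than this formula), but it roughly reproduces the 110, 102 and 97 quoted above.

```python
from scipy.stats import norm

def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Map a percentile rank (0-100) to an IQ equivalent via the inverse normal CDF."""
    return mean + sd * norm.ppf(percentile / 100.0)

for p in (75, 55, 41):                     # percentiles reported for Whites, Indians, Africans
    print(p, round(percentile_to_iq(p)))   # -> 110, 102, 97
```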

Lynn et al. (2004) carried out a study of sex differences on the SPM test in Mexico. The SPM was administered to a sample of 920 children aged 7 to 10 years (472 males and 448 females) from three different ethnic groups. Analysis of variance showed a statistically significant age effect (SPM scores increased with age) and no statistically significant gender effect. This study showed a very small overall gender difference in the SPM results, with an increasing advantage for girls as age increased.

A third investigation conducted in Libya was carried out by Ahlam (2005) to evaluate the relationship between intelligence and high school students' academic achievement. The SPM test was administered to 240 students aged 16 and 17 years (120 males and 120 females). The mean score for males was M = 38.31, SD = 8.53, whereas that for females was M = 35.68, SD = 7.73; the total mean score was M = 37.00, SD = 9.23. The results showed a gender difference in favour of males. The analysis also showed that the correlation between SPM scores and students' academic achievement was r = 0.45, p = 0.01.

A fourth investigation in Libya was carried out by Attashan and Abdalla (2005) to examine the relationship between intelligence and university students' academic achievement. The SPM was administered to 510 undergraduate university students. The mean score for males was M = 40.50, SD = 8.80, whereas that for females was M = 40.21, SD = 9.62; no significant gender difference was found. On the other hand, the arts students' mean score was M = 35.82, SD = 8.09, while that of the science students was M = 44.54, SD = 7.73, a significant difference in favour of the science discipline students. The total overall mean score was M = 40.36, SD = 9.21. In addition, the analysis showed that the correlation between SPM scores and students' academic achievement was r = 0.35, p = 0.01.

Abdel-Khalek and Lynn (2006) investigated sex differences on the SPM test in Kuwait, on a sample of 6,529 students aged 8 to 15 years (3,278 boys and 3,251 girls) from six different districts of Kuwait. In each district, one socially representative elementary, intermediate and secondary school for boys and one for girls were randomly chosen from a list of schools, and children were tested in randomly selected classes. The selection of school districts used a stratified random sampling procedure. The results showed that girls obtained significantly higher means than boys among 8, 9, 10 and 14 year olds; no statistically significant differences were found among 11, 12, 13 and 15 year olds. Overall, girls in the total sample obtained a statistically significantly higher mean score (M = 35.75, SD = 11.49) than boys (M = 34.81, SD = 12.11), p < 0.001, although the difference was very small (d = .08, equivalent to 1.2 IQ points). This difference was attributed to possible sampling bias.
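
The "1.2 IQ points" figure is simply the standardised difference re-expressed on the conventional IQ metric (SD = 15). Using the two group means and an approximate pooled SD of 11.8 (an assumption about how the effect size was derived, not necessarily the authors' exact computation):

\[
d \approx \frac{35.75 - 34.81}{11.8} \approx 0.08, \qquad 0.08 \times 15 \approx 1.2 \ \text{IQ points}.
\]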

Taylor (2007) carried out a study in South Africa on 144 female and 199 male job applicants, of whom 46.9% were Black and 41.8% White. The average age was 33.8 years, and the overall mean SPM score was M = 44.65, SD = 11.94. Scores on the SPM were compared across gender and ethnic groups using independent samples t-tests. Males obtained a mean SPM score of M = 44.69, SD = 12.64, whereas females obtained M = 44.45, SD = 11.28; the t-test across gender groups showed no significant difference in SPM scores. The Black ethnic group obtained a mean SPM score of M = 41.20, SD = 13.06, whereas the White ethnic group obtained M = 48.21, SD = 9.33; the White group on average scored significantly higher than the Black group. Although this finding may cause some concern at first, it is important to consider the context in which the test was administered.

Khaleefa and Lynn (2008a) carried out a standardization of the Standard Progressive Matrices in Syria on a sample aged 7 to 18 years. A total of 3,489 participants (1,739 males and 1,750 females) took the test. The results showed no sex difference, and there was no consistent pattern of sex differences among the age groups.

It has frequently been asserted that there is no sex difference in general intelligence but that males have greater variability than females. This assertion was made in the early years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and Terman (1916). These early investigators proposed the difference in variability to explain why men are so greatly over-represented among geniuses: having found no sex difference in general intelligence, a greater variability among males, entailing more males among those with very high intelligence (as well as more males with very low intelligence), seemed to provide a solution to this problem.

Khaleefa and Lynn investigated sex differences in variability and found no consistent answer: boys had greater variability in 7 age groups, whereas girls had greater variability in 4 age groups, and in the sample considered as a whole girls had greater variability than boys. This study also showed that average SPM scores were lower in developing countries than in developed countries.

Khaleefa et al. (2008b) carried out a standardization of the Standard Progressive Matrices in Sudan on 6,202 participants aged 9 to 25 years, and analysed the data for sex differences in means and variability. The study showed no sex difference at ages 9 through 13. Females obtained statistically significantly higher means from age 14 through 18. At 19 years, males did not have significantly higher means, but at 20 to 25 years males obtained statistically significantly higher means. In addition, the results showed no consistent sex difference in variability: males had greater variability in 7 age groups, whereas females had greater variability in 5 age groups.

Ahmad et al. (2008) conducted a study to standardize the SPM test in Pakistan between 2004 and 2006. The sample consisted of adolescents aged 12 to 19 years and adults aged 18 to 45 years. The adolescents (N = 1,662) were selected from representative schools in the four provinces into which Pakistan is divided (North West Frontier, Baluchistan, Sindh and Punjab) and were tested in groups. The adult sample consisted of 2,016 participants (1,019 females and 997 males). Overall, the results suggested negligible gender differences in mean performance on the SPM in Pakistan. In addition, in most age groups females had greater variability than males. The mean scores of the Pakistani sample were lower than those obtained by the standardization samples in the UK and the USA.

Abdel-Khalek and Lynn (2009) investigated the SPM on 5,139 school students aged 9 to 18 years, with approximately equal numbers of males and females drawn from representative schools, and on 92 university students (43 males and 49 females) in the capital city of Oman (Muscat). They reported an average IQ of 85 for school students and 93.7 for university students. There were no significant gender differences among the 9 to 17 year olds, but at age 18 years males obtained a higher mean of approximately 2.5 IQ points, and among university students males outscored females by approximately 5 IQ points.

Khaleefa and Lynn (2009) conducted a study to evaluate SPM test norms on a Qatari standardization sample: 1,135 students aged 6 to 11.5 years (517 males and 618 females) were tested. Although an IQ of 78 had been reported in an earlier study in Qatar, this study reported an average IQ of 88; the difference was attributed to possible sampling and administration errors. This study confirmed previous studies conducted in the Middle East that failed to show greater male variance in SPM scores. In the total sample, females obtained a higher mean score (M = 25.7, SD = 11.34) than males (M = 23.7, SD = 9.98). Furthermore, the analysis showed that SPM mean scores increased with age.

Generally, the studies performed in developing countries described their sample selection procedures in detail, including random selection, and used large sample sizes. Abdel-Khalek and Lynn (2006) in Kuwait, for example, administered the SPM test to 6,529 students, and Khaleefa et al. (2008b) tested 6,202 participants, including children and adults. By comparison, the largest sample among the studies in developed countries comprised 2,738 children in Estonia, tested by Lynn et al. (2004). Furthermore, the analytical methods employed in many of these studies were identical to those used in studies from developed countries. Lastly, it should be noted that more recent studies have been conducted in developing countries than in developed ones.

Although the studies in developing countries covered various variables and matched the standards of studies in developed countries, they had a number of drawbacks. Firstly, some studies lacked a description of the sample in terms of the ages involved, such as Sinha (1977) in India and Majdub (1991) in Libya, or of the selection procedure, such as Klingelhofer (1967) in Tanzania and Mohan (1972) in India. In terms of differences between rural and urban areas, only one study evaluated this variable (Sinha, 1968, in India).

Unlike the studies performed in the developed world, one study from the developing world employed an incomplete SPM test: Rao (1974) in India used 45 of the 60 items designed for the test. Accordingly, this study cannot be included in the meta-analysis chapter, as it used a reduced number of test items.

4.10 Chapter Summary

This chapter aimed to provide a detailed, self-consistent and comprehensive account of the SPM test. It served as an introduction to the history, literature, psychometric characteristics and applications of the SPM test. The extensive review of earlier studies indicates that the SPM test is a reliable and valid psychological test, and a particularly powerful one in the domain of mental ability and intelligence.

The Progressive Matrices tests resulted from the work of the British psychologist John C. Raven and the geneticist Lionel Penrose in the 1930s. Their work was based on Spearman's two-factor theory. The Raven Progressive Matrices are probably among the most widely used culture-fair tests. They exist in three forms: the SPM, the CPM and the APM.

The SPM test is a non-verbal ability test consisting of sets of increasingly difficult items. It was first fully standardised by Raven for children and later re-standardised for adults. Standardisation has taken place in different countries in both the developed and the developing world, and since its introduction several checks have been run to verify the accuracy of its norms.

Literature on the reliability, validity and item-analysis characteristics of the SPM was presented and discussed. A single technique is not sufficient to determine the reliability of the SPM test accurately, so three methods have been used in the literature: test-retest reliability, split-half reliability and Cronbach's alpha reliability. The average coefficients reported were 0.93 for test-retest reliability after a two-week interval, 0.90 for split-half reliability, and 0.95 for alpha (Kuder-Richardson 20).

Likewise, to firmly establish the validity of the SPM test one should consider three types of validation procedures: content validity, criterion-related validity and construct validity. It was found that the SPM test can be used in cross-cultural contexts because of its culture-fair properties. The majority of the examined studies showed that the SPM test is a measure of the general intellectual ability "g", with no substantial contribution from other factors.

Furthermore, the literature showed that the concurrent validity correlations of the SPM with standard intelligence tests ranged from 0.50 to 0.80, whereas its predictive validity correlations with academic achievement tests generally fell in the region of 0.20 to 0.60.

Studies that focused on item analysis (item difficulty and item discrimination) of the SPM test were presented, and studies employing the SPM in different cultures were also reviewed and evaluated. It can be concluded that the SPM test has been used extensively in educational, vocational, clinical and anthropological fields all over the globe, essentially because of its high degree of reliability and validity as well as its culture-fair features.

The next chapter focuses on the workflow of this study. It sheds light on the field testing conducted and the related work, and presents the methodology adopted in the research, the materials used (such as statistical software) and the data analysis procedures.

Chapter five: MATERIALS AND METHODS

5.1 Introduction

This chapter outlines and critically analyses the methods and approaches employed in this study. The chosen methodologies were explored and critically appraised. Statistical techniques for data analysis were justified and evaluated for their suitability, and ethical issues relating to data collection and data analysis were considered.

5.2 Research design

The intent of any research is to create new knowledge through systematic enquiry.

Research is governed by scientific principles that vary from one discipline to another

(Gomm & Davies, 2000). Quantitative research approaches are applied to describe

current conditions, investigate relationships, and study cause-effect phenomena. A

quantitative research approach was used in this study due to the numerical nature of

the data and large sample size tested. Qualitative research methods were not

appropriate for this study as the only available method to measure intelligence was by

conducting a test. Quantitative research designs can be divided into experimental and

non-experimental designs. In experimental research, at least one independent variable

is manipulated, while the remaining variables are controlled, and the effect on one or

more dependent variables is observed. As there was no manipulation of variables in

this study, it was classified as a non-experimental study. Furthermore, the broadest categories of non-experimental design are survey and correlational designs, both of which were employed in this study (Gay, 2006; LoBiondo-Wood & Haber, 2006).

5.3 Methodology

Two main activities were employed in this study: first, a survey using the Standard Progressive Matrices (SPM) test was conducted to obtain data from a Libyan sample; second, a meta-analysis was performed to compare the SPM test results with studies from other countries.

In survey designs, subjects are selected and an investigator administers a test or questionnaire, or conducts interviews, to collect data. Surveys are used frequently in educational research to describe trends, determine opinions, identify group characteristics, understand attitudes and beliefs, identify practices, evaluate programmes and gather other types of information (Creswell, 2000). Usually, research is designed so that

information regarding a large number of people (population) can be inferred from the

responses obtained from a smaller group of subjects (sample) (James, 2006). In

addition, correlational designs are useful when exploring new topics, or topics that

have not been sufficiently investigated (Cohen & Manion, 1994).

In this study, quantitative research designs (descriptive and comparative survey,

correlational and cross-sectional) were used. A descriptive design employing

frequency distributions, means, standard deviations and charts for the obtained sample

was carried out to present an overview regarding performance in the SPM test and to

compute percentile ranks (norms) according to sample age levels (8 to 21 years old).

A comparative design was used to study whether significant differences existed

between sample performances on Raven’s Standard Progressive Matrices test

according to their gender, age groups and regions (developing and developed

countries, and urban (cities) and rural (villages)). A correlational design was used to

study the relationship between IQ scores on Raven’s Standard Progressive Matrices

test and Student's Academic Achievement (SAA) of Libyan students aged 8 to 21

years old. Finally, a cross-sectional approach was identified in this study as data were

collected from a sample with different age groups in a single time period.
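
As an illustration of the norming step described above, percentile-rank norms can be computed within each age group as the percentage of examinees scoring at or below a given raw score (one common convention). The sketch below uses invented data; the actual norm tables in this study were derived from the full Libyan sample.

```python
import numpy as np

def percentile_ranks(scores, max_score=60):
    """Percentile rank of each possible raw score within one age group:
    the percentage of examinees scoring at or below that raw score."""
    scores = np.asarray(scores)
    return {s: round(100.0 * np.mean(scores <= s), 1) for s in range(max_score + 1)}

def norms_by_age(ages, scores):
    """Build a {age: {raw score: percentile rank}} norm table."""
    ages, scores = np.asarray(ages), np.asarray(scores)
    return {int(a): percentile_ranks(scores[ages == a]) for a in np.unique(ages)}

# Invented data: 500 examinees aged 8-21 with roughly age-related raw scores
rng = np.random.default_rng(1)
ages = rng.integers(8, 22, size=500)
raw = np.clip(rng.normal(20 + ages, 8), 0, 60).astype(int)
norms = norms_by_age(ages, raw)
print(norms[10][30])   # percentile rank of a raw score of 30 among 10-year-olds
```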

5.4 Methods

In this study, the SPM test was used as a method to measure intelligence objectively.

The SPM resulted from the work of the British psychologist John C. Raven and

British geneticist Lionel Penrose. Their work was based on Spearman's two-factor

theory. The SPM is one of very few tests based on Spearman's general (g) factor theory of intelligence. Spearman (1946) felt that the goal of measuring "g" had been achieved by the Matrices test and considered the Progressive Matrices the best of all non-verbal tests of "g", or eductive ability.

Raven et al. (1996) mentioned that the SPM is used internationally and that no general revision of it has been deemed necessary. Burke (1958), Anastasi (1988), Raven (1989), Carpenter et al. (1990), Arthur and Woehr (1993), Arthur and Day (1994), Court and Raven (1995), Murphy and Davidshofer (1998), Raven (2000), Kline (2000) and Lynn (2006) noted that the SPM was the most widely used test for the following reasons:

• Its non-verbal nature, which allows it to be applied cross-culturally.
• Being the best test of g, the general factor present in all cognitive tasks.
• Being a group test that is easy to administer and score.
• Possessing good psychometric characteristics (high validity and reliability).
• Being a popular instrument for use in developing countries (Thorndike & Hagen, 1977; Ogunlade, 1978).
• Being the first version of the RPM tests to be constructed (Raven, 1939), with the possibility of use with children from the age of 6 years onwards (Yoon, 2006).

Reliability and validity are both important indicators of the suitability of a test or measuring instrument and are the most important characteristics of a psychological test (Brown, 1983; Urbina, 1997; Kenneth, 1998; Kline, 2000; Langdridge, 2004; Domino & Domino, 2006; Airasian, 2006; LoBiondo-Wood & Haber, 2006). To achieve the aim of this study, validity, reliability and item analysis (item difficulty and item discrimination indices) were evaluated.

In addition to the SPM test, a meta-analysis was employed to compare performance on the SPM test of the Libyan sample with that of samples from other countries (developed and developing). A review of relevant studies on the SPM test, identified through computer databases, dissertations and the bibliographies of review articles, generated 44 studies carried out in various countries between 1948 and 2009. From each relevant study the following data were recorded and coded: (a) author; (b) country; (c) year of publication; (d) population sampled; (e) age; (f) SPM means and standard deviations; and (g) sample size. These studies were carried out in Congo, Denmark, Egypt, Estonia, France, India, Iran, Israel, Libya, Nigeria, Mexico, Qatar, Tanzania, Turkey, Syria, Sudan, Pakistan, the UK and the USA. To be included, a study had to provide sufficient data, such as SPM scores.
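
Purely as an illustration, the coded fields listed above could be organised as simple records from which summary statistics, such as a sample-size-weighted mean SPM score, can be computed. The entries below are placeholders, not studies from the actual review.

```python
from dataclasses import dataclass

@dataclass
class SPMStudy:
    author: str
    country: str
    year: int
    population: str
    age: str
    mean: float      # mean SPM raw score
    sd: float
    n: int           # sample size

def weighted_mean(studies):
    """Sample-size-weighted mean SPM score across coded studies."""
    total_n = sum(s.n for s in studies)
    return sum(s.mean * s.n for s in studies) / total_n

studies = [  # placeholder records, for illustration only
    SPMStudy("Author A", "Country X", 1990, "school students", "11-15", 32.0, 10.0, 400),
    SPMStudy("Author B", "Country Y", 2000, "adults", "18-25", 45.0, 8.0, 250),
    SPMStudy("Author C", "Country Z", 2005, "school students", "8-17", 36.5, 11.0, 900),
]
print(round(weighted_mean(studies), 2))
```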

5.5 Ethical approval

This study was considered the first attempt to standardise Raven's Standard Progressive Matrices (SPM) test and apply it to a sample from Libya. Ethical consideration in research, according to Saunders et al. (2007), "refers to the appropriateness of your behaviour in relation to the rights of those who become the subject of your work, or are affected by it" (p.178). Ethics in research is an important issue and must be taken into consideration in any research design. Ethical approval was obtained from the Research Governance and Ethics Committee at the University of Salford (RGEC07/074). In addition, ethical approval was obtained from the Department of Psychology at Omar El-Mukhtar University in Libya and from the Department of External Studies and Technical Cooperation in the Libyan Ministry of Higher Education.

SPM testing was carried out by the researcher and by well-trained teaching assistants who helped the researcher to distribute and administer the SPM test. The researcher was trained by Professor Abdulrazik S. Attashani of Omar El-Mukhtar University in 2001 during his study for a Master's degree. Only the researcher knew the identity of the participants, and their details were accessible to him alone. All obtained data were secured in a safe place. The study included students from the age of 8 to 21 years. The main purpose of this study was to develop norms describing the distribution of IQ scores among Libyan students. These norms would serve as a guide to help people take appropriate decisions about their future, choose educational programmes that best suit their abilities, and assist in matching job applicants to suitable employment.

Participation in this study was optional. An information sheet was provided and each participant (or the guardian of the participant) was asked to sign a consent form. The researcher also provided a simplified information sheet for children (please refer to the information sheet for children). Information sheets and consent forms were available in the native language of the participants (Arabic) and were comprehensive in content and concepts. Each participant was free not to take part in the study or to withdraw at any time without stating a reason. Participants were also assured that their scores in the SPM test were to be used for research purposes only. The researcher was available on a given contact number if a participant wanted to discuss any matter arising during the study. Results of the study were made available to all participants and may be published in the journal Intelligence. Participants who were willing to attempt the test (children needed guardian/parental consent) were registered, and the researcher then randomly selected participants from this register.

5.6 Pilot study

A pilot study was first conducted to determine the validity and reliability of the SPM test and to ascertain its applicability. In addition, the pilot was used to determine how clear the test instructions were for the participants and to familiarise the trained psychologists with the way the test is conducted.

The sample consisted of 200 students (100 males and 100 females). Using the Statistical Package for the Social Sciences (SPSS, version 16), reliability was investigated using the split-half and alpha (KR-20) methods, and validity was investigated using correlation coefficients (internal consistency of the SPM test sets) and an external criterion (students' academic achievement, SAA). The split-half reliability ranged from 0.87 to 0.88 and the internal consistency reliability ranged from 0.93 to 0.94. Validity assessed through correlation coefficients (internal consistency) showed statistically significant high correlations, ranging from 0.70 to 0.89, between the SPM test sets and the total test score. Moreover, validity assessed through the correlation between the SPM test and the external criterion (SAA) showed a statistically significant moderate correlation of 0.52. It was concluded that the SPM provided a promising measure of the non-verbal ability of Libyan students.
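
In principle, the two reliability coefficients used here can be reproduced from the 0/1 item responses: the split-half coefficient correlates the two half-test scores (for example, odd versus even items) and applies the Spearman-Brown correction, while KR-20 is the form of Cronbach's alpha for dichotomous items. A minimal sketch with invented data (not the pilot sample):

```python
import numpy as np

def split_half(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    responses = np.asarray(responses, dtype=float)
    odd, even = responses[:, 0::2].sum(axis=1), responses[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

def kr20(responses):
    """Kuder-Richardson 20: Cronbach's alpha for 0/1 scored items."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]
    p = responses.mean(axis=0)                       # item difficulties
    total_var = responses.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

# Invented 0/1 data: 200 examinees by 60 items driven by a single latent ability
rng = np.random.default_rng(2)
ability = rng.normal(size=(200, 1))
data = (ability + rng.normal(size=(200, 60)) > np.linspace(-1.5, 1.5, 60)).astype(int)
print(round(split_half(data), 2), round(kr20(data), 2))
```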

5.7 Main study


5.7.1 Sample size

Sample size (2600 students) was based on the original SPM test that was standardized

on a sample of 735 British children aged 6-13 years tested individually, 1,407 British

children aged 8-14 years tested in groups and 629 British adults aged 20-70 years old

(Raven, 1960 and Raven, et al. 1998). Kline (2000) stated that the sample size has to

be large enough to reduce the standard errors of correlations to negligible proportions.

The researcher aimed to achieve the highest practicable number of participants in this study, which was 2,600.

5.7.2 Sample selection

5.7.2.1 Multi-stage-cluster sampling design

The researcher lacked any sample framework (a record from which to select candidates) for Libyan students aged between 8 and 21 years, who were in different educational grades: those aged 8 to 17 years enrolled in schools, and those aged 18 to 21 years enrolled as undergraduate students in different university years. In addition, the research dealt with a huge, dispersed area, the Eastern Libyan Region, which encompasses a large number of cities and villages. Moreover, the researcher dealt with a wide range of age groups, from 8 to 21 years old. Consequently, the only practicable way to choose the sample was to employ a multi-stage sampling technique. Its main advantages were that no sample framework was needed prior to conducting the survey, that the framework could be prepared in the field, and that it is easy to conduct when the population is dispersed over a wide region.

In cluster sampling, intact groups, not individuals, are randomly selected, and all members of the selected groups have similar characteristics. Cluster sampling is more convenient

when the population is large or spread out over a wide geographic area. Cluster

sampling can be carried out in stages, involving selection of clusters within clusters.

This process is called multistage sampling (Mills & Airasian, 2006). When Raven, in

1981, standardized the Irish and British SPM test, he used this sampling method,

which was defined by Denscombe (1998) as a sampling method that involves

selecting samples from samples, each sample being drawn from within the previously

selected sample. In principle, the multi-stage sampling method, which is a fully random probability sampling method, can go on through any number of levels, each level involving a sample drawn from the previous level (Bryman, 2005).

Consequently, by getting sufficient numbers of representative clusters or units for the

whole population and focusing on them, the researcher saved time and money instead

of spending them on travelling to research sites scattered throughout the length and

breadth of the region. In addition, it enabled the researcher to prepare the sample

framework in the field to select prospective respondents. Thus, the aforementioned advantages led to the selection of the multi-stage disproportional stratified method as the

main method for selecting suitable representative samples for this research.

5.7.2.2 Disproportional stratified sampling

Although the stratified sampling method continues to adhere to the underlying principles of randomness, it adds some boundaries to the selection process and applies the principles of randomness within those boundaries (Denscombe, 1998). The significant advantage of stratified sampling over simple random sampling is the ability to exert some control over the selection of the sample, to guarantee the inclusion of crucial events, people or social groups in the sample. This sample design varied the sampling fraction between the different strata, increasing the sample size in small strata to allow enough cases for analysis, which is important for comparing subgroups. Consequently, the researcher used the multi-stage, cluster-disproportional stratified sampling technique six times, as follows (a simplified sketch of the selection logic is given after the list):

1. To select at least one main and one secondary city.

2. To select nine villages from the existing thirty. Villages were divided

depending on location into coastal, mountain or desert villages. The

researcher selected three villages from each category.

3. To select at least one elementary, one preparatory and one secondary

school in every village of the selected nine villages, regardless of the

existing number of schools, in every educational level and to select at

least one classroom from every grade of the six grades in the elementary

school or from the three grades in both the preparatory and the secondary

school, regardless of the available number of classrooms in every grade,

in these villages.

4. To select at least five male and five female students from every

classroom, from the different classrooms in the different grades in the

selected nine villages.

5. To select at least five male and five female students from every

classroom, from the different educational grades in both selected cities.

6. To randomly select male and female students from either the scientific or

arts curriculum in the two different branches of Omar El-Mukhtar

University.
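
The following sketch illustrates, in highly simplified form, the cluster-then-units logic underlying steps 1 to 6 (randomly selecting clusters, then randomly selecting a fixed number of units within each); the cluster names and sizes are placeholders, not the actual sampling frame.

```python
import random

def multi_stage_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Stage 1: randomly select clusters; stage 2: randomly select units within each.

    clusters: dict mapping cluster name -> list of unit identifiers.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {c: rng.sample(clusters[c], k=min(n_per_cluster, len(clusters[c])))
            for c in chosen}

# Placeholder frame: 30 villages (clusters), each with 40 pupils (units)
frame = {f"village_{i}": [f"pupil_{i}_{j}" for j in range(40)] for i in range(30)}
sample = multi_stage_sample(frame, n_clusters=9, n_per_cluster=10)
print({village: len(pupils) for village, pupils in sample.items()})
```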

A main difference between the cities and the villages was that in the cities there were separate schools for male and female students at the preparatory and secondary levels and shared schools at the elementary level, in contrast to the villages, where all schools were shared by both genders.

Also, the existence of many administrative boundaries necessitated selecting more

than one school to represent the city. This meant that it was impossible to select one

elementary school for example to represent all the elementary schools in the city.

Consequently the researcher decided to divide the main city into six administrative

boundaries and the secondary city into three administrative boundaries. This was

followed by selecting one school for male students and one school for female students

for every educational level located within the selected administrative boundaries. In

addition, only one school was available for each educational level in each village in

contrast to the availability of many schools for each educational level in the city.

For the purpose of this study, two cities were chosen: a main city (Al-Beida) and a secondary city (Shahat). Al-Beida is the main city in the eastern region of Libya. During the monarchy (1951-1969), Al-Beida was the second capital of Libya. The municipality of the eastern region now has a university (Omar El-Mukhtar University) consisting of five campuses situated in the following cities: Al-Beida, Al-Marj, Al-Gooba, Tobruk and Darnah. Al-Beida is considered an educational, trade and

health centre for neighbouring settlements and small cities (Kezeiri, 1995). According

to the General Authority of Information in 2006, Al-Beida city has been divided into

six administrative boundaries; Alsog algadem, Algareka, Werdamah, AlZaweya

Algademah, Al-Beida Algharbiya and Al-Beida Alshargeya.

Shahat city, previously known as Cyrene, was established by the Greeks in 631 B.C.

It was the first city to be formed in Libya. The location of the city played a significant

role in its growth and prosperity as did the availability of water from the Apollo

springs and abundance of rain. Its proximity to Apollonia port provided easy contact

with all Mediterranean ports. The city is considered as an important political,

religious, agricultural and industrial centre (Kezeiri, 1995). According to the General

Authority of Information in 2006, Shahat city has been divided into three

administrative boundaries; Shahat Aljadedah, Shahat Algademah and Almansora. A

representative school was chosen for each administrative boundary in these two cities.

In addition, the eastern region provided the researcher with a wealth of resources: he was born in Al-Beida city, had good links with fellow academics and researchers, and had taught in various cities located in the eastern region of Libya.

A large and more easily accessible sample was chosen from two Libyan cities (Al-Beida and Shahat) and nine villages because of its manageability in terms of time and resources, as well as the researcher's familiarity with the social context. Figure 5.1 summarises the rationale and process of the sampling method followed.

[Figure 5.1 here: a flow diagram summarising the multi-stage stratified probability sampling of students aged 8 to 21 years. It shows the grouping of the six cities and thirty villages into urban and rural clusters; the selection of two cities and nine villages; the selection of schools, classrooms and at least five male and five female students per classroom; the selection of 800 undergraduate students (equal numbers of males and females from the arts and science curricula) from the two branches of Omar El-Mukhtar University; and the reasons for this design (lack of a sample framework, a wide and dispersed eastern Libyan region, and time and cost limitations).]

Figure 5.1 Summary of the sampling method and theory

5.7.2.3 The multi-stage-cluster sampling process and procedures

The procedure for the multi-stage stratified sampling method involved sampling from one higher-level unit, the primary sampling unit (the Eastern Libyan Region), and then sampling secondary sampling units from within that higher-level unit (cities and villages). This was followed by classifying the cities into two homogeneous urban clusters, using their administrative boundaries as the criterion, at the third sampling level: main and secondary cities; the researcher selected one city from each category. In addition, villages were classified into three categories (also at the third, clustering, sampling level): coastal, desert and mountain villages, and three villages were selected from each category, with different weights or ratios, as the fourth sampling level. This was followed by classifying and counting the existing schools in the two selected cities and the nine selected villages, as the fifth sampling level, according to the educational levels in Libya: elementary level (grade three to grade six), preparatory level (grade seven to grade nine) and secondary level (grade ten to grade twelve).

The aim was to select one elementary, one preparatory and one secondary school from each village, where most schools are shared by male and female students. The researcher visited 27 schools in the nine villages to select the prospective respondents (students) randomly from a list (sample framework) that he prepared himself in the field during his visits to these schools. In the two cities the aim was slightly different, because preparatory and secondary schools there are single-sex and because the composition of each city's administrative boundaries made it impossible to select a single school as representative of the whole city. Consequently, the researcher needed to select at least two schools at each of the preparatory and secondary levels, one for male and one for female students. This resulted in the selection of six elementary schools, twelve preparatory schools and twelve secondary schools in the main city, and three elementary schools, six preparatory schools and six secondary schools in the secondary city. Overall, the researcher visited 72 of the existing 124 schools (about 58%) in the 11 settlements (two cities and nine villages): 27 schools in the selected nine villages and 45 schools in the selected two cities.

The researcher selected one classroom from every grade in every school, in both the nine villages and the two cities. Children in Libya start elementary school at the age of six, and the researcher randomly selected classrooms in the elementary schools from grade three onwards. The student list was prepared according to the students' ages.

Regarding the respondents aged from 18 to 21 years enrolled at university, the researcher selected Omar El-Mukhtar University, drawing on its campuses situated in Al-Beida city and Al-Marj. This choice reflected the fact that the researcher had taught at Omar El-Mukhtar University in Al-Beida as a lecturer in psychology and at its branch in Al-Marj as a visiting lecturer. Consequently, the researcher had much greater access to the university and schools located in these settlements, which eased the tasks of collecting a reasonable amount of data, accessing the available data resources and establishing good links with past and current academic staff.

The application of the multi-stage stratified sampling method to select the respondents aged 18 to 21 years treated the university as the primary sampling level and involved classifying its different specialisations into two main curriculum groups, science and arts, as the secondary sampling level. The two main curricula were then divided by the four academic years or grades as the third sampling level. Finally, the researcher selected students from every grade within the two curricula. The aim was to select at least 200 students from each grade (100 from the science curriculum and 100 from the arts curriculum), at the same time ensuring gender balance (100 male and 100 female students), disproportionately to the real numbers of students in these two main curricula and regardless of the real numbers of male or female students.

Overall, 2,600 respondents aged from 8 to 21 years were selected, with different fractions, weights or ratios relative to the real numbers of prospective respondents in each group. The distribution of these respondents was as follows:

• 900 respondents or students from nine villages, aged from 8 to 17 years old,

enrolled in three basic educational levels; elementary, preparatory and

secondary school educational levels.

• 900 students from two cities, aged from 8 to 17 years old, enrolled in three

basic educational levels; elementary, preparatory and secondary school

educational levels.

• 800 undergraduate students enrolled in Omar El-Mukhtar University.

Table 5.1 shows the principles followed in selecting the respondents from the different educational levels in the rural and urban areas. Tables 5.2 and 5.3 show the differences in the sampling fractions between the selected sample sizes and the real numbers of students, due to the applied stratified sampling method, in the two selected cities (Table 5.2) and the nine villages (Table 5.3). Finally, Table 5.4 shows the fractions of the undergraduate students relative to the real numbers of Omar El-Mukhtar University students. Additionally, Figure 5.2 summarises the procedures of the selected multi-stage stratified sampling method.

Table 5.1 Principles of selecting the sample in schools

EDUCATIONAL LEVEL | VILLAGES | CITIES | TOTAL
Elementary school | 9 villages * 1 school * 4 grades * 1 classroom * (5 male + 5 female students) = 360 students | 2 cities * 9 boundaries * 1 school (shared school) * 4 grades * 1 classroom * (5 male + 5 female students) = 360 students | 720
Preparatory school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male + 5 female students) = 270 students | 2 cities * 9 boundaries * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Secondary school | 9 villages * 1 school * 3 grades * 1 classroom * (5 male + 5 female students) = 270 students | 2 cities * 9 boundaries * 2 schools (1 male + 1 female school) * 3 grades * 1 classroom * (5 male or 5 female students) = 270 students | 540
Total | 900 | 900 | 1800

Table 5.2 Target sample size of the pre-university students in the two cities in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/290=15.5% 45/304=14.8% 90/594=15.1%
9 Year four at elementary 45/287=16.6% 45/298=15.1% 90/585=15.3%
10 Year five at elementary 45/284=15.8% 45/296=15.2% 90/580=15.5%
11 Year six at elementary 45/278=16.1% 45/286=15.7% 90/564=15.9%
12 Year one at preparatory 45/256=17.5% 45/274=16.4% 90/530=16.9%
13 Year two at preparatory 45/252=17.8% 45/270=16.6% 90/522= 17.2%
14 Year three at preparatory 45/265=16.9% 45/268=16.7% 90/533= 16.8%
15 Year one at secondary 45/239=18.8% 45/254=17.7% 90/493=18.2%
16 Year two at secondary 45/235=19.1% 45/248=18.1% 90/483=18.6%
17 Year three at secondary 45/243=18.5% 45/252=17.8% 90/495=18.1%
Total 450/2629= 17.1% 450/2750= 16.3% 900/5379= 16.7%

Table 5.3 Target sample size of pre-university students in the nine villages in
proportion to their real numbers
AGE STUDY LEVEL GENDER TOTAL
Male Female
8 Year three at elementary 45/230=19.5% 45/262=17.1% 90/492=18.3%
9 Year four at elementary 45/247=18.2% 45/250=18.0% 90/497=18.1%
10 Year five at elementary 45/236=19.0% 45/242=18.7% 90/478=18.8%
11 Year six at elementary 45/239=18.8% 45/258=17.4% 90/497=18.0%
12 Year one at preparatory 45/231=19.4% 45/251=17.9% 90/482=18.6%
13 Year two at preparatory 45/213=21.1% 45/224=20.0% 90/437=20.5%
14 Year three at preparatory 45/220=20.4% 45/236=19.0% 90/456=19.7%
15 Year one at secondary 45/216=20.8% 45/225=20.0% 90/441= 20.4%
16 Year two at secondary 45/211=21.3% 45/220=20.4% 90/431= 20.8%
17 Year three at secondary 45/217=20.7% 45/229=19.6% 90/446= 21.8%
Total 450/2260= 19.9% 450/2397= 18.7% 900/4657= 19.3%

Table 5.4 Target sample of undergraduate university students in Omar El-Mukhtar


University in proportion to their real numbers
Age Study level Gender Academic discipline Total
Sciences Arts
18 Year one Male 50/482=10.3% 50/509=9.8% 100/991=10.1 %
Female 50/496=10.1% 50/518= 9.6% 100/1014=9.9 %
Total 100/978=10.2% 100/1027=9.7% 200/2005= 9.9%
19 Year two Male 50/443=11.2% 50/502=9.9% 100/945= 10.5%
Female 50/475=10.5% 50/513= 9.7% 100/988=10.1 %
Total 100/918=10.9% 100/1015=9.8% 200/1933=10.3%
20 Year three Male 50/442=11.3% 50/497=10.1% 100/939=10.6%
Female 50/468=10.6% 50/501=9.9 % 100/969=10.3%
Total 100/910=10.9% 100/998=10.0% 200/1908=10.5%
21 Year four Male 50/439=11.3% 50/458=10.9% 100/897=11.1%
Female 50/457=10.9% 50/465=10.7% 100/922=10.8%
Total 100/896=11.1% 100/923=10.8% 200/1819=10.9%
Total 400/3702=10.8% 400/3963= 10.1% 800/7665=10.4 %

Figure 5.2 Sampling process

[Figure 5.2 here: a flow diagram tracing the sampling process from the wide eastern region (no sample framework, dispersed settlements, limited field-work time and cost) through the division of the cities into main and secondary categories and the villages into coastal, mountain and desert categories; the selection of Al-Beida (main city), Shahat (secondary city) and nine villages (Alhanih, Alhammh, Suasa, qsarlibya, Maraoh, Garnada, Satih, Aslanth and Gantolah); the selection of schools in each settlement and of one classroom per grade from grade three to grade twelve, with five male and five female students per classroom; and finally the selection of 800 university students aged 18 to 21 years (400 from science and 400 from arts; 100 male and 100 female students from each year) from both campuses.]

5.8 Field work arrangement

Assistance in the field work was provided by five well-trained psychologists who were the researcher's colleagues (teaching assistants) at Omar El-Mukhtar University, after the SPM test form, its purpose and the order of its questions had been introduced and explained to them.

A request was made to the directors of the education sector to issue a letter enabling the researcher to carry out the study in the chosen schools and universities.

The researcher contacted each school principal and faculty dean with a letter from the education sector explaining the purpose of the study and the procedure to be followed in selecting and testing the students. At each school and university, on the day of the SPM testing, the researcher arrived one hour early to randomly select students (males and females) from grades 3 to 12 from the sample framework (a record of students' names in the selected classroom), which the researcher prepared in the field with the help of the student affairs and admissions manager (students aged 8 to 17 years), or to select 200 students in each year of university for both disciplines (students aged 18 to 21 years). All participants were given an information sheet and were required to sign a consent form before participation in the study.

A place for testing the students was made available at each school. The place, in most cases, was either the school theatre or the library, where each student had his or her own table and chair. Because of the large numbers of students in the schools and the differences in their age ranges, fewer than forty students were tested at a time using the SPM test. In the university tests the same methodology was adopted, using groups of fifty at a time.

Participants were coded. For school students, the code was based on location, whether city or village. For students from villages, the code was based on the village type (one of the three types), the village name, the school name, the grade, the gender and, finally, the participant number; for students from cities, the code was based on the city name, the school name, the grade, the gender and the participant number. Moreover, no two cities or villages had names starting with the same letter.

For example: VCSM5F2;

V= Village “first letter”.

C= Coastal village type “first letter”.

S= Village name “first letter”.

M= School name “first letter”.

5= Year level.

F= Sex female “first letter”.

2= Participant number.

For university students, the code was based on the name of the city, the name of the university, the specialisation, the year level, the sex and the participant number.

For example: UBOA3M32;

U= University Participants “first letter”.

B= Beida “name of city”.

O= Omar El-Mukhtar “name of university”.

A= Arts Specialization.

3= Year level.

M= Sex male “first letter”

32= Participant number.
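
Purely as an illustration (the actual coding was written by hand on the answer sheets), such codes can be assembled programmatically from their components; the school name "Mostakbal" below is a hypothetical placeholder.

```python
def school_code(village_type, village, school, grade, sex, number):
    """Village-school participant code in the style VCSM5F2."""
    return f"V{village_type[0]}{village[0]}{school[0]}{grade}{sex[0]}{number}".upper()

def university_code(city, university, discipline, year, sex, number):
    """University participant code in the style UBOA3M32."""
    return f"U{city[0]}{university[0]}{discipline[0]}{year}{sex[0]}{number}".upper()

print(school_code("Coastal", "Suasa", "Mostakbal", 5, "Female", 2))        # VCSM5F2
print(university_code("Beida", "Omar El-Mukhtar", "Arts", 3, "Male", 32))  # UBOA3M32
```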

Personal details of participants were kept separately in a secure location, accessible only to the researcher. Each participant's name was assigned the code recorded on the first page of the answer sheet, and only the researcher knew both the name and the assigned code for each participant. The researcher had supervised access to the children: at all times the school headmaster and teachers accompanied and supervised him while he addressed the students and conducted the test.

5.9 Preparation of the SPM test

The Standard Progressive Matrices test consisted of 60 items on 60 pages and was divided into five sets lettered A, B, C, D and E, each consisting of 12 items. Each page of the booklet contained a matrix with one missing part. Students were asked to choose the missing part from the six or eight options given below each matrix and to indicate its number on a separate answer sheet. The following modifications were introduced into the SPM test to make it more suitable for the Libyan sample:

1. Instructions were given in the colloquial Libyan Arabic language.

2. The English letters (A, B, C, D and E) of the five sets were changed into Arabic letters.

3. The page order (direction) of the test booklet was changed from left to right, to suit the Arabic way of writing and reading.

4. A new answer sheet was designed with Arabic letters and a right-to-left direction for answering and writing.

5.10 Administration of the SPM test

During September to November 2007, the SPM test was administered to 1800 school

students, and during September to November 2008, the SPM test was administered to

800 university students. The researcher was introduced to the students by the head

teacher in schools or main supervisor or professor in universities. The researcher

followed a defined set of unified steps when conducting the SPM test with the

respondents as follows:

1. Some time was spent at the beginning of each SPM test to establish a good rapport

with students, by discussing the purpose of the study, and why certain students from

the whole school were randomly selected to participate in the study. Also, the students

were assured that their scores in the SPM test would remain anonymous, and would

be used for research purpose only. After the test they were thanked for participating.

2. After the introduction, the SPM test booklets were distributed to the students and

they were asked not to open the booklets, until told to do so.

3. To ensure that the students understood the test and the unfamiliar procedures for

recording their responses on a separate sheet, the standard instruction for group

administration given in the SPM test manual were followed, as follows:

(a) This is a test of observation and clear thinking. Please open your test booklet at

the first page. You will find problem Number A1. Now look at your answer

sheet, you will see that under the heading set A there is a column of numbers

from 1 to 12.

(b) Now look at item A1, it is a pattern with a part cut out of it. Look at the pattern,

think what is the piece needed to complete the pattern correctly. Then find the

right piece out of the six shown below.

(c) All the pieces are the right size to fill the right space, but only one of them is the

right pattern. Number 1 is the right shape, but is not the right pattern. Number

2 is not a pattern at all. Numbers 3 and 5 are quite wrong. Number 6 is nearly

right, but is wrong here. Number 4 is the right answer because it is correct

both ways, isn't it?

(d) Now you write "4" next to number 1 under set A on your answer sheet. Please

don't mark the test booklet.

(e) On every page of the booklet there is a pattern with a piece missing, you have to

choose which one of the pieces below is the right one to complete the pattern,

and write its number next to the problem number on your answer sheet. Go on

like this by yourself until you reach the end of the booklet.

(f) The problems are simple at the beginning and get harder as you go on. Do not

miss any out if you are not sure make a guess. If you get stuck, move on to the

next problem, and then come back to the one you have difficulty with.

(g) Any questions? I will come around to see that you are getting on all right.

(h) You can have as much time as you like. Now turn over to problem 2 and start.

4. The SPM test was administered without a time limit, as recommended by the SPM

test manual. However the researcher recorded the definite time needed to complete it

by each student. When each student had completed the SPM test and handed in his /

her test booklet and answer sheet, the researcher checked the answer sheet to make

sure that it had been filled in correctly and that every item had been answered, then

registered the time that the student needed to complete the test. The longest test time

recorded was 81 minutes.

5. The SPM test scores for the students were obtained by using the scoring key

provided in the SPM manual.

6. The SPM items were scored by hand and double checked. The items were scored

either right or wrong. The maximum possible score was 60. The score was the number

of correct answers.

5.11 The proposed and achieved sample size

The researcher succeeded in achieving 100% of the target sample size in the pre-

university schools and in university students. In the chosen cities and villages, 90

students (45 males, 45 females) in each of the 10 educational levels were chosen. This

led to a total of 1800 students (900 males and 900 females) who took the test (900 from

nine villages and 900 from two cities). Regarding university students, 100 students
(50 males and 50 females) from each of the two disciplines were chosen in each of the 4 study
levels; that is, 200 students in each year of university (100 science students and 100 arts
students), giving a total of 800 students (400 males and 400 females).

5.12 Data Statistical Analysis

This section discusses data preparation, data cleaning and the rationale for the statistical tests

used in this study. Data collected were imported into SPSS (version 16) software.

Afterwards, data was screened for errors and missing parts and then analysis using

SPSS (16) was carried out.

First descriptive statistics employing frequency distributions, means, standard

deviations and charts for all study variables were conducted to present an overview of

the performance of Libyan participants on the SPM test. Also, normality of the data

was tested using the Kolmogorov-Smirnov test and normal probability plots. Data

showed normal distribution.

Second to compute differences between SPM test means, independent sample T-test

was used when one continuous dependent variable (SPM test scores) was examined

and subjects divided into two groups e.g. male and female or science and arts

disciplines or cities and villages (Pallant, 2007). The analysis based on region and

geographic area was not carried out on university students, because all university

students were in the city, and there were no universities in villages.

Third to compute differences between SPM test means, One-Way Analysis of

Variance was used when one continuous dependent variable (SPM test score) was

examined and sample divided into more than two groups e.g. age (Pallant, 2007).

Fourth To compute differences between SPM test means, Two-Way Analysis of

Variance was used when one continuous dependent variable (SPM test score) was

examined and the sample divided by two independent variables e.g. gender and age

or region and age. This analysis allowed the investigation of the individual and joint

effect of two independent variables on one dependent variable (Pallant, 2007).
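The analyses themselves were run in SPSS; purely as an illustrative sketch of this design, a two-way ANOVA of the same form could be specified in Python with the statsmodels package (the data file and column names below are hypothetical):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical data file: one row per student with SPM score, gender and age group
    df = pd.read_csv("spm_scores.csv")

    # Two-way ANOVA: main effects of gender and age plus their interaction on SPM scores
    model = smf.ols("spm ~ C(gender) * C(age)", data=df).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)  # Type II sums of squares
    print(anova_table)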

Fifth, to investigate the effect size of differences between SPM means, Cohen's d was
calculated; d is equal to the difference between the two means divided by the average of
the two standard deviations. In addition, Cohen's d was used to calculate the IQ point
difference, which is equal to d multiplied by the IQ standard deviation (15).
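As a worked illustration of this calculation, using the science and arts means and standard deviations reported later in Table 6.2 (the figures are therefore approximate and shown only to illustrate the arithmetic):

\[
d = \frac{M_1 - M_2}{(SD_1 + SD_2)/2} = \frac{42.34 - 40.16}{(8.56 + 7.88)/2} \approx 0.27,
\qquad \Delta IQ = 15d \approx 4 \text{ IQ points.}
\]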

Sixth, to evaluate variability, variance ratios (VR) were calculated; the variance is the average
of the squared differences from the mean, and the variance ratio is obtained by dividing one
group's variance by the other's (Lynn and Irwing, 2004).

Seventh, to convert SPM mean scores to IQ scores, British and American percentile
indices and a conversion table from percentiles to IQ scores were used. The British and
USA norms for the Standard Progressive Matrices were used to calculate the IQ of the
Libyan sample. This method has been used in many recent studies such as Lynn and
Vanhanen in 2006, Abdel-Khalek and Lynn in 2006, Keleefa and Lynn in 2008a,
Keleefa et al. in 2008b, Abdel-Khalek and Lynn in 2009 and Lynn in 2009. In
addition, Kaplan and Saccuzzo (1997) concluded that Raven was regarded as one of
the major authorities in the psychological testing field in the 21st century.
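The percentile-to-IQ step rests on the IQ scale being normally distributed with a mean of 100 and a standard deviation of 15. A minimal sketch of that conversion is shown below, assuming a percentile rank has already been read from the British or US norms tables; the function name and example value are illustrative only and do not reproduce the published conversion table:

    from scipy.stats import norm

    def percentile_to_iq(percentile):
        """Convert a percentile rank (0-100) to a conventional IQ (mean 100, SD 15)."""
        z = norm.ppf(percentile / 100.0)  # standard normal deviate for that percentile
        return 100.0 + 15.0 * z

    # e.g. a raw score falling at the 25th percentile of the reference norms
    print(round(percentile_to_iq(25), 1))  # approximately 89.9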

Eighth, a Pearson product-moment correlation coefficient was used to examine
relationships between continuous variables. The direction and strength of such

relationships (between SPM test scores and Student's Academic Achievement (SAA))

was investigated following these guidelines; r = 0.10 small effect, r = 0.30 medium

effect and r = 0.50 large effect (Field and Hole, 2005). Also Pearson Product-Moment

Correlation coefficient was used to calculate validity of internal consistency

(correlation coefficients between SPM test total score and SPM test sets) (Anastasi

and Urbina 1997). Pearson Product-Moment Correlation coefficient was used to

calculate validity of criterion-related (correlation coefficients between SPM test

scores and student's academic achievement) (Anastasi and Urbina 1997).

Ninth, stepwise multiple regression analysis was used to investigate which

independent variable was the best predictor (gender, age, (SAA) and regions; urban

(city) and rural (village)) of SPM scores (Pallant, 2007).

Tenth, reliability of SPM test scores was investigated using split-half, alpha (KR-20) and
test-retest methods. In the split-half method, items were divided into odd and

even items, because the items were arranged in order of difficulty (Kline 2000). Alpha

(KR-20) estimated how test items related to each other and to the total test. It is useful

for multiple choice items that were scored as right or wrong (Anastasi & Urbina, 1997;
Mills & Airasian, 2006). The test-retest method correlated scores obtained when the same
test was administered on two occasions (Kline 2000).

Eleventh two different methods were used for validity estimation; the first was the

Construct Validity by using Factor analysis and internal consistency and the second

was the criterion-related validity by using (SAA) as an external criterion. Due to lack

of standardized mental tests in Libya it was not possible in this study to use any other

intelligence test as an external criterion to investigate the validity of the SPM test.

Twelfth Item Analysis (difficulty and item discrimination) was investigated.

(a) Item difficulty: the proportion of respondents who answered an item correctly. If

most respondents answered an item correctly; the item was an easy item. If most

respondents answered an item incorrectly, it was a difficult item (Brown, 1983).

(b) Item discrimination index showed whether items differentiate between people with

varying degrees of knowledge and ability (Brown, 1983). The point biserial

correlation between “pass/fail” on each item and total test score was used to

investigate the SPM item discrimination ability (Anastasi 1988 and Anastasi, Urbina

1997).

5.13 Chapter Summary

This chapter discussed in detail the methodology and theoretical perspectives
underpinning this study. Non-experimental quantitative research designs (descriptive

and comparative surveys, correlational and cross-sectional) were used. Ethical

considerations were considered. A pilot study was conducted and results showed that

the SPM test was valid and reliable and it was subsequently recommended for use for

Libyan students. A sample size of 2600 students (aged 8 to 21 years) was based

on two previous British standardized SPM tests. Sampling process included a multi-

stage, cluster-disproportional stratified sampling technique. This study involved 72

schools located in 11 different settlements; nine villages and two cities and two

universities located in two cities; Al-Beida and Al-Marj. The researcher succeeded in

achieving 100% of the target sample size. A meta-analysis was carried out to compare

performance in the SPM test for a Libyan sample with that of other countries. Finally

statistical tests employed and their rationales were justified. The next chapter presents the
SPM results for the Libyan sample; the meta-analysis will be discussed in chapter seven.

Chapter 6 Results
6.1 Introduction

This study represented a preliminary standardization for the SPM test on a Libyan

sample to develop norms for the classical form of the Standard Progressive Matrices

(SPM) test in Libya and to identify the distribution of IQ scores in a sample of Libyan

students. There were seven research objectives whose results are analysed in this chapter.
Discussion of the meaning and significance of the results is postponed
to the next chapter. The SPSS (version 16) analysis was carried out as follows:
1. To determine the psychometric characteristics (reliability, validity, difficulty and

discrimination) of the SPM test when applied to a Libyan sample.

2. To study the relationship between SPM mean scores and student’s academic

achievement (SAA) for a Libyan sample aged 8 – 21 years.

3. To investigate the presence of significant differences in sample performances on

the SPM test according to gender, region (cities and villages), academic discipline

(science and arts), geographical areas (main city, secondary city, coastal, mountain

and desert), age and study levels.

4. To investigate the presence of significant differences in sample performance on the

SPM test according to region and gender, age and region, region and study levels,

geographic areas and gender, academic discipline and gender, age and gender and age

and academic discipline.

5. To investigate the variability of SPM mean scores by gender based on age, gender
based on geographic areas and gender based on academic discipline.

6. To examine the contribution of the independent variables gender, age and regions

and academic achievement in predicting SPM scores.

7. To compute the percentile ranks for the SPM scores according to the sample age

levels.

In addition, an eighth research objective, which dealt with comparing

performance on the SPM test for a Libyan sample with that of other countries (meta-

analysis), was carried out and is reported in chapter seven. Data obtained were tested

for normality. For this, the Kolmogorov-Smirnov and Shapiro-Wilk tests (table 6.1) and

normal probability plots (figures 6.1, 6.2, 6.3 and 6.4) were employed to investigate

and determine normality of the data.

Table 6.1 Descriptive statistics of overall collected data and tests of normality.
Descriptive statistics Statistic Std Error
Mean 32.31 .234
95% Confidence Interval for Mean: 31.85 (lower bound), 32.76 (upper bound)
5% Trimmed Mean 32.40
Median 33.00
Variance 142.670
Std. Deviation 11.94
Minimum 6
Maximum 58
Range 52
Interquartile Range 19
Skewness -.217 .073
Kurtosis -.596 .146
Tests of normality
Kolmogorov-smirnov Shapiro-Wilk
Statistic df Sig Statistic df Sig
.070 2600 .005 .971 2600 .005

Figure 6.1 Histogram showing normal distribution of mean scores.

Figure 6.2 Normal Q-Q plot. Figure 6.3 Detrended normal Q-Q plot.

Figure 6.4 Box plot of scores distribution.

Figure 6.1 is a histogram showing the SPM scores. They appeared to be normally

distributed. Figure 6.2 showed a normal probability plot (normal Q-Q plot). Here the

observed value of each mean is plotted against its expected value. A reasonable

straight line suggested a normal distribution. Figure 6.3 showed the detrended normal

Q-Q plot, where the actual deviation of the scores from the straight line are plotted.

Most scores were collected around the zero line with no real clustering of scores. This

indicated a normal distribution. Figure 6.4 showed a box plot: 50% of the scores are
represented by the rectangle (the box), the line inside the box represents the median
value, and the whiskers represent the highest and lowest values.

The statistical results of both tests of normality were significant (p = 0.000). However,
the sample size in this study was large, and with large samples these tests can be significant
even for minor deviations from normality, so this result does not by itself indicate non-normality
(Pallant, 2007). In addition, Pearson's skewness coefficient was used to verify the

normal distribution. Pearson's Skewness Coefficient is a measure of skewness (Duffy

and Jacobsen, 2005) which is defined as:

Skewness coefficient = (mean − median) / SD

Hildebrand (1986) stated that skewness values above 0.2 or below -0.2 indicate severe

skewness. The skewness coefficient in this sample was -0.05 indicating minor

skewness. All of the above tests indicated that the sample used was normally

distributed and that parametric tests may be applied with confidence to analyze the

data.
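Substituting the summary values from Table 6.1 illustrates the calculation; the small difference from the −0.05 quoted above reflects rounding of the reported mean, median and standard deviation:

\[
\text{Skewness coefficient} = \frac{\bar{X} - \text{median}}{SD} = \frac{32.31 - 33.00}{11.94} \approx -0.06
\]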

6.2 Description of students and SPM score means

A total of 2600 Libyan students participated in this study. Students were divided into

subgroups according to gender, age, region, Geographic areas and academic

discipline. 1800 school students (900 males and 900 females) and 800 University

students (400 males and 400 females) carried out the test. According to region, 900

school students were from cities, whereas the remaining 900 were from villages. They

were chosen from 72 schools located in 11 different settlements; nine villages (27

schools) and two cities (45 schools). The 800 university students were from two

universities located in two cities; Al-Beida and Al-Marj during the academic year

2007-2008. Of them, 400 students were from the science discipline and 400 students from
the arts discipline.

The following tables show descriptive statistics of the SPM score means according

to gender, region, geographic areas, study levels, academic discipline and age. Table

6.2 shows SPM score means and standard deviations according to the independent

variables.

Table 6.2 SPM score means and standard deviations


Gender Regions
Groups (N) Mean SD Mi Ma Regions (N) Mean SD Mi Ma
Males 1300 32.49 12.06 6 57 Cities 900 28.49 11.75 6 57
Females 1300 32.12 11.81 6 58 Villages 900 28.18 10.51 6 51
Total 2600 32.31 11.94 6 58 Total 1800 28.33 11.15 6 57
Geographic areas Age
Main-C 600 28.66 11.95 6 57 8 180 15.82 6.33 6 43
Secondary-C 300 28.54 11.38 7 55 9 180 17.92 6.67 6 40
Coastal-V 300 28.50 10.53 7 50 10 180 20.89 7.99 6 42
Mountain-V 300 27.50 10.13 7 48 11 180 25.21 9.16 8 49
Desert-V 300 28.12 10.76 6 51 12 180 28.65 8.89 9 48
Total 1800 28.33 11.15 6 57 13 180 32.10 8.50 9 49
Academic discipline. 14 180 33.42 8.21 8 52
Science 400 42.34 8.56 12 58 15 180 34.63 8.13 12 55
Arts 400 40.16 7.88 12 57 16 180 36.04 8.94 10 57
Total 800 41.25 8.29 12 58 17 180 38.62 8.54 12 55
Study levels 18 200 39.30 9.22 12 58
Elementary 720 19.96 8.38 6 49 19 200 41.22 8.30 16 57
Preparatory 540 31.39 8.77 8 52 20 200 41.91 7.90 12 56
Secondary 540 36.43 8.68 10 57 21 200 42.56 7.34 22 57
University 800 41.25 8.29 12 58 Total 2600 32.31 11.94 6 58
Total 2600 32.31 11.94 6 58 C = city & V= villages
Mi is the minimum score, Ma is the maximum score.

Based on gender, males mean scores were only slightly higher than females. Based on

regions, cities were only slightly higher than villages. Similarly, based on geographic

areas, the main city also showed slightly higher mean scores than other geographic

areas. In regards to age, score means increased as age increased; the highest score

means were achieved by 21 years old students. According to study levels, score means

increased as study levels increased; the highest score means were achieved by the

university level. Based on academic discipline, science students obtained a

significantly higher mean than arts students.

To establish the first research objective, which is to determine the psychometric

characteristics (reliability, validity, difficulty and discrimination) of the SPM test, the

following procedures were conducted:

1. Reliability of the SPM test was evaluated using three methods:

• Test-retest reliability.

• Split-half reliability.

• Alpha Cronbach reliability (Kuder-Richardson Formula 20).

2. Validity of the SPM test was investigated using two methods:

• Construct Validity using Factor analysis and internal consistency.

• Criterion-related validity, the student’s overall scores in final examinations

(SAA) were taken as an external criterion.

3. Item analysis to ascertain item difficulty and discrimination.

6.3 Reliability of the SPM Test

Reliability refers to the consistency of scores obtained by the same person when

retested with the same test or equivalent form. To establish the reliability of the SPM

test when used with the Libyan students, three different methods were employed. The

first method was split-half reliability with the total sample (N = 2600), the second

method was coefficient Alpha (KR-20) which also used with the total sample (N =

2600) and the third was test retest reliability with a sample of 280 students.

6.3.1 Test-retest reliability of the SPM test

The test-retest method was used to evaluate reliability as a measure of the stability of
students' scores on the SPM test over a period of time. The SPM test was

administered twice to a group of 280 Libyan students (140 males and 140 females).

The time interval between test-retest was two weeks. Table 6.3 showed the SPM test-

retest reliabilities according to age groups, gender and study levels.

Table 6.3 SPM test-retest reliabilities according to age, gender and study levels
AGE GROUPS STUDY LEVELS MALES FEMALES TOTAL
N r N r N r
8-11 Elementary 40 .86 40 .87 80 .87
12-14 Preparatory 30 .88 30 .87 60 .88
15-17 Secondary 30 .88 30 .91 60 .91
18-21 University 40 .92 40 .91 80 .92
Total Sample 140 .89 140 .89 280 .90

The SPM test-retest reliability ranged from 0.86 for male students in the 8-11 year age
group (N = 40) to 0.92 for male and female university students. The overall test-retest

reliability was 0.90.

6.3.2 Split-half reliability

The split half method was used to investigate the reliability of the SPM test. The SPM

items were divided into odd and even items, as the items are arranged in order of

difficulty (Kline 2000). The split-half reliability was then corrected by the Spearman-

Brown prophecy formula. Although it is a general formula that can be used to assess a

variety of different questions about test length and reliability, it is presented here

because it is extensively used in calculating the “corrected” split-half reliability

(Kline, 2000 and Kline, 2005). The reliability coefficients were computed separately

for male and female students, age and total sample. Table 6.4 showed the SPM split-

half reliabilities according to gender, age and total Sample.

Table 6.4 SPM split-half reliabilities according to gender, age and total sample
AGE        MALES              FEMALES            TOTAL
           N    SH (r.)  SB   N    SH (r.)  SB   N    SH (r.)  SB
8          90   .77     .88   90   .85     .92   180  .81     .90
9          90   .84     .91   90   .77     .88   180  .80     .89
10         90   .79     .88   90   .84     .91   180  .83     .91
11         90   .90     .95   90   .88     .94   180  .89     .94
12         90   .80     .89   90   .87     .93   180  .84     .91
13         90   .82     .90   90   .85     .92   180  .84     .91
14         90   .83     .91   90   .84     .91   180  .84     .91
15         90   .81     .90   90   .87     .93   180  .84     .91
16         90   .88     .94   90   .89     .94   180  .89     .94
17         90   .86     .92   90   .89     .94   180  .88     .93
18         100  .87     .93   100  .88     .94   200  .88     .94
19         100  .90     .95   100  .86     .93   200  .88     .94
20         100  .90     .95   100  .88     .94   200  .89     .94
21         100  .91     .96   100  .86     .93   200  .89     .94
Total      1300 .92     .96   1300 .91     .96   2600 .92     .96
SH (r.) = Split-half. SB = Spearman-Brown (SPSS provides SB).

Table 6.4 showed that the split-half reliability for the SPM test ranged from 0.77 to
0.92 and its Spearman-Brown (SB) correction ranged from 0.88 to 0.96. For the total
sample the Spearman-Brown corrected split-half reliability was 0.96 (N = 2600).
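As an illustrative sketch only (the reliabilities above were obtained in SPSS), the odd-even split and the Spearman-Brown correction can be expressed as follows; the file and variable names are hypothetical, and a 2600 x 60 array of 0/1 item scores is assumed:

    import numpy as np

    # Hypothetical file: 2600 x 60 array of item scores coded 1 (correct) or 0 (incorrect)
    items = np.loadtxt("spm_items.csv", delimiter=",")

    odd_total = items[:, 0::2].sum(axis=1)   # totals on items 1, 3, 5, ...
    even_total = items[:, 1::2].sum(axis=1)  # totals on items 2, 4, 6, ...

    r_half = np.corrcoef(odd_total, even_total)[0, 1]  # split-half correlation
    r_sb = 2 * r_half / (1 + r_half)                   # Spearman-Brown correction

    print(round(r_half, 2), round(r_sb, 2))  # e.g. 0.92 and 0.96 for the total sample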

6.3.3 Alpha Reliability

The coefficient Alpha, equivalent to the Kuder-Richardson 20 (KR-20 coefficient),

determines how items in a test relate to other test items and to the total test. KR-20

formula provides reliability estimates that are equivalent to the average of the split-

half reliabilities computed for all possible halves. In addition, alpha (KR-20) is useful

for multiple choice items that were scored as right or wrong (Anastasi, Urbina 1997

and Mills, Airasian 2006).The reliability coefficients were computed separately for

gender, age and total sample. The results obtained were given in table 6.5.

Table 6.5 SPM Alpha reliabilities according to gender, age and total sample
AGE        MALES         FEMALES       TOTAL
           N    Alpha    N    Alpha    N    Alpha
8          90   .85      90   .86      180  .86
9          90   .87      90   .86      180  .87
10         90   .87      90   .90      180  .90
11         90   .92      90   .91      180  .92
12         90   .88      90   .93      180  .91
13         90   .90      90   .90      180  .90
14         90   .89      90   .90      180  .89
15         90   .88      90   .90      180  .90
16         90   .93      90   .91      180  .92
17         90   .91      90   .91      180  .90
18         100  .91      100  .94      200  .93
19         100  .93      100  .90      200  .91
20         100  .89      100  .93      200  .92
21         100  .93      100  .92      200  .93
Total      1300 .96      1300 .94      2600 .94

Table 6.5 showed alpha reliabilities (KR-20) for the SPM ranged from 0.85 (males

aged 8) to 0.96 (total males). For the total sample the SPM alpha reliability (KR-20) was

0.94 (N=2600).
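For reference, the KR-20 coefficient reported above is defined, for k dichotomous items with proportion correct p_i on item i and total-score variance sigma squared, as:

\[
KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i\,(1-p_i)}{\sigma_X^{2}}\right)
\]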

6.4 Validity of the SPM test

Validity is the degree to which a test measures what it is supposed to measure and,

consequently, permits appropriate interpretation of scores. To determine the validity

of the SPM test two different methods were employed. The first method was

Construct Validity with the total sample (N = 2600), the second method was criterion-

related validity which was also used with the total sample (N = 2600).

6.4.1 Construct Validity

Construct validity refers to whether a scale measures or correlates with a theorized

psychological construct (Cronbach and Meehl, 1955). Construct validity is concerned

with the extent to which a test measures a specific trait or construct. The term

construct is used to refer to something that is not itself directly measurable but which

explains observable effect. In other words, construct validation is the systematic

analysis of test scores designed to assess whether there is a basis for validity. A

subtype of construct validity is factor analysis and internal consistency (Anastasi and

Urbina, 1997).

6.4.1.1 Factor analysis of SPM test

This procedure shows the extent to which a set of items measures the same underlying

construct or dimension of a construct (Anastasi, 1988). To test the factorial

validity of the SPM test scale, the intercorrelations between the five sets of the SPM

test were initially subjected to principal components factor analysis for males and
females separately to ascertain whether the items contained a general factor and

possibly other factors. In this procedure the number of significant factors is normally

taken to be those with eigenvalues greater than unity. An eigenvalue is the amount of

the total variance, deviation from the average weighted by the sample size, explained

by the corresponding factor (Tabachnick & Fidell 2007). Table 6.6 and figure 6.5

show the results of the factor analysis of the SPM score means for the entire sample.

Table 6.6 Correlations matrix between the five sets of the SPM test among Libyan
male and female students (N=2600, 8 to21 years) and extracted factor
SET CORRELATIONS FACTOR 1
A B C D E
A 0.67
B 0.63** 0.84
C 0.57** 0.71** 0.87
D 0.56** 0.70** 0.76** 0.85
E 0.46** 0.55** 0.61** 0.60** 0.68
Eigen value 3.47
% of variance 69.41
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy 0.871
Bartlett's Test of Approx. Chi-Square 7323.359
Sphericity df 10
Sig. 0.000

Figure 6.5 Scree plot for the five factors

Table 6.6 showed that all the correlation coefficients were statistically significant
(0.46 to 0.76). To indicate a moderate or stronger relationship, coefficients in the correlation
matrix should be 0.3 or higher (r > 0.3) for principal component analysis. One highly loaded
factor (loadings from 0.67 to 0.87) was extracted, which accounted for 69.41% of the common
variance and was identified as Spearman's "g". These results indicate the internal consistency

and factorial validity as a result of the test items’ homogeneity. In addition, results

show that the Kaiser-Meyer-Olkin value was 0.871, exceeding the recommended value of

0.6 (minimum value for good factor analysis) (Kaiser 1970, 1974 and Tabachnick &

Fidell 2007) and the Bartletts’ test of sphericity (Bartlett, 1954) reached statistical

significance (0.000), supporting the factorability of the correlation matrix. A further

investigation; factor analysis of the SPM test was computed based on gender. The

following tables (Table 6.7 and 6.8) and figures (Figures 6.6 and 6.7) showed factor

analysis of SPM score means for males and females respectively.

Table 6.7 Correlations matrix between the five sets of the SPM test among Libyan
male students (N=1300, 8 to21 years) and Extracted Factor
SET CORRELATIONS FACTOR 1
A B C D E
A 0.70
B 0.64** 0.84
C 0.58** 0.70** 0.85
D 0.59** 0.70** 0.75** 0.86
E 0.46** 0.56** 0.60** 0.61** 0.69
Eigen value 3.49
% of variance 69.76
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy 0.874
Bartlett's Test of Approx. Chi-Square 3683.603
Sphericity df 10
Sig. 0.000

Figure 6.6 Scree plot for the five factors.

Table 6.7 showed that all the correlation coefficients were statistically significant

(0.46 to 0.75). One highly loaded factor (0.69 to 0.86) was extracted which accounted

for 69.76% of the common variance which was Spearman’s “g”. These results

indicated the internal consistency and factorial validity as a result of the test items’

homogeneity. Also, results showed that the Kaiser-Meyer-Olkin value was 0.874, and
Bartlett's test of sphericity reached statistical significance (0.000), supporting the

factorability of the correlation matrix.

Table 6.8 Correlations matrix between the five sets of the SPM test among Libyan
female students (N=1300, 8 to21 years) and extracted factor
SET CORRELATIONS FACTOR 1
A B C D E
A .67
B 0.62** .84
C 0.56** 0.72** .88
D 0.54** 0.69** 0.78** .85
E 0.46** 0.55** 0.62** 0.59** .68
Eigen value 3.46
% of variance 69.22
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy 0.865
Bartlett's Test of Approx. Chi-Square 3689.905
Sphericity df 10
Sig. 0.000

Figure 6.7 Scree plot for the five factors

Table 6.8 showed that all the correlation coefficients were statistically significant
(0.46 to 0.78). One highly loaded factor (loadings from 0.67 to 0.88) was extracted which
accounted for 69.22% of the common variance and was identified as Spearman's "g". These

results indicated the internal consistency and factorial validity as a result of the test

items' homogeneity. Also, results showed that the Kaiser-Meyer-Olkin value was

0.865, and Bartlett's test of sphericity reached statistical significance (0.000),

supporting the factorability of the correlation matrix.
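The factor analyses above were carried out in SPSS. Purely as a sketch of the principal components step they describe, the same extraction could be reproduced from the five set scores as follows (the file and variable names are hypothetical):

    import numpy as np

    # Hypothetical file: 2600 x 5 array of scores on sets A-E
    set_scores = np.loadtxt("spm_set_scores.csv", delimiter=",")

    R = np.corrcoef(set_scores, rowvar=False)      # 5 x 5 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    explained = 100 * eigvals[0] / eigvals.sum()            # % variance of first component
    loadings = np.abs(eigvecs[:, 0]) * np.sqrt(eigvals[0])  # loadings on the first factor

    print(round(eigvals[0], 2), round(explained, 1))  # compare with the eigenvalue and % in Table 6.6
    print(np.round(loadings, 2))                      # compare with the Factor 1 column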

6.4.1.2 Internal consistency validity

Internal consistency is a measure based on the correlations between different

subscales on the same test and on total score. It measures whether several subscales

that propose to measure the same general construct produce similar scores (Anastasi,
1988; Anastasi & Urbina, 1997). Pearson product-moment correlation

coefficients between the five sets and the total scores of the SPM test were computed

for validity estimation. Table 6.9 shows correlations coefficients between the five sets

and the total scores of the SPM test for the entire sample.

Table 6.9 Correlations coefficients between the five sets and the total scores of the
SPM test (n=2600, age 8 to 21 years)
SETS        CORRELATIONS
            Total A   Total B   Total C   Total D   Total E
Total A     1.000
Total B     0.64**    1.000
Total C     0.59**    0.71**    1.000
Total D     0.56**    0.69**    0.74**    1.000
Total E     0.50**    0.57**    0.62**    0.64**    1.000
Total       0.72**    0.84**    0.85**    0.85**    0.74**
** Correlation is significant at the 0.01 level

The relationship between sub-scales and total scales scores of the SPM test was

evaluated using Pearson product-moment correlation coefficients. There were strong

and statistically significant positive correlation coefficients between the five sets (A,

B, C, D and E) and total scores, ranging from 0.50 to 0.85, n= 2600 (p<0.01). In

addition, the internal consistency of the SPM test was computed based on gender.

Table 6.10 shows correlations coefficients between the five sets and the total scores of

the SPM test for males and females respectively.

Table 6.10 Correlations coefficients between the five sets and the total scores of the
SPM test (males n=1300 and females n=1300, age 8 to 21 years)
MALES N=1300  SETS CORRELATIONS
            Total A   Total B   Total C   Total D   Total E
Total A     1.000
Total B     0.65**    1.000
Total C     0.58**    0.69**    1.000
Total D     0.59**    0.69**    0.73**    1.000
Total E     0.51**    0.58**    0.63**    0.65**    1.000
Total       0.71**    0.83**    0.84**    0.85**    0.74**

FEMALES N=1300  SETS CORRELATIONS
            Total A   Total B   Total C   Total D   Total E
Total A     1.000
Total B     0.64**    1.000
Total C     0.59**    0.71**    1.000
Total D     0.54**    0.68**    0.75**    1.000
Total E     0.50**    0.55**    0.62**    0.63**    1.000
Total       0.72**    0.85**    0.87**    0.85**    0.74**
** Correlation is significant at the 0.01 level

The relationship between the five sets and the total scores of the SPM test was

investigated using Pearson product-moment correlation coefficients. There were

strong, positive correlation coefficients, statistically significant between the five sets

(A, B, C, D and E) and total scores ranging from 0.51 to 0.85 (p<0.01) for males and

0.50 to 0.87 (p<0.01) for females.
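A minimal sketch of how these set-total correlations can be computed, assuming a data frame with one column per set (the file and column names are illustrative; the analysis reported above was run in SPSS):

    import pandas as pd

    # Hypothetical file with columns set_A ... set_E (scores 0-12 on each set)
    df = pd.read_csv("spm_set_scores.csv")
    df["total"] = df[["set_A", "set_B", "set_C", "set_D", "set_E"]].sum(axis=1)

    # Pearson correlations between every pair of sets and between each set and the total
    print(df.corr(method="pearson").round(2))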

6.4.2 Criterion-related validity

To evaluate the validity of the SPM against students' academic achievement (SAA), the
total of final examination scores was used as the criterion to validate the SPM test
(predictive validity); that is, the correlation between test scores and a criterion that
occurs at a later point in time. Also, the second research objective focused on

establishing the relationship between SPM scores and student’s scores in final school

and university exams in all studied courses (SAA) and Pearson product-moment

correlations were used. Table 6.11 shows the correlation between the SPM scores and

the students’ academic achievement scores in final school and university exams in all

studied courses (SAA) according to age, levels of study, gender and total sample.

Table 6.11 Correlation between the SPM and achievement scores according to age,
level of study, gender, academic discipline and total sample
Age and level of study variables N= 2600
Elementary Preparatory Secondary University
N= 720 N= 540 N= 540 N= 800
Age r Age r Age r Age r
8 .56** 12 .41** 15 .37** 18 .37**
9 .41** 13 .39** 16 .43** 19 .50**
10 .37** 14 .33** 17 .50** 20 .47**
11 .41** Total .38** Total .43** 21 .41**
Total .44** Total .44**

Gender Variable N= 2600 Academic discipline Variable N=800


Gender r Discipline r
Male .42** Arts .41**
Female .43** Science .51**
Total .42** Total .46**
(1) r = Pearson Correlation. (2)**. Correlation is significant at the 0.01 level.

Results in table 6.11 showed that the validity coefficients between the SPM scores
and students' SAA ranged from 0.33 to 0.56. For arts students the correlation
between the SPM scores and their SAA was 0.41, and for science students it was 0.51. The
correlation for the combined sample (science and arts; n = 800) between the students' SAA
and their SPM scores was 0.46, which was statistically significant. In general, all correlation
coefficients between SPM scores and students' SAA were statistically significant for all groups.

6.5 Item Analysis of the SPM test

Item analysis was used in this study to investigate the difficulty and discrimination
power of the items. The item analysis was performed on the SPM test based upon the
total sample of 2600 students. Table 6.12 showed the difficulty levels of the SPM

items, Table 6.13 showed item discrimination and Table 6.14 exhibited a summary for

the item analysis.

6.5.1 Item Difficulty

The SPM test consisted of 5 sets of items, lettered (A, B, C, D, and E). Each set

consists of 12 items which become progressively more difficult. Furthermore the level

of difficulty increases from set A to set E.

Item difficulty is defined as the percentage of students obtaining the correct answer to

an item. The higher the value of the difficulty index, the easier the item. Table 6.12

showed the item difficulty indices of the five SPM sets for total sample.

Table 6.12 Item difficulty (percentages of correct answers) and SPM Means of the
Correct Answers (N = 2600)
Item     1   2   3   4   5   6   7   8   9  10  11  12
Set A  100  99  97  95  94  92  74  75  82  70  45  34
Set B   97  90  82  75  66  64  50  43  49  57  41  33
Set C   79  76  69  65  63  49  54  40  51  30  23   9
Set D   84  73  65  61  70  58  54  52  49  39  22   7
Set E   60  42  40  26  24  23  21  12  11   7   5   4
SPM means of the percent of correct answers.
Set A B C D E
Means 0.79 0.62 0.57 0.58 0.35

It was clear from table 6.12 that 11 SPM items, answered correctly by 80-100% of
the students, appeared to be easy, and 7 of these items were from set A. 42 SPM items,
answered correctly by 21-79% of the students, appeared to be moderate in
difficulty, and 7 SPM items, answered correctly by less than 20% of the students,
appeared to be too difficult.

In addition, it was evident from table 6.12 that three items in set A (A7, A8 and A9);

four items in set B (B7, B8, B9 and B10); three items in set C (C7, C8, and C9); and

three items in set D (D3, D4 and D5) did not follow an order of increasing
difficulty, whereas set E followed an order of increasing difficulty.

According to the 2004 SPM manual, items should steadily increase in difficulty

within the series. In order to test this, as Raven claimed, the degree of difficulty of

the 60 items and five sets of the SPM test were measured by means of the percent of

correct answers. Table 6.12 showed the SPM means of the percent of correct answers

for each SPM set. Set D mean was higher than set C, which suggested that set D was

comparatively easier than set C. Inspection of the mean for each item and set showed

that only thirteen items and one set appear to be of misplaced difficulty.

6.5.2 Item Discrimination

The discrimination index showed whether items differentiate between people with

varying degrees of knowledge and ability. It is the percentage of the “high” group

passing the item, minus the percentage of the "low" group passing the item. The
correlation coefficient obtained from the point biserial correlation is also a measure of item

discrimination. The point biserial correlation between “pass/fail” on each item and

total test score was used to investigate the SPM item discrimination (Brown, 1983;

Anastasi 1988 and Anastasi, Urbina 1997; Roid and Barram 2004; Kline, 2000; Kline,

2005). The greater the correlation of the item the more discriminating the item is i.e. it

discriminates between higher and lower group more effectively. For an item to be

valid, the correlation between the items and total scores should be fairly high.
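A minimal sketch of the point biserial calculation for one item, assuming the same kind of hypothetical 0/1 item matrix used earlier (the item index shown is only an example):

    import numpy as np
    from scipy.stats import pointbiserialr

    items = np.loadtxt("spm_items.csv", delimiter=",")  # 2600 x 60 array of 0/1 item scores
    totals = items.sum(axis=1)                          # total SPM score for each student

    # Discrimination of, say, the 20th item: pass/fail on the item vs total test score
    r_pb, p_value = pointbiserialr(items[:, 19], totals)
    print(round(r_pb, 2), p_value)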

Hopkins (1998) suggested that the indices of item discrimination can be evaluated in

the following terms (table 6.13):

Table 6.13 Index of Discrimination and Items Evaluation


Index of Discrimination Item Evaluation
(a) 0.40 and up Excellent discrimination
(b) 0.30 to 0.39 Good discrimination
(c) 0.10 to 0.29 Fair discrimination
(d) 0.01 to 0.10 Poor discrimination
Negative Item may be miskeyed or intrinsically ambiguous

Hopkins suggestion was utilized to analyze the point biserial correlation data. The

point biserial correlation between “pass/fail” for each SPM item and total test score

were showed in table 6.14.

Table 6.14 Point biserial and significant level for each SPM item
Set 1 2 3 4 5 6 7 8 9 10 11 12
A -- .12** .35** .42** .50** .46** .63** .56** .61** .67** .62** .52**
B .24** .41** .54** .54** .65** .61** .57** .71** .72** .74** .69** .61**
C .58** .57** .70** .65** .71** .60** .72** .60** .65** .50** .49** .12**
D .60** .76** .76** .76** .77** .73** .71** .63** .66** .63** .38** .14**
E .60** .61** .63** .63** .67** .60** .49** .50** .48** .33** .20** .11**
**Significant at 0.001

Generally, correlations lay between r = 0.11 and 0.77 (p < 0.001) with a general
mean of r = 0.44 (p < 0.001). Of the 60 items, 59 yielded significant correlations; item A1
was answered correctly by all students and so generated no variance, and hence no
correlation could be computed for it. Also, table 6.14 showed that the correlations ranged
from r = 0.12 to 0.67 (p < 0.001) with a mean of r = 0.54 for set A; from r = 0.24 to 0.74
(p < 0.001) with a mean of r = 0.59 for set B; from r = 0.12 to 0.72 (p < 0.001) with a
mean of r = 0.57 for set C; from r = 0.14 to 0.77 (p < 0.001) with a mean of r = 0.63 for
set D; and from r = 0.11 to 0.67 (p < 0.001) with a mean of r = 0.49 for set E.

According to Hopkins' (1998) criteria, this SPM test had 51 items with excellent
discriminating value, 3 items with good discriminating value and 5 items with
fair discriminating value. With the remaining items, correlations ranged from

(r = 0.49 to 0.61; p < 0.001). This indicated that the SPM test showed many

discriminating items.

Table 6.15 showed a summary of tables 6.12 and 6.14. It showed numbers of difficult

items, discriminate items, item not in order of difficulty, order of difficulty for the

SPM sets and order of excellent discriminated sets for the SPM.

Table 6.15 Summary of item analysis of the five SPM sets


Set Item Difficulty Item Discrimination INO ODS EDS
>80 21-79 <20 >.44 <.44 (N) Set Set
A 7 5 - 8 4 (3) E C
B 3 9 - 10 2 (3) C B
C - 11 1 11 1 (4) D D
D 1 10 1 10 2 (3) B E
E - 7 5 9 3 (-) A A
Total 11 42 7 40 20 (13)
(1) INO = Items not in order of increasing difficulty.
(2) ODS= Order of increasing of difficulty from high to low for SPM sets.
(3) EDS = Excellent discriminated sets in order from high to low.

From table 6.15 the following conclusions were drawn:

1. As designed, set A is the easiest set whereas set E is the most difficult set. Set A

had 5 items with moderate difficulty level (less than .79); set B had 9 items; set C

had 11 items; set D had 10 items and set E had 7 items. The order of difficulty of

the SPM five sets according to the numbers of difficult items in each set in order

from high to low were E, C, D, B and A.

2. 40 out of 60 items had excellent discriminating value. Set A had 8 items, set B and

D had 10 items, set C had 11 items and set E had 9 items of excellent

discriminating value. The excellent discriminated SPM sets in order from high to

low was C, B, D, E and A.

3. 13 items were not arranged in order of increasing difficulty. Set D had 4 items,

set A, B and C had 3 items each. No items were found in set E.

6.6 Differences in SPM scores

As mentioned in the beginning of this chapter, one of the objectives of this study was

to investigate the presence of significant differences in sample performances on the

SPM test according to gender, region (cities and villages), academic

discipline (science and arts), geographic nature (main city, secondary city, coastal,

mountain and desert), age and study levels. In addition, significant differences in

sample performance on the SPM test according to region and gender, age and region,

region and study levels, geographic nature and gender, academic discipline and

gender, age and gender, and age and academic discipline were carried out. The

investigation in the differences was as follows:

6.6.1 Difference according to gender

An independent t-test was carried out to compare the SPM score means in regards to

gender (table 6.16).

Table 6.16 Comparison of gender


Gender (N) Mean SD Std. Error Mean
Male 1300 32.49 12.06 .335
Female 1300 32.12 11.83 .328
t-test for Equality of Means
Levene's Test for Equality of Variances: F = .479, Sig. = .489
                               t      df    Sig.(2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed        .789   2598  .430            .370        .469              -.594         1.288
Equal variances not assumed    .789   2597  .430            .370        .469              -.594         1.288

This table showed that there was no significant difference in mean scores between

males and females (male mean = 32.49, SD = 12.06 and females mean = 32.12, SD =

11.83; t (2598) = 0.789, p = 0.430). The magnitude of the difference in the means
(mean difference = 0.370, 95% CI: -.594 to 1.288) was very small (partial eta squared
= 0.019). SPSS does not provide eta squared values for the t-test; it was, however,
calculated using the information provided in the output.
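One commonly cited hand formula for obtaining eta squared from t-test output (e.g. Pallant, 2007) is given below purely as a sketch of the kind of calculation involved:

\[
\eta^{2} = \frac{t^{2}}{t^{2} + (N_1 + N_2 - 2)}
\]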

6.6.2 Difference according to regions (cities and villages).

An independent t-test was carried out to compare the SPM score means in regards to

region (table 6.17).

Table 6.17 Comparison of regions


Region (N) Mean SD Std. Error Mean
Cities 900 28.49 11.75 .392
Villages 900 28.18 10.51 .350
t-test for Equality of Means
Levene's Test for Equality of Variances: F = 13.43, Sig. = .000
                               t      df    Sig.(2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed        -.588  1798  .556            -.309       .525              -1.340        .721
Equal variances not assumed    -.588  1777  .556            -.309       .525              -1.340        .721

As Levene's test was significant, the t value for equal variances not assumed was
used (Pallant, 2007). There was no significant difference in scores between cities (mean =
28.49, SD = 11.75) and villages (mean = 28.18, SD = 10.51; t (1777) = -0.588, p =
0.556). The magnitude of the difference in the means (mean difference = -0.309,
95% CI: -1.340 to .721) was very small (partial eta squared = -0.028). SPSS did not

provide eta squared values for t-test. It was, however, calculated using the information

provided in the output.

6.6.3 Difference according to academic discipline

An independent t-test was carried out to compare the SPM score means in regards to

academic discipline (table 6.18).

Table 6.18 Comparison of academic discipline


academic discipline (N) Mean SD Std. Error Mean
Science 400 42.34 8.56 .428
Arts 400 40.16 7.88 .394
t-test for Equality of Means
Levene's Test for Equality of Variances: F = 2.537, Sig. = .112
                               t      df   Sig.(2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed        -3.76  798  .000            -2.178      .581              -3.32         -1.04
Equal variances not assumed    -3.76  793  .000            -2.178      .581              -3.32         -1.04

Results showed that there was a statistically significant difference in scores between

arts discipline (mean 40.16, SD 7.88) and science discipline (mean = 42.34, SD =

8.56; t (798) = -3.76, p = 0.000) in favour of science students. The magnitude of the

differences in the means (mean difference = -2.178, 95% CI:-3.32 to -1.04) was large

(partial eta squared = -0.27). SPSS did not provide eta squared values for t-test. It

was, however, calculated using the information provided in the output.

6.6.4 Difference according to geographic areas

One way ANOVA was conducted to compare the SPM means for the geographic

areas (table 6.19) and post hoc Tukey test for multiple comparisons (table 6.20).

Table 6.19 Comparison of geographic areas


Geographic areas (N) Mean SD
Main city 600 28.66 11.946
Secondary city 300 28.54 11.379
Coastal 300 28.50 10.529
Mountain 300 27.50 10.131
Desert 300 28.12 10.756
Total 1800 28.33 11.145
Source Sum of Squares df Mean Squares F. Ratio F. Prob.
Between Groups 309.571 4 77.393 .623 .646
Within Groups 223149.748 1795 124.317
Total 223459.320 1799

Table 6.20 Post Hoc Tukey (HSD) Test


(I) (J) Mean Std. Sig. 95% Confidence
Geographic Geographic Difference Error Interval
areas areas (I-J) Lower Upper
Bound Bound
Main city Coastal .160 .780 1.000 -1.97 2.29
Mountain 1.160 .780 .571 -.97 3.29
Desert .534 .780 .960 -1.60 2.66
Secondary city .116 .820 1.000 -2.12 2.35
Secondary Coastal .045 .945 1.000 -2.53 2.62
city Mountain 1.045 .945 .804 -1.53 3.62
Desert .418 .945 .992 -2.16 3.00
Main city -.116 .820 1.000 -2.35 2.12
Coastal Mountain 1.000 .910 .807 -1.49 3.49
Desert .373 .910 .994 -2.11 2.86
Main city -.160 .780 1.000 -2.29 1.97
Secondary city -.045 .945 1.000 -2.62 2.53
Mountain Coastal -1.000 .910 .807 -3.49 1.49
Desert -.627 .910 .959 -3.11 1.86
Main city -1.160 .780 .571 -3.29 .97
Secondary city -1.045 .945 .804 -3.62 1.53
Desert Coastal -.373 .910 .994 -2.86 2.11
Mountain .627 .910 .959 -1.86 3.11
Main city -.534 .780 .960 -2.66 1.60
Secondary city -.418 .945 .992 -3.00 2.16

Participants were from five different geographic areas. The results showed that there

were no statistically significant differences in SPM scores for the five geographic

areas, F (4, 1795) = 0.623, p = 0.646. The effect size was calculated using eta squared
(the sum of squares between groups (309.571) divided by the total sum of squares
(223459.320); Pallant, 2007); the resulting eta squared value was 0.001, which
indicated a very small effect size. Post-hoc comparisons using the Tukey HSD test
indicated that there were no statistically significant differences between the five
different geographic areas.
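For this comparison the calculation from the sums of squares in Table 6.19 is simply:

\[
\eta^{2} = \frac{SS_{between}}{SS_{total}} = \frac{309.571}{223459.320} \approx 0.001
\]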

6.6.5 Difference according to age.

One-way ANOVA was used to compare the SPM score means difference in regards to

age (table 6.21), and post hoc Tukey (HSD) test (table 6.22).

Table 6.21 Comparison according to age


Age (N) Mean SD Age (N) Mean SD
8 180 15.82 6.33 15 180 34.63 8.13
9 180 17.92 6.67 16 180 36.04 8.94
10 180 20.89 7.99 17 180 38.62 8.54
11 180 25.21 9.16 18 200 39.30 9.22
12 180 28.65 8.89 19 200 41.22 8.30
13 180 32.10 8.50 20 200 41.91 7.90
14 180 33.42 8.21 21 200 42.56 7.34
Total 2600 32.31 11.94
Source Sum of Squares Df Mean Squares F. Ratio F. Prob.
Between Groups 197151.289 13 15165.484 225.846 .000
Within Groups 173648.746 2586 67.150
Total 370800.035 2599

Table 6.22 Post Hoc Tukey (HSD) Tests
Age 8 9 10 11 12 13 14 15 16 17 18 19 20
8
9 .453
10 .000 .036
11 .000 .000 .000
12 .000 .000 .000 .005
13 .000 .000 .000 .000 .005
14 .000 .000 .000 .000 .000 .962
15 .000 .000 .000 .000 .000 .158 .980
16 .000 .000 .000 .000 .000 .000 .120 .936
17 .000 .000 .000 .000 .000 .000 .000 .000 .140
18 .000 .000 .000 .000 .000 .000 .000 .000 .008 1.000
19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .105 .519
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .007 .082 1.000
21 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .005 .935 1.000
*. The mean difference is significant at the 0.05 level.

Participants were from fourteen different age groups. There were statistically significant
differences (p < 0.05) in SPM scores for age, F (13, 2586) = 225.846, p < 0.001. The
effect size was calculated using eta squared (the sum of squares between groups
(197151.289) divided by the total sum of squares (370800.035); Pallant, 2007); the
resulting eta squared value was 0.53, which indicated a large effect size. Post-hoc
comparisons using the Tukey HSD test indicated that there were statistically significant
differences between the different ages except for the (8 and 9 years), (13 through 15 years),
(14 through 16 years), (16 and 17 years), (17 through 19 years), (18 through 20 years)
and (19 through 21 years) comparisons, with higher mean scores for the older students overall.

6.6.6 Difference according to study levels

One way ANOVA was conducted to compare the SPM means in regards to study

levels (table 6.23) and post hoc Tukey (HSD) test for multiple comparisons

(table 6.24)

Table 6.23 Comparison according to study levels
Study levels (N) Mean SD
Elementary 720 19.96 8.38
Preparatory 540 31.39 8.77
Secondary 540 36.43 8.68
University 800 41.25 8.29
Total 2600 32.31 11.94
Source Sum of Squares df Mean Squares F. Ratio F. Prob.
Between Groups 183360.732 3 61120.244 846.504 .000
Within Groups 187439.303 2596 72.203
Total 370800.035 2599

Table 6.24 Post Hoc Tukey (HSD) Test


(I) study (J) study Mean Std. Sig. 95% Confidence Interval
levels levels Difference Error Lower Upper
(I-J) Bound Bound
Elementary Preparatory -11.428* .484 .000 -12.67 -10.18
Secondary -16.471* .484 .000 -17.71 -15.23
University -21.288* .437 .000 -22.41 -20.17
Preparatory Elementary 11.428* .484 .000 10.18 12.67
Secondary -5.043* .517 .000 -6.37 -3.71
University -9.860* .473 .000 -11.08 -8.64
Secondary Elementary 16.471* .484 .000 15.23 17.71
Preparatory 5.043* .517 .000 3.71 6.37
University -4.817* .473 .000 -6.03 -3.60
University Elementary 21.288* .437 .000 20.17 22.41
Preparatory 9.860* .473 .000 8.64 11.08
Secondary 4.817* .473 .000 3.60 6.03
*The mean difference is significant at the 0.05 level.

Participants were from four study levels. There were statistically significant

differences in SPM scores between the four study levels F (3, 2596) = 846.504, p =

0.000. The effect size was calculated using eta squared (the sum of squares
between groups (183360.732) divided by the total sum of squares (370800.035); Pallant,
2007); the resulting eta squared value was 0.49, which indicated a large effect. Post-
hoc comparisons using the Tukey HSD test indicated that there were statistically
significant differences between all study levels, in favour of the higher
levels.

6.6.7 Difference according to regions and study levels

Two-way ANOVA was conducted on SPM scores in regards to study levels and

regions (table 6.25).

Table 6.25 Comparison of the region according to study levels


Study levels Region (N) Mean SD
Elementary Cities 360 19.97 8.58
Village 360 19.95 8.20
Total 720 19.96 8.38
Preparatory Cities 270 31.35 9.39
Village 270 31.42 8.10
Total 540 31.39 8.76
Secondary Cities 270 36.97 9.88
Village 270 35.90 7.28
Total 540 36.43 8.68
Total Cities 900 28.49 11.75
Village 900 28.18 10.51
Total 1800 28.33 11.15

Table 6.26 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
7.570 5 1794 .000

Table 6.27 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 91088.29 5 18217.66 246.901 .000 .408
Intercept 1513091.6 1 1513091.63 20506.649 .000 .920
Study levels 90933.52 2 45466.76 616.203 .000 .407
Region 51.35 1 51.35 .696 .404 .001
Study levels*Region 111.74 2 55.87 .757 .469 .001
Error 132371.03 1794 73.79
Total 1668165.23 1800
Corrected Total 223459.32 1799
a. R Squared = .408 (Adjusted R Squared = .406)

Table 6.28 Post Hoc Tukey (HSD) Test
(I) Study (J) Study MD Std. Sig. 95% Confidence Interval
levels levels Error Lower Upper
Bound Bound
Elementary Preparatory -11.43* .489 .000 -12.58 -10.28
Secondary -16.47* .489 .000 -17.62 -15.32
Preparatory Elementary 11.43* .489 .000 10.28 12.58
Secondary -5.04* .523 .000 -6.27 -3.82
Secondary Elementary 16.47* .489 .000 15.32 17.62
Preparatory 5.04* .523 .000 3.82 6.27
The mean difference is significant at the .05 level. MD= Mean Difference (I-J)

The interaction effect between regions and study levels was not statistically

significant, F (2, 1794) = .757, P = .469. There was no statistically significant main

effect for region, F (1, 1794) = .696 P = 0.404; the magnitude of the effect size was

very small (partial eta squared = .001). The main effect for study levels, F (2, 1794) =
616.203, p = .000, was statistically significant. Post-hoc comparisons using the Tukey HSD
test showed that there were statistically significant differences between the different study
levels.

It is worth noting that the Leven’s test was significant, indicating that group variance

is not equal. However a better method to ascertain homogeneity of variance was by

dividing the largest variance by the smallest variance in each group. A result of 2 or

above means the variance was unequal. All results were below 2 which indicated

equal variance (Field, 2006).
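As a worked illustration using the standard deviations in Table 6.25, the largest group variance (city secondary students) divided by the smallest (village secondary students) gives:

\[
\frac{s^{2}_{max}}{s^{2}_{min}} = \frac{9.88^{2}}{7.28^{2}} \approx \frac{97.6}{53.0} \approx 1.8 < 2
\]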

6.6.8 Difference according to regions and gender.

Two-way ANOVA was conducted for regions in regards to gender.

Table 6.29 Comparison of the regions according to gender


Regions Gender (N)sample Mean SD
cities Male 450 28.83 12.06
Female 450 28.14 11.44
Total 900 28.49 11.75
villages Male 450 28.36 10.91
Female 450 27.99 10.10
Total 900 28.18 10.51
Total Male 900 28.59 11.49
Female 900 28.07 10.79
Total 1800 28.33 11.15

Table 6.30 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
7.401 3 1796 .000

Table 6.31 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 179.032 3 59.677 .480 .696 .001
Intercept 1444705.915 1 1444705.915 11620.783 .000 .866
REGIONS 43.031 1 43.031 .346 .556 .000
SEX 124.636 1 124.636 1.003 .317 .001
REGIONS * SEX 11.365 1 11.365 .091 .762 .000
Error 223280.287 1796 124.321
Total 1668165.234 1800
Corrected Total 223459.320 1799
a. R Squared = .001 (Adjusted R Squared = -.001)

Participants were divided into two groups according to the regions (cities and

villages). The interaction effect between regions and gender was not statistically

significant, F (1, 1796) = 0.091, P = 0.762. There was no statistically significant main

effect for regions, F (1, 1796) = 0.346 P = 0.556; the magnitude of the effect size was

very small (partial eta squared = .001). The main effect for gender, F (1, 1796) =

1.003 P = 0.317; did not exhibit statistical significance. The significant result of

Leven’s test was further tested as mentioned earlier. Variance was equal.

6.6.9 Difference according to age and region.

Two-way ANOVA was conducted for age in regards to region.

Table 6.32 Comparison of age according to region


Age Region N Mean SD Age Region N Mean SD
8 cities 90 15.99 6.13 13 cities 90 31.67 8.87
Villages 90 15.66 6.54 Villages 90 32.53 8.14
Total 180 15.82 6.33 Total 180 32.10 8.50
9 cities 90 17.92 6.16 14 cities 90 33.31 9.10
Villages 90 17.91 7.18 Villages 90 33.52 7.26
Total 180 17.92 6.67 Total 180 33.42 8.21
10 cities 90 20.56 8.23 15 cities 90 35.28 9.21
Villages 90 21.22 7.78 Villages 90 33.99 6.88
Total 180 20.89 7.99 Total 180 34.63 8.13
11 cities 90 25.42 9.46 16 cities 90 36.37 10.17
Villages 90 25.01 8.90 Villages 90 35.71 7.54
Total 180 25.21 9.16 Total 180 36.04 8.94
12 cities 90 29.08 9.78 17 cities 90 39.25 9.91
Villages 90 28.22 7.95 Villages 90 37.99 6.91
Total 180 28.65 8.89 Total 180 38.62 8.54

Total Region N Mean SD


cities 900 28.49 11.75
Villages 900 28.18 10.51
Total 1800 28.33 11.15

Table 6.33 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
5.701 19 1780 .000

Table 6.34 Tests of Between-Subjects Effects of SPM scores
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 103802.844 19 5463.308 81.272 .000 .465
Intercept 1444705.915 1 1444705.915 21491.328 .000 .924
age 103536.949 9 11504.105 171.134 .000 .464
Region 43.031 1 43.031 .640 .424 .000
age * Region 222.864 9 24.763 .368 .950 .002
Error 119656.476 1780 67.223
Total 1668165.234 1800
Corrected Total 223459.320 1799
a. R Squared = .465 (Adjusted R Squared = .459)

Table 6.35 Post Hoc Tukey (HSD) test


Cities Age 8 9 10 11 12 13 14 15 16
8
9 .907
10 .021 .607
11 .000 .000 .010
12 .000 .000 .000 .151
13 .000 .000 .000 .000 .631
14 .000 .000 .000 .000 .047 .966
15 .000 .000 .000 .000 .000 .166 .898
16 .000 .000 .000 .000 .000 .015 .384 .998
17 .000 .000 .000 .000 .000 .000 .000 .083 .478
Villages Age 8 9 10 11 12 13 14 15 16
8
9 .577
10 .000 .086
11 .000 .000 .024
12 .000 .000 .000 .108
13 .000 .000 .000 .000 .004
14 .000 .000 .000 .000 .000 .997
15 .000 .000 .000 .000 .000 .950 1.000
16 .000 .000 .000 .000 .000 .026 .616 .870
17 .000 .000 .000 .000 .000 .000 .002 .012 .562
*. The mean difference is significant at the 0.05 level.

Figure 5.8 Mean score differences by age and region

Participants were divided into two groups according to region (cities and villages). The interaction effect between region and age was not statistically significant, F (9, 1780) = .368, P = .950. There was no statistically significant main effect for region, F (1, 1780) = .640, P = .424. The main effect for age, F (9, 1780) = 171.134, P = .000, was statistically significant; the magnitude of this effect was large (partial eta squared = .46). Post-hoc comparisons using the Tukey HSD test showed that in cities, statistically significant differences were found between all age groups except between the (8 and 9), (9 and 10), (11 and 12), (12 and 13), (13, 14 and 15), (14, 15 and 16) and (15, 16 and 17) ages. In villages, statistically significant differences were found between all age groups except between the (8 and 9), (9 and 10), (11 and 12), (13, 14 and 15), (14, 15 and 16), (15 and 16) and (16 and 17) ages. The significant result of Levene's test was followed up as described earlier; the variances were judged equal.

6.6.10 Difference according to geographic areas and gender

A two-way ANOVA was conducted for geographic areas with respect to gender.

Table 6.36 Comparison of the geographic areas according to gender


Geographic areas Gender (N) Mean SD
Main city Male 300 29.16 12.38
Female 300 28.15 11.47
Total 600 28.66 11.95
Secondary city Male 150 28.60 11.41
Female 150 28.49 11.39
Total 300 28.54 11.38
Coastal Male 150 28.64 10.95
Female 150 28.35 10.13
Total 300 28.50 10.53
Mountain Male 150 28.10 10.71
Female 150 26.89 9.52
Total 300 27.50 10.13
Desert Male 150 27.83 10.95
Female 150 28.42 10.59
Total 300 28.12 10.76
Total Male 900 28.59 11.49
Female 900 28.07 10.77
Total 1800 28.33 11.15

Table 6.37 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
3.052 9 1790 .001

Table 6.38 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 614.837 9 68.315 .549 .839 .003
Intercept 1295947.59 1 1295947.588 10409.709 .000 .853
GEOGRAPHIC
AREA 309.571 4 77.393 .622 .647 .001
GENDER 66.989 1 66.989 .538 .463 .000
GEOGRAPHIC
AREA * GENDER 180.630 4 45.158 .363 .835 .001
Error 222844.48 1790 124.494
Total 1668165.23 1800
Corrected Total 223459.32 1799
a. R Squared = .003 (Adjusted R Squared = -.002)

Table 6.39 Post Hoc Tukey (HSD) Test
(I) (J) MD Std. Sig. 95% Confidence Interval
Geographic Geographic Error Lower Upper
areas areas Bound Bound
Main city Coastal .16 .781 1.000 -1.97 2.29
Mountain 1.16 .781 .572 -.97 3.29
Desert .53 .781 .960 -1.60 2.67
Secondary city .12 .821 1.000 -2.12 2.36
Secondary city Coastal .04 .945 1.000 -2.54 2.63
Mountain 1.04 .945 .804 -1.54 3.63
Desert .42 .945 .992 -2.16 3.00
Main city -.12 .821 1.000 -2.36 2.12
Coastal Mountain 1.00 .911 .808 -1.49 3.49
Desert .37 .911 .994 -2.11 2.86
Main city -.16 .781 1.000 -2.29 1.97
Secondary city -.04 .945 1.000 -2.63 2.54
Mountain Coastal -1.00 .911 .808 -3.49 1.49
Desert -.63 .911 .959 -3.11 1.86
Main city -1.16 .781 .572 -3.29 .97
Secondary city -1.04 .945 .804 -3.63 1.54
Desert Coastal -.37 .911 .994 -2.86 2.11
Mountain .63 .911 .959 -1.86 3.11
Main city -.53 .781 .960 -2.67 1.60
Secondary city -.42 .945 .992 -3.00 2.16
MD= Mean Difference (I-J)

The interaction effect between geographic areas and gender was not statistically significant, F (4, 1790) = .363, P = .835. There was no statistically significant main effect for geographic areas, F (4, 1790) = .622, P = 0.647; the magnitude of the effect size was very small (partial eta squared = .003). Post-hoc comparisons using the Tukey HSD test showed no statistically significant differences between the different geographic areas. The main effect for gender, F (1, 1790) = .538, P = .463, did not exhibit statistical significance. The significant result of Levene's test was followed up as described earlier; the variances were judged equal.

6.6.11 Difference according to academic discipline and gender

A two-way ANOVA was conducted for academic discipline with respect to gender.

Table 6.40 Comparison of academic discipline according to gender


academic discipline Gender (N)sample Mean SD
Science Male 200 42.90 7.99
Female 200 41.78 9.07
Total 400 42.34 8.56
Arts Male 200 39.62 7.79
Female 200 40.70 7.94
Total 400 40.16 7.88
Total Male 400 41.26 8.05
Female 400 41.24 8.53
Total 800 41.25 8.29

Table 6.41 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
2.193 3 796 .088

Table 6.42 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 1189.264 3 396.421 5.874 .001 .022
Intercept 1361167.501 1 1361167.50 20167.61 .000 .962
DISCIPLINE 948.301 1 948.301 14.050 .000 .017
GENDER .061 1 .061 .001 .976 .000
GENDER*
240.901 1 240.901 3.569 .060 .004
DISCIPLINE
Error 53724.235 796 67.493
Total 1416081.000 800
Corrected Total 54913.499 799
a. R Squared = .022 (Adjusted R Squared = .018)

Participants were divided into two groups according to academic discipline (arts and science). The interaction effect between academic discipline and gender was not statistically significant, F (1, 796) = 3.569, P = 0.060. There was a statistically significant main effect for academic discipline, F (1, 796) = 14.050, P = 0.000; the magnitude of the effect size was small (partial eta squared = .022). The main effect for gender, F (1, 796) = .001, P = 0.976, did not exhibit statistical significance. Levene's test was not significant, indicating that the group variances were equal.

6.6.12 Difference according to age and gender

A two-way ANOVA was conducted for age with respect to gender.

Table 6.43 Comparison of age according to gender


Age Gender N Mean SD Age Gender N Mean SD
8 Male 90 15.51 6.23 15 Male 90 35.92 7.55
Female 90 16.14 6.44 Female 90 33.34 8.51
Total 180 15.82 6.33 Total 180 34.63 8.13
9 Male 90 17.04 6.60 16 Male 90 37.44 9.10
Female 90 18.79 6.66 Female 90 34.65 8.59
Total 180 17.92 6.67 Total 180 36.04 8.94
10 Male 90 18.81 6.97 17 Male 90 39.95 8.17
Female 90 22.97 8.43 Female 90 37.29 8.74
Total 180 20.89 7.99 Total 180 38.62 8.54
11 Male 90 26.90 9.49 18 Male 100 39.86 8.65
Female 90 23.53 8.55 Female 100 38.75 9.77
Total 180 25.21 9.16 Total 200 39.30 9.22
12 Male 90 28.44 7.92 19 Male 100 41.25 8.40
Female 90 28.86 9.81 Female 100 41.19 8.24
Total 180 28.65 8.89 Total 200 41.22 8.30
13 Male 90 32.40 8.31 20 Male 100 41.74 7.27
Female 90 31.80 8.73 Female 100 42.08 8.50
Total 180 32.10 8.50 Total 200 41.91 7.90
14 Male 90 33.52 8.00 21 Male 100 42.18 7.75
Female 90 33.31 8.46 Female 100 42.94 6.92
Total 180 33.42 8.21 Total 200 42.56 7.34

Total Gender N Mean SD


Male 1300 32.49 12.06
Female 1300 32.12 11.83
Total 2600 32.31 11.94

Table 6.44 Levene's Test of Equality of Error Variances
F df1 df2 Sig.
3.131 27 2572 .000

Table 6.45 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 199685.204 27 7395.748 111.164 .000 .539
Intercept 2659929.375 1 2659929.375 39980.978 .000 .940
AGE 197151.289 13 15165.484 227.950 .000 .535
GENDER 94.098 1 94.098 1.414 .234 .001
AGE * GENDER 2445.059 13 188.081 2.827 .000 .014
Error 171114.832 2572 66.530
Total 3084246.234 2600
Corrected Total 370800.035 2599
a. R Squared = .539 (Adjusted R Squared = .534)

Table 6.46 Post Hoc Tukey (HSD) test


Age 8 9 10 11 12 13 14 15 16 17 18 19 20
8
9 .453
10 .000 .036
11 .000 .000 .000
12 .000 .000 .000 .005
13 .000 .000 .000 .000 .005
14 .000 .000 .000 .000 .000 .962
15 .000 .000 .000 .000 .000 .158 .980
16 .000 .000 .000 .000 .000 .000 .120 .936
17 .000 .000 .000 .000 .000 .000 .000 .000 .140
18 .000 .000 .000 .000 .000 .000 .000 .000 .008 1.000
19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .105 .519
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .007 .082 1.000
21 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .005 .935 1.000
*. The mean difference is significant at the 0.05 level.

Figure 5.9 Mean score differences by age and gender

The interaction effect between age groups and gender was statistically significant, F (13, 2572) = 2.827, P = 0.000. There was a statistically significant main effect for age, F (13, 2572) = 227.950, P = 0.000; the magnitude of the effect size was large (partial eta squared = .54). Post-hoc comparisons using the Tukey HSD test indicated statistically significant differences between the different ages, except between the (8 and 9 years), (13 through 15 years), (14 through 16 years), (16 and 17 years), (17 through 19 years), (18 through 20 years) and (19 through 21 years) groups, with older students generally obtaining the higher mean scores. As a significant interaction was obtained, an analysis of simple effects was carried out, in which the sample is split into groups according to one of the independent variables and statistical tests are run to explore the effect of the other variable. To determine whether there were statistically significant differences between male and female mean scores at the different ages, the sample was therefore split according to age and an independent-samples t-test was employed to compare the means. The results showed no statistically significant gender differences at the ages of 8, 9, 12, 13, 14 and 18 through 21. Females obtained a significantly higher mean than males at the age of 10 years, whereas at the ages of 11 and 15 through 17, males obtained significantly higher means than females. However, the main effect for gender, F (1, 2572) = 1.414, P = .234, did not exhibit statistical significance. The significant result of Levene's test was followed up as described earlier; the variances were judged equal.
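A minimal sketch of this simple-effects procedure (splitting the sample by age and comparing males and females within each age group with an independent-samples t-test) is given below; the column names are hypothetical and the generated data are illustrative rather than the thesis data.

import numpy as np
import pandas as pd
from scipy import stats

def simple_effects_by_age(df, dv="score", factor="gender", split="age"):
    """Independent-samples t-test of the gender difference within each age group."""
    rows = []
    for age, sub in df.groupby(split):
        males = sub.loc[sub[factor] == "male", dv]
        females = sub.loc[sub[factor] == "female", dv]
        t, p = stats.ttest_ind(males, females)
        rows.append({"age": age, "t": t, "p": p,
                     "male_mean": males.mean(), "female_mean": females.mean()})
    return pd.DataFrame(rows)

# Illustrative data only: three age groups of 60 males and 60 females each.
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "age": np.repeat([8, 9, 10], 120),
    "gender": np.tile(np.repeat(["male", "female"], 60), 3),
    "score": rng.normal(20, 7, 360),
})
print(simple_effects_by_age(demo))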

6.6.13 Difference according to academic discipline and age

A two-way ANOVA was conducted for academic discipline with respect to age.

Table 6.47 Comparison of academic discipline according to age


Age academic discipline (N) Mean SD
18 science 100 40.48 9.34
Arts 100 38.13 8.81
Total 200 39.30 9.22
19 science 100 41.19 7.59
Arts 100 41.25 8.99
Total 200 41.22 8.30
20 science 100 43.01 7.83
Arts 100 40.81 7.84
Total 200 41.91 7.90
21 science 100 44.67 7.66
Arts 100 40.45 6.36
Total 200 42.56 7.34
Total science 400 42.34 8.56
Arts 400 40.16 7.88
Total 800 41.25 8.29

Table 6.48 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
4.776 7 792 .000

Table 6.49 Tests of Between-Subjects Effects of SPM scores
Source Type III df Mean F Sig. Partial
Sum of Square Eta
Squares Squared
Corrected Model 2595.849 7 370.836 5.614 .000 .047
Intercept 1361167.501 1 1361167.501 20605.755 .000 .963
AGE 1187.124 3 395.708 5.990 .000 .022
DISCIPLINE 948.301 1 948.301 14.356 .000 .018
AGE * 460.424 3 153.475 2.323 .074 .009
DISCIPLINE
Error 52317.650 792 66.058
Total 1416081.000 800
Corrected Total 54913.499 799
a R Squared = .047 (Adjusted R Squared = .039)

Table 6.50 Post Hoc Tukey (HSD) test


(I) Age (J) Mean Difference Std. Sig. 95% Confidence Interval
Age (I-J) Error Lower Bound Upper Bound
18 19 -1.91 .813 .087 -4.01 .18
   20 -2.61* .813 .008 -4.70 -.51
   21 -3.26* .813 .000 -5.35 -1.16
19 18 1.91 .813 .087 -.18 4.01
   20 -.69 .813 .831 -2.78 1.40
   21 -1.34 .813 .352 -3.43 .75
20 18 2.61* .813 .008 .51 4.70
   19 .69 .813 .831 -1.40 2.78
   21 -.65 .813 .855 -2.74 1.44
21 18 3.26* .813 .000 1.16 5.35
   19 1.34 .813 .352 -.75 3.43
   20 .65 .813 .855 -1.44 2.74
The mean difference is significant at the .05 level.

The interaction effect between academic discipline and age was not statistically significant, F (3, 792) = 2.323, P = .074. There was a statistically significant main effect for academic discipline, F (1, 792) = 14.356, P = 0.000; the magnitude of the effect size was small (partial eta squared = .047). There was also a statistically significant main effect for age, F (3, 792) = 5.990, P = 0.000. Post-hoc comparisons using the Tukey HSD test indicated that only the mean score for the 18 year olds (M = 39.30, SD = 9.22) differed significantly from the 20 year olds (M = 41.91, SD = 7.90) and the 21 year olds (M = 42.56, SD = 7.34). The significant result of Levene's test was followed up as described earlier; the variances were judged equal. Furthermore, the magnitude of the difference between groups in terms of standard deviation units (Cohen's d) was calculated (Pallant, 2007).

Table 6.51 Magnitude of gender differences in mean scores and variability on the SPM as functions of age, geographic area and discipline.
Age (N 2600; male = 1300 and female = 1300).
age t sig Vr d IQ Point Pc IQs
8 -.663 .508 0.93 -0.01 -0.15 16 85
9 -1.767 .079 0.98 -0.26 -3.90 13 83
10 -3.608 .000 0.86 -0.52 -7.80 8 79
11 2.502 .013 1.23 0.37 5.55 4 74
12 .476 .757 0.65 -0.02 -0.30 7 78
13 .169 .634 0.90 0.07 1.05 9 80
14 .169 .866 0.89 0.03 0.45 8 79
15 2.152 .033 0.79 0.32 4.80 10 81
16 2.115 .036 1.24 0.31 4.65 10 81
17 2.106 .037 0.87 0.31 4.65 12 83
18 .851 .396 0.78 0.12 1.80 9 80
19 .051 .959 1.04 0.01 0.15 11 82
20 -.304 .762 0.72 0.04 0.60 11 82
21 -.732 .465 1.26 -0.10 -1.50 4 83

Score t sig Vr d IQ Point Pc IQs


Total .789 .430 1.04 0.03 0.45 10 81
Geographic areas (N 1800; male = 900 and female = 900).
Geographic areas t sig Vr d IQ Point
Main city 1.135 .287 1.16 0.03 0.45
Secondary city .006 .938 1.01 0.10 1.50
Coastal .057 .811 1.16 -0.05 -0.75
Mountain 1.073 .301 1.60 0.08 1.20
Desert .224 .637 1.07 0.01 0.15

Score t sig Vr d IQ Point


Total 1.002 .317 1.14 0.04 0.60
Academic discipline (N 800; male = 400 and female = 400).
Discipline t sig Vr d IQ Point
Science 1.304 .193 0.78 0.13 1.95
Arts -1.373 .171 0.96 -0.14 -2.10
total .030 .976 0.89 0.02 0.30

Table 6.51 reports the t values for the difference between males and females in each age group, in each geographic area, in each academic discipline and in the total sample, together with the level of significance; Cohen's d scores (the difference between the male and female means divided by the within-group standard deviation; Cohen, 1977); the variance ratios (Vr, i.e. the male variance divided by the female variance; Lynn and Irwing, 2004), where a Vr greater than 1.0 indicates that males had greater variance than females and a Vr less than 1.0 indicates that females had greater variance than males (Khaleefa and Lynn, 2008); the IQ point differences between males and females in each age group and in the total sample; and the British percentile equivalents of the combined male and female means on the British norms for the Standard Progressive Matrices collected in 1979 and given in Raven (1981), together with these converted to IQs.
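As a concrete illustration of how the quantities reported in Table 6.51 are defined, the sketch below computes Cohen's d (here taking the within-group standard deviation as the pooled standard deviation), the variance ratio (Vr), the IQ-point equivalent (d multiplied by 15), and a normal-curve conversion from a norm-table percentile to an IQ (mean 100, SD 15). The thesis used published conversion tables for the last step, so the normal-curve version is only an approximation, and the input values below are placeholders rather than figures from the table.

import math
from scipy.stats import norm

def gender_effect(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Cohen's d, variance ratio (male/female) and IQ-point equivalent."""
    # Pooled within-group standard deviation (Cohen, 1977).
    pooled_sd = math.sqrt(((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2))
    d = (mean_m - mean_f) / pooled_sd
    vr = sd_m**2 / sd_f**2        # > 1: males more variable; < 1: females more variable
    iq_points = d * 15            # one standard deviation on the IQ scale = 15 points
    return d, vr, iq_points

def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Approximate IQ for a norm-table percentile, assuming a normal distribution."""
    return mean + sd * norm.ppf(percentile / 100.0)

# Placeholder values for illustration only.
print(gender_effect(mean_m=30.0, sd_m=11.0, n_m=90, mean_f=29.0, sd_f=10.0, n_f=90))
print(round(percentile_to_iq(16)))   # the 16th percentile corresponds to an IQ of about 85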

The results showed three interesting features. First, the British percentile equivalents are the 16th PC for the 8 year olds (IQ = 85), the 13th PC for the 9 year olds (IQ = 83), the 8th PC for the 10 year olds (IQ = 79), and an average of the 6.7th PC (IQ = 79.4) for the 11-17 year olds. The American percentile equivalents are the 9th PC for the 18 year olds (IQ = 80), the 11th PC for the 19 and 20 year olds (IQ = 82), the 4th PC for the 21 year olds (IQ = 83), and an average of the 8.75th PC (IQ = 81.75). Overall, the IQs obtained by the Libyan students ranged between 74 and 85, and the average IQ for the fourteen tested Libyan age groups, 8 through 21, was 81.

Second, there was a lack of significant gender differences in the total means and at ages 8, 9, 12, 13, 14 and 18 through 21. At the age of 10 years, females obtained a significantly higher mean than males, whereas males obtained significantly higher means than females at the ages of 11 and 15 through 17. In total, males obtained a higher mean than females by 0.03d (0.45 IQ points). Regarding geographic areas, the results showed a lack of significant gender differences in the total means and in all geographic area means; in total, males obtained a higher mean than females by 0.04d (0.60 IQ points). Concerning academic discipline, the analysis also showed a lack of significant gender differences in the total means and in each discipline (science and arts); in total, males obtained a higher mean than females by 0.02d (0.30 IQ points).

Third, the gender differences in variability (Vr) in the total sample and within each age group, geographic area and academic discipline can be seen from the standard deviations and variance ratios. At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20 years, females showed greater variability than males, whereas in the total sample and at the ages of 11, 16, 19 and 21 years, males showed greater variability than females. Concerning geographic areas, males showed greater variability than females in the total school sample (N = 1800) and in each geographic area, whereas for academic discipline, females showed greater variability than males in the total university sample (N = 800) and in each academic discipline. These results showed no consistent tendency for a gender difference in variability.

6.7 Multiple Regression according to independent variables

To investigate the contribution of the independent variables (age, gender, region and achievement) to the prediction of the SPM scores, a stepwise multiple regression method was used.

Table 6.52 Stepwise Regression for Independent Variables and the SPM Scores
Model             B       Std. Error   Beta    t        Sig.
1  (Constant)     8.838   .545                 16.204   .000
   Age            2.599   .068         .670    38.268   .000
2  (Constant)     7.929   .554                 14.324   .000
   Age            4.230   .085         .575    26.194   .000
   Achievement    6.218   .001         .404    13.027   .000
Model Summary
Model                   R      R Square   Adjusted R Square   Std. Error of Estimate
1  Age                  .670   .449       .449                8.276
2  Age, Achievement     .681   .464       .463                8.167

As age was equivalent in effect to study level, age was used in this analysis. Using the stepwise method, a significant model emerged (adjusted R square = 0.463; F (1, 1798) = 1464.428, p < 0.001). The significant predictors were:

Predictor Variable     Beta     p
Age (grade)            0.670    p < 0.001
Achievement            0.404    p < 0.001

Gender was not a significant predictor (p = 0.989), nor was region (p = 0.986). This showed that both age and achievement were predictors of the SPM results, with age being the better predictor.
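The stepwise procedure itself was run in SPSS. As an illustration of the same idea, the sketch below implements a simple forward-selection regression in Python with statsmodels, entering at each step the predictor that most improves R squared provided its coefficient is significant; the column names and the generated data are hypothetical, chosen only so that age and achievement, but not gender or region, predict the outcome.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(df, dv, candidates, p_enter=0.05):
    """At each step add the candidate giving the largest R-squared, if significant at p_enter."""
    selected = []
    while True:
        best = None
        for var in candidates:
            if var in selected:
                continue
            X = sm.add_constant(df[selected + [var]])
            fit = sm.OLS(df[dv], X).fit()
            if fit.pvalues[var] < p_enter and (best is None or fit.rsquared > best[1]):
                best = (var, fit.rsquared)
        if best is None:
            return selected
        selected.append(best[0])

# Illustrative data only.
rng = np.random.default_rng(3)
n = 500
demo = pd.DataFrame({
    "age": rng.integers(8, 18, n).astype(float),
    "achievement": rng.normal(70, 10, n),
    "gender": rng.integers(0, 2, n).astype(float),
    "region": rng.integers(0, 2, n).astype(float),
})
demo["spm"] = 2.5 * demo["age"] + 0.3 * demo["achievement"] + rng.normal(0, 8, n)
print(forward_stepwise(demo, "spm", ["age", "achievement", "gender", "region"]))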

6.8 The Percentile Ranks of the SPM Score

The sixth research objective was "to compute the percentile ranks for the SPM scores according to the significant variables". Since Raven used percentiles to determine the position of an individual among all individuals of the sample of the same age, the same scale (percentiles) was used here. Age, gender and academic discipline were taken into account; as region was not a significant variable, its percentile ranks were not calculated. Table 6.53 shows the detailed percentile 2007-2008 norms for Libyan students according to age.

Table 6.53 Detailed percentile 2007-2008 norms for Libyan students according to age
[Table not reproduced legibly in this copy: it gives the SPM raw scores at each percentile level (95th to 5th, as in Tables 6.54 and 6.55) for each age from 8 to 21 years, with N = 180 per age group for ages 8-17 and N = 200 per age group for ages 18-21; the full norms by age and SPM score appear in Appendix 1.]

To illustrate these results, a ten year old child who scores 33 on the SPM test performs better than 95% of the sample of the same age, because this score falls at the 95th percentile for that age. A thirteen year old with the same score of 33 is better than only 50% of the sample of the same age, the same score places an 18 year old at the 25th percentile, and a 21 year old falls at the 10th percentile. Table 6.54 shows the detailed percentile 2007-2008 norms for Libyan students according to age and gender. The full range of the Libyan norms according to age and each SPM score (1 to 60) can be found in Appendix 1.
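Constructing percentile-rank norms of the kind shown in Tables 6.53 to 6.55 amounts to finding, within each age group, the raw score lying at each chosen percentile. A minimal sketch follows, with hypothetical column names and illustrative data rather than the thesis data.

import numpy as np
import pandas as pd

PERCENTILES = [95, 90, 75, 50, 25, 10, 5]

def norms_table(df, score="score", group="age"):
    """Raw-score cut-offs at the conventional percentile levels, per age group."""
    return (df.groupby(group)[score]
              .apply(lambda s: pd.Series(np.percentile(s, PERCENTILES), index=PERCENTILES))
              .unstack())

# Illustrative data only: four age groups of 180 children each.
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "age": np.repeat(range(8, 12), 180),
    "score": np.concatenate([rng.normal(m, 7, 180) for m in (16, 18, 21, 25)]).round(),
})
print(norms_table(demo))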

Table 6.54 detailed percentile 2007-2008 Norms for the Libyan students according to
age and gender.
Age in years
8 9 10 11 12 13 14
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 29 32 35 33 33 37 44 41 42 45 44 48 48 47
90 22 25 27 29 30 37 39 37 39 43 42 42 45 43
75 18 17 18 22 21 28 34 29 35 36 39 37 39 39
50 15 15 16 17 18 23 26 23 29 27 34 32 34 33
25 12 12 13 14 14 18 20 17 25 23 27 27 29 29
10 9 10 11 12 11 12 14 13 17 15 20 19 23 21
5 7 9 10 11 10 10 11 11 14 12 15 16 18 17
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90
Age in years
15 16 17 18 19 20 21
Percentile MA FE MA FE MA FE MA FE MA FE MA FE MA FE
95 48 47 50 47 51 51 52 53 53 52 53 53 55 54
90 46 44 49 46 49 49 50 51 52 49 52 52 53 52
75 41 40 44 42 46 44 47 47 48 47 47 48 48 50
50 37 34 37 34 41 38 40 40 42 43 42 44 42 42
25 32 28 33 30 38 34 34 34 36 36 37 38 36 37
10 24 22 26 22 29 23 30 22 30 29 33 31 33 34
5 21 17 19 19 24 20 23 20 27 24 30 23 29 31
n 90 90 90 90 90 90 90 90 90 90 90 90 90 90

It is apparent from this table that the differences between genders at some ages were significant. For example, at age 10 the differences were in favour of females. These differences are also visible in the percentiles, ranging from 0 to 7 points: 4 points at the 95th percentile, 7 points at the 90th percentile, 7 points at the 75th percentile, 5 points at the 50th percentile, 4 points at the 25th percentile, 1 point at the 10th percentile and 0 points at the 5th percentile. As another example, at age 17 the differences were in favour of males, ranging from 0 to 4 points: 3 points at the 95th percentile, 4 points at the 90th percentile, 2 points at the 75th percentile, 3 points at the 50th percentile, 3 points at the 25th percentile, 4 points at the 10th percentile and 0 points at the 5th percentile. Table 6.55 shows the detailed percentile 2007-2008 norms for Libyan students according to age and academic discipline.

Table 6.55 Detailed percentile (2007-2008) Norms for Libyan students according to
age and academic discipline
Percentile Age in years
18 19 20 21
Disciplines SC AR SC AR SC AR SC AR
95 55 51 53 53 54 53 55 51
90 53 48 51 51 52 51 53 48
75 48 44 49 47 48 47 52 46
50 42 38 44 42 45 41 45 40
25 37 34 38 35 38 37 39 36
10 25 29 30 29 35 30 34 33
5 22 20 27 24 27 26 32 30
n 100 100 100 100 100 100 100 100

It can be seen that the difference between the percentile scores of Libyan science students and arts students (taking the 18 year old students as an example) ranges from 2 to 6 points: the scores differ by 4 points at the 95th percentile, 5 points at the 90th percentile, 5 points at the 75th percentile, 6 points at the 50th percentile, 3 points at the 25th percentile, 4 points at the 10th percentile and 2 points at the 5th percentile.

The percentile ranks indicated that the performance of Libyan students on the SPM test is lower than that of subjects from other countries. Assessed against the data in the SPM manuals (1988, 1996, 2003, 2004 and 2008), Libyan students were below the norms given for some western countries. A comparison of the present data with the SPM norms given for Taiwan (1989), India (1992), the Netherlands (1992), France (1998), Turkey (1993), Košice, Slovakia (1987), Britain (1979 & 1992), Australia (1986), China (1986), the United States of America (1979 & 1992) and Slovenia (1998) in the 1988, 1996, 2003, 2004 and 2008 SPM manuals, for the same age groups, indicated that Libyan students were below the norms of all of the above countries (Appendix 2).

6.9 Chapter Summary

This chapter presented the results of the statistical analysis performed on the data

collected for this study. The SPM test was administered to 2600 students; 1800 school

students (900 males and 900 females) and 800 university students (400 males and 400

females). According to region, 900 school students were from cities, whereas the

remaining 900 were from villages. The university students (400 science students and

400 art students) were from two universities located in two cities, Al-Beida and Al-Marj, in the 2007-2008 academic year.

The overall SPM score mean was 32.31 with a standard deviation of 11.94 (minimum score 6, maximum 58). Using the British and American percentiles, the SPM scores were converted to IQ scores. Overall, the IQs obtained by the Libyan students ranged between 74 and 85, and the average IQ for the fourteen tested Libyan age groups, 8 to 21 years, was 81.

Test-retest, split-half and alpha (KR-20) reliability procedures were used to investigate the reliability of the SPM. Test-retest reliability was .90 (N = 280), split-half reliability for the total sample was .96 (N = 2600) and alpha reliability was .94 (N = 2600). The results, in general, were in agreement with previous research and supported the validity and reliability of the SPM test with a Libyan sample.

Construct validity (factor analysis and internal consistency) and criterion-related validity methods were used to establish the validity of the SPM test. The construct validity factor analysis showed only one significant factor, Spearman's g (eigenvalue = 3.47, accounting for 69.41% of the variance). In addition, internal consistency results showed strong positive correlation coefficients (0.50** to 0.85**) between the five subsets and the SPM total score. For criterion-related validity, the analysis showed correlations of 0.33** to 0.56** between the SPM scores and Students' Academic Achievement (SAA) as an external criterion.

Item analysis was carried out for the 60 SPM items (N = 2600). The item difficulty indices indicated that 42 items were moderate in difficulty, 11 items were easy and 7 items were too difficult. Regarding the SPM's intended order of difficulty, 13 items (three in set A, four in set B, three in set C and three in set D) and one set as a whole (set D) did not follow an order of increasing difficulty, whereas set E followed an order of increasing difficulty throughout. With regard to item discrimination, the SPM test showed 51 items with excellent discriminating value, 3 items with good discriminating value and 5 items with fair discriminating value.

The results of the SPM reliability, validity and item analyses indicated that the SPM test may be considered an appropriate measure of mental ability for Libyan students and, in summary, a promising tool for the measurement of mental ability in a Libyan setting.

Normality testing was carried out and showed that the collected data were normally distributed, which warranted the use of parametric tests. In order to test the differences between SPM score means, independent-samples t-tests and one-way and two-way ANOVAs were used. In addition, the relationship between SPM test scores and Students' Academic Achievement (SAA) was evaluated using the Pearson product-moment correlation coefficient. A stepwise regression analysis was employed to investigate which independent variable was the best predictor of SPM scores. The outcomes of these analyses were as follows:

1. There were no gender differences in the SPM mean scores in the total sample, or at ages 8, 9, 12, 13, 14 and 18 through 21. However, females obtained a significantly higher SPM mean than males at the age of 10 years, whereas males scored significantly higher means than females at the ages of 11 and 15 through 17. In addition, there were no significant gender differences in the total means or in the means for each region, and there was likewise a lack of significant gender differences in the total means and in the means for each discipline (science and arts). Thus, gender was not an important factor affecting the Libyan students' scores on the SPM test.

2. With regard to gender differences in variability on the SPM test: at the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20 years, females had greater variability than males, whereas at the ages of 11, 16, 19 and 21 years, as well as in the total sample, males had greater variability than females. Males also had greater variability than females in the total school sample and in each region, whereas females had greater variability than males in the total university sample and in each academic discipline. Consequently, the results indicated no consistent tendency for a gender difference in variability.

3. There was no significant difference in sample performance on the SPM test according to region; thus, the region variable was not an important factor affecting the Libyan students' scores on the SPM test. In contrast, there were significant differences according to age and study level, so age and study level were important factors.

4. Students from the science discipline had significantly higher SPM mean scores than students from the arts discipline. Thus, academic discipline was an important factor affecting the Libyan students' scores on the SPM test.

5. The correlation coefficients between the SPM scores and students' SAA ranged from 0.33 to 0.56 and were statistically significant for all groups.

6. A multiple regression for the Libyan students indicated that both age and achievement were predictors of the SPM results, with age being the better predictor, whereas gender and region were not significant predictors.

7. The performance of Libyan students on the SPM can be considered lower than that of students from other countries. Assessed against the data in the SPM manuals (1988, 1996, 2003, 2004 and 2008), Libyan students were below the norms given for all developed countries.

The next chapter presents the meta-analysis. In it, the outcomes of this chapter, which concern the SPM test for a Libyan sample, are compared with other studies carried out in various developed and developing countries.

Chapter seven: META-ANALYSIS

7.1 Introduction

It has become widely accepted that the best way to resolve issues on which there are a large number of studies is to carry out a meta-analysis. The 1980s and 1990s witnessed a rapid upsurge of this statistical approach (Anastasi and Urbina, 1997). Meta-analysis summarizes the results of many quantitative studies that have investigated the same problem and provides a numerical way of expressing the average result of a group of studies. It delineates specific procedures for finding, describing, classifying and coding the research studies to be included in a meta-analytic review, and for measuring and analysing their findings. A central characteristic that distinguishes meta-analysis from more traditional approaches is the emphasis placed on making the review as inclusive as possible. The technique was first proposed by Glass (1976), and by the end of the 1980s it had become accepted as a useful method for synthesizing the results of many different studies.

Glass distinguished between the primary, secondary and meta-analysis of research. Primary analysis is the original analysis of the data in a research study. Secondary analysis is the re-analysis of data for the purpose of answering the original research question with better statistical techniques, or answering new questions with old data. Meta-analysis refers to the analysis of analyses: the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings. It connotes a rigorous alternative to the casual, narrative discussion of research studies which typifies attempts to make sense of the rapidly expanding research literature.

Meta-analysis thus contributes to the creation of new knowledge synthesized from existing studies. The literature explosion has resulted in a massive amount of information that must be analyzed and summarized in order to be useful, and quantitative methods for integrating research results have been used for many years and have received a great deal of attention (Abraham et al., 1991).

Meta-analysis usually involves three major phases, the three "Ps": preparation, performance and presentation. This sequence is the same as for any other type of research: the project must be planned in advance, then systematically carried out, and finally the results must be reported (Abraham et al., 1991).

Any statistical procedure or analytic approach can be misused or abused. As Green and

Hall (1984) aptly stated “Data analysis is an aid to thought, not a substitute”. Most of the

criticisms of quantitative approaches to reviewing the literature are objections to the

misuse or abuse, real or potential, of meta-analysis.

7.2 Advantages of Meta-analysis

Carrying out a meta-analysis includes the following advantages:

• It increases power and leads to stronger conclusions, because more studies can be analyzed with statistical methods than in an impressionistic literature review. Often this can bring effects into sharper focus, particularly when the results of all studies are not consistent (Higgins and Green, 2006).

are not consistent (Higgins and Green, 2006).

• Meta-analysis does not prejudge or exclude studies as unworthy because of their particular research designs, however weak. By empirically examining the effects of research quality on study findings, meta-analysis is likely to be more objective than traditional literature reviews (Wolf, 1986).

• It can answer questions not posed by the individual studies (Higgins and Green,

2006).

• It can settle controversies arising from apparently conflicting studies (Higgins and

Green, 2006).

7.3 Disadvantages of Meta-analysis

Disadvantages of Meta-analysis include the following:

• It can oversimplify the results of a research domain by focusing on overall effects and downplaying mediating or interaction effects. The better examples of meta-analysis build potential mediating factors into their designs rather than ignoring them, by coding the characteristics of studies to examine empirically whether such interactions exist. In practice, many meta-analyses do not pay sufficient attention to possible interaction effects (Wolf, 1986).

• Meta-analysis of poor quality studies may be seriously misleading (Higgins and

Green, 2006).

• Decisions regarding inclusion and exclusion criteria of studies are inevitably

subjective. In some cases consensus may be hard to reach (Higgins and Green,

2006).

• Meta-analysis in the presence of serious publication and/or reporting bias may

produce an inappropriate summary (Higgins and Green, 2006).

7.4 Literature review

A thorough investigation of the literature revealed three relevant meta-analyses: two published and one unpublished. The two published studies examined the SPM test in relation to gender differences, while the unpublished meta-analysis examined the SPM test in relation to gender and age groups.

Lynn and Irwing (2004) conducted a meta-analysis of sex differences on the Progressive Matrices. Some 57 studies were included, examining sex differences on the Standard and Advanced Progressive Matrices and on the Coloured Progressive Matrices. The results showed no difference among children aged 6 to 14 years, and that males obtained higher means than females from the age of 15 through to old age.

The same researchers carried out a meta-analysis in 2005 of sex differences in means and variability on the Progressive Matrices among university students. Twenty-two studies were identified and analyzed. This meta-analysis disconfirmed the frequent assertions that there is no sex difference in the mean and that males have greater variability: it showed that males obtained a higher mean than females, while the SPM studies showed greater variability among females and the APM studies showed no significant difference in variability.

Abdalla et al. (2002) carried out a meta-analysis of sex and age differences in SPM results. As all of the collected studies used the SPM test as the measuring tool, the means were used as the measure of effect size. Their unpublished data showed non-significant differences between males and females, but statistically significant differences between all age groups (below 13 years, 13 to 19 years and 19 to 22 years), with higher age groups obtaining higher mean scores than lower age groups.

7.5 Method

The aims and objectives of this meta-analysis were:

• To investigate the presence of significant differences among sample performances on Raven's Standard Progressive Matrices test according to the development status of countries (Libya, developing and developed countries).

• To investigate the presence of significant differences among sample performances on Raven's Standard Progressive Matrices test according to age groups and gender.

• To investigate the presence of significant differences in sample performance on the SPM test according to the combinations of development status and age group, development status and gender, and age group and gender.

• To investigate the variability of SPM mean scores by development status, by gender within development status, and by gender within age groups.

• To investigate the contribution of the independent variables (age group, gender and development status) to the prediction of the SPM scores.

7.5.1 Criteria for study selection

Using available databases, an extensive and thorough search for studies to be included in the meta-analysis was carried out. The criteria for the selection of studies were as follows:

• First, the study must investigate the area of interest of the meta-analysis.

• Second, the study must provide information regarding the research design, the subjects and the measurement tool used.

• Third, the study must provide sufficient statistical information, such as SPM mean scores.

A careful review of relevant studies on the SPM test, drawn from computer databases, dissertations and the bibliographies of review articles, produced 44 studies. These studies were carried out in various countries between 1948 and 2009. From each relevant study the following data were recorded and coded: (a) author; (b) country; (c) year of publication; (d) population sampled; (e) age; (f) SPM means and standard deviations; and (g) sample size.

Table 7.1 studies included in the meta-analysis


COUNTRY YEARS REFERENCES
Congo 1994 Nkaya et al.,
Denmark 1968 Vejleskov,
Estonia 2000 Lynn, et al.,
France 1994 Nkaya et al.,
Iceland 2003 Pind, et al.,
India 1968; 1968; 1972; Sinha, Mehot, Mohan, Rao and Sinha,
1974 and 1977
Iran 1974 Baraheni,
Israel 1991 Kaniel, & Fisherman,
Kuwait 2006 Abdel-Khalek and Lynn
Libya 1983; 2005 and 2005 Aboujaafer, Attashan and Abdalla, Ahlam
Mexico 2004 Lynn, et al.,
Nigeria 1980 Maqsud,
Oman 2009 Abdel-Khalek and Lynn
Qatar 1986,2009 Bart et al.,
Pakistan 2008 Ahmed, et al.,
Slovenia 2007 Boben,
South Africa 2002 Rushton, et al.,
Sudan 2008a Khaleefa, et al.
Syria 2008b Khaleefa & Lynn
Tanzania 1967 Klingelhofer,

Turkey 1993 Duzen, et al.,
UK 1989 and 1994 Egan and van den
USA 1948; 1968; 1969; Rimoldl, Tulkin & Newbrough, Burke &
1985; 1986.a.b; 1987; Bingham, Burke, Powers et al., Sidles & Avoy,
1988; 1988; 1986; Jensen et al., Karnes & Whorton, Bart et al.,
1987 & 1988; 1994 Whorton & Karnes, Johnson et al., and
and 1994 Blennerhssett et al.,

7.5.2 Strategy of analysis

The data were organized into three categories: first, by development status (developed or developing countries); second, by age group; and third, by gender.

The key feature of meta-analysis is that each study's results are translated into an effect size. An effect size is a numerical way of expressing the strength or magnitude of a reported relationship, and it represents a significant improvement over traditional methods of summarizing literature (Mills & Airasian, 2006). Many effect size statistics are available, and the choice of which one to use depends on the nature of the data collected.

The data reported in the SPM studies are numerical continuous data, and the means were calculated using the same scale, namely the SPM test itself. The term 'continuous' in statistics conventionally refers to data that can take any value in a specified range; when dealing with numerical data, this means that any number may be measured and reported.

In the presence of continuous numerical data obtained using the same scale, the means of the studies can be used as the measure of effect size (Higgins & Green, 2006). SPSS 16.0 was used to carry out the statistical analysis for the meta-analysis.

The SPSS analysis was carried out in the following manner (a brief code sketch of the data organisation step follows the list):

• First, descriptive statistics were computed, investigating frequency distributions, means and standard deviations.

• Second, the Kolmogorov-Smirnov test, the Shapiro-Wilk test and normal probability plots were used to determine the normality of the data.

• Third, an independent-samples t-test was used to compute differences between SPM test means among the different studies according to gender.

• Fourth, one-way analysis of variance was used to compute differences between SPM test means among the different studies according to the development status of countries and according to age groups.

• Fifth, two-way analysis of variance was used to compute differences between SPM test means among the different studies according to development status of countries and age groups, development status of countries and gender, or age groups and gender. In addition, this method was used to investigate the individual and joint (interaction) effects of the independent variables on SPM scores.

• Sixth, the effect size of the SPM mean differences was investigated by calculating Cohen's d, which is equal to the difference between the means divided by the mean of the standard deviations. In addition, Cohen's d was used to calculate the IQ point difference, which is equal to d multiplied by the IQ standard deviation (15).

• Seventh, variability was evaluated using variance ratios (Vr), the variance being the average of the squared differences from the mean (Lynn and Irwing, 2004).

• Eighth, SPM mean scores were converted to IQ scores using British and American percentile indices and a conversion table from percentiles to IQ scores.

• Ninth, multiple regression with a stepwise method was used to investigate which independent variable (development status, age or gender) was the best predictor of SPM scores.
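A brief sketch of the first of these steps is given below: it shows one way the coded study-level records could be organised and summarised in Python before the inferential tests. The structure and the rows shown are assumed for illustration only; they are not the actual coded dataset.

import pandas as pd

# Assumed structure of the coded studies: one row per study (or reported subgroup),
# with country, year, development status, age group, gender, sample size, SPM mean and SD.
studies = pd.DataFrame(
    [
        ("CountryA", 1994, "developed",  "8-11",  "male",   500, 32.0, 6.5),
        ("CountryB", 2005, "developing", "8-11",  "female", 400, 22.5, 5.8),
        ("CountryC", 2008, "developing", "15-17", "male",   300, 37.0, 4.9),
    ],
    columns=["country", "year", "status", "age_group", "gender", "n", "mean", "sd"],
)

# Descriptive statistics of the study means, grouped as in Table 7.3.
for factor in ("status", "age_group", "gender"):
    print(studies.groupby(factor)["mean"].agg(["count", "mean", "std"]))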

7.6 Results

An extensive review of the studies was carried out and the data were organized, based upon the categories mentioned earlier, into:

(a) development status groups: developed countries, developing countries and Libya;

(b) four age groups: 8-11 years, 12-14 years, 15-17 years and 18-21 years;

(c) gender groups: males and females.

Using SPSS, the data collected for the meta-analysis were investigated for normality. Both the Kolmogorov-Smirnov and the Shapiro-Wilk tests were carried out; the resulting p values were 0.200 and 0.308 respectively. Both values were well above 0.05, which indicated that the data were normally distributed. This allowed the use of parametric tests to investigate and evaluate the presence of statistically significant differences in the data. The descriptive statistics for the overall collected data and the tests of normality follow.

Table 7.2 Descriptive statistics for means scores of overall collected data and tests of
normality.
Statistic Std Error
Mean 34.9755 .74322
95% confidence Lower Bound 33.5049
Interval for Mean Upper Bound 36.4462
5% Trimmed Mean 35.0786
Median 35.9750
Variance 70.704
Std. Deviation 8.40856
Minimum 12.65
Maximum 52.76
Range 40.11
Interquartile Range 10.4175
Skewness -.271 .214
Kurtosis -.080 .425
Tests of normality
Kolmogorov-smirnov Shapiro-Wilk
Statistic df Significant Statistic df Significant
.062 128 .200 .988 128 .308
scores

Figure 7.1 The distribution of the mean scores. Figure 7.2 Box plot of the score distribution.

Figure 7.3 Normal Q-Q plot. Figure 7.4 Detrended normal Q-Q plot.

Figure 7.1 is a histogram of the SPM mean scores, which appear to be normally distributed. Figure 7.2 shows a box plot: 50% of the scores are represented by the rectangle, the line inside the box represents the median value, and the whiskers represent the highest and lowest values. Figure 7.3 shows a normal probability plot (normal Q-Q plot), in which the observed value of each mean is plotted against its expected value; a reasonably straight line suggests a normal distribution. Figure 7.4 shows the detrended normal Q-Q plot, in which the actual deviations of the scores from the straight line are plotted; most scores lie close to the zero line with no real clustering, which indicates a normal distribution.
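The normality screening described above can be reproduced along the following lines; note that the data generated here are an illustrative stand-in for the 128 study means, and that the Kolmogorov-Smirnov test reported by SPSS uses the Lilliefors correction, so the version below is only an approximation.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
scores = rng.normal(35.0, 8.4, 128)   # illustrative stand-in for the study means

# Kolmogorov-Smirnov against a normal with the sample mean and SD, and Shapiro-Wilk.
ks = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))
sw = stats.shapiro(scores)
print("K-S p =", round(ks.pvalue, 3), " Shapiro-Wilk p =", round(sw.pvalue, 3))

# Normal Q-Q plot: points falling close to the straight line suggest normality.
stats.probplot(scores, dist="norm", plot=plt)
plt.show()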

7.6.1 SPM means and standard deviations according to the independent variables

The following tables show descriptive statistics for the SPM score means according to development status, age group and gender.

Table 7.3 showing SPM score means and standard deviations according to independent
variables.
SPM Scores Development status
Groups (N) sample Mean SD (N) Group
Developed Countries 9514 38.88 8.61 44
Developing Countries 19579 33.10 7.31 70
Libya 2600 32.31 9.02 14
Total 31693 34.98 8.41 128
AGE
8- 11 years (Primary) 8309 27.33 7.63 35
12-14 years. (prep) 9924 34.94 6.71 44
15-17 years. (Secondary) 8991 40.09 5.31 28
18-21 years (University) 4469 40.97 6.21 21
Total 31693 34.98 8.41 128
gender
Males 11961 33.95 8.95 93
Females 11423 33.82 9.00 91
Total 23384 33.88 8.95 184

Based on development status, the developed countries showed the highest mean score while Libya showed the lowest. Based upon age groups, score means increased as age increased, with the highest score means achieved by the 18-21 years age group. According to gender, males were only slightly higher than females when SPM score means were compared.

Using SPSS, seven analyses were carried out to investigate statistically significant differences between SPM score means based upon the independent variables, as follows:

7.6.2 Differences in SPM scores

7.6.2.1 Difference according to development status

One-way ANOVA was used to compare the SPM score means for the development status

group.

Table7.4 Comparison of the SPM Mean according to development status


Develop. status (N)sample Mean SD (N) Group
Developed 9514 38.88 8.61 44
Developing 19579 33.10 7.31 70
Libya 2600 32.31 9.02 14
Total 31693 34.98 8.41 128
Source Sum of Squares Df Mean Squares F. Ratio F. Prob.
Between Groups 1036.658 2 518.329 8.157 .000
Within Groups 7942.728 125 63.542
Total 8979.386 127

Table 7.5 Post hoc tests multiple comparisons of SPM scores (Tukey HSD)
(I) (J) Mean Std. Error Sig. 95% Confidence Interval
Develop. Develop. Difference Lower Bound Upper Bound
status status (I-J)
Developed developing 5.7818 1.53358 .001 1.9825 9.5810
Libya 6.8222 2.44598 .023 .7626 12.8818
developing developed -5.7818 1.53358 .001 -9.5810 -1.9825
Libya 1.0404 2.33376 .905 -4.7412 6.8220
Libya developing -1.0404 2.33376 .905 -6.8220 4.7412
developed -6.8222 2.44598 .023 -12.8818 -.7626
*The mean difference is significant at the .05 level.

Tables 7.4 and 7.5 show the effect of development status on SPM mean scores. Subjects were divided into three groups: developed countries, developing countries and Libya. There were statistically significant differences (p < .05) in SPM scores between the three development status groups, F (2, 125) = 8.157, p = .000. The effect size was calculated using eta squared, obtained by dividing the between-groups sum of squares (1036.658) by the total sum of squares (8979.386) (Pallant, 2007); the resulting eta squared value was 0.12, which indicated a large effect. Post-hoc comparisons using the Tukey HSD test indicated that the mean score for the developed group (M = 38.88, SD = 8.61) was significantly different from the developing group (M = 33.10, SD = 7.31) and from the Libya group (M = 32.31, SD = 9.02), whereas the developing group did not differ significantly from the Libya group. Based upon these results, it was decided to combine Libya with the developing countries group, so that the development status variable was categorized into developed and developing countries only.
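The eta squared calculation described above (the between-groups sum of squares divided by the total sum of squares) can be made explicit as below; the three groups of study means are generated for illustration and are not the collected data.

import numpy as np
from scipy import stats

# Illustrative study means for three development-status groups (not the real data).
rng = np.random.default_rng(6)
groups = [rng.normal(39, 8.6, 44), rng.normal(33, 7.3, 70), rng.normal(32, 9.0, 14)]

f, p = stats.f_oneway(*groups)

# Eta squared = SS_between / SS_total (Pallant, 2007).
pooled = np.concatenate(groups)
grand = pooled.mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_total = ((pooled - grand) ** 2).sum()
print("F =", round(f, 3), " p =", round(p, 4), " eta squared =", round(ss_between / ss_total, 3))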

7.6.2.2 Difference according to age groups

One-way ANOVA was conducted to compare the SPM means for the age groups.

Table 7.6 Comparison of the SPM Mean scores according to age groups
Age Groups (N)sample Mean SD (N) Group
8-11 8309 27.33 7.63 35
12-14 9924 34.94 6.71 44
15-17 8991 40.09 5.31 28
18-21 4469 40.97 6.21 21
Total 31693 34.98 8.41 128
Source Sum of Squares df Mean Squares F. Ratio F. Prob.
Between Groups 3535.138 3 1178.379 26.839 .000
Within Groups 5444.248 124 43.905
Total 8979.386 127

Table 7.7 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age (J) Mean Difference Std. Error Sig. 95% Confidence Interval
groups Age (I-J) Lower Bound Upper Bound
groups
8-11 12-14 -7.6188 1.50076 .000 -11.8723 -3.3652
15-17 -12.7633 1.68002 .000 -17.5249 -8.0016
18-21 -13.6450 1.82898 .000 -18.8288 -8.4611
12-14 8-11 7.6188 1.50076 .000 3.3652 11.8723
15-17 -5.1445 1.60184 .019 -9.6846 -.6045
18-21 -6.0262 1.75743 .010 -11.0072 -1.0451
15-17 8-11 12.7633 1.68002 .000 8.0016 17.5249
12-14 5.1445 1.60184 .019 .6045 9.6846
18-21 -.8817 1.91279 .975 -6.3030 4.5397
18-21 8-11 13.6450 1.82898 .000 8.4611 18.8288
12-14 6.0262 1.75743 .010 1.0451 11.0072
15-17 .8817 1.91279 .975 -4.5397 6.3030

* The mean difference is significant at the .05 level.

Tables 7.6 and 7.7 show the effect of age on SPM mean scores. Subjects were divided into four age groups. There were statistically significant differences (p < .05) in SPM scores between the four age groups, F (3, 124) = 26.839, p = 0.000. The effect size was calculated using eta squared, obtained by dividing the between-groups sum of squares (3535.138) by the total sum of squares (8979.386) (Pallant, 2007); the resulting eta squared value was 0.39, which indicated a large effect. Post-hoc comparisons using the Tukey HSD test indicated statistically significant differences between the different age groups, except between the 15-17 years age group (M = 40.09, SD = 5.31) and the 18-21 years age group (M = 40.97, SD = 6.21).

7.6.2.3 Difference according to gender

An independent-samples t-test was carried out to compare the SPM score means for males and females.

Table 7.8 Comparison of the gender mean scores of SPM test


Gender (N)sample Mean SD Std. Error Mean (N)
Group
Male 11961 33.95 8.95 .92801 93
Female 11423 33.82 9.00 .94384 91
t-test for Equality of Means
Levene's Test for 95%
Equality of Variances Confidence
F Sig. t df Sig.(2- Mean Std. Error Interval of the
tailed) Difference Difference Difference
Lower Upper
Equal .062 .804 .102 182 .919 .13492 1.32356 -2.477 2.746
variances
assumed
Equal variances not .102 181.858 .919 .13492 1.32364 -2.477 2.747
assumed

An independent-samples t-test was conducted to compare the SPM mean scores for males and females. There was no significant difference between the scores for males (M = 33.95, SD = 8.95) and females (M = 33.82, SD = 9.00); t (182) = 0.102, p = 0.919. The magnitude of the difference in the means (mean difference = 0.1349, 95% CI: -2.477 to 2.746) was very small (eta squared = 0.007). SPSS does not provide eta squared values for t-tests; the value was, however, calculated from the information provided in the output.

7.6.2.4 Difference according to development status and age

A two-way ANOVA was carried out on the SPM scores for development status according to age group.

Table 7.9 Comparison of the development status mean scores of SPM test according to
age.
Development status Age groups (N)sample Mean SD (N) Group
developed 8-11 4223 31.98 6.28 18
developing 4086 22.33 5.61 17
Total 8309 27.33 7.63 35
developed 12-14 2659 40.50 5.93 14
developing 7265 32.35 5.39 30
Total 9924 34.94 6.71 44
developed 15-17 1814 45.92 5.92 8
developing 7177 37.76 4.37 20
Total 8991 40.09 5.31 28
developed 18-21 818 50.22 4.21 4
developing 3651 38.80 3.04 17
Total 4469 40.97 6.21 21
developed Total 9514 38.88 8.63 44
developing 22179 32.93 7.57 84
Total 31693 34.99 8.41 128

Table 7.10 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
2.052 7 120 .063

Table 7.11 Tests of Between-Subjects Effects of SPM scores.


Source Type III Sum df Mean F Sig. Partial Eta
of Squares Square Squared
Corrected Model 5774.888 7 824.984 30.893 .000 .643
Intercept 127961.898 1 127961.898 4791.836 .000 .976
AGE 4416.961 3 1472.320 55.135 .000 .580
REGION 1980.915 1 1980.915 74.180 .000 .382
AGE * REGION 32.878 3 10.959 .410 .746 .010
Error 3204.498 120 26.704
Total 165560.362 128
Corrected Total 8979.386 127
a R Squared = .643 (Adjusted R Squared = .622).

Table 7.12 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD).
(I) (I) Age (J) Mean Std. Sig. 95% Confidence
Develop. Age Difference Error Interval
status (I-J) Lower Upper
Bound Bound
Developed 8-11 12-14 -8.52000* 2.10509 .001 -14.1625 -2.8775
15-17 -13.94125* 2.51016 .000 -20.6695 -7.2130
18-21 -18.23750* 3.26544 .000 -26.9902 -9.4848
12-14 8-11 8.52000* 2.10509 .001 2.8775 14.1625
15-17 -5.42125 2.61818 .180 -12.4391 1.5966
18-21 -9.71750* 3.34918 .029 -18.6947 -.7403
15-17 8-11 13.94125* 2.51016 .000 7.2130 20.6695
12-14 5.42125 2.61818 .180 -1.5966 12.4391
18-21 -4.29625 3.61753 .638 -13.9928 5.4003
18-21 8-11 18.23750* 3.26544 .000 9.4848 26.9902
12-14 9.71750* 3.34918 .029 .7403 18.6947
15-17 4.29625 3.61753 .638 -5.4003 13.9928
developing 8-11 12-14 -9.95410* 1.44341 .000 -13.7414 -6.1668
15-17 -15.35826* 1.56851 .000 -19.4738 -11.2427
18-21 -16.39706* 1.63086 .000 -20.6762 -12.1179
12-14 8-11 9.95410* 1.44341 .000 6.1668 13.7414
15-17 -5.40417* 1.37257 .001 -9.0056 -1.8027
18-21 -6.44296* 1.44341 .000 -10.2303 -2.6556
15-17 8-11 15.35826* 1.56851 .000 11.2427 19.4738
12-14 5.40417* 1.37257 .001 1.8027 9.0056
18-21 -1.03879 1.56851 .911 -5.1544 3.0768
18-21 8-11 16.39706* 1.63086 .000 12.1179 20.6762
12-14 6.44296* 1.44341 .000 2.6556 10.2303
15-17 1.03879 1.56851 .911 -3.0768 5.1544
• The mean difference is significant at the .05 level

Tables 7.9, 7.10, 7.11 and 7.12 showed the impact of development status according to age on SPM mean scores. Subjects were divided into two groups according to development status (developed and developing). The interaction effect between development status and age was not statistically significant, F (3, 120) = .410, P = .746. There was a statistically significant main effect for development status, F (1, 120) = 74.180, P = .000; the magnitude of the effect size was large (partial eta squared = .38). The main effect for age, F (3, 120) = 55.135, P = .000, was also statistically significant. Post-hoc comparisons using the Tukey HSD test showed that in developing countries statistically significant differences were found between all age groups except between the 15-17 and 18-21 age groups. In developed countries, statistically significant differences were found between all age groups except between the 12-14 and 15-17 age groups and between the 15-17 and 18-21 age groups. Levene's test was not significant, indicating that group variances were equal. Moreover, the magnitude of the difference between groups in terms of standard deviation units (Cohen's d) was calculated (Pallant, 2007).
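The analyses above are standard factorial ANOVA output. For readers who wish to reproduce this kind of analysis, a minimal Python sketch is given below; the data frame, the column names ("spm_mean", "dev_status", "age_group") and the file name are illustrative assumptions only and are not taken from the study's data files.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def two_way_anova(df):
    # Two-way ANOVA of group mean SPM score on development status and age group,
    # including their interaction, with Type III sums of squares as in Table 7.11.
    model = ols("spm_mean ~ C(dev_status) * C(age_group)", data=df).fit()
    return sm.stats.anova_lm(model, typ=3)

# Example usage (hypothetical file of one row per study group):
# print(two_way_anova(pd.read_csv("spm_groups.csv")))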

Table 7.13 Magnitude of development status differences (developed and developing countries) in mean scores and variability on the SPM as a function of age and for the total sample
Age Development (N) (N) Mean SD t sig d Vr IQ IQs
status Group sample Point
8-11 developed 18 4223 31.98 6.28 -4.75 .000 1.26 1.25 18.90 96
developing 17 4086 22.33 5.61 85
Total 35 8309 27.33 7.63 91
12-14 developed 14 2659 40.50 5.93 -4.52 .000 1.21 1.21 18.15 93
developing 30 7265 32.35 5.39 81
Total 44 9924 34.94 6.71 87
15-17 developed 8 1814 45.92 5.92 -5.10 .000 1.53 1.84 22.95 95
developing 20 7177 37.76 4.37 83
Total 28 8991 40.09 5.31 89
18-21 developed 4 818 50.22 4.21 -4.80 .000 1.84 1.91 27.60 96
developing 17 3651 38.80 3.04 79
Total 21 4469 40.97 6.21 88

Score Development status (N) Group (N) sample Mean SD t sig d Vr IQ Point IQs
Total developed 44 9514 38.88 8.63 -4.03 .000 0.71 1.30 10.65 95
developing 84 22179 32.93 7.57 82
Total 128 31693 34.99 8.41 89

Table 7.13 showed the mean scores obtained by developed and developing countries in

each age group, standard deviations, t values for the difference between developed and

developing countries in each age group, t value for the difference between developed and

developing countries within the total sample, level of significance, Cohen’s d scores (the

difference between the developed and developing countries means divided by the within

group standard deviation; Cohen, 1977), and the variance ratios (Vr), i.e. the variance of the developed countries divided by the variance of the developing countries (Lynn and Irwing, 2004). Vr values greater than 1.0 indicate that developed countries had greater variance than developing countries, while values less than 1.0 indicate that developing countries had greater variance (Khaleefa and Lynn, 2008). Finally, the IQ point differences between developed and developing countries are given for each age group and for the total sample. The results showed three interesting features. First, the analysis showed

that the British percentile average equivalent was 39th PC for developed countries 8-11

age group (IQ=96), 31st PC for the 12-14 age group (IQ=93), and 37th PC for the 15-17

age group (IQ= 95). The American percentile average equivalent was 39th PC (IQ= 96)

for 18-21 age group. In addition, the British percentile average equivalent was 16th PC

for developing countries 8-11 age group (IQ=85), 10th PC for the 12-14 age group

(IQ=81) and 12th PC for the 15-17 age group (IQ= 83). The American percentiles’

average equivalent was 8th PC (IQ= 79) for the 18-21 age group. Overall, the highest IQ

obtained was 96 for the 8-11 years age group in developed countries whereas the lowest

IQ was 79 for the 18-21 years age group in developing countries. The average IQ for the

developed countries was 95 whereas for the developing countries it was 82.

Second, statistically significant differences according to development status, in total and in every age group, were in favour of developed countries. In total, developed countries obtained a significantly higher mean than developing countries (d = 0.71, equivalent to 10.65 IQ points).

Third, the difference in variability between development statuses within the total sample as well as within each age group (as can be seen from the standard deviations and variance ratios) was large, with developed countries showing greater variability than developing countries.
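The quantities reported in Table 7.13 can be computed from the summary statistics alone. The short Python sketch below is illustrative only: the function names are the writer's, the standardiser used for d is the combined-sample standard deviation (which reproduces the tabulated values, e.g. (31.98 - 22.33) / 7.63 is about 1.26), the IQ-point difference is d expressed on an IQ scale with SD = 15, and the percentile-to-IQ conversion assumes a normal IQ distribution with mean 100 and SD 15.

from statistics import NormalDist

def standardized_difference(mean_a, mean_b, sd):
    # Cohen's d: the mean difference expressed in standard-deviation units.
    return (mean_a - mean_b) / sd

def variance_ratio(sd_a, sd_b):
    # Vr: variance of group a divided by variance of group b.
    return sd_a ** 2 / sd_b ** 2

def iq_from_percentile(percentile):
    # Norm-table percentile converted to a conventional IQ (mean 100, SD 15).
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 100 + 15 * z

# Worked example: 8-11 age group, developed versus developing (Table 7.13).
d = standardized_difference(31.98, 22.33, 7.63)   # about 1.26
print(round(d, 2), round(15 * d, 2))              # d and its IQ-point equivalent (15 x d)
print(round(variance_ratio(6.28, 5.61), 2))       # Vr of about 1.25
print(round(iq_from_percentile(39)))              # 39th percentile -> roughly IQ 96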

7.6.2.5 Difference according to development status and gender

Two-way ANOVA was conducted on SPM scores for the development status according

to gender.

Table 7.14 Comparison of the development status mean scores of SPM test according to
gender.
Development status Gender (N)sample Mean SD (N) Group
developed Male 2626 39.47 8.72 23
Female 2704 39.57 9.23 22
Total 5330 39.50 8.86 45
developing Male 9335 32.14 8.31 70
Female 8719 31.99 8.19 69
Total 18054 32.07 8.22 139
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184

Table 7.15 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
.107 3 180 .956

Table 7.16 Tests of Between-Subjects Effects of SPM scores
Source Type III Sum df Mean F Sig. Partial Eta
of Squares Square Squared
Corrected Model 1880.43 3 626.81 8.825 .000 .128
Intercept 174051.85 1 174051.85 2450.522 .000 .932
REGION 1879.56 1 1879.56 26.463 .000 .128
GENDER 5.109E-02 1 5.109E-02 .001 .979 .000
REGION * GENDER .391 1 .391 .006 .941 .000
Error 12784.76 180 71.03
Total 225925.97 184
Corrected Total 14665.19 183
a R Squared = .128 (Adjusted R Squared = .114)

Tables 7.14, 7.15 and 7.16 showed the impact of development status according to gender

on SPM mean scores. Subjects were divided into two groups according to the

development status (developed and developing). The interaction effect between

development status and gender was not statistically significant, F (1, 180) = .006, P =

.941. There was a statistically significant main effect for development status, F (1, 180) =

26.463 P = .000; the magnitude of the effect size was large (partial eta squared = .13).

The main effect for gender, F (1, 180) = .001, P = .979, was not statistically significant. Levene's test of equality of error variances was not significant, indicating that group variances were equal.

7.6.2.6 Difference according to age groups and gender

Two-way ANOVA was conducted on SPM scores for age groups according to gender.

Table 7.17 Comparison of the age groups mean scores of SPM test according to gender
Age Gender (N)sample Mean SD (N) Group
8-11 Male 3133 26.09 7.87 27
Female 2918 25.67 8.27 27
Total 6051 25.88 7.99 54
12-14 Male 3373 33.12 7.17 31
Female 3267 34.19 6.81 30
Total 6640 33.65 6.95 61
15-17 Male 3871 39.79 5.45 23
Female 3656 38.95 6.14 23
Total 7527 39.37 5.76 46
18-21 Male 1584 42.60 4.20 12
Female 1582 42.07 4.41 11
Total 3166 42.35 4.21 23
Total Male 11961 33.95 8.95 93
Female 11423 33.82 9.00 91
Total 23384 33.88 8.95 184

Table 7.18 Levene's Test of Equality of Error Variances of SPM scores


F df1 df2 Sig.
2.281 7 176 .030

Table 7.19 Tests of Between-Subjects Effects of SPM scores


Source Type III df Mean F Sig. Partial Eta
Sum of Square Squared
Squares
Corrected Model 6525.562 7 932.223 20.157 .000 .445
Intercept 199051.721 1 199051.721 4304.018 .000 .961
AGE 6488.033 3 2162.678 46.763 .000 .444
GENDER 1.298 1 1.298 .028 .867 .000
AGE * GENDER 29.614 3 9.871 .213 .887 .004
Error 8139.628 176 46.248
Total 225925.966 184
Corrected Total 14665.190 183
a R Squared = .445 (Adjusted R Squared = .423).

Table 7.20 Post Hoc Tests Multiple Comparisons of SPM Scores (Tukey HSD)
(I) Age (J) Age Mean Difference Std. Sig. 95% Confidence Interval
(I-J) Error Lower Bound Upper Bound
8-11 12-14 -7.7645 1.27067 .000 -11.0603 -4.4687
15-17 -13.4903 1.36449 .000 -17.0294 -9.9511
18-21 -16.4696 1.69329 .000 -20.8616 -12.0777
12-14 8-11 7.7645 1.27067 .000 4.4687 11.0603
15-17 -5.7257 1.32799 .000 -9.1702 -2.2813
18-21 -8.7051 1.66401 .000 -13.0211 -4.3891
15-17 8-11 13.4903 1.36449 .000 9.9511 17.0294
12-14 5.7257 1.32799 .000 2.2813 9.1702
18-21 -2.9793 1.73671 .319 -7.4839 1.5252
18-21 8-11 16.4696 1.69329 .000 12.0777 20.8616
12-14 8.7051 1.66401 .000 4.3891 13.0211
15-17 2.9793 1.73671 .319 -1.5252 7.4839
• The mean difference is significant at the .05 level.

Figure 7.5 Mean score differences by age group and gender

Tables 7.17, 7.18, 7.19 and 7.20 showed the effect of age group according to gender on SPM test scores. The interaction effect between age groups and gender was not statistically significant, F (3, 176) = .213, P = .887. There was a statistically significant main effect for age groups, F (3, 176) = 46.763, P = .000; the magnitude of the effect size was large (partial eta squared = .44). Post-hoc comparisons using the Tukey HSD test showed that there were statistically significant differences between the different age groups except between the 15-17 years age group (M = 39.37, SD = 5.76) and the 18-21 years age group (M = 42.35, SD = 4.21). The main effect for gender, F (1, 176) = .028, P = .867, was not statistically significant. Levene's test was significant (P = .030), suggesting that the assumption of equal group variances may not have held for this analysis. Furthermore, the magnitude of the difference between groups in terms of standard deviation units (Cohen's d) was calculated (Pallant, 2007).

Table 7.21 Magnitude of gender differences in mean scores and variability on SPM as a
function of age and development status
function of age
Age Gender (N) (N) Mean SD t sig d Vr IQ
Group sample Point
8-11 Male 27 3133 26.09 7.87 .194 .847 0.05 0.95 0.75
Female 27 2918 25.67 8.27
Total 54 6051 25.88 7.99
12-14 Male 31 3373 33.12 7.17 -.599 .552 -0.15 1.05 -2.25
Female 30 3267 34.19 6.81
Total 61 6640 33.65 6.95
15-17 Male 23 3871 39.79 6.14 .491 .626 0.14 1.27 2.1
Female 23 3656 38.95 5.45
Total 46 7527 39.37 5.76
18-21 Male 12 1584 42.60 4.20 .294 .772 0.12 0.95 1.8
Female 11 1582 42.07 4.41
Total 23 3166 42.35 4.21
function of development status
status Gender (N) (N) Mean SD t sig d Vr IQ
Group sample Point
Devel Male 23 2626 39.47 8.72 .104 .917 -0.01 0.89 -0.17
oped Female 22 2704 39.57 9.23
Total 45 5330 39.50 8.86
Devel Male 70 9335 32.14 8.31 -.026 .980 0.15 1.03 2.25
oping Female 69 8719 31.99 8.19
Total 139 18054 32.07 8.22
Function of total sample
Score Gender (N) (N) Mean SD t sig d Vr IQ
Group sample Point
Male 93 11961 33.95 8.95 .102 .919 0.01 0.99 0.15
Female 91 11423 33.82 9.00
Total 184 23384 33.88 8.95

Table 7.21 showed the mean scores obtained by males and females in each age group and

in each development status, the standard deviations, t values for the difference between

males and females in each age group, t values for the difference between males and

females in each development status, t value for the difference between males and females

within the total sample, level of significance, Cohen’s d scores (the difference between

the male and female means divided by the within group standard deviation; Cohen,

1977), and the variance ratios (Vr), i.e. the variance of the males divided by the variance of the females (Lynn and Irwing, 2004). Vr values greater than 1.0 indicate that males had greater variance than females, while values less than 1.0 indicate that females had greater variance (Khaleefa and Lynn, 2008). Finally, IQ point differences between males and females in each age group, in each development status and within the total sample are shown. The results indicated two interesting features. First, there was a lack of significant gender differences in total, in every age group and in each development status. In total, males

obtained a higher mean than females by 0.01d (0.15 IQ point). In the 8-11 age group,

males obtained a higher mean than females by 0.05d (0.75 IQ point), while among the

12-14 age group females obtained a higher mean than males by 0.15d (2.25 IQ points).

In the 15-17 age group, males scored a higher mean than females by 0.14d (2.1 IQ

points). In the 18-21 age group, males scored a higher mean than females by 0.12d (1.8

IQ points). In developed countries, females obtained a higher mean than males by 0.01d

(0.17 IQ points). Finally, in developing countries males scored a higher mean than

females by 0.15d (2.25 IQ points). Second, gender difference in variability within the

total sample (as can be seen from the standard deviations and variance ratios) as well as

within each age group and within development status was marginally low except in the

15-17 age group where males had greater variability than females (Vr = 1.27). In

addition, females achieved greater variability than males (Vr = 0.89) in developed

countries.

7.6.3 Multiple Regressions according to the independent variables

In order to investigate the contribution of the independent variables (development status, gender and age groups) to the prediction of the SPM scores, a stepwise multiple regression was used.

Table 7.22 Stepwise Regression for the Independent Variables and the SPM Score Means
Model                        Unstandardised Coeff.      Standardised Coeff.   t        Sig.
                             B         Std. Error       Beta
1  (Constant)                22.889    .887                                   25.793   .000
   Age group                 4.889     .352              .603                 13.863   .000
2  (Constant)                12.032    1.257                                   9.576   .000
   Age group                 5.175     .305              .638                 16.985   .000
   Development status        7.951     .730              .409                 10.886   .000
Model Summary
Model                                        R       R Square   Adjusted R Square   Std. Error of the Estimate
1  Predictors: (Constant), Age group         .603    .363       .361                7.00954
2  Predictors: (Constant), Age group,        .727    .529       .526                6.03585
   Development status

Using the stepwise method, a significant model emerged (adjusted R square = 0.526; F(2, 336) = 188.851, p < .001). The significant predictor variables are shown below:

Predictor Variable Beta p

Age groups 0.638 p < .001
Development status 0.409 p < .001

Gender was not a significant predictor (p = 0.962).

This showed that both age and development status were predictors for SPM results with

age being a better predictor.
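To make the selection procedure concrete, a simplified forward-selection sketch is given below. It is illustrative only: the data frame, the column names and the 0.05 entry criterion are assumptions, and full stepwise procedures may also remove previously entered variables at each step.

import statsmodels.api as sm

def forward_stepwise(df, outcome, candidates, alpha_enter=0.05):
    # Greedy forward selection: repeatedly add the candidate predictor with the
    # smallest p-value, as long as that p-value is below the entry criterion.
    selected, remaining = [], list(candidates)
    while remaining:
        pvalues = {}
        for var in remaining:
            X = sm.add_constant(df[selected + [var]])
            pvalues[var] = sm.OLS(df[outcome], X).fit().pvalues[var]
        best = min(pvalues, key=pvalues.get)
        if pvalues[best] >= alpha_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical usage, with one row per study group and dummy-coded predictors:
# order = forward_stepwise(df, "spm_mean", ["age_group", "dev_status", "gender"])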

7.7 Chapter Summary

The overall SPM score mean was 34.98 with a standard deviation of 8.41 (minimum 12.65 and maximum 52.76). The developed countries showed the highest mean score (M = 38.88; SD = 8.61), whereas Libya showed the lowest mean score (M = 32.31; SD = 9.02), which was slightly lower than the developing countries' mean score (M = 33.10; SD = 7.31). The 18-21 years age group showed the highest mean score (M = 40.97; SD = 6.21), whereas the 8-11 years age group showed the lowest mean score (M = 27.33; SD = 7.63). Males showed a slightly higher mean score (M = 33.95; SD = 8.95) than females (M = 33.82; SD = 9.00). The average IQ score for developed countries was 95, whereas the average IQ score of developing countries was 82.

Normality testing was carried out and showed that the collected data was normally

distributed which warranted the use of parametric tests. To test the differences between

SPM score means, independent-samples t-tests and one-way and two-way ANOVA tests

were used. In addition, a stepwise analysis was employed to investigate which

independent variable was the best predictor of SPM scores. The following was

concluded:

1. Significant differences were found between the SPM scores based on development

status. Developed countries achieved higher SPM scores than developing countries

and Libya. No statistically significant differences were found in SPM scores

between Libya and developing countries. Thus development status was concluded as

being an important factor affecting the SPM.

2. Significant differences were found between the SPM scores based on age groups.

Differences were in favour of older age groups. In addition, SPM scores of the age

groups were statistically different based on development status but not different based

on gender. Thus age was concluded as being an important factor affecting the SPM.

3. Using the British and American percentiles, SPM scores were converted to IQ scores.

IQ score of the 8-11 age group in developed countries was 96, whereas that in

developing countries was 85. IQ score of the 12-14 age group in developed countries

was 93, whereas that in developing countries was 81. IQ score of the 15-17 age group

in developed countries was 95, whereas that in developing countries was 83. IQ score

of the 18-21 age group in developed countries was 96, whereas that in developing

countries was 79.

4. No significant differences were found between SPM scores based on gender. In

addition, no gender differences were found among the different age groups or

development status. Thus gender was concluded as not being an important factor

affecting the SPM.

5. Variability difference in SPM mean scores was high in each age group based on

development status, in favour of developed countries. Variability difference in SPM

mean scores was low in each age group based on gender, except in the 15-17 age

group where variability was high in favour of males. In addition, females achieved

higher variability than males in developed countries, whereas in developing countries the variability difference was small and in favour of males. The variability difference within the total sample was negligible.

Consequently, results indicated no consistent tendency in variability for a gender

difference.

6. Stepwise multiple regression showed age and development status as predictors for

SPM results. Moreover, age was a better predictor.

The next chapter brings together the key research findings and discusses them in context

with the wider existing literature.

Chapter eight: DISCUSSION AND CONCLUSION

8.1 Introduction

Individuals differ from one another in their ability to understand complex ideas, to adapt

effectively to the environment, to learn from experience, to engage in various forms of

reasoning and to overcome obstacles by taking thought. Concepts of "intelligence" are

attempts to clarify and organize this complex set of phenomena. Although considerable

clarity has been achieved in some areas, no such conceptualization has yet answered all

the important questions and none commands universal assent (Neisser, 1995).

For historical reasons, the term "IQ" is often used to describe scores on tests of

intelligence. It originally referred to an "Intelligence Quotient" that was formed by

dividing a so-called mental age by a chronological age, but this procedure is no longer

used. IQ is clearly a flexible construct — as amply demonstrated by decisions in the

1930s and 1940s in the United States and Britain to ‘adjust’ test questions to equalize the

scores of boys and girls, because in previous versions of the tests girls had scored higher.

Many tests have been “tailored” to ensure that the scores of boys and girls are equalized

because of the assumption that there are no gender differences in general intelligence

defined as the sum of all cognitive abilities. But this has not been done for the SPM.

The aim of this chapter is to discuss and evaluate the results of the study that have thus

far been presented. The next section, section two, discusses intelligence testing in Libya.

The third and fourth sections describe the SPM test and meta-analysis respectively.

Section five presents an analytical discussion of the entire study. The remaining sections,

six to nine, cover the following points: conclusions on the major findings; contributions of the current study to the domain of intelligence testing; study limitations; and recommendations and suggestions concerning further research in the area.

8.2 Intelligence testing in Libya

Though Libya has witnessed a huge development in education within the last 5 decades,

some areas have not benefited from the positive effects of this development. To date, no

single test of intellectual ability has been officially adopted or developed to be used for

the measurement of intelligence in Libya. Schools and universities alike use examination grades as the primary or only method of determining who should be accepted for study at various academic establishments and for various jobs in the vocational sector. Although this might be considered a good criterion for such purposes, additional criteria are desirable.

Mental health services in Libya suffer from a shortage of staff, limited psychological services and a lack of facilities. The general public in Libya know very little about the usefulness, purposes, or functions of intelligence tests.

Mental tests currently used in Libya are misused or only partially used. The use of incomplete tests is likely to bias predictions based on test results and has serious negative implications for educational or clinical decisions. In addition, the use of incomplete test scores to estimate mental ability might result in invalid assessment, leading to grave consequences for the lives of individuals.

Another aspect that has been affected by the lack of intelligence tests in Libya is the selection of students for different educational programs. In Libya today, a relevant and accurate selection procedure is essential and needed, not only in the field of education but also at the intermediate level of training for skilled manpower. Indeed, a clear failing of

the current system could be seen whereby many university graduates were posted to

office work which could be performed to a similar level of competence by less qualified

people (Attashani and Abdalla 2005).

8.3 The SPM test

The problem of adapting intelligence tests to a new setting was by no means uncommon,

as this was a general problem for many developing countries in the past. In addition, if

the aim was to assess the “mental ability” of people in a culture that has yet to develop its

own testing scheme or system, it was necessary to assess what was important for that

culture (Brislin and Thorndike, 1973; Ortar 1972).

In this study, an international culture-fair test was adopted, and standardization was

carried out to achieve local norms. This was done because it required less time and effort

than to design a test specifically for Libya (Ezeilo 1978). The Raven’s Standard

Progressive Matrices (SPM) test was employed because it had been widely used and

enjoyed moderately high indices of validity and reliability when used in a wide range of

cultures.

Raven's Progressive Matrices test is an example of a culture-fair test that has been used in cross-cultural testing. Brislin et al. (1973), Kline (1979), Raven (1989), and Murphy and

Davidshover (1991) held that Raven's Progressive Matrices was one of the most widely

used intelligence or ability tests in cross-cultural research.

It is a group test, which can be used with subjects of all language backgrounds and does

not depend to any large extent upon education or prior knowledge of the subjects. In

addition, it is suitable for all ages from the age of 6 years onwards.

The Progressive Matrices (RPM, Raven, Raven & Court, 2000, Lynn & Vanhanen 2006)

is the most widely used test of intelligence in numerous countries throughout the world.

One reason for the popularity of the test was that it is non-verbal and can therefore be

applied cross-culturally. Also, it was considered to be the best test of g, the general factor

present in all cognitive tasks. The test was constructed by Raven (1939). Lynn, Allik,

Pullman, and Laidra (2004) have stated that the Progressive Matrices is widely regarded

as the best test of abstract or nonverbal reasoning ability.

The Progressive Matrices test has good psychometric characteristics. A huge body of

published research has shown the validity of this test. It has gained widespread

acceptance and use in many countries around the world. No other test has been so

extensively used in cross-cultural studies of intelligence. The RPM test is free from

language and apparently has limited dependence on cultural variables, which makes it a popular instrument for use in developing countries.

8.4 Meta analysis

Meta-analysis is a statistical approach to the aggregation and summarization of results from

independent studies. It is systematic, thorough, objective, and quantitative. The essentials

of this technique are to collect all the studies on the issue, convert the results to a

common metric and average them to give an overall result. Procedures employed in meta-

analysis permit quantitative reviews and syntheses of research literature that address

these issues (Wolf, 1986). An epidemiologist has described meta-analysis as “a boon for

policy makers who find themselves faced with a mountain of conflicting studies” (Mann,

1990).
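As a concrete illustration of converting results to a common metric and averaging them, the short fragment below pools study-level IQs weighted by sample size. This is only a sketch of one common weighting choice and is not necessarily the exact procedure followed in this thesis.

def pooled_iq(studies):
    # studies: iterable of (sample_size, iq) pairs; returns the N-weighted mean IQ.
    total_n = sum(n for n, _ in studies)
    return sum(n * iq for n, iq in studies) / total_n

# e.g. pooled_iq([(148, 75), (6202, 79), (509, 84)]) pools three of the North
# African entries later listed in Table 8.1.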

Any meta-analyst has to address three problems that have been identified by Sharpe

(1997) as the “Apples and Oranges”, “File Drawer” and “Garbage in - Garbage out”

problems.

The “Apples and Oranges” problem refers to the idea that different phenomena are

sometimes aggregated and averaged, where disaggregation may show different effects for

different phenomena. The best way of dealing with this problem is to carry out meta-

analyses, in the first instance, on narrowly defined phenomena and populations and then

attempt to integrate these into broader categories. In the present meta-analysis, this

problem has been dealt with by confining the analysis to studies using the Progressive

Matrices on school and university students.

The “File Drawer” problem means that studies producing significant effects tend to be

published, while those producing non-significant effects tend not to be published and

remain unknown in the file drawer. It is considered that this should not be a problem for

this present inquiry because in SPM studies results are not regarded as having a significant effect or not; any result, whatever its nature, can be of interest and deemed publishable. There is no need to keep results "in the file drawer".

The “Garbage in – Garbage out” problem concerns poor quality studies. Meta-analyses

that include many poor quality studies have been criticized by Feinstein (1995) as

“statistical alchemy” which attempt to turn a lot of poor quality studies into good quality

gold. Poor quality studies are liable to obscure relationships that exist and can be detected

by good quality studies. Meta-analysts differ in the extent to which they judge studies to

be of such poor quality that they should be excluded from the analysis. Some meta-

analysts are “inclusionist” while others are “exclusionist”, in the terminology suggested

by Kraemer, Gardner, Brooks and Yesavage (1998). This meta-analysis is “inclusionist”

in the sense that it included all the located studies of the Progressive Matrices among school and university students, provided the strict inclusion criteria applied to them.

The next problem in the meta-analysis was to obtain all the studies on the issue of concern. This is a difficult problem and one that is rarely, and probably never, possible to solve completely. An attempt to find all relevant studies of the phenomena being

considered was conducted by examining previous reviews and searching computerized databases, including PsycINFO, American Psychological Association (APA), American

Educational Research Association (AERA), Educational Testing Association (ETS),

National Council on Measurement of Education (NCME), Educational Resources

Information Centre (ERIC), Ingenta, Web of Science, Dissertation Abstracts, the British

Index to Theses, and Cambridge Scientific Abstracts for the years covered up to and

including 2009. In addition, active researchers in the field were contacted. In total, the

review of literature covered the years 1948 to 2009. It was considered that, although

finding all relevant studies was a problem for this and for many other meta-analyses, it

was not a serious problem for our present study because the results were sufficiently

obvious that they are unlikely to be seriously overturned by further studies that have not

been identified. If this should prove incorrect, other researchers will produce these

unidentified studies and integrate them into the meta-analysis.

A careful and thorough search for published and unpublished studies on the SPM test

using the above searching procedures produced 44 studies. They were carried out in 23

countries; 9 developed and 14 developing. The developed country with the highest

number of SPM studies was the United States (14 studies) while the developing country

with the highest number of SPM studies was India (four studies). The earliest study was

in the USA (1948) while the latest were in Qatar and Oman (2009). The overall sample

consisted of 31693 students aged from 8 years (grade 3) to 21 years (final year university

student). Although many studies were found using SPM, some of them did not fulfil the

inclusion criteria. Some studies lacked sufficient information or results. Some studies did

not carry out the test on all desired age groups. Some studies did not report the mean

values of the SPM test but reported the norms values only. These studies were excluded.

When studies did not report results based on age, different studies carried out on

individual ages were combined together to obtain results of age groups.

After a thorough investigation into the criteria that define social classes, it was not

possible to locate a single criterion that can be used in this context. Income, parent’s

occupation, education and culture were all used and the differences between the various

studies were vast. Many researchers have used different criteria when determining social

class. Tulkin and Newbrough in 1968 used occupation and education as factors to

determine social class, while Whorton and Karnes in 1979 used income as a sole factor.

Also, Nkaya et al. (1994), used occupation, culture and income as determinants of social

class. They reported that criteria applied to one country may not be applicable in other

countries to define them socially due to the huge social differences between countries. In

addition, the number of SPM studies that reported such criteria was limited. Eventually, it

was decided not to include social status in the meta-analysis for the above mentioned

reasons.

8.5 Study discussion

The discussion below has been organized according to the objectives of the study

outlined in chapter three. The primary focus is analysing the applicability of the SPM test

as an appropriate measure of mental ability (non-verbal reasoning ability, or fluid

intelligence, and g) for a sample of Libyan students. In addition the distribution of IQ

scores within the sample is identified and compared with that found in other countries,

(developed and developing). After that, the effects of independent variables on the SPM

test results are presented. Finally, SPM norms of the Libyan sample are discussed and

compared to other norms findings of various studies conducted in different cultures.

8.5.1 Psychometric characteristics of the SPM test in Libya

Until now, no single test of mental ability has been officially constructed or adopted for

the measurement of intelligence in a Libyan setting. The lack of use of intelligence tests

in Libya is mainly due to a lack of test experts and information and knowledge regarding

the usefulness and effectiveness of these tests among people who were directly affected

by testing.

The present study tried to rectify this problem by investigating and examining the

performance of a Libyan sample on the Standard Progressive Matrices test, and by

exploring its applicability as an appropriate measure of mental ability. It has been

reported in the literature (Brown 1983; Anastasi and Urbina 1997; Kenneth 1998; Kline

2000; Langdridge 2004; Domino and Domino 2006; Mills and Airasian 2006; Lobiondo-

Wood and Haber 2006) that reliability and validity are both important for judging the suitability of a test or measuring instrument and are the most fundamental characteristics of a psychological test. To test the suitability of the SPM test, its

psychometric characteristics were extensively evaluated.

8.5.1.1 Reliability of the SPM test

This was tested using three methods:

A) Test-retest

Raven provided test-retest reliabilities ranging from .83 to .93 for several age groups: .88

(13 years and over), .93 (under 30 years), .88 (30-39 years), .87 (40-49 years), and .83

(50 years and over). The results of the present study (0.86 to 0.92) were in accordance with

results reported in the literature, such as Rao (1974), Abdel-Khalek (1988), Nkaya et al.,

(1994), Abdel-Khalek (2005) and Khelefeeh and Lynn (2009).

B) Split half

The majority of split-half internal consistency coefficients reported in the literature

exceeded 0.90. The lowest reliability was 0.86 with 174 Iranian children (aged 9 years). The highest reliability was 0.96 (91 male psychiatric patients) (Raven, 2004). This was in agreement with the results of this study (0.88 to 0.96) and many other studies, such as Raven et al. (2003), Burke and Bingham (1969), Baraheni (1974), Bart et al. (1986), Powers et al. (1986a), Duzen (1994), Court and Raven (1995), Ahmad et al. (2008) and Khelefeeh and Lynn (2009).

C) Internal consistency alpha

The majority of alpha consistency coefficients reported in the literature exceeded 0.95.

Our results (0.85 to 0.96) matched those of Dey (1984), Duzen et al, (1994), Rushton and

Skuy (2000), Rushton et al, (2002), Abdel-Khalek (2005) and Taylor (2007).

When this study's results were compared with earlier studies, they appeared quite similar and

provided evidence that the SPM is a reliable measure when used with Libyan students.

These figures indicated a satisfactory reliability for the SPM test with the present Libyan

sample and gave strong evidence for the consistency of the SPM test. Anastasi (1988)

and Pallant (2007) believed that the desirable reliability coefficients should fall in the

range of .80’s or .90’s. The present results generally can be considered as high reliability

coefficients for the Libyan sample and support the reliability of the SPM test.

In addition, one would conclude that the consistency of the measure is high. It was particularly noteworthy that the coefficient alpha (KR-20) reliabilities were higher than the test-retest correlations, which was predictable as a result of the high homogeneity of the test items (Abdel-Khalek, 2005).
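For completeness, the three reliability coefficients discussed above can be computed directly from an item-response matrix. The Python sketch below is illustrative only; it assumes a persons-by-items NumPy array of 0/1 responses, which is not data from this study.

import numpy as np

def test_retest(total_time1, total_time2):
    # Pearson correlation between total scores from two administrations.
    return np.corrcoef(total_time1, total_time2)[0, 1]

def split_half(items):
    # Correlate odd- and even-item totals, then apply the Spearman-Brown correction.
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

def kr20(items):
    # Kuder-Richardson 20: coefficient alpha for dichotomously scored items.
    k = items.shape[1]
    p = items.mean(axis=0)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_variance)

# items = np.asarray(...)  # shape (n_persons, 60) for the SPM's 60 items
# print(split_half(items), kr20(items))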

8.5.1.2 Validity of the SPM test

This was tested using two methods:

A) Construct Validity

This is divided into two analyses. First was the factor analysis. The SPM is considered by

Jensen (1980) to be a measure of the purest form of Spearman’s “g”, or in Jensen’s

terminology, as an excellent culture-fair measure of fluid intelligence “g”. Fluid

intelligence was a concept proposed by Cattell (1971) to designate reasoning ability as

distinct from other kinds of intelligence such as verbal knowledge, memory and spatial

ability. Cross-cultural studies, also, confirm the high ‘g’ saturation of the SPM. Some

factor analytic studies, however, suggest that the SPM measures other factors such as

visuo-spatial or ‘K’ factors, spatial ability, or memory, as well as a large ‘g’ factor

(Raven et al., 1977). A number of scholars have contended that while the Progressive

Matrices was largely a measure of g it also contained a small visualization or spatial

factor. These include Adcock (1948), Keir (1949), Banks (1949), Vernon (1950), Gabriel

(1954), Gustaffson (1984, 1988), who concluded that the SPM measures a reasoning

factor and a further factor that he called “cognition of figural relations”. Hertzog and

Carter (1988) have contended that the SPM contained two factors: verbal intelligence and

spatial visualization. Lynn, Allik & Irwing (2004) identified a general factor and three

further factors that they reported as the gestalt continuation found by van der Ven and

Ellis (2000), verbal-analytic reasoning and visuospatial ability. Further analysis of the

three factors showed a higher order factor identifiable as “g”.

Whatever the number, the evidence relating to factors other than “g” is, according to

Jensen (1980), inconclusive and dubious. He reported that the PM measures “g” and little

else, and that the loadings occasionally found on other “perceptual” and “performance”

type factors, independently of “g” are usually trivial and inconsistent from one analysis to

another. In fact, the PM has very meagre loadings on these factors, when “g” is excluded.

Anastasi (1982), on the other hand, stated that the PM is heavily loaded with a factor

common to most intelligence tests (identified as Spearman’s “g” by British

psychologists) but that spatial aptitude, inductive reasoning, perceptual accuracy, and

other group factors also influence performance.

The outcome of the factor analysis in this study showed the presence of only one factor, which was Spearman's "g". This result was in agreement with the SPM test 1996 and 2004 manuals, Burke and Bingham (1969), Zager et al. (1980), and Abdel-Khalek (1987, 2005).

Second was internal consistency. In the present study, there were strong, statistically significant positive correlations between the five sets (A, B, C, D and E) and total scores, ranging from 0.51 to 0.85. This was in agreement with Abdel-Khalek

(1987) and Abdel-Khalek (2005). Overall, construct validity showed good characteristics

when the SPM was applied to a Libyan sample.

B) Criterion-related Validity

This study provided evidence for the criterion-related validity of the SPM, which was found to have a moderate, significant correlation with students' academic achievement (SAA) used as the external criterion. According to the SPM test manual (2004), the external

criteria commonly adopted in predictive validity investigations are examination grades

or teacher’s estimates. SPM correlations with overall academic achievement tests

generally fall in the region of 0.26 to 0.76. Our results were in agreement with Raven et

al. (2004), Tulkin and Newbrough (1968), Mclaurin and Farrar (1973), Sinha (1968),

Baraheni (1974), Sinha (1977), Maqsud (1980), Powers et al., (1986.b), Avoy (1987),

Carver (1990), Majdub (1991) and Laidra et al (2007). The results of the study showed

that the SPM was valid when applied to a Libyan sample.
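The two pieces of construct-validity evidence described above, set-total correlations and a single dominant factor, can be checked with a few lines of code. The sketch below is illustrative only; it assumes a scored persons-by-items array with the SPM's five sets of 12 items in order, and uses the share of variance captured by the first principal component of the inter-set correlations as a rough indicator of one dominant "g" factor.

import numpy as np

def set_scores(items):
    # Sum the 12 items of each of the five SPM sets (A to E) for every person.
    return np.column_stack([items[:, i * 12:(i + 1) * 12].sum(axis=1) for i in range(5)])

def set_total_correlations(items):
    # Correlate each set score with the total score (internal-consistency evidence).
    sets, total = set_scores(items), items.sum(axis=1)
    return [np.corrcoef(sets[:, i], total)[0, 1] for i in range(5)]

def first_component_share(items):
    # Proportion of variance carried by the largest eigenvalue of the inter-set
    # correlation matrix; a large share is consistent with one dominant factor.
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(set_scores(items), rowvar=False))
    return eigenvalues.max() / eigenvalues.sum()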

8.5.1.3 Item analysis of SPM test

Nunnally (1972) and Burroughs (1975) argued that item difficulty analysis is required because it is almost always necessary to present items in order of difficulty, with the easiest first to give a sense of accomplishment and an optimistic start; if this is not done, a blockage may occur, with many students unable to progress beyond the first items. The more difficult items are placed near the end to prevent students from spending an undue amount of time on difficult items early in the testing period.

Many researchers believe that test items should include some easy and some difficult items, but most items should be located in the 20 to 80 percent zone of easiness (Karmel, 1978). Our analysis showed that set A was the easiest set whereas set E was the most difficult, although set D was slightly easier than set C (a mean percentage difference of 0.01). According to Hopkins (1998), 51 out of 60 items had excellent discriminating value, while 13 items and one set were not arranged in order of increasing difficulty. Rushton et al. (2002) and Boben et al. (2007) also showed set D to be easier than set C. Overall, the results indicated that the difficulty level of the SPM test employed in the present study was suitable for Libyan students.
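The two item statistics discussed in this section can be obtained as follows. The sketch is illustrative only; the response array is assumed, and the 0.20-0.80 band simply restates the rule of thumb cited above.

import numpy as np

def item_difficulty(items):
    # Proportion of examinees passing each item (higher values mean easier items).
    return items.mean(axis=0)

def item_discrimination(items):
    # Corrected item-total correlation: each item against the total of the
    # remaining items, so an item is not correlated with itself.
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

# p = item_difficulty(items)
# in_band = (p >= 0.20) & (p <= 0.80)   # items inside the 20-80 percent easiness zone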

8.5.2 IQ in Libya

Overall, the mean IQ result obtained from the Libyan students was 81 (85 maximum

mean and 74 minimum mean). The average IQ score of developing countries was 82,

whereas the average IQ score for developed countries was 95. As there was no

statistically significant difference in IQ scores between Libya and developing countries,

Libya was considered as a developing country for the comparison purposes of this study.

The following table (8.1) shows mean IQs for some North African and South Asian countries and the average IQ for developing countries, together with corresponding values for developed countries.

Table 8.1 Mean IQs and averages for some developed and developing countries
IQs of North Africans = 80.71
Location Age N Test IQ Reference
North Africa Adults 90 SPM 84 Raveau et al., 1976
Egypt 6–12 129 SPM 83 Abdel-Khalek, 1988
Sudan 8–12 148 SPM 75 Ahmed, 1989
Sudan 6-9 1683 CPM 81 Khatib et al., 2006
Sudan 9-25 6202 SPM 79 Khaleefa et al., 2008b
Sudan 9 3185 SPM 79 Irwing et al., 2008
Tunisia 20 509 SPM 84 Abdel-Khalek & Raven, 2006
IQs of South Asians = 83.93
Location Age N Test IQ Reference
Bahrain 19-29 100 SPM 81 Khaleefa & AlGharaibeh, 2002
Iran 15 627 SPM 84 Valentine, 1959
Iraq 14–17 204 SPM 87 Abul-Hubb, 1972
Iraq 18–35 1185 SPM 87 Abul-Hubb, 1972
Jordan 11-40 2542 APM 86 Lynn & Abdel-Khalek, 2009
Kuwait 6–15 6529 SPM 86 Abdel-Khalek & Lynn, 2006
Oman 5-11 1042 CPM 87 Khaleefa & Lynn, 2009
Oman 9-18 5139 SPM 82 Abdel-Khalek & Lynn, 2008
Qatar 10–13 273 SPM 78 Bart et al., 1987
Qatar 6–11 1135 SPM 88 Khaleefa & Lynn, 2008d
Saudi Arabia 8-14 3967 SPM 80 Abu-Hatab et al., 1977
Syria 7 241 CPM 83 Guthke & Al-Zoubi, 1987
Syria 7-18 3489 CPM 83 Khaleefa & Lynn, 2008a
Yemen 6–11 1000 CPM 85 Al-Heeti et al., 1997
Yemen 6-11 896 CPM 83 Khaleefa & Lynn, 2008c
UAE 6-11 4496 CPM 83 Khaleefa & Lynn, 2008b
Average IQs for developing countries = 82.95

Average IQs of Europeans = 97.77
Location Age N Test IQ Reference
Czech Rep. 5-11 832 CPM 96 Raven et al, 1995
Denmark 5-11 628 SPM 97 Vejleskov, 1968
Estonia 12/18 2,689 SPM 100 Lynn et al., 2002
Estonia 7/11 1,835 SPM 98 Lynn et al., 2003
Finland 7 755 CPM 98 Kyostio, 1972
France 6-9 618 CPM 97 Bourdier, 1964
Germany 5-7 563 CPM 99 Winkelman, 1972
Germany 11-15 2,068 SPM 105 Raven, 1981
Germany 11-15 1,000 SPM 99 Raven, 1981
Germany 6-10 3,607 CPM 101 Raven et al., 1995
Germany 5-10 980 CPM 97 Raven et al., 1995
Iceland 6-16 665 SPM 101 Pind et al., 2003
Ireland 6/12 1,361 SPM 93 Carr, 1993
Ireland 9/12 2,029 SPM 87 Carr, 1993
Ireland 9/12 2,029 SPM 91 Carr, 1993
Netherlands 5-10 1,920 CPM 99 Raven et al., 1995
Netherlands 6-12 4,032 SPM 101 Raven et al., 1996
Russia 14-15 432 SPM 97 Lynn, 2001
Slovakia 5-11 823 CPM 96 Raven et al., 1995
Slovenia 8-18 1,556 SPM 96 Raven et al., 2000
Spain 6-9 854 CPM 97 Raven et al., 1995
Spain 11/18 3,271 APM 102 Albade Paz & Monoz, 1993
Switzerland 6-10 200 CPM 101 Raven et al., 1995
Switzerland 9-15 246 SPM 104 Spicher, 1993
Turkey 6/15 2,272 SPM 90 Sahin & Duzen, 1994
United Kingdom 6-15 3,250 SPM 100 Raven et al., 1998
Average IQs of East Asians = 104.42
Location Age N Test IQ Reference
China 6/15 5,108 SPM 101 Lynn, 1991
China 6/12 269 SPM 104 Geary et al., 1997
China 17 218 SPM 103 Geary et al., 1999
Hong Kong 6/13 13,822 SPM 103 Lynn, Pagliari & Chan, 1988
Japan 9 444 SPM 110 Shigehisa & Lynn, 1991
Taiwan 6/8 764 CPM 105 Rabinowitz et al., 1991
Taiwan 9/12 2,476 CPM 105 Lynn, 1997
Average IQs of North Americans = 97.50
Location Age N Test IQ Reference
Canada 7/12 313 SPM 97 Raven et al., 1996
United States 18/70 625 SPM 98 Raven et al., 1996

Average IQs Israel, Singapore& Australia = 95.78
Location Age N Test IQ Reference
Israel 10/12 268 SPM 95 Globerson, 1983
Israel 11 2,781 SPM 89 Lancer & Rim, 1984
Israel 9-15 1740 SPM 90 Lynn, 1994
Singapore 13 337 SPM 103 Lynn, 1977b
Australia 18 6,700 SPM 100 Craig, 1974
Australia 5/10 700 CPM 98 Raven et al, 1995
Average IQs for developed countries = 98.60

Table 8.1 illustrates that the mean IQ obtained from the Libyan students (81) was similar to the IQ values of other developing countries in North Africa and South Asia reported by Lynn and Vanhanen (2002, 2006). This supports the validity and reliability of the SPM test, which may be considered an appropriate measure of mental ability for Libyan students. Lynn and Vanhanen (2006) reported the average IQ for developing countries to be 82.95, which was similar to the value for developing countries (82) obtained from the present meta-analysis. Similarly, Lynn and Vanhanen (2006) reported the average IQ for developed countries to be 98.6, which was broadly similar to the value for developed countries (95) obtained from the present meta-analysis, supporting the validity and reliability of the meta-analysis.

It is noteworthy that data from some studies carried out in developed countries reported

the norms to calculate the IQ scores and not the means. Therefore, as the SPM means

were used in this meta-analysis, it was not possible to use such data in the meta-analysis.

It is known that intelligence has increased remarkably in economically developed nations

during the last 70 years or so (Flynn, 1984, 2007; Lynn & Hampson, 1986). The reasons

for this are not fully understood. Reasons probably lie in improvements in nutrition and

education that have accompanied rising living standards (Lynn, 1990, Ceci, 1991,

Benton, 2001), and it can be anticipated that as living standards rise in North Africa and

the Middle East, abstract reasoning ability will also rise. Many people from Galton

(1869) onwards have considered that it would be desirable if intelligence could increase.

Although education appears to improve intelligence, the process by which it does this

remains unknown. Presumably, education teaches problem-solving skills which are used

in intelligence tests. Education in Sudan and other Arab countries tends to concentrate on

rote learning and memorization. In Sudan, Irwing et al., (2008) evaluated the effects of

Abacus Training in mental computation on intelligence assessed with the SPM test.

Abacus training consists of training in mental arithmetic including working memory in

which information is stored in working memory while other mental operations are

performed, and then retrieved. The training procedure has been described by Hatano

(1977) and Hatano & Osawa (1983). Mental arithmetic is required in a number of tests of

fluid intelligence such as the Progressive Matrices. It has been shown by Carpenter, Just

& Shell (1990) that the Progressive Matrices is largely a mathematical problem solving

test in a design format, requiring the application of five mathematical rules involving

addition, subtraction, arithmetical and geometrical progression. The results suggested that

the intelligence of Sudanese children would significantly increase by introducing a

greater emphasis on acquisition of problem solving skills in Sudanese schools.

Further, schools in Libya do not promote problem solving abilities in students as well as

do those in the United Kingdom, teachers are not as well trained, and children in Libya

do not have much experience of taking intelligence tests (Attashani and Abdalla

2005). It is possible that the observed group differences are attributable, at least in part, to

the relative novelty of the testing process, as suggested by Stanczak et al. (2001).

Lynn & Vanhanen (2002, 2006) proposed three theories in an attempt to explain the relationship between development status and IQ. The theories were:

• IQ determines development status.

• Development status determines IQ.

• Both processes operate through positive feedback, also known as reciprocal

interaction.

The current data are consistent with all three of these. Lynn and Vanhanen presented

arguments that the third hypothesis is the most reasonable. In addition, nine principal

factors have been reported as being responsible for some groups achieving higher IQ

scores than others. The factors are as follows:

(1) Improvement in education: this has been the most favoured factor, proposed by

Tuddenham (1948), Flynn (1984, 2007), Teasdale and Owen (1994), Flieller (1996,

1999), Greenfield (1998), Jensen (1998), Weede & Kampf (2002), Garlick (2002), Blair,

Gamson, Thorne & Baker (2005), and Meisenberg, Lawless, Lambert & Newton (2006).

Education encompasses many aspects and can be obtained in various ways, but it is mostly achieved by attending school. Students from developed countries are expected to receive better schooling than their counterparts. Schools affect

intelligence in several ways, most obviously by transmitting information. Schools

promote and permit the development of significant intellectual skills, which develop to

different extents in different children. Also schooling changes mental abilities, including

those abilities measured on psychometric tests. It has been shown that students who have

been in school longer have higher mean scores, which would explain why higher SPM scores are achieved as the age of students increases. Also, students who attend school intermittently score below those who attend regularly (Neisser, 1995). Parents' education also plays a significant role: students from families with educated parents scored higher on the SPM than those from families with uneducated parents (Abdulla 2002).

(2) Increased test sophistication; Tuddenham (1948), Brand (1987), and Jensen (1998).

Students in developed countries have attempted psychometric tests since childhood and gain some familiarity with them, whereas students from developing countries do not usually attempt such tests and may exhibit some fear when attempting them (Abdulla 2002).

(3) The greater cognitive stimulation arising from the greater complexity of more recent

environments provided by e.g. television, media and computer games: Elley (1969),

Jensen (1998), Schooler (1998), Williams (1998), and Sundet, Barlaug & Torjussen

(2004), Essawe (1973). All these would enhance the perception and awareness of

children and improve mental abilities. In addition, cognitive ability increases with age,

probably as a result of learning and brain growth (Lynn, 2008 personal communication).

Studies by Abdalla et al. (2002) and by Lynn and Irwing (2004, 2005) supported the finding that IQ scores increase with age.

(4) Improvements in child rearing: Elley (1969) and Flieller (1996). Normal child

development requires a certain minimum level of responsible care. Severely deprived,

neglectful, or abusive environments would have negative effects on many aspects of

development, including intellectual aspects. It is expected that as child rearing improves, children's scores on the SPM increase.

(5) More confident test taking attitudes: Brand (1987) and Brand, Freshwater & Dockrell

(1989). Usually, students in developing countries do not have much experience of taking intelligence tests compared to students in developed countries (Stanczak & Awadalla, 2001; Lynn et al., 2008). In addition, in developing countries students are usually apprehensive and afraid of tests. Also, older students would have more confidence

towards attempting tests than younger students. This is a very important point. Students

with more experience and confidence would logically score higher in the test, even

though their mental ability might not be higher. This factor might be one of the causes of

the difference between developed and developing countries.

(6) The “individual multiplier” and the "social multiplier" (Dickens & Flynn, 2001;

Flynn, 2007). The concept of the “individual multiplier” is that intelligent individuals

have a thirst for cognitive stimulation and this increases their intelligence through

positive feedback. The "social multiplier" posits “that other people are the most important

feature of our cognitive development and that the mean IQ of our social environs is a

potent influence on our own IQ" (Flynn, 2007). This would explain why children brought up in a university town should have higher intelligence than those without this advantage, because the high intelligence of the professors will enhance the intelligence of the population.

(7) Improvements in nutrition: Lynn (1990a, 1993, 1998), Jensen (1998), Colom, Lluis-

Font & Andres-Pueyo (2005), and Arija, Esparo, Fernandez-Ballart et al. (2006).

Prolonged malnutrition during childhood has long-term intellectual effects. The effects

may well be indirect. Malnourished children are typically less responsive to adults, less

motivated to learn, and less active in exploration than their more adequately nourished

counterparts (nielssen 1993). It is expected that students might be more prone to

malnutrition in developing countries than their counterparts.

(8) Smaller family size (Sundet, Borren & Tambs, 2008). Smaller families mean less economic burden. Parents would be able to provide better education, nutrition and child

needs. Child rearing would be easier and more focused. In the United States and Europe

it has invariably been found that the relation between intelligence and family size is

negative, i.e. children with large numbers of siblings have lower IQs than children in

small families (Abdel-Khalek, Lynn, 2008). Moreover, Lynn (1996) summarized results

of 17 studies that reported this negative relationship. The correlations varied between -

0.19 and -0.34 with an average of -0.26. A theory to explain these results positing that

family size has causal effects on intelligence was advanced by Lynn (1959). This theory

proposed that parents give more attention to children in small families and this enhances

children’s intelligence.

Two theories have been advanced to explain these results. These are:

• The confluence theory of Zajonc’s (1976, 1983, 2001a) states that the child’s IQ

is partly determined by the attention the parents and siblings give to it. This

explains the negative relation between family size and intelligence, because the

smaller the number of children in the family, the greater the amount of attention

they are likely to receive from their parents. The result of this will be that children

in small families will have higher average IQs than those from large families.

• The resource dilution theory of Blake (1981) and Downey (2001) proposes that

“parental resources are finite and that as the number of children in the family

increases, the resources accrued by any one child necessarily decline” (Downey,

2001). The theory is similar to the confluence theory but broader in so far as it

posits that parental resources consist of a variety of phenomena including the

material, financial and cultural quality of the home, parental treatment of children,

and opportunities afforded to children. It is also broader in its explanatory power

in so far as it purports to explain the negative relation between sibship size and

educational attainment in addition to the relation with intelligence.

(9) Heterosis: Jensen (1998, p.327) suggested heterosis (hybrid vigor) as a possible

contributor to the Flynn effect. Heterosis is the mating of two individuals from

different ancestral lines i.e. the marriage of two individuals that are from different

origins such as the marriage of a white American to an African America or Hispanic

or Asian American. Jensen argued this is wide spread in the United States as a result

of immigration from many different countries. Mingroni (2004) had further argued

this theory.

The author agrees with the above mentioned factors and stresses the importance of

education as a major factor. In addition, economy plays a pivotal role. IQ scores are

higher in economically developed nations. According to Lynn (2008), IQ in developing

countries will increase by about 3 points a decade with further economic development

(personal communication with Prof. Lynn).

The above mentioned factors explain the reason why IQ in students from developed

countries is higher than their counterparts. Students from developed countries have

environmental advantages from better nutrition, health, education, and sometimes smaller

family size.

On the other hand, human intelligence, like height, is influenced by numerous genetic

interactions, sensitive to numerous environmental factors. The literature has shown

evidence of genetic factors associated with IQ, but the extent is still controversial. In

addition, some researchers hypothesized that intelligence is a phenotype. Even if

intelligence is largely genetic, it cannot be understood without reference to the genes' environment. This has been shown in twin studies and adoption studies (Richardson and Sarah 2006, Lynn and Vanhanen 2006).

8.5.3 SPM and gender

The Progressive Matrices is a useful test to examine sex differences in intelligence. The

issue of whether there are any sex differences on the Progressive Matrices has frequently

been discussed and it has been virtually universally concluded that there is no difference

in the mean scores obtained by males and females. This has been one of the major

foundations for the conclusion that there is no sex difference in reasoning ability or in g,

of which the Progressive Matrices is widely regarded as an excellent measure.

The first statement that there is no sex difference on the test came from Raven himself

who constructed the test and wrote that in the standardisation sample “there was no sex

difference, either in the mean scores or the variance of scores, between boys and girls up

to the age of 14 years. There were insufficient data to investigate sex differences in

ability above the age of 14” (Raven, 1939, p.30). The conclusion that there is no sex

difference on the Progressive Matrices has been endorsed by numerous scholars.

The results of the present study and meta-analysis supported this hypothesis and were in

agreement with previous studies of Eysenck (1981), Court (1983), Mackintosh (1996),

Jensen (1998), Rushton et al. (2002), Pind et al. (2003), Lynn et al. (2004), Abdel-Khalek

and Lynn (2006), Taylor (2007), Khaleefa and Lynn (2008), Khaleefa et al. (2008), Ahmad et al. (2008) and Abdel-Khalek and Lynn (2009). They examined the hypothesis that there is no gender difference on the Progressive Matrices and concluded that, as Mackintosh (1998a) put it, the gender difference on the Progressive Matrices is “0.15 to 2.1 IQ points either way”, i.e. in favour of men or women.

The assertion that there is no gender difference in average general intelligence has been

made repeatedly since the early decades of the twentieth century. Terman (1916) and

Spearman (1923) asserted that there is no gender difference in g. Jensen (1998) calculated

gender differences in g on five samples and concluded that, “no evidence was found for

gender differences in the mean level of g”. Similarly “there is no gender difference in

general intelligence worth speaking of” (Mackintosh, 1996).

Some studies found no sex differences in SPM scores for subjects at younger ages, e.g. Tulkin and Newbrough (1968) with fifth and sixth grade students; Powers et al. (1986b)

with sixth and seventh grade students; Sidles and Avoy, (1987) with seventh grade

students; Persaud (1987) and Zeidner (1988) with seventh grade students. Sex differences

in Libya are similar to those found in many economically developed countries, i.e. there

are no significant differences at the ages of 8 and 9 years. Girls obtained a significantly

higher mean than boys at the age of 10 years, supporting the developmental theory, advanced by Lynn (1994, 1999, 2004, 2005), that girls mature more rapidly than boys at this age.

At 11 years, males' scores were significantly higher than females' scores. At 12, 13 and 14

years, there were no differences in SPM scores between males and females. At the ages

of 15 through 17, boys obtained consistently higher means than girls. These higher

means were statistically significant. This again supports the developmental theory that

boys obtain higher average means at these ages. These age trends are consistent with

numerous studies from western developed countries, such as that of Irwing and Lynn (2005). At ages 18 through 21, no statistically significant differences were found.
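To make the kind of age-by-age comparison reported above concrete, the following minimal sketch (in Python, with invented scores rather than the study's data) shows how male and female SPM means at a single age could be compared with an independent-samples t-test.

    # Illustrative sketch only: hypothetical SPM scores for one age group,
    # not the study's data or its actual analysis scripts.
    from scipy import stats

    male_scores = [34, 29, 41, 37, 32, 38, 30, 36]      # hypothetical raw scores
    female_scores = [31, 35, 28, 33, 30, 34, 29, 32]    # hypothetical raw scores

    t, p = stats.ttest_ind(male_scores, female_scores, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.3f}")
    # p < 0.05 would indicate a significant sex difference in means at this age.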

These are interesting results because they show that sex differences in Libya are similar

to those in economically developed nations, contrary to the suggestions that have

sometimes been made that girls in traditional societies are socially handicapped and this

impairs their intellectual development, and that as females have become more

emancipated and gained greater equality in economically developed western nations,

their cognitive abilities improve. This theory receives no support from the present results.

This significant gender by age interaction is explained by Lynn (1994) and Lynn &

Irwing (2005). It is because boys and girls mature at different rates. Boys and girls have

the same development and IQ up to about 11 years. Then girls accelerate in the “growth spurt”. Then, at about age 16, girls cease to grow, but boys continue to grow both physically and in IQ. The data for the Libyan sample confirm this.

In the present study, the gender difference in variability (Vr) in the total sample and within each age, geographic area and academic discipline can be detected from the standard deviations and variance ratios. At the ages of 8, 9, 10, 12, 13, 14, 15, 17, 18 and 20 years old, females have greater variability than males. In the total sample and at the ages of 11, 16, 19 and 21 years old, males have greater variability than females (note that a Vr greater than 1.0 indicates that males have greater variance than females, while a Vr less than 1.0 indicates that females have greater variance than males). Concerning geographic areas, results showed that males have greater variability than females in the total sample and in each geographic area. Regarding academic discipline, results showed that females have greater variability than males in the total sample and in each academic discipline.
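As a worked illustration of the variance ratio used above, the short sketch below (hypothetical score lists, not the study's data) computes Vr as the male variance divided by the female variance and applies the interpretation rule just stated.

    # Minimal sketch of the variance ratio (Vr) calculation on invented scores.
    import statistics

    male_scores = [22, 35, 41, 18, 47, 30, 39, 26]
    female_scores = [28, 33, 31, 36, 29, 34, 30, 32]

    vr = statistics.variance(male_scores) / statistics.variance(female_scores)
    print(f"Vr = {vr:.2f}")
    # Vr > 1.0: males more variable; Vr < 1.0: females more variable.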

With regard to variance in the meta-analysis, there were small differences between males and females in the total sample, in favour of males. In the different age groups the variability differences were also small, except in the 15-17 age group, where they favoured males. In addition, females had greater variability than males in developed countries. The 12-14 and 18-21 age groups showed small differences in variability in favour of females. The developing countries analysis showed a small difference in variability in favour of males. It has been repeatedly asserted that males have greater variability of IQs than females, but there are a number of contrary studies. The present study and meta-analysis results add to these in showing no consistent sex differences in variability (Lynn et al., 2008; Khaleefa, 2008). Regarding variance by development status, this study showed a large variance difference in favour of developed countries in all age groups and in the total sample. These overall results showed no consistent tendency for a gender difference in variability.

Gender differences in variance were examined because it has frequently been contended

that males have greater variability than females. This assertion was made in the early

years of the twentieth century by Havelock Ellis (1904), Thorndike (1910) and Terman

(1916). This difference in variability was proposed by these early writers to explain why

men are so greatly over-represented among geniuses. As there was no sex difference in

general intelligence, a greater variability among males entailing more males among those

with very high intelligence (as well as more males with very low intelligence) was

suggested to provide a solution to this problem.

Thorndike (1910) put the theory as follows: “The trivial difference between the central

tendency of men and that of women which is a common finding of psychological tests

and school experience may seem at variance with the patent fact that in the great

achievements of the world in science, art, invention, and management, women have been

by far excelled by men. One who accepts the equality of typical representatives of the

two sexes must assume the burden of explaining this great difference in the high ranges

of achievement. The probably true explanation is to be sought in the greater variability

within the male”. Thorndike examined test data on variability and concluded that men are

about one twentieth more variable than women.

Terman (1916) also discussed the question and wrote that “it is often said that women are

grouped closely around the average, while men show a wider range of distribution”.

However, in his data for 1000 children aged 6 to 14 years he found no difference between

boys and girls in variability. The greater male variability was reaffirmed by Eysenck

(1981, p. 42) and recently by Deary, Irwing, Der and Bates (2007). However, not all

studies have found greater male variability, including a meta-analysis of the performance

of college students on the Progressive Matrices by Irwing and Lynn (2005). This study

showed that there was no consistency in variability between males and females in SPM

scores. Similar results were also found in the meta-analysis.

8.5.4 SPM and region

With regard to differences in SPM mean scores depending on region, no differences were found between cities and villages, between coastal, mountain and desert villages, or between main and secondary cities. This can be attributed to the urbanisation process in Libya. According to the first General National Census of 1954, only 25% of the total population were classified as urban settlers. However, within just four decades the

proportion of urban population had increased substantially to 90% of the total population

(Figure 1).

Figure 1: Urbanisation development in Libya 1954-1995

Source: General National Census of 1954, 1964, 1973, 1984, 1995.

This dramatic and rapid increase in the urban population at the expense of the rural population has led some analysts to classify Libya as one of the most urbanised countries in the world (Kezeiri, 1995). This situation has also affected the specific

characteristics of rural areas as many of these characteristics have been influenced or

already been replaced by an urban lifestyle. Many rural populations are now engaged in urban lifestyles in terms of jobs and occupational activities, and use modern household appliances and equipment. As a result of these recent socio-economic changes, a

number of analysts have pointed out that the character of rural areas and communities is now being replaced by urban features (Attir and Al-Azzabi, 2002; Kezeiri, 1995). The

present study failed to detect significant differences between rural and urban students.

Both urban and rural students have similar schools, level of teacher training and

facilities. Moreover, all mainstream level schools in Libya follow the same national

curriculum. This can be directly associated with a similar level of cognitive development, because both environments provide similar stimuli (Abu-shad, 2002). Research on the Flynn effect indicates that IQ is closely related to education; as both rural and urban students were receiving the same level of education, no differences in IQ were detected.

8.5.5 SPM and age (study level)

For the purposes of this study, age was equivalent to study level. Statistically significant

differences in SPM mean scores were found. In the main study, analysis showed that the

British percentile equivalents of the means of the ages combined on the British norms for

the SPM collected in 1979 and given in Raven (1981) are the 16th PC for the 8 year olds

(IQ=85), the 13th PC for the 9 year olds (IQ=83), the 8th PC for the 10 year olds (IQ= 79),

and on average the 6.7th PC (IQ=79.4) for the 11-17 year olds. The American percentile equivalents are the 9th PC for the 18 year olds (IQ=80), the 11th PC for the 19 and 20 year olds (IQ=82), the 4th PC for the 21 year olds (IQ=83), and on average the 8.75th PC (IQ=81.75). Overall, the IQs obtained by the Libyan students ranged between

74 and 85. The average IQ for the fourteen tested Libyan age groups 8 through 21 was

81.
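The percentile-to-IQ conversions quoted above assume the conventional IQ scale with a mean of 100 and a standard deviation of 15. As a rough cross-check (illustrative only; the study itself read IQs from the published norm tables), a percentile rank can be converted to an approximate IQ through the inverse normal distribution, as in the sketch below.

    # Rough illustration of converting a percentile rank on reference norms to an IQ
    # (mean 100, SD 15). The study's conversions used the published norm tables.
    from scipy.stats import norm

    def percentile_to_iq(percentile):
        z = norm.ppf(percentile / 100.0)   # standard normal deviate for that rank
        return 100 + 15 * z

    for pc in (16, 13, 8):                 # percentile ranks mentioned in the text
        print(f"{pc}th percentile -> IQ about {percentile_to_iq(pc):.0f}")
    # The 16th percentile corresponds to roughly IQ 85, as quoted for the 8 year olds.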

Similarly, in the meta-analysis, older students achieved higher SPM scores than younger students (8-11 age group IQ 91, 12-14 age group IQ 87, 15-17 age group IQ 89, 18-21 age group IQ 88).

As the age of the students increased, the study level naturally increased. All tested students in a given grade were of the same age, e.g. all tested 3rd grade students were 8 years of age. This was done to ensure that all students had the same academic experience, since re-sit students usually have more academic experience than first-time students.

These results were in agreement with other studies: Abdalla et al. (2002) and Lynn and Irwing (2004, 2005) reported results showing that IQ scores increased with age. It is suggested that cognitive ability increases with age, probably as a result of the

learning and growth of the brain (Lynn, 2008 personal communication).

In addition, greater cognitive stimulation arises from the greater complexity of more

recent environments, provided by e.g. television, media and computer games (Elley, 1969; Essawe, 1973; Jensen, 1998; Schooler, 1998; Williams, 1998; Sundet, Barlaug &amp; Torjussen, 2004). All these would enhance the perception and awareness

of children and improve mental abilities as age increases.

In a representative sample for the entire population from childhood to adulthood one

would expect to find a progressive increase in the SPM scores with age groups. Previous

studies reported an increase in SPM scores with age among younger subjects, e.g. Baraheni (1974), Sinha (1977), Pind et al. (2003), Lynn et al. (2004) and Khaleefa and Lynn (2009).

Nevertheless, with a Tanzanian secondary school sample, Kilingelhofer (1967) found

that there was a tendency for SPM scores to vary inversely with age, especially at 15, 16 and 17 years. Burke and Bingham (1969) found that performance on the SPM was negatively related to age for a sample of 91 patients with ages ranging from 19 to 59 years. Also, Byrt and Gill (1973), who standardized the SPM test in Ireland, concluded that intelligence does not remain constant from age 15 throughout adulthood but rises and falls in different groups depending upon the education, training or intellectual activities which these groups pursue or neglect.

In Iran, Baraheni (1974) reported that intellectual functions tapped by the Progressive

Matrices reached a maximum level in an Iranian group by age 15 and that at a higher age

level the test failed to differentiate age groups. Burke (1985) found that scores on the SPM decreased with increasing age; his result was based on the screening of 500 vocational counselling clients and 2,992 psychiatric patients. Finally, in a study carried out in Jamaica, Persaud (1987) suggested that the decline in the intellectual capacity of women on the SPM from the age of 26 years onwards could be attributed to age.

An interesting finding in this study was that there was an increase in SPM scores until 19

years of age. After that, an almost steady plateau in SPM results until 21 years of age was

found; there were no differences in SPM scores after 19 years of age. This was consistent

with numerous SPM data sets reviewed in Raven (1939), Raven (1941), Raven (1986), Raven (1989), Raven et al. (1995), Raven et al. (1996), Raven et al. (1996a), Raven (1998) and Raven et al. (2000). Thus, fluid intelligence reached its plateau around the age

of 20.

8.5.6 SPM and academic discipline

In regards to academic discipline, there were statistically significant differences in SPM

mean scores in favour of the scientific academic discipline in all four university study

levels. This may be attributed to the familiarity of science students with courses in the science discipline which deal with abstract reasoning. One of the major problems in the education system in Libya, particularly in the art discipline, is that the method of learning in that discipline relies heavily on rote memorisation, and little attention is paid to reasoning or abstract thinking. It seems that rote learning is a skill that the SPM does not measure (Attashani and Abdalla, 2005).

The findings of this study are similar to those of Shanthamani (1970), who found that science students scored higher than art students on Alexander's intelligence battery; they also agree with Sinha (1977), who found that science students scored higher on the SPM in an Indian sample, and with the unpublished data of Attashani and Abdalla (2005).

8.5.7 Relationship and prediction of SPM

According to the SPM test manual (2004), the external criteria commonly adopted in predictive validity investigations are examination grades or teachers' estimates. SPM correlations with academic achievement tests generally fall in the region of 0.20 to 0.60 (Raven et al., 2004). This study showed correlations of 0.33 to 0.56. This was in

agreement with Tulkin and Newbrough (1968), Mclaurin and Farrar (1973), Sinha (1968), Baraheni (1974), Sinha (1977), Maqsud (1980), Powers et al. (1986b), Avoy (1987), Carver (1990), Majdub (1991) and Laidra et al. (2007). The average correlation of these studies and others was found to range between 0.37 and 0.49 (see Table 4.6). A possible explanation is that of Andrich and Styles (1994), who argued that the Progressive Matrices test contains material not taught directly in schools and yet shows a substantial

relationship with scholastic achievement.
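As a concrete illustration of the kind of validity coefficient discussed here, the sketch below computes a Pearson correlation between SPM scores and examination grades for a small invented group; the study's own coefficients (0.33 to 0.56) were of course computed on the full sample.

    # Illustrative Pearson correlation between SPM scores and achievement grades
    # for a handful of hypothetical students (not the study's data).
    from scipy.stats import pearsonr

    spm_scores  = [28, 35, 41, 22, 39, 30, 45, 33]
    exam_grades = [62, 70, 78, 55, 74, 60, 85, 68]   # e.g. percentage marks

    r, p = pearsonr(spm_scores, exam_grades)
    print(f"r = {r:.2f}, p = {p:.3f}")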

The results of this study showed that age and achievement were predictors of SPM results,

with age being the best predictor. As age and achievement increased, SPM results

increased. Similarly, in the meta-analysis, results showed that SPM score means were

predicted by age and development status; age was also the best predictor. SPM scores

increased as age increased and as development status improved. Our results were in

agreement with previous studies carried out by Pind et al. (2003) and Taylor (2007). This

confirms earlier results that gender and region in the main study and gender in the meta-

analysis have no effect on SPM scores.
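A minimal sketch of how age and achievement might be entered as joint predictors of SPM score in an ordinary least-squares regression is given below; the values are invented, and the study's actual models were fitted to the full data set.

    # Minimal OLS sketch: SPM score regressed on age and achievement (hypothetical data).
    import numpy as np

    age         = np.array([ 8, 10, 12, 14, 16, 18, 20, 21], dtype=float)
    achievement = np.array([55, 60, 62, 70, 72, 75, 78, 80], dtype=float)
    spm         = np.array([20, 25, 29, 34, 38, 41, 44, 45], dtype=float)

    X = np.column_stack([np.ones_like(age), age, achievement])   # intercept + predictors
    coeffs, *_ = np.linalg.lstsq(X, spm, rcond=None)
    print("intercept, b_age, b_achievement:", np.round(coeffs, 2))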

8.5.8 SPM percentiles

A number of studies have indicated that students from developing countries performed

less well than students from developed countries on the SPM test. According to the SPM

(1996) manual, an Australian study by de Lemos (1989) noted a tendency for students

from non-English speaking cultures, such as Southern / Eastern European and Middle

Eastern countries, to score lower in the SPM test.

Raven et al. (2004) reported that some groups lagged behind the British norms, such as groups from Brazil and Ireland and black and Native Americans within the USA. In all countries, the norms of children from less privileged socio-economic backgrounds and rural areas are lower than those of their counterparts. They added that the explanation most commonly

offered for these differences was that the test did not engage the concerns of people from

disadvantaged backgrounds and that it demanded thought processes which were

unfamiliar to them.

The differences between the percentile scores of the Libyan students and the British sample at age 13 years ranged from 7 to 14 points. They varied by 7 points at the 95th percentile, 10 points at the 90th percentile, 9 points at the 75th percentile, 10 points at the 50th percentile, 12 points at the 25th percentile, 14 points at the 10th percentile and 13 points at the 5th percentile. For example, if a Libyan student aged 13 years scored 33 on the SPM test, he would score at the 50th percentile according to the Libyan norms. However, according to the SPM manual (1988, 1996 and 2008) he would score at the 10th percentile of the British norms. Also, if a Libyan student aged 14 years scored 47 he would be at the 95th percentile of the Libyan norms but at the 50th percentile according to the Slovenian, Australian and British norms. These two examples illustrate the misuse and misinterpretation of intelligence tests currently used in Libya due to the use of standardised western norms instead of local norms (please refer back to chapter three for more discussion).
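The point about local versus foreign norms can be made concrete with a small lookup sketch: the same raw score maps to very different percentile ranks depending on which norm table is consulted. The cut-off values below are invented placeholders for illustration, not the actual published tables.

    # Illustrative placeholder norm tables (percentile -> minimum raw score), invented
    # for demonstration; the real Libyan and British tables must be used in practice.
    libyan_norms_age13  = {5: 14, 10: 18, 25: 25, 50: 33, 75: 40, 90: 44, 95: 47}
    british_norms_age13 = {5: 27, 10: 32, 25: 37, 50: 43, 75: 49, 90: 52, 95: 54}

    def percentile_rank(raw_score, norms):
        # Highest tabulated percentile whose cut-off the raw score reaches (0 if below all).
        reached = [pc for pc, cutoff in norms.items() if raw_score >= cutoff]
        return max(reached) if reached else 0

    score = 33
    print("Libyan norms :", percentile_rank(score, libyan_norms_age13))   # 50th percentile
    print("British norms:", percentile_rank(score, british_norms_age13))  # 10th percentile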

The lower scores of the Libyan sample on the SPM test relative to developed countries' norms were expected. Studies conducted in developing countries have consistently found that individuals from developed countries score higher than individuals from developing countries on the SPM test. The meta-analysis conducted in chapter seven of this study revealed that there was a significant difference in SPM mean scores between students from developed countries and students from developing countries (df = 2,125, F = 8.157, p &lt; .001).
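As an illustration of the kind of comparison behind the F statistic reported above, the sketch below runs a one-way analysis of variance across three groups of sample means (developed countries, developing countries and Libya); the group values are hypothetical, and the actual analysis used the full set of meta-analysed samples.

    # Sketch of a one-way ANOVA across three groups of SPM sample means (hypothetical values).
    from scipy.stats import f_oneway

    developed  = [95, 97, 93, 96, 94, 98]   # hypothetical mean IQs, developed-country samples
    developing = [82, 85, 80, 84, 81, 83]   # hypothetical mean IQs, developing-country samples
    libya      = [81, 80, 82, 81]           # hypothetical Libyan sample means

    F, p = f_oneway(developed, developing, libya)
    print(f"F = {F:.2f}, p = {p:.4f}")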

This might be explained in terms of variation in education, environment, nutrition, child

rearing, social income, confidence in test taking, family size, the “individual multiplier”

and “social multiplier”, and heterosis. In addition, the amount of previous familiarity with the test material and the testing situation may have played a role: for almost all of the Libyan students this was the first time they had seen or taken an IQ test.

Regarding education in Libya, the 2002 human development report for Libya noted an obvious deficiency in teaching skills among teachers, with an average of 30 or more students per teacher. School buildings and facilities were also deemed out-dated and inappropriate for carrying out the teaching process; in some places this applied to as many as about 70% of schools. Up-to-date computer programs are not available in 89% of schools (p. 327). Nutrition in Libya shows a lack of strategic planning at the national level, with a heavy dependence on imported food (p. 378). According to the General Authority of Information in 2006, the average family size in Libya was 6 individuals; 18% of families contained more than 10 individuals, whereas 50% of families had more than 5 individuals. The average income in Libya was 2,618 Libyan Dinars (equivalent to about £1,300) per year. Also, tradition in Libya dictates that marriages are contracted within the country; it is strongly disfavoured for a Libyan to marry a non-Libyan.

The percentile ranks of the SPM scores for the Libyan sample in this study emphasized

the need for separate norms for age groups, male and female students and art and science

discipline students.

8.6 Study conclusions

In this chapter we have examined and evaluated the findings of this study. The aim was to adopt a mental ability test suitable for the Libyan population. The lack of complete and useful means of testing in the third world generally, and in Libya in particular, is a clear indicator of the importance of this research study. As stated in section 7.2, the mental tests currently employed in Libya share the feature of incompleteness: they do not cover the whole range of items they are meant to cover. As a solution to this problem, the current study presents the SPM test as an alternative. Its psychometric characteristics place it at the top of the list of appropriate intelligence tests for Libya.

Since the whole study is made up of two parts: main study and meta-analysis, the

conclusion of each is presented in the following:

A) Main study conclusions:

1. It showed that intelligence measured by the SPM has validity in a new country

(Libya) in which the SPM has not been used until now.

2. The overall SPM mean score for the Libyan sample was 32.31, with a standard deviation of 11.94 (minimum score 6, maximum 58). This was lower than the scores of students from developed countries but similar to those of students from developing countries.

3. The IQ score was 81 across the fourteen Libyan age groups (8 to 21 years).

4. No significant gender differences were found in SPM mean scores in the total sample or at ages 8, 9, 12, 13, 14, and 18 through 21. However, females obtained significantly higher SPM means than males at the age of 10 years, whereas males scored significantly higher means than females at the ages of 11 and 15 through 17. In addition, there were no significant gender differences in total means or within each region, and no significant gender differences in total means or within each discipline (science and art). Thus, the gender variable was not an important factor affecting the Libyan students' scores on the SPM test.

5. Regarding gender differences in variability on the SPM test, results indicated no consistent tendency for a gender difference in variability.

6. No significant differences were found in the sample's performance on the SPM test according to region.

Thus, the region variable was not an important factor affecting the Libyan students’

scores on the SPM test.

7. Significant differences were found between the SPM scores based on age as well as

study levels. Thus, age and study levels variables were important factors affecting

Libyan students’ scores on the SPM test.

8. Students from the science academic discipline had significantly higher SPM mean

scores than students from the art discipline. Thus, the academic discipline was an

important factor affecting the Libyan students’ scores on the SPM test.

9. All correlation coefficients between the SPM and students' academic achievement (SAA) were statistically significant for all groups.

10. Age and achievement were predictors of SPM results, with age being the better predictor, whereas gender and region were not significant predictors.

B) Meta-analysis conclusions:

1. The SPM test was valid in a different culture (Libya) from economically developed

western nations.

2. Developed countries achieved higher SPM scores than both developing countries and Libya. No statistically significant differences were found in SPM scores between Libya and the developing countries. Thus, development status was concluded to be an important factor affecting SPM scores.

3. The IQ score was 95 for developed countries and 82 for developing countries.

4. SPM scores increased as age increased. In addition, SPM scores of the age groups

were statistically different based on development status but not different based on gender.

5. No significant differences were found between SPM scores based on gender. In

addition, no gender differences were found among the age groups or development status.

6. No consistent tendency for gender difference in variability.

7. Age and development status were predictors for SPM results. Age was a better

predictor.

8.7 Study contributions

The following are the contributions of this study to intelligence testing in Libya:

• This study is considered to be the first attempt to standardize Raven’s Standard

Progressive Matrices (SPM) test for a sample from Libya.

• Providing norms for the SPM test to be used, in conjunction with examination grades, to help in making appropriate decisions related to:

1. The future of individuals and to guide them to educational programs that

will better suit their abilities.

2. Job selection to match applicants to suitable employment. Many sectors in

Libya only use examination grades as the method in matching students to

various academic establishments and for various jobs in the vocational

sector. IQ scores may assist in this selection process.

3. Assist in the identification of gifted individuals (geniuses) and diagnosis

of individuals with mental retardation.

• Providing the means to estimate levels of intelligence, since our society lacks such tests, in order to be able to recognize both high and low IQ in society.

• Providing possible data regarding the difference in level of intelligence between

gender, age groups and different locations such as rural and urban areas.

8.8 Limitations of the Study

This study was carried out to standardize a British mental ability test, by administering the Raven's Standard Progressive Matrices (SPM) test to a sample consisting of school and university students (8 to 21 years) from the eastern province of Libya during the years 2007-2009, in order to provide an intelligence test best suited to a Libyan setting.

It should be taken into account that the goal was not to change or underestimate the

existing method (examination grades) used now in Libya as a measure of school

achievement, but to offer researchers and psychologists a mental ability test to be used in

conjunction with examination grades in order to improve prediction and placement

procedures.

As mentioned earlier, intelligence is a very difficult construct to define. There are

different types of intelligence besides the aspect of intelligence (educative ability or

general cognitive ability) that the SPM measures such as social intelligence, emotional

intelligence, and the intelligent hands of a craftsman or the intelligent intuition of a

scientist. All these elude the ‘g’ straightjacket. Also, IQ tests do not measure intelligence

directly but those qualities that are thought to reflect it. As a consequence, within each

test there is an element of subjectivity.

IQ tests are criticised on a number of other levels. For example, they are validated

primarily in terms of their correlation with educational achievement. But this ignores the

fact that educational achievement is influenced by factors such as social class,

opportunity, and motivation. Another interesting phenomenon is the fact that a person can

increase his or her score through practice.

In addition, although subtests measuring different abilities tend to be positively correlated

(people who score high on one such subtest are likely to be above average on others as

well), individuals rarely perform equally well on all the different kinds of items included

in a test of intelligence. One person may perform relatively better on verbal than on

spatial items, for example, while another may show the opposite pattern.

These complex patterns of correlation can be clarified by factor analysis, but the results

of such analyses are often controversial themselves. Spearman has emphasized the

importance of a general factor, “g”, which represents what all the tests have in common,

while Thurstone focused on more specific group factors such as memory, verbal

comprehension, or number facility. It should be noted that to base a concept of

intelligence on test scores alone is to ignore many important aspects of mental ability.

Other mental abilities defined broadly but not measured by intelligence tests include

creativity, emotional intelligence, social intelligence, and persistence.

Proponents of general intelligence posit that intelligence is innate, heritable, unitary and measurable, and that it does not change, nor is it affected by culture or environment. The evidence from testing of “g” using standardized tests validates its use as a reliable predictor of student success: there is a large amount of evidence that “g” is a reliable predictor of students' educational attainment, earnings and socio-economic status (Brody, 1992; Lynn &amp; Vanhanen, 2002, 2006; Mackintosh, 1998b).

This suggests that all mental tests of cognition (verbal, mathematical, spatial-visual and memory) measure “g”, a single common factor. It is the “g” factor that makes mental tests valid predictors of intelligence. Even though “g” is an established predictor, proponents of plural intelligences (Gardner, 1983, 1993, 1995) suggest that “g” measures only verbal-linguistic and mathematical-logical intelligences, omitting other intelligences that are just as important.

The limitations of the historical definitions of intelligence led Guilford, Thurstone,

Gardner, and Sternberg to develop theories of multiple intelligences. Guilford and

Thurstone argued that intelligence is comprised of several independent factors; Sternberg

argued that intelligence is comprised of three abilities; and Gardner’s original theory

suggested intelligence is comprised of seven abilities, later adding an eighth. Gardner’s

multiple intelligence (MI) theory posits intelligence is plural, culturally bound, varies in

strength, develops at various rates, and is immeasurable using psychometric tests. His

work with retarded and savant children and adults with brain damage led to the

development of this theory. Gardner originally proposed seven intelligences: verbal,

musical, mathematical, kinaesthetic, spatial, interpersonal, and intrapersonal. He later

added an eighth, naturalistic.

Group comparisons of IQ are problematic. Attempts have been made to make ‘culture-

fair’ or ‘culture-free’ tests, as if such a thing were possible, to allow comparisons of ‘g’

between people from very different societies. But “culture fairness” does not hold in all settings in which the SPM has been administered. When Lev Vygotsky tested Russian peasants back in

the 1930s, he found that answers that seemed logical to an urbanite were responded to

quite differently, but with parallel logic, by the peasants.

It has become well established that intelligence has increased in a number of countries

during the last 80 years or so. An early study by Tuddenham (1948) reported that the IQ

of American conscripts increased by 4.4 IQ points a decade over the years 1917-1943.

Subsequent studies confirmed that IQ increases have occurred in the United States,

Scotland, England, Japan and several countries in continental Europe (Scottish Council

for Research in Education, 1949; Cattell, 1951; Lynn, 1982; Flynn, 1984, 1987, 2007;

Lynn & Hampson, 1986; Lynn, Hampson & Mullineaux, 1987). Most of these IQ

increases have been reported in economically developed nations, with only a few reported in economically developing countries, including Brazil (Colom, Flores-Mendoza &amp; Abad,

2007), Dominica (Meisenberg, Lawless, Lambert & Newton, 2005), Kenya (Daley,

Whaley, Sigman, Espinosa & Neuman, 2003), and Sudan (Khaleefa & Lynn, 2009).

In recent years, it has been noticed that the SPM test fails to discriminate above

the 75th percentile among adolescents and young adults living in societies with a tradition

of literacy. This happened due to the dramatic and unexpected international increase in

SPM scores over the years. This was evident in societies where individuals have been

tested by the SPM several times and were acquainted with such tests. As our tested

sample in Libya had not taken the SPM test before and had no past experience with mental testing, the SPM test was deemed appropriate for use in this situation. Also,

the ceiling effect exhibited in tested developed countries was not evident in developing

countries. The highest score obtained in our sample was 58 correct items out of 60. The

ceiling effect means that a number of test takers get all the answers right and have

therefore reached a ceiling. It can be inferred that these test takers would have been able to answer more difficult items correctly. Ceiling effects have been observed in the Progressive

Matrices as average scores have increased during the last 70 years and increasing

proportions have reached the ceiling. To deal with this problem, Raven has added some

more difficult items to the Standard Progressive Matrices in a new version called the

Standard Progressive Matrices Plus (Raven et al., 2000).
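A simple way to check for the ceiling effect described above is to look at the proportion of testees at, or within a point or two of, the maximum possible score; the sketch below illustrates this on invented scores (in the Libyan sample the highest observed score was 58 out of 60, so no ceiling was reached).

    # Illustrative ceiling-effect check on hypothetical raw SPM scores (maximum 60).
    MAX_SCORE = 60
    scores = [41, 58, 52, 60, 60, 47, 59, 60, 55, 38, 60, 57]   # invented scores

    at_ceiling   = sum(1 for s in scores if s == MAX_SCORE)
    near_ceiling = sum(1 for s in scores if s >= MAX_SCORE - 2)
    print(f"{at_ceiling / len(scores):.0%} at the ceiling, "
          f"{near_ceiling / len(scores):.0%} within two points of it")
    # A substantial share at the ceiling means the test no longer discriminates at the top end.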

8.9 Recommendations of the Study

This standardization is considered the first attempt to standardize the Progressive Matrices test in Libya. Many difficulties were faced during this process, and individual efforts had to be reinforced by the efforts of various institutions to overcome them. The wide geographical area covered in this study and the large financial obligations were not easy to meet. In this respect, the researcher would like to suggest the following:

1. It is hoped that the results of this study will help Libyan researchers and psychologists to develop a better understanding of mental tests and their use, misuse and limitations,

and to stop testing and labelling of children according to scores and norms obtained

from incomplete or non-standardized intelligence tests. Also, it is hoped that this

effort will stimulate similar studies in the area of psychological testing in Libya today

where further research is needed.

2. The results of this study are encouraging enough to start the testing movement in

Libya by conducting more studies and adapting more psychological tests. Culture

fair tests for intelligence such as the SPM test which were constructed in developed

countries can be successfully adapted and standardized to a Libyan sample.

Therefore, because of the need for psychological tests, at least one test in each of the

following areas: intelligence, aptitude, vocational interest and personality should be

adapted from other cultures and standardized in Libya. The establishment of a specialized psychological department in the Ministry of Education and the Ministry of Higher Education in Libya to supervise test standardization is highly desirable.

3. Due to the significant differences noticed in this study between students according to

academic discipline and age, it is recommended to use separate norms for each group.

4. No single intelligence test in existence today is a full, accurate and comprehensive

measure of mental ability. Since the SPM test is considered a measure of nonverbal ability, it should always be used in conjunction with a test of verbal ability so that both abilities are measured.

5. This study indicated that the SPM has high reliability and validity. Therefore, it

seems that the SPM is capable of identifying higher-achieving students and thus can potentially be used safely for school selection alongside examination scores.

6. It is recommended to use the SPM test in Libya to identify gifted students, and

students with low mental ability or with low academic achievements. The SPM has

been shown to be one of the best predictors of both high and low educational

attainments (Brody, 1992; Lynn & Vanhanen, 2002, 2006; Mackintosh, 1998b). It

may be desirable to place gifted children in special classes, a practice known as

streaming in Britain and tracking in the United States. In Britain, comprehensive

education has now largely superseded this approach. It is argued that the advantage of

this is that gifted students can be given accelerated education. Conversely, students

with low mental ability or with low academic achievements can be identified and put

in classes for slow learners and taught at a slower pace suitable for their ability.

7. As Libyan children fail to develop reasoning skills while they are in school, as

compared with British children, it may be that the solution to this problem would be

for teachers in Libya to devote more attention to teaching reasoning skills.

The SPM is also used for job selection, i.e. to identify those with the ability to perform

well in cognitively demanding occupations, and could usefully be introduced in Libya for

this purpose.

8.10 Further research

This study has provided a useful basis for further studies. Based on the limitations and the

findings of this study the following related topics are recommended for further research:

1) Carry out the SPM test on age groups that were not tested in this study;

younger students, employees and adults in Libya. It would be useful to have

norms for these groups not tested in the present study, especially for a

representative sample of adults of different ages and gender. This would be useful

for job selection, and to see whether among adults men have a higher average IQ than women, as reported in the meta-analysis of Lynn &amp; Irwing

(2004).

2) Carry out the standardization of other mental tests such as the SPM Plus, Coloured PM and Advanced PM tests in Libya. A standardization of these tests would

provide additional useful norms for Libya. The Colored PM (CPM) is suitable for

young children aged 5-10 years, and the SPM Plus and advanced PM (APM) are

more difficult versions of the test suitable for people at the top of the ability range. Moreover, it would be very useful to carry out studies of multiple intelligence tests, which are based on plural intelligence theories such as Gardner's theory.

3) Study the effect on SPM results of other factors such as parents' occupations, family size, parents' education, birth order and experience with the test. The collection of data on these would provide useful information about the correlates of intelligence in Libya.

4) Design and develop a mental test in Libya that best suits the local

environment. It would be useful to obtain data for Libya for other kinds of

intelligence such as verbal knowledge, memory and spatial ability.

References:

Abaujaafer, A. (1983). Pupils’ Achievement in Preparatory Schools in the City of


Tripoli, Libya and its Relationship to Parents’ Attitudes, Home
circumstance and Schooling, University of Sheffield.

Abdalla, S. (2002). A meta-analysis on the Progressive Matrices. University of Omar El-Mukhtar [in Arabic].

Abdel-Khalek, A. (1988). "Egyptian Results on (SPM). Personality and Individual


Difference “ 9: 193-195.

Abdel-Khalek, A. and R., J. (2006). Normative data from the standardization of


Raven’s Progressive Matrices in Kuwait in an international context. Social
Behavior and Personality, 34, 169-180.

Abdel-Khalek, A.M. and L., R. (2006). Sex differences on a standardisation of the


Standard Progressive Matrices in Kuwait. Personality and Individual
Differences 40: 175-182.

Abraham, G. et al. (1991). An Introduction: Pharmacoepidemiology. United States: Harvey Whitney Books Company.

Abu-Hatab, F. et al (1977). The standardization of the Standard Progressive Matrices


in a Saudi sample. In F. Abu-Hatab (ed.): Studies on the Standardization of
Psychological Tests, Vol. 1, pp. 191-246. Cairo, Egypt: Anglo-Egyptian
Library [in Arabic].

Abul-Hubb, D. (1972). Application of Progressive Matrices in Iraq. In: L.J.


Cronbach and P.J. Drenth (eds.): Mental Tests and Cultural Adaptation.
The Hague: Mouton.

Abu-shad, H. (2002). Genetic and Environmental Factors Associated with Cognitive


Ability and Scholastic Achievement among Arabs of the Negev Region in
Southern Israel, University of Minnesota. PhD.

Ahmad, R. K., S.J., Z. And L, R (2008). Gender differences in means and variance
on the Standard Progressive Matrices in Pakistan . Mankind Quarterly, 49,
50-57.

Ahmann, J. A. G., M. (1976). Evaluation Pupil Growth: Principles of Tests and


Measurement. Boston, Allyn and Bacon, Inc.

Ahlam (2003) evaluate the relationship between intelligence and high school
students’ academic achievement. University of Omar El-Mukhtar [in
Arabic].

Aiken, L. (1988). Psychological Testing and Assessment. Boston, Allyn and Bacon,
Inc.

Alexopoulos, D. (1979). Revision and Standardization of the Wechsler Intelligence


Scalefor Children for the age of 13-15 Years in Greece, Univeristy of Wales
Cardiff.

Attashani S. and Abdalla Saleh (2005). Analysis mores of the study and effect extent
of this mores by collection from factors of personality, family and academic
achievement with students of university sample. University of Omar El-
Mukhtar [in Arabic].

American Educational Research Association, A. P. A., & National Council on


Measurement in Education (1999). Standards for educational and
psychological testing. Washington DC.

Anastasi, A. (1988). Psychological Testing. London, Macmillan Company.

Anastasi, A. A. U. S. (1997). Psychological Testing. New Jersey, Prentice-hall.

Andrich, D. A. S. I. (1994). "Psychometric Evidence of Intellectual Growth Spurts in


Early Adolescence." Journal of Early Adolescence 14: 328-344.

Arija, V. Esparo, G., Fernandez-Ballart, J., Murphy, M.M., Biarnes, E. & Canals, J
(2006). "Nutritional status and performance in test of verbal and non-verbal
intelligence in 6 year old children." Intelligence 34: 141-149.

Armfield, A. (1985). "A Comparison of High-ability and Low-ability Pupil scores on


Raven’s Standard Progressive Matrices at Primary School Attached to
South Normal University and Guangzhou School for the Deaf/Mute,
Guangzhou, People’s Republican of China. ." School Psychology
International 6: 24-29.

Arthur, W. A. D., D. (1994). "Development of a Short Form for the Raven APM
Test." Educational and Psychological Measurement 54: 394-403.

Arthur, W. A. W., D. (1993). "A Confirmatory Factor Analytic Study Examining the
Dimensionality of Raven’s Progressive Matrices." Educational and
Psychological Measurement 53: 471-478.

Ary, D. J., L. and Razavih, A. (1985). Introduction to Research in Education. New


York, Holt, Rinehart and Winston.

Banks, C. A. S. U. (1951). "An Item-Analysis of the Progressive Matrices Test." The


British Journal of Psychology (Statistical Section) 2: 92-94.

Baraheni, M. (1974). "Raven’s Progressive Matrices as Applied to Iranian Children."
Educational and Psychological Measurement 34: 983-988.

Barnett, S. M. W. W (2004). "National intelligence and the emperor's new clothes:


IQ and the Wealth of Nations." Contemporary Psychology 49: 389-396.

Bart, W. K. A. and Lane, J. (1986). "The Development of Proportional Reasoning in


Qatar." The Journal of Genetic Psychology 148: 95-103.

Benton, D. (2001). "Micro-nutrient supplementation and the intelligence of


children." Neuroscience and Behavioral Reviews: 297-309.

Berk, l. (2000). child development. Massachusetts, Allyn and Bacon.

Bertrand, A. A. C. J. (1980). Test, Measurement, and Evaluation: A Development


Approach. Reading, Mass, Addison-Wesley Publishing Company.

Biesheuvel, S. (1969). Methods for Measurement of Psychological Performance: A


Handbook of Recommended Methods based on an IUPS/IBP Working.
Oxford: , Party Blackwell.

Bingham, W. B. H. and M., S. (1966). "Raven’s Progressive Matrices: Construct


Validity." The Journal of Psychology 62: 205-209.

Blair, C. Gamson, D., T., S. and B., D (2005). "Rising mean IQ: Cognitive demand
of mathematics education for young children, population exposure to formal
schooling, and the neurobiology of the prefrontal cortex." Intelligence 33:
93 -106.

Blennerhassett, L. S., S. and Hibbett, C. (1994). "Criterion Related Validity of


Raven’s Progressive Matrices with Deaf Residential School Students."
American Annals of Deaf 139: 104-110.

Blood, D. A. B., W. (1972). Educational and Evaluation. New York, Harper and Row
Publishers.

Bocéréan, C. Fischer, J-P., & Flieller, A. (2003). "Long term comparison (1921-2001)
of numerical knowledge in 3 to five and a half year old children." European
Journal of Psychology of Education 18: 405-424.

Borg, B. A. G. M (1979). Educational Research. New York, longman.

Born, M. B., N. and Flier, H. (1987). "Cross-Cultural Comparison of Sex-Related


differences on Intelligence Tests: Ameta-analysis." Journal of Cross
Cultural Psychology 18: 283-314.

Brand, C. R. (1987). "Bryter still and bryter?" Nature 328: 110.

Brand, C. R., Freshwater, S. & Dockrell, W.B. (1989). "Has there been a massive
rise in IQ levels in the West? Evidence from Scottish children." Irish
Journal of Psychology 10: 388-393.

Brislin, R. A. T., R (1973). Cross-Cultural Research Methods. New York, John


Wiley and Sons.

Brody, N. (1992). Intelligence. San Diego, CA: Academic Press.

Brown, F. (1971). Measurement and Evaluation. Iowa: F.E. Peacock Publisher, Inc.

Brown, F. (1981). Measuring Classroom Achievement. New York, Holt, Rinehart


and Winston.

Brown, F. (1983). Principles of Educational and Psychological Testing. New York,


Holt, Rinehart and Winston.

Burke, H. (1958). "Raven’s Progressive Matrices: A Review and Critical


Evaluation." The Journal of Genetic Psychology 93: 199-228.

Burke, H. (1972). "RPM: Validity, Reliability, and Norms." Journal of Psychology


82: 253-257.

Burke, H. (1985). "Raven’s Progressive Matrices (1938): More in Norms, Reliability


and Validity." Journal of Clinical Psychology 41: 231-235.

Burke, H. A. B., W. (1969). "RPM: More on Construct Validity." Journal of


Psychological 72: 247-251.

Burroughs, G. (1975). Design and Analysis in Educational Research. Oxford, Alden


& Mowbray Ltd.

Byrt, E. A. G. (1973). The Standardization of the Raven Progressive Matrices and


Mill-hill Vocabulary Scale for Irish School Children aged Six to Twelve
Years, University of College Cork.

Carpenter, P. J., M. and Shell, P (1990). "What One Intelligence Test Measures: A
Theoretical Account of Processing in (SPM) Test." Psychological Review
97: 404 - 431.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

Carver, R. (1990). "Intelligence and Reading Ability in Grades 2-12." Intelligence 14:
449-455.

Cattell, R. B. (1951). "The fate of national intelligence: test of a thirteen year


prediction." Eugenics Review 17: 136-148.

Cattell, R. B. (1971). Abilities: Their Structure, Growth and Action. Boston,


Houghton Mifflin.

Ceci, S. J. (1991). "How much does schooling influence general intelligence and its
cognitive components? A reassessment of the evidence." Developmental
Psychology( 27): 703-722.

Chan, J. (1982). The Use of Raven’s Progressive Matrices Test in Hong Kong. 20th
International Congress of Applied Psychology. . Edinburgh Scotland.

Cohen, R. J. S., M.E (2002). Psychological testing and assessment: an introduction to


test and measurement. Boston, Mcgraw-Hill.

Colom, R., Andres-Pueyo, A. & Juan-Espinosa, M. (1998). "Generational gains:


Spanish data." Personality and Individual Differences 25: 927-935.

Colom, R., Lluis-Font, J.M. & Andres-Pueyo, A. (2005). "The generational


intelligence gains are caused by decreasing variance in the lower half of the
distribution: supporting evidence for the nutrition hypothesis." Intelligence
33: 83-92.

Colom, R., Flores-Mendoza, C.E. & Abad, F.J. (2007). "Generational changes on the
Draw-a-Man test: a comparison of Brazilian urban and rural children tested
in 1930, 2002 and 2004." Journal of Biosocial Science 39: 79-89.

Corman, L. A. B., M. (1974). "Factor Structures of Retarded and Non-Retarded


Children on Raven’s Progressive Matrices." Educational and Psychological
Measurement 34: 407-412.

Corsini, R. (1984). Encyclopedia of Psychology. New York, John and Sons.

Cotton, S. M., Kiely, P.M., Crewther, D.P., Thomson, B., Laycock, R. & Crewther,
S.G, (2005). "A normative and reliability study for the Raven’s Colored
Progressive Matrices for primary school aged children in Australia."
Personality and Individual Differences 39: 647-660.

Court, J. (1983). "Sex Differences in Performance on Raven’s Progressive Matrices:
A Review." The Alberta Journal of Educational Research 29 54-74.

Cronbach, L. (1970). Essential of Psychological Testing. New York, Harper and Row
Publisher INC.

Cronbach, L. (1990). Essential of Psychological testing. New York, Harper and Row
Publisher INC.

Daley, T. C., Whaley, S. E., Sigman, M. D., Espinosa, M. P. and Neuman, C. (2003). "IQ on the rise: The Flynn effect in rural Kenyan children." Psychological Science 14: 215-219.

Denscombe, M. (1998). The Good Research Guide: For Small-scale Social Research.
Buckingham: Open University Press.

Deshon, R. C. D. & Weissbein, D. (1995). "Verbal Overshadowing Effect on


Raven’s APM: Evidence for Multidimensional Ferrormance Determinants."
Intelligence 21: 135-155.

Dickens, W. T. F. J. R. (2001). "Heritability estimates versus large environmental


effects: the IQ paradox resolved." Psychological Review 108: 346-369.

Domino, L. A. G. D. (2006). psychological Testing: An Introduction. 2nd ed.


Cambridge. University Press.

Drenth, P. E. D. (1972). Implication of Testing for individual and Society. . In


Mental Test and Cultural Adaptation. . Netherlands, Mouton Publisher.

Drenth, P. V. D. F., H. and Omari, I. (1979). The Use of Classroom Test,


Examinations, and Aptitude Tests in Developing Countries. Netherlands: ,
SwetsZeitlinger.

Duffy, M., J, B. (2005). 'Univariate descriptive statistics', in Statistical Methods for


Health Care Research, ed. Munro, B. Philadelphia, Lippincott Williams and
Wilkins.

Durojaiye, M. (1984). "The Impact of Psychological Testing on Educational and


Personnel Selection in Africa. ." International Journal of Psychology, 19:
135-144.

Ebel, R. (1972). Essentials of Educational Measurement. . New Jersey: , Prentice, Inc.

Ebel, R. A. F., D (1991). Essentials of Educational Measurement. New Jersey:,


Prentice, Inc.

Scottish Council for Research in Education (1949). The Trend of Scottish Intelligence. London: University of London Press.

Edwards, O. W. (2003). Cattell- Horn-carroll (CHC) theory and mane difference in


intelligence scores., University of Florida. PhD.

Eells, K. D., A.; Havighurts, R. and Tyler, R. (1971). Intelligence and Cultural
Differences. Chicago:, University Press.

Egan, V. (1989). "Notes and Shorter Communications Link Between Personality,


Ability and Attitudes in a low IQ Sample. Personality and Individual
Differences." 10: 997-1001.

Elley, W. B. (1969). "Changes in mental ability in New Zealand." New Zealand


Journal of Educational Studies 4: 140-155.

Ellis, H. (1904). Man and Woman: A Study of Human Secondary Sexual


Characteristics. London: Walter Scott.

Eysenck, H. A., W. and Meili, B (1972). Encyclopedia of Psychology. London: ,


Search Press.

Eysenck, H. J. (1998). A new look at intelligence. New Brunswick,, NJ: Transaction


Books.

Ezeilo, B. (1978). "Validating Panga Munthu Test and Porteus Maze Test in
Zambia." International Journal of Psychology, 13: 333- 42.

Fancher, R. (1985). The Intelligence Men: Makers of the IQ Controversy. New York,
Morton and Company.

Felsen, I. (1991). The Influence of Age, Intelligence, Gender, and Socio-economic


Statues on Perceived Competencies of Gifted Talented Children.
Hamburgh: , University of Hamburgh

Flieller, A. (1996). "Trends in child rearing practices as a partial explanation for the
increase in children’s scores on intelligence and cognitive development
tests." Polish Quarterly of Developmental Psychology 2: 51-61.

Flieller, A. (1999). "Comparison of the development of formal thought in adolescent


cohorts aged 10-15 years (1967-1996 and 1972-1993)." Developmental
Psychology 35: 1048-1058.

Flynn, J. R. (1984). "The mean IQ of Americans: massive gains 1932 to 1978."


Psychological Bulletin 95: 29-51.

Flynn, J. R. (1987). "Massive IQ gains in 14 nations: What IQ tests really measure." Psychological Bulletin 101: 171-191.

Flynn, J. R. (1994). IQ gains over time. In R.J. Sternberg (Ed.), Encyclopedia of


human intelligence 617-623. New York, Macmillan.

Flynn, J. R. (1998). IQ gains over time: Toward finding the causes. In U. Neisser
(Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 25-
66). Washington, DC, American Psychological

Flynn, J. R. (1999). "Searching for justice: The discovery of IQ gains over time."
American Psychologist 54: 5-20.

Flynn, J. R. and . (2007). What is Intelligence? Beyond the Flynn effect. Cambridge,
Cambridge University Press.

Fontes, P. K., T. Madaus, G.; and Airasian, W (1983). "Opinions of the Irish Public
on intelligence." Journal of Education 17: 55-67.

Foulds, G. A. D., P. (1962). "The Nature of Intelligence Deficit in Schizophrenia Pt.


I, A Comparison of Schizophrenic and neurotics." British Journal of Social
and Clinical Psychology 1: 7-19.

Foulds, G. D., P. McClelland, M. and McClelland, W (1962). "The Nature of


Intellectual Deficit in Schizophrenia: Pt. 2. A Cross-sectional Study of
Paranoid, Catatonic, Hebephrenic and Simple Schizophrenics." British
Journal of Social and Clinical Psychology 1: 141-149.

Foulds, M. A. R., J. (1948). "Normal Changes in the Mental Abilities of Adults as


Age Advances." Journal of Mental Science 94: 133-134.

Freeman, F. (1962). Theory and Practice of Psychological Testing. New York: Henry Holt and Company.

Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York,
Basic Books.

Gardner, H. (1993). Frames of Mind: The theory of multiple intelligences. London:


Fontana.

Garlick, D. (2002). "Understanding the nature of the general factor of intelligence:
the role of individual differences in neural plasticity as an exploratory
mechanism." Psychological Review 109: 116-136.

Garrett, H. A. W., R (1966). Statistics in Psychology and Education. London,


Longmans.

Gay, L. R., E. M, et al. (2006). Educational Research: Competencies for Analysis


Applications. 8th ed. New Jersey, Pearson.

Gittins, J. (1952). Approved School Boys. London: HMSO.

Goetzinger, C. P., R. C. W., et al. (1967). "Non-language IQ tests used with deaf


children." Volta Review 69: 500-506.

Gomm, R. D., C. (2000). Using Evidence in Health and Social Care. London
Open University/Sage Publications Ltd.

Georgas, J. A. G., C. (1972). A Children’s Intelligence Test for Greece. Netherlands: ,


Mouton Publisher.

Georgas, J. G. (1790). "standardisation of a vocabulary Intelligence Test,(Final


Progress Report, Research MH 12544-01)." Athens: The Athenian Institute
of Anthropos.

Glass, G. (1976). “Primary, secondary, and Meta-Analysis of research.” Educational


Researcher 5: 3-8.

Gould, S. J. (1981). The mismeasure of man. New York, Norton.

Gould, S. J. (1996). The mismeasure of man (Rev. ed.). New York, Norton.

Green, B. and J. Hall (1984). “Quantitative methods for literature review.” Annal
Review of Psychology 35: 37-53.

Greenfield, P. M. (1998). The cultural evolution of IQ. In U. Neisser (Ed) The


Rising Curve. Washington, DC: American Psychological Association.

Gronlund, N. (1981). Measurement and Evaluation in Testing. New York, Macmillan


Publishing Co.

Guilford, J. (1967). The Nature of Human Intelligence. New York: , McGraw-Hill


Book Company.

Guilford, J. P. (1985). The structure-of-intellect model. In Wolman, B.B. (Ed.),
Handbook of intelligence: Theories, measurements, and applications. New York,
John Wiley & Sons.

Irwing, P. & Lynn, R. (2005). Sex differences in means and variability on the
Progressive Matrices in university students: A meta-analysis. British Journal
of Psychology, 96, 505–524.

Irwing, P., Hamza, A., Khaleefa, O. & Lynn, R. (2008). "Effects of Abacus training on the
intelligence of Sudanese children." Personality and Individual Differences
45: 694-696.

Helmes, S. (1987). "Concurrent Validation of AH2 as a Brief Measure of intelligence


in Canadian University Students." Educational and Psychological
Measurement 47: 725- 729.

Hildebrand, D. K. (1986). Statistical Thinking for Behavioral Scientists. Boston,


Duxbury Press.

Herrnstein, R. (1973). IQ in the Meritocracy. Great Britain, Allen Lane.

Herrnstein, R. J. & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in
American Life. New York, The Free Press.

Heyneman, S. (1987). "Use of Examination in Developing Countries: Selection,


Research and Education Sector Management." International Journal of
Educational Development 7: 251-263.

Higgins, T. and Green, S. (2006). Cochrane Handbook for Systematic Reviews of
Interventions. Browse the Handbook online at www.cochrane-handbook.org

Hunt, E. (1975). Quote the Raven? Nevermore. Maryland: , Lawrence Erlbaum


Associates Publishers.

Husen, T. (1951). "The influence of schooling on IQ." Theoria 17: 61-88.

James, H. M. S. S, (2006). Research in Education: Evidence- Based Inquiry, 6th ed.


Boston, Pearson.

Jencks, C. (1972). Inequality: A reassessment of the effect of family and economic


success in America, New York: Basic Books.

Jensen, A. (1980). Bias in Mental Testing. London, Methuen and Co., Ltd.

Jensen, A. (1981). Straight Talk about Mental Tests. London, Methuen and Co., Ltd.

Jensen, A. R. (1980). Bias in mental testing. New York: , Free Press.

Jensen, A. R. (1998). The g Factor. Westport, CT: Praeger.

Jensen, A. S., D. and Larson, G. (1988). "Equating the Standard and the Advanced
Form of the Raven Progressive Matrices." Educational and Psychological
Measurement 48: 1091-1095.

Johnson, E. S., D. and Guertin, D. (1994). "The Development and Validation of a
Reliable Alternate Form for Raven’s Standard Progressive Matrices
Assessment." 3: 315-319.

Laidra, K., Pullmann, H. & Allik, J. (2007). "Personality and intelligence as predictors of
academic achievement: A cross-sectional study from elementary to
secondary school." Personality and Individual Differences 42: 441-451.

Kamin, L. & Eysenck, H. (1981). Intelligence: The Battle for the Mind. London, Pan Books.

Kamphaus, R. W., Petosky, M.D., Morgan, A.W (1997). A history of intelligence


test interpretation. In D.P. Flanagan, J.L. Genshaft, & P.L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 32-
47). New York: , Guilford.

Kaniel, S. & Fisherman, S. (1991). "Level of Performance and Distribution of Errors in the
Progressive Matrices Test: A Comparison of Ethiopian Immigrant and
Native Israeli Adolescents." International Journal of Psychology 26: 25-33.

Karmel, L. K., M (1978). Measurement and Evaluation in the Schools. New York: ,
Macmillan Publishing Co., Inc.

Karnes, F. & Whorton, J. (1988). "Comparison of Group Measures in Identification of Rural,
Culturally Diverse Gifted Students." Perceptual and Motor Skills 67: 751-
754.

Keehn, J. & P., E. (1955). "Non-Verbal Tests as Predictors of Academic Success in
Lebanon." Educational and Psychological Measurement 15: 495-498.

Khaleefa, O. & Lynn, R. (2009). "The increase of intelligence in Sudan 1964-2006."
Personality and Individual Differences 45: 412-413.

Khaleefa, O. & Lynn, R. (2008a) Sex differences on the Progressive Matrices: Some
data from Syria. Mankind Quarterly, 48, 345-352.

Khaleefa, O., Khatib, M.A., Mutwakkil, M.M. & Lynn, R. (2008b). Norms and
gender differences on the Progressive Matrices in Sudan . Mankind
Quarterly, 49, 177-183.

Khaleefa, O. & Lynn, R. (2008d). Norms for intelligence assessed by the Standard
Progressive Matrices in Qatar . Mankind Quarterly, 49, 65-71.

King, W. (1963). "Development of Scientific Concepts in Children." British Journal


of Educational Psychology 33: 240-252.

Kline, P. (2000). Handbook of Psychological Testing, 2nd ed. London, Routledge.

Kline, J. B. (2005). Psychological Testing: A Practical Approach to Design and
Evaluation. New Delhi, Sage Publications, Inc.
Kline, P. (1979). Psychometrics and Psychology. London:, Morrison and Gibb Ltd.

Klingelhofer, E. (1967). "Performance of Tanzanian Secondary School Pupils on the


(SPM) Test." Journal of Social Psychology 72: 205- 215.

Kaufman, A. & Kaufman, N. (2004). Essentials of Psychological Testing. New
Jersey, John Wiley & Sons, Inc.

Langdridge, D. (2004). Research Methods and Data Analysis in Psychology.


Glasgow, Bell & Bain Limited.

Layman, H. (1968). Intelligence, Aptitude and Achievement testing. Boston,


Houghton Mifflin Company.

Levine, E. (1974). "Psychological Tests and Practices With the Deaf: A Survey of
the State of the Art." The Volta Review 76: 298-319.

Lewis, D. (1967). Statistical Methods in Education. London, University of London


Press LTD.

Lewis, D. (1974). Assessment in Education. London, University of London
Press.
LoBiondo-Wood, G. and Haber, J. (2006). Nursing Research, 6th ed. United
States of America, Mosby Inc.

Li, R. (1996). A theory of conceptual intelligence. London:, Praeger.

Lorge, I. (1945). "Schooling makes a difference." Teachers College Record 46: 483-
492.

Lynn, R. (1982). " IQ in Japan and the United States shows a growing disparity."
Nature 297: 222-223.

Lynn, R. & Hampson, S.L. (1986). "The rise of national intelligence: evidence from
Britain, Japan and the USA." Personality and Individual Differences 7: 323-332.

Lynn, R., Pagliari, C. & Chan, J. (1988). "Intelligence in Hong Kong Measured for Spearman’s g
and the Visuospatial and Verbal Primaries." Intelligence 12: 423-433.

Lynn, R., Hampson, S.L. & Mullineaux, J.C. (1987). " A long term increase in the
fluid intelligence of English children." Nature 328: 797.

Lynn, R. (1990b). "Differential rates of secular increase of five major primary


abilities." Social Biology 38: 137-141.

Lynn, R. (1990a). "The role of nutrition in secular increases of intelligence."


Personality and Individual Differences 11: 273-285.

Lynn, R. (1993). Nutrition and intelligence. In P.A. Vernon (Ed) Biological


Approaches to the Study of Intelligence. Norwood, NJ: Ablex.

Lynn, R. (1994). Sex differences in brain size and intelligence: a paradox resolved.
Personality and Individual Differences, 17, 257-271

Lynn, R. (1998). In support of nutrition theory. In U. Neisser (Ed) The Rising


Curve. Washington, DC: American Psychological Association.

Lynn, R., Allik, J. & Irwing, P. (2004). Sex differences on three factors identified in
Raven’s Standard Progressive Matrices. Intelligence, 32, 411-424.

Lynn, R. and Irwing, P. (2004). Sex differences on the Progressive Matrices: a meta-
analysis. Intelligence, 32, 481-498.

Lynn, R. (2006). Race Differences in Intelligence: An Evolutionary Analysis. Athens,
GA, Washington Summit Books.

Lynn, R. & Vanhanen, T. (2006). IQ and Global Inequality. Athens, GA, Washington
Summit Books.

Lynn, R. (2008). The Global Bell Curve. Augusta, GA: Washington Summit
Publishers.

Lynn, R. (2009). "What has caused the Flynn effect? Secular increases in the
Development Quotients of infants." Intelligence.

MacArthur, R. & Elley, W. (1962). "The Standard Progressive Matrices as a Culture-
Reduced Measure of General Ability." Alberta Journal of Research 8: 54-
65.

MacAvoy, J. O., S. and Sidle, C (1993). "The Raven Matrices and Navajo Children:
Normative characteristics and culture fair Application to Issues of
Intelligence, giftedness and Academic Proficiency." Journal of American
Indian Education 33: 32-43.

Mackintosh, N. J. (1998b). IQ and Human Intelligence. Oxford, UK: Oxford


University Press.

Mackintosh, N. J. (1998a). "Reply to Lynn." Journal of Biosocial Science 30: 533-


539.

Mackintosh, N. J. (1996). "Sex differences and IQ." Journal of Biosocial Science 28:
559-571.

Sarantakos, S. (2005). Social Research, 3rd ed. New York, Palgrave Macmillan.

Madern, A. M. & V., S. (1967). "Ricerca sulle capacità di previsione scolastica del
PM 38 di Raven (Research on the predictive capacity of Raven's PM 38 test)."
Bollettino di Psicologia Applicata 79-82: 67-82.

Mahdawi, F. A. A. R., A. (1991). Libya: A Challenge Ahead. Great Britain:, Royal


College of Psychiatrists.

Majdub, G. (1991). The Psychological Determining of Academic Achievement. Ph.D.
thesis, University of Bristol.

Maqsud, M. (1980). "Personality and Academic Attainment of Primary School


Children." Psychological Reports 46: 1271-1275.

Maqsud, M. (1983). "Relationship of Locus of Control to Self-Esteem, Academic


Achievement and Prediction of Performance Among Nigerian Secondary
School Pupils." British Journal of Educational Psychology 53: 215-221.

Marais, C. A. (2007). Using the Differential Aptitude Test to estimate intelligence and
scholastic achievement at grade nine level. MSc thesis, University of South Africa.

Marks, R. (1981). The Idea of IQ. New York:, University Press of America.

Matarazzo, J. (1972). Wechsler’s measurement and appraisal of adult intelligence.


Baltimore, Williams & Wilkins.

McLaurin, W. A. & F., W. (1973). "Validities of Progressive Matrices Test Against IQ
and GPA." Psychological Reports 32: 803-806.

Mehotra, K. (1968). "The Relationship of WISC to Progressive Matrices." Journal of


Psychological Research 12: 114-118.

Mehryar, A. (1972). "Father’s Education, Family Size and Children’s Intelligence
and Academic Performance in Iran." International Journal of Psychology, 7:
47-50.

Meisenberg, G., Lawless, E., Lambert, E. & Newton, A. (2005). "The Flynn effect in
the Caribbean: generational change in test performance in Dominica."
Mankind Quarterly 46: 29-70.

Melikian, L. (1984). "The Transfer of Psychological Knowledge to Third World


Countries and its Impact on Development: The Case of Five Arab Gulf Oil
Producing States." International Journal of Psychology, 19: 65-77.

Messick, S. (1995). "Validity of psychological assessment: Validation of inferences


from persons' responses and performances as scientific inquiry into score
meaning." American Psychologist 50: 741-749.
Mingroni, M. A. (2004). "The secular rise in IQ: Giving heterosis a closer look."
Intelligence 32: 65-83.

Mingroni, M. A. (2007). " Resolving the IQ paradox: heterosis as a cause of the


Flynn effect and other trends." Psychological Review 114: 1104.

Miron, M. (1977). "A Validation Study of Transferred Group Intelligence Test."


International Journal of Psychology 12: 193-205.

Mohan, V. (1972). " Raven’s Progressive Matrices and Verbal Test of General
Mental Test." Journal of Psychological Research 16: 67-69.

Murphy, K. & Davidshofer, C. (1991). Psychological Testing: Principles and Applications.
New Jersey, Prentice-Hall International, Inc.

Neisser, U. (1998). The rising curve: Long-term gains in IQ and related measures.
Washington, DC, American Psychological Association.

Nelson, H. (1979). Area Handbook Series: Libya a Country Study. Washington, D.C,
The American University.

Nkaya, H. N., Huteau, M. and Bonnet, J. (1994). "Retest Effect on Cognitive Performance on
the RM-38 in France and Congo." Perceptual and Motor Skills 78: 503-
510.

Noll, V. & Scannell, D. (1979). Introduction to Educational Measurement. Boston,
Houghton Mifflin Company.

Nunnally, J. (1972). Educational Measurement and Evaluation. New York, McGraw-
Hill Book Company.

Oakland, T. (1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.

Oakland, T., & Laosa, L.M (1976). Professional, legislative, and judicial influences
on psycho educational assessment practices in schools. In T. Oakland (Ed.)
(1976). Non-biased assessment of minority group children: With bias
toward none. Paper presented at a national planning conference on
nondiscriminatory assessment for handicapped children. Lexington, KY.

Ogunlade, J. (1978). "The Predictive Validity of the (RPM) with some Nigerian
Students." Educational and Psychological Measurement 33: 465-467.

Ord, I. (1972). "Testing for Educational and Occupational Selection in Developing


Countries- a review." Occupational Psychology 46: 123 - 166.
Ortar, G. (1972). Some Principles for Adaptation of Psychological Tests. Netherlands,
Mouton Publisher.

Owen, K. (1992). "The suitability of Raven's Standard Progressive Matrices for
various groups in South Africa." Personality and Individual Differences
13(2): 149-159.

Parmar, R. (1989). "Cross-Cultural Transfer of Non-Verbal Intelligence Tests: An (in)


Validation Study." British Journal of Educational Psychology 59: 379-388.

Pallant, J. (2007). SPSS Survival Manual. Maidenhead, Open University Press.

Persaude, G. (1987). "Sex and Age difference on the Raven’s Matrices." Perceptual
and Motor Skills 65: 47-52.

Popoff-Walker, L. (1982). " IQ, SES, Adaptive Behavior and Performance on a


Learning Potential Measure." Journal of School Psychology 20: 222-231.

Powers, S., Barkan, J. and Jones, P. (1986a). "Reliability of the (SPM) Test for Hispanic
and Anglo-American Children." Perceptual and Motor Skills 62: 348-350.

Powers, S., Jones, P. and Barkan, J. (1986b). "Validity of SPM as Predictor of


Achievement of Sixth and Seventh Grade students." Educational and
Psychological Measurement 46: 719 - 722.

Rao, S. (1974). "Study of Raven’s Progressive Matrices Test (1956)." Indian


Educational Review 9: 174-189.

Raven, J. (1986). "A nation really at risk: A review of Goodlad's 'A Place Called
School'." Higher Education Review 18: 65-79.

Rushton, J. P. & Skuy, M. (2000). "Performance on Raven's Matrices by African and
White University Students in South Africa." Intelligence 28(4): 251-265.

Rust, J. and Golombok, S. (2004). Modern psychometrics, 2nd ed. New York,
Routledge.

Raven, J. (1981). Irish and British Standardisations. Oxford, UK: Oxford


Psychologists Press.

Raven, J. (1986). Manual for Raven's Progressive Matrices and Vocabulary Scales.
London, Lewis.
Raven, J. (1989). "The Raven Progressive Matrices: A Review of National Norming
Studies and Ethnic and Socioeconomic Variation within the United States."
Journal of Educational Measurement 26: 1 - 16.

Raven, J., Raven, J.C., & Court, J.H (1993). Manual for Raven's Progressive
Matrices and Vocabulary Scales (Section 1). Oxford, England:, Oxford
Psychologists Press.

Raven, J., Court, J.H. and Raven, J.C (1996). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.

Raven, J., Raven, J.C. and Court, J.H (1998). Coloured Progressive Matrices. Oxford:
Oxford Psychologists Press.

Raven, J., Raven, J.C. & Court, J.H. (1998). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.

Raven, J. (2000). Manual for Raven's Progressive Matrices. Oxford, Oxford


Psychologists Press.

Raven, J., Raven, J.C. and Court, J.H (2000). Standard Progressive Matrices. Oxford,
Oxford Psychologists Press.

Raven, J. a. C., J.H (1989). Manual for Raven's Progressive Matrices and Vocabulary
Scales. London, Lewis.

Raven, J. C., Court, J.H. and Raven, J (1996a). Raven Matrices Progressivas.
Madrid:, TEA Ediciones, S.A.

Raven, J. C. (1939). "The RECI series of perceptual tests: An experimental survey."
British Journal of Medical Psychology 18: 16-34.

Raven, J. C. (1941). "Standardisation of Progressive Matrices." British Journal of
Medical Psychology 19: 137-150.

Raven, J. C., Court, J.H. & Raven, J. (1977). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: The Crichton Vocabulary Scale, 1983
Revision. London, H.K.Lewis.

Raven, J. C., Court, J.H. & Raven, J. (1982). The Mill Hill Vocabulary Scale.
London, H.K.Lewis.

Raven, J. C., Court, J.H. & Raven, J. (1983). Manual for Raven’s Progressive
Matrices & Vocabulary Scales: Section 2. London, H.K.Lewis.

Raven, J. C., Court, J.H. and Raven, J (1995). Coloured Progressive Matrices.
Oxford, UK: Oxford Psychologists Press.

Raven, J. C., Court, J.H. & Raven, J. (1996). Standard Progressive Matrices. Oxford,
UK: Oxford Psychologists Press.

Raven, J. R., J. and Court, J (1988). Raven Manual: General Overview. Oxford,
Oxford Psychological Press.

Raven, J., J. C. Raven. ( 2003). Manual for Raven’s Progressive Matrices and
Vocabulary Scales. Section 3: The Standard Progressive Matrices. San
Antonio, Harcourt Assessment, Inc.

Raven, J., Raven, J. C., & Court, J. H. (2000, updated 2004). Manual for Raven’s
Progressive Matrices and Vocabulary Scales. Section 3: The Standard
Progressive Matrices. San Antonio, TX: Harcourt Assessment.

Riaz, A., Sarwat, J. Khanam & Zaeema, R. (2008). Raven’s Standard Progressive Matrices
(Classic Form) in Pakistan. In J. Raven & J. Raven (Eds.), Uses and
Abuses of Intelligence: Studies Advancing Spearman and Raven’s Quest for
Non-Arbitrary Metrics. Unionville, New York: Royal Fireworks Press;
Edinburgh, Scotland: Competency Motivation Project; Budapest, Hungary:
EDGE 2000.

Richardson, K. (1991). Understanding Intelligence. Philadelphia, Milton Keynes.

Richardson, K. and Norgate S. (2006). "A Critical Analysis of IQ studies of Adopted


Children." Human Development 49: 319-335.

Rimoldi, H. (1948). "A Note on the Raven’s Progressive Matrices Test." Educational
and Psychological Measurement 8: 347-352.

Roe, K. a. R., A (1983). "Schooling and cognitive Development: A Longitudinal


Study in Greece." Perceptual and Motor Skills 57: 147-153.

Roid, G.H., & Barram, R.A. (2004). Essentials of Stanford-Binet Assessment. New
York: Wiley

Rushton, J. P. (1997). "Race, intelligence, and the brain: The errors and omissions of
the "revised" edition of S. J. Gould's The Mismeasure of Man (1996)."
Personality and Individual Differences 23: 169-180.

Rust, J. (2008a). Coloured Progressive Matrices and Crichton Vocabulary Scale


Manual. London, Pearson.

Rust, J. (2008b). Standard Progressive Matrices Plus Version and Mill Hill Manual.
London, Pearson.
Rust, J. A. G., S (1989). The Science of Psychological Assessment. New York,
Routledge.

Sahin, N. and Duzen, E. (1994). "Turkish standardization of Raven's SPM (ages 6 to
15)." Paper presented to the 23rd International Conference of Applied
Psychology, Madrid.

Samuda, R. (1975). Psychological Testing of American Minorities: Issues and


Consequences. New York, Harper and Row Publisher.

Sattler, J. (1982). Children’s Intelligence and Special Abilities. Boston, Allyn and
Bacon Inc.

Sattler, J. M. (1988). Assessment of children. San Diego, Author.

Sattler, J. M. (1998). Assessment of children's intelligence. In C.E. Walker & M.C.
Roberts (Eds.), Handbook of clinical child psychology (2nd ed., pp. 85-100).
New York, NY, John Wiley.

Scarr, S. (1981). Race, Social Class, and Individual Differences in IQ. New Jersey,
Lawrence Erlbaum Associates Publishers.

Schooler, C. (1998). Environmental complexity and the Flynn effect. Washington


DC, American Psychological Association.

Schwarz, P. a. K., R (1972). Ability Testing in Developing Countries; A Handbook


of Principles and Techniques. New York:, Praeger Publishers.

Shanthamani, V. (1970). "Relationship Between Intelligence and Other Certain


Variables." Journal of Psychological Research 14: 28-34.

Shayer, M., Demetriou, A. & Pervez, M (1988). "The structure and scaling of
concrete operational thought: three studies in four countries." Genetic,
Social & Psychological Monographs: 309-375.

Shayer, M. (2007). "30 Years on-a large anti-'Flynn effect'? The Piagetian test
Volume & Heaviness norms 1975-2003." British Journal of Educational
Psychology 77: 25-42.

Shelley, D. A. C., D (1986). Testing Psychological Tests. London, Croom Helm Ltd.

Sidles, C. A., J (1987). "Navajo Adolescents Scores on (PLQ), (SPM), and (CTBS)."
Educational and Psychological Measurement 47: 703-709.

Sinha, M. (1977). "Validity of the Progressive Matrices Test." Journal of


Psychological Research 21: 221-226.

Sinha, U. (1950). Reliability and Validity of the Progressive Matrices Test. London,
University of London. M.A.

Sinha, U. (1968). "The Use of Raven’s Progressive Matrices Test in India." Indian
Educational Review(3): 75-88.

Singh, U. (1951). "A study of Reliability and Validity of the Progressive Matrices
Test." British Journal of Educational Psychology 21: 221-226.

Smith, M. A. G., G. (1977). "Relationship of Class-size to Classroom Processes,
Teacher Satisfaction and Pupil Affect." Australian Journal of
Education 24(3): 329-331.

Snyderman, M., & Rothman, S (1988). The IQ controversy. The media and public
policy. New Brunswick, NJ, Transaction Publishers.

Sokal, M. (1987). Psychological Testing and American Society 1890 - 1930, New
Brunswick: Rutgers University Press.

Sorokin, B. (1954). "Standardisation and analysis of Progressive Matrices Test by
Penrose and Raven." Unpublished report, Zagreb, Yugoslavia.

Spearman, C. (1904). "Intelligence, Objectively Determined and Measured."


American Journal of Psychology 15: 201-293.

Spearman, C. (1927). The Abilities of Man. London, Macmillan.

Spearman, C. (1946). "Theory of General Factor." British Journal of Psychology 36:


117-131.

Spearman, C. E. (1923). The nature of intelligence and the principles of cognition.


London, Macmillan.

Spearman, C. & Jones, L. L. (1950). Human Ability: A continuation of “The Abilities of
Man”. London, Macmillan.

Spitz, H. H. (1989). "Variations in Wechsler interscale IQ disparities at different


levels of IQ." Intelligence 13: 157-167.

Sternberg, R. (1990). Metaphors of Mind: Conception of the Nature of Intelligence.


Cambridge, Cambridge University Press.

Sternberg, R. & Detterman, D. (1986). What is Intelligence? New Jersey, Ablex
Publishing Corporation.

Sternberg, R., Conway, B., Ketron, J. and Bernstein, M. (1981). "People’s Conceptions
of Intelligence." Journal of Personality and Social Psychology 41: 37-55.

Sternberg, R. J. W. S. I. B., L. (2000). Child development. Massachusetts, Allyn and


Bacon.

Sundet, J. M., Barlaug, D.G. & Torjussen, T.M (2004). "The end of the Flynn effect?
A study of secular trends in mean intelligence test scores of Norwegian
conscripts during half a century." Intelligence 32: 349-362.

Sundet, J. M., Borren, I. & Tambs, K (2008). "The Flynn effect is partly caused by
changing fertility patterns." Intelligence 36: 183-191.

Tashakkori, A. H., S and Yousefi (1988). "Effects of Pre-school Education on


Intelligence and Achievement of a Group of Iranian Elementary School
Children." International Review of Education 34: 499-508.

Teasdale, T. W. & Owen, D. R. (1987). "National secular trends in intelligence and
education: a twenty-year cross-sectional study." Nature 325: 119-121.

Teasdale, T. W. & Owen, D. R. (1989). "Continuing secular increases in intelligence and a
stable prevalence of high intelligence levels." Intelligence 13: 255-262.

Teasdale, T. W. & Owen, D. R. (1994). "Thirty year secular trend in the cognitive abilities of
Danish male school leavers at a high educational level." Scandinavian
Journal of Psychology 35: 328-335.

Teasdale, T. W. & Owen, D. R. (2000). "Forty-year secular trends in cognitive abilities."


Intelligence 28: 349-362.

Teasdale, T. W. & Owen, D. R. (2008). "Secular declines in cognitive test scores: a reversal of


the Flynn effect." Intelligence 36: 121-126.

Terman, L. M. (1916). The Measurement of Intelligence. New York: Houghton


Mifflin.

Thorndike, E. L. (1910). Educational Psychology. New York: Houghton Mifflin.

Thorndike, R. a. H., E (1977). Measurement and Evaluation in Psychology and


Education. New York, john Wiley and Son, Inc.
Thorndike, R. L. (1977). "Causes of IQ decrements." Journal of Educational
Measurement, 14: 197-202.

Thurstone, L. L. (1938). Primary Mental Abilities. Chicago, University of Chicago
Press.

Tuddenham, R. D. (1948). "Soldier intelligence in World Wars I and II." American
Psychologist 3: 54-56.

Turner, S. M., DeMers, S. T., Fox, H. R., & Reed, G., M. (2001). "APA's Guidelines
for Test User Qualifications: An Executive Summary." American
Psychologist 56(12): 1099-1113.

Tulkin, S. a. N., J (1968). "Social Class, Race and Sex Differences on the Raven
(1956) Standard Progressive Matrices." Journal of Consulting and Clinical
Psychology 32: 400-406.

Tully, G. E. (1967). "Test-retest Reliability of the Raven Progressive Matrices Test
(form 1938) and the California Test of Mental Maturity, Level 4 (S-F
1963)." Florida Journal of Educational Research 9: 67-74.

Tyler, L. & W., W. (1979). Tests and Measurements. London, Prentice-Hall
International, Inc.

U.S. Department of Education, O. f. C. R. (2000). The Use of Tests as Part of High-
Stakes Decision-Making for Students: A Resource Guide for Educators and
Policy-Makers.

Urbach, P. (1974). "Progress and degeneration in the "IQ debate"." British Journal of
the Philosophy of Science 25: 99-135, 235-259.

Van den Broek, M. & Bradshaw, C. (1994). "Detection of Acquired Deficits in General
Intelligence Using the National Adult Reading Test and Raven’s Standard
Progressive Matrices." British Journal of Clinical Psychology 33: 509-515.

Vejleskov, H. (1968). "An Analysis of Raven Matrix Responses in Fifth Grade


Children." Scandinavian Journal of Psychology 9: 177-186.

Vincent, K. & Cox, J. (1974). "A Re-Evaluation of Raven’s Standard Progressive
Matrices." Journal of Psychology 88: 299-303.

Vernon, P. (1960). Intelligence and Attainment Test. London, University of London


Press.

Vernon, P. (1969). Intelligence and Cultural Environment. London, Methuen.

Vernon, P. E. (1942). The Reliability and Validity of the Progressive Matrices Test.
London, Admiralty Report.

Virgolim, A. M. R. (2005). Creativity and intelligence: A study of Brazilian gifted and
talented students. PhD thesis, University of Connecticut.

Vroon, P. (1987). " Models of Educational Career with and Without IQ


Measurements." The Journal of Psychology 121: 273-279

Vroon, P. d., J. and Meester, A. (1986). "Distribution of Intelligence and Educational


Level in Fathers and Sons." British Journal of Psychology 77: 137-142.

Wechsler, D. (1975). "Intelligence Defined and Undefined: A Relativistic
Appraisal." American Psychologist 30: 135-139.

Weede, E. K., S (2002). "The impact of intelligence and institutional improvements


on economic growth." Kyklos 55: 361-380.

Wesson, K. A. (2000). "The Volvo effect-Questioning standardized tests." Education


Week 20: 34-36.

Wheeler, L. R. (1942). "A comparative study of the intelligence of East Tennessee
mountain children." Journal of Educational Psychology 33: 321-334.

Whorton, J. & Karnes, F. (1987). "Correlation of Stanford Binet Intelligence Scale Scores


with Various other Measures Used to Screen and Identify Intellectually
Gifted Students." Perceptual and Motor Skills 64: 461- 462.

Whorton, J. & Karnes, F. (1988). "Comparison of the 1979 and the 1986 Norms on the
Standard Progressive Matrices for Economically Disadvantaged Students:
Implication for Identification of Gifted Children." Perceptual and Motor
Skills 67: 749-750.

Williams, W. M. (1998). Are we raising smarter children today? School and home
related influences on IQ. In U.Neisser (Ed) The Rising Curve. Washington,
DC, American Psychological Association.

Wolf, M. (1986). Meta-Analysis: Quantitative Methods for Research Synthesis. New
Delhi, Sage Publications, Inc.

Yoon, S. N. (2005). Comparing the Intelligence and Creativity Scores of Asian
American Gifted Students and Caucasian Gifted Students. PhD thesis,
Purdue University, pp. 2-3.

Yonghua, S. (1991). "Report of using Raven's Standard Progressive Matrices in deaf


children." Acta Psychologica Sinica 23(1): 107-112.

Young, H. T., R.; Tesi, G. and Montemagni, G (1962). "Influence of Town and
Country Upon Children’s Intelligence." British Journal of Educational
Psychology 32: 151-158.

Yousefi, F. S., A.; Razavich, A.; Mehryar, A.; Hosseini, A. and Alborzi, S (1992).
"Some Normative Data on the Bender Gestalt Test Performance of Iranian
Children." British Journal of Educational Psychology 62: 410-416.

Zeidner, M. (1988). "Sociocultural Differences in Examinees’ Attitudes Toward


Scholastic Ability Exams." Journal of Educational Measurement 25: 67-76.

Appendix 1

Standard Progressive Matrices: Percentiles for Libyan sample.


[Table: percentile equivalents of Standard Progressive Matrices raw scores from 6 to 60, tabulated separately for each age group in the Libyan sample; each cell gives the percentage of the age group scoring at or below the given raw score.]
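As an illustration of how a conversion table of this kind is consulted in practice, the short Python sketch below returns the percentile rank corresponding to a raw SPM score for a given age group. The cut-off values in the sketch are hypothetical placeholders, not the Libyan norms reported in this appendix.

```python
from bisect import bisect_right

# Hypothetical cut-offs (not the Libyan norms): for each age group, a list of
# (raw score, percentile rank) pairs in ascending raw-score order.
NORMS = {
    10: [(17, 5), (22, 10), (29, 25), (36, 50), (42, 75), (46, 90), (48, 95)],
    14: [(28, 5), (33, 10), (39, 25), (44, 50), (48, 75), (51, 90), (53, 95)],
}

def percentile_rank(age, raw_score):
    """Return the highest tabled percentile whose cut-off score does not exceed raw_score."""
    table = NORMS[age]
    cutoffs = [score for score, _ in table]
    i = bisect_right(cutoffs, raw_score)
    return table[i - 1][1] if i > 0 else 0  # 0 = below the lowest tabled cut-off

# Example: a 10-year-old with a raw score of 40 is at or above the 50th
# percentile cut-off but below the 75th, so the function returns 50.
print(percentile_rank(10, 40))
```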

 
 
Appendix 2
Smoothed 2007-2008 Norms for Libya in the Context of Published International Data

[Tables: Libyan raw scores at the 95th, 90th, 75th, 50th, 25th, 10th and 5th percentiles, set alongside the corresponding published norms from the 1989 Taiwan data (ages 9-12), the 1992 India data (ages 11-15), the 1992 Netherlands data (ages 8-12), the 1998 France data (ages 8-12), the 1993 Turkey data (ages 8-14), the 1987 Kosice (Slovakia) data, the 1979 and 1992 British data (ages 8 to 18-21), the 1986 Australia data (ages 8-17), the 1986 China data (ages 8 to 18-21), the 1979 and 1992 United States data (ages 8 to 18-21), and the 1998 Slovenia data (ages 8-21).]
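For cross-national comparisons of the kind tabled above, a raw score can be placed on the conventional IQ metric by converting its percentile rank in a reference standardisation to a normal-curve equivalent. The sketch below, using only Python's standard library, shows one way this conversion can be done; the percentile value in the example is illustrative, not taken from the tables.

```python
from statistics import NormalDist

def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Convert a percentile rank (0-100, exclusive) to an IQ-scale score,
    assuming scores are normally distributed in the reference population."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + sd * z

# Example: a raw score falling at the 25th percentile of a reference
# standardisation corresponds to an IQ of roughly 90 (z of about -0.67).
print(round(percentile_to_iq(25)))  # -> 90
```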
