THE PROMETHEUS SOCIETY
Ignis Aurum Probat

1998/99 Membership Committee Report

Officers (at the time of the report): Fred Vaughan, President; Guy Fogleman, Ombudsman

© The Prometheus Society, P.O. Box 24513, Federal Way, WA 98093 USA. All rights reserved.

Edition: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th

Extra copies of the report may be purchased by remitting the indicated purchase price to the Treasurer.

The Prometheus Society Membership Committee Report is copyrighted by the Prometheus Society. Neither this report nor any portion thereof shall be republished in any form without the express permission of the Society as indicated in writing by the President. By previous agreement of the members of this committee (including all officers of the Prometheus Society) with Darryl Miyaguchi (also one of our committee members), Darryl will have rights to publish this report on his website, with all other rights and privileges being retained exclusively by the Prometheus Society.

This material is presented for the membership of the Prometheus Society in determining whether to accept the recommendation resulting from the deliberations of the 1998/99 Membership Committee. Neither the Prometheus Society nor the Membership Committee warrants this material beyond its intended use. We do not maintain that there are no errors in this document. It is simply the best that we could do within the limitations of time and resources that were available to us.

Prometheus web site: http://prometheussociety.org
Report on-line: http://prometheussociety.org/merept/mems comm rept html

For the User Identification and Password to restricted areas of the web site where the Membership Committee Report is available, you may contact the Web Site Coordinator, Fredrik Ullén.

TABLE OF CONTENTS

I. Purpose
II. Appointments to the Committee
III. Authority and Role of the Membership Committee
IV. Committee Operating Procedures
V. Recommendation
VI. Action Items
VII.
Schedule
VIII. Issues to be Addressed
8.1 Operating Definition of Prometheus Society Entry Conditions
8.2 Definition of the Scope of Membership Committee Evaluations
8.3 Review of Historical Entry Criteria
8.3.1 Assessment of Current Prometheus Membership Intelligence Credentials
8.3.2 Review of Compromise and Erosion Threats
8.3.3 Surveys of Capabilities and Comparisons
8.4 Review of Norming Analyses of Currently Accepted Tests
8.4.1 Mega Test
8.4.2 Mega27 Test
8.4.3 Titan Test
8.4.4 Langdon Adult Intelligence Test
8.5 Scholastic Aptitude Test
8.5.1 Background Data
8.5.2 SAT Data Correlations with IQ
8.5.3 Cautionary Notes and Considerations
8.5.4 SAT Intelligence Filter
8.5.5 Ability of SAT to Discriminate at the High End
8.5.6 Establishing 1-in-30,000 Cut for "Old" SAT
8.6 Additional Alternative Tests
8.6.1 Mensa Testing Approaches
8.6.2 Cattell Culture Fair III
8.6.3 Raven's Advanced Progressive Matrices
8.6.4 California Test of Mental Maturity
8.6.5 Graduate Record Examination
8.6.6 Miller Analogies Test
8.6.7 Wechsler Adult Intelligence Scale -- Revised
8.6.8 Stanford-Binet Intelligence Scale
8.6.9 Concept Mastery Test
8.7 Chronometric Testing
8.7.1 Some Background on Chronometrics
8.7.2 Correlation of Chronometric Measurements and Intelligence
8.7.3 Additional References
8.7.4 Thinkfast™, the Game
8.7.5 Thinkfast™, the Game as a Psychometric Instrument
8.7.6 The Selective Filter Involved in Thinkfast™ Score Reporting
8.7.7 Discussion of Perceived Problems with Thinkfast™
8.7.8 Thinkfast™ Ability to Discriminate at the 1-in-30,000 Level
8.7.9 One Year Trial Recommendation
8.8 Development of Unique -- Elo-Like Scoring
8.9 Explore Combinational Approaches
8.10 Review Phrasing of Intelligence Claims in Prometheus
IX. Definition of Terms
X. Mathematical Concepts and Methods
Appendix
XI. Membership Committee Resume Data
XII. References
Books
Articles
Web sites and web pages

FIGURES

Figure 1: Difficulty of Compromised Mega Problems
Figure 2: Mega vs.
SAT Score Correlation
Figure 3: Equipercentile Equating of Mega and SAT
Figure 4: Correlation of Mega vs. Other Tests' Score Pairs
Figure 5: Equipercentile Equating of Mega and GRE
Figure 6: Equipercentile Equating of Mega and CTMM
Figure 7: Mega48 IRT Test Scoring
Figure 8: Distribution of Mega Test Raw Scores for Sixth Norming
Figure 9: Mega IQ-Scaled Distribution (actual, predicted, general population) and filter
Figure 10: Mega IQ-Scaled Distribution (actual and predicted) -- log scale
Figure 11: Correlation of Score Pairs of Mega27 and Mega48
Figure 12: Mega27 IRT Test Norming
Figure 13: Mega (48-item) Test Scoring -- Traditional vs. Maximum Likelihood
Figure 14: Mega27 Test Scoring -- Traditional vs. Maximum Likelihood
Figure 15: Titan vs. Mega (48-item) Correlation of Score Pairs
Figure 16: Titan vs. Mega Equipercentile Equating
Figure 17: LAIT vs. Mega (48-item) Correlation of Score Pairs
Figure 18: SAT (Verbal Plus Mathematical Parts) Frequency Data
Figure 19: Population Distributions for the SAT (general, actual, predicted)
Figure 20: SAT Actual and Predicted Distributions -- log scale
Figure 21: SAT Discrimination Capabilities (Test 1)
Figure 22: SAT Discrimination Capabilities (Test 2)
Figure 23: GRE Equipercentile Equating with SAT for Reported Score Pairs on Mega
Figure 24: GRE Correlation with MAT for 1341 Score Pairs
Figure 25: Extent of Data for CMT
Figure IX.1: A Normal Distribution
Figure X.1: Illustration for Confidence Interval Determination
Figure X.2: Difficulty Profile for Problem #11 on the Mega

TABLES

Correlations of 19 Tests with Mega
Mega "Verbal vs.
Non-verbal" Factor Analysis
APM General Population Percentile by Age Group
APM General Population Percentile by Age Group -- Extended to 4 sigma
APM Norms for Various Occupation Groups -- mostly UK
APM Untimed Smooth Summary Norms for USA
GRE Percentiles for Filtered Population
WAIS-R Regression Equations for the Full Scale IQ
Standardization Sample for Stanford Binet

I. PURPOSE

The deliberations of the 1998/99 Prometheus Society Membership Committee had several purposes. One was to address concerns about the leakage over the Internet of information on tests accepted for qualification for entry to the Society. Another was to investigate the possibility of including a broader scope of tests of cognitive ability while maintaining the Prometheus Society selection criterion of the 99.997 percentile (1-in-30,000) of the general population. A third was to analyze the current entry criteria and accepted tests to determine whether the 1-in-30,000 criterion is being maintained by all. Some of these issues were identified by Kevin Langdon in "Admission Standards" (Gift of Fire, Issue 99, 7, September 1998).

As outlined by the chairman, Fred Vaughan, in "The Membership Committee and Its Charter" (Gift of Fire, Issue 100, 6, October 1998), it has been our objective to have a recommendation to the general membership of the Prometheus Society by the deadline for publication in Gift of Fire issue #102 (submission deadline January 9, 1999), with balloting to take place in issue #104. It has also been our intent from the outset to produce a report that will be available to members and nonmembers who wish to scrutinize the membership entry requirements of the Prometheus Society; we hope thereby to eliminate disputes -- or at least to provide data to make such debates more meaningful.

The academic literature on psychological testing or psychometrics is now huge.
No concerted attempt was made to make a comprehensive review of this literature, but see, for example, the following recent works to get a flavor of this field: Benbow & Stanley (1996), Van der Linden (1996), Nunnally & Bernstein (1994), Murphy (1997), Janda (1998), Fischer & Molenaar (1995), Kline (1993 and 1998), Crocker & Algina (1986). We do not claim that our results are indisputable nor that there are no flaws or oversights in the analyses presented here. We present this as a start in what must be a continuous process of maintaining the integrity of our entry criteria.

This report attempts to address concerns such as that expressed by James Harbeck in his brief note entitled "Questions Concerning the Membership Committee" (Gift of Fire, Issue 83, March 1997). The membership requires more than a recommendation -- they require information in order to know whether to support that recommendation. We think we have provided that data.

II. COMMITTEE APPOINTMENTS

The President and Membership Officer are constitutionally installed members of the Prometheus Society Membership Committee, as described in section III of the Prometheus Society constitution, duplicated in section III of this report below.

FRED VAUGHAN, President and Chairman
ROBERT DICK, Membership Officer

The following additional Membership Committee members have been appointed by the President because of their specific expertise in areas pertinent to the problems of evaluating the quality of psychometric tests and their associated statistics:

FRED BRITTON
GUY FOGLEMAN
GREG GROVE
GINA LOSASSO
BILL McGAUGH
DARRYL MIYAGUCHI
FREDRIK ULLÉN
HEDLEY ST. JOHN-WILSON

The resumes of these individuals are provided at the back of this report. We feel fortunate in obtaining the services of such highly qualified people. We appreciate the constructive participation of our Membership Officer, Robert Dick, which was unfortunately limited by serious medical problems.
Robert has asked that we print the following statement:

"I have been a Constitutionally mandated member of the Membership Committee. In that capacity I have supplied member score data sanitized so the names cannot be identified. Due to personal illness, among other reasons, that is about all of my contribution. Accordingly I cannot claim for myself the honor of being an author of the Committee report. My hat is off to the many expert and dedicated members who deserve both the honor and the responsibility for the report."
Robert Dick

The chairman hereby signs this report on behalf of the other members of the committee.

Fred Vaughan, Chairman, Membership Committee
President, Prometheus Society

III. AUTHORITY AND ROLE OF THE MEMBERSHIP COMMITTEE

The role and authority of the Membership Committee, as well as of the President and chairman in their respective capacities on this committee, are defined in the constitution of the Prometheus Society as follows:

"ARTICLE III - QUALIFICATIONS FOR MEMBERSHIP
1. All members of the Prometheus Society as of December 1, 1996 are presumed to have satisfied the membership requirements.
2. Membership in the Prometheus Society is open to anyone who can provide satisfactory evidence of having received a score on an accepted IQ test that is equal to or greater than that received by the highest one thirty-thousandth of the general population. An accepted IQ test is defined as an IQ test that the Society has determined to be acceptable for admission purposes.
3. The President shall appoint a Membership Committee to rule on the acceptability of various IQ tests, to determine what minimum scores on each test qualify for admission, and to periodically review and make recommendations on admission standards in general.
4.
The committee shall consist of the President, the Membership Officer, and at least three other members such that a majority of the other members are recognized as having experience in the field of psychometrics.
5. The committee shall propose to the membership specific guidelines on tests and test scores for the Membership Officer to follow. Upon ratification of these guidelines by membership vote as specified in Article IX, they shall become binding on the Membership Officer."

In Article X.3 it says,

"3. The President shall act as the coordinator of activities of The Prometheus Society, answer inquiries which are not within the jurisdiction of the other officers, and be the official representative of the Society to the public. The President may appoint members individually or to a committee for the purpose of carrying out various functions. Appointed members serve at the discretion of the President. A committee chair shall report to the President unless otherwise directed by the President."

And in Article X.11 it further elaborates that

"11. An officer must be a member of the Society. However, the President may appoint non-members to perform certain functions such as an expert to supervise testing, an attorney to represent the Society in legal matters, an accountant to audit the books, etc."

We believe that we have acted in accordance with the intent of the constitution in every aspect of our proceedings.

IV. COMMITTEE OPERATING PROCEDURES

The committee performed its business primarily over the Internet, using e-mail messages that were routed only to other members of the Membership Committee except as authorized specifically in writing by the chairman. (This was felt to be particularly important because we would be discussing topics that could compromise the tests accepted for qualification to the Society.)
Web site files were also used, but if they pertained to the Membership Committee exclusively, they were either password protected or their URLs were not disclosed outside of the committee except as authorized by the chairman.

Individual Membership Committee members have interacted among themselves at their own discretion, but only information routed to all Membership Committee members was considered for inclusion in this final report, including the recommendation to the general membership for balloting. Individual members, or Membership Committee splinter groups defined by the chairman to perform specific tasks, have reported their findings for discussion by the entire committee. Discussions of specific problems, of whether they were to be considered compromised based on answers circulated, and of where such data is available typically involved only a subset of the committee.

Each step (agenda item) in the deliberation process was documented by the chairman or his designee using materials generated by the Membership Committee, and the results were routed for comment and consensus. No item was closed out until every member of the Membership Committee had been given a reasonable opportunity to review and comment upon it. This required a 24-hour minimum per item to accommodate our worldwide Membership Committee membership. Membership Committee members checked their e-mail regularly and responded to those items for which they had specific interest or concern. (They were encouraged to notify the chairman if they would be out of contact for more than 24 hours. An effort was made to keep consensual actions from occurring on weekends.) Requested delays prior to concluding a Membership Committee decision were honored without exception.
Those requesting delays were asked to provide specific rationale and/or the data that the requester wished the rest of the committee to consider.

Decisions identified as being made by the chairman (other than the appointments to the committee) have been consensus positions wherever possible. The chairman acted primarily as a focal point of that consensus to reduce chaos. Procedures were subject to modification as we went along, but the procedures documented here were essentially the procedures that we followed throughout.

Specific positions argued and quotations of individuals during the deliberation of the Membership Committee will remain confidential. Detailed rationale for all recommendations of the committee are provided in this final report signed by all committee members. A pledge of confidentiality for the discussions in deliberation was a prerequisite for continued appointment to this committee.

It was decided that a single consensus position would be incorporated into this report if such a consensus could be obtained. If more than a single individual shared a position counter to the consensus, that position is summarized in the report as well, subject only to the desires of those sharing the position.

Intellectual rights to publication of material generated as a part of the deliberations of this committee belong to the individual or individuals who generated the material, but publication must be approved by the committee, as expressed in writing by the chairman, to assure the following: 1) all individuals who contributed to the material to be so published shall be cited if they so desire, and 2) no data contained in the material to be published shall compromise Prometheus Society entry criteria.

Agreement to these operating conditions has been a prerequisite for continued appointment to this committee. Concurrence with these conditions is tacit in a member's not having notified the chairman of a wish to resign appointment.

V.
RECOMMENDATION

The following recommendation of this Membership Committee has been printed in issue #102 of the Gift of Fire (submittal deadline 9 January, 1999), which was mailed out together with a hardcopy of this report to all hardcopy members of the Prometheus Society and hardcopy subscribers of record to the Gift of Fire. On-line members and subscribers have been notified that the report is available on-line.

5.1 Statement of Recommendation

We on the Membership Committee are proud to present to the Prometheus Society our proposal for revised entry requirements to the Society. It is our considered opinion that this recommendation, if adopted by the membership, will be in the best interest of this Society and its members. Our recommendation is as follows:

Entry into the Prometheus Society based on a Mega or Titan score shall no longer be allowed after the date of issuance of the issue of Gift of Fire in which acceptance of this recommendation is indicated to have been ratified by the membership. Anyone having secured a raw score of 36 on either of these tests dated before that date shall be entitled to rights and privileges of the Society.

Anyone with a score of 164 or greater on the LAIT scored before December 31, 1993 shall be entitled to rights and privileges of the Society.

Anyone with a score of 1560 on the "old" SAT (taken before April 1, 1995) shall be entitled to rights and privileges of the Society.

Anyone with a score of 1610 on the "old" GRE (taken before October 1, 1981) shall be entitled to rights and privileges of the Society.

Anyone with a score of 98 on the MAT shall be entitled to rights and privileges of the Society.

Anyone with a raw score of 88 on the Cattell Culture Fair III (A+B) obtained at an age of 16 years or older shall be entitled to rights and privileges of the Society.

Anyone with a score of 160 on the WAIS-R obtained at an age of 16 years or older shall be entitled to rights and privileges of the Society.
Anyone with a score of 21 on the Mega27 shall be entitled to rights and privileges of the Society, if a validated accompanying score on an accepted test demonstrating 1-in-1,000 cognitive ability according to that test is provided to the Membership Officer along with proof of the Mega27 score.

And, for a trial period of one year:

Anyone with a validated score of Brain Master +11 on the chronometric battery provided by Thinkfast™ obtained at an age of 16 years or older shall be entitled to rights and privileges of the Society, if a validated accompanying score on an accepted test demonstrating 1-in-1,000 cognitive ability is provided to the Membership Officer along with proof of the Thinkfast™ score. After one year, the following data will be used to determine whether to retain the test permanently, extend the trial period or discontinue this test as an entry requirement to the Society:
1. numbers of applicants to the Society who use this Thinkfast™ test criterion,
2. accompanying scores on standard tests of applicants who use this Thinkfast™ test criterion,
3. additional statistics available on high scores of Thinkfast™ participants,
4. our increased understanding of Thinkfast™ as a chronometric/psychometric instrument.

Where a score of 1-in-1,000 is required, it must be on one of the following tests: ACT (32), "Old" SAT (1450), "New" SAT (1520), GRE (1460), GRE Analytical (760), MAT (85), Stanford Binet IV (149), Wechsler Intelligence Scales (146), Cattell CF III (A+B) (149), Cattell Intelligence Test Scale IIIB (173 old norming), RAPM II (150), Mensa Admission Tests (149), Cognitive Abilities Test (CAT) (149).

5.2 Rejection of the recommendation

If our recommendation is rejected by a majority of voters, the Prometheus Society will retain the entry requirements established by vote in 1997.

VI.
ACTION ITEMS

We have accepted the following outstanding items that we recommend for further action.

6.1 Obtain written agreement with Ron Hoeflin on Mega27

Firm up agreement in principle with Ron Hoeflin on scoring procedures and application processing for the Mega27 test. Also obtain written specifications of how profile data is to be handled by the Membership Officer.

6.2 Evaluate Titan test

We have agreement in principle with Ron Hoeflin to obtain data for 500 individuals who have taken the Titan test. We must perform analyses for the Titan similar to those which gave rise to the Mega27, to avoid compromised problems. Also solidify norming for the Titan.

6.3 Consider the relationship of age and intelligence

There are a couple of aspects of IQ variations with age that must be considered in some depth with regard to our entry requirements:
1. whether to allow test results for individuals under 16 years of age, and
2. whether to consider an age profile (particularly applicable to those over 30 years of age) for intelligence criteria.

6.4 WAIS subtest qualification possibilities

6.5 Evaluate results of one-year trial period of use of ThinkFast to qualify applicants for Prometheus

After one year, the following data will be used to determine whether to retain the test permanently, extend the trial period or discontinue this test as an entry requirement to the Society:
1. numbers of applicants to the Society who use this Thinkfast™ test criterion,
2. accompanying scores on standard tests of applicants who use this Thinkfast™ test criterion,
3. additional statistics available on high scores of Thinkfast™ participants,
4. our increased understanding of Thinkfast™ as a chronometric/psychometric instrument.

6.6 Investigate tests in other languages, including translations of English tests

VII. SCHEDULE

The following are the dates and accomplishments that we had originally scheduled. We were considerably off-schedule from time to time, but it did nonetheless draw us back to reality.
We feel that we have accomplished the pressing tasks that were before us.

10/09/98 Submittal of individual MC member synopses (or scuttle decision)
10/10/98 Acceptance of an agenda and operating procedures
10/10/10 Definitions of applicable terminology
10/15/98 Descriptions of applicable mathematical methods
11/10/98 Review of our currently accepted tests and possible erosion
12/10/98 Review of "alternative" tests, Mensa-monitored, SAT, GRE, etc.
12/28/98 Review of chronometric test proposals
01/02/99 Considerations of composite criteria
01/09/99 Review of constitution and creation of MC recommendation
01/15/99 Publication of Report

VIII. ISSUES TO BE ADDRESSED

8.1 Operating Definition of Prometheus Society Entry Conditions

Entry criteria for membership in the Prometheus Society are based on verifiable claims of a particularly high level of intelligence. A 99.997 percentile, or 1-in-30,000 of the general population, has been maintained as the goal; the accuracy with which we have been able to meet that goal in the past and intend to maintain in the future is discussed as a part of the analyses of this report.

With regard to the question, "What is the intelligence that should be assessed at this level?" we have been somewhat reticent to assert an answer. In other words, we have waffled somewhat on whether it is a fluid intelligence factor (spatial/abstract reasoning) or a crystallized intelligence factor (accumulated knowledge and verbal skills). A consensus of the Membership Committee believes it should be the former. Hedley St. John-Wilson gives the evidence for a general factor in his article, "The Scientific Evidence Behind 'General Intelligence' Tests" (Gift of Fire, Issue 95, January 1998). However, the Membership Committee is quite divided on whether the "fluid intelligence factor" is a single biologically-based capability or many.
It is also divided on the ability of individual tests to effectively discriminate between a single general capability and a combination of many specific mental capabilities. The articles "What is this thing called 'g' or Gee, what is this thing called?" (Gift of Fire, Issue 80, November 1996) and "What Intelligence is...isn't...is too!" (Gift of Fire, Issue 82, February 1997) by Robert Low, Ronald Penner's "Gee, Maybe There's More to 'g'" (Gift of Fire, Issue 82, February 1997) and "Discussion of the Central Limit Theorem as Applied Specifically to Overall Intelligence" (Gift of Fire, Issue 82, February 1997) by Fred Vaughan all address this debate. These conceptual and philosophical disputes involve medical, anatomical, psychological and genetic expertise which have not been adequately represented on our team. See, for example, Fredrik Ullén's article, "The Multiple Biological Correlates of g" (Gift of Fire, Issue 100, October 1998), David Roscoe's "Group IQ Tests" (Gift of Fire, Issue 81, January 1997) and Fred Britton's "Is There a Physical Substrate to Intelligence" (Gift of Fire, Issue 83, March 1997).

We, therefore, have decided to restrict our assessment of testing capabilities to the statistical validity of accepted psychometric instruments: their ability to correlate well with other accepted instruments and to discriminate individuals at the 1-in-30,000 level. We recognize that individuals selected via different tests may differ in their thinking abilities accordingly, but each will have satisfied the ostensible requirement of being in the top 1-in-30,000 of the general population with respect to cognitive abilities measured by one of these tests.
It is generally agreed that the general intelligence factor ("g") will influence performance across the spectrum of cognitive abilities measured by such tests and will result in at least a moderate g loading (~0.5 - 0.6) for an accepted test. The Membership Committee is also in agreement that "1-in-30,000" rather than "4 sigma" is our target, since the former claims nothing with regard to the distribution of the population as assessed by the test, restricting its emphasis to the rarity of individuals in this category.

The issue of age restrictions for entry to the Prometheus Society has been discussed, and it has seemed reasonable to us at this time not to accept individuals under the age of 16 years, although we are somewhat split with regard to the age limit for scores that can be allowed. We think this issue should be addressed at a later time when there is more time to fully evaluate the data -- we have taken such an action item. Our current recommendation of the 16 years of age limitation derives in part from our concern with regard to what might otherwise give rise to restrictions on subject matter in the journal. It is also related to concerns that too early testing has been shown in many cases to significantly overestimate intelligence. See, for example, Michael Colgate's article, "P's and Q's of Intelligence" (Gift of Fire, Issue 97, July 1998), in which he presents cogent arguments suggesting that another aspect of intelligence, which he calls "precociousness" and which applies exclusively to younger children, may render rather unrealistic IQ scores on tests taken at a young age. Also, Sare presents arguments and predictions that discount Stanford-Binet scores for younger individuals. (See http://www.brain.com/bboard/read/iq-archive2/2351.)
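The distinction drawn above between "1-in-30,000" and "4 sigma" can be made concrete with a short calculation. The following is only an illustrative sketch, not part of the committee's analysis: it assumes an exactly normal (Gaussian) score distribution, which is precisely the assumption the committee declines to rely on, and the function names are hypothetical. Under that assumption a 4 sigma cutoff corresponds to a rarity of roughly 1 in 31,600, close to but not identical with 1 in 30,000.

```python
import math
from statistics import NormalDist


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable (assumed distribution)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rarity(z: float) -> float:
    """Express the upper-tail probability at z as N in '1 in N'."""
    return 1.0 / upper_tail(z)


def z_for_rarity(one_in: float) -> float:
    """z-score whose upper tail is 1/one_in, again assuming normality."""
    return NormalDist().inv_cdf(1.0 - 1.0 / one_in)


if __name__ == "__main__":
    print(f"4 sigma corresponds to about 1 in {rarity(4.0):,.0f}")
    print(f"1 in 30,000 corresponds to about {z_for_rarity(30_000):.3f} sigma")
    print(f"1 in 30,000 as a percentile: {100 * (1 - 1 / 30_000):.4f}")
```

If the tail of a real score distribution is heavier or lighter than Gaussian, the sigma figure changes while the rarity target does not, which is the reason given above for stating the criterion as a rarity.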
If the recommendations of this committee are approved, we will begin a new era with new members joining based on a diverse spectrum of psychometric instruments, but each with credentials establishing him or her at the 1-in-30,000 level of capabilities as measured by a particular instrument. Many of these tests (in fact most standardized tests of mental ability) make no claims for being able to discriminate intelligence beyond 150 IQ. Our acceptance is based on frequency data indicating that the rarity of 1-in-30,000 is attained, independent of the particular intelligence claims made for that distinction. We have opted in all cases to base our recommendations on as solid a factual foundation in available data as possible and not on the claims of developers and/or distributors -- nor yet the detractors -- of these instruments.

It is of particular interest that in Joseph Matarazzo's book, Wechsler's Measurement and Appraisal of Adult Intelligence (5th ed., 1972), he attributes lowered ceilings to intentional acts based on presumptions by the test developers themselves of a lack of utility of intelligence above the 150 IQ level.

"The lower ceiling of the W-B (Wechsler-Bellevue) and the WAIS is no accident but represents the author's deliberate attempt to eschew measuring abilities beyond points at which he feels they no longer serve as a valid measure of a subject's general intelligence. IQ's of 150 or more may have some discriminative value in certain fields, such as professional aptitude, however, it is only partially related to general intelligence. Exceptional intellectual ability is itself a kind of special ability."

So with the Wechsler test we have a case in which experts conflict -- the experts who believe the regression tables for the WAIS-R are valid, and David Wechsler, the author of the test, who deliberately truncated his scale at 150. This somewhat cynical presumption that what is good up to a point is not so good beyond should be a rallying cause for this Society.
If our recommendations are accepted, the Mega27 test (a subset of the Mega test defined by this Membership Committee, for which the developer has agreed to provide scoring capabilities) will be the only existing tie to former testing methodologies and Prometheus Society entry qualification criteria. We also have an agreement in principle with the developer of the Titan test in which he has expressed willingness to provide data with which we may perform analyses similar to what has been done to obtain the Mega27 test. We have taken an action item to perform such analyses so that, hopefully, a version of the Titan test can be reinstated among our recommended tests. The elimination of formerly accepted tests has not been intended to discredit former methodologies and entry criteria; rather, it is a requirement imposed by compromises that have occurred to these previously accepted tests. We are hopeful that we will be able to provide similar capabilities in the future.

8.2 Definition of the Scope of Membership Committee Evaluations

In its recommendation, the Membership Committee has acted to maintain the integrity of the Prometheus Society entry criteria and enable continued enrollment into the Society to anyone whose credentials can be verified as meeting those criteria.

The Prometheus Society will be forced to reevaluate the specifics of its entry criteria whenever new information emerges and is made available to the Society concerning any of the following:
1.
New tests become available (or are presented to our attention) for which normative data is available to justify claims of their being able to effectively discriminate at the 1-in-30,000 level.
2. New normative data has become available for existing tests which supports discrimination at the 1-in-30,000 level.
3. An accepted test is retired for any reason.
4. The integrity of an accepted test is found to have been compromised, either by having been incorrectly normed in the light of new evidence or by answers to its problems having been too widely distributed.

The scope of this job is indeed awesome. The number of psychometric tests claiming to assess intelligence is vast. Although many of these address ranges of intelligence or assessment ages we have determined to be inappropriate for entry requirements for the Prometheus Society, even determining these facts can be quite time consuming. The list that follows is but a sample from the world of IQ tests. On merely one of these, over a thousand papers have been written. We do not claim to have assessed many of these tests, but if provided with good reasons and access to good data, we think the Membership Committee should continue its search for additional valid tests.

A specific task before the Membership Committee was to determine whether any of the changes identified above have occurred with respect to accepted tests, which would necessitate entry criteria changes at this time. We think there have been, and we have acted to assure that reasonable means for entry to the Society are maintained.

To warrant that a test or methodology satisfies membership criteria, the Membership Committee has felt it appropriate to perform analyses to verify the following:

1. That samples used to validate and norm tests are of sufficient size and are sufficiently representative within the required constraints of intelligence filtering to substantiate legitimacy claims and establish valid norms for our purposes.
(Footnote 1) In order to set a 1-in-30,000 of the general population cutoff on a test, "good psychometric practice" would probably require that a generally accepted, highly g-loaded test be administered in a supervised manner to millions of individuals randomly selected from the general population. This wealth of data is not, nor will it probably ever be, available, so an alternative approach has to be employed. When used correctly, the quantification of intelligence filtering to assess the degree of selection on those who actually take the tests is a legitimate method that must be relied upon. See Vaughan's "Intelligence Filters" (Gift of Fire, Issue 79, October 1996). Similarly, extrapolation beyond traditionally accepted norms may in some cases be warranted, depending on the quality of the data and the degree to which it must be extrapolated.

2. That appropriate types of reliability estimates have been determined for the test.
3. That the necessary statistics have been used properly to compute these estimates.

Having a test with well-established general population percentiles at the 1, 2, or even 3 sigma points does not imply that the 4-sigma point corresponds to a rarity of 1-in-30,000: the score distributions may depart significantly from Gaussian at the extreme tails. Data on the shape of the score distributions for the standard IQ tests at the extreme tails is too thin to allow suppositions concerning the distribution to be used to establish the 1-in-30,000 cutoff. We have preferred actual numbers of high scoring individuals out of a known population whenever these data were available.
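As a rough illustration of the arithmetic behind the 1-in-30,000 figure, the sketch below computes the rarity that a purely Gaussian model would assign to a given IQ. The Gaussian shape is exactly the assumption the paragraph above cautions against at the extreme tails, so these numbers are an idealized reference point only. The 16-point standard deviation and the function names are assumptions made for this illustration.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # conventional IQ mean
IQ_SD = 16.0     # assumed 16-point standard deviation

def rarity_one_in(iq):
    """Return R such that about 1 person in R scores at or above `iq`,
    if (and only if) scores were exactly Gaussian."""
    upper_tail = 1.0 - NormalDist(IQ_MEAN, IQ_SD).cdf(iq)
    return 1.0 / upper_tail

def iq_for_rarity(one_in):
    """Return the IQ whose Gaussian upper-tail probability is 1/one_in."""
    return NormalDist(IQ_MEAN, IQ_SD).inv_cdf(1.0 - 1.0 / one_in)

if __name__ == "__main__":
    for sigma in (1, 2, 3, 4):
        iq = IQ_MEAN + sigma * IQ_SD
        print(f"{sigma} sigma (IQ {iq:.0f}): about 1 in {rarity_one_in(iq):,.0f}")
    print(f"1-in-30,000 under the Gaussian model: IQ {iq_for_rarity(30_000):.1f}")
```

Under this idealization the 4-sigma point and the 1-in-30,000 point nearly coincide; whether real score distributions behave this way at the tail is the open question discussed above.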
Some Available Psychometric Instruments (very few of which we have been able to consider)

ACT
American School Intelligence Test-High School Battery
American School Intelligence Test-Primary Battery
Analysis of Learning Potential-Advanced I Battery
Analysis of Learning Potential-Advanced II Battery
Arthur Point Scale of Performance Test
BAS -- British Abilities Scale
Black Intelligence Test of Cultural Homogeneity (BITCH)
California Short-Form Test of Mental Maturity
Cattell Culture Fair Intelligence Test-Scale 2&3
Chicago Non-Verbal Examination
Cognitive Abilities Test Form 5 1993
Counter Intelligence Test-Chitlings
Detroit General Intelligence Exam-Form A
Full Range Picture Vocabulary Test
GMAT
Graduate Management Assessment (UK)
Goodenough/Harris Drawing Test
GRE
Henmon/Nelson Test of Mental Ability
Henmon/Nelson Test of Mental Ability-College Level-Rev Ed
Hiskey/Nebraska Test of Learning Aptitude
Kuhlmann/Anderson Tests-8th Ed
Langdon Adult Intelligence Test (LAIT) Retired
Learning Efficiency Test-II (LET-II) 1992
Leiter International Performance Scale
Lorge/Thorndike Intelligence Tests
LSAT
MAT
MCAT
Mega Test
Oregon Academic Ranking Test
Otis/Lennon Mental Ability Test-Advanced Level
Peabody Picture Vocabulary Test 3rd Ed Form IIIA (PPVT-IIIA) 1997
Pintner/Cunningham Primary Test-Rev
Pressey Classification & Verifying Tests
PSR (Psychological Stimulus Response)
Quick Test
Raven Advanced Progressive Matrices -- Sets I & II
Ross Test of Higher Cognitive Processes
Slosson Full-Range Intelligence Test (S-FRIT) 1993
Slosson Intelligence Test (SIT-R)-Rev Ed 1990
SAT
SRA Pictorial Reasoning Test
SRA Primary Mental Abilities (PMA)
Standard Progressive Matrices
Stanford/Binet Intelligence Scale-4th Ed
Stanford Ohwaki/Kohs Block Design Intelligence Test for the Blind
System of Multicultural Pluralistic Assessment (SOMPA)
Test of Cognitive Skills 2nd Ed (TCS/2) 1992
Test of Nonverbal Intelligence 3rd Ed (TONI-3) 1997
ThinkFast (Chronometric)
Titan Test
Wechsler Adult Intelligence Scale-Rev (WAIS-R)
Wechsler Adult Intelligence Scale 3rd Ed (WAIS-III) 1997
Wide Range Intelligence & Personality Test (WRIPT)
Woodcock-Johnson Psycho-Educational Battery-Rev (WJ-R) 1989/90

8.3 Review of historical entry criteria

8.3.1 Assessment of Current Prometheus Membership Intelligence Credentials

The Prometheus Society was founded in 1982. Its initial constituency had all been members of the former Xenophon Society, which had entry requirements of 1-in-10,000 of the general population, an IQ of about 160. Notwithstanding, many of these initial members were qualified at the 1-in-30,000 level and beyond according to accepted psychometric instruments. The initial entry requirement, once Prometheus had been established, was set at the 1-in-30,000 level, which was incorporated into the Prometheus Society constitution.

There are currently 67 members of the Prometheus Society in good standing. Upwards of 150 to 200 people have been members at one time or another. By checking our current roster against the one first published in issue #2 of Gift of Fire (July 1984), shortly after the Society was formed, it has been determined that there are no more than 8 currently active members who could have been admitted under the Xenophon cutoff of 1-in-10,000. That assumes that no other Xenophon members who weren't active in July 1984 have since joined using their prior Xenophon membership as entry qualification. We believe that to be the case.

Within the constraint of 1-in-30,000, the specifics of membership criteria have changed over the years, with various tests and acceptance levels having been used that reflected that requirement. However, according to the Membership Officer's records, the current Prometheus Society average IQ according to LAIT, Mega, and Titan test normings (using each test independently or using all the data) is about 167. This is what would be expected statistically for a society with a 1-in-30,000 cutoff.
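The statement that an average of about 167 is what a 1-in-30,000 cutoff would produce can be checked with a short calculation. The sketch below is only a plausibility check under assumptions of our own choosing: scores are taken to be exactly Gaussian with mean 100 and standard deviation 16, and every person above the cutoff is taken to be equally likely to join. The function name is hypothetical.

```python
from statistics import NormalDist

def mean_iq_above_cutoff(one_in, mean=100.0, sd=16.0):
    """Expected IQ of people above a 1-in-`one_in` cutoff, assuming a
    Gaussian population (mean of the upper-truncated normal)."""
    std = NormalDist()                      # standard normal
    tail = 1.0 / one_in                     # fraction of people above the cutoff
    z_cut = std.inv_cdf(1.0 - tail)         # cutoff in standard deviations
    # E[Z | Z > z_cut] = pdf(z_cut) / P(Z > z_cut)
    return mean + sd * std.pdf(z_cut) / tail

if __name__ == "__main__":
    print(f"1-in-30,000 cutoff: expected mean IQ {mean_iq_above_cutoff(30_000):.1f}")
    print(f"1-in-10,000 cutoff: expected mean IQ {mean_iq_above_cutoff(10_000):.1f}")
```

The second line of output is included only for comparison with the former Xenophon cutoff mentioned above.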
Using data derived from the Membership Officer's records, the following further characterizations can be made. The average and median for the current and former members taking the LAIT in the 1978-79 time frame are the same as for the 1992-93 time frame (around 166-167). The average and median for the current and former members taking the Mega test in the 1984-85 time frame are about 1 point lower than for the members taking the Mega in the 1994-98 time frame (Mega average and median are around 37-38 in 1984-85, around 38-39 in 1994-98, and around 38-39 for 1998 alone). Differences over the years do not seem to be statistically significant. For the Mega test calculations, this result did not include scores below the current Prometheus Society cutoff of 36, and thus a conclusive result would require a comprehensive review of Dr. Hoeflin's scoring data for the respective years, for which the average is a raw score of 35.5.

8.3.2 Review of compromise and erosion threats and discussion of the appropriate reactions

A major reason for the current Membership Committee's urgency is the concern with regard to rumors that there have been significant compromises to our entry criteria tests. This aspect of our deliberations has been a priority, and we feel that we have obtained a good understanding of the threats and the actualities of compromises over the Internet and via other media. Our recommendations reflect that understanding.

Compromises to the Mega:

There have been several different types of answer distribution problems on the Mega test. The numbers of, and difficulty index associated with, problems that have been leaked in various categories from easiest to hardest are captured in the graphic below.
Also shown are the means whereby each problem has been compromised.

[Figure 1: Difficulty of Compromised Mega problems. Mega Test problems are arranged from easiest to hardest; the legend distinguishes four means of compromise: answer elicited by "fishing" on Usenet; answer can be found on the Internet; non-analogy answer which can be found in a book; answer was published in a national magazine.]

Although the graphic contains only two compromised problems at the highest difficulty, some feel that at least three of the spatial/numerical problems have published solutions in Martin Gardner's books and/or other puzzle classics (references needed). Ron Hoeflin denies that they originated at that source.

The five hardest problems that have been leaked are the ones that would result in the most significant impact on the Prometheus Society. In order to get a score of 36 (a current entry criterion) on the Mega test by cheating, one would need to have gotten these five correct plus 31 others. Someone who can solve 31 extremely difficult problems without cheating would probably be able to solve the 10 easiest leaked problems on his or her own without cheating. Thus, even with the leaked answers, the best someone would be able to do would be to turn a legitimate score of 31 into a with-cheating score of 36. By the sixth norming of the Mega Test, a score of 31 corresponds to an IQ of 158. So the impact on the Prometheus Society of leakage of these problems is not felt to be extremely significant at this time.

The existence of on-line integer sequence solvers compromises all integer sequence problems -- if they are not already solvable, someone will solve them -- so we feel that we should include them in the list of compromised problems.

A point that may need some reconsideration in the future is that several of the problems in the Mega test appear to be easily solved or checked with computers. At least 7 of the non-verbal problems could be solved with a fairly simple computer program.
A professional programmer might perhaps attack even more problems this way. This raises the question of whether we are unduly slanting our criteria toward computer professionals. For a counter argument you might refer to "Sweetness and Stinging from the Honeycomb Series" (Gift of Fire, Issue 101, November/December 1998).

We are faced with the question: should the Mega test be retired in order to keep out dedicated cheaters with IQs in the 158 - 163 range? At a minimum, several precautionary moves can and should be implemented to ameliorate the problem, as, for example, eliminating test questions that are known to have been compromised. The Mega test is still a valuable instrument in that, largely because of Darryl Miyaguchi's web site, many new people are becoming familiar with the High IQ Societies and taking this high range test. The Prometheus Society continues to receive an appreciable number of new applications, which may derive in part from this cause. This seems to us to offset some of the negative aspects associated with the possibility that the Society might thereby accept a few individuals at this time who are only marginally qualified for membership because they may have found a leak before we have. However, we must take care of the problems of which we are aware and address the possibility of continued erosion of the test with continued vigilant surveillance.

Notice that a certain amount of trust is involved even if the Mega test answers are not available on the Internet. In particular, there is no way to verify that a test-taker worked independently. Also, the "leaked" answers available on the Internet are not exactly easily available. One cannot just use a search engine to search on "Mega test" to obtain the answers. The answers to the hardest five of the leaked problems are available separately on unrelated sites, so finding them would require some ingenuity and persistence.
However, it is our consensus opinion at this point that we cannot warrant all 48 questions of the Mega test for qualification to the Prometheus Society. As you will see further on, we have defined a subset of questions which would constitute a test in its own right, and which we have shown to be able to discriminate at the 1-in-30,000 level. This test (the Mega27) eliminates all known compromises to the test as well as a few of the very simplest problems that may certainly be compromised in the near future and which, as we have demonstrated, have added little of value in discriminating at the Prometheus Society's desired cutoff level and beyond.

The entire Mega test will probably need to be retired in the not too distant future. Alternative high range tests may be available before that time comes.

Compromises to the Titan:

The Titan is a newer test and appears on face value to be a more difficult test than the Mega, which may have protected it somewhat from those individuals on the Internet who have concentrated on "cracking" the Mega. However, the number sequence problems are compromised in the same way as for the Mega, and lacking the item data that we have on the Mega, we have been unable to come up with a method for estimating scores if we exclude the sequences. Therefore, unless and until we obtain norming data for the Titan, we feel that we must remove the Titan from our recommendation of approved tests for qualification for the time being.

Notice that we have obtained agreement in principle with Dr. Hoeflin whereby he will provide us with data for 500 examinees with which we can perform analyses similar to those that gave rise to the Mega27. We have accepted an action item to perform those analyses and report back to the membership on our conclusions.
8.3.3 Surveys of capabilities and comparisons of various segments of the population and psychometric instruments

There are a considerable number of summaries and reviews that have been published in High IQ journals and elsewhere which review relative ranges of coverage of different tests and expected intelligence of various segments of the population. However, these summaries are not all in agreement and typically do not include data at the level of our entry requirement, so we have used them primarily for orientation and guidance.

Greg Grove, psychometrician of the Triple Nine Society (TNS), published data relating percentile rankings of various segments of the population to Mega test raw scores in his article, "IQ/Percentile Ready Reckoner" (VIDYA, Issue 177, July/August 1998). A problem with this review from the Membership Committee's perspective is that, since it was prepared with TNS in mind, it only goes up to the 99.9th percentile. There is also a survey of numbers of participants in various High IQ groups by percentiles presented by Guy Fogleman in "An Amateur Statistical Analysis of Hi-IQ Society Membership Trend" (Gift of Fire, Issue 97, 6 - 17, July 1998).

Kjeld Hvatum provides a table of IQ percentiles versus scores on various psychometric instruments, including the Mega, in his "Letter to Ron Hoeflin" (In-Genius, Vol. 15, August 1990) that shows comparative raw scores at percentiles up to and beyond the Prometheus Society cutoff level. This table is provided below as reference only. It has not been validated by the Membership Committee and is not a part of our recommendation per se. But it is representative of the kind of digested information that is available, which has led us to investigate some of these tests in more depth and place less emphasis on others.

Selectivity by I.Q.* and other scores that correlate well with I.Q.
[Table: rows give the WAIS classification descriptions (profound, severe, moderate and mild retardation; borderline; dull-normal; average; bright-normal; superior; very superior), population reference points (general population average, 50.0th percentile; high school graduate average, 60.0; college graduate average, 84.1; Ph.D. and M.D. average, 95.0) and High-IQ society cutoffs (Mensa, 98.0; Intertel and TOPS, 99.0; up through Mega, One-in-a-Million, 99.9999, and beyond). Columns include the percentile in the general population; I.Q. on scales with 1 standard deviation = 15 (WAIS, WISC), 16 (Binet, CTMM, Otis-Lennon) and 23.7 (Cattell verbal); SAT Verbal; GRE Verbal; Miller Analogies; and Mega test raw score.]

* Kjeld Hvatum's "Letter to Ron Hoeflin" and Ron's response, In-Genius, #15, August 1990

8.4 Review of norming analyses of currently accepted tests

In view of continued criticisms of tests that have been accepted for entry to the Prometheus Society, it has seemed prudent to review the norming analyses of these tests to assess whether, in our view, they warrant continued use for application to the Prometheus Society, and to provide data for a more meaningful debate on related issues.
We have attempted to understand the rationale for approaches and to determine its legitimacy to the best of our abilities. We have also presented the arguments that have been levied against these instruments. We believe that we have been fair in our assessments.

8.4.1 Mega test

We feel that Ron Hoeflin's Mega test may represent the best one can reasonably expect in terms of establishing a credible 1-in-30,000 cutoff on a high-level test of mental performance abilities, because of the dearth of available information on other tests at the high level of our cutoff criterion with which to norm and calibrate a high range test. A general statement that can be made about the Mega is that the predictive value of a fairly small number of Mega problems is quite amazing, as can be seen in a subsequent section of this report where the "short form" of the Mega (the Mega27) is discussed.

We have considered negatives that have been pointed out with regard to the Mega test over the years and have attempted to capture those criticisms in a separate section below. Notwithstanding such criticism, we have concluded that the setting of the 1-in-30,000 cutoff at a score of 36 on the sixth norming of the Mega Test is quite credible. That is, of course, if one could discount the possibility of compromised answers on the Internet and elsewhere as described in section 8.3.2 above. These conclusions are based on the following analyses.

Review of the Mega Test sixth norming:

The Mega Test sixth norming is based on a weighted average of several tests for which paired raw scores are available. The norming is, however, heavily biased towards the SAT, since that test provides the largest number of score pairs for the norming. According to the sixth norming, the average of the standard tests and the combination of standard tests plus SAT scores agree well to somewhat beyond the 1-in-30,000 cutoff with which we are particularly concerned. Notice, however, that in accepting Dr.
Hoeflin's sixth norming, we feel that we should also accept SAT and other test scores used in the norming at the associated level for admissions to the Prometheus Society. To do otherwise would be inconsistent, since SAT raw score data (in particular) was used explicitly to norm the Mega.

Test Equating: Methods and Practices, by Michael Kolen and Robert Brennan, and Test Equating, edited by Paul Holland and Donald Rubin, who are with Educational Testing Service (ETS), both discuss "Equipercentile Equating" under the general heading of common ways to find equivalent scores on two different tests. The meaning of this approach is obvious from the name. (It is worth noting that these references refer to "equipercentile equating" rather than equating based on equivalent standard deviations.) This is the technique we have sometimes referred to as "score pairing." We have spent considerable time discussing the legitimacy of this method and believe the approach itself to be valid. Since it is the approach used by Dr. Hoeflin in norming his Mega test that was previously accepted for entry to the Society, it seemed essential that we understand the rationale for the method. (In general it seems more advisable for the committee merely to evaluate norming data rather than attempting to re-do it.) Bill McGaugh has conducted an independent study, to be published in GoF Issue 102 as "(Bill, we need a title)", describing the application of this methodology to athletic capabilities of world class decathlon participants, in which he shows the method to work effectively in that arena as well.

Some (even some of us) have considered this approach nontraditional and somewhat controversial under the name "side-by-side" score pairing by which it is sometimes referred to, as though the technique were used exclusively by Hoeflin for establishing the 1-in-30,000 cutoff using the 220 SAT-Mega score pairs. See for example Roger Carlson's article, "The Mega Test" (Test Critiques, Volume VIII, 1991).
Evidently, however, it is an accepted method used routinely by the ETS.

The following is a plausibility-based argument that the committee used in understanding this equipercentile equating method. If one assumes that raw scores on the Mega and the SAT are monotonically related to mental ability, i.e., that a higher raw score on either test correlates with higher mental ability, then there is some function z1(n) that relates raw scores on the Mega to standard intelligence scores z, and there is some function z2(m) that relates raw scores on the SAT to standard scores z, where z = (IQ-100)/16. It is plausible to assume that the joint probability distribution of z1 and z2 is just the bivariate normal distribution p(z1, z2, r) for some correlation r. This function is symmetric in z1 and z2. Thus, for any random sample for which raw scores exist for both the SAT and Mega, if we have N scores with z1 > 4, then we would expect N scores with z2 > 4. These would not generally be the same N individuals in each case. Thus, if we know the 1-in-30,000 cutoff on the SAT (raw score = 1560), and if there are N people in the sample of people taking both the SAT and the Mega scoring at this level or higher on the SAT, then counting down the highest N Mega scores from the sample would give a reasonable estimate of the 4-sigma cutoff on the Mega (raw score = 36). Ron Hoeflin showed that, if you do this for several different cutoffs, then the resulting Mega normalization is linear over a range of scores including 36. This linearity feature seems to be standard on IQ tests over their range of applicability.

We are aware that there are difficulties in this argument (e.g., with respect to self-reporting of SAT scores, nonrandomness of sampling, small sample sizes, and mathematically allowed but "unphysical" test scores associated with ceiling effects). Roger Carlson has pointed out several of these problems in his review, "The Mega Test," appearing in Test Critiques, Volume VIII, 1991.
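The counting procedure described in the plausibility argument above can be sketched in a few lines. This is a minimal illustration of the "count down the highest N scores" step only, not the procedure actually used for the sixth norming; the function name and the score pairs are invented for the example and are not real norming data.

```python
def equipercentile_cutoff(score_pairs, reference_cutoff):
    """Estimate the cutoff on a new test that matches a known cutoff on a
    reference test, by equipercentile ("score pairing") counting.

    score_pairs: list of (reference_score, new_test_score), one per person
    reference_cutoff: known cutoff on the reference test
    Returns the N-th highest new-test score, where N is the number of
    people at or above the reference cutoff, or None if N is zero.
    """
    n_at_or_above = sum(1 for ref, _ in score_pairs if ref >= reference_cutoff)
    if n_at_or_above == 0:
        return None
    new_scores = sorted((new for _, new in score_pairs), reverse=True)
    return new_scores[n_at_or_above - 1]

if __name__ == "__main__":
    # Hypothetical (SAT, Mega) pairs, invented purely for illustration.
    pairs = [(1600, 40), (1580, 33), (1560, 38), (1500, 36),
             (1450, 30), (1400, 35), (1300, 22)]
    print(equipercentile_cutoff(pairs, 1560))
```

Note that the people counted on the two tests need not be the same individuals, which is why the two score lists are ranked independently.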
We believe that the arguments could be tightened up in the future, but that the use of the data shown in the equipercentile equating plot does not raise any immediate "red flags" in these regards for determining the Prometheus Society cutoff score on the Mega. The correlation data does seem to reveal some reticence on the part of participants to claim SAT scores below 1150, which has probably significantly reduced the correlation coefficient (r = 0.495) shown by the trend line in figure 2 and is very likely responsible for some of the nonlinearities in figure 2, especially the bending at the low end.

[Figures 2 and 3: two plots of SAT score against Mega raw score, titled "Mega vs. SAT (side-by-side, N = 220)" and "Mega vs. SAT (N = 220)".]

Figure 2: Mega vs. SAT Score Correlation

Figure 3: Equipercentile Equating of Mega and SAT

In addition to correlations with the SAT raw scores, data is available from which correlations have been made against eight other intelligence tests. Plots of score pairs are provided for the LAIT, Cattell, CTMM and WAIS in figure 4. (The data for the figure are available at the URL http://www.eskimo.com/~miyaguch/megadata/megacorr2.htm.)
These correlations and the number of raw score pairs (N) that they are based upon are included in the following table:

Test                                        Correlation    N
LAIT (Langdon Adult Intelligence Test)      0.673          76
GRE (Graduate Record Examination)           0.574          106
AGCT (Army General Classification Test)     0.565          28
Cattell                                     0.562          80
SAT (Scholastic Aptitude Test)              0.495          220
MAT (Miller Analogies Test)                 0.393          28
Stanford-Binet                              0.374          46
CTMM (California Test of Mental Maturity)   0.307          75
WAIS (Wechsler Adult Intelligence Scale)    0.137          34

This table and the corresponding figure are considerably at variance with the table of "actual correlations" presented by Langdon in the article, "Mensa Tests and Other Standard Tests" (Gift of Fire, Issue 81, January 1997) in response to Greg Scott's article "For Acceptance of Mensa Supervised Tests" (Gift of Fire, Issue 99, September 1998) that cites the data above, which is available to the general public at: . The table above has been verified and corresponds to currently available data. If Langdon's data does indeed support the correlations as he indicates (and we have no basis for disagreeing), then it must be concluded at a minimum that there is considerable variation in such measurements with respect to the Mega. Interpretations of his table cannot be made without inspecting the data (which is currently not available to us).

[Figure 4: scatter plot of other tests' scores against Mega raw score for the OMNI sample, with series for LAIT, Cattell, CTMM and WAIS and a linear trend line for each.]

Figure 4: Correlation of Mega vs. Other Tests' Score Pairs

Some of these other tests for which correlations are available, including the GRE (presumably on the earlier version for which raw scores sometimes exceeded 1600) and CTMM, support equipercentile equating up to or near the 1-in-30,000 cutoff as shown in figures 5 and 6 below.
The GRE score of 1610 would seem to be a comparative score to the Mega raw score of 36. This is quite compatible with statements to the effect that ETS had provided data indicating a score of 1620 corresponded to the 4-sigma level, as reported by Paul Maxim in his article "Renorming Ron Hoeflin's Mega Test" (Gift of Fire, Issue 79, 8 - 12, October 1996). A rather amazing fact is that for the CTMM a 1-in-30,000 cutoff is indicated at a CTMM score of only 155, but of course there is insufficient data to confirm such a result. The norming data that Ron Hoeflin used in this sixth norming is currently available on Darryl Miyaguchi's High IQ Testing web site.

[Plot titled "Correlation GRE vs. Mega": GRE score against Mega raw score, with a linear trend line (GRE).]

Figure 5: Equipercentile Equating of Mega and GRE

[Plot titled "Correlation of CTMM vs. Mega".]

Figure 6: Equipercentile Equating of Mega and CTMM

Item Response Theory (IRT) analysis of the Mega test sixth norming:

The norming analysis performed by Grady Towers that appeared in In-Genius (Issue #25, January 1991) has been obtained, as well as the associated norming data that was provided to Grady by Ron Hoeflin for that purpose. This analysis was re-run as a part of the Membership Committee analyses with a couple of corrections and iterations implemented that had not been present in the original analysis. Iteration (of the t-matrix) was identified as optional in the source paper, "A Procedure for Sample-Free Item Analysis," by Wright and Panchapakesan (Educational and Psychological Measurement, Vol. 29, 23-48, 1969). The analysis that was performed for this committee used the conceptually simpler, but somewhat less accurate, "log" method rather than a "maximum likelihood" method (both methods are described in the referenced paper). Figure 7 demonstrates the results from one-parameter Item Response Theory (IRT) Rasch model calculations, which show IQ assignments versus raw score for the full Mega (Mega48) test.
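For readers unfamiliar with the one-parameter Rasch model, the sketch below shows its core relationship and the simplest log-odds starting estimates. It is a deliberately simplified illustration and not the committee's analysis: it omits the correction (expansion) factors and the optional iteration of the full procedure, and it does not perform the calibration from logits to IQ. All function names are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability that a person of the given ability (in
    logits) answers an item of the given difficulty (in logits) correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_ability(raw_score, n_items):
    """Uncorrected log-odds ability estimate for a raw score.
    Defined only for 0 < raw_score < n_items."""
    return math.log(raw_score / (n_items - raw_score))

def log_item_difficulties(correct_counts, n_people):
    """Uncorrected log-odds item difficulties, centred on zero.
    correct_counts[i] is how many of n_people solved item i
    (each count must be strictly between 0 and n_people)."""
    logits = [math.log((n_people - c) / c) for c in correct_counts]
    centre = sum(logits) / len(logits)
    return [d - centre for d in logits]

if __name__ == "__main__":
    # Ability logits for a few raw scores on a 48-item test.
    for raw in (15, 30, 36):
        print(raw, round(log_ability(raw, 48), 3))
```

Because perfect and zero scores have no finite logit, estimates near the ceiling of the test become rapidly less reliable, which is consistent with the reliability limitations noted for the highest scores.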
The IRT scale must, of course, be calibrated. In the chart below, it has been calibrated against the linear portion of the mapping resulting from Ron's equipercentile equating of Mega scores onto SAT scores shown in figure 3 above for the sixth norming of the Mega data. The IRT calibration depends on the validity of that data. The fact that the IRT scale looks like that obtained on the sixth norming by other means is, therefore, not surprising. Also provided in this figure of IRT data are reliability indicators, showing one standard deviation error tolerances on the data. Clearly this Rasch model does nothing to destroy the notion of reliable mental performance measures up to the 165 IQ range, which is at or above the 1-in-30,000 cutoff of interest to the Prometheus Society. It also shows grave reliability limitations beyond the Prometheus Society cutoff level, however.

[Plot titled "Mega Test Raw Score - IQ Assignment (Mega 30 ~ IQ 157, Mega 15 ~ IQ 139)": IQ against Mega Test raw score.]

Figure 7: Mega48 IRT Test Norming

Intelligence filter operative in the Mega test sixth norming:

We have examined the effects of IQ filtering to assess the extent to which the Mega test applicants differ from the general population. The results show that selection pressures filter who will respond on such tests, such that the probability of submitting a test for scoring increases dramatically with the resultant scoring percentile itself over a quite extensive range of scores. This phenomenon has amazed us at times in our deliberations as indicating that individuals have a very good built-in "feel" for the degree of their own intelligence and perform a very critical self-selection evaluation prior to submitting such a test.

The scores on the Mega test are quite obviously not distributed according to a normal distribution, as shown in figure 8 below.
There are many more nominally high scores than a normal distribution would accommodate. This fact has been challenged as reason in and of itself for invalidity and "inflation" of the Mega norming. See, for example, Paul Maxim's "Renorming Ron Hoeflin's Mega Test" (Gift of Fire, Issue 79, October 1996). We feel that this criticism without further supporting evidence is invalid, however, because -- quite simply -- a random sample of the general population does not submit responses to the Mega test, and the extent of the selection was underestimated in the article. In fact, respondents are filtered by their own and other quite extensive pressures such that an extremely selective sampling takes place -- much more effective (as far as elevating the mean) than a simple cutoff band pass filter. Refer to the article by Fred Vaughan also appearing in Issue 79 of Gift of Fire, called "Intelligence Filters," and to the mathematical methods section of this report for an explanation concerning the characteristics and effects of such selective filters.

[Histogram of Mega(n) against raw score n.]

Fig. 8: Distribution of Mega Test raw scores for sixth norming

If Mega(n) is the number of people who scored n correct on the Mega test and N_T is the total number of people who took the test, then the conditional probability that someone would score n on the Mega test, given that they took the test, is obviously approximated by the frequency data:

P(n; take test) = Mega(n) / N_T

But, of course, to address the cutoff criterion of 1-in-30,000 of the general population, what we need to know is what the frequency distribution P_IQ(n) would be if the test were administered to a large random sample of the general population, N_P. The mathematical treatment of an intelligence filter provides this conversion such that:

P(n; take test) = F(n) * P_IQ(n),

so that

F(n) = Mega_IQ(n) / (N_T * P_IQ(n)),

where it is assumed that

P_IQ(n) = NORMDIST(n, 100, 16, TRUE)

when n is rescaled to a standard IQ score obtained on the Mega.
The Mega test does not result in a uniform scaling of IQ vs. raw score. For example, IQ 100 converts to a raw score of 1 on the Mega; IQ 116 (one sigma) is at a raw score of 4; IQ 132 (two sigma) is at a raw score of 9; IQ 150 (three sigma) is at a raw score of 24; IQ 164 (four sigma) is at a raw score of 36; and so on, with a standard deviation that increases with score. (This nonlinearity of scaling was taken from the fourth norming of the Mega.) The distribution of the sixth norming was "linearized" by spreading the data to obtain Mega_IQ(n) from Mega(n) using the fourth norming IQ assignments. This was done using a simplistic algorithm for proportionately dividing Mega(n) among associated IQ increments to obtain Mega_IQ(n) without smoothing.

A selective filter on the normal distribution of the general population applies exclusively to Mega_IQ(n). In this way it was determined that the number of individuals in the assumed general population from which selection for taking the Mega occurs is on the order of 3 million people, i.e., the number of those who score above the 164 cutoff IQ is appropriate to a normal distribution with N_P = 2,850,000 people. See figures 9 and 10 below, which plot Mega_IQ(n)/P_IQ(n) as well as the hypothesized selective filter using both log and normal scales. A cumulative error function distribution, used as the selective filter, is plotted in the azure circles. The error function seems to be an excellent fit throughout a quite extensive range of scores, as can be seen on a log scale, and accounts fully for the preponderance of high scores on the Mega.

The equation of the error function filter, using Excel nomenclature, is

F(n) = NORMDIST(n, M, s, TRUE),

where the mean M = 162 and the standard deviation s = 13.4. The effective population size being filtered is N_P = 2,850,000. So although there is a very restricted set of individuals who actually respond to the Mega, there is a fairly large arena from which only the undaunted actually submit responses for scoring.
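To make the scale of this filter concrete, the sketch below combines the quoted filter parameters (M = 162, s = 13.4) and effective population (2,850,000) with a Gaussian general population (mean 100, standard deviation 16) to estimate how many submitted tests would be expected above IQ 164. It is an illustration under assumptions of our own: the general population is represented by its probability density rather than the cumulative form written in Excel nomenclature above, the filter value is read as the fraction of people at a given IQ who submit a test, and the integration limits and step size are arbitrary choices.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def filter_fraction(iq, m=162.0, s=13.4):
    """Error-function filter: assumed fraction of people at this IQ who
    actually submit a test for scoring."""
    return normal_cdf(iq, m, s)

def expected_submitters_above(cutoff, pool=2_850_000, upper=260.0, step=0.01):
    """Expected number of submitted tests above `cutoff`, by midpoint-rule
    integration of pool * filter * population density."""
    n_steps = int(round((upper - cutoff) / step))
    total = 0.0
    for i in range(n_steps):
        iq = cutoff + (i + 0.5) * step
        total += filter_fraction(iq) * normal_pdf(iq, 100.0, 16.0) * step
    return pool * total

if __name__ == "__main__":
    in_pool = 2_850_000 * (1.0 - normal_cdf(164.0, 100.0, 16.0))
    print(f"people above IQ 164 in the pool: {in_pool:.0f}")
    print(f"expected to submit a test:       {expected_submitters_above(164.0):.0f}")
```

The point of the comparison is that, under these assumptions, the filter passes more than half of the people above the cutoff while passing only a minute fraction of the population near the mean.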
The arena size no doubt derives in part from national exposure of the "World's Toughest IQ Test" (OMNI Magazine, X, X -- anyone have this reference?) and an Internet presence. Although the filtering is much more intense than that for the SAT, the general form of the filter is quite similar. One could speculate with regard to the rationale for the filter form being as it is, but the Membership Committee has not formulated a position with respect to that.

Fig. 9: Mega IQ-scaled distribution (actual, predicted, and general populations) and filter

Fig. 10: Mega IQ-scaled distribution (actual and predicted) -- log scale

g loading of the Mega test sixth norming:
Using the Easy Factor program, a Principal Components Analysis was performed on the sixth norming of the Mega data. The loading on the first factor, which is reasonably interpreted as being g, is 0.62. This is a reasonably high loading, since we are basing this analysis on data that was not randomly selected and does not represent the full range of intelligence. The g loading of the test is a bit lower than it would be if it were a test with many more easy problems (with multiple choice answers) that are solvable by a wider range of the population.

In a communication between one of our Membership Committee members and Grady Towers concerning an analysis he performed some time ago, based on 46 individuals who had reported scores on both the LAIT and Mega, he reported finding that the Mega (partitioned into "verbal" and "non-verbal" portions) was g-loaded as follows:

"verbal"        .48   .55   .13   .72
"non-verbal"    .86   .00   .74   .44

"Negatives" with regard to the Mega test:
In the interest of all opinions and data being presented, we have tried to fairly represent the positions of detractors of the Mega test.
These positions are not necessarily considered to have invalidated the test, even by their proponents, but have merely been expressed as concerns that needed to be evaluated.

The initial period in the Mega test formation consisted of gathering self-reported IQs on other IQ tests taken by a highly selective group of participants in the Mega test. These were provided by a group of only 87 people to obtain norming data. This has been deemed reasonable considering the severe constraints on developing such a test. However, an estimate of the standard error of measurement and an estimate of test reliability, which emphasize the tentativeness of mental measurements rather than their exactness, have not been established. This was cited by Roger Carlson in his article, "The Mega Test" (Test Critiques, Volume VIII, 1991).

The Mega test's problem in determining construct validity derives in part from the nature of the self-reported and self-selected IQ scores used for the norming. Greg Scott's article "For Acceptance of Mensa Supervised Tests" (Gift of Fire, Issue 99, September 1998) addresses this fault. If not handled very carefully, self-reporting could easily produce an elevated norm. It should be noted, however, in reference to figure 21 presented farther on in this report, that where both SAT and GRE scores have been reported, equipercentile equating between those two test scores is extremely good, indicating that if disingenuous tactics were employed, they involved a concerted effort by many individuals -- we think that unlikely.

There are also criticisms of the use of a non-random sample composition. There are no data concerning the nature of the sample composition with regard to who takes the test and who sends in score-pair norming data, and so one cannot assume that potential sampling errors are insignificant.
It should be stated, though, that this is in part mitigated by the use of IRT methods to scale ability levels and maximum likelihood scoring analyses, which are, in principle, independent of sample composition.

A related problem comes from the fact that test results have shown increases over the years. (Refer to 8.3.2 above, where this is assessed with regard to scores of new members over the years, which appeared to involve minimal creep.) This may well be related to answer leakage, access to the Internet and computer technology. This does undermine the validity of the Mega norming. The Mega27 has been an attempt to deal with some of these problems.

Criticisms about the test's ability to make fine discriminations at high ranges are lessened by the nature of the Mega test norming in the middle range, which has about 1.2 scaled points for each raw point. The norming data shows the test to discriminate quite reliably near the 3-sigma level of ability in the general population.

However, there is a problem with the Mega scores in comparison to scores from standard IQ tests, which reveal a wide scatter, resulting in weak correlations. These are low compared to the correlations between standard IQ tests, which are normally in the range of 0.7-0.8. The Mega correlations with recognized tests such as the Cattell, Stanford-Binet, CTMM, and WAIS are 0.562, 0.374, 0.307, and 0.137, respectively. The correlation with the SAT, which was used heavily for the sixth norming, is only 0.495; a correlation around 0.4 is considered to be weak. Note, however, that these correlations are uncorrected for range restriction and for attenuation due to imperfect reliability. A possible reason for the very low correlation with the WAIS is its low ceiling (150). Also note that some of the tests against which the test was normed either have low ceilings (WAIS) or normings that are likely to be inaccurate past IQ 150 (Stanford-Binet).
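To give a feel for how much range restriction alone can depress a correlation, here is a small Python sketch of the standard Thorndike Case II correction applied to the correlations quoted above. This correction is not something the committee reports having applied, and the ratio of standard deviations used (2.0) is purely hypothetical, chosen only to show the direction and rough size of the effect.

```python
import math


def correct_for_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    r        -- correlation observed in the restricted (selected) sample
    sd_ratio -- SD in the unrestricted population / SD in the restricted sample
    """
    return r * sd_ratio / math.sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)


# Correlations with the Mega as quoted in the report.
reported = {
    "Cattell": 0.562,
    "Stanford-Binet": 0.374,
    "CTMM": 0.307,
    "WAIS": 0.137,
    "SAT": 0.495,
}

SD_RATIO = 2.0  # hypothetical value, for illustration only

for test, r in reported.items():
    print(test, r, "->", round(correct_for_range_restriction(r, SD_RATIO), 3))
```

With a ratio of 1 the correlation is unchanged; any ratio above 1 raises it, so the uncorrected figures are best read as lower bounds on the relationship in an unselected population.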
At the high end there is even greater discrepancy between scores. This undermines to some degree claims of validity in measuring IQ with the Mega test. However, the average SAT score of those with a score at or above 36 on the Mega test is 1498, leaving considerable room before reaching the ceiling of the test but leaving some doubt as to why there were not more extremely high SAT scores.

Another analysis by Grady Towers reveals that the Mega Test does not load high on fluid g, but much higher on crystallized intelligence. This runs counter to an interest in selecting for fluid g at the 1-in-30,000 level.

It is not unreasonable to assume that the Mega test could reliably discriminate scores in a range of at least +/- 1 sigma about its 50% correct score. From a Mega standardized score of 100 to a Mega standardized score of 116 (one standard deviation), the percentile ranking changes from 50 to 84 (34 percentile points); contrast this with the change in percentile ranking from Mega score 148 to Mega score 164, which is also one standard deviation -- the percentile ranking changes from 99.87 to 99.997, a difference of only .127 percentile points. Were we to adhere to traditional usage of percentile scores, we would designate all scores above the 99th percentile as "99+," which is not very useful. But supporting the Mega test's ability to discriminate at the very highest levels of g (given its low correlation with other IQ tests, in addition to Spearman's law of diminishing returns) is significantly problematic.

The Mega test has not been normed on large populations and, unlike standard IQ tests (which may have had comparable population sizes for norming), it aspires to validity at a much higher rarity. It has in turn been normed on these tests, which also have insufficient populations. If there are problems with right-tail bumps at the high end of standard IQ tests, then the Mega test cannot claim immunity from such phenomena.
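The percentile figures quoted above can be checked with a short Python sketch, assuming (as the report does elsewhere) a normal distribution with mean 100 and standard deviation 16. The helper names are illustrative only.

```python
from statistics import NormalDist

# Standard IQ scale assumed throughout the report: mean 100, SD 16.
IQ_SCALE = NormalDist(mu=100, sigma=16)


def percentile(score):
    """Percentage of the general population scoring below `score`."""
    return 100.0 * IQ_SCALE.cdf(score)


def rarity(score):
    """Return N such that about 1 person in N scores at or above `score`."""
    return 1.0 / (1.0 - IQ_SCALE.cdf(score))


for score in (100, 116, 148, 164):
    print(score, round(percentile(score), 3), "1 in", round(rarity(score)))
```

One standard deviation near the middle of the scale spans about 34 percentile points, while the step from 148 to 164 spans only a little over a tenth of a point, which is why rarity ("1 in N") is the more useful description at the level the Society is concerned with.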
The test does not have any built-in controls over what goes on in the testee's mind to provide the necessary probability that an item is measuring specific cognitive processes. There have been continuous and possibly legitimate complaints that the Mega test measures resourcefulness, tenacity, time available, motivation, access to applicable reference material, habitual cognitive strategies or algorithms, specialized knowledge, and use of computers, rather than a general form of innate cognitive ability. See, for example, David Slater's article, "Some Thoughts on Super High IQ Society Admission procedures" (Gift of Fire, Issue 100, October 1998), Kevin Langdon's "Reply to Dave Slater on Test Design" (Gift of Fire, Issue 102, January 1999) and Don Johnson's "Intelligence Testing and the Ego" (Gift of Fire, Issue 100, October 1998).

Ultimately, the facts surrounding its being a non-proctored take-at-home test will always leave questions concerning the degree to which the applicant followed the ostensible rules of the test. Demographics of members of the Prometheus Society suggest that little collaboration among test takers has affected participants at this level.

8.4.2 Mega27 -- A Short Form of the Mega Test

Our efforts with regard to the Mega27 test have been devoted to working around the leakage of answers to the Mega test by eliminating compromised problems and the very easiest problems that remain. The Mega27 uses the remaining unleaked and harder problems to assess the applicants' credentials. Considerable progress has been made by the Membership Committee in assessing this potential using correlations with the original sixth norming of the Mega, Item Response Theory (IRT), maximum likelihood scoring techniques and factor analysis.

In addressing this issue, it seemed prudent that additional problems which are much too easy to discriminate at the 1-in-30,000 level should also be eliminated. In this way we obtained a "Mega27 Test".
This test retains only 27 rather than the original 48 test questions. This approach will also forestall the inevitable compromise of these easier problems, thus extending the useful life of the Mega for our purposes. The results seem extremely promising, as described in the following paragraphs.

Correlation between Mega27 and Mega48 Score Pairs:
Figure 11 provides the correlation of score pairs for the Mega27 and the Mega48. The fact that the correlation is strong is not surprising. A raw score of between 19 and 20 seems to compare favorably with the Mega score of 36. Figure 11 illustrates that a score of 21 on the Mega27 excludes eleven (11) participants who scored 36 or greater on the Mega48. However, a score of 36 on the Mega48 excludes only two (2) participants who scored 21 or greater on the Mega27. The mean Mega48 score of the 11 excluded participants (those scoring 36 or more on the Mega48) was just over 37, whereas the mean of the 2 included was 34. In short, the Mega27 cutoff of 21 would be (to the extent that it is any different) more restrictive than a Mega48 cutoff of 36. These data indicate that the 1-in-30,000 criterion is easily maintained (and in fact made more plausible) in going to the short form of the test if a Mega27 score of 20 correct out of the 27 is used.

Figure 11: Correlation of Score Pairs of Mega27 with Mega48

IRT analysis of the Mega27:
Figure 12 provides similar data for the Mega27 to that provided in figure 7 for the Mega. Figure 12 illustrates that the Mega27 is more reliable at both ends (including down around 130 IQ and up around 170 to 175 IQ) than the Mega48. This illustrates that the 1-in-30,000 criterion is easily maintained (and in fact made more plausible) in going to the short form of the test. The 1-in-30,000 level on the Mega27 is a score of between 21 and 22 correct out of the 27 (the score of 21 correlates closest to the Mega48 score of 36), as is easily seen in the two figures.
When this analysis is taken in conjunction with the correlation data shown above, the raw score of 21 seems to be a reasonable assignment.

Figure 12: IQ Assignment, Mega27

Maximum likelihood scoring of the Mega27 in comparison with the Mega48:
As further justification of this step, the following figures illustrate the comparison of the traditional scoring of the Mega with a maximum likelihood method based on the unique difficulty profiles of the individual test items and probabilities of correctly answering the questions. The top fifty or sixty scorers in the Mega sixth norming data set are represented in the two figures (13 & 14) below. Again, as can easily be seen, the results are "more regular" and seem more reliable with the Mega27 than with the original Mega48. In particular, for the Mega48 there are 17 instances (30%) where the assigned score is higher than for another individual whose probability-based score is higher. Some of these discrepancies are as large as two raw score points. In contrast, for the Mega27 there are only five such scores, with only a single one exceeding a full raw score point.

Figure 13: Mega48 Test Scoring -- Traditional (Series 2) vs. Maximum Likelihood (Series 1)

Figure 14: Mega27 Test Scoring -- Traditional (Series 2) vs. Maximum Likelihood (Series 1)

When 50% confidence levels are applied to maximum likelihood scores for both the Mega48 and Mega27, the Mega48 interval is about +/- 4 raw score points. The Mega27 interval is about 1/2 to 2/3 of that amount, which is to be expected because the variation depends on the problem profiles, which are virtually identical in both cases.
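The report does not specify the item response model behind its maximum likelihood scores. As a minimal sketch of the general idea, the Python below assumes a two-parameter logistic model and finds, by a simple grid search, the ability value that makes a respondent's pattern of right and wrong answers most probable. The item difficulties and discriminations are hypothetical and are not taken from the Mega item data.

```python
import math


def p_correct(theta, discrimination, difficulty):
    """Probability of a correct answer under an assumed 2PL logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, responses, items):
    """Log-probability of a response pattern (True = correct) at ability theta."""
    total = 0.0
    for answered_correctly, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p if answered_correctly else 1.0 - p)
    return total


def ml_ability(responses, items, lo=-6.0, hi=6.0, steps=1201):
    """Grid-search estimate of the ability that maximizes the likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical (discrimination, difficulty) pairs for four items.
ITEMS = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0), (1.5, 2.0)]

# Two respondents with the same raw score (2 correct) but different patterns.
easy_items_solved = [True, True, False, False]
hard_items_solved = [False, False, True, True]

print(ml_ability(easy_items_solved, ITEMS))
print(ml_ability(hard_items_solved, ITEMS))
```

Under these assumptions, two people with identical raw scores receive different ability estimates depending on which problems they solved, which is the kind of discrepancy between traditional and probability-based scores described above.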
If these results were iterated, the maximum likelihood scores would have been "smoother," but the point is still the same: the Mega27 score which we will be recommending appears to be even more reliable than the Mega48.

g loading of the Mega27:
Factor analysis on the Mega27 actually resulted in an insignificant increase in weighting on the principal component (that can be interpreted as g). To two decimal places, this g loading is now 0.63. It seems apparent, therefore, that g loading has certainly not been sacrificed in cutting the Mega test down to 27 questions. Again, we must remember that this analysis was performed on data that was not randomly selected and does not represent the full range of the normal distribution of the general population. The assessed g loading of the test is, therefore, a bit lower than it would be if it were performed on a test with many more easy problems that are solvable by a wider range of the population. However, it is worth noting that by taking out the easiest seven problems on the 48-item Mega, we do not seem to have adversely affected the g loading.

Agreements for scoring the Mega27:
It is essential that the test developer and scorer, Dr. Ronald Hoeflin, agree to the modified use of his test and the added imposition of providing the unique Mega27 score specifically for the Prometheus Society. Several alternative approaches to obtaining this have been proposed to Dr. Hoeflin. We are currently in negotiation with Ron, and it would appear that he is in basic agreement with our approach.

8.4.3 Titan test

The Titan test was also developed and is scored by Dr. Hoeflin.
It is also a 48-item take-at-home test modeled much after the Mega. Certainly, we would like to have been able to provide item analysis and at least been able to review norming data for the Titan, but even without that analysis and data, some of us expressed comfort with continuing to use the Titan for admissions at the present time, if it had not been for known compromises, based on the following considerations.

There are matched-pair data which provide the scores of 114 subjects on both the Mega and Titan tests. See figure 15 below. The mean of the Titan raw scores in this set is 20.1 and the mean of the Mega raw scores is 22.3. The difference between means was highly significant (p<0.001) according to a t-test. So across the full range of scores, the Titan is, perhaps, two problems tougher. The correlation between tests was 0.82.

Examining the raw scores of the subjects with combined Mega and Titan raw scores of 48 (n=46) -- people near the Prometheus Society membership criteria interest range -- reveals that the means of the two tests for that group were Mega = 31.4, Titan = 31.3. The difference between means is statistically insignificant, as one might expect.

Figure 15 shows a correlation between scores of individuals taking both the Mega and Titan. Using score-pairing equipercentile equating methods for calibration, the fourteenth Titan test score was a 36 and the fourteenth Mega score was a 35. See figure 16. The 46th Titan score was a 24 and the 46th Mega score was also a 24 -- a fairly close pairing.

A consensus opinion of those on the committee having done both tests is that the Titan is 2 to 3 problems harder than the Mega. The statistical evidence, however, seems to indicate that the Mega may be a bit more difficult, but at the higher ranges we are trying to measure, they are almost identical.
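The rank-matching idea behind equipercentile equating of two tests taken by the same group can be sketched in a few lines of Python: each test's scores are sorted separately, and the k-th highest score on one test is paired with the k-th highest on the other. The score lists below are hypothetical and are not the committee's 114 matched pairs; the function names are likewise illustrative.

```python
def equipercentile_pairs(scores_a, scores_b):
    """Pair the k-th highest score on test A with the k-th highest on test B."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both tests need the same number of scores")
    ranked_a = sorted(scores_a, reverse=True)
    ranked_b = sorted(scores_b, reverse=True)
    return list(zip(ranked_a, ranked_b))


def equate(score_a, pairs):
    """Test-B score at the first rank whose test-A score is at or below score_a."""
    for a, b in pairs:
        if a <= score_a:
            return b
    return None  # score_a is below every observed test-A score


# Hypothetical raw scores for five people who took both tests.
mega_scores = [24, 40, 35, 30, 36]
titan_scores = [36, 24, 41, 29, 36]

pairs = equipercentile_pairs(mega_scores, titan_scores)
for rank, (mega, titan) in enumerate(pairs, start=1):
    print(rank, mega, titan)
print("Mega 35 equates to Titan", equate(35, pairs))
```

Note that the pairing is by rank within each test, not by person, so the method compares the two score distributions rather than individual performances.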
It is interesting that Ron Hoeflin also has characterized the Titan as more difficult at the lower range, and equivalent at the upper end.

Figure 15: Titan vs. Mega (48-item) Correlation of Score Pairs (N=114)

Figure 16: Titan vs. Mega (48-item) Equipercentile Equating (N=114)

The Titan appears to be less compromised at this point in time than the Mega -- our impression is that most people that examine both tests opt to use the Mega because the Titan appears more difficult at first glance and, perhaps, "less fun". Answers to the Titan problems have on occasion appeared on the Internet over the last couple of years. A serious problem in this regard is that we cannot perform item response (IRT) or other analyses necessary to develop a sub-test. We do not even have enough data to effectively check its characteristics. According to data supplied by the membership officer, very few people have been admitted to Prometheus by the Titan; evidently people aren't "leaking in" due to this test being too easy or answer leakage being too severe as of yet.

We feel that it is most unfortunate to have to recommend suspension of this test from our qualification list at this time, and hope that sufficient data will be provided in the near future so that the test can again be certified for use by the Society. Ron has assured us that he will provide the data so that we will be able to add an addendum to our recommendation if the data warrant the Titan's retention in some form. However, as of this time there is insufficient data to work around the known compromises to this test and we must stop the leak.

8.4.4 LAIT (scored before Dec. 31, 1993)

The norming data on the LAIT has not been made available to this committee by the test developer.
However, since the LAIT is no longer being scored, having been retired some time ago when its answers were published, we are not concerned about continued vulnerability of the Prometheus Society criteria to erosion due to this test. Many members have been accepted into the Society based on scores on this test in the past, and members of record at two dates in the past have been assured entry to the Society, so it seems reasonable to retain LAIT scores obtained prior to Dec. 31, 1993 as satisfying entry criteria.

There have been legal problems and some controversy with regard to the legitimacy of this test, but we do not think these are of much concern since the test is no longer being scored.

Figure 17: LAIT vs. Mega score pairs

Cursory review of Kevin Langdon's 2nd norming of the LAIT, together with more recent data relating LAIT scores to Mega scores as shown in figure 17, has persuaded us that it is reasonable to retain a LAIT IQ score of 164 as satisfying the 1-in-30,000 of the general population criterion, though it would have been nice to have had more data.

g loading of the LAIT:
The following excerpts are from Grady Towers's "Letters to Kevin Langdon" (Noesis 131 -- Special Issue on Psychometric Issues, 11, September 1998). Grady discussed LAIT/Mega analyses in the "3rd" letter dated 4/28/98, and factor analysis in his "4th" and "5th" letters dated 7/27/98 and 8/24/98. He wrote:

I worked them out many years ago but was reluctant to publish them because of the small sample size (N=46). There are two kinds of factor analyses extant in psychometrics: Principal Components Analysis and Common Factor Analysis. Common factor analysis is the preferred method. What I did was to factor analyze the correlations between the LAIT and 24 Verbal items on the Mega Test, with 12 Spatial items, and 12 Numerical items.
I found two important factors: the first column represents g loadings, and the second is a Verbal/non-verbal bifactor.

LAIT       .76   -.36
Verbal     .44    .47
Spatial     a4   -.09
Number     .74    .18

Rotating these factors to orthogonal structure, we get 'fluid intelligence' and 'crystallized intelligence.'

LAIT         3      1
Verbal     .12    .63
Spatial    .75    .38
Number     .52    .55

Kevin's reply is an article entitled "Reply to Grady Towers" (Noesis 131 -- Special Issue on Psychometric Issues, 16, September 1998).

8.5 Scholastic Aptitude Test (SAT) -- the data and its application to the norming of other tests

We have decided that the SAT deserves its own heading in this Membership Committee Report since the analysis of its data is central to our task. Correlation of paired scores with the SAT is the major basis of the norming of the Mega test that has satisfied (and, we recommend, should continue to satisfy in the subset Mega27 test) the criteria for membership in the Society. In addition, the SAT has been analyzed to determine the appropriateness of using a cutoff SAT score for qualification to the Society, as described further on.

8.5.1 Background data

A couple of caveats are in order. First of all, the SAT has changed fairly substantially over the years. The analyses that we have performed and the use to which the SAT has been put in norming other tests in this report involve exclusively what we call the "old" SAT. To distinguish this version, it is essential to note that the "new" SAT has been deployed since April 1, 1995; the "old" SAT was administered prior to that date.

The maximum score of 1600 on the new SAT V+M appears to map to the score range of 1510 to 1600 on the old SAT. Given the shape of the score frequency distribution in general, we believe that most 1600's on the new SAT would fall below 1560 on the old SAT. For example, 453 out of 1,127,021 students who actually took the test in 1996-7 (probably representing some 3.5 million total 17 year olds) scored 1600 on this new SAT.
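As a rough check on what these counts imply, the following Python sketch converts the 453 perfect scores into a rarity, first among actual test takers and then among the estimated 3.5 million 17 year olds, and maps the latter onto an IQ scale with mean 100 and standard deviation 16. It assumes, as the rarity argument itself does, that essentially everyone capable of a 1600 took the test; the function name is illustrative.

```python
from statistics import NormalDist

PERFECT_SCORES = 453      # students scoring 1600 in 1996-7 (from the report)
TEST_TAKERS = 1_127_021   # students who actually took the test (from the report)
COHORT = 3_500_000        # estimated total 17 year olds (from the report)


def rarity_to_iq(one_in, mean=100.0, sd=16.0):
    """IQ at which about 1 person in `one_in` scores at or above that level."""
    return NormalDist(mean, sd).inv_cdf(1.0 - 1.0 / one_in)


one_in_takers = TEST_TAKERS / PERFECT_SCORES
one_in_cohort = COHORT / PERFECT_SCORES

print("1 in", round(one_in_takers), "test takers")
print("1 in", round(one_in_cohort), "of the age cohort")
print("equivalent IQ:", round(rarity_to_iq(one_in_cohort), 1))
```

The rarity relative to the whole age cohort, not relative to test takers, is the figure comparable to the Society's 1-in-30,000 criterion, and it falls well short of that level.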
This is about 1 out of 7,726 of all 17 year olds, which would correspond to about a 158 IQ. We have yet to see sufficient statistically reliable data on the numbers of participants receiving these high scores from one year to the next on the new SAT, but until and unless these reveal something other than we anticipate from what we have seen, the new SAT is definitely not suitable for our purposes.

8.5.2 The SAT data correlations with IQ

The SAT does correlate highly with g. This is discussed by Arthur Jensen in The g Factor. On pages 559-560, Jensen reports data obtained from 339 college students that support the notion that much of the variance in SAT scores can be attributed to g (it is unclear from the text whether pre- or post-recentered SAT scores were used). College students are a somewhat restricted sample, so it would be expected that if the sample were the entire population, the correlations could be even higher. The g loading of the SAT-M is shown as .698, and the g loading of the SAT-V is .804. The g loading of most IQ tests is around .80.

Another source, Nicholas Lemann, estimates in an article, "The Great Sorting" (Atlantic Monthly, Sept. 1995), that the correlation between the verbal score and IQ is .60 to .80.

8.5.3 Cautionary notes and considerations

There are cautionary notes to be added, though: g loading is a function of both the test involved and the population being measured. Jensen's data was obtained from a small sample of college students (it is reasonable to view this as a controlled condition, since the population is entirely represented by college students -- this could provide a control for other significant factors that affect SAT scores). The size of the population used in the ETS data has not been specified.

According to Thomas J. Bouchard (a widely recognized researcher in the U.S. at the University of Minnesota studying IQ correlations between monozygotic twins), research in correlating IQ with SAT scores has been inconsistent. The Stanford-Binet and SAT have been found to correlate anywhere between .445 and .8. The WAIS and SAT correlations fall in about the same range according to Bouchard.

While the SAT and other college admissions tests may be adequate measures of g for small homogeneous populations, e.g., a group of native-English-speaking US students who have had an almost identical academic background that would include learning vocabulary lists and four years of high school math (the test uses no higher than 9th grade math), and who also have had similar lifestyles and academic motivations, these limitations clearly preclude the SAT from ever becoming the sole test from which to select members worldwide.

While most cognitive abilities tests are influenced by education and cultural factors, SAT tests, because of their more specific academic focus, are probably less effective in measuring "g" for people who fall into categories that one finds in more diverse populations (e.g., unsuitable education, lack of motivation to learn required subjects -- verbal/mathematical, or those suffering from math phobia, attention deficit disorder (ADD), depression, dyslexia, adverse effects of exam pressure, young children, foreign examinees, etc.). However, these conditions probably also significantly reduce the possibility of interest in membership in Prometheus.

Finally, it is possible that scores can be increased without a corresponding increase in g through long-term study undertaken with the specific goal of raising test scores (as yet there is insufficient data on this). Individuals may be able to put in extra study and practice relative to the normal comparable population and considerably improve their mathematical and verbal aptitudes.
In this regard, long-term coaching should be distinguished from short-term coaching; research on the latter by the College Board indicates that short-term coaching produces score gains that are within the standard error of the test. See http://www.collegeboard.org/press/htmi9899/htmi/981123a.htm. It is also worth noting that some minimal study and coaching are fairly typical of SAT participation, so that such may be the norm which is already taken into account in the general population distribution.

The discussion by Messick and Jungblut in "Time and method in coaching for the SAT" (Psychological Bulletin, Vol. 89, 1981) provides an argument against the efficacy of coaching to obtain uncharacteristically high scores. Discussion of the issue on pages 400-402 in The Bell Curve cites this paper; there is an excellent graph on p. 401 showing score increments for the SAT-V and SAT-M plotted in separate curves vs. hours of study. Some facts from the text and the graph:

Hours of study    SAT-V    SAT-M    Combined
30                +16      +25      +41
100               +24      +39      +63

300 hours of study might be expected to reap a 70 point increment on the combined score, 600 hours 85 points. The cited article is a review of all studies done to that date on this issue. These documented improvements involve the average increments at all levels and are therefore weighted toward differences occurring at the average level; increments at the high end of the scale must certainly be less.

One would do well to remember that coaching for the SAT is a profitable mini-industry in the U.S. Extravagant claims are to be expected on a routine basis from this industry (as from any other). Rebuttals to this study are available, such as those of The Princeton Review, which claims to provide unbiased studies that prove significant improvement is possible (well over 200 points). (The studies are intra-institutional, like studies by ETS; information about these studies can be obtained by contacting The Princeton Review directly or found in books published by Princeton Review.)
(Other material that explores this issue is available from Samuel J. Messick in "Effectiveness of Coaching for the SAT" and "Individuality in Learning". Criticisms similar to those of extravagant gains have been made about the claims put forward by Herrnstein and Murray. See, for example, Measured Lies: The Bell Curve Examined; Cracks in the Bell Curve; Intelligence, Genes, and Success: Scientists Respond to the Bell Curve (Statistics for Social Science and Public Policy); Inequality by Design: Cracking the Bell Curve Myth; The Bell Curve Debate: History, Documents, Opinions; The Bell Curve Wars.) Also, ETS has sometimes been accused of biased statistical approaches that may significantly influence conclusions obtained. See, for example, Stephen Levy's "ETS and the Coaching Cover-up," in the March 1979 issue of New Jersey Monthly.

While all members of the Membership Committee acknowledge that there are valid criticisms of the SAT, we are in general agreement that these criticisms are insufficient to preclude its use for our purposes.

8.5.4 Intelligence filter operative in selection of SAT participants

It is well known that the SAT is administered selectively to high school age students in the US. On page 35 of The Bell Curve it is stated that, "By 1960, a student who was really smart -- at or near the 100th percentile in IQ -- had a chance of going to college of nearly 100%." There is a graph on the same page showing three curves for percentile IQ vs. percent of college attendance. The curves are for the 1920s, early 1960s and early 1980s. From the graph, it appears that in the 1980s and in the 1960s, a student at the 96th percentile IQ had about a 92% chance of attending college (and, by implication, of taking the SAT). From the notes in The Bell Curve on page 692, note 7: "...from top quartile [of PSAT scores], 79% went to college; of those in the top 5%, more than 95% went to college." The data in the first example used IQ scores, not SAT scores.
There is another graph on p. 37 showing two curves, one for students entering college, one for completing the B.A., as a percentage vs. percentile IQ. Quote from p. 36: "...Meanwhile about 70% of the top decile of ability were completing a B.A." For the graph on p. 35 of The Bell Curve, the curve for the 1980s is drawn from data from the National Longitudinal Survey of Youth. This study, the backbone of much data in The Bell Curve, used IQ, not SAT, for its cognitive ability estimate. As the curves in these graphs show no signs of "bending over" at the higher IQ ranges, this ought to allay fears about appreciable numbers of people at the top not taking the test. See, for example, figures 19 & 20 below.

We have examined the effects of selective intelligence filtering to assess the extent to which participants differ from the general population. Only about one in three seventeen to eighteen year-olds in the US take this test, although virtually all "college bound" students do take it. Filter assessment has been assisted by the availability of the National High School (NHS) survey that assessed the distribution of all students independent of whether they would have taken the SAT. Figure 18 shows the frequency distribution of college bound students for a given year.

The scores are again quite obviously not distributed according to the normal distribution, although the skewing is less than for the Mega. There are again many more nominally high scores than a normal distribution would predict. In figure 19, which is described in more detail in the selective filter methodology description of section X, the effective filter is shown on an enlarged scale as the roughly diagonal curve indicating progressively intense selection based on intelligence.

The deviation at the bottom is obviously because students with excessively low IQs do not even attend high school and therefore were not even included in random samples.
See Kjeld Hvatum's table presented in section 8.3.3, where the range of retardation is shown to extend well into the score levels on the SAT which are effectively missing. The degree to which this composite filter fits the SAT data is shown particularly well in the plot on a log scale shown in figure 20. The similarity in form of this filter and that which is evident in the Mega data suggests that many of the same types of pressures must exist and, again, that individuals are capable of very accurate assessments of their own cognitive abilities.

[Figure: SAT (Verbal Plus Mathematical Parts) frequency; curves for SAT, predicted, and general population]

Figure 19: General population distribution, actual and predicted SAT scoring distributions and the effective selective filter with raw scores going from 200 on left, 1600 on right.

It is interesting that Kjeld Hvatum in his "Letter to Ron Hoeflin" (In-Genius, Vol. 15, August 1990) says, "Incidentally, the PSAT/NMSQT data provides a way to estimate the selectivity of SAT takers at various levels, because the PSAT is more of a 'forced' test in many schools, and the PSAT and SAT scales are equated (via a factor of 10). The ETS provides PSAT estimates that would be obtained if ALL students at these grade levels took the test. A quick check indicates a factor of 3 is approximately the selectivity at the higher score levels for the SAT."

[Figure: SAT and predicted frequency curves on a log scale]

Figure 20: Actual and predicted SAT scoring distributions -- log scale

This is essentially what we have found, but one cannot just assume that the top 1/3 of the overall US high school population takes the SAT as shown in the figure above -- it is more complicated and the filtering more effective than that.
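As a rough illustration of the factor of 3 that Hvatum mentions, the following minimal sketch converts a rarity among test takers into a rarity in the general population. It assumes that about one in three of the age group takes the SAT and that essentially everyone at the top of the general population takes the test. The function name, the `top_coverage` parameter and the example rarity of 1 in 10,000 are illustrative assumptions, not figures from the committee's analysis, and this simple scaling ignores the score-dependent filtering discussed in this section.

```python
from fractions import Fraction


def general_population_fraction(taker_fraction, participation,
                                top_coverage=Fraction(1)):
    """Fraction of the general population at or above a score.

    taker_fraction: fraction of test takers at or above the score
    participation:  fraction of the general population that takes the test
    top_coverage:   assumed fraction of general-population members at that
                    level who actually took the test (1 = nobody missing)
    """
    return taker_fraction * participation / top_coverage


# Hypothetical example: a score reached by 1 in 10,000 test takers,
# with one in three of the age group taking the test.
fraction = general_population_fraction(Fraction(1, 10_000), Fraction(1, 3))
print(f"about 1 in {1 / fraction} of the general population")
```

Lowering `top_coverage` below 1 models high scorers who never sat the test, which makes the same score less rare in the general population than the simple factor of 3 suggests.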
8.5.5 The ability of the SAT to discriminate at the high end of its scale

The graphs in figures 21 and 22 below show that the SAT has the ability to discriminate throughout its complete range of raw scores. Figure 21 shows a slight non-linearity between raw vs. scaled scores starting near a total score of 1540. On earlier administrations of the test (see figure 20) the questions are evidently more difficult and the raw vs. scaled graph is linear all the way to the top, suggesting that the test is indeed discriminating through its complete range.

Figures 21 & 22: SAT discrimination capabilities

The difference between 1600 and 1560 is typically 2 to 4 problems on the "old" (pre-recentered) SAT. However, when figuring percentile equivalents for the SAT, it should be remembered that it is based upon a sample size of approximately 1 million actual test takers selectively sampled from a general population size in excess of 3 million. It isn't unreasonable to assume that the general population percentiles that we assign to the SAT at the top end (for which selection is the highest) are accurate for the test group as a whole. In fact, however, in a population of 3 million there should be over 100 individuals scoring at the 1-in-30,000 level. In any given year fewer than ten individuals obtained a perfect score on the old SAT, with on the order of 100 or fewer scoring 1560 or more and, therefore, it is safe to say that the 1-in-30,000 level is achieved by these individuals.

8.5.6 Establishing a credible 1-in-30,000 of the general population raw score cutoff

As indicated throughout this report, we have chosen not to accept theoretical positions on what the distributions of test scores will be at the high end of the psychometric range, nor even on whether it is intelligence that is being discriminated at the extreme tails of distributions, preferring actual data to accepted notions and legitimate claims of rarity to unverified claims of "super intelligence." In keeping with this philosophy,
we note that of three million people in the general population for which a single SAT applies, 100 would satisfy the rarity condition (3,000,000 / 30,000 = 100). Therefore, for a given year, looking down the top 100 scores, we find for example for 1984 combined V+M for College-Bound Seniors:

SAT high range data distribution in 1984

Score | Number of scorers
1600 | 5
1590 | 0
1580 | 27
1570 | 19
1560 | 39
1550 | 75
1540 | 96
1530 | 108
1520 | 188
1510 | 217
1500 | 278

This data is typical of data available for various years on the "old" SAT. In this case 90 individuals scored 1560 or above (5 + 0 + 27 + 19 + 39 = 90). 1560 is also the score that Ron Hoeflin used in his sixth norming of the Mega, so this value is highly compatible with analyses performed elsewhere in this report. In Paul Maxim's article "Renorming Ron Hoeflin's Mega Test" (Gift of Fire, Issue 79, 8-12, October 1996), Ron Hoeflin is said to have had breakdowns of 5,157,642 SAT scores from 1984 to 1989. The top scorers for those six years were said to be distributed as follows:

SAT high range data distribution in 1984-1989

Score range | Number of scorers
1591-1600 | 35
1581-1590 | 8
1571-1580 | 149
1561-1570 | [illegible]

This gives an average of less than 44 per year, so we are very confident that our assessment has been (if anything) a conservative estimate for a cutoff score. We are, therefore, quite comfortable with the cutoff of 1560 as indicative of a rarity of no more than 1-in-30,000 and as a qualifying score for the Prometheus Society.

8.6 Consideration of Additional/Alternative Tests to Satisfy Prometheus Society Membership Criteria

Wherever possible we have used Otfried Spreen's Compendium of Neuropsychological Tests: Administration, Norms, and Commentary and the book of norms from 1991 (Comprehensive Norms for an Expanded Halstead-Reitan Battery, Heaton et al., commonly referred to as the "Heaton norms"), which is widely used in neuropsychological testing. This information may conflict with other available data on occasion.
This is expected given the nature of normative data at the current state of the art in this field -- particularly at the upper extremity. But these norms are widely used and accepted as authoritative, so we've used them for comparisons and other purposes.

8.6.1 Mensa testing approaches

Because of its much greater membership, Mensa can afford quite extensive testing programs. Facilities and psychometric instruments are available throughout the world. In much the way that this committee is attempting to assist the Prometheus Society in establishing tests that it can warrant with credibility, Mensa accepts scores on various tests, which change from time to time. It is understood in this regard that Mensa's discrimination problems are much less demanding than ours because of their considerably lower qualifying standard. They do provide a paradigm, however, and if it were possible to tap into their resources and global support, it would have considerable merit. Greg Scott addressed this possibility in his article, "For Acceptance of Mensa Supervised Tests" (Gift of Fire, Issue 99, September 1998). We have, therefore, considered tests whereby individuals may be qualified for entry to Mensa. We have also considered counter arguments as put forth by Kevin Langdon in his article "Mensa Tests and Other Standard Tests" (Gift of Fire, Issue 81, January 1997) that was in response to Greg Scott's article, as well as other issues that we have encountered. You will see these various lines of reasoning pursued in the following sections.

8.6.2 Cattell Culture Fair III

Cattell Culture Fair III (A+B) has a history of use since the early 1920s, but the present edition is dated 1960 and was revised in 1963. Mensa used this test prior to its latest adoption of the Raven Advanced (both tests are still used by Mensa in the UK although now dropped in the US).

The features of this test are as follows:

Scale III is for above average youth through adult.
The norms tables include both 16 standard deviation and 24 standard deviation statistics.
Age range norms exist for each of the following ages: 13, 13.5, 14, 15, 16 (adult).
IQs on Scale III range from 55 to 183 on a 16 standard deviation basis; from 20 to 219 on a 24 standard deviation basis.

Accepted conversions from raw to standard scores are as follows for the 16 standard deviation normed A+B form:

Raw score | IQ (16 standard deviation)
87 | 163
88 | 165
89 | 167
90 | 168
91 | 169
93 | 173
95 | 176
97 | 179
99 | 183
100 | 187 (extrapolated)

For the 24 standard deviation scale, a combined raw 85 = IQ 190, 88 = 197, 92 = 207, 97 = [value missing in source].

The following are features of the test:

1. Each form is 50 questions and total test time is 12.5 minutes excluding time to give directions for each of the 4 parts.
2. The test is entirely non-verbal. Editions of the test are available in 23 foreign countries and include a Spanish edition. The IPAT (publisher) can give details about all translations.
3. The four parts of the test are: series, classification, matrices, and conditions.
4. Validities for Scale III include: concept validity (direct correlations with the pure intelligence factor) at .92 (702 males and females), concrete validity (GRE, WAIS, Otis, Raven APM, Stanford-Binet, etc.) at .69 (673 males and females, students and adults), consistency over items (split-half) at .85, consistency over parts (interform correlations, corrected) at .82, consistency over time (test-retest, immediate to one week) at .82.

This test is accepted by respected psychometricians throughout the world, who accept its scores up into the range of the Prometheus Society cutoff. We certainly do not lose credibility in accepting scores obtained on this test.
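The 16 standard deviation conversions listed above can be compared with the 1-in-30,000 rarity target in a short sketch. It assumes an IQ mean of 100 and a normal distribution all the way out to the tail, which is exactly the kind of theoretical assumption this report treats with caution, so the result is indicative only. The variable and function names are illustrative.

```python
from statistics import NormalDist

RARITY = 30_000  # target rarity: 1 in 30,000 of the general population

# z-score exceeded by 1 in RARITY people under a normal distribution
z_cutoff = NormalDist().inv_cdf(1 - 1 / RARITY)


def iq_cutoff(sd, mean=100):
    """IQ exceeded by 1 in RARITY people, assuming a normal distribution."""
    return mean + sd * z_cutoff


# Raw score -> IQ on the 16 standard deviation A+B form
# (extrapolated entry omitted)
RAW_TO_IQ_SD16 = {87: 163, 88: 165, 89: 167, 90: 168, 91: 169,
                  93: 173, 95: 176, 97: 179, 99: 183}

lowest_raw = min(raw for raw, iq in RAW_TO_IQ_SD16.items()
                 if iq >= iq_cutoff(16))

print(f"z cutoff: {z_cutoff:.2f}")
print(f"IQ cutoff (16 SD): {iq_cutoff(16):.1f}")
print(f"IQ cutoff (24 SD): {iq_cutoff(24):.1f}")
print(f"lowest listed raw score meeting the cutoff: {lowest_raw}")
```

Under these assumptions the cutoff on the 16-point scale falls between the tabulated IQs of 163 and 165, so a raw score of 88 is the lowest listed score that meets it.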
Whereas we are skeptical of scores that are listed without indicating that they are "extrapolations" up to IQ 183 (16 points per standard deviation), we believe allowing a raw score of 88 (corresponding to an IQ of 165) on the 16 standard deviation A+B form is reasonable. It would open the global window for the Prometheus Society. It would also support our goal of being a truly international Society.

8.6.3 Raven's Advanced Progressive Matrices (RAPM)

Raven's Advanced Progressive Matrices is one of a series of nonverbal tests of intelligence developed by J.C. Raven (1962). Following Spearman's theory of intelligence, it was designed to measure the ability to educe relations and correlates among abstract pictorial forms, and it is widely regarded as one of the best available measures of Spearman's g, or of general intelligence (e.g., Jensen, 1980; Anastasi, 1982). As its name suggests, and of particular significance to the Prometheus Society, it was developed primarily for use with persons of advanced or above average intellectual ability.

Like the other Raven's matrices tests, the APM is composed of a series of perceptual analytic reasoning problems, each in the form of a matrix. The problems involve both horizontal and vertical transformations: figures may increase or decrease in size, and elements may be added or subtracted, flipped, rotated, or show other progressive changes in the pattern. In each case, the lower right corner of the matrix is missing and the subject's task is to determine which of eight possible alternatives fits into the missing space such that row and column rules are satisfied. The APM battery consists of two separate groups of problems.
Set I consists of 12 problems that cover the full range of difficulty sampled from the Standard Progressive Matrices test. Standard timing for Set I is 5 minutes. This set is generally used only as a practice test for those who will be completing Set II. Set II consists of 36 problems with a greater average difficulty than those in Set I. Set II can be administered in one of two ways: either with or without a time limit of 40 minutes. Administering Set II without a time limit is said specifically to assess a person's capacity for clear thinking, whereas imposing a time limit is said to produce an assessment of intellectual efficiency (Raven, Court, & Raven, 1988).

Philip A. Vernon, in his review of the APM (Test Critiques, 1984), writes that "the quality of the APM as a test is offset by the totally inadequate manual which accompanies it. For interpretive purposes, the manual provides 'estimated norms' for the 1962 APM which allow raw scores to be converted into percentiles (but only 50, 75, 90, and 95) and another table for converting percentiles into IQ scores."

John Johansen, a graduate student at the University of Minnesota and former regular poster to the Brain Board, came into possession of the 1962 version of the test for use in his research (this form is no longer used for testing) along with 27 pages of written text about the implementation, scoring and standardization of the test. In a post to the Brain Board (http://www.brain.com/bboard/read/iq-archive3/1599), he provided the following information applicable to the untimed 1962 version of the test:

Untimed intraday (go until you give up) 1962 distribution for 20 year olds, 30 year olds and 40 year olds.
Scores balanced for guessing

Percentile | 20 years | 30 years | 40 years
50 | [illegible] | 7 | [illegible]
75 | 4 | 2 | 8
90 | 21 | 20 | 7
95 | 24 | 23 | 2
99 | 26 | 25 | 23
99.9 | 30 | 29 | 26

Norms are not accurate above this point for the untimed version due to the limited population taking the test in this condition.

Ignoring the above caveat about inaccurate norms above the 99.9th percentile, the above data indicates that there is about a 4 point raw score difference between 2 and 3 sigma on this test. If this difference carries on to the next "sigma," this would give associated scores of:

Percentile | 20 years | 30 years | 40 years
99.997 | 34 | 33 | 30

Although this data would seem to suggest sufficient ceiling for discriminating at the 1-in-30,000 level, there have been other normative studies which provide conflicting data. In an article in Educational and Psychological Measurement (Bors and Stokes, 1998), the authors mentioned two studies of interest besides Raven's 1962 group -- Paul's study and their own:

S. M. Paul's 1985 study of 300 University of California, Berkeley students (190 women, 110 men): Tested under the untimed condition, the students' scores ranged from 7 to 36 with a mean of 27 and a standard deviation of 5.14. This was significantly higher than the mean of Raven's 1962 normative group (M=21.0, SD=4.0).

Bors and Stokes administered the timed version of the APM to 506 students (326 women, 180 men) from the Introduction to Psychology course at the University of Toronto at Scarborough. Subjects ranged in age from 17 to 30 years, with a mean of 19.96 (standard deviation=1.83). Enrollment in the Introduction to Psychology course was considered roughly representative of first-year students at this university. The scores on Set II for the 506 students ranged from 6 to 35 with a mean of 22.17 (standard deviation=5.60).
This performance is somewhat higher than that of Raven's 1962 normative group but considerably lower than that of Paul's 1985 University of California, Berkeley sample.

Additional data supporting the conclusion that the RAPM (either timed or untimed) does not discriminate at the 1/30,000 level is taken from Spreen & Strauss (Compendium of Neuropsychological Tests, 2nd Edition, 1998), and shown in the tables below. If we assume that the mean of the test group corresponds to about 1 SD above the mean of the general population, further assume that the SD of the general population would be about the same as the standard deviation of the test group, and finally assume a normal distribution in the test group, then the 1-in-30,000 level (about 4 SD above the general population mean, and so about 3 SD above the test group mean) would correspond to 22.17 + 3 * (5.60) = 38.97, or about 39, which is 3 raw points above the test's ceiling of 36.

Advanced Progressive Matrices Set II: Occupational Norms

[Table: column headings and values are not recoverable from this copy]
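The ceiling argument for the timed RAPM (22.17 + 3 * 5.60 against a maximum of 36) can be checked with a few lines of arithmetic. This is a minimal sketch using the Bors and Stokes sample statistics quoted in this section; the 1 SD offset between the test group and the general population is the report's stated assumption, and the constant names are illustrative.

```python
TEST_CEILING = 36      # maximum raw score on APM Set II
SAMPLE_MEAN = 22.17    # Bors and Stokes timed sample, Set II
SAMPLE_SD = 5.60

# 1-in-30,000 is about 4 SD above the general population mean; with the
# test group assumed to sit 1 SD above that mean, 3 SD remain.
SDS_ABOVE_SAMPLE_MEAN = 3

required_raw = SAMPLE_MEAN + SDS_ABOVE_SAMPLE_MEAN * SAMPLE_SD
shortfall = required_raw - TEST_CEILING

print(f"raw score needed for 1-in-30,000: {required_raw:.2f}")
print(f"raw points above the test ceiling: {shortfall:.2f}")
```

Because the required raw score exceeds the number of items on Set II, no attainable score on the timed test reaches the 1-in-30,000 level under these assumptions.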
