
SPED 311 Assessment Review Project

Name: Kacie Sanders
Date: April 14, 2015
School/Setting: Oakwood Intermediate School – Content Mastery

Practical Evaluation
Description of test: The KeyMath III test was written by Austin J. Connolly and is published by Pearson. The first edition was published in 1971 and was later updated in 1988 to be more user-friendly (Rosli, 2011). It is an assessment tool that measures students’ abilities to solve a variety of math problems. Rosli (2011) states that the KeyMath III is a “norm referenced individually administered diagnostic test”. This assessment is used with students from 4 years, 6 months to 21 years, 11 months of age, or kindergarten through 12th grade. It assesses students’ ability to use math concepts to become problem solvers (Rosli, 2011). The test has two forms, Form A and Form B. One form costs $474, and both forms together cost $808. The two forms cover the same content; the only difference is that the questions are worded differently. The test is untimed but typically takes between 30 and 90 minutes to complete, although the time can vary with the student’s behavior, ability, and grade level (Rosli, 2011).
Description of test manuals: The test manual is a bound paperback book. To assist with navigation, the manual has a detailed table of contents that lists the title of each chapter and then breaks it down further into subchapters. It is laid out logically and allows information to be found with ease. Rosli (2011) notes that the manual is “informative and accessible”. At first glance, the information in this manual seems very detailed and helpful. However, when actually working through each chapter, the information can feel a little overwhelming. At first it seems as though every detail needs to be read in order to be as knowledgeable about the test as possible, yet the content becomes more repetitive as the chapters go on. The information in the manual appears trustworthy, since it is supported by charts and tables of data. While the test manual may seem lengthy, all of the information is useful to anyone administering the KeyMath III.
Description of test materials: The test materials include an easel and a response booklet. The easel is both small and light, which makes it easy to transport. The testing easel is divided into six sections: numeration, algebra, geometry, measurement, data analysis, and probability. This makes it easy for the administrator to find the section on which the examinee is being tested. The pages of the easel seem very durable; the paper feels like glossy cardstock. It is a heavy paper that will not rip easily, if at all. The glossiness of the paper makes it more visually appealing, as it seems to enhance the colors. Rosli (2011) states that there are 372 “clear and colorful” items within the easel. The front and back covers are made of the same material as the internal pages. The pages in the testing easel are bound by a spiral that is just the right size to allow the pages to turn with ease. The entire easel is printed in color, which makes it more visually appealing to both the examinee and the administrator.
When the easel is set up, both the examinee and the administrator can read the information on their own side. The administrator side shows the problem, what the administrator is supposed to say, and the answer. It also describes correct responses so that there is no confusion in scoring. The examinee side shows the highlights of the problem read aloud by the administrator, along with the answer choices. The problems and pictures are appropriate for the examinee’s age level. They are also free of content that is likely to offend anyone taking or administering the test. These items are presented in multiple mathematical representations, such as pictorial, real-world, and symbolic (Rosli, 2011).

The response booklet consists of two double-sided pages of math problems that the examinee solves. The paper is durable and sufficient for one-time use. It is thicker than printer paper, but not as thick as the easel pages. My only concern about the paper is that it has a slight sheen, so it might not cooperate with certain erasers; some erasers may smear the pencil marks rather than erase them completely. The items in this booklet are written computation problems (algorithms) that cover all four operations: addition, subtraction, multiplication, and division. Examinees are only asked to solve the problems that are within their functional range. The entire testing kit comes in a small bag, which makes transporting the test and keeping track of the items much easier.
Description of test protocols: The protocols of the KeyMath III are very straightforward and laid out in a way that is easy to follow. The scoring booklet is broken up into the different testing sections: numeration, algebra, geometry, measurement, data analysis and probability, mental computation and estimation, addition and subtraction, multiplication and division, foundations of problem solving, and applied problem solving. The protocol is easy to handle. It gives clear and concise instructions on how to record answers. It also explains how to establish basal and ceiling sets, so the examiner knows the procedure before beginning the exam. Although the protocols are intended for only one-time use, they are fairly durable. The paper is thicker than regular computer paper, and the ink does not rub off easily.
Description of test items: The test items are all contained in the easel and the response booklet. The items in the response booklet are at a reasonable level for the age of the students being tested, and they correspond well with the competencies being tested. The easel is very user-friendly and appealing. It is printed entirely in color, with few words on the student side, so it is not overwhelming. The items in both the response booklet and the testing easel consist of written computation problems (algorithms), short word problems, and pictures.

The KeyMath III is very easy to administer. The directions are simply stated and tell the administrator exactly what to say. The directions and problems are also short and require minimal verbal exchange. Scoring this test is relatively easy; the student either gives the correct response or does not. Rosli (2011) noted that responses are recorded by marking either a “1” for a correct answer or a “0” for an incorrect response, so there is no gray area or confusion about how to score a response. Scores can be obtained in several ways. The online scoring system used with the test is called ASSIST. It provides score summaries, score profiles, area comparisons, focus items, progress reports, item and functional range analyses, narrative reports, and parent/caregiver letters. Students respond both verbally and with written numbers. The examiner may need to pay special attention to verbal responses if the student has a speech impairment or speaks with an accent the examiner is not used to.
Technical Evaluation
Norms: The norms for this test were collected in the fall and spring of 2006. A total of 4,000 individuals, ranging in age from 4 years, 6 months to 21 years, 11 months, were tested. Norms were developed for both the entire assessment and the specific areas. The norm sample was selected to match the 2004 U.S. Census (Rosli, 2011). Rosli (2011) reports that, in collecting these norms, 444 items were administered at 272 test sites in 45 states. The examinees were divided into 17 age groups and 26 educational level groups. In terms of regional representation, the South was represented more heavily than the rest of the country: the Northeast made up 18.1% of the sample, the North Central region 22.7%, the South 35.7%, and the West 23.5%. Because Texas is part of the South, the most heavily represented region, the norms appear adequate from a Texas standpoint.
In terms of demographics, the white population was represented more than any other ethnicity. The male-to-female ratio was fairly even, and the test does not seem to favor one sex over the other. Special education, however, is quite underrepresented in this norm sample: 5.7% of the examinees had a specific learning disability, 2.2% had a speech or language impairment, 1.1% had an intellectual disability, 1.1% had an emotional or behavioral disorder, 0.3% had a developmental delay, and 0.6% had other health impairments. When evaluating the representation of students in special education, the sample does not look promising. Because the test was not normed on many students with special needs, it is unknown how beneficial it would be for them and their education.
Reliability: Three types of reliability are reported for this assessment. The first is internal consistency, which indicates how consistently an examinee performs across sets of items. To calculate this reliability, the items were split into two halves matched on item content and difficulty. The second type is test-retest reliability, in which the same form of the test is administered to the same students on two separate occasions. Both administrations are assumed to take place under appropriate testing conditions. The interval between the two administrations ranged from 6 to 28 days, with an average of 17 days, and the examinees in this group were evenly divided between males and females. The final type is alternate-form reliability, in which the examinees take both forms of the test. Half of the examinees took Form A first and then Form B, while the other half took Form B first and then Form A.
The reliability evidence for this test appears adequate. The estimates are based directly on the scores examinees actually earned on the tests given. No assumptions are made about how an examinee would have performed on items that were not administered; only the items actually administered are used. If a student did not answer a question or did not reach it, that item is not included in the score or the reliability calculation. Overall, the reliability coefficients fall in the upper .80s to .90s (Rosli, 2011).
Validity: Two types of validity are discussed for this test. The first is content validity, the degree to which the test items represent math curricula at the national level. According to the manual, content validity was established by creating a comprehensive blueprint that reflects essential math content, existing curricular priorities, and national math standards. The content is also organized into a structure aligned with curricular frameworks and instructional development across the grades, and the items were developed to accurately assess students’ proficiency in the content areas being tested. The second type is construct validity, which means that the test actually measures what it is supposed to measure. To establish this type of validity, the developers related the subtest scores to scores on other tests that measure the same skills.
Overall, this test appears to be valid. Comparisons with other tests of the same skills yielded similar results. The tests used for comparison include the Iowa Tests of Basic Skills and the Kaufman Test of Educational Achievement. The test was also shown to align well with current math standards.
Journal Reviews
Journal Review #1: This article, written by Roslinda Rosli (2011), is a direct review of the KeyMath III. The article begins with a general description of the test. Rosli states when the test came out, when it was revised, what content is assessed, and what comes in the kit. From there, the article moves into a more specific description. It discusses the ages of the students being assessed, which align with what is stated in the manual. Rosli also says that the test can be used to monitor students’ progress every 3 months. She also notes that it can be used to guide a teacher’s placement decisions; however, she found no further information on the subject. Next, Rosli moves into a discussion and evaluation of the test materials. She explains that the manual is accessible and informative. She also discusses the administration time and how the examinee’s grade level, ability, and behavior can affect how long it actually takes.
The next piece evaluated is scoring. Rosli explains how to record oral responses and discusses the multiple parts of some questions. She notes that some questions have a part A and a part B, and that the student can only attempt the second part if he or she answers the first part correctly. Finally, the technical evaluation discusses the norms, reliability, and validity. When discussing the norms, Rosli states most of the same information that is in the manual, but she also adds that all of the participants were proficient in English and had no vision or hearing impairments. The reliability section is very thorough; Rosli discusses how reliability was obtained and what each type means. When discussing validity, she states the different types and how each was obtained. The only drawback is that one type of validity is discussed more than the other.
MMY Journal Review #2: The Mental Measurements Yearbook entry contains two separate reviews. The first is by Theresa Graham. She begins her review with a brief overview of the test, describing the subtests and the different content being tested. She then discusses the revision of the test. Notably, relevant professionals were surveyed when the test was constructed, and the two forms together contain a total of 444 items. Next, the review discusses the norms, reliability, and validity. For the norms, Graham discusses how they were obtained and whether they are adequate. For reliability, she discusses the different types and how each was obtained. When she addresses validity, she explains how the developers determined that the assessment was valid and what the scores were. She also notes that teachers and administrators were interviewed.
The second review in the MMY was written by Suzanne Lane. Her review is quite similar to Graham’s. She begins by discussing the basics of the test and what it assesses. One new detail is her note about administration time: Lane says that the test is untimed but breaks the time down further, noting that the author estimates 30 to 40 minutes for elementary school students and 75 to 90 minutes for secondary students. She also discusses the ASSIST technology and how it enhances the testing experience, and she gives a similar account of how the test was developed. She then discusses norms, reliability, and validity in a way similar to the other reviews discussed thus far.

References
Connolly, A. J. (2007). KeyMath-3 Diagnostic Assessment: Manual (Forms A & B). Minneapolis, MN: Pearson.
Graham, T. (n.d.). Review of the KeyMath-3 Diagnostic Assessment. Retrieved April 14, 2015, from Mental Measurements Yearbook.
Lane, S. (n.d.). Review of the KeyMath-3 Diagnostic Assessment. Retrieved April 14, 2015, from Mental Measurements Yearbook.
Rosli, R. (2011). Test review: A. J. Connolly, "KeyMath-3 Diagnostic Assessment: Manual Forms A and B." Minneapolis, MN: Pearson, 2007. Journal of Psychoeducational Assessment, 29(1), 94-97.