MAP Project Questions Around Negative Growth

From: de Barros, Jessica [jedebarros@seattleschools.org]
Sent: Wednesday, February 10, 2010 12:55 PM
To: Dave Swanson
Subject: RE: MAP Project: questions around negative growth

Thanks, Dave.  Just wanted to make sure we were clear on the process for next time.  Andy is following
up with Terri, so I think we will be OK.

Jessica

From: Dave Swanson [mailto:dave.swanson@NWEA.org]
Sent: Wednesday, February 10, 2010 12:53 PM
To: de Barros, Jessica
Cc: Bernatek, Bradley T; Olsen, Andrew J
Subject: RE: MAP Project: questions around negative growth

Jessica:  It occurred to me (a few seconds after hitting send) that I should have vetted this through
you.  My apologies, and I'll follow my own advice next time.

All of the messaging comes from our website, and I didn't include any information about the growth
calculator.  Her question about the 25 or so students (out of 500 or so tested) who saw negative
growth would likely be illuminated by looking at how much time those students spent on the fall vs.
winter tests.  I'm sure you already have that CDF data ordered for both seasons, so the time-duration
question would really only be answerable from that report, since they didn't record the summary-screen
information at the end of the test.  As I mentioned in the email, that's something on the roadmap
for future releases (more ways of knowing whether a student is putting forth their best effort on a
test).

I fall on my sword for the communication, and I would be interested to learn more about your customized
growth calculation at some point.   Dave

From: de Barros, Jessica [mailto:jedebarros@seattleschools.org]
Sent: Wednesday, February 10, 2010 12:37 PM
To: Dave Swanson
Cc: Bernatek, Bradley T; Olsen, Andrew J
Subject: RE: MAP Project: questions around negative growth

Hi Dave,

I appreciate your responsiveness to this principal; however, some of the messages are counterproductive
to our own internal messaging.  For example, we just completed our own reports that calculate growth
more precisely than the NWEA growth calculator does, and I do not want her and her staff to now go use
the growth calculator.  Did she contact you directly?  If you get questions from our staff in the
future, it would help if you could check in with a Data Coach or me to ensure we are messaging
consistently.

Thanks,
Jessica

From: Dave Swanson [mailto:dave.swanson@NWEA.org]
Sent: Wednesday, February 10, 2010 12:05 PM
To: Skjei, Terri
Cc: de Barros, Jessica
Subject: MAP Project: questions around negative growth

Terri:  It was good to talk with you.   I have some information below, some of which might be helpful
for staff, and other pieces that you can decide how to message appropriately for parents.

How should negative growth be interpreted?

The NWEA Research group has identified common behaviors which explain why some students show
negative growth scores in the spring.  Negative growth most commonly appears on reports for two
general reasons: either students took too little time to take the test or students were not engaged
during the test.

• The Impact of Not Spending Enough Time on NWEA Assessments

If students spend 25 seconds or less on an item, they will not show their top performance. Generally
speaking, to show their best growth, students need to spend at least 25 to 30 minutes on each test.
Sometimes students take more time per item at the start of the test, and then rush through toward the
end of a test.  Proctors should watch students at the end of the test and caution them to take their time.
Six minutes is the lowest amount of time a student can spend on a test and still have the score be
considered valid. In short: The longer a student spends taking a test, the more likely they are
to demonstrate growth.
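
As a quick illustration of that rushing pattern, a check like the following could compare pacing in
the first and second halves of a test. This is a sketch only: it assumes item-level timings are
available, which this email does not promise the reports expose.

RUSH_THRESHOLD_SECONDS = 25  # 25 seconds or less per item suggests rushing

def rushed_at_end(item_seconds: list[float]) -> bool:
    # Compare the average seconds per item in the second half of the test
    # against the 25-second threshold described above.
    half = len(item_seconds) // 2
    second_half = item_seconds[half:]
    average = sum(second_half) / len(second_half)
    return average <= RUSH_THRESHOLD_SECONDS

# A student who works carefully early on, then races through the final items:
print(rushed_at_end([40, 38, 45, 30, 12, 9, 8, 10]))  # -> True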

• The Impact of Students Not Engaging with NWEA Assessments

Students may be over-tested and tired of taking tests (state-mandated tests, other locally required
tests). Even students who decide not to make their best effort on NWEA tests and mark the same answer
letter for every question will eventually begin to select correct answers: as the adaptive test serves
easier and easier questions, they become embarrassed not to answer correctly. A student like this is
generally engaged early in the test, disengaged in the middle, and engaged again at the end. The
result is enough correct answers for the test to be considered valid, but a very low RIT score.

Some students may "cherry pick" items on the test; that is, they correctly answer items that are
easy enough to solve quickly and guess on items that would take more work. This can lead to deflated
scores.

Test fatigue may be a real issue, especially with younger students or special education students.
Conducting the test in two sittings of about 30‐35 minutes each may be an answer for some students.  
Retesting students who score lower than expected may contribute to fatigue as well, so NWEA advises
schools to look at longer term trends (more than just fall to winter) and to retest judiciously.

A major problem arises when adults promise a fun or rewarding activity "once everyone is finished with
the test": not only do students rush, they also stop making a positive effort, since something more
fun awaits them at the end.

Building a culture around effective use of data means explaining to parents that any test is a snapshot
in time and needs to be triangulated with other measures.   Some schools make a point of recording the
end-of-test summary screen so that the test duration is captured; if a student did complete the test
in too little time, the proctor can probe a little to understand what was on the student's mind (if
your cat died that day, you might not be up to giving your best performance).  Again, a test of more
than six minutes would not necessarily be invalidated, so teacher judgment and proctor observation
become important qualifiers.

• Recommendations

Proctors should log start and end times for students when possible; documenting short test times can
explain an unexpectedly poor result. Students who show an 11-point decline in their RIT score should
be considered for retesting: students with growth index scores under -11 (mathematics) would be in the
bottom 10% of performers based on our current study. Retesting is not helpful, though, if the student
is not more engaged on the second attempt. The RIT Scale Norms book contains tables showing the likely
growth or decline by grade level.
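
A minimal sketch of that flagging logic, assuming hypothetical student records: only the six-minute
validity floor and the 11-point decline threshold come from this email; the field names and the idea
of scripting the check are illustrative.

from dataclasses import dataclass

MIN_VALID_MINUTES = 6        # under six minutes, the score is not considered valid
RETEST_DECLINE_POINTS = 11   # an 11-point RIT decline suggests considering a retest

@dataclass
class TestEvent:
    student_id: str
    fall_rit: int
    winter_rit: int
    winter_minutes: float    # proctor-logged duration (hypothetical field)

def flag(event: TestEvent) -> str:
    # Apply the two checks from the Recommendations above.
    if event.winter_minutes < MIN_VALID_MINUTES:
        return "invalid: test completed in under six minutes"
    if event.fall_rit - event.winter_rit >= RETEST_DECLINE_POINTS:
        return "consider retesting: 11+ point RIT decline"
    return "ok"

print(flag(TestEvent("S001", fall_rit=205, winter_rit=192, winter_minutes=14)))
# -> consider retesting: 11+ point RIT decline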

What is typical growth from fall to spring?

Typical growth, also called mean grade-level growth, varies from year to year. This information is
available in the 2008 Normative Data document, which you can access on the NWEA web site:
http://www.nwea.org/support/article/980

** To get the test duration for each student in Winter, you would need one of the data coaches to pull
the Comprehensive Data File (CDF) and send your school's Fall and Winter CDF for the 30 or so kids (out
of 500) that you have concerns about.   NWEA is working on other ways to track patterns of engagement
and surface those more easily in reports, and I'll keep you in the loop on those developments over
time.   Time spent on the test is probably one of the best comparative indicators for now, until you
have more data points in Spring 2010 and Fall 2010.
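
For reference, once the Fall and Winter CDF extracts are in hand, the comparison Dave describes could
be sketched along these lines. The file and column names below are placeholders, not the actual CDF
schema.

import pandas as pd

# Hypothetical CDF extracts, one row per student test event.
fall = pd.read_csv("cdf_fall.csv")
winter = pd.read_csv("cdf_winter.csv")

merged = fall.merge(winter, on="student_id", suffixes=("_fall", "_winter"))
merged["rit_change"] = merged["rit_winter"] - merged["rit_fall"]

# Students showing negative growth, with fall and winter test durations
# side by side for comparison.
negative = merged[merged["rit_change"] < 0]
print(negative[["student_id", "rit_change",
                "duration_min_fall", "duration_min_winter"]])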

I hope this helps.   Dave

Some other possibly helpful topics:

Why do the Rasch Unit (RIT) score ranges vary so significantly on reports?

RIT ranges vary because the Standard Error of Measure (SEM) may vary from student to student. The SEM
speaks to the consistency of a student's answers: if the answers are very consistent, the SEM is lower,
and when a student is randomly selecting answers, our software detects this and the student ends up
with a large SEM. The SEM is applied to either side of the RIT score to help define a "confidence band"
or RIT range. It should also be noted that there are acceptable and unacceptable standard errors of
measure: on a Goals Survey test, if the RIT is less than 240, an SEM less than 1.5 or greater than 5.5
automatically invalidates the test; if the RIT is greater than or equal to 240, the SEM is permitted to
exceed 5.5.
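
Stated as a minimal sketch in Python, using only the thresholds given above (illustrative, not an
official NWEA implementation):

def rit_range(rit: float, sem: float) -> tuple[float, float]:
    # The SEM applied to either side of the RIT score gives the
    # "confidence band" or RIT range.
    return (rit - sem, rit + sem)

def goals_survey_valid(rit: float, sem: float) -> bool:
    # Below RIT 240, an SEM under 1.5 or over 5.5 invalidates the test;
    # at RIT 240 or above, the SEM may exceed 5.5.
    if rit < 240 and (sem < 1.5 or sem > 5.5):
        return False
    return True

print(rit_range(201, 3.0))           # -> (198.0, 204.0)
print(goals_survey_valid(215, 6.0))  # -> False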

Why do some RIT scores have a large percentile range?

The percentile is determined by two factors: the Rasch Unit (RIT) score itself (where that score falls
in the distribution of other scores in the grade level, as they existed in the norming study) and the
measurement error associated with the score. For example, consider two grade 3 math scores in the
spring, each with a measurement error of 3.0: 201 (52nd percentile) and 223 (97th percentile). The RIT
ranges (and percentile ranges) for these two scores are:

  RIT score   RIT range   Percentile range
  201         198-204     41-61
  223         220-226     95-98

So, given the same level of error, scores toward the middle of the distribution will have a wider
percentile range. When a student's achievement level is measured with less precision (i.e., measurement
error greater than 4), the effect is even more pronounced in the middle of the distribution.
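
To make the arithmetic concrete, here is the same example in a few lines of Python. The handful of
norm-table entries are lifted from the example above; a real calculation would use the full published
norms.

# Percentile entries from the example above (grade 3 math, spring).
NORMS_G3_MATH_SPRING = {198: 41, 201: 52, 204: 61, 220: 95, 223: 97, 226: 98}

def percentile_range(rit: int, sem: float) -> tuple[int, int]:
    # Look up the endpoints of the RIT range in the norms table.
    low, high = int(rit - sem), int(rit + sem)
    return (NORMS_G3_MATH_SPRING[low], NORMS_G3_MATH_SPRING[high])

for score in (201, 223):
    print(score, percentile_range(score, 3.0))
# 201 (41, 61)  -- wide percentile band near the middle of the distribution
# 223 (95, 98)  -- narrow band in the upper tail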

Dave Swanson
Senior Account Executive  |  NWEA

PHONE 503.212.3377 | CELL 503.729.1866 | FAX 503.639.7873

NWEA.ORG |  Partnering to Help All Kids Learn

Join the SPARK Community, an online community bringing conversation, experience and resources
together for educators.
