Assignment 1C: Control, Sampling, and Measurement

Once again, return to the Ardern & Henry (2019) research report titled “Testing Writing on
Computers: An Experiment Comparing Student Performance on Tests Conducted via Computer
and via Paper-and-Pencil.”

Control
1. This study is loaded with illustrations of control, for example, the use of blinding,
matching, randomness, constancy, and so on. Locate two examples of the use of these
control techniques (or others) and describe why you think these techniques were used
(that is, state what bias or extraneous influence was controlled by each). As usual, this
takes careful reading.[4 points]

- Randomness:
    o Writing skill is controlled by randomly assigning students to either mode of
      test administration (experimental group or control group). As a result,
      students who are used to writing on a computer do not simply self-select
      into that test mode.
- Blinding:
    o All handwritten responses were entered verbatim into the computer, so as
      to prevent raters from knowing which responses were originally written by
      hand (expectancy effect). Otherwise, raters might tend to give higher scores
      to computer-written tests, since that is the outcome they might expect.
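
To make the random-assignment idea concrete, here is a minimal sketch (not from the report; the participant list and seed below are hypothetical) of how students could be split by chance between the two modes of administration:

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participant list and split it into two groups:
    computer-administered vs. paper-and-pencil."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

computer_group, paper_group = randomly_assign(range(10), seed=42)
```

Because assignment depends only on chance, students accustomed to writing on a computer are no more likely to land in one group than the other, which is exactly the bias this control technique removes.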

Sampling
2. The statistical material in Chapter 8 is challenging if you have no background in
statistical thinking. The next part of this assignment provides more practice with
statistical ideas related to Chapter 8. Under “Study Design and Test Instruments,” the
researchers state that “two groups of students were randomly selected.” Why are
random selection and random assignment into groups so important in this study?[2
points]

Random selection makes the sample (participants) representative of the larger student
population, while random assignment equalizes the groups on extraneous variables,
eliminating confounding effects. As a result, the findings (the effect of mode of test
administration) can be generalized to the larger student population. In other words, the
experiment has good external validity.

3. Notice the researchers’ sample size for this study. How does this compare to common
rules of thumb in research? How would you evaluate this sample size with regard to
statistical recommendations using effect size ideas? [2 points]
In this study, the sample sizes are 46 in the experimental group and 68 in the
control group. Each group is bigger than the minimum of 30 per group suggested by the
common "rule of thumb" for determining sample size, although that rule assumes a
fairly large difference between groups.

The experimental group (46) is bigger than 25, the per-group number recommended
to detect a large effect size, but smaller than 63, the size required to detect a
medium effect size. The control group (68) is just bigger than the recommended
per-group size for detecting a medium effect size.
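
The per-group numbers cited above (about 25 for a large effect, about 63 for a medium one) follow from standard power rules of thumb. One common approximation is Lehr's rule, which assumes 80% power and a two-tailed alpha of .05 for a two-group comparison of means; it can be sketched as:

```python
def per_group_n(d):
    """Lehr's rule of thumb: n = 16 / d**2 participants per group,
    for 80% power at two-tailed alpha = .05."""
    return 16 / d ** 2

# Cohen's benchmarks: small d = .2, medium d = .5, large d = .8
n_large = per_group_n(0.8)   # about 25 per group
n_medium = per_group_n(0.5)  # 64 per group, close to the 63 cited above
```

By this rule, both groups (46 and 68) are adequate for detecting a large effect, but only the control group reaches the size needed for a medium effect.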
4. Examine Table 2 in Ardern & Henry’s report, namely the OE entry. What does SD refer
to and what does the value indicate? Given the SD of 2.96, what statement can be made
about 68% of the cases, relative to the mean? And what statement can be made about
95% of the cases?[4 points]

SD refers to standard deviation, which indicates whether the students in the sample scored
about the same on the test or whether there are large differences among scores (the spread
of scores around the mean). The higher the SD, the greater the spread around the mean.

With SD = 2.96 and mean = 7.87 for the OE entry:

- About 68% of the participants scored within 1 SD of the mean, i.e., between 4.91 and 10.83
- About 95% of the participants scored within 2 SD of the mean, i.e., between 1.95 and 13.79
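
These ranges follow directly from the empirical (68–95) rule. A small sketch of the arithmetic, using the mean and SD reported in Table 2:

```python
def empirical_rule(mean, sd):
    """Score ranges covering about 68% (within 1 SD of the mean) and
    about 95% (within 2 SD) of cases under a normal distribution."""
    one_sd = (mean - sd, mean + sd)
    two_sd = (mean - 2 * sd, mean + 2 * sd)
    return one_sd, two_sd

one_sd, two_sd = empirical_rule(7.87, 2.96)
# one_sd is about (4.91, 10.83); two_sd is about (1.95, 13.79)
```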
5. Examine Ardern & Henry’s first paragraph under the section titled “Discussion”
(especially the last half). Notice the researchers’ interpretation of the effect size of .94.
Do you see why they made the interpretation that they did? [2 points]

The effect size (d) for mode of administration between the experimental and control groups
on the performance writing assessment is 0.94. This is considered a large effect (effect sizes
larger than 0.80 are conventionally labeled large). The value tells how the experimental group
mean compares, in standard deviation units, to the control group mean: the average experimental
participant scored 0.94 (almost 1) SD above the mean of the control group.

In other words, the experimental group's distribution is shifted to the right, so its mean (the
50th percentile of its own distribution) falls at the 0.94 SD mark of the control group's
distribution, which is nearly the control group's 84th percentile. That is a large shift of mean
scores relative to the control group.
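
The percentile claim can be checked with the standard normal cumulative distribution function. A quick sketch, assuming normally distributed scores and using only the Python standard library:

```python
from statistics import NormalDist

def percentile_of_control(d):
    """Percentile of the control-group distribution at which the
    experimental-group mean falls, given Cohen's d and assuming
    normally distributed scores."""
    return NormalDist().cdf(d) * 100

shift = percentile_of_control(0.94)  # about 82.6
```

With d = 1.0 the shift would be to about the 84.1st percentile, which is consistent with the "almost at its 84th percentile" reading above.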

Measurement
6. Notice that in the section titled “Scoring,” the researchers report a “modest level of inter-
rater reliability.” What does this modest level suggest about errors of measurement?
From a research perspective, why are measures with modest or low reliability
undesirable?[2 points]

In this study, the researchers reported a "modest level of inter-rater reliability," indicating
that different raters gave different scores to the same test. The source of the error, in this
case, lies among the raters: the outcome measures lacked consistency, which signals the
presence of measurement error. Measures with modest or low reliability are undesirable
because this measurement error adds noise to the scores, obscuring the true effect of the
treatment and making real differences between groups harder to detect.
7. If the researchers had presented information on validity, what type of validity do you
think would have been most relevant? Why?[2 points]

Validity refers to the accuracy of the inferences made on the basis of the outcome measure.
A valid measure allows researchers to make accurate inferences from the scores. Validity is
essential here to tell whether the tests used actually measured what they were supposed to
measure.

In this particular study, content validity is probably the most relevant. The researchers
would want to know whether the open-ended entries, NAEP subject tests, and performance
writing assessments actually reflect students' writing performance under either mode of
administration.
