
International Journal of Science Education

ISSN: 0950-0693 (Print) 1464-5289 (Online) Journal homepage: http://www.tandfonline.com/loi/tsed20

First‐year physics students’ perceptions of the quality of experimental measurements

Saalih Allie, Andy Buffler, Bob Campbell & Fred Lubben

To cite this article: Saalih Allie, Andy Buffler, Bob Campbell & Fred Lubben (1998) First‐year physics students’ perceptions of the quality of experimental measurements, International Journal of Science Education, 20:4, 447-459, DOI: 10.1080/0950069980200405

To link to this article: http://dx.doi.org/10.1080/0950069980200405

Published online: 23 Feb 2007.




INT. J. Sci. EDUC., 1998, VOL. 20, NO. 4, 447-459

First-year physics students' perceptions of the quality of experimental measurements

Saalih Allie*, Andy Buffler*, Loveness Kaunda*, Bob Campbell‡ and Fred Lubben‡
*Department of Physics and Academic Development Programme, University of Cape Town, South Africa; ‡Department of Educational Studies, University of York, UK

This paper reports an investigation into the procedural understanding of first-year university science students at the University of Cape Town, South Africa. Written probes were used to explore ideas regarding the reliability of experimental data, in particular the need for repeating measurements and the implications of the spread associated with data. The types of reasoning underlying the responses are discussed. The findings suggest an extension of a recently proposed model of progression of understanding of empirical evidence. Equally, they show that the use of procedural understanding is context dependent, and that students' intuitive ideas about improving precision and accuracy of measurements are not distinct.

Introduction
Undergraduate physics programmes generally focus on increasing the understanding of concepts, laws and models (declarative knowledge) through lectures and tutorials, and on a greater insight into the methods of scientific enquiry (procedural knowledge) through practicals and projects (e.g. Black 1993). More recently, Osborne (1996) argued that the purposes of experimental work should be more strongly focused on procedural understanding. Students' understandings of a variety of physics concepts have been reported extensively for students at different levels (for a summary see Pfundt and Duit 1994; for the South African situation, see Linder and Hillhouse 1996). On the other hand, the procedural understanding of science students is rarely studied, nor used as a starting point for teaching: experimental procedures in science lessons are usually taught as a list of instructions on how to collect a 'good' set of measurements and how to manipulate the data.
Based on a large observational study of British secondary school students doing open-ended investigative tasks, Millar et al. (1996) distinguish three areas of procedural understanding. First, they identify a variety of 'frames' for doing experimental work, i.e. students' perceptions of the purpose of practical experimentation. Second, they suggest that decisions about the experimental procedure are influenced by students' knowledge about how to manipulate the apparatus. Lastly, they indicate that the procedure adopted is critically influenced by the students' understanding of the issue of reliability of experimental evidence. Gott

and Duggan (1996) suggest that perceptions of the reliability of experimental data influence the design of a practical investigation (e.g. inclusion of a control, the choice of sample size), the ways the data are collected (e.g. varying one variable at a time, taking repeat measurements), reported (e.g. in graphs or tables) and interpreted (e.g. notions about spread of results). This paper focuses on aspects of this area of procedural understanding.
Little has been reported on the procedural understanding of undergraduate science students. Séré et al. (1993) studied the understanding of French physics undergraduates of issues relating to the quality of experimental results. They noticed that students differentiate poorly between random and systematic errors and between the related low precision and accuracy of their data. As Thomson (1997) highlights, however, this terminology is not used consistently even in physics publications. In this context it is interesting to note that the International Organisation for Standardisation has banned the term 'precision' for descriptions of scientific measuring instruments because of its many confusing everyday connotations (Giordano 1997). For the purposes of this paper the definitions of Bevington and Robertson (1991) are used: accuracy is the measure of how close the result of an experiment is to the theoretically 'true' value. The accuracy of an experiment depends on how well we can control or compensate for the systematic errors, i.e. errors that will lead to reproducible discrepancies between the result and the 'true' value. Precision is a measure of how well the result has been determined (without reference to the theoretically true value) and is a measure of the reproducibility of the result. The precision of an experiment depends on how well we can overcome random errors, i.e. the fluctuations in observations that yield results that differ between repeated experiments. Séré et al. (1993) also concluded that even the correct use of statistical procedures seldom indicates an appreciation of the purposes behind such procedures, or an understanding of how to assess the reliability of data.
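These definitions can be illustrated numerically. The following sketch uses invented readings (not data from this study) to show how the offset of the mean from the 'true' value reflects accuracy, while the spread of repeated readings reflects precision:

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical 'true' value of the measured quantity

# Invented data sets for illustration only
accurate_precise = [99.8, 100.1, 100.0, 99.9, 100.2]      # small spread, mean near true value
inaccurate_precise = [104.9, 105.1, 105.0, 104.8, 105.2]  # small spread, systematic offset
accurate_imprecise = [95.0, 106.0, 98.0, 103.0, 98.0]     # large spread, mean near true value

for label, data in [("accurate and precise", accurate_precise),
                    ("inaccurate but precise", inaccurate_precise),
                    ("accurate but imprecise", accurate_imprecise)]:
    # accuracy: closeness of the mean to the true value (systematic error)
    # precision: size of the spread of repeated readings (random error)
    print(f"{label}: mean = {mean(data):.1f}, spread (std dev) = {stdev(data):.2f}")
```

The second data set would mislead a student who looks only at reproducibility: its readings agree closely with each other, yet all carry the same systematic offset.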
Based on paper-and-pencil tests of a large mixed-ability group of UK students in the 11-16 age range, Lubben and Millar (1996) suggest a model (see table 1) for the progression of types of student ideas about the reliability of experimental data.
Table 1. Model of progression of ideas concerning experimental data.

Level  Student's view of the process of measuring
A      Measure once and this is the right value
B      Unless you get a value different from what you expect, a measurement is correct
C      Make a few trial measurements for practice, then take the measurement you want
D      Repeat measurements till you get a recurring value. This is the correct measurement
E      You need to take a mean of different measurements. Slightly vary the conditions to avoid getting the same results
F      Take a mean of several measurements to take care of variation due to imprecise measuring. Quality of the result can be judged only by authority source
G      Take a mean of several measurements. The spread of all the measurements indicates the quality of the result
H      The consistency of the set of measurements can be judged and anomalous measurements need to be rejected before taking a mean

Source: Adapted from Lubben and Millar (1996).

In order to test this model for its applicability for older and educationally more advanced students, a study on students entering undergraduate physics programmes is needed. This paper reports the findings of such a study on a group of mainly Science Foundation Programme students at the University of Cape Town halfway through their first year of university education. Students' demographic data show that a large proportion of the sample are second-language English speakers, and that the majority have had little or no experience with practical work before university entry. However, all have completed a course in 'Tools and Procedures for Physics 1' (for details see Allie and Buffler, in press) prior to their involvement in this study. The research reported here sets out to determine the procedural understanding of the students and how their ideas about the reliability of experimental data might fit the procedural progression model.
Methodology
Nine pencil-and-paper probes were designed for this study, several based closely on probes from the PACKS study (Lubben and Millar 1994). This paper focuses on three probes dealing with the reasons for repeating measurements and three probes concerned with how to deal with sets of experimental data. The latter covered the issues of how to handle an anomalous measurement, how to compare two sets of measurements having the same mean but different spreads, and how to compare two sets of measurements having a similar spread but different means. All the probes were based on the same experimental setting, which was chosen for its simplicity of description and for the fact that the students had not encountered it in their laboratory course. The following situation and diagram (figure 1) were presented to the students:

An experiment is being performed by students in the Physics Laboratory. A wooden


slope is clamped near the edge of a table. A ball is released from a height h above the
table as shown in the diagram. The ball leaves the slope and lands on the floor a
distance d from the edge of the table. Special paper is placed on the floor on which the
ball makes a small mark when it lands. The students have been asked to investigate
how the distance d on the floor changes when the height h is varied. A metre stick is
used to measure d and h.

[Figure omitted: a ball rolls down a slope clamped to a table and lands on the floor a distance d from the table edge.]

Figure 1. Diagram of the experimental set-up used in the diagnostic instrument.

The students work in groups on the experiment. They are first given a stopwatch and are asked to measure the time that the ball takes from the edge of the table to hitting the ground after being released at h = 400 mm. They discuss what to do.

A: 'Let's roll the ball once from h = 400 mm and measure the time. Once is enough.'
B: 'We can roll the ball twice from height h = 400 mm, and measure the time for each case.'
C: 'I think we should release the ball more than twice from h = 400 mm and measure the time in each case.'
With whom do you most closely agree? (Circle ONE):
A B C
Explain your choice.

Figure 2. The probe dealing with repeating time measurements.

The 'experiment' was demonstrated to the class using a large-scale model of the slope and the ball. The text as quoted above was read out twice while the ball was released from two different positions on the slope. Care was taken not to convey any procedural hints. All the probes had the same form, as shown, for example, in figure 2. A terse writing style was adopted to minimize potential linguistic confusion since a large fraction of the sample were second-language English speakers. A brief stem of text posited a situation where decisions had to be made concerning the experimental procedure. A number of options were presented in the probe by cartoon characters, purposely included to avoid gender and race bias in influencing respondents' choices. The instrument called for an explanation of each choice made.
The instrument was administered to a sample of 121 students. The students
were instructed to answer the probes strictly in sequence and neither to return to a
previously answered probe nor to flip forward to see what was coming. As each
probe was completed it was placed inside an envelope and not taken out again. The
last page in the set of probes gave the students an opportunity to note any changes
they would have liked to have made to their responses.
The first phase of the analysis involved drafting an alphanumeric coding scheme based on an inspection of a selection of student responses. The coding scheme was designed so that the choice made by the student for each probe could be considered together with the reason for that choice in the analysis. A sample of 20 responses to each probe was independently coded by the five researchers in order to refine the coding scheme and ensure a common interpretation and application. Each probe was then independently coded by two people (with levels of agreement around 95%), after which a random sample was checked by a third person as an additional validation process. The frequencies of responses for each probe were tallied and clusters of responses showing similar types of reasoning

were identified. In addition, 20 students were interviewed for 30 minutes each, in order to verify that the probes were understood by the respondents as intended by the researchers, and that the written responses were interpreted in line with the ideas the respondents wanted to communicate. Part of our concern was the fact that most students were presented with probes and wrote their responses in a second, or even subsequent, language. In the verification interviews all students said they were clear about what they were being asked. More than three out of four interviewees found it easy to make a selection from the given propositions as 'the chance to compare ideas made it easy to select', whereas only one in seven students interpreted some options as identical or overlapping. Providing an explanation was seen by three out of four interviewees as non-problematic, whereas the remaining minority said that they felt that they 'know how to explain it but the language was missing'. The small proportion of those who said they had difficulty with selecting an answer and/or formulating a justification is judged to be no greater than expected with first-language English speakers.

Results
The findings of the study are presented in two sections. The first section presents the responses to three probes exploring students' perceptions of the purpose of repeating measurements. The second section reports the responses to three probes
focusing on students' ideas about the spread of a set of measurements. The quotes
presented in support of the analysis are drawn from different students with no
student being quoted more than once.

Perceptions on the purpose for repeating measurements


The first of the three probes on repeating measurements was intended to elicit ideas about the purpose of repeating time measurements (RT). In order to provide a flavour of the probes, probe RT is presented in full in figure 2.
The following probe on repeating distance measurements (RD) used the same
format and had the following text:
After measuring the time, the students now have to determine d when h = 400 mm.
One group releases the ball down the slope at a height h = 400 mm and, using a
metre stick, they measure d to be 436 mm.
The following discussion then takes place between the students.
A: 'I think we should roll the ball a few more times from the same height and measure d
each time.'
B: 'Why? We've got the result already. We do not need to do any more rolling.'
C: 'We must roll the ball down the slope just one more time from the same height.'
This was followed by a probe (RDA) in which two subsequent distance measurements provide different readings:
The group of students decide to release the ball again from h = 400 mm. This time
they measure d = 426 mm.
First release: h = 400 mm d = 436 mm
Second release: h = 400 mm d = 426 mm
The following discussion then takes place between the students.
A: 'We know enough. We don't need to repeat the measurement again.'
B: 'We need to release the ball just one more time.'
C: 'Three releases are not enough. We must release the ball several more times.'

Table 2. Summary of responses to the three probes on repeating measurements for time (RT), distance (RD) and distance again (RDA) (n = 121).

                                                              No. of students (%)
Category  Description                                         RT       RD       RDA
R1        No repeats are needed                               0 (0)    9 (7)    2 (2)
R2        Repeats provide practice to improve the process     15 (12)  12 (10)  9 (7)
          of taking measurements
R3        Repeats are needed to find the recurring            5 (4)    12 (10)  4 (3)
          measurement
R4        Repeats are needed to improve the accuracy          8 (7)    10 (8)   28 (23)
R5        Repeats are needed for establishing a mean          77 (64)  60 (50)  61 (50)
R6        Repeats are needed for establishing a spread        14 (11)  11 (9)   11 (9)
R0        Uncodeable                                          2 (2)    7 (6)    6 (5)

Regardless of the option chosen for each probe, six main ideas about the purpose of
repeating measurements arose in the responses to these three probes. Table 2
shows the frequencies of these main ideas which have been listed in order of
least to most sophisticated.
Students in category R1 do not see any purpose in repeating measurements
and argue that 'they don't need to do any more rolling, because there is paper on
the floor. The ball will make marks while hitting the floor. This is the distance
they want' (RD), and that 'if the same wooden slope is used the distance should be
the same. Without friction the ball will land on top of the previous mark' (RDA).
Responses in category R2 indicate that repeating is needed in order to gain practice and thus perfect the individual measurement. Typically, these students claim that 'by releasing the ball more than twice from h = 400, we can be more certain of our answer. If we release our ball maybe five times we can limit the chances of doing mistakes when using the stopwatch' (RT). About a third within this cluster see perfecting the measuring technique as an introduction to calculating a mean and suggest that 'they have to release the ball more than twice to ensure that the times that they are getting are consistent and accurate. Once they are sure of the time, they can take the mean of the values' (RT). Such an understanding is more advanced and links to category R5.
The responses in category R3 indicate that repeating measurements is needed in order to find a recurring value, which is then perceived as the correct reading, since 'if the measurements are taken several times, it will be evident if the measurements correspond. It will be of great advantage finally to get the same measurement for several attempts' (RDA).
Category R4 consists of responses which make a very general reference to
repeating in order to increase the accuracy. These students write that 'the larger
the number of readings, the greater the accuracy of the times achieved for the
experiment' (RT), and 'the more measurements you take the more you know
how accurate you are. One or two measurements doesn't tell you enough about
the real time taken' (RDA). Almost all responses within this cluster refer to aiming
for a single 'real' or 'true' value, indicating a lack of appreciation of the inherent
variation in experimental data.
PERCEPTIONS OF THE QUALITY OF EXPERIMENTAL MEASUREMENTS 453

Included in category R5 are the responses focusing on repeating measurements in order to calculate a mean value. A large number of these students indicate that taking the mean compensates for random errors in individual measurements and explain that 'it is tricky to measure time accurately with a stopwatch, so I reckon that you should take more than 2 readings. More readings would eliminate human error in stopping and starting the stopwatch when the average is taken' (RT), and that 'the ball has to be rolled a few more times because there is always error in any experiment. The most accurate way of determining the precise measurement is to take the average of the values that came out of the experiment' (RD). About a third of this cluster explicitly state that the mean value will be close to the true value. In contrast, the more sophisticated thinkers within this category appreciate that an increase in the number of measurements will increase the reliability of the mean, for example the students who wrote that 'it is better to obtain more results in order to have a more accurate and meaningful mean' (RT).
Category R6 comprises those responses suggesting that repeating measurements is needed to improve the spread of the measurements, where students suggest that 'in order to be more precise, that is reduce the uncertainty, we have to take as many readings as possible' (RDA). More than half of the responses in this cluster also mention calculating a mean, for example, 'For any measurement in physics there will be systematic errors. Hence the value of time in each case will differ. So they will need to find the average time. Then there will be the uncertainty associated with that average of time' (RT).
Although the analysis and classification of the responses for each probe provide an overview of the ideas being used by the total sample of students, it is useful to look at the sets of responses of individual students in order to establish the consistency of the use of these different types of reasoning about repeating measurements. Four types of reasoners can be identified. A small cluster of respondents (7%), the 'non-repeaters', do not see a purpose in repeating distance measurements due to the static nature of the measuring points. On the other hand, all of these 'non-repeaters' reason that several time measurements need to be taken, and that the mean has to be calculated with the specific purpose of compensating for reading errors and of approaching the 'true' value for the time. A second small cluster (8%), here called 'perfecters', reckon that repeats of time and distance measurements are needed to practise and perfect the experimental procedure (R2). Confronted with different repeated measurements in the RDA probe, half of this cluster suggest continuing to repeat and taking the mean. A third small cluster of students (10%) suggest repeating distance measurements in order to find a recurring value. Half of these 'confirmers' persist in this view when presented with two different distance readings. The fourth and largest cluster (58%) can be considered consistent 'mean reasoners'. They give consistent R5 responses for at least two out of three probes in terms of repeating in order to establish a mean, with an R4 response for the third probe. Within this cluster, 7% of the sample also repeatedly mention the calculation of a spread, or uncertainty, as a reason for repeating measurements. This subset may be termed 'spontaneous spread reasoners'. In order to refine the large cluster of 'mean reasoners', the analysis of the responses to the three probes on spread will now be discussed.

Perceptions on spread and comparison of data sets


The first probe on spread of measurements dealt with how to handle an anomaly
(AN):
A group of students have to calculate the average of their (distance) measurements
after taking six readings. Their results are as follows (mm): 443, 422, 436, 588,
437, 429.
The students discuss what to write down for the average of the readings.
A: 'All we need to do is to add all our measurements and then divide by 6.'
B: 'No. We should ignore 588 mm, then add the rest and divide by 5.'
It can be seen from table 3 that 42% of the students chose to include the anomaly while 56% felt that the anomaly should be excluded from the data. The former group may be divided into two subgroups categorized as AN1 and AN2, respectively, with about three times as many students falling into the former grouping. In the AN1 category the procedure for taking the average is the dominant consideration and this allows no freedom for judging the data. These students argue that 'this is a correct method of finding the average', and that 'one cannot choose to ignore certain results... all results must be used'. A smaller subgroup (AN2) acknowledged that the reading of 588 mm was well outside the range defined by the other readings. However, this reading did not pose a problem to these students as it formed part of the spread, and they argued that 'the value 588 mm shows how big the spread of the values are and should be used because that is what the group has measured and should form part of their results'. With regard to the students who chose to exclude the anomalous measurement, just under half (AN3) suggested that the anomaly should be ignored because 'they may have made a mistake while they were measuring it'. A few of these students suggested that this measurement should be repeated. The remaining students (AN4) excluded the anomaly on the grounds that the point was outside an acceptable range or was not consistent with the other values. They claimed, for example, that 'all the measurements except 588 are in the range of 20 mm... 588 is out of this range by more than 140 mm'.
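The arithmetic the students were debating is easily made explicit. This sketch (not part of the original instrument) shows how strongly the single anomalous reading shifts the mean:

```python
from statistics import mean

# Distances d in mm, as given in probe AN
readings = [443, 422, 436, 588, 437, 429]

mean_all = mean(readings)                          # option A: sum all six readings, divide by 6
mean_excl = mean(r for r in readings if r != 588)  # option B: drop the anomaly, divide by 5

print(f"mean with anomaly: {mean_all:.1f} mm")     # ≈ 459.2 mm
print(f"mean without anomaly: {mean_excl:.1f} mm") # = 433.4 mm
```

Including the 588 mm reading pulls the mean roughly 26 mm away from the cluster of the other five values, which is larger than the whole spread of those five readings.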
The next probe (SMDS) dealt with two sets of measurements that had the
same mean but different spreads. The intention of this probe is to establish how
the quality of a data set (in terms of trustworthiness) is characterized:
Two groups of students compare their results for a distance measurement.
Group A: 444 432 424 440 435 Average = 435 mm
Group B: 446 460 410 424 440 Average = 435 mm
A: 'Our results are better. They are all between 424 mm and 444 mm. Yours are
spread between 410 mm and 460 mm.'
B: 'Our results are just as good as yours. Our average is the same as yours. We both got 435 mm for the distance.'
With reference to table 4, it can be seen that the students were divided approximately equally on whether the two sets of results were equally good (SMDS1) or whether group A had the better results (SMDS2). Students in the former group used the average as the only criterion to compare the two sets of data. Two types of responses typified this category: first, those who simply mentioned the average without referring to the spread and stated, 'because group B has the same average as group A', and those (about 60% of the SMDS1 group) who stated very clearly that 'the spread of measurements has nothing to do with the average value', therefore implying that the spread was not a criterion to be used in making the necessary

Table 3. Summary of responses to the AN probe (n = 121).

Category  Description                                              Number of students (%)
AN1       The anomaly must be included when taking an average      37 (30)
          since all readings must be used
AN2       The anomaly is noted, but it has to be included since    14 (12)
          it is part of the spread of results
AN3       The anomaly must be excluded as it is most likely a      30 (25)
          mistake
AN4       The anomaly must be excluded as it is outside the        38 (31)
          acceptable range
AN0       Not codeable                                             2 (2)

Table 4. Summary of responses to the SMDS probe (n = 121).

Category  Description                                              Number of students (%)
SMDS1     The results are equally good since the averages are      58 (48)
          identical
SMDS2     The results of group A are better since the data of      53 (44)
          group A are closer together than those of group B
SMDS0     Not codeable                                             10 (8)

comparison. The students in group SMDS2 concluded that the results of group A
are better and appear to have used some notion of the spread in the data in reaching
their conclusion. However, the large majority of the responses were not very
clearly stated, with terms such as 'uncertainty' and 'spread' loosely used in the
explanations, as illustrated by such statements as 'the values [for] calculating final
d must not be spread out too much', and 'the uncertainty between readings
obtained by group A is about 20 mm while that obtained by group B is about
50 mm'. The overall pattern of the responses suggests that the students are not
able to differentiate clearly between the overall spread of the data set and the
differences between the individual data points within the set. Hardly any students
invoked the former concept in an unambiguous way.
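The distinction the SMDS probe targets can be made concrete with a short sketch. Here the overall spread of each set is taken crudely as its range (maximum minus minimum), which matches the students' own figures of about 20 mm and 50 mm (illustrative code, not part of the study):

```python
# Distances in mm from probe SMDS; both sets were presented with the same average
group_a = [444, 432, 424, 440, 435]
group_b = [446, 460, 410, 424, 440]

# The overall spread of a data set, taken crudely as its range (max - min),
# is what distinguishes the two sets, not their averages
range_a = max(group_a) - min(group_a)  # 444 - 424 = 20 mm
range_b = max(group_b) - min(group_b)  # 460 - 410 = 50 mm

print(f"Group A spread: {range_a} mm, Group B spread: {range_b} mm")
```

Treating the spread as a single global property of the set, rather than as pairwise differences between readings, is precisely the step most students in this sample did not take.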
The final probe discussed here dealt with two sets of measurements having a
different mean but a similar spread (DMSS):
Two groups of students compare their results for five releases of the ball at
h = 400 mm.
Group A: 441 426 432 422 444 Average = 433 mm
Group B: 432 444 426 433 440 Average = 435 mm
A: 'Our result agrees with yours.'
B: 'No, your result does not agree with ours.'

By far the most prevalent idea (see table 5) was to compare averages and then make
a decision about whether the averages were 'close', 'far' or 'consistent' (DMSS1).
About two thirds of the students in this DMSS1 grouping concluded that the two
averages were consistent by suggesting that 'the averages might not be the same

Table 5. Summary of responses to the DMSS probe (n = 121).

Category  Description                                              Number of students (%)
DMSS1     It depends on how close the averages are                 62 (52)
DMSS2     It depends solely on the relative spreads of the data    4 (3)
DMSS3     It depends on the degree of correspondence between       10 (8)
          individual measurements in the two sets
DMSS4     It depends on both the averages and the uncertainties    34 (28)
DMSS0     Not codeable                                             11 (9)

but they are only different by 2 mm which is a very small distance'. The remaining third expressed the contrary view that '433 and 435 are totally different numbers', and several stated that 'the answers aren't exactly the same are they! How can they agree with each other?' The grouping DMSS2 comprised only four students, who used the relative spreads as the basis for the comparison, stating, for example, that 'the results don't agree since the uncertainty in group A will be greater than group B'. A group comprising 8% of the students tried to come to a conclusion by comparing individual measurements between the two sets of data (DMSS3), typically reasoning that 'the values for the two groups match almost exactly'. The most sophisticated reasoning was evidenced by about a third of the students (DMSS4) who used the notion of uncertainty or spread in conjunction with the average to come to a conclusion. This group had some difficulty in expressing their ideas and wrote, for example, 'if we find the uncertainties in A and B the average of A will most likely fall in the range of B(av) ± B and the same will apply to the average of B to A(av) ± A', and 'with every average there should be a standard deviation and chances are both will be in the same range'.

Discussion
The considerable modifications and extensions of the PACKS probes for use with a sample of students from diverse cultural backgrounds (hence the race- and gender-neutral cartoons), and mainly using English as their second language (hence the terse style of the probes), seem to have been effective. The structure of the task in which students first select a proposition and then justify their choice seems particularly helpful for second-language speakers. The students in the sample generally had minimal experience with practical work during their school career but undertook a course on practical procedures as part of their undergraduate studies. The study is thus not documenting the intuitive ideas about experimental data of students inexperienced and untutored in practical work and data handling. However, an opportunity exists to undertake such an important study by focusing on school leavers prior to university entry.
Lubben and Millar (1996) proposed a model of progression of ideas concerning experimental data (see table 1) based on a study with UK secondary school students. Although the students in this study are older, study science at a more advanced level, and have a different cultural, educational and linguistic background, the proposed model is useful in classifying their procedural ideas. Apart
from a few instances where A, C and D level reasoning can be identified, but even
then not consistently, most of the respondents may be classified as falling into
levels F, G or H. However, the scheme requires an additional level to allow for the
most sophisticated reasoners to be accommodated.
From the responses to the DMSS probe, which focuses on using the spread around a mean to decide whether two sets of measurements are consistent with each other, 30% of the total sample may be regarded as 'spread reasoners'. In addition, there is a group of students who use spread reasoning when formulating the reasons for taking repeat measurements (the RT probe) but not
for the DMSS probe. One may conclude that these students recognize that there
are variations between the data points but they do not synthesize this into a global
measure which may be used together with the mean to characterize the data set.
Another measure of how well the notion of spread is understood may be seen by
comparing the responses for probes SMDS and DMSS. Although about half of
the respondents related the widths of the spreads to the accuracy of the data sets
(SMDS), only about a third of this group applied the criterion of overlapping
spreads to decide whether the data sets in the DMSS probe were consistent
with each other. In summary, about 15% of the total sample may therefore be
regarded as using 'spread' reasoning in a consistent way. These reasoners exhibit greater sophistication than allowed for in the Lubben-Millar scheme (table 1), indicating that the scheme should include a new category, I. I-reasoners understand that the consistency of data sets can be judged by comparing the relative positions of their means in conjunction with their spreads.
Amongst the more sophisticated reasoners, a common way of comparing data
sets was to state that the mean of one data set was within the range of the other set.
The current probes did not make it possible, however, to check whether these
students would have come to a different conclusion if the range of data had over-
lapped but the mean of one data set was not contained within the data set of the
other. This will need to be explored in further research into different perceptions
of overlapping spread.
Although the students in the sample may be classified as advanced reasoners,
their language usage is haphazard. Terms reflecting collected and computed data
such as measurement, calculation, result and value are used interchangeably.
There is considerable confusion about terminology such as spread, error, range,
uncertainty, precision and accuracy. Although this may be attributed to purely
linguistic problems of second-language English speakers, the fact that the findings
of Sere et al. (1993) show a similar loose use of terminology for (first-language)
French physics undergraduates refutes this interpretation. It appears more likely
that the haphazard usage of science terminology is equally related to the lack of
differentiation between systematic and random errors in the minds of the students.
The vast majority of students correctly argue that repeating is needed to limit the
random error, and therefore to improve precision. Very few refer explicitly (and
incorrectly) to repeating for the purpose of excluding systematic errors such as
parallax. At the same time, however, 51% of the responses in the total sample (all of
R2 and R3, and part of R4 and R5) indicate that students repeat in order to get
closer to the 'real' or 'correct' value for the time/distance measurement. This
implies they intend to improve the accuracy, which depends on eliminating sys-
tematic errors and cannot be achieved by repeating measurements. It is suggested
that this confusion may be remedied if teaching includes short practical tasks
challenging students to develop and execute measuring techniques for reducing
random errors and systematic errors, respectively. Equally, a specific requirement
for including separate notes on potential sources of random and systematic error in
standard practical reports may remind students of appropriate terminology and
relevant strategies.
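The distinction the students miss can be demonstrated with a short simulation. This is a hedged sketch with invented values, not data from the study: a constant systematic offset (for example, a reaction-time delay) is added to every reading alongside random noise, and averaging more readings shrinks the random scatter but leaves the offset untouched.

```python
import random

random.seed(1)

TRUE_VALUE = 2.50         # hypothetical true time in seconds
SYSTEMATIC_OFFSET = 0.10  # e.g. a constant reaction-time delay
RANDOM_SPREAD = 0.05      # standard deviation of the random error

def measure():
    # Every reading carries the same systematic offset plus random noise.
    return TRUE_VALUE + SYSTEMATIC_OFFSET + random.gauss(0, RANDOM_SPREAD)

for n in (5, 50, 5000):
    avg = sum(measure() for _ in range(n)) / n
    print(f"n={n:5d}  mean={avg:.3f}  error={avg - TRUE_VALUE:+.3f}")
# As n grows the mean settles near 2.60, not 2.50: averaging removes
# the random scatter (better precision) but cannot remove the
# systematic offset (no better accuracy).
```

A pair of practical tasks built around the same contrast, one varying the random spread and one varying the offset, would make the precision/accuracy distinction concrete.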
Significantly different proportions of students wish to take repeat measurements for time (64%) and for distance (50%) in order to determine the mean: students perceive less reason to determine a mean for distance measurements than for time measurements. In terms of teaching strategies, the idea of minimizing the
inherent variability of measurements by taking repeat measurements to reduce
random errors may best be introduced through taking a series of distance meas-
urements with instruments with increasing resolution and discussing the reasons
for the variation of the readings, even for the instrument with the highest resolu-
tion.
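The proposed teaching sequence can be mimicked numerically. In this illustrative sketch (all values hypothetical, not from the study) each reading of a fixed distance carries a small random variation, and the reading is then quantized to the instrument's resolution: the coarse instrument returns the same value every time, while the finer instruments expose the underlying variability and hence the need for repeats and a mean.

```python
import random

random.seed(0)

TRUE_DISTANCE = 123.25  # hypothetical distance in mm
JITTER = 0.05           # small random variation per reading (mm)

def read(resolution):
    """One reading with an instrument of the given resolution (mm)."""
    value = TRUE_DISTANCE + random.gauss(0, JITTER)
    return round(value / resolution) * resolution

for resolution in (1.0, 0.1, 0.01):
    readings = [read(resolution) for _ in range(5)]
    print(f"resolution {resolution:4} mm -> {readings}")
# The 1 mm instrument (metre-rule-like) returns an identical reading
# every time; only the higher-resolution instruments reveal the
# variation, motivating repeat measurements and a mean.
```

Discussing output like this with students mirrors the suggested activity: variability is not absent for distance measurements, it is merely hidden below the instrument's resolution.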
The finding that significantly fewer students perceive the need for determin-
ing a mean for distance measurements than for time measurements indicates that
the use of procedural understanding depends on the measure being made. This
dependency mirrors the fact that the application of conceptual understanding is
influenced by context. It is therefore of importance to document students' percep-
tions about procedural decisions for experimental work with a variety of measuring
instruments (e.g. ammeters, thermometers, pH meters) each with different resolu-
tion. In addition, and quite separately, research is also needed into the similarities
and differences in the use of procedural understanding in various sciences, e.g.
biology, chemistry, astronomy, in order to determine more clearly the parameters
for the transferability of procedural understanding.

Acknowledgements
The research reported in this paper was supported by a British Council Link
(ODA), and Research Funds from the University of Cape Town (Academic
Development Programme) and the University of York. We also thank the staff
of the UCT Physics Department workshop and Fiona Gibbons.

References
ALLIE, S. and BUFFLER, A. (in press) A course in Tools and Procedures for Physics 1.
American Journal of Physics.
BEVINGTON, P. R. and ROBINSON, D. K. (1991) Data Reduction and Error Analysis for the
Physical Sciences (New York: McGraw-Hill).
BLACK, P. (1993) The purposes of science education. In R. Hull, (ed.), ASE Secondary
Science Teachers' Handbook (London: Simon & Schuster).
GIORDANO, J. L. (1997) On the sensitivity, precision and resolution in DC Wheatstone
bridges. European Journal of Physics, 18(1), 22-27.
GOTT, R. and DUGGAN, S. (1996) Practical work: its role in the understanding of evidence in
science. International Journal of Science Education, 18 (7), 791-806.
LINDER, C. J. and HILLHOUSE, G. (1996) Teaching by conceptual exploration: insights into
potential long-term learning outcomes. The Physics Teacher, 34 (6), 332-338.
LUBBEN, F. and MILLAR, R. (1994) A Survey of the Understanding of Children aged 11-16 of
Key Ideas about Evidence in Science, PACKS Research Paper 3 (York: University of
York).
LUBBEN, F. and MILLAR, R. (1996) Children's ideas about the reliability of experimental
data. International Journal of Science Education, 18 (8), 955-968.
MILLAR, R., GOTT, R., LUBBEN, F. and DUGGAN, S. (1996) Children's performance of inves-
tigative tasks in science: a framework for considering progression. In M. Hughes
(ed.), Progression in Learning (Clevedon: Multilingual Matters).
OSBORNE, J. (1996) Untying the Gordian knot: diminishing the role of practical work.
Physics Education, 31 (5), 271-278.
PFUNDT, H. and DUIT, R. (1994) Bibliography: Students' Alternative Frameworks and Science
Education (Kiel: IPN).
SERE, M-G., JOURNEAUX, R. and LARCHER, C. (1993) Learning the statistical analysis of
measurement error. International Journal of Science Education, 15 (4), 427-438.
THOMSON, V. (1997) Precision and the terminology of measurement. The Physics Teacher,
35 (1), 15-17.