
Topic 6: Measurement and Evaluation of Human

Performance

6.1 Statistical analysis


statistics
The science of data

GREAT STATS WEB-SITE

http://www.usablestats.com/index.php
What is data?

Information, in the form of facts or figures


obtained from experiments or surveys, used
as a basis for making calculations or drawing
conclusions.
Encarta dictionary
Statistics

1. Data can be collected about a population (surveys)

2. Data can be collected about a process


(experimentation)
Statistics
2 Types of Data

1. Qualitative
2. Quantitative
Qualitative Data

Deals with descriptions.


Data can be observed but not measured.
Qualitative → Quality
Examples
Colors, textures, smells, tastes, appearance, beauty, etc.
...such as.....
Species of plant
Type of runner
Shades of color
Rank of flavor in taste testing
Qualitative Data

Qualitative data, manipulated numerically

Survey results, teens and need for environmental


action
Quantitative Data

Quantitative – measured using a naturally occurring


numerical scale - Deals with numbers.
Quantitative → Quantity 
Data which can be measured.
Examples
Chemical concentration
Temperature
Length
Weight…etc.
Quantitative Data

Measurements are often displayed graphically


Quantitative Data

Quantitative = Measurement
In data collection for Sports Science, data must be measured carefully, using laboratory equipment (e.g. timers, scales, calipers, accelerometers).
The limits of the equipment used add some uncertainty to the data collected.
All equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm?
For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!
The sample size must be large enough to provide sufficient reliable data and for us to carry out relevant statistical tests for significance.

We must also be mindful of uncertainty in our measuring tools and error in our results.
Photo: Broadbilled hummingbird (Wikimedia Commons).
Quantitative Data

How to determine uncertainty?


Usually the instrument manufacturer will indicate this –
read what is provided by the manufacturer.

Be sure that the number of significant digits in the data table/graph


reflects the precision of the instrument used
Example
If the manufacturer states that a balance is accurate to 0.1 g and your average mass is 2.06 g, be sure to round the average to 2.1 g.
Your data must be consistent with your measurement tool regarding significant figures.
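As a minimal sketch of this rounding rule, the Python below rounds a calculated average to the precision of the instrument. The function name `round_to_instrument` is hypothetical, and using `Decimal` with half-up rounding is an assumption made here so that the result matches rounding by hand.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_instrument(value: str, precision: str) -> Decimal:
    """Round a value to the instrument precision (both passed as decimal strings)."""
    # quantize() keeps the same number of decimal places as `precision`
    return Decimal(value).quantize(Decimal(precision), rounding=ROUND_HALF_UP)

# Balance accurate to 0.1 g, calculated average mass of 2.06 g
rounded_mass = round_to_instrument("2.06", "0.1")
print(rounded_mass)  # 2.1
```

The values are passed as strings so that the decimal digits are taken exactly as written in the data table.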
Looking at Data: How accurate is the data?

How close are the data to the “real” results?


(How precise is the data?)

All test systems have some uncertainty, due to limits of


measurement.

Estimation of the limits of the experimental uncertainty is


essential.
Finding the limits

As a “rule-of-thumb”, if not specified, use +/- 1/2 of


the smallest measurement unit.
Example
A metric ruler is lined to 1 mm, so the limit of uncertainty of the ruler is +/- 0.5 mm.

Question
If the room temperature is read as 25 degrees C, with a thermometer that is scored at 1 degree intervals, what is the range of possible temperatures for the room?

Answer
+/- 0.5 degrees Celsius: if you read 25 degrees C, the true temperature may in fact be anywhere from 24.5 to 25.5 degrees C.
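The rule of thumb can be sketched in a few lines of Python. The function name `reading_range` is hypothetical; it simply applies plus or minus half of the smallest division to a reading.

```python
def reading_range(reading: float, smallest_division: float) -> tuple[float, float]:
    """Return the (low, high) range implied by +/- half the smallest division."""
    half = smallest_division / 2
    return (reading - half, reading + half)

# Thermometer scored at 1 degree intervals, reading of 25 degrees C
low, high = reading_range(25.0, 1.0)
print(low, high)  # 24.5 25.5
```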
Looking at Data

How accurate is the data? (How close are the data to the “real” results?) A lack of accuracy is also described as BIAS.
BIAS is when you favor one outcome in a scientific experiment, for example through the way you ask a question:

"What is your favorite ice cream in the whole world: plain vanilla or delicious chocolate?"

That is a biased question because there are more flavors of ice cream besides vanilla and chocolate, and the adjectives show your personal opinion.

How precise is the data? (All test systems have some uncertainty, due to limits of measurement.)
Estimation of the limits of the experimental uncertainty is essential.
Calculating the Mean

The mean or average of a group of values describes a


middle point or central tendency about which data
points vary.

The mean is a way of summarizing a group of data


and stating a best guess at what the true value of
the dependent variable value is for that independent
variable level.
The mean is a measure of the central tendency
of a set of data.

Calculate the mean using:
•  Your calculator (sum of values / n)
•  Excel

n = sample size. The bigger the better. In this case n=10 for each group.
All values should be centred in the cell, with decimal places consistent with the measuring tool uncertainty.

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

Bill length (±0.1mm)
n      A. colubris   C. latirostris
1      13.0          17.0
2      14.0          18.0
3      15.0          18.0
4      15.0          18.0
5      15.0          19.0
6      16.0          19.0
7      16.0          19.0
8      18.0          20.0
9      18.0          20.0
10     19.0          20.0
Mean   =AVERAGE(highlight raw data)
s
The mean is a measure of the central tendency of a set of data.

Descriptive table title and number.
Uncertainties must be included.
Raw data and the mean need to have consistent decimal places (in line with uncertainty of the measuring tool).

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

Bill length (±0.1mm)
n      A. colubris   C. latirostris
1      13.0          17.0
2      14.0          18.0
3      15.0          18.0
4      15.0          18.0
5      15.0          19.0
6      16.0          19.0
7      16.0          19.0
8      18.0          20.0
9      18.0          20.0
10     19.0          20.0
Mean   15.9          18.8
s
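For anyone working outside Excel, the same means can be checked with a short Python sketch. The variable names are illustrative only; the values are the raw data from Table 1, and the result is rounded to one decimal place to match the ±0.1 mm measuring tool.

```python
from statistics import mean

# Raw bill lengths from Table 1 (mm)
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# Mean = sum of values / n (written out for the first group)
mean_a = round(sum(a_colubris) / len(a_colubris), 1)
# statistics.mean performs the same calculation
mean_c = round(mean(c_latirostris), 1)

print(mean_a, mean_c)  # 15.9 18.8
```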
 
 
 
 

Comparing Averages

Once the 2 averages are calculated for each set of data,


the average values can be plotted together on a graph,
to visualize the relationship between the 2.
6.1.1 Outline that error bars are a graphical representation of the variability of data.
The knowledge that any individual measurement you make in a lab will lack perfect precision often leads a researcher to take multiple measurements at each independent variable level.

Though no one of these measurements is likely to be more precise than any other, it is hoped that this group of values will cluster about the true value you are trying to measure.

This distribution of data values is often represented by showing a single data point, representing the mean value of the data, and error bars to represent the overall distribution of the data.
Drawing error bars

The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the mean to set the endpoints of the error bar.
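A minimal Python sketch of this "furthest measurement" approach is shown below. The function name `max_deviation_error_bar` is hypothetical, and the sample values are the first column of Table 1 in this document.

```python
def max_deviation_error_bar(values):
    """Return (mean, half_length), where half_length is the largest distance from the mean."""
    center = sum(values) / len(values)
    half_length = max(abs(v - center) for v in values)
    return center, half_length

values = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
center, half_length = max_deviation_error_bar(values)

# The bar is drawn from (center - half_length) to (center + half_length)
print(round(center, 1), round(half_length, 1))  # 15.9 3.1
```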
What do error bars suggest?
If the bars show extensive overlap, it is likely that there is not a significant difference between those values.
Error Bars

Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data.
Error bars could represent standard deviation, range or confidence intervals.

Which of the two sets of data has:

a. The highest mean?
b. The greatest variability in the data?
Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data.
Error bars could represent standard deviation, range or confidence intervals.

Which of the two sets of data has:

a. The highest mean? A
b. The greatest variability in the data? B
Error Bars in review

The error bars shown in a line graph represent a


description of how confident you are that the mean
represents the true value.

The more the original data values range above and below the mean, the wider the error bars and the less confident you are in a particular value.
Graph 1: Comparing mean times in two runners, Runner A and Runner B (error bars = standard deviation)
[Bar chart. y-axis: Mean Times (±0.1 sec), 0.0 to 20.0; x-axis: Runners. A: 15.9 sec, B: 18.8 sec.]

Title is adjusted to show the source of the error bars. This is very important.
You can see the clear difference in the size of the error bars. Variability has been visualised.
The error bars overlap somewhat. What does this mean?
The overlap of a set of error bars gives a clue as to the significance of the difference between two sets of data.

Large overlap
Lots of shared data points within each data set.
Results are not likely to be significantly different from each other.
Any difference is most likely due to chance.

No overlap
No (or very few) shared data points within each data set.
Results are more likely to be significantly different from each other.
The difference is more likely to be ‘real’.
Adding error bars in Logger Pro

Click on Data in the toolbar

Click on Column Options

Click the appropriate data set (y- or x-axis that you want)

Check Error Bar Calculations

Check Fixed Value

Check Use Column


Quick Review – 3 measures of “Central Tendency”

1. Mode: the value that appears most frequently
2. Median: when all data are listed from least to greatest, the value at which half of the observations are greater, and half are lesser.
3. Mean: the most commonly used measure of central tendency, or arithmetic average (sum of data points divided by the number of points)
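All three measures are available in Python's standard `statistics` module. The sketch below applies them to the first column of Table 1 in this document; the variable names are illustrative.

```python
from statistics import mean, median, mode

values = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]

most_frequent = mode(values)        # value that appears most often
middle_value = median(values)       # midpoint of the sorted data
average = round(mean(values), 1)    # sum of values / number of values

print(most_frequent, middle_value, average)  # 15.0 15.5 15.9
```

With an even number of data points, the median is the average of the two middle values (here 15.0 and 16.0).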
How can leaf lengths be displayed
graphically?
Simply measure the lengths of each and
plot how many are of each length
http://www.youtube.com/watch?v=-PoD7H0UVCw
If smoothed, the histogram data assumes this shape
This shape?
It is a classic bell-shaped curve, AKA a Normal Distribution curve.

Essentially it means that in many studies with an adequate number of data points (>30), most results tend to be near the mean.

Fewer results are found farther from the mean.


Standard Deviation
Standard deviation is a simple measure of the variability or
dispersion of a data set.

A low standard deviation indicates that the data points tend to be very
close to the same value (the mean).

While a high standard deviation indicates that the data are “spread
out” over a large range of values.

Basically, the standard deviation is a statistic that tells you how tightly
all the various examples are clustered around the mean in a set of data.
Standard Deviation
The STANDARD DEVIATION is a more sophisticated indicator of the precision of a set of measurements.

In large studies, the standard deviation is used to draw error bars, instead of the maximum deviation.
6.1.3 A typical normal distribution curve

1. The total area under the curve is equal to 1 (100%)
2. About 68% of the area under the curve falls within 1 standard deviation of the mean.
3. About 95% of the area under the curve falls within 2 standard deviations of the mean.
4. About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
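These percentages can be checked with `statistics.NormalDist` from the Python standard library. This is only a sketch for a perfectly normal curve; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```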
According to this curve:

One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group.

Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.
Three Standard Deviations?
Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data.

[Horizontal axis labels: -3sd, -2sd, +/-1sd, +2sd, +3sd]
Normal Distribution
Standard Deviation- The computed measure of how much the values vary
around the mean score (above and below)
How is Standard Deviation calculated?
With this formula!
AGHHH! MR S
DO I NEED TO KNOW THIS FOR THE
EXAM?????
Not the formula!

YOU DO NEED TO KNOW THE CONCEPT

Students are not expected to know the formulas.

They will be expected to use the statistics function of a


scientific calculator.
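For readers curious about what the calculator's statistics function is doing, here is a minimal Python sketch. It assumes the usual sample standard deviation calculation (squared deviations from the mean, divided by n - 1, then square-rooted); the function name is hypothetical and the data are the first column of Table 1 in this document.

```python
from math import sqrt

def sample_standard_deviation(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean_value = sum(values) / n
    # How far each value is from the mean, squared so that signs do not cancel
    squared_deviations = [(v - mean_value) ** 2 for v in values]
    return sqrt(sum(squared_deviations) / (n - 1))

values = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
sd = round(sample_standard_deviation(values), 2)
print(sd)  # 1.91
```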
6.1.4 Standard Deviation Concept
Standard deviation is a statistic that tells how tightly all the various
data points are clustered around the mean in a set of data.

When the data points are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller SD).

When the data points are spread apart and the bell curve is relatively flat, the standard deviation is large, which suggests less precise results.
A set of length measurements is taken with a mean of 2.5 cm and a standard deviation of 0.5 cm. Which of the following is true?

1. 68% of all data lie between 2.5 cm and 3.5 cm
2. 68% of all data lie between 1.5 cm and 3.5 cm
3. 95% of all data lie between 1.5 cm and 3.5 cm
4. 95% of all data lie between 2.0 cm and 3.0 cm

Answer: 3. 95% of all data lie between 1.5 cm and 3.5 cm

1 SD = 0.5 cm
68% of data are within +/- 1 SD, so 68% are between 2.0 cm and 3.0 cm
95% of data are within +/- 2 SD, so 95% are between 1.5 cm and 3.5 cm
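The arithmetic behind this answer can be sketched in Python. The function name `sd_interval` is hypothetical; it returns the range that lies within k standard deviations of the mean.

```python
def sd_interval(mean_value, sd, k):
    """Return (low, high) for the range within k standard deviations of the mean."""
    return (mean_value - k * sd, mean_value + k * sd)

# Mean 2.5 cm, standard deviation 0.5 cm
within_1_sd = sd_interval(2.5, 0.5, 1)  # about 68% of the data
within_2_sd = sd_interval(2.5, 0.5, 2)  # about 95% of the data

print(within_1_sd)  # (2.0, 3.0)
print(within_2_sd)  # (1.5, 3.5)
```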
In a population of men the systolic blood pressure shows a normal distribution. The mean of the population is 125 (measured in mm Hg) and the standard deviation is 10. If the population was 1000, how many of them have a blood pressure between 115 and 135 mm Hg?

About 680 men have blood pressure between 115 and 135 mm Hg.

If the mean is 125, and the standard deviation is 10, then +1 Sx is 135, and -1 Sx is 115, and we know that about 68% of our data (in this case the men) are within +/-1 Sx of the mean.
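As a rough check, the same question can be worked in Python with `statistics.NormalDist`, assuming the blood pressures follow an exactly normal curve. The variable names are illustrative. The exact curve gives a slightly higher count than the rounded 68% rule.

```python
from statistics import NormalDist

# Normal curve with mean 125 mm Hg and standard deviation 10 mm Hg
blood_pressure = NormalDist(mu=125, sigma=10)
population = 1000

# Fraction of men between -1 SD (115) and +1 SD (135)
fraction = blood_pressure.cdf(135) - blood_pressure.cdf(115)

exact_count = round(fraction * population)
rule_of_thumb_count = round(0.68 * population)
print(exact_count, rule_of_thumb_count)  # 683 680
```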
Another example
For example, the average height for adult men in Thailand is about 170 cm, with a standard deviation of around 10 cm.
This means that most men (about 68%, assuming a normal distribution) have a height within 10 cm of the mean (160 cm – 180 cm)...
...whilst almost all men (about 95%) have a height within 20 cm of the mean (150 cm – 190 cm).
Standard Deviation

If the standard deviation were zero, then all men would be exactly 170 cm tall.

If the standard deviation were 20 cm, then men would have much more variable heights, with a typical range of about 150 to 190 cm.
6.1.2 Calculate the mean and standard deviation
of a set of values
Students should specify the sample standard deviation, not the population standard deviation.

What is sample and population?

If our test is on the entire ISB High School, that would be our population. If we take a sample of 20 from ISB High School, that would be the sample. If the sample is an accurate representation of the population, you are able to make an inference from the sample results to the population.
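The difference between the two statistics can be seen with Python's standard `statistics` module, where `stdev` is the sample standard deviation and `pstdev` is the population standard deviation. The data below are the first column of Table 1 in this document, treated here as a sample for illustration.

```python
from statistics import pstdev, stdev

sample = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]

sample_sd = round(stdev(sample), 2)       # divides by n - 1
population_sd = round(pstdev(sample), 2)  # divides by n

print(sample_sd, population_sd)  # 1.91 1.81
```

The sample value is the slightly larger of the two, and it is the one that matches the s row reported in this document.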
Finding the SD using a Texas Instruments TI-80 series calculator

1. Press STAT
2. EDIT should be selected with #1 highlighted; press ENTER
3. Enter the data points, pressing the down button after each entry
4. After all data points have been entered, press STAT
5. Go to CALC and choose 1-Var Stats for one data set
6. Enter L1 for one data set and press ENTER
7. You now have the statistics for the data set (Sx is the sample standard deviation)
Look on page 157 and compute the mean and SD for both data sets.
(http://www.youtube.com/watch?v=DMOXzwC2vzg)
6.1.4 Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples.

A small standard deviation indicates that the data is


clustered closely around the mean value.

Conversely, a large standard deviation indicates a wider


spread around the mean.
Standard deviation is a measure of the spread of most of the data.

Which of the two sets of data has:
a. The largest mean?
b. The greatest variability in the data?

Table 1: Times of two 100m runners.

Time (±0.1 sec)
n      A.      B.
1      13.0    17.0
2      14.0    18.0
3      15.0    18.0
4      15.0    18.0
5      15.0    19.0
6      16.0    19.0
7      16.0    19.0
8      18.0    20.0
9      18.0    20.0
10     19.0    20.0
Mean   15.9    18.8
s      1.91    1.03

s: =STDEV(highlight RAW data). Standard deviation can have one more decimal place.
Standard deviation is a measure of the spread of most of the data.

Which of the two sets of data has:
a. The largest mean? B.
b. The greatest variability in the data? A.

Table 1: Times of two 100m runners.

Time (±0.1 sec)
n      A.      B.
1      13.0    17.0
2      14.0    18.0
3      15.0    18.0
4      15.0    18.0
5      15.0    19.0
6      16.0    19.0
7      16.0    19.0
8      18.0    20.0
9      18.0    20.0
10     19.0    20.0
Mean   15.9    18.8
s      1.91    1.03

s: =STDEV(highlight RAW data). Standard deviation can have one more decimal place.
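The comparison in Table 1 can be reproduced with a short Python sketch using the standard `statistics` module. The dictionary layout and variable names are illustrative; `stdev` is the sample standard deviation and gives the same values as the s row above.

```python
from statistics import mean, stdev

times = {
    "A": [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0],
    "B": [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0],
}

# (mean to 1 decimal place, standard deviation to 2 decimal places) per runner
summary = {
    runner: (round(mean(values), 1), round(stdev(values), 2))
    for runner, values in times.items()
}

largest_mean = max(summary, key=lambda runner: summary[runner][0])
greatest_variability = max(summary, key=lambda runner: summary[runner][1])

print(summary["A"], summary["B"])          # (15.9, 1.91) (18.8, 1.03)
print(largest_mean, greatest_variability)  # B A
```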
