STATISTICS
AT THE END OF THIS TOPIC YOU SHOULD BE ABLE TO:
Use statistical terms and definitions correctly in analytical chemistry,
Calculate measures of central tendency for environmental analysis data,
Calculate measures of dispersion for environmental analysis data,
Test hypotheses employing appropriate statistical tests.
DEFINITIONS
Statistics: Collection of methods for planning experiments, obtaining
data, and then organizing, summarizing, presenting, analyzing,
interpreting, and drawing conclusions.
Variable: Characteristic or attribute that can assume different values.
Random Variable: A variable whose values are determined by chance.
Population: All subjects possessing a common characteristic that is
being studied.
Sample: A subgroup or subset of the population.
Parameter: Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics): Characteristic or measure
obtained from a sample.
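The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of 1000 pH readings is simulated (hypothetical, not real data), and the names `population`, `sample`, `parameter_mean` and `statistic_mean` are chosen here for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1000 simulated pH readings (not real data).
population = [random.gauss(7.0, 0.3) for _ in range(1000)]

# Sample: a subset of the population, drawn at random without replacement.
sample = random.sample(population, 25)

# Parameter: a measure computed from the whole population.
parameter_mean = statistics.mean(population)

# Statistic: the same measure computed from the sample only.
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.3f}")
print(f"sample mean (statistic):     {statistic_mean:.3f}")
```

In practice the whole population is rarely available, so the statistic is used to make inferences about the unknown parameter.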
DEFINITIONS…
Descriptive Statistics: Collection, organization,
summarization, and presentation of data.
Inferential Statistics: Generalizing from samples to
populations using probabilities. Performing hypothesis
testing, determining relationships between variables, and
making predictions.
Qualitative Variables: Variables which assume non-numerical values.
Quantitative Variables: Variables which assume numerical values.
Discrete Variables: Variables which assume a finite or
countable number of possible values. Usually obtained by
counting, e.g. number of bacteria colonies in a culture.
Continuous Variables: Variables which assume an
infinite number of possible values within an interval. Usually
obtained by measurement, e.g. pH of a sample, patient cholesterol
levels.
Give more examples of:
1. Qualitative variables
2. Quantitative variables
3. Discrete variables
4. Continuous variables
DEFINITIONS…
Nominal Level: Level of measurement which classifies data into mutually
exclusive, all-inclusive categories in which no order or ranking can be
imposed on the data. e.g. gender, blood group.
Ordinal Level: Level of measurement which classifies data into categories
that can be ranked. Differences between the ranks are not meaningful, e.g.
mild, moderate or severe illness. Often ordinal variables are coded to be
quantitative.
Interval Level: Level of measurement which classifies data that can be
ranked and differences are meaningful. However, there is no meaningful
zero, so ratios are meaningless.
Ratio Level: Level of measurement which classifies data that can be
ranked, differences are meaningful, and there is a true zero. True ratios exist
between the different units of measure.
ACTIVITY 1A:
Which among the following are quantitative or qualitative?
a. colour change,
b. temperature
c. turbidity values
d. states of matter
e. Concentration of Total Organic Compounds
Separate the discrete from the continuous variables among the
following:
f. time measurements,
g. Electric conductivity measurements,
h. Total Bacterial Count (TBC)
i. Water hardness (mg/L CaCO3)
Give examples of:
1. Nominal
2. Ordinal
3. Interval &
4. Ratio data
Which basic statistical calculations are meaningful for each one?
PARAMETERS & STATISTICS
Parameters: Quantities that describe a population
characteristic. They are usually unknown, and we wish to
make statistical inferences about them. Not to be confused
with perimeters.
FREQUENCY DISTRIBUTIONS
An (Empirical) Frequency Distribution or Histogram for a
continuous variable presents the counts of observations
grouped within pre-specified classes or groups.
A Relative Frequency Distribution presents the
corresponding proportions of observations within the classes.
A Bar Chart presents the frequencies for a categorical
variable.
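As a minimal sketch of how observations are grouped into pre-specified classes, the function below counts values per class and then converts the counts to proportions. The function name `frequency_distribution`, its parameters and the measurements are all hypothetical and chosen only for illustration.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count the observations falling in each pre-specified class.

    Class k covers lower + k*width up to (but not including)
    lower + (k+1)*width.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hypothetical measurements (not taken from the slides).
values = [23, 41, 45, 58, 62, 67, 71, 79, 80, 95]

counts = frequency_distribution(values, lower=20, width=20, n_classes=4)
total = sum(counts)
relative = [c / total for c in counts]  # relative frequency per class

for k, (c, r) in enumerate(zip(counts, relative)):
    start = 20 + 20 * k
    print(f"{start}-{start + 19}: count={c}, proportion={r:.2f}")
```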
EXAMPLE – SERUM CK
Blood samples were taken from 36 male volunteers as part of a study to
determine the natural variation in CK concentration. The serum CK
concentrations, measured in U/l, are as follows:
RELATIVE FREQUENCY TABLE
Serum CK (U/l)   Frequency   Relative Frequency   Cumulative Rel. Frequency
20-39            1           0.028                0.028
40-59            4           0.111                0.139
60-79            7           0.194                0.333
80-99            8           0.222                0.555
100-119          8           0.222                0.777
120-139          3           0.083                0.860
140-159          2           0.056                0.916
160-179          1           0.028                0.944
180-199          0           0.000                0.944
200-219          2           0.056                1.000
Total            36          1.000
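The relative and cumulative relative frequencies can be recomputed from the class counts in the table. The sketch below uses only those counts; the variable names are illustrative. Because it accumulates the exact counts and rounds once at the end, a few cumulative values may differ by 0.001 from a table built by adding already-rounded proportions.

```python
from itertools import accumulate

# Class labels and frequencies copied from the table above.
classes = ["20-39", "40-59", "60-79", "80-99", "100-119",
           "120-139", "140-159", "160-179", "180-199", "200-219"]
frequencies = [1, 4, 7, 8, 8, 3, 2, 1, 0, 2]

n = sum(frequencies)                                    # total number of volunteers
relative = [f / n for f in frequencies]                 # proportion in each class
cumulative = [c / n for c in accumulate(frequencies)]   # running proportion

for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:>8} {f:>3} {r:6.3f} {c:6.3f}")
```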
FREQUENCY DISTRIBUTION
Distributions: CK concentration (U/l)
[Figure: histogram of CK concentration with frequency and relative-frequency axes; the left tail is marked.]
Quantiles
100.0%   maximum    203.00
99.5%               203.00
97.5%               203.00
90.0%               154.60
75.0%    quartile   118.75
50.0%    median      94.50
25.0%    quartile    67.25
10.0%                54.30
2.5%                 25.00
0.5%                 25.00
0.0%     minimum     25.00
Moments
Mean             98.277778
Std Dev          40.380767
Std Err Mean     6.7301278
Upper 95% Mean   111.94066
Lower 95% Mean   84.614892
N                36
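The entries in the Moments output are related to one another. As a hedged sketch, the code below recomputes the standard error of the mean as s / sqrt(n) and recovers the multiplier implied by the reported 95% limits. That multiplier comes out at about 2.03, which is consistent with a Student's t critical value for 35 degrees of freedom. The slides do not state how the software computed the interval, so this is an assumption.

```python
import math

# Values copied from the Moments output above.
mean = 98.277778
std_dev = 40.380767
n = 36
upper_95 = 111.94066
lower_95 = 84.614892

# Standard error of the mean: s / sqrt(n).
std_err = std_dev / math.sqrt(n)

# The interval has the form mean +/- multiplier * std_err;
# recover the multiplier implied by the reported upper limit.
multiplier = (upper_95 - mean) / std_err

print(f"Std Err Mean:       {std_err:.4f}")
print(f"implied multiplier: {multiplier:.3f}")
```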
SAMPLE VARIANCE
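The sample variance, s², is the sum of the squared deviations
from the sample mean divided by n − 1 (approximately the
arithmetic mean of the squared deviations):
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$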
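A minimal sketch of the sample variance calculation, checked against Python's standard-library `statistics.variance`. The function name `sample_variance` and the replicate values are hypothetical and used only for illustration.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical replicate measurements (not from the slides).
replicates = [4.0, 6.0, 8.0, 10.0, 12.0]

s2 = sample_variance(replicates)
print(s2)                               # 10.0
print(statistics.variance(replicates))  # standard-library cross-check
```

Here the mean is 8.0 and the squared deviations sum to 40, so dividing by n − 1 = 4 gives 10.0.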
STANDARD DEVIATION
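The sample standard deviation, s, is the
square root of the variance:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
$$ \sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86 $$
Therefore,
$$ s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6 $$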
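The worked figures above can be checked in a few lines. This sketch uses only the sum of squared deviations (2304.86) and the sample size (7) from the example; the variable names are illustrative.

```python
import math

# Figures taken from the worked example above.
sum_sq_dev = 2304.86   # sum of squared deviations from the mean
n = 7                  # number of observations

variance = sum_sq_dev / (n - 1)   # sample variance, s^2
s = math.sqrt(variance)           # sample standard deviation, s

print(f"s^2 = {variance:.2f}")
print(f"s   = {s:.1f}")
```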
COEFFICIENT OF VARIATION
The coefficient of variation (CV) or relative standard deviation
(RSD) is the sample standard deviation expressed as a percentage
of the mean, i.e.
$$ CV = \frac{s}{\bar{x}} \times 100\% $$
The CV is not affected by multiplicative changes in scale.
Consequently, it is a useful way of comparing the dispersion of
variables measured on different scales.
EXAMPLE 1
The CV of the blood pressure data is:
$$ CV = \frac{19.6}{137.1} \times 100\% = 14.3\% $$
i.e., the standard deviation is 14.3% as large as the mean.
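A short sketch of the CV calculation, together with a check that the CV is unchanged by a multiplicative change of scale. The function name `coefficient_of_variation` and the concentration values are hypothetical. Only the figures 19.6 and 137.1 come from the example above.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Summary figures from the blood pressure example above.
print(f"{19.6 / 137.1 * 100:.1f}%")   # 14.3%

# Hypothetical concentrations in mg/L, then the same values in ug/L.
mg_per_l = [2.0, 2.5, 3.0, 3.5, 4.0]
ug_per_l = [x * 1000 for x in mg_per_l]

cv_mg = coefficient_of_variation(mg_per_l)
cv_ug = coefficient_of_variation(ug_per_l)
print(f"{cv_mg:.2f}% vs {cv_ug:.2f}%")  # same CV on both scales
```

Multiplying every value by 1000 multiplies both the standard deviation and the mean by 1000, so their ratio is unchanged.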
INTER-QUARTILE RANGE
The Median divides a distribution into two halves.
The first and third quartiles (denoted Q1 and Q3) are defined as
follows:
25% of the data lie below Q1 (and 75% lie above Q1),
75% of the data lie below Q3 (and 25% lie above Q3).
The Inter-Quartile Range (IQR) is Q3 − Q1; with Q1 = 124 and Q3 = 151,
IQR = 151 − 124 = 27.
Find the Inter-Quartile Range (IQR) for the data below:
40, 32, 55, 67, 61, 39, 33, 48
BOX-PLOTS
A box-plot is a visual description of the distribution based on:
Minimum
Q1
Median
Q3
Maximum
Useful for comparing large sets of data
EXAMPLE 3
The pulse rates of 12 individuals arranged in increasing
order are:
62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80
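The five values a box-plot is built on can be computed for these pulse rates. The sketch below assumes one particular quartile convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data. The slides do not say which convention is intended, and other conventions can give slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum.

    Assumption: Q1 and Q3 are taken as the medians of the lower and
    upper halves of the sorted data (the middle value is left out of
    both halves when n is odd). Other quartile conventions exist and
    can give slightly different answers.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Pulse rates from Example 3.
pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]

minimum, q1, median, q3, maximum = five_number_summary(pulse)
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```

Under this convention the pulse data give Q1 = 69, median = 74 and Q3 = 77, so the IQR is 8.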
EXAMPLE 4: BOX-PLOTS OF INTENSITIES FROM
11 GENE EXPRESSION ARRAYS
[Figure: box-plots of intensities for the 11 arrays; outliers are marked.]
SCATTER-PLOT
Displays the relationship between two
continuous variables
Useful in the early stage of analysis when
exploring data and determining if a linear
regression analysis is appropriate
May show potential outliers in your data
EXAMPLE: AGE VERSUS SYSTOLIC BLOOD
PRESSURE IN A CLINICAL TRIAL
EXAMPLE: UP-REGULATION/DOWN-REGULATION OF
GENE EXPRESSION ACROSS AN ARRAY (CONTROL CY5
VERSUS DISEASE CY3)