
STATISTICS

LOWER 6 AND UPPER 6

MR NHONGO
DEFINITION OF TERMS
STATISTICS
DEFINITIONS
• Statistics is a scientific method of
investigation which involves the
collection, analysis and interpretation
of numerical information.
• The term “Statistics”, meaning a
discipline, is singular and has a
capital S.
DEFINITIONS
• The word has a small s (statistics) when it refers to
figures calculated from a sample.
FUNCTIONS OF STATISTICS
• To enhance our knowledge about the world
and environment we live in, by providing
accurate numerical information or facts
about various natural processes: physical,
biological, socio-economic, etc., and the laws
governing them.
FUNCTIONS OF STATISTICS
• To lend precision to ideas that would otherwise
remain vague, e.g. accurate and precise
measurements and/or projections of levels of
important variables such as population figures,
agricultural output, industrial output, state of
the economy, quality of goods and services.
MAIN BRANCHES OF STATISTICS
• At the highest level, Statistics can be classified into two
main branches:

(a) Descriptive Statistics – deductive statistics

(b) Inferential Statistics – inductive statistics
DESCRIPTIVE STATISTICS
• Descriptive statistics are functions of the
sample data whose purpose is to describe
some feature of the data.
• Classic descriptive statistics include mean,
minimum, maximum, standard deviation,
median, skew, kurtosis and mode.
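As a minimal sketch of the descriptive statistics listed above, the following uses Python's standard `statistics` module on a small, hypothetical sample of student ages (the data are invented for illustration only).

```python
import statistics

# Hypothetical sample: ages (in years) of seven students.
sample = [16, 17, 17, 18, 18, 18, 19]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    "minimum": min(sample),
    "maximum": max(sample),
    # stdev() is the sample standard deviation (divides by n - 1).
    "standard deviation": statistics.stdev(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each number describes a feature of the sample itself; none of them makes a claim about a wider population.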
INFERENTIAL STATISTICS
• Inferential statistics are functions of the sample
data that help you to draw an inference regarding
a hypothesis about a population parameter.
• Classic inferential statistics include the z-test, t-tests,
chi-square tests, F-ratio, etc.
• Descriptive statistics quantitatively describe
features of a data set, while inferential statistics make
inferences about the populations from which
samples were drawn.
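To illustrate how an inferential statistic differs from a descriptive one, here is a rough sketch of a one-sample z-test. Every number in it (the hypothesised mean, the known standard deviation, the sample size and the sample mean) is a hypothetical assumption, not data from these notes.

```python
from math import sqrt
from statistics import NormalDist

# All numbers below are hypothetical, chosen only for illustration.
hypothesised_mean = 16.5   # claimed population mean age (years)
population_sd = 1.2        # population standard deviation, assumed known
sample_size = 50
sample_mean = 16.8         # statistic computed from the sample

# z measures how many standard errors the sample mean lies
# from the hypothesised population mean.
standard_error = population_sd / sqrt(sample_size)
z = (sample_mean - hypothesised_mean) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

The z value is computed from the sample but is used to judge a statement about the population parameter, which is what makes it inferential.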
POPULATION
• A population or distribution is the totality of
cases or figures in an investigation.

• A population is sometimes referred to as the
target population, underlying population or
universe.
SAMPLE
• A sample, or study population, is part of a
population selected for study in an
investigation.
PARAMETER VS STATISTIC
• A parameter is a measure that describes the
entire population, whereas a statistic describes
a sample of the population.
• A parameter is a numeric descriptive measure
of some characteristic or feature of a
population, such as the centre or spread of the
values.
PARAMETER VS STATISTIC
• Obtaining a parameter is almost impossible in practice,
for several reasons such as
– time constraints,
– financial constraints,
– inaccessibility of the entire population.
PARAMETER VS STATISTIC
• A statistic is a numeric descriptive measure of
a sample characteristic or simply a calculation
from a sample.

• The sample arithmetic mean (x̅), sample
variance (s²) and sample standard deviation
(s) are all statistics.
EXAMPLE
• For example, say our population is all of the
students at some high school. If I were to take
the mean age of all the students, that would be
a parameter, since the mean is describing the
entire population (all of the students).
• If we take the mean age of a randomly
selected group of 50 students, then that mean
would be a statistic, since it describes only part
of the population.
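The school example above can be sketched in code. The population below is simulated (1,200 hypothetical students with ages drawn at random from 13 to 19), so the specific numbers are assumptions made purely for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of all 1,200 students at a school.
population_ages = [random.randint(13, 19) for _ in range(1200)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population_ages)

# Statistic: computed from a random sample of 50 students.
sample_ages = random.sample(population_ages, 50)
sample_mean = statistics.mean(sample_ages)

print(f"parameter (population mean age): {population_mean:.2f}")
print(f"statistic (sample mean age):     {sample_mean:.2f}")
```

The two means are usually close but not identical; the sample mean serves as an estimate of the population mean.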
PARAMETER VS STATISTIC
• Statistics are commonly obtained instead of
parameters because
– samples are easier to work with because of their
relatively small size,
– information is less costly to obtain from a sample,
– information from a sample can be obtained within
a relatively short period.
• If statistics are obtained from a non-representative
sample, then they are biased estimators of the
parameters they represent.
DISCRETE VS CONTINUOUS DATA

• Discrete data is information that
can be categorized into a
classification.

• Continuous data, by contrast, is data
that can take any value (within a
range).
Common Statistical Terms
• Data
– Measurements or observations of a
variable
• Variable
– A characteristic that is observed or
manipulated
– Can take on different values
DISCRETE VS CONTINUOUS DATA

• Discrete data can only take particular values.
There may potentially be an infinite number of
those values, but each is distinct and there's no
grey area in between.
• Discrete data can be numeric -- like numbers
of apples -- but it can also be categorical -- like
red or blue, or male or female, or good or bad.
DISCRETE VS CONTINUOUS DATA

• Continuous data are not restricted to defined
separate values, but can occupy any value over
a continuous range.

• Between any two continuous data values there
may be an infinite number of others.
DISCRETE VS CONTINUOUS DATA

• Continuous data are always essentially
numeric.

• Example: People's heights could be any value
(within the range of human heights), not just
certain fixed heights. (Opposite of Discrete Data).
QUANTITATIVE VS QUALITATIVE
DATA
• Quantitative data are measures of values or counts that
can be expressed as numbers, or can be quantified.
• Examples of quantitative data are scores on
achievement tests, number of hours of study, or weight
of a subject.
• They are about numeric variables e.g. how much, how
many or how often.
• These data may be represented by ordinal, interval or
ratio scales and lend themselves to most statistical
manipulation.
QUANTITATIVE VS QUALITATIVE
DATA
• Qualitative data are measures of types and
may be represented by a name, symbol, or a
number code. Such a code is only a label, so the
data cannot be expressed as a numerical quantity.
• Data that represent nominal scales such as
gender, socio-economic status, religious
preference are usually considered to be
qualitative data.
Qualitative Data
• are those which cannot be quantified or measured
numerically.
• can simply be classified on the basis of their
attributes, e.g.
– blood groups,
– human complexion,
– taste.
Nominal vs Ordinal scale
• A categorical variable, also called a nominal
variable, has mutually exclusive but
unordered categories.
• For example, your study might compare five
different genotypes. You can code the five
genotypes with numbers if you want, but the
order is arbitrary and any calculations are
meaningless.
• Computing an average would be meaningless.
Nominal Vs. Ordinal scale
• An ordinal variable is one where the order matters
but not the difference between values.
• For example, you might ask patients to express the
amount of pain they are feeling on a scale of 1 to
10.
• A score of 7 means more pain than a score of 5, and
that is more than a score of 3.
• The difference between the 7 and the 5 may not be
the same as that between 5 and 3. The values
simply express an order.
Interval Vs. Ratio scale
• An interval scale is a measurement scale where the
difference between two values is meaningful.

• The difference between a temperature of 100
degrees and 90 degrees is the same difference
as between 90 degrees and 80 degrees.
Interval Vs. Ratio scale
• A ratio variable has all the properties of an
interval variable, and also has a clear
definition of 0.0.

• When the variable equals 0.0, there is none of
that variable. Variables like height, weight and
enzyme activity are ratio variables.
Interval Vs. Ratio scale
• Temperature, expressed in F or C, is not a ratio
variable. A temperature of 0.0 on either of
those scales does not mean 'no heat'.

• However, temperature in Kelvin is a ratio
variable, as 0.0 Kelvin really does mean 'no
heat'.
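The practical consequence of the interval/ratio distinction can be sketched numerically. The two temperatures below are hypothetical values chosen for illustration; the conversion uses the standard offset of 273.15 between Celsius and Kelvin.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15

# Two hypothetical temperatures in degrees Celsius.
warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales (interval property).
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the Kelvin scale (true zero).
celsius_ratio = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.2f} C, {difference_k:.2f} K")
print(f"ratio in Celsius: {celsius_ratio:.3f}")
print(f"ratio in Kelvin:  {kelvin_ratio:.3f}")
```

The difference is 10 degrees on either scale, but the ratio changes with the choice of zero point, which is why ratios should only be interpreted on a ratio scale.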
Levels of Measurement

• There are 4 levels of measurement
– Nominal, ordinal, interval, and ratio
1. Nominal
– Data are coded by a number, name, or letter that
is assigned to a category or group
– Examples
• Gender (e.g., male, female)
• Treatment preference (e.g., manipulation,
mobilization, massage)

Evidence-based Chiropractic 29
Levels of measurement (cont.)

2. Ordinal
– Is similar to nominal because the measurements
involve categories
– However, the categories are ordered by rank
– Examples
• Pain level (e.g., mild, moderate, severe)
• Military rank (e.g., lieutenant, captain, major, colonel,
general)

Levels of measurement (cont.)

• Ordinal values only describe order, not
quantity
– Thus, severe pain is not the same as 2 times mild
pain
• The only mathematical operations allowed for
nominal and ordinal data are counting of
categories
– e.g., 25 males and 30 females
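Counting categories, as described above, might look like the following sketch. The list of pain ratings is hypothetical and uses the ordinal categories from the earlier example.

```python
from collections import Counter

# Hypothetical ordinal data: pain levels reported by six patients.
pain_levels = ["mild", "severe", "moderate", "mild", "mild", "moderate"]

# Counting how many observations fall in each category.
counts = Counter(pain_levels)

for level in ["mild", "moderate", "severe"]:
    print(f"{level}: {counts[level]}")
```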

Levels of measurement (cont.)

3. Interval
– Measurements are ordered (like ordinal data)
– Have equal intervals
– Do not have a true zero
– Examples
• The Fahrenheit scale, where 0° does not correspond
to an absence of heat (no true zero)
• In contrast to Kelvin, which does have a true zero

Levels of measurement (cont.)

4. Ratio
– Measurements have equal intervals
– There is a true zero
– Ratio is the most advanced level of
measurement, which can handle most types of
mathematical operations

Levels of measurement (cont.)

• Ratio examples
– Range of motion
• No movement corresponds to zero degrees
• The interval between 10 and 20 degrees is the same as
between 40 and 50 degrees
– Lifting capacity
• A person who is unable to lift scores zero
• A person who lifts 30 kg can lift twice as much as one
who lifts 15 kg

Levels of measurement (cont.)

• NOIR is a mnemonic to help remember the
names and order of the levels of
measurement
– Nominal
– Ordinal
– Interval
– Ratio

Kurtosis vs Skewness
• Skewness - measures the degree and direction
of symmetry or asymmetry of the distribution.

• A normal or symmetrical distribution has a
skewness of zero (0).

• But in the real world, normal distributions are
hard to come by.
Kurtosis vs Skewness
• A distribution may be positively skewed (skewed
to the right, with a longer tail to the right),
represented by a positive value,

• or negatively skewed (skewed to the left, with a
longer tail to the left), represented by a negative value.
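The sign convention for skewness can be checked with a short sketch. It assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one of several definitions in use; the three data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets chosen to show each case.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]       # one large value stretches the right tail
left_tailed = [-10, -3, -2, -2, -1, -1]  # mirror image of right_tailed

for name, data in [("symmetric", symmetric),
                   ("right-tailed", right_tailed),
                   ("left-tailed", left_tailed)]:
    print(f"{name}: skewness = {skewness(data):.3f}")
```

The symmetric data give a skewness of zero, the right-tailed data a positive value and the left-tailed data a negative value.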
Kurtosis vs Skewness
• Kurtosis - measures how peaked a distribution is
and the lightness or heaviness of the tails of the
distribution.

• In other words, how much of the distribution is
actually located in the tails?

• A normal distribution has a kurtosis value of zero
(0) when kurtosis is reported as excess kurtosis
(kurtosis minus 3), and is said to be mesokurtic.
Kurtosis vs Skewness
• A positive kurtosis value means that the tails
are heavier than those of a normal distribution and the
distribution is said to be leptokurtic (with a
higher, more acute "peak").
• A negative kurtosis value means that the tails
are lighter than those of a normal distribution and the
distribution is said to be platykurtic (with a
smaller, flatter "peak").
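A small sketch of the sign of kurtosis follows. It assumes excess kurtosis, defined as the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores zero; the two data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2 ** 2 - 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data sets chosen to show each case.
flat = [1, 2, 3, 4, 5]                          # evenly spread, light tails
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]  # most values central, a few extreme

print(f"flat:         {excess_kurtosis(flat):.3f}")
print(f"heavy-tailed: {excess_kurtosis(heavy_tailed):.3f}")
```

The evenly spread data give a negative value (platykurtic) and the data with a few extreme values give a positive value (leptokurtic).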
RANDOM VARIABLES
• Recall definitions of a variable and random
variable (r.v);
• Random variables are typically denoted by capital
letters, X, Y, Z, etc., whilst small letters x, y, z
represent values the r.v. can assume;
• Discrete r.v’s assume only a finite or countable
number of outcomes;
• Continuous r.v’s take on any value within a
specified interval.
PROBABILITY DISTRIBUTIONS
• Definition: a function that describes how likely
it is to obtain the different possible values of
the random variable;
• In the discrete case, it specifies all possible
outcomes of the r.v. along with the
corresponding probability;
• In the continuous case, it allows the
determination of probabilities associated with
specified ranges of values.
PROBABILITY DISTRIBUTIONS
• For example, let X be a discrete r.v.
representing the number of diagnostic tests a
child receives from a paediatric specialist, with
the probability distribution tabulated below.
• Find the following probabilities:
a) P(X = 3);
b) P(child receives at least 1 test);
c) P(child receives 4 or more tests);
d) P(X = 3 | X > 0).
PROBABILITY DISTRIBUTIONS
x     P(X = x)
0     0.671
1     0.229
2     0.053
3     0.031
4     0.010
5+    0.006
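
The four probabilities asked for in the example can be worked out from the table, as in this sketch. It reads part (d) as the conditional probability of X = 3 given X > 0, and uses the key 5 to stand for the "5+" row; both are interpretive assumptions.

```python
# Probability distribution from the table; key 5 stands for "5 or more".
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# a) P(X = 3): read directly from the table.
p_a = pmf[3]

# b) P(at least 1 test) = 1 - P(X = 0).
p_b = 1 - pmf[0]

# c) P(4 or more tests) = P(X = 4) + P(X = 5+).
p_c = pmf[4] + pmf[5]

# d) P(X = 3 | X > 0) = P(X = 3 and X > 0) / P(X > 0).
#    X = 3 already implies X > 0, so the numerator is P(X = 3).
p_d = pmf[3] / p_b

print(f"a) {p_a:.3f}")
print(f"b) {p_b:.3f}")
print(f"c) {p_c:.3f}")
print(f"d) {p_d:.4f}")
```

Note that the probabilities in the table sum to 1, as they must for a valid probability distribution.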
