Summary statistics
Summary from the lectures + all the exercises!

University
Medical University-Pleven

Course
Medical Statistics 15

Uploaded by
Michelle Betschart

Academic year
20/21

LECTURE 1: INTRODUCTION TO STATISTICS, SOURCES AND TYPES OF DATA
Definition and major objectives of Statistics
Statistics is the science that deals with the collection, classification, analysis, and interpretation of numerical facts or data, and that, by use of mathematical theories of probability, imposes order and regularity on aggregates of more or less disparate elements.

Statisticians and researchers should analyse data in order to make generalisations and decisions. Therefore, statistics is the science of collecting, summarizing, presenting and interpreting data, and of using them to estimate the magnitude of associations and to test hypotheses.

There are two kinds of statistics:
1. Descriptive statistics
2. Inferential statistics

Statistical activities
- Statistical description – the process of summarizing the characteristics of data under study (at the sample or population level). This process is called descriptive statistics.
- Statistical relationship analysis – the process of analysing the relationship between a dependent variable (effect) and one or more independent variables (causes).
- Statistical inference – the process of generalization from a sample to a population, when the observation is performed in a representative sample, usually with calculated degrees of uncertainty; we call this process inferential statistics.

Basic concepts
POPULATION – The population includes all members of a defined group. It represents the target of an investigation, and the aim of the process of data collection is to make inferences (draw conclusions) about the population.

[Diagram: the role of statistics in using information from a sample to make inferences about the population from which the sample was derived.]

Examples of populations:
- all patients with a certain disease;
- all inhabitants of Bulgaria.

2 Michelle Betschart

For measurements on a single patient to be meaningful, it is desirable to compare them with the distribution of all such measurements on the complete population of diseased persons in the same categories (sex, age group, geographic area, and so on).
It is usually impossible to obtain such data on complete populations; therefore, investigations are carried out on a representative subset called a sample.
Thus, a SAMPLE is a subset or a fraction of the population. It is a small group drawn from a larger population.

Examples of samples:
- 50 patients with a certain disease from one regional hospital;
- 100 newborns from a neonatal clinic.

The raw data of an investigation consist of observations made on individuals (people, but also red blood cells, urine specimens, rates, or hospitals).
The number of individuals in a sample is called the sample size.
Types of studies and the research process
TYPES OF STUDY
- “GENUINE” SAMPLE STUDIES. For most studies, the sample size falls between the two extremes listed below (100% studies and N=1 studies). In such studies, called representative studies, statistics is most useful. They are the most common in medical practice and science, as there are millions of births, deaths, diseases, etc.
- 100% STUDIES: the study of the entire population. The data only need to be summarized and no inferences are to be made, as all the information has been gathered. Example: the decennial census in different countries.
- N=1 STUDIES (MONOGRAPHIC STUDIES). Example: an outbreak of salmonella food poisoning.
THE RESEARCH PROCESS
1. Planning
2. Hypothesis or aims
3. Research design
4. Data collection
5. Organisation and representation of data
6. Data analysis
7. Interpretation and conclusion


This approach is useful when cases are automatically time-ordered, such as arrivals or discharges of hospital inpatients. Both the simple and the systematic sample require a list of all members of the population, which is not always available.

STRATIFIED RANDOM SAMPLE


Sometimes it is known in advance that there are important subgroups within the population that may affect the results (for instance, males and females, different age groups, etc.).
The proportions of these subgroups in the sample must be the same as in the population. In this case the sample will be representative of the population. A list of all members of the population, their characteristics, and the proportions of the important groups within the population need to be known.
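
As an illustration of the proportional allocation described above, here is a minimal sketch in Python. The function name `stratified_sample`, the population of 600 women and 400 men, and the sample size of 50 are all assumptions made up for this example; they do not come from the lecture.

```python
import random

def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportional stratified random sample (illustrative sketch).

    population  -- list of all members (a full list is required)
    stratum_of  -- function returning the subgroup label of a member
    sample_size -- desired total sample size
    """
    rng = random.Random(seed)
    # Group the members of the population by subgroup (stratum).
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)

    sample = []
    for members in strata.values():
        # Give each stratum the same share of the sample as it has of the population.
        k = round(sample_size * len(members) / len(population))
        # Within a stratum, selection is simple random sampling without replacement.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 women ("F") and 400 men ("M").
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(population, lambda m: m[0], sample_size=50)
n_women = sum(1 for m in sample if m[0] == "F")
print(n_women, len(sample) - n_women)  # 30 women and 20 men, mirroring 60% / 40%
```

With strata whose shares do not divide evenly, the rounded allocations may not add up exactly to the requested sample size; a real design would need a rule for that case.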

MULTISTAGE CLUSTER SAMPLE


This approach is used when we want to sample for a large-scale study spread over a wide geographic area. As its name implies, it involves multiple stages of sample selection.
- Example: To obtain a random sample of all babies born in maternity units across Bulgaria, we can first choose a random sample of health districts, then a random sample of hospital units within those districts, then select wards within those units; at the final level we again choose a simple random sample in each ward.

OTHER SAMPLES
There are also other methods that are less reliable:
- Convenience sample – it includes subjects who are easiest to select (e.g. the first 50 people on the street at one time).
- Self-selected sample – postal surveys, for example (non-responders may bias the results).
These two types of samples are not representative and are not recommended.

SAMPLE SIZE
There is no magic number that we can point to as an optimum sample size. It depends on the characteristics of an investigation. The sample size must be adequate for making correct inferences from a sample to a population. It relates to the concept of sampling error.

The value of the sampling error (standard error) depends on:
- the variability of individual measures in the sample, expressed by the standard deviation (s), and
- the sample size (n).
The sampling error is directly proportional to the standard deviation and inversely proportional to the square root of the sample size: SE = s / √n.
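
The relationship between the standard deviation, the sample size and the sampling error can be checked numerically. The following is a small sketch, assuming the usual formula for the standard error of the mean (s divided by the square root of n); the blood pressure readings are hypothetical values invented for the example.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: s / sqrt(n)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return s / math.sqrt(n)

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [118, 125, 130, 121, 135, 128, 122, 127]
se_small = standard_error(readings)

# Repeating the same values four times keeps the spread similar but quadruples n,
# so the standard error drops to roughly half.
se_large = standard_error(readings * 4)
print(round(se_small, 2), round(se_large, 2))
```

Because n sits under a square root, halving the sampling error requires about four times as many observations.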

Classification of variables
Each variable has different:
- variable values – every single variable can take two or more different values;
- variable distribution – frequencies of the values of a single variable.

Classifying variables
- Quantitative (numerical) variables – values of which are expressed by numbers (e.g. weight, number of patients per day);
- Qualitative (categorical) variables or attributes – values of which are expressed only by description (e.g. gender, residence, blood group, profession, marital status, ethnic group, etc.).

Quantitative variables can be:
- Continuous variables – with a potentially infinite number of possible values along a continuum. Continuous variables may be presented on an interval scale (has no true zero, e.g. temperature in °C) or on a ratio scale (has a true zero, e.g. time, weight, height).
- Discrete variables – values of which can only be separate, countable numbers; e.g. number of patients per day, number of children in a family, number of live births, number of deaths, etc.

Qualitative variables can be:
- Ordinal variables – values of which are classified into ordered categories; the measurements are on an ordinal scale; e.g. pain intensity (excruciating, severe, moderate, mild, no pain), education (primary, secondary, higher), etc.
- Nominal variables – there is no natural ordering of categories; the measurements are on a nominal scale; each category can be reduced to “yes” or “no”; e.g. gender, blood group, residence, profession, marital status, etc.
Variables can be classified regarding the number of different values that they can take:
- Dichotomous or binary variables – with only two possible values (gender).
- Polychotomous variables – with more than two possible values (blood group).

Variables can be classified regarding the relationship between two or more variables:
- Dependent variables – values of which depend on the effect of other (independent) variables. They describe the results or the outcome;
- Independent variables – they influence the values of other (dependent) variables. They describe the factors or causes.

TYPES OF VARIABLES
In summary, we usually classify variables into four main types:
- Numerical continuous variables
- Numerical discrete variables
- Categorical ordinal variables
- Categorical nominal variables
Graphical presentation/summarization

FOR QUALITATIVE VARIABLES


The most appropriate graphical presentation is a pie diagram. It is constructed very easily:
- The circle is equal to 100%;
- We calculate the proportion of each part (e.g. the proportion of men and women in a dataset);
- The sum of all proportions must equal 100%.
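
The calculation behind a pie diagram can be sketched in a few lines of Python. The helper name `percentage_shares` and the dataset of 12 women and 8 men are assumptions chosen for this illustration.

```python
from collections import Counter

def percentage_shares(values):
    """Return the percentage share of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical dataset: gender recorded for 20 patients.
genders = ["F"] * 12 + ["M"] * 8
shares = percentage_shares(genders)
print(shares)  # {'F': 60.0, 'M': 40.0}

# The slices of the pie must add up to the whole circle (100%).
total_share = sum(shares.values())
```

Each share, taken as a fraction of 360 degrees, gives the angle of the corresponding slice.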

FOR QUANTITATIVE VARIABLES

The most appropriate graphical presentations are the following:
- Histograms;
- Bar charts;
- Linear diagrams;
- Map diagrams.

In bar charts (used for categorical data) all bars are separated. They are appropriate to express changes in rates over time or levels of rates in different areas (countries, regions, etc.).


In histograms all bars adjoin each other, with no gaps between them. They are appropriate to express changes in rates over time or levels of rates or proportions in different areas for the same time (countries, regions, etc.).

The linear charts are appropriate to express changes over time.

The maps are appropriate to express different levels of rates in different regions.


Which of the following measures of the variable ‘weight’ is nominal?
A. Weight in kg
B. Weight as obese/overweight/normal/underweight/grossly underweight
C. Weight as ‘normal against pathological’ (obese or grossly underweight)
D. Weight as percentage overweight in relation to “healthy” weight
Select one of the following variables measured on a nominal scale.
A. Height in cm
B. Ethnic group
C. Education categorized as primary school, secondary school, bachelor degree
D. Age in years
The reading “64 kilograms” is a value on a(n):
A. ratio scale
B. interval scale
C. ordinal scale
D. nominal scale
A sample by convenience is always representative.
A. True B. False
An interval scale has an absolute zero.
A. True B. False
The levels of measurement which have equal intervals are the ordinal and nominal scales.
A. True B. False
Ordinal scales are generally preferable to interval scales.
A. True B. False
Ordinal measures involve rank-ordering the values of a variable.
A. True B. False
Nominal scales do not have the characteristic of ‘distinctiveness’ (categories).
A. True B. False
The gender of patients is an example of a(n):
A. ratio scale
B. nominal scale
C. ordinal scale
D. interval scale
“The tenth” is a value on a(n):
A. ratio scale
B. interval scale
C. ordinal scale
D. nominal scale
If a population contains 50% males and 50% females, and a sample 10% males and
90% females, then such sample is said to be biased.
A. True B. False
In a patient record system, patients are randomly assigned a unique identification number. These numbers represent a(n):
A. nominal scale
B. ratio scale
C. interval scale
D. ordinal scale
An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand. What is the sample type?
A. Simple random sample
B. Stratified random sample
C. Systematic random sample
30. Which of the following statements are true? (Check one)
A. Categorical variables are the same as qualitative variables.
B. Quantitative variables can be continuous variables.
C. Both statements are true

Answers:
1-A; 2-B; 3-B; 4-D; 5-B; 6-A; 7-A; 8-B; 9-A; 10-B; 11-B; 12-A; 13-D; 14-C; 15-C; 16-C; 17-B; 18-A; 19-B; 20-B; 21-B; 22-B; 23-A; 24-B; 25-B; 26-C; 27-A; 28-A; 29-B; 30-C.

Absolute and relative frequencies are commonly illustrated by a bar chart or by a pie chart. In a bar chart the lengths of the bars are drawn proportional to the frequencies.

If there are more than about 20 numerical observations it is useful to form a frequency distribution.
- For discrete variables the frequencies may be tabulated either for each value of the variable or for groups of values.
- With continuous variables, groups have to be formed.

When forming a frequency distribution:
1. Firstly, we have to count the number of observations and identify the lowest and highest values.
2. Secondly, we have to decide whether the data should be grouped and what grouping interval should be used. As a rough guide we may have 5–20 groups, depending on the number of observations. If the interval chosen for grouping the data is too wide, too much detail will be lost, while if it is too narrow the table will be unwieldy. The starting points of the groups should be round numbers and all the intervals should be of the same width. There should be no gaps between the groups.
3. Once the format of the table is decided, the numbers of observations (frequencies) in each group should be counted.
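
The steps above can be sketched in Python as follows. This is only an illustration: the function `frequency_distribution`, the list of ages, and the choice of five 10-year groups starting at 20 are assumptions, and the dataset is deliberately smaller than one would normally group.

```python
def frequency_distribution(values, start, width, n_groups):
    """Count observations in equal-width groups [lower, upper) with no gaps."""
    counts = [0] * n_groups
    for v in values:
        index = int((v - start) // width)  # which group the value falls into
        if not 0 <= index < n_groups:
            raise ValueError(f"{v} falls outside the chosen groups")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical ages (in whole years) of 14 patients.
ages = [23, 27, 31, 35, 36, 38, 41, 44, 45, 47, 52, 58, 61, 66]

# Step 1: number of observations, lowest and highest values.
n, lowest, highest = len(ages), min(ages), max(ages)

# Steps 2 and 3: round starting point (20), equal width (10), then count.
table = frequency_distribution(ages, start=20, width=10, n_groups=5)
for lower, upper, count in table:
    print(f"{lower}-{upper - 1}: {count}")
```

The labels such as 20-29 assume whole-number ages; for continuous measurements the groups would be written as half-open intervals instead.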

Frequency distributions are usually illustrated by histograms. Either the frequencies or the percentages may be used; the shape of the histogram will be the same.

To display the distribution of a numerical variable, some other types of graphs can be used, such as:
- Box-plot – a plot in which a rectangle is drawn from the lower (first) quartile to the upper (third) quartile, so that it covers the second and third quarters of the data, usually with a line inside to indicate the median value. Lines on either side of the rectangle extend towards the lowest and highest values and represent the first and fourth quarters of the data.
- Dot-plot – also called a dot chart or strip plot, a simple histogram-like chart used in statistics for relatively small data sets where values fall into a number of discrete bins (categories).
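
The numbers that a box-plot displays can be computed with the Python standard library. This is a minimal sketch on a hypothetical list of values; note that several conventions for calculating quartiles exist, so other software may give slightly different cut points.

```python
import statistics

# Hypothetical measurements, for illustration only.
values = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The rectangle of the box-plot runs from q1 to q3 with a line at the median;
# the lines on either side reach towards the lowest and highest values.
five_number_summary = (min(values), q1, median, q3, max(values))
print(five_number_summary)
```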

DESCRIBING A DISTRIBUTION
Regarding the number of peaks:
- Unimodal distributions: with a single peak,
- Bimodal distributions: with two peaks,
- Polymodal distributions: with more than two peaks.

Regarding the shape of the peak:
- Bell shaped – distributions in which extreme values tend to be less likely than values in the middle of the ordered series,
- Uniform – sometimes also known as a rectangular distribution, a distribution that has constant probability, i.e. one in which all values have the same frequency.


THE NORMAL CURVE is a theoretically perfect frequency polygon in which the mean, median, and mode all coincide and which takes the form of a symmetrical bell-shaped curve.

Characteristics of the normal curve:
- Most of the cases fall close to the mean;
- Relatively few cases fall into the high or low values of x;
- We can use appropriate tables to estimate the area under the standard normal curve for any given z scores;
- The area under the curve between any two points is directly proportional to the percentage of cases falling between those two points.
All these properties underlie the calculation of the limits for ‘norms’ and are used in clinical practice to determine the so-called “normative groups”.
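
Instead of printed tables, the area under the standard normal curve can be looked up with Python's `statistics.NormalDist`. The sketch below is illustrative: the helper names `z_score` and `area_between` and the example values (a score of 130 on a scale with mean 100 and standard deviation 15) are assumptions, not part of the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def area_between(z_low, z_high):
    """Proportion of cases falling between two z scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Hypothetical example: a score of 130 on a scale with mean 100 and SD 15.
z = z_score(x=130, mean=100, sd=15)
print(z)  # 2.0

# Proportion of cases between the mean (z = 0) and this score.
print(round(area_between(0, z), 3))
```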
Normal Distribution
The normal distribution or Gaussian distribution is a continuous probability distribution which is very important in many fields of science, and especially in medicine.
Normal distributions can differ in their means and in their standard deviations but they are always symmetric, with relatively more values at the center of the distribution and relatively few in the tails.

Basic features of the normal distributions:
- Normal distributions are symmetric around their mean;
- The mean, median, and mode of a normal distribution are equal;
- The area under the normal curve is equal to 1.0;
- Normal distributions are denser in the center and less dense in the tails;
- Normal distributions are defined by two parameters, the mean and the standard deviation;
- Approximately 68% of the area of a normal distribution is within one standard deviation of the mean;
- Approximately 95% of the area of a normal distribution is within two standard deviations of the mean;
- Approximately 99.7% of the area of a normal distribution is within three standard deviations of the mean.
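
The three percentages in the list above can be reproduced from the error function, since for a normal distribution the share of the area within k standard deviations of the mean equals erf(k / √2). The function name `share_within` is an assumption made for this short sketch.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

The printed values come out at roughly 68.3%, 95.4% and 99.7%, which is why the rule is quoted with the word "approximately".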

ASYMMETRIC DISTRIBUTIONS
Regarding the inclination of the peak or skewness:
- Positive skewness – distributions with an extended right hand tail (lower values more likely);
- Negative skewness – distributions with an extended left hand tail (higher values more likely).

POSITIVELY SKEWED: most of the scores are low, but with some scores spreading out towards the upper end of the distribution; the tail is directed to the right or positive side of the distribution → mode < median < mean.

NEGATIVELY SKEWED: most of the scores are high, but with some scores spreading out towards the lower end of the distribution; the tail is directed to the left or negative side of the distribution → mean < median < mode.
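
The ordering of mode, median and mean in skewed data can be seen on a small made-up dataset. The values below are hypothetical and chosen so that the pattern is clear; in real data the ordering is a typical tendency rather than a guarantee.

```python
import statistics

# Hypothetical positively skewed data: mostly low values with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]
mode = statistics.mode(right_skewed)      # 2
median = statistics.median(right_skewed)  # 3.0
mean = statistics.mean(right_skewed)      # 4.2
print(mode, median, mean)

# Mirroring the values gives a negatively skewed dataset with the order reversed.
left_skewed = [13 - x for x in right_skewed]
mirrored = (statistics.mean(left_skewed),
            statistics.median(left_skewed),
            statistics.mode(left_skewed))
print(mirrored)
```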

Important: the type of the distribution determines the statistical tests to be used for descriptive or inferential statistics.

Test examples related to Lecture 2:

z scores express how many standard deviations a particular score is from the mean.
A. True B. False
The total area under the standard normal curve is always 1.0.
A. True B. False
The area of a normal curve between any two designated z scores expresses the
proportion or percentage of cases falling between the two points.
A. True B. False
About 10% of scores fall 3 standard deviations above the mean.
A. True B. False
50% of scores fall between z = 0.5 and z = - 0.5.
A. True B. False
In a normal curve, approximately 34% of the scores fall between z = 0 and z = - 1.
A. True B. False
Numerous human characteristics are distributed approximately as a normal curve.
A. True B. False
The height of the rectangle in a histogram is proportional to class frequency and class width.
A. True B. False
Which of the following statements is true?
A. A z score indicates how many standard deviations a raw score is above or below the mean.
B. The mean of a standard normal distribution is always 0 (zero).
C. All the above statements are true.
In an anatomy test, your result is equivalent to a z score of -0.2. What does this z score imply?
A. You performed very well when compared to others.
B. Your result was slightly above average.
C. Your result was slightly below average.
State whether the data reflecting the age at death of individuals in the general
population are likely to be skewed to the right, skewed to the left or symmetrical.
A. Symmetrical.
B. Skewed to the right (positively skewed).
C. Skewed to the left (negatively skewed).
Select the statement which you believe to be true. The Normal distribution:
A. Is a family of distributions which can have a variety of means and standard
deviations.
B. Is the distribution of a variable measured on healthy individuals.
C. Has a mean of zero and a standard deviation of one.
D. Is skewed to the right.
Frequency distribution is another expression for a bar chart.
A. True B. False
A histogram can be used instead of a pie chart to display categorical data.
A. True B. False
A histogram is similar to a bar chart but there are no gaps between the bars.
A. True B. False
A histogram can be used to display either a frequency or a relative frequency
distribution.
A. True B. False

A histogram is used to show the relationship between two variables.
A. True B. False
A bar chart is used to display categorical data.
A. True B. False
A bar chart can only be used to display data which have a symmetrical distribution.
A. True B. False
A bar chart contains separate bars, with the length of each bar being proportional to the relevant frequency or relative frequency.
A. True B. False
Select all of the following type(s) of figures that would be appropriate for illustrating the distribution of heights of children in a class.
A. Histogram
B. Box-plot
C. Dot-plot
D. All listed types of figures are appropriate
Select all of the following type(s) of figures that would be appropriate for illustrating the distribution of blood groups in a sample of adults.
A. Bar chart
B. Pie chart
C. Both types are appropriate
Select all of the following type(s) of figures that would be appropriate for illustrating the number of fruit and vegetable portions consumed in a week by the 60 first year medical students in a medical school.
A. Bar
B. Pie
C. Box-plot
D. Dot-plot
E. All listed figures would be appropriate
State whether the data reflecting the salaries of all employees in an industrial
company are likely to be skewed to the right, skewed to the left or symmetrical.
A. Skewed to the right
B. Skewed to the left
C. Symmetrical
State whether the data reflecting the heights of individuals in the general population are likely to be skewed to the right, skewed to the left or symmetrical.
A. Skewed to the right
B. Skewed to the left
C. Symmetrical
State whether the data reflecting the degree of flexion in a knee joint, expressed as a percentage of the maximum possible flexion in the joint, are likely to be skewed to the right, skewed to the left or symmetrical.
A. Skewed to the right
B. Skewed to the left
C. Symmetrical
State whether the data reflecting the number of visits to a GP made in a year by individuals living in one particular region are likely to be skewed to the right, skewed to the left or symmetrical.
A. Skewed to the right
B. Skewed to the left
C. Symmetrical