Introduction to Biostatistics Concepts

The document provides an overview of basic biostatistics concepts including types of data, descriptive statistics, frequency distributions, and graphical presentations. It discusses numerical and categorical variables as well as continuous, discrete, nominal and ordinal variables. Common measures of central tendency (mean, median, mode) and dispersion (range, interquartile range, variance, standard deviation) are defined. Frequency distributions and contingency tables are introduced along with examples of graphical presentations like bar charts, Pareto charts, pie charts, box plots and histograms.

Uploaded by

DRRus

Basic Biostatistics Part 1

Wednesday 27th February, 2013

Content
- Types of Data
- Descriptive/Summary Statistics
- Frequency Distributions and Contingency Tables
- Graphical Presentations

Types of Data

Variables

Numerical: counted or measured on a numerical scale

Categorical: non-numerical, classification into categories

Continuous: measured on a scale; e.g. height

Discrete: counts, whole numbers; e.g. number of patients

Nominal: categories; e.g. cause of death

Ordinal: ordered categories; e.g. level of pain

Exercise
Consider the following variables and decide if they are Numerical or Categorical; continuous, discrete, nominal or ordinal
- Gender
- Height
- Number of staff in a department
- Length of psychiatric inpatient treatment
- Preferred strength of coffee
- Organisational size
- Types of anxiety disorder
- Levels of anxiety
- Types of medication

Derived data
In the medical field, other types of data may be encountered
- Percentages e.g. % of operational interactions
- Ratios or quotients e.g. Body Mass Index (BMI), kg/m²
- Rates e.g. number of disease events/total number of years of follow-up
- Scores e.g. quality of life scores

In most analyses these can be treated as numerical variables
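
As a hedged illustration of how such derived values arise from raw measurements, the sketch below computes a percentage, a ratio (BMI) and a rate. The function names and the input numbers are hypothetical and are not taken from the slides.

```python
def percentage(part: float, whole: float) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def event_rate(events: int, follow_up_years: float) -> float:
    """Number of disease events per year of follow-up."""
    return events / follow_up_years


# Hypothetical inputs, for illustration only.
print(percentage(15, 50))        # 15 of 50 patients
print(round(bmi(70, 1.75), 1))   # 70 kg, 1.75 m
print(event_rate(12, 400))       # 12 events over 400 person-years
```

Each result is a single number per subject or group, which is why such data can usually be handled as numerical variables.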

Descriptive/Summary Statistics

Measures of location
Measures of location summarise data with a single number

There are three common measures of location
- Mean
- Mode
- Median

Quartiles/Percentiles are another measure

Mean
The mean (more precisely, the arithmetic mean) is commonly called the average

In formulas the mean is usually represented by $\bar{x}$, read as x-bar.

The formula is:

$$\bar{x} = \frac{\sum x}{n}$$

All the values (x) are added together and the sum is divided by the number of observations (n)
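
The calculation can be sketched in a few lines of Python. The function name and the sample values (days to receive treatment) are hypothetical.

```python
def arithmetic_mean(values):
    """Add all the values together and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical data: number of days to receive treatment for five patients.
days = [4, 6, 8, 10, 12]
print(arithmetic_mean(days))  # sum is 40, n is 5
```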

Mode
The mode represents the most commonly occurring value within a dataset

The mode can be found by creating a frequency distribution, in which how often each value occurs is counted

If every value occurs only once, the distribution has no mode.

If two or more values are tied as the most common value, then the distribution has more than one mode
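
A minimal sketch of this counting procedure, covering the "no mode" and "more than one mode" cases. The function name `modes` is an assumption made for illustration.

```python
from collections import Counter


def modes(values):
    """Return the list of modes: empty if every value occurs only once."""
    counts = Counter(values)          # frequency distribution: value -> count
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                      # every value occurs only once
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 5, 5, 7]))       # one mode
print(modes([1, 2, 3]))          # no mode
print(modes([1, 1, 2, 2, 3]))    # two modes
```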

Median
Median means middle, and the median is the middle of a set of data that has been put into rank order

Specifically, it is the value that divides a set of data into two halves, with one half of the observations being larger than the median value, and one half smaller

[Figure: data with a median of 29; half the data < 29, half the data > 29]

Quartiles
Quartiles are a subset of percentiles
- Lower quartile: 25% of the data is below this value
- Upper quartile: 75% of the data is below this value
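
The median and quartiles can be sketched with Python's standard `statistics` module. Note that several conventions exist for interpolating quartiles; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, so other software may give slightly different values. The data are hypothetical.

```python
import statistics


def location_summary(values):
    """Return the median and the lower and upper quartiles of a dataset."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "median": statistics.median(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
    }


# Hypothetical data: days to receive treatment, not necessarily in rank order.
days = [10, 2, 14, 6, 8, 4, 12]
print(location_summary(days))
```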

Measures of Dispersion
The dispersion in a set of data is the variation among the set of data values

It measures whether they are all close together, or more scattered

[Figure: two plots of No. of days to receive treatment, one with an axis from 4 to 16 and one with an axis from 2 to 12]

Common Measures of Dispersion

Four common measures of spread are
- the range
- the inter-quartile range
- the variance
- the standard deviation

Range
The range is the difference between the largest and the smallest values in the dataset

It is sensitive to extreme values

The range of a list is 0 if and only if all the data values are equal

[Figure: the range marked on a plot of Days]

Inter-quartile Range
Inter-quartile range = Upper Quartile - Lower Quartile

Describes how much the middle 50% of the dataset varies
- example: if all patients at a clinic took more-or-less the same time to be treated, with only one or two exceptionally quick or long appointments, you would expect the inter-quartile range to be very small
- but if all appointments were either very quick or very long, with few in between, then the inter-quartile range would be larger
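
The contrast between the range and the inter-quartile range can be sketched as follows. The two appointment-time datasets are hypothetical, and the quartiles again assume the default interpolation of `statistics.quantiles`.

```python
import statistics


def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


def interquartile_range(values):
    """Upper quartile minus lower quartile: spread of the middle 50%."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical appointment times in minutes.
mostly_similar = [9, 10, 10, 11, 11, 12, 40]   # one exceptionally long visit
quick_or_long = [2, 2, 3, 3, 20, 21, 21]       # few values in between

for name, data in [("mostly similar", mostly_similar),
                   ("quick or long", quick_or_long)]:
    print(name, "range:", data_range(data), "IQR:", interquartile_range(data))
```

In the first dataset the single long appointment inflates the range but leaves the inter-quartile range small; in the second the inter-quartile range is almost as large as the range.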

Variance and Standard Deviation

(σ², s²) = (population notation, sample notation)

The variance (σ², s²) and standard deviation (σ, s) are measures of the deviation or dispersion of observations (x) around the mean (μ) of a distribution

Variance is the average squared deviation from the mean

Variance and Standard Deviation

The standard deviation (SD) is the square root of the variance
- small SD = values cluster closely around the mean
- large SD = values are scattered

[Figure: two plots of Days, each marking the Mean and 1 SD on either side of it]

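Variance and Standard Deviation

The following formulae define these measures

Population variance:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard deviation:

$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$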
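
A minimal sketch of these definitions, written out by hand so the divisor (N for a population, n - 1 for a sample) is visible. The function names and data are hypothetical; Python's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import math


def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


# Hypothetical data: days to receive treatment.
days = [4, 6, 8, 10, 12]
print(population_variance(days), math.sqrt(population_variance(days)))
print(sample_variance(days), math.sqrt(sample_variance(days)))
```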

Measures of Distribution
Measures of distribution are
- Skewness
- Kurtosis

The terms Skewness and Kurtosis refer to distribution shapes that deviate from the shape of a normal distribution

Skewness
A skewed distribution is characterised by a tail that extends towards the high end of the scale (a positive skew) or towards the low end of the scale (a negative skew)

- Normal Distribution: skewness statistic ≈ 0
- Positive Skew: skewness statistic > 0
- Negative Skew: skewness statistic < 0

Skewness
If the distribution has no skewness, then the skewness statistic will be zero

If the distribution has positive skewness, then the skewness statistic will be positive

If the distribution has negative skewness, then the skewness statistic will be negative

Kurtosis
A distribution with kurtosis is characterised by being too narrow and peaked (a positive kurtosis) or too wide and flat (a negative kurtosis), compared with a normal distribution

- Normal Distribution: kurtosis statistic ≈ 0
- Positive Kurtosis: kurtosis statistic > 0
- Negative Kurtosis: kurtosis statistic < 0
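
The slides do not say which formulas lie behind the skewness and kurtosis statistics. The sketch below assumes the simple moment-based versions: skewness as the third central moment divided by the 1.5th power of the second, and kurtosis as the fourth central moment divided by the square of the second, minus 3, so that a normal distribution scores about 0. Statistical packages often apply small-sample corrections, so their output may differ slightly.

```python
def _central_moment(xs, k):
    """Average of (x - mean) ** k over the dataset."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: > 0 for a right tail, < 0 for a left tail."""
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return _central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return _central_moment(xs, 4) / m2 ** 2 - 3


# Hypothetical datasets.
print(skewness([1, 2, 3, 4, 5]))      # symmetric
print(skewness([1, 1, 1, 2, 10]))     # tail towards the high end
print(skewness([1, 9, 10, 10, 10]))   # tail towards the low end
print(excess_kurtosis([1, 2, 3, 4, 5]))
```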

Frequency Distributions and Contingency Tables

Definition of a Frequency Distribution

A few examples:
- a representation, either in a graphical or tabular format, which displays the number of observations within a given interval
- a mathematical function showing the number of instances in which a variable takes each of its possible values
- an arrangement of statistical data that exhibits the frequency of the occurrence of the values of a variable
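
In tabular form, a frequency distribution can be sketched by counting how often each value occurs. The function name and data below are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, ordered by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100 * c / total) for v, c in sorted(counts.items())]


# Hypothetical data: number of days each patient waited for treatment.
days = [2, 3, 3, 4, 4, 4, 5, 5]
for value, count, percent in frequency_table(days):
    print(f"{value:>5} {count:>5} {percent:>6.1f}%")
```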

Contingency Table
A table in which the entries are frequencies

A matrix format that displays the frequency distribution of the variables

If there are 2 rows and 2 columns it is called a 2x2 contingency table

Often used in conjunction with statistical tests e.g. Chi-squared test, Diagnostic test

Example: Contingency table

Contingency table: 2 x 2

Characteristic    Group 1    Group 2    Total
Present           a          b          a+b
Absent            c          d          c+d
Total             a+c        b+d        n=a+b+c+d
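
A 2 x 2 table of this kind can be built by tallying one record per individual. The sketch assumes each record is a hypothetical `(present, group)` pair, with `present` a boolean and `group` equal to 1 or 2; the cell names a, b, c and d follow the table above.

```python
def two_by_two(records):
    """Tally (present, group) records into the cells a, b, c, d and totals."""
    a = b = c = d = 0
    for present, group in records:
        if present and group == 1:
            a += 1
        elif present and group == 2:
            b += 1
        elif not present and group == 1:
            c += 1
        elif not present and group == 2:
            d += 1
        else:
            raise ValueError(f"unexpected group: {group!r}")
    return {
        "a": a, "b": b, "c": c, "d": d,
        "row_totals": (a + b, c + d),   # Present, Absent
        "col_totals": (a + c, b + d),   # Group 1, Group 2
        "n": a + b + c + d,
    }


# Hypothetical records for six individuals.
records = [(True, 1), (True, 1), (False, 1), (True, 2), (False, 2), (False, 2)]
print(two_by_two(records))
```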

Use in Diagnostic Testing

                  Gold Standard Test
Characteristic    Disease    No disease    Total
Positive          a          b             a+b
Negative          c          d             c+d
Total             a+c        b+d           n=a+b+c+d

How many individuals have the disease? What proportion have the disease (the prevalence)?

True/False Positive/Negative
Of the a + c individuals who have the disease, how many have positive test results (true positives)?

Of the a + c individuals who have the disease, how many have negative test results (false negatives)?

Of the b + d individuals who do not have the disease, how many have negative test results (true negatives)?

Of the b + d individuals who do not have the disease, how many have positive test results (false positives)?

Sensitivity and Specificity

The proportion of individuals with the disease who are correctly identified by the test = Sensitivity

$$\text{Sensitivity} = \frac{a}{a + c}$$

The proportion of individuals without the disease who are correctly identified by the test = Specificity

$$\text{Specificity} = \frac{d}{b + d}$$
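
These proportions can be sketched directly from the four cell counts of the diagnostic table (a = true positives, b = false positives, c = false negatives, d = true negatives). The function name and the counts in the usage example are hypothetical.

```python
def diagnostic_summary(a, b, c, d):
    """Prevalence, sensitivity and specificity from a 2 x 2 diagnostic table."""
    n = a + b + c + d
    return {
        "prevalence": (a + c) / n,    # proportion who have the disease
        "sensitivity": a / (a + c),   # diseased correctly identified
        "specificity": d / (b + d),   # disease-free correctly identified
    }


# Hypothetical counts: 100 diseased and 900 disease-free individuals.
print(diagnostic_summary(a=90, b=30, c=10, d=870))
```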

Graphical Presentations

Typical graphs
- Bar Chart
- Pareto Chart
- Pie Chart
- Box Plot
- Histogram

Graphs are
- Useful for getting an initial feel for the data
- Useful for explaining/presenting results to others
- Useful for identifying outliers

Displaying Frequency Distributions

Categorical or some Discrete Numerical data can be displayed visually in a:
- Bar (or Column) Chart
- Pareto Chart
- Pie Chart

Continuous Numerical data (and some Discrete Numerical data) can be displayed visually in a:
- Box Plot
- Histogram

Bar chart
Why use it?

to count the number of occurrences of categorical or discrete data

Example: Bar Chart

[Figure: bar chart of the number of different types of patient in a study; x-axis: Type of patient, y-axis: Frequency]

Pareto chart: 80 / 20 rule

Vilfredo Pareto (Italian economist) studied the distributions of wealth in different countries

He concluded that a fairly consistent minority (about 20%) of people controlled the large majority (about 80%) of a society's wealth

It is often said that 80% of problems usually stem from 20% of the causes (the Pareto effect)

Pareto chart
Why use it?

Identifies areas that provide the greatest potential for improvement

Pareto chart
What does it do?
- helps a team to focus on the problems that have most impact
- displays the relative importance of problems
- allows progress to be measured in a visible format

Frequency vs. Cost

The most frequent problems may not always have the largest impact in terms of quality, time or costs

In these situations it may be best to use two Pareto charts:
- one for frequency/count
- one for impact (cost)

both?

Obvious Pareto effect

[Figure: Pareto chart titled "Obvious Pareto effect"; bars show Frequency, the cumulative line shows Percent; annotation: Project focus]

Cause    Count    Percent    Cum %
A        30       41.7       41.7
B        25       34.7       76.4
C        6        8.3        84.7
D        5        6.9        91.7
E        3        4.2        95.8
Other    3        4.2        100.0
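
The Count, Percent and Cum % values for causes A to E and Other can be reproduced from the counts alone. The sketch below sorts causes by count in descending order and, as an assumption about the chart's convention, keeps the "Other" category last. The function name is hypothetical.

```python
def pareto_table(counts, last="Other"):
    """Return (cause, count, percent, cumulative percent) rows for a Pareto chart."""
    total = sum(counts.values())
    # Largest count first; the catch-all category stays at the end.
    ordered = sorted((k for k in counts if k != last),
                     key=lambda k: counts[k], reverse=True)
    if last in counts:
        ordered.append(last)
    rows, running = [], 0
    for cause in ordered:
        running += counts[cause]
        rows.append((cause, counts[cause],
                     round(100 * counts[cause] / total, 1),
                     round(100 * running / total, 1)))
    return rows


# Counts from the "Obvious Pareto effect" example.
counts = {"A": 30, "B": 25, "C": 6, "D": 5, "E": 3, "Other": 3}
for row in pareto_table(counts):
    print(row)
```

Here the first two causes account for about 76% of all occurrences, which is why they are the natural project focus.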

No Pareto effect

[Figure: Pareto chart titled "No Pareto effect"; bars show Frequency, the cumulative line shows Percent]

Cause    Count    Percent    Cum %
A        18       26.5       26.5
B        15       22.1       48.5
C        14       20.6       69.1
D        10       14.7       83.8
E        6        8.8        92.6
Other    5        7.4        100.0

Causes of medication errors

[Figure: Pareto chart titled "Causes of medication errors"; bars show Frequency, the cumulative line shows Percent; annotation: Project focus; the six named cause labels are illegible in the source]

Cause    Count    Percent    Cum %
1st      21       39.6       39.6
2nd      8        15.1       54.7
3rd      6        11.3       66.0
4th      6        11.3       77.4
5th      5        9.4        86.8
6th      5        9.4        96.2
Other    2        3.8        100.0

Pie chart
Why use it?

to evaluate the percentage/proportion contribution of categories of data

Example: Pie Chart

[Figure: pie chart of the type of patients in a study; categories: dental insurance, government healthcare, private; slices: 15 (30.0%), 15 (30.0%), 20 (40.0%)]

What could improve this chart?

Box (and Whisker) plot

Why use it?
- to provide an instant picture of variation in a data set
- to compare multiple data sets
- to identify outliers

Box plot
What does it do?
- allows visualisation of the distribution and variation of a data set
- allows a comparison to be made before and after interventions
- graphically shows key statistics such as the Median, Inter-quartile Range (IQR) and Quartiles

Box Plot
From top to bottom, the plot shows:
- Outliers (*)
- Upper whisker: extends to the adjacent value, the highest value within the upper limit
- Third Quartile (Q3)
- Median
- First Quartile (Q1)
- Lower whisker: extends to the adjacent value, the lowest value within the lower limit
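
The slides do not define the upper and lower limits. A common convention, assumed here, places them 1.5 x IQR beyond the quartiles (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR); values outside the limits are drawn as outliers. The function name and data are hypothetical, and the quartiles use the default interpolation of `statistics.quantiles`.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Quartiles, whisker ends (adjacent values) and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_limit = q1 - k * iqr        # assumed convention: k = 1.5
    upper_limit = q3 + k * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_whisker": min(inside),   # lowest value within the lower limit
        "upper_whisker": max(inside),   # highest value within the upper limit
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }


# Hypothetical appointment times in minutes, with one very long visit.
print(box_plot_summary([9, 10, 10, 11, 11, 12, 40]))
```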

Box Plot: Example 1

[Figure: box plots of the reaction times of 2 groups; x-axis: Group A, Group B; y-axis: Reaction Time (s)]

Box Plot: Example 2

Histogram
Why use it?
- to evaluate the distribution of a data set
- to evaluate whether certain statistical tests can be applied

Histogram
What does it do?
- Displays bars representing the count within different intervals of data
- Allows visualisation of the shape and spread of a data set
- Allows patterns to be identified
- Provides an indication of where the mean lies
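
The counting behind a histogram can be sketched as follows. The sketch assumes equal-width intervals that include their lower edge and exclude their upper edge; the function name, the data and the choice of intervals are hypothetical.

```python
def histogram_counts(values, start, width, bins):
    """Count values falling in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for x in values:
        index = int((x - start) // width)
        if not 0 <= index < bins:
            raise ValueError(f"value {x} lies outside the histogram range")
        counts[index] += 1
    return counts


# Hypothetical data: days to receive treatment, in intervals 2-3, 4-5, 6-7, 8-9.
days = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9]
counts = histogram_counts(days, start=2, width=2, bins=4)
for i, count in enumerate(counts):
    low = 2 + 2 * i
    print(f"{low}-{low + 1} days: {'#' * count}")
```

The text bars printed at the end give a rough picture of the shape and spread without any plotting library.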

Normal Distribution
[Figure: histogram titled "Normal Distribution"; x-axis: Data (45.0 to 54.0), y-axis: Frequency]

Data symmetrical about the mean

Bimodal distribution
[Figure: histogram titled "Bimodal distribution"; x-axis: data, y-axis: Frequency]

Skewed distribution
[Figure: histogram titled "Skewed distribution"; x-axis: Data, y-axis: Frequency]

Histogram: Example 1

Histogram: Example 2

Exercise
Identify situations in your research/work environment where Bar charts, Pareto charts, Pie charts, Box plots, and Histograms could be used

For each situation:
- describe the situation
- identify the type of data
- determine the x and y axis variables
- describe typical visual output

Summary
- Types of Data
- Descriptive/Summary Statistics
- Frequency Distributions and Contingency Tables
- Graphical Presentations
