
06-Mar-22

Data description
(descriptive statistics)

Magdy Ibrahim Mostafa


Prof. Obstetrics & Gynecology, Faculty of Medicine, Cairo University
Director; Research, Biostatistics & IT Units, MEDC, Cairo University
Management member; EBM Unit, MEDC, Cairo University
Scientific Council Member, Egyptian IT Fellowship
Board Member, Egyptian Ob/Gyn Fellowship
Associate Editor; Kasr Al Aini Journal of Obstetrics and Gynecology
Peer Reviewer; Gyn Endocrin J, Gyn Oncol J, Obstet Gynecol Invest Journal
Peer Reviewer; Cairo University Medical Journal, Kasr El Aini Medical Journal, MEFS Journal


Data
(results of observations)

Quantitative Qualitative


Descriptive Statistics
The aim of data description is to summarize
large amounts of data using a few meaningful numbers


Description of nominal data

1- Frequency distributions:
• Number of occurrences of a particular value in a set of data

Method No.
Abstinence 14


Description of nominal data

2- Proportion: Frequency ÷ Total = Part ÷ All
• Standardises data, making comparison easier

3- Percentage:
• Proportion × 100
• Not suitable when the total is less than 30

4- Ratio: Part ÷ Part

5- Rate: Frequency over a period of time


Blood group   Frequency   Proportion   Percent   Ratio to A
A             10          0.1          10%       1
B             20          0.2          20%       2
AB            30          0.3          30%       3
O             40          0.4          40%       4
Total         100         1            100%      10
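
The columns of the blood group table can be reproduced from the frequencies alone. The following is a minimal Python sketch; the function name `describe_nominal` and its `reference` parameter are illustrative choices, not part of the lecture.

```python
# Frequencies taken from the blood group table.
frequencies = {"A": 10, "B": 20, "AB": 30, "O": 40}


def describe_nominal(freqs, reference):
    """Return proportion, percentage and ratio-to-reference for each category.

    `reference` is the category used as the denominator of the ratio
    (group A in the table); the name is hypothetical.
    """
    total = sum(freqs.values())
    summary = {}
    for category, freq in freqs.items():
        summary[category] = {
            "proportion": freq / total,          # part ÷ all
            "percent": freq * 100 / total,       # proportion × 100
            "ratio": freq / freqs[reference],    # part ÷ part
        }
    return summary


summary = describe_nominal(frequencies, reference="A")
for category, row in summary.items():
    print(category, row["proportion"], f"{row['percent']:.0f}%", row["ratio"])
```

The percentage is computed as frequency × 100 ÷ total rather than proportion × 100 only to avoid floating-point rounding noise; the two are arithmetically the same.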



Description of nominal data

Sex       Number   Percent
Males     11       61.1
Females   7        38.9
Total     18       100.0

Age group   Frequency
25 -        2
30 -        5
35 -        3
40 -        5
45 -        2
50 -        0
55 -        1
Total       18


Description of nominal data

Complication   Frequency   Percent   Valid percent   Cumulative percent
Pain           5/100       5%        5.6%
Bleeding       8/100       8%        8.9%            14.5%
Infection      7/100       7%        7.8%            22.3%
Dehiscence     2/100       2%        2.2%            24.5%
Unknown        10/100      10%
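
A possible way to reproduce the valid and cumulative percent columns is sketched below in Python. It assumes that "valid percent" uses the 90 cases with known status (100 minus the 10 unknown) as the denominator, which matches 5 ÷ 90 = 5.6%. Variable names are illustrative.

```python
# Counts from the complications table; 10 of the 100 cases are "Unknown".
counts = {"Pain": 5, "Bleeding": 8, "Infection": 7, "Dehiscence": 2}
total_cases = 100
unknown = 10

# Assumption: valid percent excludes the unknown cases from the denominator.
valid_total = total_cases - unknown

percent = {}
valid_percent = {}
cumulative_percent = {}
running_count = 0
for name, count in counts.items():
    running_count += count
    percent[name] = count * 100 / total_cases
    valid_percent[name] = count * 100 / valid_total
    cumulative_percent[name] = running_count * 100 / valid_total

for name in counts:
    print(f"{name:<11} {percent[name]:.0f}%  "
          f"{valid_percent[name]:.1f}%  {cumulative_percent[name]:.1f}%")
```

This sketch accumulates the unrounded values, so its cumulative figures (14.4%, 22.2%, 24.4%) are 0.1 lower than the slide's 14.5%, 22.3% and 24.5%, which appear to be sums of the already rounded valid percentages.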


Measures of central location


Measures of Central location


1- The arithmetic mean (Mean)
• It is the sum of values (x) divided by the number of observations (n)
• It is most suitable for normally distributed quantitative data

2- The median:
• It is the middle value of a ranked array
• It divides the data array into 2 equal parts
• It is of special importance in ordinal data and in quantitative data with non-uniform values (outliers)

3- The mode: The most frequently repeated value in a data array

4- The mid-range: (min + max) ÷ 2

Arithmetic mean: Example

• Data: ages of 5 children: 10, 4, 6, 2 and 8 years
• Sum = 10 + 4 + 6 + 2 + 8 = 30 years
• n = 5

Mean = 30 ÷ 5 = 6 years
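
The same calculation as a short Python sketch, with the mid-range added for comparison. The variable names are illustrative.

```python
# Ages (years) of the five children in the example.
ages = [10, 4, 6, 2, 8]

n = len(ages)
total = sum(ages)                          # 30
mean_age = total / n                       # sum of values ÷ number of observations
mid_range = (min(ages) + max(ages)) / 2    # (min + max) ÷ 2

print(f"Sum = {total}, n = {n}, mean = {mean_age}, mid-range = {mid_range}")
```

For this symmetric data set the mean and the mid-range coincide at 6 years; that is a property of these particular values, not a general rule.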


The median: Example

• Data: 10, 4, 6, 2, 8
• Arrange: 2, 4, 6, 8, 10

Median = 6 (the middle value)


The median: Example

Data: 10, 4, 6, 2, 8, 12

Arrange: 2, 4, 6, 8, 10, 12

Median = (6+8) ÷ 2 = 7
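
Both median cases (odd and even number of observations) can be handled by one small function. This is a minimal sketch; the function name `median_of` is illustrative.

```python
def median_of(values):
    """Middle value of the ranked data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair


print(median_of([10, 4, 6, 2, 8]))        # five values
print(median_of([10, 4, 6, 2, 8, 12]))    # six values
```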


The mode: Example

• Data: 10, 4, 6, 2, 8, 7, 12
  Mode = None
• Data: 1, 2, 2, 2, 3, 7, 5, 6, 4, 4
  Mode = 2 (unimodal)
• Data: 1, 2, 2, 2, 3, 6, 5, 6, 4, 4, 4
  Mode = 2 and 4 (bimodal)
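
The three cases can be checked with a short Python sketch. It assumes the convention used in the examples, namely that a data set in which no value repeats has no mode; the function name `modes_of` is illustrative.

```python
from collections import Counter


def modes_of(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([10, 4, 6, 2, 8, 7, 12]))               # no mode
print(modes_of([1, 2, 2, 2, 3, 7, 5, 6, 4, 4]))        # unimodal
print(modes_of([1, 2, 2, 2, 3, 6, 5, 6, 4, 4, 4]))     # bimodal
```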


Measures of Dispersion


0 0
1 24
2 24
5 25
6 25
25 25
25 25
29 26
32 28
41 28
94 30
Mean 26 26
Median 25 25
Thus, measures of central tendency alone
cannot describe the data
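
The point can be verified numerically. The sketch below is an illustration under one stated assumption: the leading "0 0" row of the table is treated as not being a data row, because the stated mean of 26 is only reproduced by the remaining ten values in each column. The names `set_a` and `set_b` are hypothetical labels for the two columns.

```python
import statistics

# Assumption: the "0 0" row is excluded (ten values per column give mean 26).
set_a = [1, 2, 5, 6, 25, 25, 29, 32, 41, 94]
set_b = [24, 24, 25, 25, 25, 25, 26, 28, 28, 30]

for label, data in (("A", set_a), ("B", set_b)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    data_range = max(data) - min(data)
    sd = statistics.stdev(data)
    print(f"Set {label}: mean={mean}, median={median}, "
          f"range={data_range}, SD={sd:.1f}")
```

Both sets share the same mean and median, yet the range and SD of the first are far larger, which is why a measure of dispersion is reported alongside the central value.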

Measures of Dispersion
1- Range (maximum and minimum):
• The difference between the maximum and minimum values
• Suitable with the median

2- Centiles:
• Quartiles
• Deciles
• Percentiles
• Suitable with the median


Measures of Dispersion
a- Interquartile range (IQR):
• It spans the 2nd and 3rd quarters of the ranked data
• It represents the middle ½ of the data
• It excludes ¼ of the data at each extremity
• It is the range between the 25th and 75th percentiles
• Suitable to exclude extreme values from the description

b- Interdecile range (IDR): The range from the 10th to the 90th percentile

c- The reference interval (reference range, normal range):
• The range from the 2.5th to the 97.5th percentile (the middle 95% of the data)
• If the sample is large and normally distributed, it approximately coincides with the mean ± 2 SD
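
A minimal Python sketch of the three centile-based ranges follows. The data (the integers 0 to 100) are hypothetical and chosen only so the results are easy to read. Percentiles are computed by linear interpolation between ranks, which is one of several conventions in use, and the reference interval uses the 2.5% tails marked in the deck's reference range diagram. The function names are illustrative.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between ranked values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def central_range(values, tail_percent):
    """Limits of the central range after dropping `tail_percent` at each end."""
    return (percentile(values, tail_percent),
            percentile(values, 100 - tail_percent))


data = list(range(101))            # hypothetical data: 0, 1, ..., 100

iqr = central_range(data, 25)      # middle 1/2 of the data
idr = central_range(data, 10)      # middle 8/10 of the data
reference = central_range(data, 2.5)   # middle 95% of the data

print("IQR limits:", iqr)
print("IDR limits:", idr)
print("Reference interval:", reference)
```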


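[Figure: data arranged in ascending order from Min to Max]
IQR: 1/4 | 1/4 | 1/4 | 1/4 (the middle two quarters)
IDR: 1/10 | 8/10 | 1/10 (the middle 8/10)
Reference or normal range: 2.5% | 95% | 2.5% (the middle 95%)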


[Figure: IQR shown on 12 ranked values (1 to 12), IDR on 10 ranked values (1 to 10), reference range on 100 ranked values (1 to 100)]


Measures of Dispersion
3- Variance:
• It is the mean squared deviation around the mean
• It is not practical for description because the deviations are squared
• The unit of the variance is the squared unit of the original data

4- Standard deviation (SD):
• It is the square root of the variance (more practical than the variance)
• Represents the typical (average) deviation around the mean
• The larger the SD, the greater the dispersion, and vice versa
• More sensitive than the range (affected by every value)
• SD (and variance) do not systematically change as sample size increases; a larger sample only estimates them more accurately, so the value may shift


Measures of Dispersion
Example:
• Data: ages of 5 children: 10, 4, 6, 2 and 8 years
• Mean = 30 ÷ 5 = 6 years
• Differences from mean (x − mean) = 4, −2, 0, −4 and 2 years
• Squared differences = 16, 4, 0, 16 and 4 years²
• Sum of squared differences = 40 years²
• Variance = 40 ÷ (n − 1) = 40 ÷ 4 = 10 years²
• SD = √10 = 3.16 years
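
The steps of this example can be followed in a short Python sketch. It divides by n − 1 (the sample variance), as the worked example does with 40 ÷ 4; variable names are illustrative.

```python
import math

ages = [10, 4, 6, 2, 8]                      # years

n = len(ages)
mean_age = sum(ages) / n                     # 6.0
deviations = [x - mean_age for x in ages]    # differences from the mean
squared = [d ** 2 for d in deviations]       # squared differences
sum_of_squares = sum(squared)                # 40.0
variance = sum_of_squares / (n - 1)          # sample variance, years squared
sd = math.sqrt(variance)                     # back in the original unit (years)

print("Deviations:", deviations)
print("Squared deviations:", squared)
print(f"Variance = {variance}, SD = {sd:.2f}")
```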


Measures of Dispersion
5- Coefficient of variation (CV):
• It is the SD : mean ratio expressed as a percentage (SD ÷ mean × 100)
• Nullifies the effect of differences in measurement units
• Useful to compare 2 sets of data with different units
• Useful to compare 2 sets of data with equal SD but different means

Example (different units):
• In IU/L = 316 ± 120     CV = 38%
• In mmol/L = 7 ± 4       CV = 57%

Example (equal SD, different means):
• In IU/L = 316 ± 120     CV = 38%
• In mmol/L = 190 ± 120   CV = 63%
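
The CV figures in both examples can be checked with a one-line formula. A minimal Python sketch, with an illustrative function name:

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: SD ÷ mean × 100."""
    return sd / mean * 100


# (mean, SD) pairs from the two examples.
examples = [(316, 120), (7, 4), (190, 120)]
for mean, sd in examples:
    print(f"{mean} ± {sd}: CV = {coefficient_of_variation(mean, sd):.0f}%")
```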


Measures of Dispersion
6- Standard error:
• It describes how much a sample statistic would vary if the sampling were repeated
• It describes the sample-to-sample variation
• Thus it is considered the standard deviation of a statistical estimate
• It measures the precision and reliability of the statistic
• SEM = SD ÷ √n
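
As an illustration, the standard error of the mean (SEM = SD ÷ √n) for the five children's ages used in the earlier examples can be computed as follows. This is a sketch; applying the SEM to that data set is an addition for illustration and not taken from the slides.

```python
import math
import statistics

ages = [10, 4, 6, 2, 8]              # years

n = len(ages)
sd = statistics.stdev(ages)          # sample SD (divides by n - 1)
sem = sd / math.sqrt(n)              # SEM = SD ÷ √n

print(f"SD = {sd:.2f} years, n = {n}, SEM = {sem:.2f} years")
```

Because n appears in the denominator, the SEM shrinks as the sample grows even when the SD stays about the same.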


Interpretation


Summary of data description


Data type      Subtype       Description
Quantitative   Normal        Mean ± SD
Quantitative   Non-normal    Median + Range (Centiles)
Qualitative    Ordinal       Median + Range (Centiles)
Qualitative    Nominal       Frequency
